Site Analysis Note 19

1. Static Resource HTTP Response Header

cache-control:public, max-age=30758400
cf-cache-status:HIT
cf-ray:1afc29518836124f-HKG
content-encoding:gzip
content-type:text/css
date:Wed, 28 Jan 2015 09:28:42 GMT
expires:Tue, 19 Jan 2016 09:28:42 GMT
last-modified:Sun, 25 Jan 2015 04:38:08 GMT
server:cloudflare-nginx
status:200 OK
vary:Accept-Encoding
version:HTTP/1.1

Noteworthy:

(1) It uses 'cloudflare-nginx' for static resources.

CloudFlare is a CDN with a free plan. Its official doc says: "CloudFlare does not cache HTML, we only cache static files like images, CSS or Javascript. So if your HTML content is constantly changing, CloudFlare will not affect this content."

(2) gzip is applied (content-encoding: gzip).

(3) Caching is applied: max-age=30758400 seconds is 356 days, which matches the expires date.
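
As a quick check of item (3), the arithmetic below confirms that max-age and expires in the header above describe the same lifetime. This is only a minimal sketch using the Python standard library; the variable names are mine, the header values are copied from the response above.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values copied from the static-resource response header above.
cache_control = "public, max-age=30758400"
date_header = "Wed, 28 Jan 2015 09:28:42 GMT"
expires_header = "Tue, 19 Jan 2016 09:28:42 GMT"

# Pull the max-age directive (in seconds) out of Cache-Control.
max_age = next(
    int(part.split("=", 1)[1])
    for part in (p.strip() for p in cache_control.split(","))
    if part.startswith("max-age=")
)

served_at = parsedate_to_datetime(date_header)
expires_at = parsedate_to_datetime(expires_header)

lifetime_days = max_age / 86400  # 86400 seconds per day
matches = served_at + timedelta(seconds=max_age) == expires_at

print(lifetime_days, matches)  # prints: 356.0 True
```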

2. Dynamic Page HTTP Response Header

Cache-Control:no-cache, no-store
Content-Encoding:gzip
Content-Length:158
Content-Type:text/html; charset=utf-8
Date:Wed, 28 Jan 2015 09:54:47 GMT
Expires:-1
Last-Modified:1/28/2015 5:54:47 AM GMT
Pragma:no-cache
Vary:Accept-Encoding
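X-Powered-By:ASP.NET

Noteworthy:

(1) The site is built on ASP.NET (X-Powered-By: ASP.NET).

(2) Dynamic content is not cached (Cache-Control: no-cache, no-store; Pragma: no-cache; Expires: -1).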
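
To compare the two responses programmatically, a crawler could split the Cache-Control value into directives and apply a rough rule. The sketch below is a simplified reading of HTTP caching, not a full implementation: it ignores Expires, private, s-maxage and revalidation, and both function names are hypothetical.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name] = arg or None
    return directives


def reusable_without_revalidation(cache_control):
    """Rough rule: reusable only if neither no-store nor no-cache is set
    and a positive max-age is given."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d:
        return False
    max_age = d.get("max-age")
    return max_age is not None and max_age.isdigit() and int(max_age) > 0


# The two Cache-Control values seen in sections 1 and 2.
static_ok = reusable_without_revalidation("public, max-age=30758400")
dynamic_ok = reusable_without_revalidation("no-cache, no-store")
print(static_ok, dynamic_ok)  # prints: True False
```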

3. Cookie

[Image 1: cookie screenshot, not preserved]


4. Data Structure

Neither JSON nor YAML. What the hell is it?

5. Dynamic Domain Name

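Wildcard domain name resolution, also called wildcard DNS, catch-all subdomain, or wildcard subdomain: a record such as *.example.com answers for every subdomain that has no record of its own.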
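
To make the idea concrete, here is a toy lookup table with a wildcard fallback. It is a simplified sketch, not real DNS (which has more rules for wildcards); the `resolve` function, the example.com names and the 192.0.2.x addresses are all made up for illustration.

```python
def resolve(name, records):
    """Return the address for name: exact match first, then the closest wildcard."""
    name = name.lower().rstrip(".")
    if name in records:
        return records[name]
    labels = name.split(".")
    # Try *.parent, then *.grandparent, and so on.
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in records:
            return records[wildcard]
    return None


# Hypothetical zone: two explicit records plus one catch-all.
records = {
    "example.com": "192.0.2.1",
    "www.example.com": "192.0.2.2",
    "*.example.com": "192.0.2.10",
}

print(resolve("www.example.com", records))     # prints: 192.0.2.2
print(resolve("a1b2c3.example.com", records))  # prints: 192.0.2.10
print(resolve("example.org", records))         # prints: None
```

With a catch-all record like this, the site can hand out any subdomain it likes after login without touching its DNS zone.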

6. Login Procedure

STEP 1. Obtain the login Token (First Post)

To get this token, you have to open the home page and parse it to extract the token. You can't post a login request directly; opening the home page first is unavoidable.

Input: a request for the home page.

Output: the token.
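
The parsing part of this step might look like the sketch below. It assumes the token sits in a hidden form field named "token"; the note does not say where the token actually lives, so the field name, the class name and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def extract_token(html, field_name="token"):
    """Return the value of the hidden field, or None if it is missing."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.hidden.get(field_name)


# Made-up home page fragment, for illustration only.
home_page = (
    '<form action="/login" method="post">'
    '<input type="hidden" name="token" value="abc123">'
    '<input type="text" name="user">'
    "</form>"
)
print(extract_token(home_page))  # prints: abc123
```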

STEP 2. Post the Login Request (Second Post)

Input: all login parameters, including the token mentioned above.

Output: an intermediate page containing the dynamic subdomain URL.

During this step, the site may handle our request on a dedicated login server. This server probably has some policy or strategy for dispatching the new session to the application server farm: some betting accounts may get access to a fast server, and some don't.
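
Pulling the dynamic subdomain URL out of the intermediate page could be sketched as follows. This assumes the URL appears as an absolute http(s) URL somewhere in the page body and that the base domain is known in advance; the function name, the example.com domain and the sample page are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Absolute URLs, stopping at whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_subdomain_url(page, base_domain):
    """Return the first URL in page whose host is a subdomain of base_domain."""
    suffix = "." + base_domain.lower()
    for url in URL_RE.findall(page):
        host = (urlsplit(url).hostname or "").lower()
        if host.endswith(suffix):
            return url
    return None


# Made-up intermediate page, for illustration only.
page = '<script>window.location = "http://a1b2c3.example.com/welcome?sid=1";</script>'
print(find_subdomain_url(page, "example.com"))
# prints: http://a1b2c3.example.com/welcome?sid=1
```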

STEP 3. Redirect to the Dynamic Subdomain (Third Post)

Input: the necessary parameters.

Output: a cookie and a login-name (I think it's something like a session ID).
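
Reading the output of this step could look like the sketch below, which parses Set-Cookie header values with the standard library. The cookie names and values are assumptions: ASP.NET_SessionId is ASP.NET's default session cookie name, but the note does not confirm that this site uses it, and the login-name cookie is only guessed from the description above.

```python
from http.cookies import SimpleCookie


def cookies_from_headers(set_cookie_values):
    """Turn a list of Set-Cookie header values into a {name: value} dict."""
    jar = {}
    for value in set_cookie_values:
        cookie = SimpleCookie()
        cookie.load(value)
        for name, morsel in cookie.items():
            jar[name] = morsel.value
    return jar


# Made-up Set-Cookie values, for illustration only.
headers = [
    "ASP.NET_SessionId=abc123; path=/; HttpOnly",
    "login-name=player01; domain=.example.com; path=/",
]
jar = cookies_from_headers(headers)
print(jar["ASP.NET_SessionId"], jar["login-name"])  # prints: abc123 player01
```

The resulting dict can then be sent back as the Cookie header on later requests to the dynamic subdomain.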
