Understanding Web Internals--HTTP protocol overview

Resources:

Web servers host web resources.

1. Static file: the simplest kind of web resource

2. Dynamic content: generated by software on demand


Because there are so many kinds of resources, HTTP carefully tags each object being transported through the Web with a data format label called a MIME (Multipurpose Internet Mail Extensions) type.

Media Types:

Web servers attach a MIME type to all HTTP object data. When a web browser gets an object back from a server, it looks at the associated MIME type to see if it knows how to handle the object.

Most browsers can handle hundreds of popular object types:

displaying image files, parsing and formatting HTML files, playing audio files through the computer's speakers, or launching external plug-in software to handle special formats.


A MIME type is a textual label, represented as a primary object type and a specific subtype, separated by a slash.

1. text/html: HTML-formatted text document

2. text/plain: plain ASCII text

3. image/jpeg: JPEG image

4. image/gif: GIF image

5. video/quicktime: Apple QuickTime movie

6. application/vnd.ms-powerpoint: Microsoft PowerPoint presentation
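
The type/subtype structure can be illustrated with a short Python sketch. It uses the standard library's `mimetypes` module to guess a label from a file extension; the helper `split_mime_type` and the file names are only illustrative and are not part of the original notes.

```python
import mimetypes


def split_mime_type(mime_type):
    """Split 'type/subtype' into its primary type and subtype."""
    primary, _, subtype = mime_type.partition("/")
    return primary, subtype


# guess_type() maps a file name extension to a MIME type label.
for name in ("index.html", "saw-blade.gif"):
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type, split_mime_type(mime_type))
```

A real server usually decides the MIME type in a similar way (often from the file extension) and sends it in the Content-Type header.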


URI:

Clients want to get resources from servers, so each resource needs a unique name that identifies it. Clients use that name to point out which resources they are interested in.

The server resource name is called a uniform resource identifier, or URI.

1. The uniform resource locator (URL) is the most common form of URI. URLs tell you exactly how to fetch a resource from a precise, fixed location.

URLs have three parts:

http://www.joe-hardware.com/specials/saw-blade.gif

http:// is called the scheme, and it describes the protocol used to access the resource.

www.joe-hardware.com gives the server's Internet address.

/specials/saw-blade.gif names a resource on the web server.

Today, almost every URI is a URL.
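
As a quick check of the three parts, the example URL can be split with Python's standard `urllib.parse` module. This is only a sketch; the variable names are illustrative.

```python
from urllib.parse import urlsplit

url = "http://www.joe-hardware.com/specials/saw-blade.gif"
parts = urlsplit(url)

print(parts.scheme)  # protocol used to access the resource
print(parts.netloc)  # server Internet address
print(parts.path)    # resource name on that server
```

Note that `urlsplit` reports the scheme without the trailing "://".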

2. The second flavor of URI is the uniform resource name, or URN.

A URN serves as a unique name for a particular piece of content, independent of where the resource currently resides. These location-independent URNs allow resources to move from place to place. URNs also allow resources to be accessed by multiple network access protocols while maintaining the same name.


Transactions:

An HTTP transaction consists of a request command (sent from the client to the server) and a response result (sent from the server back to the client). This communication happens with formatted blocks of data called HTTP messages.

HTTP supports several different request commands, called HTTP methods.

Every HTTP request message has a method, which tells the server what action to perform (such as fetching a web page, running a gateway program, or deleting a file).

Some common HTTP methods:

HTTP method    Description
GET            Send the named resource from the server to the client
PUT            Store data from the client into a named server resource
DELETE         Delete the named resource from a server
POST           Send client data into a server gateway application
HEAD           Send just the HTTP headers from the response for the named resource
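
To make the method semantics concrete, here is a toy Python sketch that applies methods to an in-memory dictionary instead of a real server. The function `handle`, the `store` dictionary and the returned (status, body) pairs are assumptions made for illustration. POST is left out because its effect depends on the gateway application behind it, and the 405 code for unsupported methods is not covered in these notes.

```python
def handle(store, method, path, body=None):
    """Apply one HTTP-style method to an in-memory resource store.

    Returns a (status_code, body) pair. `store` maps resource names to bytes.
    """
    if method == "PUT":
        store[path] = body          # store client data under the given name
        return 200, b""
    if path not in store:
        return 404, b""             # the remaining methods need an existing resource
    if method == "GET":
        return 200, store[path]     # send the named resource to the client
    if method == "HEAD":
        return 200, b""             # same status as GET, but no body
    if method == "DELETE":
        del store[path]             # remove the named resource
        return 200, b""
    return 405, b""                 # method not supported by this toy


store = {}
print(handle(store, "PUT", "/specials/saw-blade.gif", b"GIF89a"))
print(handle(store, "GET", "/specials/saw-blade.gif"))
print(handle(store, "DELETE", "/specials/saw-blade.gif"))
print(handle(store, "GET", "/specials/saw-blade.gif"))
```

A real server would also return headers and would typically use more specific status codes; the sketch only shows which action each method asks for.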

Every HTTP response message comes back with a status code. The status code is a three-digit numeric code that tells the client if the request succeeded, or if other actions are required.

Some common status codes:

HTTP status code    Description
200                 OK. Document returned correctly
302                 Redirect. Go someplace else to get the resource
404                 Not found. Can't find this resource
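
Python's standard library knows these codes through `http.HTTPStatus`. The sketch below looks up the official reason phrase for each code in the table. The helper `describe` and the class names keyed on the first digit are illustrative additions; the grouping by first digit is standard HTTP but is not covered in these notes.

```python
from http import HTTPStatus


def describe(code):
    """Return 'code phrase (class)' for a numeric HTTP status code."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({classes[code // 100]})"


for code in (200, 302, 404):
    print(describe(code))
```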


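Web pages can consist of multiple objects.

An application often issues multiple HTTP transactions to accomplish a task. For example, a browser fetches the HTML page first and then issues further transactions for each embedded image.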


Messages:

HTTP messages are simple, line-oriented sequences of characters. Because they are plain text, not binary, they are easy for humans to read and write.

Request message: sent from web clients to web servers.

Response message: sent from web servers to web clients.

The formats of HTTP request and response messages are very similar.

HTTP messages consist of three parts:

1. Start line: the first line in the message, indicating what to do for a request or what happened for a response.

2. Header fields: Zero or more header fields follow the start line. Each header field consists of a name and a value, separated by a colon (:) for easy parsing. The headers end with a blank line.

3. Body: After the blank line is an optional message body containing any kind of data. Request bodies carry data to the web server; response bodies carry data back to the client. Unlike the start lines or headers, which are textual and structured, the body can contain arbitrary binary data.
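
The three parts can be pulled apart with a few lines of Python. This is a minimal sketch for HTTP/1.x-style messages, not a full parser. The function `parse_message` and the sample response are made up for illustration, and repeated header names simply overwrite each other here.

```python
def parse_message(raw):
    """Split a raw HTTP/1.x message (bytes) into start line, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")     # name and value split on ':'
        headers[name.strip()] = value.strip()
    return start_line, headers, body


response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 19\r\n"
    b"\r\n"
    b"Hi! I'm a message!\n"
)

start_line, headers, body = parse_message(response)
print(start_line)
print(headers)
print(body)
```

The body is kept as bytes because, unlike the start line and headers, it may hold arbitrary binary data.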


HTTP versions

1. HTTP/0.9: outdated

2. HTTP/1.0: widely deployed

3. HTTP/1.0+: extended version of HTTP/1.0

4. HTTP/1.1: corrects architectural flaws in the design of HTTP

5. HTTP-NG (a.k.a. HTTP/2.0): a prototype proposal for a successor to HTTP/1.1 (not the later HTTP/2 standard)


Architectural Components of the Web

Basic architecture:

client and server

Realistic architecture:

clients

proxies

HTTP proxy servers are important building blocks for web security, application integration and performance optimization.


A proxy sits between a client and a server, receiving all of the client's HTTP requests and relaying the requests to the server (perhaps after modifying the requests). These applications act as a proxy for the user, accessing the server on the user's behalf.

Proxies are often used for security, acting as trusted intermediaries through which all web traffic flows. Proxies can also filter requests and responses; for example, to detect application viruses in corporate downloads or to filter adult content away from elementary-school students.

caches

A web cache or caching proxy is a special type of HTTP proxy server that keeps copies of popular documents that pass through the proxy. The next client requesting the same document can be served from the cache's own copy.
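
The idea can be sketched in Python with a dictionary that stores one copy per URL. Everything here (the class `CachingProxy`, the `fake_origin` stand-in for an origin server, the request counter) is hypothetical and exists only to show the behavior. Real caches also have to decide when a stored copy is too old, which this sketch ignores.

```python
class CachingProxy:
    """Toy caching proxy: keeps a copy of each document it has fetched."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> document bytes
        self._copies = {}                 # url -> cached document
        self.origin_requests = 0

    def get(self, url):
        if url not in self._copies:       # cache miss: go to the origin server
            self.origin_requests += 1
            self._copies[url] = self._fetch(url)
        return self._copies[url]          # serve the stored copy


def fake_origin(url):
    # Hypothetical stand-in for a real origin server.
    return b"document for " + url.encode("ascii")


proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.joe-hardware.com/specials/saw-blade.gif")
second = proxy.get("http://www.joe-hardware.com/specials/saw-blade.gif")
print(first == second, proxy.origin_requests)
```

The second request never reaches the origin, which is where the performance benefit of a cache comes from.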


gateways

Gateways are special servers that act as intermediaries for other servers. They are often used to convert HTTP traffic to another protocol. A gateway always receives requests as if it were the origin server for the resource. The client may not be aware it is communicating with a gateway.

For example, an HTTP/FTP gateway receives requests for FTP URLs via HTTP requests but fetches the documents using the FTP protocol. The resulting document is packed into an HTTP message and sent to the client.


tunnels

Tunnels are HTTP applications that, after setup, blindly relay raw data between two connections.

HTTP tunnels are often used to transport non-HTTP data over one or more HTTP connections, without looking at the data.

One popular use of HTTP tunnels is to carry encrypted Secure Sockets Layer (SSL) traffic through an HTTP connection, allowing SSL traffic through corporate firewalls that permit only web traffic.

agents

User agents are client programs that make HTTP requests on the user's behalf. Any application that issues web requests is an HTTP agent.

The most common HTTP agent is the web browser.

Other agents are automated, such as spiders or web robots.

servers









