2.2 – The Web and HTTP

2.2.1 – Overview of HTTP

  • HyperText Transfer Protocol (HTTP) defines the structure of the HTTP messages and how the client and server exchange the messages.
  • A web page consists of objects.
    • An object is simply a file (HTML file, image, Java applet, video, etc.) that is addressable by a single URL.
    • Most web pages consist of a base HTML file and several referenced objects.
      • The base HTML file references the other objects by their URLs.
  • A URL consists of 2 components:
    • The hostname of the server that houses the object
    • The object’s path name
  • Popular web servers include Apache and Microsoft Internet Information Server
  • HTTP defines how web clients request web pages from web servers and how servers transfer web pages to clients.
    • When a user requests a web page, the browser sends HTTP request messages for the objects in the page to the server. The server receives the requests and responds with HTTP response messages that contain the objects.
    • HTTP uses TCP as its underlying transport protocol
      • The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and the server processes access TCP through their socket interfaces.
  • The HTTP server receives request messages from its socket interface and sends response messages into its socket interface. Once the client sends a message into its socket interface, the message is out of the client’s hands and is “in the hands” of TCP.
  • An HTTP server doesn’t store any state information about its clients, which makes HTTP a stateless protocol.
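As a small illustration of the two URL components mentioned above, the sketch below uses Python's standard `urllib.parse` module to separate the hostname from the path name. The URL is an assumed example that reuses the www.example.com host and /someDir/home.index path from the next section; it is not taken from a real site.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.example.com/someDir/home.index"

parts = urlsplit(url)
hostname = parts.hostname   # the server that houses the object
path = parts.path           # the object's path name on that server
port = parts.port or 80     # no explicit port in the URL, so fall back to HTTP's default

print(hostname, path, port)
```

The hostname tells the client which server to open a TCP connection to, and the path name is what goes into the request message.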

2.2.2 – Non-Persistent and Persistent Connections

  • Non-persistent connection: Each request/response pair is sent over a separate TCP connection.
    • Steps for transferring a web page (base HTML file with 10 images) from a server to a client over non-persistent connections:
      • The HTTP client process initiates a TCP connection to the server www.example.com on port 80, which is the default port number for HTTP. Associated with the TCP connection, there will be a socket at the client and a socket at the server
      • The HTTP client sends an HTTP request message to the server via its socket. The request message includes the path name /someDir/home.index
      • The HTTP server process receives the request message via its socket, retrieves the object /someDir/home.index from its storage (RAM or disk), encapsulates the object in an HTTP response message, and sends the response message to the client via its socket.
      • The HTTP server process tells TCP to close the TCP connection. (But TCP doesn’t actually terminate the connection until it knows for sure that the client has received the response message intact)
      • The HTTP client receives the response message. The TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client examines the HTML file and finds references to the 10 JPEG objects.
      • The first four steps are then repeated for each of the referenced JPEG objects.
    • Since each TCP connection transfers exactly one object, the example above opens 11 connections.
    • Modern browsers can open several TCP connections in parallel, so the 10 JPEGs can be transferred over parallel connections instead of one after another.
    • Round-trip time (RTT) is the time it takes for a small packet to travel from client to server and then back to the client.
      • Includes packet-propagation delays, packet-queueing delays in intermediate routers and switches, and packet-processing delays.
    • Example: clicking a hyperlink
      • The browser initiates a TCP connection, which involves a “three-way handshake”.
      • The client sends a small TCP segment to the server, the server acknowledges and responds with a small TCP segment, and, finally, the client acknowledges back to the server.
      • The first two parts of the handshake take one RTT. After completing the first two parts of the handshake, the client sends the HTTP request message combined with the third part of the three-way handshake (the acknowledgment) into the TCP connection.
      • Once the request message arrives at the server, the server sends the HTML file into the TCP connection. This HTTP request/response eats up another RTT. Thus, roughly, the total response time is two RTTs plus the transmission time at the server of the HTML file.
  • Persistent connections: All the requests and their corresponding responses are sent over the same TCP connection.
    • With non-persistent connections, a brand-new connection must be established and maintained for each requested object. For each connection, TCP buffers must be allocated and TCP variables must be kept in both the client and the server, which is a heavy burden on a server handling many clients. In addition, each object suffers a delivery delay of 2 RTTs: one to establish the TCP connection and one to request and receive the object. Persistent connections don’t have these problems.
    • With HTTP 1.1 persistent connections, the server leaves the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can be sent over the same connection.
      • These requests for objects can be made back-to-back without waiting for replies to pending requests (pipelining).
      • The HTTP server closes a connection when it hasn’t been used in a certain time.
    • More recently, HTTP/2 builds on HTTP 1.1 by allowing multiple requests and replies to be interleaved in the same connection.
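The RTT reasoning in this section can be turned into a rough back-of-the-envelope calculator. The sketch below is only an approximation under stated assumptions: it ignores transmission time, counts 2 RTTs for every new TCP connection (handshake plus request/response), and the function name `page_fetch_rtts` and the three mode names are made up for illustration.

```python
import math

def page_fetch_rtts(num_objects, mode, parallel=1):
    """Approximate RTTs needed to fetch a base HTML file plus num_objects
    referenced objects, ignoring transmission time (a simplification)."""
    if mode == "non-persistent":
        # Base file: 1 RTT for the handshake + 1 RTT for request/response.
        # Referenced objects are fetched in batches of `parallel`
        # connections, and each batch again costs 2 RTTs.
        batches = math.ceil(num_objects / parallel)
        return 2 + 2 * batches
    if mode == "persistent":
        # One handshake only, then 1 RTT per request/response, one at a time.
        return 2 + num_objects
    if mode == "persistent-pipelined":
        # Requests are sent back-to-back, so all referenced objects
        # arrive after roughly one additional RTT.
        return 2 + (1 if num_objects > 0 else 0)
    raise ValueError(f"unknown mode: {mode}")

for mode, parallel in [("non-persistent", 1), ("non-persistent", 10),
                       ("persistent", 1), ("persistent-pipelined", 1)]:
    print(mode, parallel, page_fetch_rtts(10, mode, parallel))
```

For the page with 10 images used above, this gives 22 RTTs for serial non-persistent connections, 4 with 10 parallel connections, 12 for a persistent connection without pipelining, and 3 with pipelining.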

2.2.3 – HTTP Message Format

  • There are two types of HTTP messages, request messages and response messages.
  • HTTP request message Example:
    • The first line is called the request line and the subsequent lines are called header lines.
    • The request line has three fields: Method field, URL field, HTTP version field
      • Method field: GET, POST, HEAD, PUT, and DELETE
        • Most requests use GET.
          • GET is used when the browser requests an object with the requested object identified in the URL field.
    • The header lines:
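      • Host: specifies the host on which the object resides.
      • Connection: close tells the server that the browser doesn’t want to bother with persistent connections; the server should close the connection after sending the requested object.
      • User-agent: specifies the user agent (the browser type that is making the request).
      • Accept-language: indicates the language the user prefers. For example, Accept-language: fr indicates that the user prefers to receive a French version of the object, if such an object exists.
    • After the header lines there may be an “entity body”.
      • It is empty with the GET method, but is used with the POST method.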
        • With a POST message, the user is still requesting a web page from the server, but the specific contents of the web page depend on what the user entered into the form fields. The entity body then contains what the user entered into the form fields.
    • HTML forms can also use the GET method, which puts the data from the input fields into the requested URL.
    • The HEAD method is similar to GET. When a server receives a request with the HEAD method, it responds with an HTTP message but leaves out the requested object.
      • Application developers often use the HEAD method for debugging.
    • The PUT method is often used in conjunction with web publishing tools. It allows a user to upload an object to a specific path on a specific web server. It is also used by applications that need to upload objects to the web server.
    • The DELETE method allows a user or an application to delete an object on a web server.
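To make the request format concrete, here is a minimal sketch that assembles a request message from a request line and header lines and then splits the request line back into its three fields. The helper `build_request`, the host www.example.com, the path and the header values are all assumptions chosen for illustration.

```python
def build_request(method, path, host, headers=None, body=""):
    """Assemble an HTTP/1.1 request message as text (illustrative helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line separates the header lines from the (possibly empty) entity body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

request = build_request(
    "GET", "/someDir/home.index", "www.example.com",
    {"Connection": "close", "User-agent": "Mozilla/5.0", "Accept-language": "fr"},
)

# The request line has three fields: method, URL, and HTTP version.
request_line = request.split("\r\n")[0]
method, url, version = request_line.split(" ")
print(request)
```

With GET the entity body stays empty; for a POST, the form data would be passed as `body`.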
  • HTTP Response Message Example:
    • It has three sections:
      • Status line: has three fields: the protocol version, a status code, and a corresponding status message
      • Six header lines:
        • Connection: close tells the client that the server is going to close the TCP connection after sending the message
        • Date: indicates the time and date when the HTTP response was created and sent by the server. This is not the time when the object was created or last modified; it is the time when the server retrieves the object from its file system, inserts the object into the response message and sends the response message.
        • Server: indicates that the message was generated by an Apache web server
        • Last-Modified: indicates the time and date when the object was created or last modified
        • Content-Length: indicates the number of bytes in the object being sent
        • Content-Type: indicates that the object in the entity body is HTML text
      • Entity body: Contains the object itself
  • Status codes:
    • 200 OK: Request succeeded
    • 301 Moved Permanently: Requested object has been permanently moved. The new location is found in the Location: header of the response
    • 400 Bad Request: Generic error code indicating that the server couldn’t understand the request
    • 404 Not Found: The requested object does not exist on the server
    • 505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server
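The three sections of a response message can be pulled apart with a few string operations. The sketch below is a simplified parser, not a complete HTTP implementation; the function `parse_response` and all header values in the sample message (dates, server string, body) are invented for illustration.

```python
def parse_response(raw):
    """Split a response into status-line fields, header lines, and entity body."""
    head, _, body = raw.partition("\r\n\r\n")       # blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    # The status phrase may contain spaces, so split at most twice.
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")        # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body

# Sample message with made-up values.
body = "<html>hello</html>"
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: close\r\n"
    "Date: Tue, 18 Aug 2015 15:44:04 GMT\r\n"
    "Server: Apache/2.2.3 (CentOS)\r\n"
    "Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT\r\n"
    f"Content-Length: {len(body)}\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    + body
)

version, code, phrase, headers, entity = parse_response(raw)
print(version, code, phrase, headers["content-type"], len(entity))
```

Header names are lowercased here because HTTP header names are case-insensitive.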

2.2.4 – User-Server Interaction: Cookies

  • Cookies allow sites to keep track of users.
  • Cookie technology has 4 components:
    • A cookie header line in the HTTP response message
    • A cookie header line in the HTTP request message
    • A cookie file kept on the user’s end system and managed by the user’s browser
    • A back-end database at the web site
  • For example, when a user requests a web page from Amazon for the first time, the server creates a unique identification number, creates an entry in its back-end database indexed by that number, and adds the number to the Set-cookie: header line in the response message.
    • When the browser receives the response message, it appends a line to its cookie file containing the hostname of the server and the identification number.
    • Each time the user requests a page from Amazon, the browser consults the cookie file, extracts the identification number for this site, and puts it into the Cookie: header line of the request message.
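The interplay of the four cookie components can be sketched as a small simulation. Everything here is hypothetical: the classes `Site` and `Browser`, the hostname shop.example, and the simplification that the cookie value is just the bare identification number (real Set-Cookie headers carry name=value pairs plus attributes).

```python
import itertools

class Site:
    """Web site with a back-end database indexed by identification number."""
    def __init__(self):
        self._next_id = itertools.count(1)
        self.database = {}          # id -> list of paths this user requested

    def handle(self, path, request_headers):
        response_headers = {}
        cookie = request_headers.get("Cookie")
        if cookie not in self.database:
            # Unknown user: create an id and a database entry, send Set-cookie.
            cookie = str(next(self._next_id))
            self.database[cookie] = []
            response_headers["Set-cookie"] = cookie
        self.database[cookie].append(path)
        return response_headers

class Browser:
    """Browser that manages a cookie file mapping hostname -> id."""
    def __init__(self):
        self.cookie_file = {}

    def request(self, host, site, path):
        headers = {}
        if host in self.cookie_file:
            headers["Cookie"] = self.cookie_file[host]   # cookie header in the request
        response_headers = site.handle(path, headers)
        if "Set-cookie" in response_headers:
            self.cookie_file[host] = response_headers["Set-cookie"]
        return headers                                   # what was sent, for inspection

site, browser = Site(), Browser()
first = browser.request("shop.example", site, "/")
second = browser.request("shop.example", site, "/cart")
print(first, second, site.database)
```

The first request carries no cookie; the second carries the number the site assigned, which lets the site link both requests to the same user even though HTTP itself is stateless.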

2.2.5 – Web Caching

  • A web cache is also called a proxy server
    • A network entity that satisfies HTTP requests on behalf of an origin Web server
    • It has its own disk storage and keeps copies of recently requested objects in this storage.
  • A cache is both a server and a client at the same time. When it receives requests from and sends responses to a browser, it’s a server. When it sends requests to and receives responses from an origin server, it’s a client.
  • Typically a Web cache is purchased and installed by an ISP.
  • Web caching has seen deployment in the Internet for two reasons:
    • A web cache can substantially reduce the response time for a client request, particularly if the bottleneck bandwidth between the client and the origin server is much less than the bottleneck bandwidth between the client and the cache.
    • Web caches can substantially reduce traffic on an institution’s access link to the Internet, and in the Internet as a whole.
  • Example of why caches are important:
    • Suppose there are two networks: the institutional network and the public Internet. The institutional network is a high-speed LAN (100 Mbps), while the access link between the institutional network and the public Internet is 15 Mbps. The origin servers are attached to the Internet and located all over the world.
    • Suppose the average object size is 1 Mbit, the average request rate is 15 requests per second, and that the HTTP request messages are negligibly small.
    • Suppose the time from when the router on the Internet side of the access link forwards an HTTP request until it receives the response is on average 2 seconds (we call this the Internet delay).
      • The total response time is the sum of the LAN delay, the access delay and the Internet delay.
        • The traffic intensity on the LAN:
          • (15 requests/s)*(1 Mbit/request)/(100 Mbps) = 0.15
        • Whereas the traffic intensity on the access link is:
          • (15 requests/s)*(1 Mbit/request)/(15 Mbps) = 1
    • A traffic intensity of 0.15 on a LAN typically results in a delay of at most tens of ms, but as the intensity approaches 1, as it does on the access link, the delay grows without bound and the response time can reach minutes.
      • One option to solve this issue is to upgrade the access link to 100 Mbps, but that costs a lot of money.
      • Another option is to install a Web cache in the institutional network. The fraction of requests satisfied by the cache (the hit rate) typically ranges from 0.2 to 0.7; for this example, let’s say it is 0.4.
        • This means 40% of the requests are satisfied by the cache within a few ms, while the other 60% still have to be satisfied by the origin servers. With only 60% of the requests crossing the access link, its traffic intensity drops from 1 to 0.6 (an intensity below 0.8 typically results in a small delay, in the tens of ms).
      • The average delay with a Web cache server then gives us:
        • 0.4*(0.01 s) + 0.6*(2.01 s) = 1.21 s, or roughly 1.2 s
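The arithmetic in this example can be checked with a few lines of Python. The numbers are the ones used above (15 requests/s, 1 Mbit objects, 100 Mbps LAN, 15 Mbps access link, hit rate 0.4, delays of 0.01 s and 2.01 s); the function name `traffic_intensity` is just an illustrative choice.

```python
def traffic_intensity(requests_per_s, bits_per_object, link_bps):
    """Traffic intensity = arrival rate of bits / link capacity."""
    return requests_per_s * bits_per_object / link_bps

REQUEST_RATE = 15        # requests per second
OBJECT_BITS = 1e6        # 1 Mbit per object
HIT_RATE = 0.4           # fraction of requests satisfied by the cache

lan = traffic_intensity(REQUEST_RATE, OBJECT_BITS, 100e6)
access = traffic_intensity(REQUEST_RATE, OBJECT_BITS, 15e6)

# With a cache, only the misses cross the access link.
access_with_cache = traffic_intensity(REQUEST_RATE * (1 - HIT_RATE), OBJECT_BITS, 15e6)

# Hits are served in about 0.01 s; misses take about 2.01 s.
average_delay = HIT_RATE * 0.01 + (1 - HIT_RATE) * 2.01

print(round(lan, 2), round(access, 2), round(access_with_cache, 2), round(average_delay, 2))
```

Changing `HIT_RATE` shows how sensitive the average delay is to the cache's hit rate.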
  • Content Distribution Networks (CDNs):
    • A CDN installs many geographically distributed caches throughout the Internet, thereby localizing much of the traffic. There are shared and dedicated CDNs.
  • The cached copy of an object might have been modified since it was cached, but HTTP has a mechanism that allows a cache to verify that its objects are up to date. The mechanism is called conditional GET.
    • An HTTP request message is a so-called conditional GET message if:
      • The request message uses the GET method
      • The request message includes an If-modified-since: header line
    • If the object hasn’t been modified since it was cached, the server responds with 304 Not Modified and an empty entity body, and the cache can keep using its copy.
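The conditional GET exchange between a cache and an origin server can be sketched as follows. This is a toy model under several assumptions: the classes `OriginServer` and `WebCache` are hypothetical, modification times are plain integers instead of HTTP dates, and the cache revalidates on every request, which real caches do not necessarily do.

```python
class OriginServer:
    def __init__(self):
        self.objects = {}                    # path -> (last_modified, body)

    def put(self, path, body, modified):
        self.objects[path] = (modified, body)

    def get(self, path, if_modified_since=None):
        last_modified, body = self.objects[path]
        if if_modified_since is not None and last_modified <= if_modified_since:
            return 304, last_modified, None  # Not Modified: empty entity body
        return 200, last_modified, body

class WebCache:
    def __init__(self, origin):
        self.origin = origin
        self.store = {}                      # path -> (last_modified, body)
        self.log = []                        # status codes seen from the origin

    def get(self, path):
        cached = self.store.get(path)
        # Conditional GET: include If-modified-since only if we hold a copy.
        since = cached[0] if cached else None
        status, last_modified, body = self.origin.get(path, since)
        self.log.append(status)
        if status == 304:
            return cached[1]                 # cached copy is still up to date
        self.store[path] = (last_modified, body)
        return body

origin = OriginServer()
origin.put("/fruit/kiwi.gif", b"v1", modified=100)
cache = WebCache(origin)

a = cache.get("/fruit/kiwi.gif")             # miss: plain GET, 200
b = cache.get("/fruit/kiwi.gif")             # revalidation: 304, served from cache
origin.put("/fruit/kiwi.gif", b"v2", modified=200)
c = cache.get("/fruit/kiwi.gif")             # object changed: 200 with new body
print(a, b, c, cache.log)
```

The 304 response saves bandwidth because the object itself is not sent again.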
