Thursday, August 4, 2016

Structure of HTTP Transactions

                                                Crimson Security Group
                                                                      we yearn security....


First, we need to understand what resources are.

What are "resources"?

HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL (it's the R in URL). The most common kind of resource is a file, but a resource may also be a dynamically-generated query result, the output of a CGI script, a document that is available in several languages, or something else.

While learning HTTP, it may help to think of a resource as similar to a file, but more general. As a practical matter, almost all HTTP resources are currently either files or server-side script output.  

Now let's see how an HTTP transaction occurs.

Like most network protocols, HTTP uses the client-server model: an HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection. HTTP is a stateless protocol, i.e. the server does not maintain any connection information between transactions.

The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of:

* an initial line,

* zero or more header lines,

* a blank line (i.e. a CRLF by itself), and  

* an optional message body (e.g. a file, or query data, or query output). 

The format of an HTTP message is:  

<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

<optional message body goes here, like file contents or query data;
 it can be many lines long, or even binary data $&*%@!^$@>

Initial lines and headers should end in CRLF, though you should gracefully handle lines ending in just LF. (More exactly, CR and LF here mean ASCII values 13 and 10, even though some platforms may use different characters.) 
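
The layout above can be sketched in a few lines of Python. This is only an illustration, not a full parser: the function name split_message is made up for this example, and it assumes the whole message is already in memory as bytes. It splits at the first blank line and accepts both CRLF and bare LF line endings, as recommended above.

```python
import re

# A blank line is two consecutive line endings; each may be CRLF or bare LF.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def split_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, header_lines, body).

    Hypothetical helper for illustration only.
    """
    match = _BLANK_LINE.search(raw)
    if match is None:
        # No blank line found: treat everything as the head, with no body.
        head, body = raw, b""
    else:
        head, body = raw[:match.start()], raw[match.end():]

    lines = re.split(rb"\r?\n", head)
    initial_line = lines[0].decode("latin-1")
    header_lines = [line.decode("latin-1") for line in lines[1:] if line]
    # The body stays as bytes because it may be binary data.
    return initial_line, header_lines, body
```

For example, split_message(b"GET /x HTTP/1.0\r\nHeader1: value1\r\n\r\n") gives the initial line, a one-element list of header lines, and an empty body.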

Initial Request Line

The initial line is different for the request than for the response. A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:

 1              2                3
GET /path/to/file/index.html HTTP/1.0

Method names are always uppercase (GET is a method).

1   GET is the most common HTTP method; it says "give me this resource". We
    will discuss more methods later.

2   The path is the part of the URL after the host name, also called the request
    URI (a URI is like a URL, but more general).

3   The HTTP version always takes the form "HTTP/x.x", uppercase.
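
As a rough sketch, the three parts of a request line can be pulled apart like this in Python. The function name parse_request_line is hypothetical, and the checks are deliberately minimal (three space-separated parts and a version of the form "HTTP/x.x").

```python
import re


def parse_request_line(line: str):
    """Return (method, path, version) from a request line.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {line!r}")

    method, path, version = parts
    # The version always takes the form "HTTP/x.x", uppercase.
    if not re.fullmatch(r"HTTP/\d+\.\d+", version):
        raise ValueError(f"bad HTTP version: {version!r}")
    return method, path, version
```

Calling parse_request_line("GET /path/to/file/index.html HTTP/1.0") returns the method, the path, and the version as three separate strings.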


Initial Response Line (Status Line)

The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:  

HTTP/1.0 200 OK            ---> status code 200 (OK) means the resource is available
or
HTTP/1.0 404 Not Found     

The HTTP version is in the same format as in the request line, "HTTP/x.x". 

The status code is meant to be computer-readable; the reason phrase is meant to be human-readable, and may vary.

The status code is a three-digit integer, and the first digit identifies the general category of response:

      1xx indicates an informational message only
      2xx indicates success of some kind
      3xx redirects the client to another URL
      4xx indicates an error on the client's part
      5xx indicates an error on the server's part
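
Because the first digit carries the category, a client can react sensibly even to a status code it has never seen. A minimal Python sketch of that idea follows; the names CATEGORIES and parse_status_line are invented for this example, and the category labels are just short forms of the list above.

```python
# First digit of the status code -> general category (labels are illustrative).
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Return (version, code, reason, category) from a status line.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    # The reason phrase may contain spaces, so split at most twice.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"not a status line: {line!r}")

    version, code_text = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    if len(code_text) != 3 or not (code_text.isascii() and code_text.isdigit()):
        raise ValueError(f"bad status code: {code_text!r}")

    code = int(code_text)
    category = CATEGORIES.get(code // 100, "unknown")
    return version, code, reason, category
```

Note that the code only looks at the numeric status code to pick the category; the reason phrase is returned as-is, since it is meant for humans and may vary.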

The most common status codes are:  

200 OK
    The request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.

404 Not Found
    The requested resource doesn't exist.

301 Moved Permanently
302 Moved Temporarily
303 See Other (HTTP 1.1 only)
    The resource has moved to another URL (given by the Location: response header), and should be automatically retrieved by the client. This is often used by a CGI script to redirect the browser to an existing file.

500 Server Error
    An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.


Reference Link: https://httpstatuses.com/
 


Header Lines

Header lines provide information about the request or response, or about the object sent in the message body. The header lines are in the usual text header format, which is: one line per header, of the form "Header-Name: value", ending with CRLF.

* As noted above, they should end in CRLF, but you should handle LF correctly.

* The header name is not case-sensitive (though the value may be).

* Any number of spaces or tabs may be between the ":" and the value.

* Header lines beginning with space or tab are actually part of the previous header line, folded into multiple lines for easy reading.

HTTP 1.0 defines 16 headers, though none are required. HTTP 1.1 defines 46 headers, and one (Host:) is required in requests.
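
The header rules above (case-insensitive names, optional whitespace after the colon, folded continuation lines) can be sketched in Python as follows. The function name parse_headers is hypothetical, and the sketch makes two simplifying assumptions: names are stored in lowercase so lookups ignore case, and a repeated header simply overwrites the earlier value.

```python
def parse_headers(header_lines):
    """Turn a list of header lines into a dict keyed by lowercase name.

    Hypothetical helper for illustration only.
    """
    # Step 1: unfold. A line starting with space or tab continues the
    # previous header line.
    unfolded = []
    for line in header_lines:
        line = line.rstrip("\r\n")
        if line[:1] in (" ", "\t") and unfolded:
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)

    # Step 2: split each line at the first ":" and trim the whitespace.
    headers = {}
    for line in unfolded:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon: not a header line, skipped in this sketch
        headers[name.strip().lower()] = value.strip()
    return headers
```

With this, "Content-Type: text/html" and "content-type:text/html" both end up under the key "content-type" with the value "text/html".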

The From: header gives the email address of whoever is making the request, or running the program doing so. (This must be user-configurable, for privacy concerns.)

The User-Agent: header identifies the program that's making the request, in the form "Program-name/x.xx", where x.xx is the (mostly) alphanumeric version of the program. For example, Netscape 3.0 sends the header "User-agent: Mozilla/3.0Gold".

These headers help webmasters troubleshoot problems. They also reveal information about the user. When you decide which headers to include, you must balance the webmasters' logging needs against your users' needs for privacy.

The Message Body

An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular,

* The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif.

* The Content-Length: header gives the number of bytes in the body.
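
One detail worth seeing in code is that Content-Length counts bytes, not characters. The Python sketch below builds a simple response to show this. The function name build_response is made up for the example, and encoding the body as UTF-8 is an assumption of the sketch, not something the protocol requires.

```python
def build_response(body_text: str, content_type: str = "text/html") -> bytes:
    """Build a minimal HTTP/1.0 200 response around a text body.

    Hypothetical helper for illustration only.
    """
    # Assumption: the body is sent as UTF-8. Encode first, then measure,
    # so that Content-Length is the number of bytes actually sent.
    body = body_text.encode("utf-8")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the blank line that separates headers from the body
    )
    return head.encode("ascii") + body
```

For a plain ASCII body the two counts are the same, but a body such as "héllo" has 5 characters and 6 bytes in UTF-8, so the header must say 6.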

Sample HTTP Exchange

To retrieve the file at the URL
http://www.somehost.com/path/file.html
first open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket: 

GET /path/file.html HTTP/1.0
From: someuser@crimson.com
User-Agent: HTTPTool/1.0
[blank line here]
 
 


The server should respond with something like the following, sent back through the same socket: 

HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>we yearn security</h1>
(more file contents)
  .
  .
  .
</body>
</html>

 After sending the response, the server closes the socket.
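
The exchange above can be tried out with a few lines of Python using the standard socket module. This is a minimal sketch under some assumptions: the function names build_request and fetch are invented here, the 10-second timeout is arbitrary, and the code relies on the server closing the socket after the response, as described above.

```python
import socket


def build_request(path: str, headers: dict) -> bytes:
    """Build a GET request: request line, header lines, then a blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, path: str, port: int = 80, headers=None) -> bytes:
    """Send the request and return the raw response bytes (hypothetical helper)."""
    request = build_request(path, headers or {})
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed the socket
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A call such as fetch("www.somehost.com", "/path/file.html", headers={"From": "someuser@crimson.com", "User-Agent": "HTTPTool/1.0"}) would send the same request as in the sample. The raw bytes it returns contain the status line, the headers, and the body together.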