HTTP Client with 4D and 4D Internet Commands




By Julien Feasson, Software Engineer, 4D, Inc.

Technical Note 02-5

Technical Notes for 02-02 February, 2002

Summary


The purpose of this technical note is to build an HTTP Client in 4D using 4D Internet Commands. The aim is to quickly go through the specifications of the Hyper Text Transfer Protocol and apply them in 4D. We will see how to parse a URL; build, send and receive an HTTP request; interpret an HTTP response; and encode a username and password pair in Base 64 for basic authentication.

1- The Hyper Text Transfer Protocol


HTTP is the main protocol used by the World Wide Web. It carries most web transactions: every request for a web document or graphic, every click on a hypertext link, and every submission of a form. The Web is about distributing information over the Internet, and HTTP is the protocol used to do so.

HTTP's purpose is to provide a standardized way for computers to communicate with each other. HTTP specifies how clients request data, and how servers respond to these requests.

The URL

Given the following URL: http://www.4D.com:80/

The browser interprets the URL as follows:

http:// Use HTTP, the HyperText Transfer Protocol.

www.4D.com Contact a computer over the network with the hostname www.4D.com.

:80 Connect to the computer through port 80. The port number can be any legitimate TCP port number: 1 through 65535, inclusive. If the colon and the port number are both omitted, the port number is assumed to be HTTP's default port number, which is 80.

/ Anything after the hostname and optional port number is regarded as a document path. In this example, the document path is /.

The Request:

Given the same URL, the browser connects to www.4D.com on port 80 using the HTTP protocol. An example of a message that the browser can send to the server is:

   GET / HTTP/1.1
   Accept-Language: en-us
   User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
   Host: www.4D.com
   Connection: Keep-Alive

Let's have a quick look at what these lines are saying:

1. "GET / HTTP/1.1" requests a document from the server. HTTP/1.1 is given as the version of the protocol that the browser uses.

2. "Accept-Language: en-us" indicates that the preferred language is English. This header allows the client to specify a preference for one or more languages, in the event that a server has the same document in multiple languages.

3. "User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)" identifies the client as Mozilla version 4.0, running on Windows NT. Between parentheses, it mentions that it is really Microsoft Internet Explorer version 5.01.

4. "Host: www.4D.com" tells the server what the client thinks the server's hostname is. This header is mandatory in HTTP 1.1 but optional in HTTP 1.0. Since the server may have multiple hostnames, the client indicates which hostname is being requested. In this environment, a web server can have a different document tree for each hostname assigned to it. If the client hasn't specified the server's hostname, the server may be unable to determine which document tree to use.

5. "Connection: Keep-Alive" tells the server to keep the TCP connection open until explicitly told to disconnect. Under HTTP 1.1, the default server behavior is to keep the connection open until the client specifies that the connection should be closed. The standard behavior in HTTP 1.0 is to close the connection after the client's request.

Together, these five lines constitute a request. Lines two through five are request headers.

HTTP transactions do not need to use all the headers. As a matter of fact, it is possible to perform some HTTP requests without supplying any header information at all. In the simplest case, a request of GET / HTTP/1.0 without any headers is sufficient for most servers to understand the client.

The first line tells the server which method to use, which entity (document) to apply it to, and which version of HTTP the client is using. The HTTP 1.1 methods are GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT.

The Client methods

GET:

The GET request is used to retrieve a resource on the server. This resource could consist of the contents of a static file or it could invoke a program that generates data.

HEAD:

The HEAD request means that you just want information about the document, but don't need the document itself.

POST:

The POST request says that you're providing some information of your own (generally used for fill-in forms). This typically changes the state of the server in some way. For example, it could create a record in a database.

PUT:

The PUT request is used to provide a new or replacement document to be stored on the server.

DELETE:

The DELETE request is used to remove a document on the server.

TRACE:

The TRACE request asks that proxies declare themselves in the headers, so the client can learn the path that the document took (and thus determine where something might have been garbled or lost). This is used for protocol debugging purposes.

OPTIONS:

The OPTIONS request is used when the client wants to know what other methods can be used for that document (or for the server at large).

CONNECT:

The CONNECT request is used when a client needs to talk to an HTTPS server through a proxy server.

Other HTTP methods that you may see (LINK, UNLINK, and PATCH) are less clearly defined.

The Client Headers

There are three types of HTTP headers:

- General headers indicate general information such as the date, or whether the connection should be maintained. Both clients and servers use them.

- Request headers are used only for client requests. They convey the client's configuration and desired document format to the server.

- Entity headers describe the document format of the data being sent between client and server. Although Entity headers are most commonly used by the server when returning a requested document, they are also used by the client when using the POST or PUT methods.

Headers from all three categories may be specified in any order. Header names are not case-sensitive, so the Content-type header is also frequently written as Content-Type.
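
As a minimal illustration of why case-insensitivity matters to a client, the following Python sketch (not part of the demo database; `parse_headers` is a hypothetical helper name) stores each header under its lower-cased name, so that Content-type and Content-Type resolve to the same entry. It assumes the header block has already been separated from the first line and the body.

```python
def parse_headers(header_block):
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in header_block.split("\r\n"):
        if not line:
            continue  # skip the empty line that terminates the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers("Content-type: text/html\r\nCONNECTION: close\r\n")
print(headers["content-type"])  # text/html
print(headers["connection"])    # close
```

Looking headers up by lower-cased name means the rest of the client never has to care how a particular server capitalizes them.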

The most common headers used over the Internet are:

- Connection: options (General header)

The option can take one of two values: "close" or "keep-alive". "close" signifies that the client or the server wants to end the connection (i.e., this will be the last transaction). "keep-alive" signifies that the client wants the connection to persist. The default behavior of HTTP 1.1 is to use persistent connections, which are maintained after a transaction is performed.

- Transfer-Encoding: encoding_type (General header)

This specifies the encoding applied to the message for transfer. The encoding type that every HTTP 1.1 application must understand is "chunked." We will explain what the chunked encoding method is later in the technical note.

- Accept: type/subtype [q=qvalue] (Client request header)

This specifies the client's preferred media types. For example:
Accept: text/*, image/gif

Multiple media types can be listed, separated by commas. The optional qvalue represents the relative preference for a media type, on a scale from 0 (not acceptable) to 1 (most preferred).

- Accept-Charset: Character_set [q=qvalue] (Client request header)

It specifies the character sets that the client prefers. By default, ISO-8859-1 (Latin-1, a superset of US-ASCII) is assumed to be acceptable.

- Accept-Encoding: encoding_types (Client request header)

It is through this header that a client may specify what encoding algorithms it understands (e.g. x-compress).

- Accept-Language: language [q=qvalue] (Client request header)

This specifies the client's preferred languages. This allows a server, which has the same document in different languages, to send the document in the language that matches the client's preference. (e.g. en for English, fr for French and so on.)

- Authorization: scheme credentials (Client request header)

It provides the client's authorization information to access data at a specific URL. If a client requests a document that requires authorization, the server returns a WWW-Authenticate header that describes the type of authorization needed. The client then repeats the request, including the proper authorization information. HTTP 1.0 defines the BASIC authorization scheme, for which the authorization parameter is the string username:password encoded in base 64. For example, for a username value of "username" and a password value of "password," the authorization header would look like this:

Authorization: BASIC dXNlcm5hbWU6cGFzc3dvcmQ=

- Cookie: name=value (Client request header)

This line contains a name/value pair stored for that URL. Multiple cookies can be specified, in which case semicolons separate them. Set-Cookie and Cookie headers both should be propagated through proxy servers, even if a page is cached or has not been modified.

- Host: hostname:port (Client request header)

This is the hostname and port number (optional) of the server contacted by the client. If the port number is the default Web port (80), both the colon and port number should be omitted. This line actually indicates to which server the client thinks it is talking.

- If-Modified-Since: date (Client request header)

This line specifies that the URL data is to be sent only if it has been modified since the date given. For example:

If-Modified-Since: Wed, 12 Dec 2001 12:10:34 GMT

- User-Agent: string (Client request header)

This line provides identification data about the client program.

- Content-Encoding: encoding_schemes (Entity header)

This line specifies the encoding types used for the transferred data. Multiple encodings should appear in the order in which they were applied to the data, separated by commas. Values can be gzip, x-gzip, compress, or x-compress.

- Content-Language: languages (Entity header)

This line specifies the language of the transferred data.

- Content-Length: n (Entity header)

This line specifies the length of the transferred data (in bytes).

- Content-Type: type/subtype (Entity header)

This line describes the media type and subtype of the transferred data. For example:

Content-type: text/html

- Last-modified: date (Entity header)

This line specifies when the URL was last modified.
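
To make a few of the header rules above concrete, here is a small Python sketch (not taken from the demo database; all three function names are hypothetical). It builds a Host header that omits the default port, an Accept header in which each qvalue is attached to its media type with ";q=" as HTTP 1.1 writes it, and an If-Modified-Since header in the GMT date format shown above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def host_header(hostname, port=80):
    """Omit the colon and port number when the port is the default Web port."""
    if port == 80:
        return "Host: " + hostname
    return "Host: {}:{}".format(hostname, port)


def accept_header(preferences):
    """preferences is a list of (media_type, qvalue) pairs, qvalue from 0 to 1."""
    parts = []
    for media_type, q in preferences:
        # A qvalue of 1 is the default, so it does not need to be written out.
        parts.append(media_type if q == 1 else "{};q={:g}".format(media_type, q))
    return "Accept: " + ", ".join(parts)


def if_modified_since_header(moment):
    """moment must be a timezone-aware datetime; it is converted to GMT."""
    return "If-Modified-Since: " + format_datetime(
        moment.astimezone(timezone.utc), usegmt=True)


print(host_header("www.4D.com"))        # Host: www.4D.com
print(host_header("www.4D.com", 8080))  # Host: www.4D.com:8080
print(accept_header([("text/*", 0.5), ("image/gif", 1)]))
print(if_modified_since_header(datetime.now(timezone.utc)))
```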

The Response

Since the aim of the technical note is to build an HTTP client, we will go quickly through the structure of a server response to understand what the server sends back to the client, and to know what we will have to parse.

2- The Demo Database, The 4D HTTP Client:


This is the only form of the Database. It allows the user to build a request, send it to the server and receive the response via the 4D Internet Commands.

The purpose of the database is to demonstrate how to:

- build a request

- parse a URL

- use basic authentication (Base 64 encoding)

- send and receive requests to and from the server using the 4D Internet Commands

- interpret the HTTP response header

- parse a Chunked body (Base 16 decoding)

How to use the demo:

The demo database is composed of two forms:

The HTTP browser window:

Like a common browser, it lets you request a Web page by typing its URL in the top text field. Follow these instructions:

1. Open the demo database.

2. In the menu "start," select "compose request."

3. The form appears:

a. In the first text field, enter the URL of the document you want to request (for example, www.4D.com).
b. Validate the form by pressing Enter or Return.

4. The request is built and sent, and the result appears in the preview (4D Write Plug-In) window.

The HTTP Request Builder window:

In that page, you can define a custom request. You may want to:

- Change the URL.

- Try another method.

- Access a protected realm with basic authentication.

- Add your own custom header.

As you can see, with the database you can create your own HTTP requests, send them, receive an answer from the server, and display the result in a 4D Write window.

The demo database provides the following methods:

The Methods:

HTTP_URLParser: parses a URL and sets the domainname, portnumber and folder variables.
HTTP_URLEncoder: translates a string into URL format.
HTTP_URLDecoder: translates a URL format string into a human readable string.
HTTP_Encode: encodes a string into base 64 using HTTP_EncBase64.
HTTP_Decode: decodes a base 64 string using HTTP_DecBase64.
HTTP_SendRequest: builds and sends the request.
HTTP_ChunkedBody: translates a chunked body using HTTP_Hexa2Dec.
HTTP_Hexa2Dec: converts a hexadecimal string to a decimal longint.

When the user presses the Send button, the methods are called in the following order:

1. HTTP_URLParser is called to decompose the URL and set the variables needed to connect to the server: (domainname, portnumber, folder). To set the folder path in the variable folder, we need to use the method HTTP_URLEncoder which will translate the string into the URL format.

2. HTTP_SendRequest builds and sends the request. To build the request, we may need to provide authentication. For that, we encode the username and the password in Base 64 using the HTTP_Encode method, which itself uses the method HTTP_EncBase64.

3. While still in HTTP_SendRequest, we receive the response. If the body is chunked and the chunked translation box is checked, the HTTP_ChunkedBody method is called to parse and translate the body using the method HTTP_Hexa2Dec.

4. Then, if the server has moved the requested document, we can follow and request the document at its new location. So we need to call HTTP_URLParser again which will use once again the method HTTP_URLEncoder.

The 4DHTTPClient database provides some useful methods that can be re-used for some other types of database:

- HTTP_URLParser

- HTTP_URLEncoder

- HTTP_URLDecoder

- HTTP_Encode (with HTTP_EncBase64)

- HTTP_Decode (with HTTP_DecBase64)

- HTTP_ChunkedBody (with HTTP_Hexa2Dec)

- HTTP_EncBase64

- HTTP_DecBase64

- HTTP_SendRequest (You may need to adapt this method to fit your needs.)

- HTTP_Hexa2Dec

The first thing to do is interpret the URL.

Note that the string "http://", the colon, the port number, and the trailing slash for a directory are all optional.

All the following URLs are valid URLs:

http://www.4D.com/        www.4D.com:80
http://www.4D.com         www.4D.com:80/
http://www.4D.com:80      http://www.4D.com/index.html
http://www.4D.com:80/     http://www.4D.com:80/index.html
www.4D.com                www.4D.com/

The first method to be called is HTTP_URLParser, which decomposes the URL into three strings, "domainname", "portnumber" and "folder", as follows:

http://domainname:portnumber/folder

HTTP_URLParser

Source Code of HTTP_URLParser:

This first part of the code just checks if the user specified the protocol to be used in the URL. This statement allows the user to not specify "http://" at the beginning of the URL but does not mean that the user can specify a different protocol…

   `Check if the user specified the protocol http
   $URL:=$1
   If (Substring($URL;1;7)="http://")  `Substring positions start at 1
      $shift:=8
   Else 
      $shift:=1
   End if
   

The second part of the code sets the variables "domainname" and "portnumber". If "portnumber" is not specified, the default value is 80.

   
     `Retrieve the domainname and the portnumber
   $rest:=Substring($URL;$shift)  `URL without the optional "http://"
   $colon:=Position(":";$rest)
   $slash:=Position("/";$rest)
     `A colon introduces a port only if it comes before the first slash (or if there is no slash)
   If (($colon#0) & (($slash=0) | ($colon<$slash)))
      domainname:=Substring($rest;1;$colon-1)
      If ($slash#0)
         portnumber:=Num(Substring($rest;$colon+1;$slash-$colon-1))
      Else 
         portnumber:=Num(Substring($rest;$colon+1))
      End if 
   Else 
      If ($slash#0)
         domainname:=Substring($rest;1;$slash-1)
      Else 
         domainname:=$rest
      End if 
      portnumber:=80
   End if
   

The third part sets the document path in the variable folder. As you can see, HTTP_URLParser calls HTTP_URLEncoder to encode the string passed as URL into a URL format.

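     `Skip the optional "http://" so that its slashes are not mistaken for the path
   $path:=Substring($URL;$shift)
   If (Position("/";$path)#0)
      folder:=HTTP_URLEncoder(Substring($path;Position("/";$path)))
   Else 
      folder:="/"
   End if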
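
For readers who want to check the three-way decomposition outside 4D, the same logic can be sketched in a few lines of Python. This is an illustration only, not the demo database's code: `parse_url` is a hypothetical name, and unlike HTTP_URLParser it returns the path as typed, without URL-encoding it.

```python
def parse_url(url):
    """Split a URL into (host, port, path); scheme, port and path are optional."""
    prefix = "http://"
    rest = url[len(prefix):] if url.lower().startswith(prefix) else url
    # Everything before the first slash is host[:port]; the rest is the path.
    host_port, slash, path = rest.partition("/")
    host, _, port_text = host_port.partition(":")
    port = int(port_text) if port_text else 80  # default HTTP port
    document_path = slash + path if slash else "/"
    return host, port, document_path


print(parse_url("http://www.4D.com:80/index.html"))  # ('www.4D.com', 80, '/index.html')
print(parse_url("www.4D.com:8080"))                  # ('www.4D.com', 8080, '/')
print(parse_url("www.4D.com"))                       # ('www.4D.com', 80, '/')
```

Stripping the "http://" prefix before looking for the first slash matters: otherwise the two slashes of the prefix would be taken for the start of the document path.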
   

HTTP_URLEncoder

The method HTTP_URLEncoder translates the URL into a URL format. The URL format is the application/x-www-form-urlencoded type; certain characters are encoded to eliminate ambiguity. An encoded character is written as %xx, where xx is the hexadecimal representation of the character's ASCII code.

All the characters can be encoded but usually the characters *, -, ., 0-9, A-Z, _, a-z are not encoded.

Note: "/" is not encoded when it acts as the folder separator; otherwise it is represented in the URL as %2F (ASCII code 47 in decimal, 2F in hexadecimal).

Example:

The URL:

http://www.mydomain.com/my folder path/my document.html

contains 3 spaces; the valid URL would be

http://www.mydomain.com:80/my%20folder%20path/my%20document.html

This is what the method will perform:

        ` Parse the string and translate the special characters
   For ($WSPI_parser;1;Length($WSPI_MyString))
      $WSPI_MyChar:=Substring($WSPI_MyString;$WSPI_parser;1)
        ` Letters and digits are kept as they are...
      $WSPI_safe:=(($WSPI_MyChar>="a") & ($WSPI_MyChar<="z")) | (($WSPI_MyChar>="A") & ($WSPI_MyChar<="Z"))
      $WSPI_safe:=$WSPI_safe | (($WSPI_MyChar>="0") & ($WSPI_MyChar<="9"))
        ` ...and so are * - . _ and the folder separator /
      $WSPI_safe:=$WSPI_safe | ($WSPI_MyChar="*") | ($WSPI_MyChar="-") | ($WSPI_MyChar=".")
      $WSPI_safe:=$WSPI_safe | ($WSPI_MyChar="_") | ($WSPI_MyChar="/")
      If ($WSPI_safe)
         $0:=$0+$WSPI_MyChar
      Else 
         $0:=$0+"%"
         $WSPI_n:=Ascii($WSPI_MyChar)\16  ` High hexadecimal digit
         If ($WSPI_n<10)
            $0:=$0+String($WSPI_n)
         Else 
            $0:=$0+Char(Ascii("A")+$WSPI_n-10)
         End if 
         $WSPI_n:=Ascii($WSPI_MyChar)%16  ` Low hexadecimal digit
         If ($WSPI_n<10)
            $0:=$0+String($WSPI_n)
         Else 
            $0:=$0+Char(Ascii("A")+$WSPI_n-10)
         End if 
      End if 
   End for
   

This method translates all the special characters into the URL format. For example, the space character is replaced by %20. The method is valid only for the Latin-1 character set; modify it if you use a different one.
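
The %xx rule is easy to verify with a short Python sketch. The code below is illustrative only (`url_encode_path` and `SAFE_CHARACTERS` are hypothetical names, not part of the demo database). It assumes the Latin-1 character set, keeps the same unencoded characters as the method above, and writes every other byte as a percent sign followed by two uppercase hexadecimal digits.

```python
SAFE_CHARACTERS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789*-._/"
)


def url_encode_path(path):
    """Percent-encode a document path, leaving '/' as the folder separator."""
    encoded = []
    for byte in path.encode("latin-1"):  # assumption: Latin-1, one byte per character
        character = chr(byte)
        if character in SAFE_CHARACTERS:
            encoded.append(character)
        else:
            encoded.append("%{:02X}".format(byte))
    return "".join(encoded)


print(url_encode_path("/my folder path/my document.html"))
# /my%20folder%20path/my%20document.html
```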

Next, you may need to provide a username and a password to access a protected document. Several kinds of authentication are used over the Internet. In this example, the demo database lets the user connect to a server with basic authentication, in which a base 64 encoded string is sent in the HTTP header.

HTTP_Encode

This method encodes the username and the password in base 64.

The base 64 encoding (refer to RFC 1521): A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)

The aim of the encoding process is to represent 24-bit groups of input bits as output strings of 4 encoded characters:

- Proceeding from left to right, a 24-bit input group is formed by concatenating three 8-bit input groups.

- These 24 bits are then treated as 4 concatenated 6-bit groups.

- Each of the 4 groups is translated into a single digit in the base 64 alphabet.

HTTP_Encode uses the method HTTP_EncBase64. The string that should be passed to this method is "username:password".

For example:

"username:password" base 64 encoded is: "dXNlcm5hbWU6cGFzc3dvcmQ="

Note: A method HTTP_Decode is also provided.
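
The grouping of 24 input bits into four 6-bit values can be followed step by step in this Python sketch. It is a stand-alone illustration of the encoding described above, not the source of HTTP_EncBase64; `base64_encode` and `ALPHABET` are hypothetical names.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data):
    """Encode bytes as base 64, three input bytes (24 bits) at a time."""
    output = []
    for index in range(0, len(data), 3):
        group = data[index:index + 3]
        # Pack up to three bytes into one 24-bit integer, padding with zero bytes.
        bits = int.from_bytes(group + b"\x00" * (3 - len(group)), "big")
        # Cut the 24 bits into four 6-bit values and look each one up.
        characters = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A short final group has fewer significant characters; "=" fills the rest.
        keep = len(group) + 1
        output.append("".join(characters[:keep]) + "=" * (4 - keep))
    return "".join(output)


print("Authorization: BASIC " + base64_encode(b"username:password"))
# Authorization: BASIC dXNlcm5hbWU6cGFzc3dvcmQ=
```

The trailing "=" appears because "username:password" is 17 bytes long, so the last group holds only two bytes.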

Now, HTTP_SendRequest is able to build the request it is going to send.

HTTP_SendRequest

The HTTP_SendRequest method builds the request and establishes the connection with the server. It updates the HTTP response window in real time, thus providing information on the connection status. To do so, we created a new process so that the HTTP response window can be updated while the connection is running.

Tip about building an HTTP Request:

The variables $C_CR and $C_LF are defined as follows:

   $C_CR:=Char (Carriage return)
   $C_LF:=Char (Line feed)
   

Every line of the request, including the last header line, ends with $C_CR+$C_LF, and one more $C_CR+$C_LF (an empty line) tells the server that the headers are complete. In other words, the headers end with two consecutive $C_CR+$C_LF. If your request has a body (for example, if you send data with a POST method), the body follows that blank line, and a Content-Length header gives its length in bytes.
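
The layout of a complete request can be sketched as follows in Python. This is an illustration under stated assumptions, not HTTP_SendRequest itself: `build_request` is a hypothetical helper, it always sends HTTP/1.1 with a Host header, and it adds a Content-Length header whenever a body is supplied.

```python
CRLF = "\r\n"


def build_request(method, path, host, extra_headers=None, body=b""):
    """Return the request as bytes: request line, headers, blank line, body."""
    lines = ["{} {} HTTP/1.1".format(method, path), "Host: " + host]
    for name, value in (extra_headers or {}).items():
        lines.append("{}: {}".format(name, value))
    if body:
        lines.append("Content-Length: {}".format(len(body)))
    # Each line ends with CRLF; the extra CRLF is the blank line ending the headers.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("latin-1") + body


request = build_request("GET", "/", "www.4D.com", {"Connection": "close"})
print(request)
# b'GET / HTTP/1.1\r\nHost: www.4D.com\r\nConnection: close\r\n\r\n'
```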

The connection with the server:

Sending an HTTP request to a server using the 4D Internet Commands is straightforward. You open a connection using the command TCP_Open, with the domain name, the port number, and a variable for the session ID as parameters. Once the connection is established, you send the Blob containing the HTTP request using TCP_SendBLOB. Then you wait for the Web server's answer by looping and reading all TCP packets until the connection is closed or until an error occurs.

Here is an example of a basic algorithm that connects to a server and receives a response:

   $err:=TCP_Open ($domainname;$portnumber;$MySessionID)
   If ($err=0)
      $err:=TCP_SendBLOB ($MySessionID;$Blob_Send)  ` Send the request
      If ($err=0)
         Repeat   ` Loop to retrieve the answer
            $err:=TCP_ReceiveBLOB ($MySessionID;$Blob_Received)
            $err:=TCP_State ($MySessionID;$State)
            $srcpos:=0
            $dstpos:=BLOB size($Blob_All)
            `Concatenating received Blobs
            COPY BLOB($Blob_Received;$Blob_All;$srcpos;$dstpos;BLOB size($Blob_Received))
         Until (($State=0) | ($err#0))
            ` Blob received    
         $err:=TCP_Close ($MySessionID)
      End if 
   End if
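
The same open, send, and receive-until-closed sequence can be sketched with Python's standard socket module. This is an illustration, not a translation of the 4D Internet Commands: `exchange` and `fetch` are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import socket


def exchange(sock, request):
    """Send the request bytes, then read until the peer closes the connection."""
    sock.sendall(request)
    received = []
    while True:
        data = sock.recv(4096)
        if not data:  # an empty read means the server closed the connection
            break
        received.append(data)
    return b"".join(received)


def fetch(host, port, request):
    """Open a TCP connection, perform one exchange, and close the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        return exchange(sock, request)
```

A call such as `fetch("www.4D.com", 80, b"GET / HTTP/1.0\r\n\r\n")` would return the raw response, headers and body together. Because the loop ends only when the server closes the connection, an HTTP 1.1 request sent this way should carry "Connection: close"; otherwise the read waits until the timeout raises an error.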
   

As soon as you get the response, the first thing to do is parse the header you received and check the status code. If everything is OK, you usually receive a status code of 200.

Server Response Codes

The initial line of the server's response includes the HTTP version, a three-digit status code, and a human-readable description of the result. Status codes are grouped as follows:

100-199   Informational
200-299   Client request successful
300-399   Client request redirected, further action necessary
400-499   Client request incomplete
500-599   Server errors
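
As a sketch of how a client might read the initial line and pick the group from the first digit of the status code, consider the following Python fragment. `parse_status_line` and `status_group` are hypothetical names, and the group labels are shortened versions of the ones listed above.

```python
def parse_status_line(line):
    """Split 'HTTP/1.1 200 OK' into (version, status code, description)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    description = parts[2] if len(parts) > 2 else ""
    return version, code, description


def status_group(code):
    """Map a three-digit status code to its group, using the hundreds digit."""
    groups = {
        1: "informational",
        2: "successful",
        3: "redirected",
        4: "client request incomplete",
        5: "server error",
    }
    return groups.get(code // 100, "unknown")


version, code, description = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(code, description, "->", status_group(code))
# 404 Not Found -> client request incomplete
```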

The most common status codes are:

100 Continue

The initial part of the request has been received, and the client may continue with the request.

200 OK

The client's request was successful.

301 Moved Permanently

The requested URL is no longer used by the server, and the operation specified in the request was not performed. The new location for the requested document is specified in the Location header. All future requests for the document should use the new URL.

302 Found (Moved Temporarily in HTTP 1.0)

Same purpose as 307.

307 Temporary Redirect

The requested URL has moved temporarily. The location header specifies the new location, but this is only temporary, and the user may revisit the original URL in the future.

400 Bad Request

This response code indicates that the server detected a syntax error in the client's request.

401 Unauthorized

The client should supply proper authorization when requesting this URL again.

404 Not Found

The document at the specified URL does not exist.

Once you have interpreted the status code, you may react as you want. An example provided in the demo database is the AutoFollow. If the user checked the AutoFollow box, and if the status code is 301, 302, or 307, which means that the document has moved, the HTTP Client re-builds a request using the new URL specified in the Location header. This is implemented just by adding a test on the status code in the response and a loop around the sending-receiving of the request.
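
The test-and-loop structure described above might look like this in Python. It is a sketch, not the demo database's code: `send` stands for any hypothetical function that requests a URL and returns the status code, a dict of headers keyed by lower-cased name, and the body; the `max_hops` limit is an added safeguard against redirect loops that the technical note does not mention.

```python
REDIRECT_CODES = {301, 302, 307}


def next_location(status_code, headers, auto_follow=True):
    """Return the URL to request next, or None if no redirect should be followed."""
    if auto_follow and status_code in REDIRECT_CODES:
        return headers.get("location")
    return None


def follow(url, send, max_hops=5):
    """Call send(url) repeatedly until the response is not a redirect."""
    for _ in range(max_hops):
        status_code, headers, body = send(url)
        target = next_location(status_code, headers)
        if target is None:
            return url, status_code, body
        url = target  # rebuild the request with the URL from the Location header
    raise RuntimeError("too many redirects")


# A stand-in for the network: "/old" redirects to "/new".
pages = {
    "/old": (301, {"location": "/new"}, b""),
    "/new": (200, {}, b"hello"),
}
print(follow("/old", lambda u: pages[u]))  # ('/new', 200, b'hello')
```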

HTTP_ChunkedBody

In HTTP/1.1 the transfer encoding can be chunked. The chunked transfer-encoding encodes the message as a series of chunks, optionally followed by trailing entity headers. Each chunk starts with its size, specified in base 16 and followed by a CRLF. The chunk data comes next; its length is the chunk size, and it is followed by a CRLF. Chunks follow one another, and the last chunk has a size of zero followed by CRLF.

The following code translates a chunked body into "non-chunked" body. It is launched if the headers contain the Transfer-Encoding header set to "chunked", and if the chunked body translation box is checked.

   $ChunkSizeB16:="-1"
   $offset:=1  ` Substring positions start at 1
   $0:=""
   
   While (($ChunkSizeB16#"0") & ($ChunkSizeB16#""))
      $eol:=Position($C_CR+$C_LF;Substring($1;$offset))  ` End of the chunk size line
      $ChunkSizeB16:=Substring($1;$offset;$eol-1)
      $ChunkSize:=HTTP_Hexa2Dec ($ChunkSizeB16)
      $0:=$0+Substring($1;$offset+$eol+1;$ChunkSize)  ` The data starts right after the CRLF
      $offset:=$offset+Length($ChunkSizeB16)+2+$ChunkSize+2  ` Skip the size line, the data and both CRLFs
   End while
   

Note: Because the chunked body and the resulting translated body are passed as 4D text variables, you will get errors if the text is larger than 32 KB.
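
For comparison, here is a minimal Python sketch of the same translation. It is not HTTP_ChunkedBody itself: `decode_chunked` is a hypothetical name, the body is assumed to be complete and well formed, any chunk extension after a semicolon on the size line is ignored, and trailing headers after the last chunk are not read.

```python
def decode_chunked(body):
    """Join the data of all chunks in a chunked body given as bytes."""
    decoded = bytearray()
    position = 0
    while True:
        end_of_line = body.index(b"\r\n", position)
        # The size line holds the chunk size in base 16, e.g. b"1A".
        size_text = body[position:end_of_line].split(b";")[0].strip()
        size = int(size_text.decode("ascii"), 16)
        if size == 0:  # a zero-length chunk marks the end of the body
            break
        start = end_of_line + 2
        decoded += body[start:start + size]
        position = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(decoded)


print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))  # b'hello world'
```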

3- Conclusion


This technical note was developed to introduce the HTTP protocol and to explain how to implement it in 4D with the 4D Internet Commands. You may need to go through the source code to add your own code. Since we use 4D Write to display the HTML content, some HTML objects such as pictures are not displayed. This is where a browser would request the picture from the server using the picture's URL. A good exercise would be to parse the HTML document to retrieve the pictures' URLs, and to query the server to display them in the 4D Write window. This HTTP Client could also be extended to WebDAV.

