HTTP Client with 4D and 4D Internet Commands
By Julien Feasson, Software Engineer, 4D, Inc.
Technical Note 02-5
Technical Notes for 02-02 February, 2002
Summary
The purpose of this technical note is to build an HTTP Client in 4D using 4D Internet Commands. The aim is to quickly go through the specifications of the Hyper Text Transfer Protocol and apply them in 4D. We will see how to parse a URL; build, send, and receive an HTTP request; interpret an HTTP response; and encode a username/password pair in Base 64 for basic authentication.
1- The Hyper Text Transfer Protocol
HTTP is the main protocol used by the World Wide Web and is involved in most web transactions: requests for a web document or graphic, clicks on a hypertext link, and submissions of a form. The Web is about distributing information over the Internet, and HTTP is the protocol used to do so.
HTTP's purpose is to provide a standardized way for computers to communicate with each other. HTTP specifies how clients request data, and how servers respond to these requests.
The URL
Given the following URL: http://www.4D.com:80/
The browser interprets the URL as follows:
| http:// | Use HTTP, the HyperText Transfer Protocol. |
| www.4D.com | Contact a computer over the network with the hostname of www.4D.com |
| :80 | Connect to the computer through port 80. The port number can be any legitimate IP port number: 1 through 65535, inclusively (version 4 IP addressing). If the colon and the port number are both omitted, the port number is assumed to be HTTP's default port number, which is 80. |
| / | Anything after the hostname and optional port number is regarded as a document path. In this example, the document path is /. |
The Request:
Given the same URL, the browser connects to www.4D.com on port 80 using the HTTP protocol. An example of a message that the browser can send to the server is:
GET / HTTP/1.1
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.4D.com
Connection: Keep-Alive
Let's have a quick look at what these lines are saying:
1. "GET / HTTP/1.1" requests a document from the server. HTTP/1.1 is given as the version of the protocol that the browser uses.
2. "Accept-Language: en-us" indicates that the preferred language is English. This header allows the client to specify a preference for one or more languages, in the event that a server has the same document in multiple languages.
3. "User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)" identifies the client as Mozilla version 4.0, running on Windows NT. Between parentheses, it mentions that it is really Microsoft Internet Explorer version 5.01.
4. "Host: www.4D.com" tells the server what the client thinks the server's hostname is. This header is mandatory in HTTP 1.1 but optional in HTTP 1.0. Since the server may have multiple hostnames, the client indicates which hostname is being requested. In this environment, a web server can have a different document tree for each hostname assigned to it. If the client hasn't specified the server's hostname, the server may be unable to determine which document tree to use.
5. "Connection: Keep-Alive" tells the server to keep the TCP connection open until explicitly told to disconnect. Under HTTP 1.1, the default server behavior is to keep the connection open until the client specifies that the connection should be closed. The standard behavior in HTTP 1.0 is to close the connection after the client's request.
Together, these five lines constitute a request. Lines two through five are request headers.
HTTP transactions do not need to use all the headers. As a matter of fact, it is possible to perform some HTTP requests without supplying any header information at all. For example, in the simplest case, a request of GET / HTTP/1.0 without any headers is sufficient for most servers to understand the client.
The first line tells the server which method to use, which entity (document) to apply it to, and which version of HTTP the client is using. The methods defined by HTTP 1.1 are GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT.
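To make the structure of a request concrete, here is a minimal sketch, written in Python rather than 4D so that it can be run outside a database, that splits the example request above into its request line and headers. The function name parse_request is an assumption made for this illustration; it is not part of the demo database.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, target, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]           # ignore any body
    lines = head.split("\r\n")
    # Request line: method, document path, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are not case-sensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw = (
    "GET / HTTP/1.1\r\n"
    "Accept-Language: en-us\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)\r\n"
    "Host: www.4D.com\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)
method, target, version, headers = parse_request(raw)
print(method, target, version)
print(headers["host"])
```

Storing the header names in lowercase is one simple way to honor the rule that header names are not case-sensitive.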
The Client methods
GET:
The GET request is used to retrieve a resource on the server. This resource could consist of the contents of a static file or it could invoke a program that generates data.
HEAD:
The HEAD request means that you just want information about the document, but don't need the document itself.
POST:
The POST request says that you're providing some information of your own (generally used for fill-in forms). This typically changes the state of the server in some way. For example, it could create a record in a database.
PUT:
The PUT request is used to provide a new or replacement document to be stored on the server.
DELETE:
The DELETE request is used to remove a document on the server.
TRACE:
The TRACE request asks that proxies declare themselves in the headers, so the client can learn the path that the document took (and thus determine where something might have been garbled or lost). This is used for protocol debugging purposes.
OPTIONS:
The OPTIONS request is used when the client wants to know what other methods can be used for that document (or for the server at large).
CONNECT:
The CONNECT request is used when a client needs to talk to an HTTPS server through a proxy server.
Other HTTP methods that you may see (LINK, UNLINK, and PATCH) are less clearly defined.
The Client Headers
There are three types of HTTP headers:
- General headers indicate general information such as the date, or whether the connection should be maintained. Both clients and servers use them.
- Request headers are used only for client requests. They convey the client's configuration and desired document format to the server.
- Entity headers describe the document format of the data being sent between client and server. Although Entity headers are most commonly used by the server when returning a requested document, they are also used by the client when using the POST or PUT methods.
Headers from all three categories may be specified in any order. Header names are not case-sensitive, so the Content-type header is also frequently written as Content-Type.
The most common headers used over the Internet are:
- Connection: options (General header)
Options can have two values: "close" or "keep-alive". "close" signifies that the client or the server wants to end the connection (i.e., this will be the last transaction). "keep-alive" signifies that the client wants the connection to persist. The default behavior of HTTP 1.1 is to use persistent connections, which are maintained after a transaction is performed.
- Transfer-Encoding: encoding_type (General header)
This specifies the encoding used for the message. The transfer encoding that every HTTP 1.1 application must understand is "chunked." The chunked encoding method is explained later in this technical note.
- Accept: type/subtype [q=qvalue] (Client request header)
This specifies the client's preferred media types. For example:
Accept: text/*, image/gif
Multiple media types can be listed, separated by commas. The optional qvalue, attached to a media type with a semicolon (for example text/*;q=0.5), represents the acceptable quality level for that type on a scale from 0 to 1.
- Accept-Charset: Character_set [q=qvalue] (Client request header)
This specifies the character sets that the client prefers. ISO-8859-1 (Latin-1, a superset of US-ASCII) is treated as acceptable by default.
- Accept-Encoding: encoding_types (Client request header)
It is through this header that a client may specify what encoding algorithms it understands (e.g. x-compress).
- Accept-Language: language [q=qvalue] (Client request header)
This specifies the client's preferred languages. This allows a server, which has the same document in different languages, to send the document in the language that matches the client's preference. (e.g. en for English, fr for French and so on.)
- Authorization: scheme credentials (Client request header)
It provides the client's authorization information to access data at a specific URL. If a client requests a document that requires authorization, the server returns a WWW-Authenticate header that describes the type of authorization needed. The client then repeats the request, this time including the proper authorization information. HTTP 1.0 defines the BASIC authorization scheme, for which the authorization parameter is the string username:password encoded in base 64. For example, for a username value of "username" and a password value of "password," the authorization header would look like this:
Authorization: BASIC dXNlcm5hbWU6cGFzc3dvcmQ=
- Cookie: name=value (Client request header)
This line contains a name/value pair stored for that URL. Multiple cookies can be specified, in which case semicolons separate them. Set-Cookie and Cookie headers both should be propagated through proxy servers, even if a page is cached or has not been modified.
- Host: hostname:port (Client request header)
This is the hostname and port number (optional) of the server contacted by the client. If the port number is the default Web port (80), both the colon and port number should be omitted. This line actually indicates to which server the client thinks it is talking.
- If-Modified-Since: date (Client request header)
This line specifies that the URL data is to be sent only if it has been modified since the date given. For example:
If-Modified-Since: Wed, 12 Dec 2001 12:10:34 GMT
- User-Agent: string (Client request header)
This line provides identification data about the client program.
- Content-Encoding: encoding_schemes (Entity headers)
This line specifies the encoding types used for the transferred data. Multiple encoding should appear in the order they were applied to the data and are separated with commas. Values can be gzip, x-gzip, compress, or x-compress.
- Content-Language: languages (Entity headers)
This line specifies the language of the transferred data.
- Content-Length: n (Entity headers)
This line specifies the length of the transferred data (in bytes).
- Content-Type: type/subtype (Entity headers)
This line describes the media type and subtype of the transferred data. For example:
Content-type: text/html
- Last-modified: date (Entity headers)
This line specifies when the URL was last modified.
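As an illustration of how the q values of the Accept family of headers can be interpreted, here is a short sketch in Python (not 4D) that parses an Accept value and orders the media types by preference. The function name parse_accept and the sample header value are assumptions made for this example, and real servers apply more detailed matching rules than this.

```python
def parse_accept(value):
    """Return (media_type, q) pairs sorted by descending preference."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0                       # a missing qvalue means q=1
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        prefs.append((parts[0], q))
    # sorted() is stable, so types with equal q keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


prefs = parse_accept("text/*;q=0.5, image/gif, text/html;q=0.8")
print(prefs)
```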
The Response
Since the aim of the technical note is to build an HTTP client, we will go quickly through the structure of a server response to understand what the server sends back to the client, and to know what we will have to parse.
2- The Demo Database, The 4D HTTP Client:
This is the only form of the Database. It allows the user to build a request, send it to the server and receive the response via the 4D Internet Commands.
The purpose of the database is to demonstrate how to:
- build a request
- parse a URL
- use basic authentication (Base 64 encoding)
- send and receive requests to and from the server using the 4D Internet Commands
- interpret the HTTP response header
- parse a Chunked body (Base 16 decoding)
How to use the demo:
The demo database is composed of two forms:
The HTTP browser window:
Like an ordinary browser, it lets you request a Web page by typing its URL in the top text field. Follow the instructions:
1. Open the demo database.
2. In the menu "start," select "compose request."
3. The form appears:
a. In the first text field, enter the URL of the document you want to request (for example, www.4D.com).
b. Validate the form by pressing Enter or Return.
4. The request is built and sent; the result appears in the preview (4D Write Plug-In) window.
The HTTP Request Builder window:
In this window, you can define a custom request. You may want to:
- Change the URL.
- Try another method.
- Access a protected realm with basic authentication.
- Add your own custom header.
As you can see, with the database you can create your own HTTP requests, send them, receive an answer from the server, and display the result in a 4D Write window.
The demo database provides the following methods:
The Methods:
| HTTP_URLParser | parses a URL and sets the domainname, portnumber and folder variables. |
| HTTP_URLEncoder | translates a string into URL format |
| HTTP_URLDecoder | translates a URL format string into a human readable string. |
| HTTP_Encode | encodes a string into base 64 using the HTTP_EncBase64 |
| HTTP_Decode | decodes a base 64 string using the HTTP_DecBase64 |
| HTTP_SendRequest | builds and sends the request |
| HTTP_ChunkedBody | translates a chunked body using HTTP_Hexa2Dec |
| HTTP_Hexa2Dec | converts a hexadecimal string to a decimal longint |
When the user presses the Send button, the methods are called in the following order:
1. HTTP_URLParser is called to decompose the URL and set the variables needed to connect to the server (domainname, portnumber, folder). To set the document path in the variable folder, it uses the method HTTP_URLEncoder, which translates the string into the URL format.
2. HTTP_SendRequest builds and sends the request. Building the request may require authentication, in which case the username and the password are encoded in Base 64 using the HTTP_Encode method, which itself uses the method HTTP_EncBase64.
3. While still in HTTP_SendRequest, we receive the response. If the body is chunked and the chunked translation box is checked, the HTTP_ChunkedBody method is called to parse and translate the body using the method HTTP_Hexa2Dec.
4. Then, if the server has moved the requested document, we can follow and request the document at its new location. This calls HTTP_URLParser again, which in turn uses the method HTTP_URLEncoder.
The 4DHTTPClient database provides some useful methods that can be re-used for some other types of database:
- HTTP_URLParser
- HTTP_URLEncoder
- HTTP_URLDecoder
- HTTP_Encode (with HTTP_EncBase64)
- HTTP_Decode (with HTTP_DecBase64)
- HTTP_ChunkedBody (with HTTP_Hexa2Dec)
- HTTP_EncBase64
- HTTP_DecBase64
- HTTP_SendRequest (You may need to adapt this method to fit your needs.)
- HTTP_Hexa2Dec
The first thing to do is interpret the URL.
Note that the string "http://", the colon, the port number, and the trailing slash for a directory are all optional.
All the following URLs are valid URLs:
| http://www.4D.com/ | www.4D.com:80 |
| http://www.4D.com | www.4D.com:80/ |
| http://www.4D.com:80 | http://www.4D.com/index.html |
| http://www.4D.com:80/ | http://www.4D.com:80/index.html |
| www.4D.com | www.4D.com/ |
The first method to be called is HTTP_URLParser, which decomposes the URL into three strings, "domainname", "portnumber" and "folder", as follows:
http://domainname:portnumber/folder
HTTP_URLParser
Source Code of HTTP_URLParser:
This first part of the code just checks if the user specified the protocol to be used in the URL. This statement allows the user to not specify "http://" at the beginning of the URL but does not mean that the user can specify a different protocol…
`Check if the user specified the protocol http
$URL:=$1
If (Substring($URL;1;7)="http://")  `Character positions start at 1 in 4D
  $shift:=8
Else
  $shift:=1
End if
The second part of the code sets the variable "domainname" and "portnumber". If "portnumber" is not specified, the default value is 80.
`Retrieve the domainname and the portnumber
$rest:=Substring($URL;$shift)  `URL without the optional "http://"
$colon:=Position(":";$rest)
$slash:=Position("/";$rest)
`A colon introduces a port only if it comes before the first slash (or if there is no slash)
If (($colon#0) & (($slash=0) | ($colon<$slash)))
  domainname:=Substring($rest;1;$colon-1)
  If ($slash#0)
    portnumber:=Num(Substring($rest;$colon+1;$slash-$colon-1))
  Else
    portnumber:=Num(Substring($rest;$colon+1))
  End if
Else
  If ($slash#0)
    domainname:=Substring($rest;1;$slash-1)
  Else
    domainname:=$rest
  End if
  portnumber:=80
End if
The third part sets the document path in the variable folder. As you can see, HTTP_URLParser calls HTTP_URLEncoder to encode the string passed as URL into a URL format.
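`Look for the first slash after the optional "http://" so that the slashes of the protocol prefix are ignored
If (Position("/";Substring($URL;$shift))#0)
  folder:=HTTP_URLEncoder(Substring($URL;Position("/";Substring($URL;$shift))+$shift-1))
Else
  folder:="/"
End if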
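For comparison, the same decomposition can be sketched in a few lines of Python (not 4D). This is only an illustration of the logic under the assumptions stated in the text (optional "http://", optional port, optional path); the function name parse_url is hypothetical, and unlike HTTP_URLParser it does not URL-encode the path.

```python
def parse_url(url):
    """Return (domainname, portnumber, folder) for an http URL."""
    # The "http://" prefix is optional.
    rest = url[7:] if url.lower().startswith("http://") else url
    # Everything from the first slash onward is the document path.
    hostport, slash, path = rest.partition("/")
    # An optional ":port" follows the hostname; the default port is 80.
    host, _, port = hostport.partition(":")
    portnumber = int(port) if port else 80
    folder = "/" + path if slash else "/"
    return host, portnumber, folder


for url in ("http://www.4D.com:80/", "www.4D.com", "http://www.4D.com/index.html"):
    print(url, "->", parse_url(url))
```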
HTTP_URLEncoder
The method HTTP_URLEncoder translates the URL into a URL format. The URL format is the application/x-www-form-urlencoded type; certain characters are encoded to eliminate ambiguity. Each encoded character is written as %xx, where xx is the hexadecimal representation of the character's ASCII code.
All the characters can be encoded, but usually the characters *, -, ., 0-9, A-Z, _, a-z are not encoded.
Note: "/" is not encoded when it acts as the folder separator; otherwise it would be represented in the URL as %2F (ASCII code 47 in decimal, 2F in hexadecimal).
Example:
The URL:
http://www.mydomain.com/my folder path/my document.html
contains 3 spaces; the valid URL would be
http://www.mydomain.com:80/my%20folder%20path/my%20document.html
This is what the method will perform:
` Parse the string and translate the special characters
For ($WSPI_parser;1;Length($WSPI_MyString))
  $WSPI_MyChar:=Substring($WSPI_MyString;$WSPI_parser;1)
  ` Letters, digits, *, -, ., _ and the folder separator / are copied as is
  If ((($WSPI_MyChar>="a") & ($WSPI_MyChar<="z")) | (($WSPI_MyChar>="A") & ($WSPI_MyChar<="Z")) | (($WSPI_MyChar>="0") & ($WSPI_MyChar<="9")) | ($WSPI_MyChar="*") | ($WSPI_MyChar="-") | ($WSPI_MyChar=".") | ($WSPI_MyChar="_") | ($WSPI_MyChar="/"))
    $0:=$0+$WSPI_MyChar
  Else
    ` Any other character becomes % followed by two hexadecimal digits
    $0:=$0+"%"
    $WSPI_n:=Ascii($WSPI_MyChar)\16  ` High-order hexadecimal digit
    If ($WSPI_n<10)
      $0:=$0+String($WSPI_n)
    Else
      $0:=$0+Char(Ascii("A")+$WSPI_n-10)
    End if
    $WSPI_n:=Ascii($WSPI_MyChar)%16  ` Low-order hexadecimal digit
    If ($WSPI_n<10)
      $0:=$0+String($WSPI_n)
    Else
      $0:=$0+Char(Ascii("A")+$WSPI_n-10)
    End if
  End if
End for
This method translates all the special characters into the URL format. For example, the space character is replaced by %20. The method is valid only for the Latin-1 charset; you may modify it if you use a different charset.
Then, you may need to provide a username and a password to access a protected document. There are several kinds of authentication over the Internet. In this example, the demo database allows the user to connect to a server via basic authentication, which is a base 64 encoded string sent in the HTTP header.
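The encoding rule can also be sketched compactly in Python (not 4D). This is an illustrative equivalent under the same assumptions as the text: the characters *, -, ., _, /, digits and unaccented letters are left alone, and every other Latin-1 character becomes %xx. The names SAFE and url_encode are made up for this example.

```python
# Characters that are copied unchanged (assumption taken from the text above).
SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789*-._/"
)


def url_encode(text):
    """Percent-encode every Latin-1 character that is not in SAFE."""
    out = []
    for byte in text.encode("latin-1"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            # Two uppercase hexadecimal digits, e.g. a space gives %20.
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(url_encode("/my folder path/my document.html"))
```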
HTTP_Encode
This method is used to encode the username and the password in base 64.
The base 64 encoding (refer to RFC 1521): A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)
The aim of the encoding process is to represent 24-bit groups of input bits as output strings of 4 encoded characters:
- Proceeding from left to right, a 24-bit input group is formed by concatenating three 8-bit input groups.
- These 24 bits are then treated as 4 concatenated 6-bit groups.
- Each of the 4 groups is translated into a single digit in the base 64 alphabet.
HTTP_Encode uses the method HTTP_EncBase64. The string that should be passed to this method is "username:password".
For example:
"username:password" base 64 encoded is: "dXNlcm5hbWU6cGFzc3dvcmQ="
Note: A method HTTP_Decode is also provided.
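The three encoding steps listed above can be followed in this short Python sketch (not 4D, and not the source of HTTP_EncBase64). The function name encode_base64 is an assumption for this illustration; the result is checked against the example string given in the text.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data):
    """Encode bytes in base 64, one 24-bit group at a time."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Concatenate up to three 8-bit bytes into one 24-bit number,
        # padding a short final group with zero bits.
        bits = int.from_bytes(group.ljust(3, b"\0"), "big")
        # Treat the 24 bits as four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A final group of 1 or 2 bytes yields 2 or 3 characters plus "=" padding.
        keep = len(group) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


token = encode_base64(b"username:password")
print("Authorization: BASIC " + token)
```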
Now, HTTP_SendRequest is able to build the request it is going to send.
HTTP_SendRequest
The HTTP_SendRequest method builds the request and establishes the connection with the server. It updates the HTTP response window in real time, thus providing information on the connection status. To do so, we created a new process so that the HTTP response window can be updated while the connection is running.
Tip about building an HTTP Request:
The variables $C_CR and $C_LF are declared as follows:
$C_CR:=Char (Carriage return)
$C_LF:=Char (Line feed)
End the last header line with $C_CR+$C_LF and follow it with a second $C_CR+$C_LF, so that the server understands this is the end of the headers. If your request has a body (for example, if you want to send some data with a POST method), the body starts right after that blank line, and its size in bytes should be given in a Content-Length header so that the server knows where the body ends.
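The layout of a request with and without a body can be sketched as follows in Python (not 4D). The function name build_request, the /form path and the form data are hypothetical, and the sketch assumes that the body length is announced with the Content-Length entity header described earlier.

```python
CRLF = "\r\n"


def build_request(method, path, host, headers=None, body=b""):
    """Assemble request line, headers, blank line and optional body."""
    lines = ["{} {} HTTP/1.1".format(method, path), "Host: " + host]
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    if body:
        # Tell the server how many bytes follow the blank line.
        lines.append("Content-Length: {}".format(len(body)))
    # One CRLF ends the last header, a second one ends the header block.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("latin-1") + body


request = build_request(
    "POST", "/form", "www.4D.com",
    {"Content-Type": "application/x-www-form-urlencoded"},
    b"name=value",
)
print(request.decode("latin-1"))
```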
The connection with the server:
Sending an HTTP request to a server using the 4D Internet Commands is straightforward. You open a connection using the command TCP_Open, with the domain name, the port number, and a variable for the session ID as parameters. Once the connection is established, you can send the request using TCP_SendBLOB, which sends the Blob containing the HTTP request. Then, you wait for the Web server's answer by looping and listening to all TCP packets until the connection is closed or until a TCP packet asks to close the connection.
Here is an example of a basic algorithm that connects to a server and receives a response:
$err:=TCP_Open ($domainname;$portnumber;$MySessionID)
If ($err=0)
  $err:=TCP_SendBLOB ($MySessionID;$Blob_Send)  ` Send the request
  If ($err=0)
    Repeat  ` Loop to retrieve the answer
      $err:=TCP_ReceiveBLOB ($MySessionID;$Blob_Received)
      $err:=TCP_State ($MySessionID;$State)
      $srcpos:=0
      $dstpos:=BLOB size($Blob_All)
      ` Concatenating received Blobs
      COPY BLOB($Blob_Received;$Blob_All;$srcpos;$dstpos;BLOB size($Blob_Received))
    Until (($State=0) | ($err#0))
    ` Blob received
    $err:=TCP_Close ($MySessionID)
  End if
End if
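The same open, send, receive-until-closed pattern looks like this with plain sockets in Python (not 4D Internet Commands). The function names receive_all and http_exchange are assumptions made for this sketch; it relies on the server closing the connection once the response is complete, for example because the request carried a "Connection: close" header.

```python
import socket


def receive_all(sock, bufsize=4096):
    """Read from the socket until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:                  # empty read: connection closed
            break
        chunks.append(data)           # concatenate the received packets
    return b"".join(chunks)


def http_exchange(host, port, request):
    """Open a TCP connection, send the request bytes, return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        return receive_all(sock)
```

A call such as http_exchange("www.4D.com", 80, request) would then return the raw response, headers and body together, ready to be parsed.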
As soon as you get the response, the first thing to do is parse the header you received to check the status code. If everything is OK, you usually receive a status code of 200.
Server Response Codes
The initial line of the server's response includes the HTTP version, a three-digit status code, and a human-readable description of the result. Status codes are grouped as follows:
| 100-199 | Informational |
| 200-299 | Client request successful |
| 300-399 | Client request redirected, further action necessary |
| 400-499 | Client request incomplete |
| 500-599 | Server errors |
The most common status codes are:
100 Continue
The initial part of the request has been received, and the client may continue with the request
200 OK
The client's request was successful.
301 Moved Permanently
The requested URL is no longer used by the server, and the operation specified in the request was not performed. The new location for the requested document is specified in the Location header. All future requests for the document should use the new URL.
302 Found (named Moved Temporarily in HTTP 1.0)
Same purpose as 307.
307 Temporary Redirect
The requested URL has moved temporarily. The Location header specifies the new location, but this is only temporary, and the user may revisit the original URL in the future.
400 Bad Request
This response code indicates that the server detected a syntax error in the client's request.
401 Unauthorized
The client should supply proper authorization when requesting this URL again.
404 Not Found
The document at the specified URL does not exist.
Once you have interpreted the status code, you may react as you want. An example provided in the demo database is the AutoFollow. If the user checked the AutoFollow box, and if the status code is 301, 302, or 307, which means that the document has moved, the HTTP Client re-builds a request using the new URL specified in the Location header. This is implemented just by adding a test on the status code in the response and a loop around the sending-receiving of the request.
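The status-code test behind AutoFollow can be sketched like this in Python (not 4D). The function names parse_response_head and next_url, and the sample Location value, are hypothetical; the sketch only decides whether a new URL should be requested and leaves the surrounding send/receive loop out.

```python
REDIRECTS = (301, 302, 307)          # status codes followed by AutoFollow


def parse_response_head(raw):
    """Return (status_code, reason, headers) from a raw HTTP response."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    # Status line: HTTP version, three-digit code, human-readable description.
    parts = lines[0].split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers


def next_url(raw):
    """Return the Location to follow for a redirect, otherwise None."""
    code, _, headers = parse_response_head(raw)
    if code in REDIRECTS:
        return headers.get("location")
    return None


moved = b"HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.4D.com/new/\r\n\r\n"
print(next_url(moved))
```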
HTTP_ChunkedBody
In HTTP/1.1 the transfer encoding can be chunked. The chunked transfer-encoding encodes the message as a series of chunks followed by entity-headers. Each chunk contains a chunk size specified in base 16, followed by a CRLF. After that, the chunk body is presented, whose length is specified in the chunk size and is followed by CRLF. Consecutive chunks are specified one after another, with the last chunk having a length of zero followed by CRLF.
The following code translates a chunked body into "non-chunked" body. It is launched if the headers contain the Transfer-Encoding header set to "chunked", and if the chunked body translation box is checked.
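$ChunkSizeB16:="-1"
$offset:=1  ` Character positions start at 1 in 4D
$0:=""
While ((($ChunkSizeB16#"0") & ($ChunkSizeB16#($C_LF+"0"))) & ($ChunkSizeB16#""))
  ` The chunk size is the hexadecimal text in front of the next CRLF
  $ChunkSizeB16:=Substring($1;$offset;Position($C_CR+$C_LF;Substring($1;$offset))-1)
  $ChunkSize:=HTTP_Hexa2Dec ($ChunkSizeB16)
  ` The chunk data starts right after that CRLF
  $0:=$0+Substring($1;Position($C_CR+$C_LF;Substring($1;$offset))+1+$offset;$ChunkSize)
  ` Skip the size, its CRLF, the data and the CRLF that follows the data
  $offset:=$offset+$ChunkSize+2+Length($ChunkSizeB16)+2
End while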
Note: As the chunked body and the resulting translated body are passed as 4D text variables, you will get errors if the text is larger than 32 KB.
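For reference, here is the same chunk-by-chunk translation sketched in Python (not 4D), working on bytes so that no text size limit applies. The function name decode_chunked and the sample body are assumptions for this illustration; the built-in base 16 conversion plays the role of HTTP_Hexa2Dec, and any entity headers that follow the last chunk are ignored.

```python
def decode_chunked(body):
    """Translate a chunked body into a plain ("non-chunked") body."""
    out = []
    pos = 0
    while True:
        # The chunk size is the hexadecimal text before the next CRLF.
        eol = body.index(b"\r\n", pos)
        size_field = body[pos:eol].split(b";", 1)[0]   # drop chunk extensions
        size = int(size_field.decode("ascii").strip(), 16)
        if size == 0:                 # a zero-length chunk ends the body
            break
        start = eol + 2               # data begins right after the CRLF
        out.append(body[start:start + size])
        pos = start + size + 2        # skip the data and its trailing CRLF
    return b"".join(out)


chunked = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(chunked))
```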
3- Conclusion
This technical note was developed to introduce the HTTP protocol and to explain how to implement it in 4D with the 4D Internet Commands. You may need to go through the source code to add your own code. Since we use 4D Write to display the HTML content, some HTML objects such as pictures are not displayed. This is where a browser would request the picture from the server using the picture's URL. A good exercise would be to parse the HTML document to retrieve the picture URLs, and to query the server to display the pictures in the 4D Write window. This HTTP Client could also be extended to WebDAV.