HTTP
The Hypertext Transfer Protocol (HTTP) is used to transport virtually all traffic on the World-Wide
Web (WWW). HTTP follows a client-server model: a client submits a request to a server, which in turn
sends back a response.
SOAP makes extensive use of the following HTTP features: HTTP headers (including Content-Type), POST,
and HTTP return codes (2nn for success, 3nn for redirection, 4nn for client errors, and 5nn for
server errors). See http://www.w3.org/Protocols/Specs.html for further details.
HTTP specifies a common format for the request and the response:
- The first line (the request line in a request, the status line in a response)
- Zero or more header lines
- A Carriage Return-Linefeed (CRLF) by itself, that is, an empty line
- An optional body
The following code snippet shows an example of an HTTP request.
GET /Authors/soap.html HTTP/1.1
Host: www.wrox.com
Content-Type: text/html; charset=utf-8
Content-Length: 0
The first line of an HTTP request carries the verb (more on that later), the path (the portion of the
URL after the host name), and the version of HTTP that the client understands. The Content-Type
header defines the Multipurpose Internet Mail Extensions (MIME) type of the request body. The type of
the previous request is text/html, which is used for most web pages. It simply specifies that the data
being transmitted is text and that the text is an HTML document.
As a SOAP developer, you will mostly deal with text/xml: the data is text and contains an XML
document. As its name suggests, the Content-Length header defines the number of bytes in the body of
the request. In this particular case the content length is 0, which is typical for a web page request
that does not need to submit any data to the server.
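The layout described above (first line, header lines, empty line, optional body) can be sketched in a few lines of Python. This is only an illustration: the function name parse_request and the variable names are hypothetical, and a real server would use a full HTTP parser rather than this simplified splitting.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, path, version, headers, body).

    Simplified sketch: assumes well-formed input and ignores details such as
    repeated or folded header lines.
    """
    # The first empty line (CRLF by itself) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # First line: verb, path, and HTTP version.
    method, path, version = lines[0].split(" ", 2)

    # Remaining lines: "Name: value" headers (names are case-insensitive).
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


# The request shown in the text, with explicit CRLF line endings.
raw_request = (
    b"GET /Authors/soap.html HTTP/1.1\r\n"
    b"Host: www.wrox.com\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
method, path, version, headers, body = parse_request(raw_request)
```

Here method is the verb, path is the URL portion after the host name, and headers maps lower-cased header names to their values.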
The Multipurpose Internet Mail Extensions (MIME) extend the format of Internet mail to allow
non-ASCII information to be transmitted in e-mail headers and messages. As usual with Internet
standards, MIME is used for many more applications than its intended target: for instance, you can
use MIME to add non-textual information (such as JPEG pictures) to SOAP packets.
See RFC 1521 at http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html for more details.
If all goes well, the server response will start with HTTP/1.0 200 OK, where 1.0 is replaced by
the version of HTTP that is supported by the server. This is followed by the content of the
requested resource.
The following HTTP response contains a simple HTML document.
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 25

<html>Hello World!</html>
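Note that the Content-Type header in the previous HTTP response also declares the character encoding
of the document (charset=utf-8). The value of Content-Length is the number of bytes in the body, that
is, everything after the empty line (CRLF) that ends the headers. There are other header fields, such
as Date and Expires.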
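To make the byte counting concrete, the following Python sketch checks that the Content-Length header of a response matches the number of bytes that follow the empty line. The helper name body_matches_content_length is hypothetical, and the sample response assumes a body of exactly 25 bytes with no stray spaces.

```python
def body_matches_content_length(raw: bytes) -> bool:
    """Return True if the Content-Length header equals the body size in bytes."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        return False  # no empty line, so headers and body cannot be told apart

    # Skip the status line, then look for the Content-Length header.
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.decode("ascii")) == len(body)
    return False  # header not present


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 25\r\n"
    b"\r\n"
    b"<html>Hello World!</html>"
)
length_is_consistent = body_matches_content_length(raw_response)
```

The count is taken on bytes, not characters, which matters as soon as the body contains non-ASCII text.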
The Unicode Transformation Format (UTF) is an algorithmic mapping from a Unicode character to
a unique sequence of bytes (one to four bytes in the case of UTF-8). There are actually seven
different encoding schemes for Unicode characters (UTF-8, UTF-16, UTF-32, and the big-endian and
little-endian variants of the latter two). The major advantage of UTF-8 is that it is compact for
text that is mostly ASCII, such as XML markup, and therefore conserves precious network bandwidth.
See http://www.unicode.org for more information.
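As a rough illustration of the size difference, the Python sketch below encodes the same text in three Unicode encodings and counts the resulting bytes. The function name encoded_sizes is hypothetical; the big-endian codecs are used so that no byte-order mark is added to the count.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes needed to encode text in three encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # big-endian, no byte-order mark
        "utf-32": len(text.encode("utf-32-be")),  # big-endian, no byte-order mark
    }


ascii_sizes = encoded_sizes("Hello World!")  # plain ASCII text
euro_sizes = encoded_sizes("\u20ac")         # the euro sign
```

For ASCII text UTF-8 needs one byte per character, while characters outside the ASCII range take two to four bytes each.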
However, things do not always go smoothly. To handle the more difficult cases, HTTP defines ranges of
return codes:
1nn
This status is informational. It is typically sent by the server to report interim progress.
For instance, 100 means that the server is willing to accept the request and the client may
proceed with the rest of the request.
2nn
The request succeeded. For instance, 200 means the request was OK.
3nn
The requested resource has moved. This response is accompanied by the new URL, telling the
client where to get the data. This is not really an error, but an indication that the document
should be retrieved from an alternate location.
4nn
The request submitted by the client is in error. For instance, 401 means that the client has
not supplied valid credentials for the resource and 404 means that the resource does not exist
(presumably, the client requested the wrong URL).
5nn
The server is in error. This is usually a sign that something went wrong on the server side. For
instance, as a SOAP developer, you will typically encounter a 500 error when an uncaught
exception is thrown by a service.
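The ranges above lend themselves to a simple lookup on the first digit of the code. The following Python sketch shows one way to do it; the function name status_class and the wording of the labels are illustrative only, while HTTPStatus comes from Python's standard http module.

```python
from http import HTTPStatus

# Labels for each range of return codes, keyed by the first digit.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map an HTTP return code such as 404 to the range it belongs to."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP return code: {code}")
    return STATUS_CLASSES[code // 100]


# The standard library also knows the usual reason phrase for each code.
not_found_phrase = HTTPStatus(404).phrase
server_error_phrase = HTTPStatus(500).phrase
```

A client that only needs to know whether to retry, redirect, or report an error can usually branch on the range and ignore the exact code.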
In the code snippet below, the response indicates that the requested URL cannot be found.
HTTP/1.1 404 Object Not Found
Server: Microsoft-IIS/5.0
Date: Wed, 12 Sep 2001 23:57:41 GMT
Connection: close
Content-Length: 3252
Content-Type: text/html
We can also see more header entries in that response. Their meaning is mostly self-explanatory,
except perhaps for Connection: close, which signifies that the server will close the connection once
the response has been sent, so the client (a browser, most of the time) should not try to reuse it.
HTTP is a stateless protocol: each request is handled independently of the ones before it. In
HTTP/1.0 the connection is closed once the request has been satisfied unless both sides agree to keep
it alive, whereas HTTP/1.1 keeps the connection open by default. For efficiency reasons, web browsers
usually reuse a connection while they load a page, since a page is usually made of multiple resources
(frames, bitmaps, etc.).
Most of the time, HTTP uses TCP/IP sockets to handle the connection between the client
and the server. In TCP/IP sockets, the client and the server agree on a port number to use to start the
connection. Different protocols based on TCP/IP use different port numbers. The standard port for
HTTP is port 80, although any port can be used if the client and the server agree on an alternate
port number.
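The exchange over a TCP/IP socket can be tried out locally. The Python sketch below starts a throwaway server from the standard http.server module on a port chosen by the operating system, then sends a hand-written GET request over a plain socket. The names HelloHandler, fetch_status_line, and demo are hypothetical, and the code is meant as an experiment rather than a production client.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html>Hello World!</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_status_line(host, port, path="/"):
    """Open a TCP connection, send a minimal GET, and return the status line."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:  # read until the server closes the connection
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).split(b"\r\n", 1)[0].decode("ascii")


def demo():
    """Run the server on a free local port and fetch one page from it."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_status_line("127.0.0.1", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the first line of the response. Because the server here does not listen on port 80, the client has to be told the port explicitly, which is the agreement on an alternate port number mentioned above.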
As we discussed earlier, HTTP is used to retrieve any data (resource) from a server. The resource can be
a text file as we saw earlier, but it can also be a binary file, or a remote executable. Resources are
identified by Uniform Resource Locators (URLs), which define not only where to get the resource, but
also how to get it. A URL starts with the protocol that is used to retrieve the data. For instance,
ftp:// indicates that the data can be retrieved using the File Transfer Protocol (FTP).
With HTTP, a URL typically looks like the following:
http://server-name:port-number/file-path
The port number is assumed to be 80 when it is not present. Arguments can be added to the URL, after
a question mark, if needed. For instance, the following URLs are valid for HTTP:
http://www.wrox.com
http://myserver:1234/mystuff
http://myserver:1234/myservlet?value=private
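The parts of a URL can be pulled apart with Python's standard urllib.parse module, as in the sketch below. The function name describe_url and the keys of the returned dictionary are hypothetical; the fallback to port 80 applies to http URLs only.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Break an http URL into protocol, host, port, path, and arguments."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # No explicit port in the URL: assume the standard HTTP port.
        "port": parts.port if parts.port is not None else 80,
        "path": parts.path or "/",
        # Arguments follow the question mark, as name=value pairs.
        "arguments": parse_qs(parts.query),
    }


default_port_url = describe_url("http://www.wrox.com")
servlet_url = describe_url("http://myserver:1234/myservlet?value=private")
```

Note that parse_qs returns a list for each argument name, because the same name may appear more than once in a URL.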
To allow the client to have a meaningful dialog with the server, HTTP defines a set of methods for
requesting information from a server. The principal methods are:
- GET
This method is typically used to retrieve a file, or trigger the execution of some code on the
server. The arguments, if any, are part of the URL requested. Putting sensitive arguments in a
GET URL is risky even with HTTPS (secure HTTP): although HTTPS encrypts the URL path and
arguments along with the headers and the body, the full URL is routinely recorded in server
logs, browser history, and Referer headers. If you look at the example above, this means that
value=private could end up stored in the open and therefore not be very private.
- POST
This method is similar to GET but the arguments for the request are included as part of the
body of the request. With HTTPS, the arguments in a POST body are encrypted in transit and
are not recorded as part of the URL.
- PUT
This allows a client to upload a file to the server.
- DELETE
This allows a client to delete a file from the server. Most installations do not support PUT
and DELETE because of the inherent risk in these verbs. PUT and DELETE do not play a role
in SOAP.
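The difference between GET and POST comes down to where the arguments travel. The Python sketch below builds one request of each kind with the standard urllib.request module without sending anything over the network. The helper names build_get and build_post, the host myserver, and the one-element XML document are placeholders for illustration, not a real SOAP message.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(base_url: str, arguments: dict) -> Request:
    """Build a GET request: the arguments become part of the URL."""
    return Request(base_url + "?" + urlencode(arguments), method="GET")


def build_post(url: str, xml_document: str) -> Request:
    """Build a POST request: the arguments travel in the body."""
    return Request(
        url,
        data=xml_document.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


get_request = build_get("http://myserver:1234/myservlet", {"value": "private"})
post_request = build_post(
    "http://myserver:1234/myservlet",
    "<value>private</value>",  # placeholder XML, not a real SOAP envelope
)
```

In the GET request the argument is visible in get_request.full_url, whereas in the POST request the URL stays clean and the argument is found only in post_request.data.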