Years ago the WWW was made up of (mostly) static documents. Each URL corresponded to a single file stored on some hard disk. HTML documents were created manually using simple text editors. Global network communication was constantly interrupted as the shifting of ice-age glaciers tore communication lines apart...
Today many (in fact most) of the documents on the WWW are built at request time. This means that the documents do not physically exist as a single file on some hard disk, but are instead created whenever requested. Typically much of the content comes from files, but the actual document is nothing more than the output of a program (that gets pieces of the document from various files and glues them together). A URL does not typically correspond to a single file.
When a program is responsible for creating a document on the fly, we call the result a dynamic document. Just about any web document that contains an advertisement is probably a dynamic document - the actual advertisement placed in the document is determined when you request the document (sometimes in a random fashion, sometimes in a very non-random fashion). Any document that depends on user action for some of its content (for example, a document that shows you your current shopping cart) is a dynamic document.
Writing programs that create dynamic documents has become very important, and we will spend much of this course working on this skill. Although there are a variety of approaches and languages, the basic concepts are the same, so we will focus on these basic concepts. Below are listed a few of the popular mechanisms used for web programming:
Requires writing a standalone program that accepts
network connections, parses HTTP requests and can
generate HTTP replies (and construct the content
of the reply).
Uses the web server to take care of the
networking and handling HTTP. The idea is to have
the external program deal only with construction
of the content of the reply.
CGI is set up this way - the idea is to define a common interface between the web server and the external programs so that the same programs can be used with any server.
The web server includes software that can understand (interpret/execute) some programming language. Now each URL can reference a program written in a language the server understands. Typically the programming is placed in a document that also includes much of the HTML that will be produced.
There are many examples of this today, including Server Side Includes (SSI), Server-side JavaScript, Active Server Pages (ASP), Java Server Pages (JSP) and PHP.
Writing a custom server is outside the scope of this course, but it's worth considering what we would need to do. Typically a custom server is written using a library to provide access to TCP/IP services - sockets is probably the most often used library for this purpose. Assuming we know how to use the sockets library to deal with the network, we would need to write the following code:
Write a TCP server that watches a well known port for requests.
Develop a mapping from HTTP requests to
service requests. In other words, some way to translate
HTTP requests into requests for our particular service.
Remember that no matter what type of service we want to provide - the
only way to get requests is via HTTP, so all service
requests must come in the format supported by HTTP. If we
don't provide this software we won't be able to talk to
browsers!
Send back HTML (or whatever) that is created/selected by the
server process. This is where we create a dynamic document and send
the document back as part of an HTTP response (we send this back
using the sockets library to take care of the networking).
Handle HTTP errors, headers, etc. We need to
handle unpredictable situations without crashing or messing up the
clients.
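The four steps above can be sketched in Python using the standard `socket` module. This is only a rough outline under stated assumptions, not a complete HTTP implementation: the function names (`parse_request_line`, `make_response`, `handle_request`, `serve_forever`) are made up for illustration, the `service` argument stands for whatever code builds our dynamic document, and the sketch assumes the whole request arrives in a single read.

```python
import socket


def parse_request_line(raw: bytes):
    """Return (method, uri) from the first line of a request, or None if malformed."""
    try:
        first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
        method, uri, version = first_line.split(" ")
    except (UnicodeDecodeError, ValueError):
        return None
    if not uri.startswith("/") or not version.startswith("HTTP/"):
        return None
    return method, uri


def make_response(status: str, html: str) -> bytes:
    """Build a response: status line, a few headers, a blank line, then the document."""
    body = html.encode("utf-8")
    head = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def handle_request(raw: bytes, service) -> bytes:
    """Map an HTTP request to a service request and wrap the result in HTTP."""
    parsed = parse_request_line(raw)
    if parsed is None:
        return make_response("400 Bad Request", "<html><body>Bad request</body></html>")
    method, uri = parsed
    if method != "GET":
        return make_response("501 Not Implemented", "<html><body>GET only</body></html>")
    try:
        document = service(uri)  # the dynamic document is created here
    except Exception:
        # Never let a bug in the service take the whole server down.
        return make_response("500 Internal Server Error", "<html><body>Oops</body></html>")
    return make_response("200 OK", document)


def serve_forever(host: str, port: int, service) -> None:
    """Watch a port and answer one request per connection."""
    with socket.create_server((host, port)) as listener:
        while True:
            conn, _addr = listener.accept()
            with conn:
                raw = conn.recv(4096)  # assumption: the request fits in one read
                conn.sendall(handle_request(raw, service))
```

Nothing runs until `serve_forever` is called, for example `serve_forever("", 8180, lambda uri: "<html><body>hello</body></html>")`. Keeping `handle_request` separate from the socket code makes the HTTP handling easy to try out without a network.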
We want to provide a time and date service, so that anyone in the
world can find out the date and time (according to our computer). Our
program will talk HTTP so that browsers can understand the
reply - this reply will include an HTML document that contains the
current date and time.
NOTE: WWW based Time and Date service is Copyright DLH Enterprises,
1999,2000!
(This is my future dot com, keep your hands off!)
Since the general idea is that anyone in the world can find out the
date and time, we don't care what is in the HTTP request -
our reply doesn't depend on it.
We will make the assumption that whatever makes the request
can deal with HTML, so we just generate a valid
HTML document and send it back. We can now provide a somewhat
detailed description of the program:
HTTP response line followed by
some HTTP headers (Content-Type is an important one!)
We can publish the URL to our server, or embed links to the server
in other HTML documents. Since our server is not really an HTTP
server, we will probably not run the server on port 80, but instead
pick some other port number (URLs can include the port number of the
server). We need to make sure the server is always running (on the
published host and port). For example, if our server is running on
the host www.timedate.com on port 180, a
URL pointing to our server would look like this:
http://www.timedate.com:180/
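As a rough illustration, here is how the time and date server might look in Python. This is a sketch under assumptions rather than a finished program: the function names are invented, the date format is arbitrary, and the default port 8180 is used instead of 180 only because low-numbered ports usually require special privileges. The request is read and then ignored, since the reply doesn't depend on it.

```python
import socket
from datetime import datetime


def time_date_page(now: datetime) -> str:
    """Build the HTML document containing the current date and time."""
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "<html><head><title>Time and Date</title></head>"
        f"<body><h1>The current date and time is {stamp}</h1></body></html>"
    )


def time_date_response(now: datetime) -> bytes:
    """HTTP response line, some headers, a blank line, then the document."""
    body = time_date_page(now).encode("ascii")
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def run_server(port: int = 8180) -> None:
    """Accept connections forever; every client gets the current date and time."""
    with socket.create_server(("", port)) as listener:
        while True:
            conn, _addr = listener.accept()
            with conn:
                conn.recv(4096)  # read the request, but we don't care what it says
                conn.sendall(time_date_response(datetime.now()))
```

Calling `run_server()` and pointing a browser at `http://localhost:8180/` should show the page. The document is rebuilt for every request, which is exactly what makes it dynamic.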
We can extend our service by having our custom server keep a database of hits. We keep track of the number of times our server is accessed each day; this simple database will allow us to provide hit reports to folks who visit our site.
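The hit database can be very simple. The sketch below keeps the counts in memory, which is an assumption made to keep the example short (a real server would want to save them to disk). The class name `HitLog` and its methods are hypothetical. Hits are stored per hour so that both a daily total and an hourly breakdown are available.

```python
from collections import Counter
from datetime import date, datetime


class HitLog:
    """In-memory hit database keyed by (date, hour of day)."""

    def __init__(self) -> None:
        self._hits = Counter()

    def record(self, when: datetime) -> None:
        """Count one access to the server at the given moment."""
        self._hits[(when.date(), when.hour)] += 1

    def hits_for_day(self, day: date) -> int:
        """Total number of hits received on the given day."""
        return sum(n for (d, _hour), n in self._hits.items() if d == day)

    def hourly(self, day: date) -> list:
        """List of 24 counts, one for each hour of the given day."""
        return [self._hits[(day, hour)] for hour in range(24)]
```

The server would call `record(datetime.now())` once for each request it handles.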
Since we now want to provide hit reports for any arbitrary
date in the past, there must be some mechanism for having
the user (browser) specify the desired date. Remember
that the only requests that the browser can create are HTTP
requests, so we must make sure that the specification of the date
comes as a valid HTTP request.
HTTP Refresher:
Each request line includes a method (typically
the string "GET"), followed by a URI. A
URI starts with a "/" and is followed
by some words/names, and can have more "/"s separating
components of the URI.
If we embed a hyperlink (using the <A> tag) in
a document that will take the user to a hit report for a specific date,
the browser will use GET as the request method - we can't
change this. We can, however, specify any URI we want - all we need is
some scheme that allows us to easily translate the URI
into a date so we can look up the hits for that day in the database.
The following format for URIs will do:
/mm/dd/yyyy
In such a URI we expect 2 digits specifying the month, another 2 for the day and 4 for the year (Y2K compliant!). An example URL for our service:
http://www.timedate.com:180/09/25/2000
We will expect an HTTP request that looks like this:
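For the example URL above, the request line would plausibly be `GET /09/25/2000 HTTP/1.0` (the HTTP version shown is an assumption). Translating such a request into a date can be sketched as follows. The function names are made up for illustration, and anything that doesn't match the /mm/dd/yyyy format, or isn't a real calendar date, is rejected by returning `None`.

```python
import re
from datetime import date

# Exactly two digits, two digits and four digits, separated by slashes.
URI_PATTERN = re.compile(r"/([0-9]{2})/([0-9]{2})/([0-9]{4})")


def uri_to_date(uri: str):
    """Translate a URI of the form /mm/dd/yyyy into a date, or None if invalid."""
    match = URI_PATTERN.fullmatch(uri)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Right shape but not a real date, e.g. /13/01/2000 or /02/30/2000.
        return None


def date_from_request(request: str):
    """Pull the date out of a request whose first line is 'GET /mm/dd/yyyy HTTP/x.y'."""
    parts = request.split("\r\n", 1)[0].split(" ")
    if len(parts) != 3 or parts[0] != "GET":
        return None
    return uri_to_date(parts[1])
```

When the result is `None`, the server can fall back to the homepage or send back an error document.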
We should figure out what our homepage will look like, since now we will need to show the current time and date, but also include some hyperlinks to allow the user to easily view the hit report for recent days. Below is one possible example of such a homepage - remember that this document needs to be built dynamically to include the current date and time.
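Building such a homepage dynamically might look like the Python sketch below. The layout, the wording and the choice of linking to the three most recent days are all assumptions; the only fixed part is that each hyperlink uses the /mm/dd/yyyy URI format described earlier.

```python
from datetime import date, datetime, timedelta


def report_uri(day: date) -> str:
    """URI of the hit report for a given day, in /mm/dd/yyyy form."""
    return day.strftime("/%m/%d/%Y")


def build_homepage(now: datetime, recent_days: int = 3) -> str:
    """Homepage with the current date and time plus links to recent hit reports."""
    items = []
    for days_back in range(recent_days):
        day = now.date() - timedelta(days=days_back)
        label = day.strftime("%m/%d/%Y")
        items.append(f'<li><a href="{report_uri(day)}">Hit report for {label}</a></li>')
    links = "".join(items)
    stamp = now.strftime("%m/%d/%Y %H:%M:%S")
    return (
        "<html><head><title>timedate.com</title></head><body>"
        f"<p>The current date and time is {stamp}</p>"
        f"<ul>{links}</ul>"
        "</body></html>"
    )
```

Because the links are relative URIs, the browser sends each GET request back to the same host and port that served the homepage.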
We want our hit report to provide a table that lists the number of hits received each hour of the day in question. Here is an example of what our program should generate:
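One way to generate such a table is sketched below in Python. The function name `hit_report` and the exact HTML layout are assumptions; the input is simply a list of 24 counts, one for each hour of the day.

```python
def hit_report(day_label: str, hourly_hits: list) -> str:
    """Return an HTML document with one table row per hour of the day."""
    if len(hourly_hits) != 24:
        raise ValueError("expected one count for each of the 24 hours")
    rows = "".join(
        f"<tr><td>{hour:02d}:00</td><td>{hits}</td></tr>"
        for hour, hits in enumerate(hourly_hits)
    )
    total = sum(hourly_hits)
    return (
        f"<html><head><title>Hits for {day_label}</title></head><body>"
        f"<h1>Hit report for {day_label}</h1>"
        "<table><tr><th>Hour</th><th>Hits</th></tr>"
        f"{rows}"
        f"<tr><th>Total</th><th>{total}</th></tr>"
        "</table></body></html>"
    )
```

The counts would come from the hit database, and the resulting document is sent back as the body of the HTTP response.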
There are a number of problems with using custom servers for web programming. The central issue is the duplication of code/resources necessary if we have lots of custom services.
Take a general purpose Web server (that can handle static documents) and have it process requested documents as it sends them to the client. The documents could contain commands that the server understands (the server includes some kind of interpreter).
Have the server read each HTML file as it sends it to the client.
The server could look for a tag named SERVERCODE and
treat the stuff between the start and end tags as special
instructions. If the server sees something like this:
The server doesn't send the stuff in the SERVERCODE
tag to the client, instead it interprets the command and sends the
result to the client. Everything else is sent normally (without
modification).
Here are some possible commands that the server might recognize and a description of what the server would generate in place of the commands.
| SERVERCODE command | What the command does |
|---|---|
| Time | The server replaces this command with the current time |
| Date | The server replaces this command with the date |
| Hitlist | The server replaces this command with a hit report |
| Include file | The server replaces this command with the contents of a file |
| Randomfile directory | The server replaces this command with contents of one of the files in the named directory (folder). |
If we had a server that could understand these commands then
the home page for timedate.com might look like this:
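To make the idea concrete, here is a small Python sketch of what the server would do with such a document. It assumes the commands are written as `<SERVERCODE>Time</SERVERCODE>` (the exact syntax is made up, like the tag itself) and implements only the Time, Date and Include commands from the table. The function names and date formats are also assumptions.

```python
import re
from datetime import datetime
from pathlib import Path

# Anything between the start and end tags is treated as a command.
SERVERCODE = re.compile(r"<SERVERCODE>(.*?)</SERVERCODE>", re.DOTALL | re.IGNORECASE)


def run_command(command: str, now: datetime, doc_root: Path) -> str:
    """Return the text that should replace one SERVERCODE command."""
    name, _, argument = command.strip().partition(" ")
    name = name.lower()
    if name == "time":
        return now.strftime("%H:%M:%S")
    if name == "date":
        return now.strftime("%m/%d/%Y")
    if name == "include":
        # A real server must check that the file is one it is allowed to send.
        return (doc_root / argument.strip()).read_text()
    return f"[unknown SERVERCODE command: {name}]"


def process_document(html: str, now: datetime, doc_root: Path) -> str:
    """Replace every SERVERCODE element with its result; pass everything else through."""
    return SERVERCODE.sub(lambda m: run_command(m.group(1), now, doc_root), html)
```

For example, `process_document("<p>It is <SERVERCODE>Time</SERVERCODE></p>", datetime.now(), Path("."))` returns the paragraph with the current time filled in. The client never sees the SERVERCODE tags.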
Many real web servers support this idea (but not the syntax I've shown). Server Side Includes (SSI) defines a set of commands that a server will interpret when sending an HTML document to a client. Typically the server is configured to look for commands only in specially marked documents (so normal documents aren't slowed down).
The commands supported by SSI are called SSI directives.
Each directive is embedded inside an HTML comment. Comments are
regions within an HTML document between a <!-- and a -->. Here is an example HTML document containing
a comment:
An SSI directive looks like this:
<!--#command parameter="arg"-->
As you can see, an SSI directive is a special kind of HTML comment. This helps in those situations when for some reason the server does not interpret the directive but instead sends it to the browser unchanged. The browser won't display the directive since it's inside an HTML comment. If you were to load an HTML document directly into your browser instead of going through a server, any SSI directives in the document will not be processed!
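The Python sketch below shows how a server might pick the directives out of a document. It only finds and parses them; it doesn't execute anything. The function name `find_directives` is hypothetical, and the regular expressions are a simplification that assumes every parameter value is written in double quotes.

```python
import re

# A directive is a comment that starts with "<!--#", then a command name,
# then zero or more parameter="value" pairs.
DIRECTIVE = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
PARAMETER = re.compile(r'(\w+)="([^"]*)"')


def find_directives(html: str) -> list:
    """Return a list of (command, parameters) pairs for each SSI directive found."""
    found = []
    for match in DIRECTIVE.finditer(html):
        command = match.group(1)
        parameters = dict(PARAMETER.findall(match.group(2)))
        found.append((command, parameters))
    return found
```

Given a document containing `<!--#echo var="DATE_LOCAL" -->`, this returns `[("echo", {"var": "DATE_LOCAL"})]`. An ordinary comment such as `<!-- hello -->` is skipped, because it doesn't start with `<!--#`.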
Not all HTTP servers have the capability of interpreting SSI directives, although most of the popular servers can be configured to do so (Apache, Microsoft and Netscape servers do handle SSI).
Listed below are some of the SSI directives. Some of these directives can be quite dangerous (in terms of security) and are not supported by all servers.
Some SSI Directives
Some servers support elaborate scripting languages. Scripts are embedded in HTML documents, and the server interprets the script while sending the document to the client. Below are listed some of the more popular scripting languages:
Microsoft Active Server Pages (ASP)
Netscape LiveWire
There are others...
Some servers include a programming interface that allows us to extend the capabilities of the server by writing modules. Specific URLs are mapped to specific modules instead of to files.
Example: We could write our timedate.com server as a module and merge it with the web server.
Another approach is to provide a standard interface between external programs and web servers. The general idea is that by specifying a standard interface, we can develop external programs that can be run from any web server. The web server handles all the HTTP; we focus on the special service only.
Advantages to using external programs:
It doesn't matter what language we use to write the external program.
It doesn't matter what kind of machine we use to develop the external program (or what specific web server we use to test the external program).
CGI is a standard interface to external programs supported by most (if not all) web servers. The interface that is defined by CGI includes:
Identification of the service (external program).
How the browser can specify what external program should be run.
A mechanism for passing the request to the external program.
This is a necessary part of the standard, since otherwise we would need to rewrite each external program to work with each web server.
How the reply gets from the external program to the client.
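To give a feel for the interface, here is a sketch of a tiny CGI program. It is written in Python only to keep the illustration short (any language would do). The web server passes the request to the program in environment variables such as `REQUEST_METHOD` and `PATH_INFO`, and the program writes its reply to standard output: a `Content-Type` header, a blank line, then the document. The function name `cgi_reply` and the page layout are assumptions.

```python
import os
import sys
from datetime import datetime
from html import escape


def cgi_reply(environ, now: datetime) -> str:
    """Build the reply the web server will pass on to the client."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "")
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    body = (
        "<html><head><title>Time and Date</title></head><body>"
        f"<p>The current date and time is {stamp}</p>"
        f"<p>Request method: {escape(method)}, extra path: {escape(path)}</p>"
        "</body></html>"
    )
    # A header, then a blank line, then the content of the reply.
    return "Content-Type: text/html\n\n" + body


def main() -> None:
    sys.stdout.write(cgi_reply(os.environ, datetime.now()))


if __name__ == "__main__":
    main()
```

Notice that there is no networking code and no HTTP parsing here; the web server takes care of both.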
CGI programs are often written in scripting languages (Perl, Tcl), although you can write a CGI program in any language. We will concentrate on Perl, but the concepts we will study apply when using any language.
Using Perl has some advantages over some other languages (for writing CGI programs):