EIW Fall 2000 Lecture Notes - Dynamic Documents


In the good old days...

Years ago the WWW was made up of (mostly) static documents. Each URL corresponded to a single file stored on some hard disk. HTML documents were created manually using simple text editors. Global network communication was constantly interrupted as the shifting of ice-age glaciers tore communication lines apart...

Today many (in fact most) of the documents on the WWW are built at request time. This means that the documents do not physically exist as a single file on some hard disk, but are instead created whenever requested. Typically much of the content comes from files, but the actual document is nothing more than the output of a program (that gets pieces of the document from various files and glues them together). A URL does not typically correspond to a single file.

When a program is responsible for creating a document on the fly, we call the result a dynamic document. Just about any web document that contains an advertisement is probably a dynamic document - the actual advertisement placed in the document is determined when you request the document (sometimes in a random fashion - sometimes in a very non-random fashion). Any document that depends on user action for some of the document content (for example a document that shows you your current shopping cart) is a dynamic document.

Dynamic Documents

Dynamic Documents can provide:
Automation of web site maintenance
    All documents in a web site are delivered through a filter that can add in a common header and footer or menu.
Customized advertising
    A web site can track your usage and attempt to match advertisements to your interests.
Database access
    The web can be used as an interface to a database retrieval system, so users can look up records from anywhere there is a browser.
Shopping carts
    Users can store some state information on the server - this information can then be used to provide shopping cart functionality (typically by automating the process of extracting records from some server database).
Date and time service
    Users can find out the current date and time!!!
High paying jobs for IT students
    No explanation needed.

Web Programming

Writing programs that create dynamic documents has become very important, so we will spend much of this course working on this skill. Although there are a variety of approaches/languages, the basic concepts are the same, so we will focus on these basic concepts. Below are listed a few of the popular mechanisms used for web programming:

Writing a Custom Server

Writing a custom server is outside the scope of this course, but it's worth considering what we would need to do. Typically a custom server is written using a library to provide access to TCP/IP services - sockets is probably the most often used library for this purpose. Assuming we know how to use the sockets library to deal with the network, we would need to write the following code:

An Example Custom Server

We want to provide a time and date service, so that anyone in the world can find out the date and time (according to our computer). Our program will talk HTTP so that browsers can understand the reply - this reply will include an HTML document that contains the current date and time.

NOTE: WWW based Time and Date service is Copyright DLH Enterprises, 1999,2000!
(This is my future dot com, keep your hands off!)

Since the general idea is that anyone in the world can find out the date and time, we don't care what is in the HTTP request - our reply doesn't depend on it.

We will make the assumption that whatever makes the request can deal with HTML, so we just generate a valid HTML document and send it back. We can now provide a somewhat detailed description of the program:

  1. Listen on a well known TCP port.
  2. Accept a connection.
  3. Find out the current time and date
  4. Convert time and date to a string
  5. Send back a valid HTTP response line followed by some HTTP headers (Content-Type is an important one!)
  6. Send the time/date string wrapped in HTML formatting.
  7. Close the connection.
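
The seven steps above can be sketched in Python using the standard socket library. This is only an illustration under some assumptions: the function names are made up for this example, the port 8180 is an arbitrary choice (on many systems ports below 1024 require administrator privileges), and the reply uses HTTP/1.0 so the connection can simply be closed afterwards.

```python
import socket
from datetime import datetime


def build_response(now: datetime) -> bytes:
    """Steps 4-6: format the time/date and wrap it in an HTTP response."""
    stamp = now.strftime("%I:%M:%S %p, %B %d, %Y")
    body = (
        "<HTML><HEAD><TITLE>Time and Date</TITLE></HEAD>"
        f"<BODY><P>It is now {stamp}</P></BODY></HTML>"
    ).encode("ascii")
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def handle_one(listener: socket.socket) -> None:
    """Steps 2-7 for a single client."""
    conn, _addr = listener.accept()                    # step 2
    with conn:                                         # step 7: closed on exit
        conn.recv(4096)                                # read the request, then ignore it
        conn.sendall(build_response(datetime.now()))   # steps 3-6


def serve(port: int = 8180) -> None:
    """Step 1: listen on a known port, then answer clients forever."""
    with socket.create_server(("", port)) as listener:
        while True:
            handle_one(listener)
```

Calling serve() and pointing a browser at http://localhost:8180/ should show the current time. Note that the request is read but never examined, since the reply does not depend on it.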

Accessing our custom server.

We can publish the URL to our server, or embed links to the server in other HTML documents. Since our server is not really an HTTP server, we will probably not run the server on port 80, but instead pick some other port number (URLs can include the port number of the server). We need to make sure the server is always running (on the published host and port). For example, if our server is running on the host www.timedate.com on port 180, a URL pointing to our server would look like this:

http://www.timedate.com:180/
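
To see how the port number fits into a URL, here is a small Python sketch that takes the example URL apart with the standard urllib.parse module. It only illustrates the URL structure; nothing is contacted over the network.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.timedate.com:180/")
print(parts.hostname)   # www.timedate.com
print(parts.port)       # 180
print(parts.path)       # /

# Without an explicit port the URL carries none, and the browser
# falls back to the default HTTP port (80).
default = urlsplit("http://www.timedate.com/")
print(default.port)     # None
```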

Once we are famous we can include advertisements and make money!


Another Money Making Scheme Example

We can extend our service by having our custom server keep a database of hits. We keep track of the number of times our server is accessed each day; this simple database will allow us to provide hit reports to folks who visit our site.
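
A very small in-memory version of such a hit database might look like the Python sketch below. The names are invented for this example, the counts are kept per hour as well as per day (an assumption that makes more detailed reports possible), and a real service would also need to save the counts to disk so they survive a restart.

```python
from collections import defaultdict
from datetime import date, datetime

# Maps a calendar date to 24 hourly counters (index 0 is the 12-1AM hour).
hits = defaultdict(lambda: [0] * 24)


def record_hit(when: datetime) -> None:
    """Count one request against the day and hour it arrived."""
    hits[when.date()][when.hour] += 1


def hits_for_day(day: date) -> int:
    """Total number of requests received on the given day."""
    return sum(hits[day]) if day in hits else 0
```

The server would call record_hit(datetime.now()) once for every connection it accepts.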

Since we now want to provide hit reports for any arbitrary date in the past, there must be some mechanism for having the user (browser) specify the desired date. Remember that the only requests that the browser can create are HTTP requests, so we must make sure that the specification of the date comes as a valid HTTP request.

HTTP Refresher: Each request line includes a method (typically the string "GET"), followed by a URI. A URI starts with a "/" and is followed by some words/names, and can have more "/"s separating components of the URI.

If we embed a hyperlink (using the <A> tag) in a document that will take the user to a hit report for a specific date, the browser will use GET as the request method - we can't change this. We can, however, specify any URI we want - all we need is some scheme that allows us to easily translate the URI into a date so we can look up the hits for that day in the database. The following format for URIs will do:

/mm/dd/yyyy

In such a URI we expect 2 digits specifying the month, another 2 for the day and 4 for the year (Y2K compliant!). An example URL for our service:

http://www.timedate.com:180/09/25/2000

We will expect an HTTP request that looks like this:

GET /01/17/1999 HTTP/1.1
User-Agent: Netscape 4.7
other headers...
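
Translating such a request into a date can be sketched in Python as follows. The function name is hypothetical, and the sketch only looks at the request line; anything that is not a GET for a /mm/dd/yyyy URI yields None so the server can fall back to the homepage or an error.

```python
import re
from datetime import date
from typing import Optional

# GET, then /mm/dd/yyyy, then the HTTP version.
REQUEST_LINE = re.compile(r"^GET /(\d{2})/(\d{2})/(\d{4}) HTTP/\d\.\d$")


def date_from_request(request: str) -> Optional[date]:
    """Return the date named by a /mm/dd/yyyy request, or None otherwise."""
    lines = request.splitlines()
    if not lines:
        return None
    match = REQUEST_LINE.match(lines[0])
    if match is None:
        return None
    month, day, year = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:   # right shape but no such day, e.g. /13/45/2000
        return None
```

For the request shown above, date_from_request returns the date January 17, 1999, which can then be used as the key for the database lookup.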

The Time/Date Homepage

We should figure out what our homepage will look like, since now we will need to show the current time and date, but also include some hyperlinks to allow the user to easily view the hit report for recent days. Below is one possible example of such a homepage - remember that this document needs to be built dynamically to include the current date and time.

<H2 style="text-align:center; color:black">TimeDate.com</H2>
<HR>

<DIV style="text-align:center;font-size:28pt;font-weight:bold;color:black">
<P>The current time is 
<SPAN style="font-family:monospace;font-size:32pt;color:red;">
02:33:14 PM
</SPAN></P>

<P>Today is 
<SPAN style="font-family:monospace;font-size:32pt;color:red">
September 25, 2000
</SPAN></P>
</DIV>

<HR>
Check out our hits!:

<UL>
<LI><A HREF="/09/25/2000">Today</A>
<LI><A HREF="/09/24/2000">Yesterday</A>
<LI><A HREF="/09/23/2000">The day before yesterday</A>
</UL>

TimeDate.com


The current time is 02:33:14 PM

Today is September 25, 2000


Check out our hits!:

Fancy means $$$

We want our hit report to provide a table that lists the number of hits received each hour of the day in question. Here is an example of what our program should generate:

<H2 style='text-align:center'>
timedate.com hit report for 09/24/2000
</H2>

<TABLE BORDER=2>
 <TR>
   <TH>Hour</TH>
   <TH># of Hits</TH>
 </TR>

 <TR>
   <TD>12-1AM</TD>
   <TD>8132</TD>
 </TR>

 <TR>
   <TD>1-2AM</TD>
   <TD>4873</TD>
 </TR>

 <TR>
   <TD>2-3AM</TD>
   <TD>17</TD>
 </TR>

   ... and the rest of the hours ...
 </TABLE>

timedate.com hit report for 09/24/2000

Hour      # of Hits
12-1AM    8132
1-2AM     4873
2-3AM     17
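
Generating this report from a list of 24 hourly counts could look like the Python sketch below. The function names are assumptions made for this example, and the AM/PM suffix in each label simply follows the start of the hour.

```python
from html import escape


def hour_label(hour: int) -> str:
    """Label for the hour starting at `hour` (0-23), e.g. 0 -> '12-1AM'."""
    start = hour % 12 or 12
    end = (hour + 1) % 12 or 12
    suffix = "AM" if hour < 12 else "PM"
    return f"{start}-{end}{suffix}"


def hit_report(day_text: str, hourly_hits: list) -> str:
    """Build the HTML hit report for one day from its hourly counts."""
    rows = "\n".join(
        f" <TR>\n   <TD>{hour_label(hour)}</TD>\n   <TD>{count}</TD>\n </TR>"
        for hour, count in enumerate(hourly_hits)
    )
    return (
        "<H2 style='text-align:center'>\n"
        f"timedate.com hit report for {escape(day_text)}\n"
        "</H2>\n\n"
        "<TABLE BORDER=2>\n"
        " <TR>\n   <TH>Hour</TH>\n   <TH># of Hits</TH>\n </TR>\n"
        f"{rows}\n"
        "</TABLE>\n"
    )
```

The date text is passed through escape() because it ultimately comes from the request, and anything taken from a request should not be copied into HTML unchecked.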

New Custom Server Code Required:

Drawbacks to Custom Server Approach

There are a number of problems with using custom servers for web programming. The central issue is the duplication of code/resources necessary if we have lots of custom services.

Another Approach - Smart Server

Take a general purpose Web server (that can handle static documents) and have it process requested documents as it sends them to the client. The documents could contain commands that the server understands (the server includes some kind of interpreter).

Example Smart Server

Have the server read each HTML file as it sends it to the client. The server could look for a tag named SERVERCODE and treat the stuff between the start and end tags as special instructions. If the server sees something like this:

blah blah blah, blah.
<SERVERCODE>some command</SERVERCODE>
blah, blah blah.

The server doesn't send the stuff in the SERVERCODE tag to the client; instead it interprets the command and sends the result to the client. Everything else is sent normally (without modification).

Here are some possible commands that the server might recognize and a description of what the server would generate in place of the commands.

SERVERCODE command      What the command does
Time                    The server replaces this command with the current time
Date                    The server replaces this command with the date
Hitlist                 The server replaces this command with a hit report
Include file            The server replaces this command with the contents of a file
Randomfile directory    The server replaces this command with the contents of one of the files in the named directory (folder)

If we had a server that could understand these commands then the home page for timedate.com might look like this:

<H1 ALIGN=CENTER>Welcome to timedate.com</H1>
<SERVERCODE> Include fancygraphic </SERVERCODE>

<P>The current time is :
<SERVERCODE> Time </SERVERCODE>.</P>

<P>Today is <SERVERCODE> Date </SERVERCODE>.</P>

<P>Visit our sponsor: <SERVERCODE> Randomfile sponsors </SERVERCODE> </P>
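
The interpreter inside such a smart server can be sketched in a few lines of Python. Remember that SERVERCODE is a made-up tag for these notes, so everything here is hypothetical; only the Time, Date and Include commands are handled, and unknown commands are replaced with nothing.

```python
import re
from datetime import datetime

# Matches one SERVERCODE element and captures the command inside it.
SERVERCODE = re.compile(r"<SERVERCODE>(.*?)</SERVERCODE>", re.DOTALL)


def run_command(command: str, now: datetime) -> str:
    """Return the text that replaces one SERVERCODE command."""
    words = command.split()
    if words == ["Time"]:
        return now.strftime("%I:%M:%S %p")
    if words == ["Date"]:
        return now.strftime("%B %d, %Y")
    if len(words) == 2 and words[0] == "Include":
        with open(words[1], encoding="utf-8") as included:
            return included.read()
    return ""   # unknown commands produce no output in this sketch


def process(document: str, now: datetime) -> str:
    """Replace every SERVERCODE element; all other text passes through."""
    return SERVERCODE.sub(lambda m: run_command(m.group(1), now), document)
```

The server would call process() on each HTML file just before sending it. A real implementation would also have to restrict which files Include may read.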


Real Life Smart Servers - Server Side Includes

Many real web servers support this idea (but not the syntax I've shown). Server Side Includes (SSI) defines a set of commands that a server will interpret when sending an HTML document to a client. Typically the server is configured to look for commands only in specially marked documents (so normal documents aren't slowed down).

The commands supported by SSI are called SSI directives. Each directive is embedded inside an HTML comment. Comments are regions within an HTML document between a <!-- and a -->. Here is an example HTML document containing a comment:

<P>This is a normal HTML paragraph that will be displayed by
a browser.</P>

<!-- this is an HTML comment -->

<P>Comments don't need to be on a single line</P>

<!-- I
am
a 
comment
-->

An SSI directive looks like this:

<!--#command parameter="arg"-->

As you can see, an SSI directive is a special kind of HTML comment. This helps in those situations when for some reason the server does not interpret the directive but instead sends it to the browser unchanged. The browser won't display the directive since it's inside an HTML comment. If you load an HTML document directly into your browser instead of going through a server, any SSI directives in the document will not be processed!

Not all HTTP servers have the capability of interpreting SSI directives, although most of the popular servers can be configured to do so (Apache, Microsoft and Netscape servers do handle SSI).

Listed below are some of the SSI directives. Some of these directives can be quite dangerous (in terms of security) and are not supported by all servers.

Some SSI Directives

Directive: echo
Parameters: var
What it does: inserts the value of an environment variable into the page. SSI servers keep a number of useful things in environment variables, including DOCUMENT_NAME and DOCUMENT_URI.
Sample Usage: This page is located at <!--#echo var="DOCUMENT_URI"-->

Directive: include
Parameters: file
What it does: inserts the contents of a file.
Sample Usage: <!--#include file="banner.html"-->

Directive: flastmod
Parameters: file
What it does: inserts the time and date that a file was last modified.
Sample Usage: Last modified <!--#flastmod file="foo.html"-->

Directive: exec
Parameters: cmd
What it does: runs an external program and inserts the output of the program.
Sample Usage: Current users: <!--#exec cmd="/usr/bin/who"-->
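
To make the directive syntax concrete, the Python sketch below picks the SSI directives out of a document and splits each into its command and parameters. It only illustrates the syntax; it is not how any particular server is implemented, and the function name is invented for this example.

```python
import re

# A directive is a comment whose text starts with "#command".
DIRECTIVE = re.compile(r"<!--#(\w+)\s+(.*?)\s*-->", re.DOTALL)
# Each parameter has the form name="value".
PARAMETER = re.compile(r'(\w+)="([^"]*)"')


def find_directives(document: str) -> list:
    """Return (command, {parameter: value}) for each SSI directive found."""
    return [
        (match.group(1), dict(PARAMETER.findall(match.group(2))))
        for match in DIRECTIVE.finditer(document)
    ]
```

Ordinary comments are skipped because they do not start with "<!--#", which is exactly the property that lets a browser ignore directives the server did not process.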

SSI Example document

<!--#INCLUDE FILE="header"-->

It is now:
<!--#config timefmt="%I:%M %p (%Z)"-->
<!--#echo var="DATE_LOCAL"-->
<BR>
Today is: 
<!--#config timefmt="%A, %B %e, %Y"-->
<!--#echo var="DATE_LOCAL"--><BR>

<!--#INCLUDE FILE="footer"-->

<!--#config timefmt="0"-->
This file last modified
<!--#echo var="LAST_MODIFIED"-->

More Power

Some servers support elaborate scripting languages. Scripts are embedded in HTML documents, and the server interprets the script while sending the document to the client. Below are listed some of the more popular scripting languages:

Server Mapping and APIs

Some servers include a programming interface that allows us to extend the capabilities of the server by writing modules. Specific URLs are mapped to specific modules instead of to files.

Example: We could write our timedate.com server as a module and merge it with the web server.
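
The mapping idea can be sketched in Python as a table from URL prefixes to handler functions. All of the names here are hypothetical; real server APIs differ from server to server and are usually far more elaborate.

```python
from datetime import datetime
from typing import Callable, Dict

Handler = Callable[[str], str]

# The server's mapping table: URL prefix -> module that handles it.
modules: Dict[str, Handler] = {}


def mount(prefix: str, handler: Handler) -> None:
    """Register a module for every URL that starts with `prefix`."""
    modules[prefix] = handler


def dispatch(path: str) -> str:
    """Pick the module with the longest matching prefix; fall back to files."""
    for prefix in sorted(modules, key=len, reverse=True):
        if path.startswith(prefix):
            return modules[prefix](path)
    return f"(serve the static file for {path})"


def timedate_module(path: str) -> str:
    """Our time service, written as a module instead of a whole server."""
    return "<P>It is now " + datetime.now().strftime("%I:%M:%S %p") + "</P>"


mount("/timedate", timedate_module)
```

With this table in place, a request for /timedate/ is answered by our module, while a request for /index.html is still handled as an ordinary file.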

External Programs

Another approach is to provide a standard interface between external programs and web servers. The general idea is that by specifying a standard interface, we can develop external programs that can be run from any web server. The web server handles all the HTTP; we focus on the special service only.

Advantages to using external programs:

CGI: Common Gateway Interface

CGI is a standard interface to external programs supported by most (if not all) web servers. The interface that is defined by CGI includes:

CGI Programming

CGI programs are often written in scripting languages (Perl, Tcl), although you can write a CGI program in any language. We will concentrate on Perl, but the concepts we will study apply when using any language.
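
To give a feel for what a CGI program looks like, here is a minimal sketch, written in Python rather than Perl since the concepts carry over to any language. Under CGI the server passes information about the request in environment variables (PATH_INFO is one of them), and the program writes its headers, a blank line, and then the document to standard output. The function name is an assumption made for this example.

```python
import os
import sys
from datetime import datetime
from html import escape


def cgi_reply(environ, now: datetime) -> str:
    """Build the CGI output: headers, a blank line, then the document."""
    path_info = environ.get("PATH_INFO", "")
    return (
        "Content-Type: text/html\r\n"
        "\r\n"
        "<HTML><BODY>\n"
        f"<P>It is now {now.strftime('%I:%M:%S %p')}</P>\n"
        f"<P>Extra path requested: {escape(path_info) or '(none)'}</P>\n"
        "</BODY></HTML>\n"
    )


# The web server runs the program and sends whatever it prints to the browser.
sys.stdout.write(cgi_reply(os.environ, datetime.now()))
```

Notice that the program never touches a socket or parses an HTTP request; the web server has already done that work.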

Using Perl has some advantages over some other languages (for writing CGI programs):