CSCI.4220 Network Programming
Class 12, Thursday, March 3, 2005
CGI and Cookies

CGI (Common Gateway Interface)

HTTP was originally designed for "static documents".

The GET command handles this case: the client names a document and the server returns it unchanged.

Many (most) documents are now interactive (dynamic content).

The client sends info to the server, and the server tailors a document to return to the client (for example, Google or MapQuest).

The simplest way for the client to send data to a server is with the form tag of HTML:

<form action=  method=>
First name:
<input type="text" name="firstname">

<input type="radio" name="sex" value="male"> Male
<input type="radio" name="sex" value="female"> Female
</form>

The value of action has to be the URL of a script. A CGI script can be written in any language; these days CGI scripts are mostly written in Perl.

If the value of method is POST, the data from the form are sent after the HTTP headers, and the CGI script reads the data from standard input. If the value of method is GET, the input is appended to the URL on the GET line (after a question mark), and the CGI script reads it as the value of the environment variable QUERY_STRING.
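The two input paths can be sketched as follows. This is a minimal illustration in Python, not the course's generic.cgi; the function name read_form_data is hypothetical, and REQUEST_METHOD and CONTENT_LENGTH are standard CGI environment variables that the notes above do not mention. A real script would pass os.environ and sys.stdin; here a dictionary and an in-memory stream stand in for them.

```python
import io

def read_form_data(environ, stdin):
    """Return the raw, still-encoded form string for one CGI request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the data follow the HTTP headers and arrive on standard input.
        # CONTENT_LENGTH tells the script how much to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # GET: the server copies everything after '?' into QUERY_STRING.
    return environ.get("QUERY_STRING", "")

# Simulated GET request.
get_data = read_form_data(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "firstname=Ann&sex=female"},
    io.StringIO(""),
)

# Simulated POST request carrying the same 24-character body.
post_data = read_form_data(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "24"},
    io.StringIO("firstname=Ann&sex=female"),
)
```

Either way the script ends up with the same encoded string, so the rest of the script does not need to care which method the form used.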

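The input consists of a single string with no white space, made up of name=value pairs separated by ampersands. The string is encoded: spaces are converted to pluses, and other special characters are converted to a percent sign followed by two hexadecimal digits.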
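To see what the encoded string looks like, the standard Python library can perform the same encoding a browser applies to form data. The field names and values below are made up for illustration.

```python
from urllib.parse import urlencode, unquote_plus

# Hypothetical form fields; the values contain a space and a comma.
fields = {"firstname": "Mary Ann", "city": "Troy, NY"}

# urlencode joins name=value pairs with '&', turns spaces into '+',
# and turns other special characters into %XX escapes.
encoded = urlencode(fields)

# unquote_plus reverses the encoding for a single name or value.
decoded = unquote_plus("Troy%2C+NY")
```

Here encoded is the string firstname=Mary+Ann&city=Troy%2C+NY, and decoded is the original value Troy, NY.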

Standard output is redirected back to the browser, so your script has to write the appropriate HTTP headers and HTML.

At least one HTTP header field is required, and it must be followed by a blank line to mark the end of the headers.

Content-type: text/html
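A minimal sketch of the output side, in Python: the header line, the blank line, then the HTML. The function name write_response and the page content are hypothetical; a real script would write to sys.stdout, while this sketch writes to an in-memory buffer so the result can be inspected.

```python
import html
import io

def write_response(out, name):
    """Write a complete CGI reply: one header, a blank line, then HTML."""
    out.write("Content-type: text/html\n")
    out.write("\n")  # the blank line ends the headers
    # Escape user input before placing it in the page.
    out.write("<html><body><p>Hello, %s</p></body></html>\n" % html.escape(name))

buf = io.StringIO()
write_response(buf, "Ann & Bob")
page = buf.getvalue()
```

If the blank line is left out, the server cannot tell where the headers stop and the document begins.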

When the user submits the form, your script receives the form data as a set of name-value pairs. The names are what you defined in the INPUT tags (or SELECT or TEXTAREA tags), and the values are whatever the user typed in or selected.

This set of name-value pairs is given to you as one long string, which you need to parse. It's not very complicated, and there are plenty of existing routines to do it for you.
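The parsing can be sketched in a few lines of Python. The function name parse_form is hypothetical and the sketch keeps only the last value for a repeated name; the existing routine urllib.parse.parse_qs in the standard library does the same job more thoroughly.

```python
from urllib.parse import unquote_plus

def parse_form(data):
    """Split 'a=1&b=2' into a dictionary, decoding '+' and %XX escapes."""
    pairs = {}
    for item in data.split("&"):
        if not item:
            continue  # skip empty pieces such as a trailing '&'
        name, _, value = item.partition("=")
        pairs[unquote_plus(name)] = unquote_plus(value)
    return pairs

form = parse_form("firstname=Mary+Ann&sex=female")
```

After the call, form["firstname"] is "Mary Ann" and form["sex"] is "female".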

Here is a form. You can see how it is displayed and look at the source code. You should also submit the data to see what is returned.

Here is the C source code for generic.cgi

Cookies

HTTP is a stateless protocol, but vendors need to keep some sort of state information (e.g., shopping cart info). Most major sites use cookies.

Cookie technology has four components

  1. A cookie header line in the HTTP response message
  2. A cookie header line in the HTTP request message
  3. A cookie file kept on the end user's system and managed by the user's browser
  4. A backend database at the server

IP addresses don't work as well for identifying a user because of NAT, DHCP, etc.: many users can share one address, and one user's address can change.

A user contacts a commercial web site for the first time. The web site creates a unique id for her and creates an entry in the database. Its initial reply includes the header

Set-cookie: ID=123456

Her browser creates a new line in the client cookie file.

Each subsequent request to the same site contains this header

Cookie: ID=123456
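The exchange above can be simulated with a small sketch of the server side. The names handle_request and database are hypothetical, the "database" is just a dictionary, and the cookie parsing handles only the single ID=... form used in these notes.

```python
import itertools

_next_id = itertools.count(123456)   # source of unique ids
database = {}                        # id -> per-user record (backend database)

def handle_request(request_headers):
    """Return (user id, extra response headers) for one request."""
    cookie = request_headers.get("Cookie", "")
    if cookie.startswith("ID="):
        user_id = cookie[len("ID="):]
        if user_id in database:
            # Returning visitor: the browser sent the cookie back.
            database[user_id]["visits"] += 1
            return user_id, []
    # First visit: create an id, record it, and ask the browser to store it.
    user_id = str(next(_next_id))
    database[user_id] = {"visits": 1}
    return user_id, [("Set-cookie", "ID=" + user_id)]

first_id, first_headers = handle_request({})
second_id, second_headers = handle_request({"Cookie": "ID=" + first_id})
```

The first reply carries the Set-cookie header; the second request is recognized from its Cookie header, so no new id is issued.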

If the user returns to the site a week later, the browser will continue to send the Cookie header. This allows the server to recommend merchandise, provide one-click shopping, or otherwise customize the page.

Cookies can also be used by portal designers to see how many visitors go to which pages and where they come from, and to count distinct visitors rather than simply hits.
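The difference between hits and distinct visitors is easy to see with a made-up request log. Each entry below is a hypothetical (cookie id, page) pair; the ids and page names are invented for illustration.

```python
from collections import Counter

# Hypothetical log: one entry per request, keyed by the cookie id it carried.
request_log = [
    ("123456", "/index.html"),
    ("123456", "/cart.html"),
    ("654321", "/index.html"),
]

hits = len(request_log)                                    # every request counts
distinct_visitors = len({cid for cid, _ in request_log})   # each id counts once
hits_per_page = Counter(page for _, page in request_log)   # visits to each page
```

This log has three hits but only two distinct visitors, and /index.html was requested twice.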

A cookie can contain up to five fields

Domain (www.yahoo.com)
Path (/)
Content (UserID=123456;team=jets)
Expires (28-2-05 23:59)
Secure (yes or no)

If the expires field is absent, the cookie expires when the browser exits (a non-persistent cookie).
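As an illustration, Python's standard http.cookies module can build a header carrying these fields. The values are taken from the list above; the expiry date is written in the HTTP date format rather than the shorthand used in the list, and the content is reduced to a single UserID pair.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["UserID"] = "123456"                        # Content
cookie["UserID"]["domain"] = "www.yahoo.com"       # Domain
cookie["UserID"]["path"] = "/"                     # Path
cookie["UserID"]["expires"] = "Mon, 28 Feb 2005 23:59:00 GMT"  # Expires
cookie["UserID"]["secure"] = True                  # Secure: send only over HTTPS

# One Set-Cookie header line containing all the fields that were set.
header = cookie.output()

# A non-persistent cookie simply omits the expires field.
session = SimpleCookie()
session["UserID"] = "123456"
session_header = session.output()
```

Printing header shows a single Set-Cookie line with the content followed by the attributes, separated by semicolons.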

The shopping cart info can be stored in the cookie itself (a list of things bought).

The problem with cookies

Cookies cannot contain viruses, they cannot erase stuff on your hard drive, and they cannot read files on your hard drive. However, they are stored and used without the user's consent or knowledge.

Moreover, cookies allow a company to track not only visitors to its own site, but also visitors to other sites. The technology for this is sometimes called a Web Beacon or a tracking bug. The company which is best known for this is DoubleClick.

Other commercial sites are DoubleClick enabled. When you go to such a site, the page contains an invisible image which causes your browser to send a request to DoubleClick along with the DoubleClick cookie. Now DoubleClick is able to track your web browsing on many sites, and this information about your shopping preferences is shared among DoubleClick customers. All of this is invisible to the user.
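The tracking mechanism can be sketched from the tracker's point of view. This is a simplified model, not a description of DoubleClick's actual system: the function name beacon_request and the example.com style URLs are hypothetical, and the sketch assumes the tracker learns which page embedded the image from the request's Referer header.

```python
from collections import defaultdict

# Tracker-side record: cookie id -> pages on which the invisible image was loaded.
profiles = defaultdict(list)

def beacon_request(cookie_id, referer):
    """Handle one request for the invisible image."""
    profiles[cookie_id].append(referer)

# The same browser (same tracker cookie) visits two unrelated sites.
beacon_request("123456", "http://shop-a.example/cameras")
beacon_request("123456", "http://news-b.example/sports")
# A different browser visits only the first site.
beacon_request("654321", "http://shop-a.example/cameras")

sites_seen = profiles["123456"]
```

Because the cookie belongs to the tracker rather than to either site, the two visits by browser 123456 end up in one profile even though the sites have nothing to do with each other.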

If you and I log into the same web site and we have never been there before, we might see different ads. I could see cameras, you could see sports paraphernalia.