EIW Fall 2000 Lecture Notes - HTTP Cookies


Motivation

Many web-based systems provide the illusion that the user has "entered the system", as if a single program were running and responding to each mouse click. In reality we know that the user is simply clicking on embedded links or submitting forms, each of which results in the browser sending a request to a web server, which starts a new copy of a CGI program.

We've looked at a couple of ways to provide this illusion. The basic idea is to include a hidden field in each form (or a parameter in each embedded link) generated after the user submits a login form. This hidden field can be as simple as the user's name, or something more sophisticated like a session key.

There are limitations to the above mentioned approach:

It would make things much easier if some part of every HTTP request identified the user. We could consider asking the WWW Consortium to change the HTTP protocol so that a user name is sent with every HTTP request (but they would certainly say NO!). Even if we could change the HTTP protocol so that every browser is required to send a username along with each request, this would not work, since it would be easy to pretend to be someone else (all you need to know is their username).

A more general solution is the following:

Now our CGI program is in control of the exact nature of the string - it could be a username or something less predictable like a session key.

Additionally we can tell the browser that it should save the string so that even if the user's computer is turned off the browser can remember what string it should send to our CGI system.

HTTP Cookies

HTTP Cookies are basically the idea mentioned above - a web-based system (perhaps a CGI program) can ask the browser to remember some string and send it along with future requests.

The string is called (I love saying this) a Cookie! The name "cookie" has some history behind it - the name has been around before the WWW and implies something like "a chunk of stuff that means nothing to the client, but is required to complete a transaction". Other protocols have used things named cookies to provide secure transactions...

The cookies are transferred from a CGI program to the client as part of the HTTP response headers. Specifically, to tell a browser to save a cookie the server would send back a Set-Cookie header. Each Set-Cookie header includes a cookie name and value (there can actually be many cookies sent at once - each has a different name). Here is the simplest form of a Set-Cookie HTTP header line:

Set-Cookie: CookieName=CookieValue

For example, if we want the browser to save a cookie named UserName, and the user name that was submitted to us is Fred, the Set-Cookie header would look like this:

Set-Cookie: UserName=Fred

Once a CGI program sends a Set-Cookie header to the client, the client will (if cookies are enabled) always send the name and value back to our CGI program as part of the HTTP request headers. The HTTP Cookie request header looks like this:

Cookie: CookieName=CookieValue
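
To make the round trip concrete, here is a minimal sketch that builds both header lines as plain strings. The subroutine names set_cookie_line and cookie_line are hypothetical (they are not part of any course library), and the second one only imitates what a browser would do with the cookies it has stored:

```perl
# Hypothetical helpers that illustrate the two header lines.

# Build the response header a server sends to store one cookie.
sub set_cookie_line {
    my ($name, $value) = @_;
    return "Set-Cookie: $name=$value";
}

# Build the request header a browser would send back, given the
# cookies it has stored. Multiple pairs are joined with "; ".
sub cookie_line {
    my (%stored) = @_;
    my @pairs = map { "$_=$stored{$_}" } sort keys %stored;
    return "Cookie: " . join("; ", @pairs);
}

print set_cookie_line("UserName", "Fred"), "\n";
print cookie_line(UserName => "Fred"), "\n";
```

The names are sorted here only to make the output predictable; a real browser is free to choose its own order.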

We haven't yet discussed how our CGI programs can create Set-Cookie headers, or how to get at the HTTP request headers - we will get to that in a bit.

Cookie Options and Rules

There is one important general rule that browsers are supposed to follow when deciding whether or not to send a cookie along as part of an HTTP request:

Cookies should only be sent to the server they came from.

As we will see, this is actually an oversimplified statement of the rules that govern cookies, but it is worth keeping in mind. Let's quickly review what we now know about cookies:

There are a number of options that can control how a browser uses a cookie - these options are specified as part of the Set-Cookie HTTP response header. These options include:

A Set-Cookie header can include multiple options; a semicolon is used to separate them. For example, we could have the following:

Set-Cookie: Prefs=NoImages; path=/cgi-bin; expires=Wednesday, 31-Dec-2003 00:00:00 GMT; domain=.altavista.com
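
Typing a header like this by hand is error-prone, mostly because of the date. The following is a rough sketch of a helper that assembles the line from its parts. The names cookie_date and make_set_cookie are hypothetical, the expiration time is assumed to be given in seconds since the epoch, and the date layout (abbreviated weekday, comma, DD-Mon-YYYY) is the one commonly used for the expires option:

```perl
# Hypothetical helpers for building a Set-Cookie line with options.

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as a cookie expiration date (always in GMT).
sub cookie_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf("%s, %02d-%s-%04d %02d:%02d:%02d GMT",
                   $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
                   $hour, $min, $sec);
}

# Build the header line; path, expires and domain are all optional.
sub make_set_cookie {
    my ($name, $value, %opts) = @_;
    my @parts = ("$name=$value");
    push @parts, "path=$opts{path}" if defined $opts{path};
    push @parts, "expires=" . cookie_date($opts{expires})
        if defined $opts{expires};
    push @parts, "domain=$opts{domain}" if defined $opts{domain};
    return "Set-Cookie: " . join("; ", @parts);
}

# A cookie that should be kept for 30 days from now
print make_set_cookie("Prefs", "NoImages",
                      path    => "/cgi-bin",
                      expires => time + 30 * 24 * 60 * 60), "\n";
```

Computing the date from the current time also avoids a mismatch between the weekday name and the actual date.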

Common Cookie Usage (aside from eating)

Cookies are commonly used in the following applications:

Cookies and Privacy

As we have seen, cookies can't do any of the following:

However, there are some issues associated with the use of cookies that are worth thinking about:

CGI Programming and Cookies

Creating a Cookie

Creation of a cookie by a CGI program is simple - the CGI simply prints out the Set-Cookie header before the Content-type header as the first part of any response. For example, the following CGI program will create two cookies on the client, one named Color with the value Red and another cookie named BeenHere with the value YES:

#!/perl/bin/perl
#

# first send the set-cookie headers
print "Set-Cookie: Color=Red\n";
print "Set-Cookie: BeenHere=YES\n";

# now the content-type header and the blank line that
# ends the header section of the response.
print "Content-type: text/html\n\n";

# now send back some document content

print "<H2>I just gave you some cookies!</H2>\n";

NOTE: The Set-Cookie header(s) must come before the end of the headers (before the first blank line in the response)!

Getting Cookie Values

When a CGI program runs, we want to find out if any cookies were sent. It is possible that multiple cookies have been sent - we would like to get a list of the name/value pairs just like we get from the query string.

Part of the CGI protocol between the web server and the CGI program is that a string containing all cookie name/value pairs is put in the environment variable named HTTP_COOKIE. This string has the form:

name1=value1; name2=value2; name3=value3 ...

The web server does part of the work - it combines all Cookie header lines that come as part of the request and puts all the name/value pairs into an environment variable. Parsing this string and extracting the name/value pairs is similar to what we did with the query string - here is some code that will create an associative array containing all the cookie name/value pairs:

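# GetCookies subroutine
#
# This subroutine gets all cookies sent with the request and
# puts them into an associative array.
#
# Example usage:
#
#  %cookies = GetCookies();

sub GetCookies {
    my $cookie_string = $ENV{"HTTP_COOKIE"};
    my %cookies;

    # No cookies were sent with this request
    return %cookies unless defined $cookie_string;

    # The cookie string is a sequence of "name=value; name=value; ..."
    # Split on the semicolons and any whitespace that follows them.
    # The pattern must be written as a regex: inside a double-quoted
    # string "\s" does not mean whitespace.

    foreach my $pair (split(/;\s*/, $cookie_string)) {
	# split on the first "=" only, so a value may contain "="
	my ($name, $value) = split(/=/, $pair, 2);
	# skip empty pairs (for example from ";;")
	next unless defined $name;
	# add to the associative array
	$cookies{$name} = $value;
    }
    return(%cookies);
}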
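
For experimenting outside a web server, it can help to separate the parsing from the environment lookup. The sketch below is one way to do that; parse_cookie_string is a hypothetical name and is not assumed to exist in eiw-cgi.pl. It takes the cookie string as an argument, so it can be tried from the command line with a made-up string:

```perl
# Hypothetical variant: parse a cookie string passed as an argument
# instead of reading it from the HTTP_COOKIE environment variable.
sub parse_cookie_string {
    my ($string) = @_;
    my %cookies;
    return %cookies unless defined $string;

    foreach my $pair (split(/;\s*/, $string)) {
        # Limit the split to 2 fields so a value may contain "="
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined $name && length $name;
        $cookies{$name} = defined $value ? $value : "";
    }
    return %cookies;
}

# Try it with a made-up cookie string
my %c = parse_cookie_string("Color=Red; BeenHere=YES; Key=a=b");
foreach my $name (sort keys %c) {
    print "$name => $c{$name}\n";
}
```

In a CGI program the same subroutine could be called as parse_cookie_string($ENV{"HTTP_COOKIE"}).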

Using this subroutine is just like using the GetQuery subroutine. Here is an example CGI program that prints out all the cookies received (as an HTML table):

#!/perl/bin/perl

require "eiw-cgi.pl";

%cookies = GetCookies();

http_header();

print "<H2>Here are the cookies received</H2>\n";

print "<TABLE BORDER=1><TR><TH>Cookie Name</TH><TH>Value</TH></TR>\n";

foreach $i (keys %cookies) {
	print "<TR><TD>$i</TD><TD>$cookies{$i}</TD></TR>\n";
}

print "</TABLE>\n";
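
Reading and setting cookies can be combined in one program. As a rough illustration, the following sketch counts visits using a cookie named Visits; both the cookie name and the subroutine name visit_response are made up for this example. To keep it self-contained it picks the one value out of the cookie string directly instead of using the course library:

```perl
#!/perl/bin/perl

# Hypothetical example: count visits with a cookie named Visits.
# Returns the complete response (headers plus document) as a string.
sub visit_response {
    my ($cookie_string) = @_;
    $cookie_string = "" unless defined $cookie_string;

    # Look for "Visits=<digits>" among the cookies the browser sent
    my $visits = 0;
    if ($cookie_string =~ /(?:^|;\s*)Visits=(\d+)/) {
        $visits = $1;
    }
    $visits++;

    # Set-Cookie must come before the blank line that ends the headers
    return "Set-Cookie: Visits=$visits\n"
         . "Content-type: text/html\n\n"
         . "<H2>You have visited this page $visits time(s)</H2>\n";
}

print visit_response($ENV{"HTTP_COOKIE"});
```

Since no expires option is given, the browser would be expected to forget this cookie when it exits, so the count starts over in the next browser session.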

Looking at the cookies stored by your browser

You can find out what cookies your browser has stored by looking at:

Unless an expires cookie option specifies an expiration date, the browser will just store cookies in memory - so they won't show up in the above files (as far as I know).