EIW Fall 2000 Lecture Notes - HTTP Cookies
Many web-based systems provide the illusion that the user has "entered the system", as if there were a single program running that responds to each mouse click. In reality we know that the user is simply clicking on embedded links or submitting forms, each of which results in the browser sending a request to a web server, which starts a new copy of a CGI program.
We've looked at a couple of ways to provide this illusion - the basic idea is to include a hidden field in the forms (or a parameter in the embedded links) that are created when the user submits a login form. This hidden field can be as simple as the user's name, or something more sophisticated like a session key.
There are limitations to the above mentioned approach:
It would make things much easier if there were some part of every HTTP request that identified the user. We could consider asking the WWW consortium to change the HTTP protocol so that a user name is sent with every HTTP request (but they would certainly say NO!). Even if we could change the HTTP protocol so that every browser is required to send a username along with each request, this would not work, since it would be easy to pretend to be someone else (all you need to know is their username).
A more general solution is the following:
Now our CGI program is in control of the exact nature of the string - it could be a username or something less predictable like a session key.
Additionally we can tell the browser that it should save the string so that even if the user's computer is turned off the browser can remember what string it should send to our CGI system.
HTTP Cookies are basically the idea mentioned above - a web-based system (perhaps a CGI program) can ask the browser to remember some string and send it along with future requests.
The string is called (I love saying this) a Cookie! The name "cookie" has some history behind it - the name has been around since before the WWW and implies something like "a chunk of stuff that means nothing to the client, but is required to complete a transaction". Other protocols have used things named cookies to provide secure transactions...
The cookies are transferred from a CGI program to the client as part
of the HTTP response headers. Specifically, to tell a browser to save
a cookie the server would send back a Set-Cookie
header. Each Set-Cookie header includes a cookie name and
value (there can actually be many cookies sent at once - each has a
different name). Here is the simplest form of a
Set-Cookie HTTP header line:
Set-Cookie: CookieName=CookieValue
For example, if we want to tell the browser to send back a cookie named
UserName, and the user name it has sent us is Fred,
the Set-Cookie header would look like this:
Set-Cookie: UserName=Fred
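As a small illustration of the simplest header form, here is a sketch in Python (the language is an assumption made for these examples, and the helper name simple_set_cookie is hypothetical). It uses the SimpleCookie class from Python's standard http.cookies module to produce the header line.

```python
from http.cookies import SimpleCookie


def simple_set_cookie(name: str, value: str) -> str:
    """Return a Set-Cookie header line in its simplest name=value form."""
    cookie = SimpleCookie()
    cookie[name] = value
    # output() renders the stored cookie as a "Set-Cookie: ..." header line
    return cookie.output()


if __name__ == "__main__":
    print(simple_set_cookie("UserName", "Fred"))
```

The same line could just as well be produced with plain string formatting; the standard module is used here only because it also handles quoting of unusual values.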
Once a CGI program sends a Set-Cookie header to the client,
the client will (if cookies are enabled) always send the name and value
back to our CGI program as part of the HTTP request headers. The HTTP
Cookie request header looks like this:
Cookie: CookieName=CookieValue
We haven't yet discussed how our CGI programs can create Set-Cookie headers, or how to get at the HTTP request headers - we will get to that
in a bit.
There is one important general rule that browsers are supposed to follow when deciding whether or not to send a cookie along as part of an HTTP request:
As we will see - this is actually an oversimplified statement of the rules that govern cookies, but it is worth keeping in mind. Let's quickly review what we now know about cookies:
There are a number of options that can control how a browser uses a
cookie - these options are specified as part of the
Set-Cookie HTTP response header. These options include:
expires
If no expires option is specified the default lifetime is "until the browser is
closed". This means that by default a cookie lives as long as the
current browser "session". Once you close Netscape or IE the cookie will
be gone.
The expires option includes a date and time in the following form:
Weekday, Day-Month-Year Hour:Minute:Second GMT
Here is an example of a Set-Cookie response header
that includes the expires option:
Set-Cookie: Homework=Messageboard; expires=Tuesday, 09-Nov-1999 00:00:00 GMT
The format for the date and time is very specific - it must be in the above form!
If the expiration date specified in a Set-Cookie header is
in the past - the result is that the cookie expires on the client and
is deleted. A server can delete a cookie by sending a new Set-Cookie header with the cookie name and an expiration date in the past.
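Because the date format is so strict, it can help to generate it in code rather than type it by hand. The following Python sketch formats a date in the form described above and builds a header that deletes a cookie by using a date in the past. The function names, the placeholder value "deleted", and the choice of 1 January 1990 as the past date are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Name tables are spelled out so the result does not depend on the locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_expires(when: datetime) -> str:
    """Format as: Weekday, Day-Month-Year Hour:Minute:Second GMT."""
    utc = when.astimezone(timezone.utc)
    return "%s, %02d-%s-%04d %02d:%02d:%02d GMT" % (
        WEEKDAYS[utc.weekday()], utc.day, MONTHS[utc.month - 1],
        utc.year, utc.hour, utc.minute, utc.second)


def delete_cookie_header(name: str) -> str:
    """Build a Set-Cookie header whose expiration date is in the past."""
    past = datetime(1990, 1, 1, tzinfo=timezone.utc)
    return "Set-Cookie: %s=deleted; expires=%s" % (name, format_expires(past))


if __name__ == "__main__":
    print(format_expires(datetime(1999, 11, 9, tzinfo=timezone.utc)))
    print(delete_cookie_header("Homework"))
```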
path
The path option can be used to specify the set of
URIs to which the cookie should be sent. URIs are treated as a path
containing a number of directories and a file name; the path
option allows the server to specify that any URI within a path should
receive the cookie.
The path / is the most general path - so it can be used to
tell the browser to send the cookie along as part of any request to the
server host. Here is an example that tells the browser to send the cookie
named "Preferences" with value "NoFrames" to any URI on this server:
Set-Cookie: Preferences=NoFrames; path=/
For the following example assume that the CGI program that creates the
cookie has the URI: /cgi-bin/pizza/pizza.cgi. This cookie
will be sent to any URI that is in the pizza directory:
Set-Cookie: Pizza=CookieCrumb; path=/cgi-bin/pizza
Note that the path option can only be used to set a path that contains the current URI, so the pizza program could not set the path to this:
Set-Cookie: Pizza=CookieCrumb; path=/cgi-bin/messageboard
If this was actually sent the browser would ignore the path option (it won't let the CGI specify a path that does not contain the CGI itself).
If there is no path option, the client will send the cookie
to any URI in the same directory as the CGI that created the cookie.
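The path rules can be sketched as a few small Python functions. This is only an approximation of what a browser does: the function names are hypothetical, and matching on whole directory names (so that /cgi-bin/pizza does not match /cgi-bin/pizzaria) is an assumption of this sketch rather than something stated in these notes.

```python
def path_matches(cookie_path: str, request_path: str) -> bool:
    """Return True if request_path lies within cookie_path."""
    if cookie_path == "/":
        return True  # the most general path matches every URI
    prefix = cookie_path.rstrip("/")
    return request_path == prefix or request_path.startswith(prefix + "/")


def default_path(cgi_uri: str) -> str:
    """Directory of the CGI that created the cookie (used when no path is given)."""
    return cgi_uri.rsplit("/", 1)[0] or "/"


def path_allowed(cookie_path: str, cgi_uri: str) -> bool:
    """A CGI may only set a path that contains its own URI."""
    return path_matches(cookie_path, cgi_uri)


if __name__ == "__main__":
    cgi = "/cgi-bin/pizza/pizza.cgi"
    print(path_allowed("/cgi-bin/pizza", cgi))         # path contains the CGI
    print(path_allowed("/cgi-bin/messageboard", cgi))  # path does not contain it
    print(default_path(cgi))
```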
domain
The domain option allows the server to specify a
domain name (remember the "Domain Name System" and that a
domain is a naming hierarchy?) that may include multiple hostnames
running web servers. In this case the client will send the cookie to
any server within the domain. This allows sharing of cookies among a
related set of web servers (related by hostname - all the hostnames
have to have the same ending). For example - we could tell the browser
to send the cookie to any server in the domain
".cs.rpi.edu" and the cookie would be sent to
www.cs.rpi.edu or to cgi.cs.rpi.edu or to
eiw.cs.rpi.edu, ...
Here is a Set-Cookie header that includes a domain option. This
example tells the browser to send the cookie to any server in the domain
yahoo.com:
Set-Cookie: UserName=DaveHollinger; domain=.yahoo.com
Notice the name of the domain is ".yahoo.com" and not
"yahoo.com". The domain must start with a "."
and must have at least one more "." in it (or the domain
option is invalid and ignored by the browser).
NOTE: The path option works together with the
domain option - both restrictions are in place at all
times.
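The domain rules above can be sketched in Python as follows. The function names are hypothetical, and comparing hostnames without regard to case is an assumption of this sketch; a real browser applies additional checks that are not covered here.

```python
def valid_cookie_domain(domain: str) -> bool:
    """The domain must start with '.' and contain at least one more '.'."""
    return domain.startswith(".") and "." in domain[1:]


def domain_matches(cookie_domain: str, host: str) -> bool:
    """Return True if the host name ends with a valid cookie domain."""
    if not valid_cookie_domain(cookie_domain):
        return False  # an invalid domain option is ignored
    return host.lower().endswith(cookie_domain.lower())


if __name__ == "__main__":
    for host in ("www.cs.rpi.edu", "cgi.cs.rpi.edu", "www.rpi.edu"):
        print(host, domain_matches(".cs.rpi.edu", host))
```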
A Set-Cookie header can include multiple options; a
semicolon is used to separate them. For example we could have the
following:
Set-Cookie: Prefs=NoImages; path=/cgi-bin; expires=Wednesday, 31-Dec-2003 00:00:00 GMT; domain=.altavista.com
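Joining several options with semicolons is easy to get wrong by hand, so a small helper can do it. This Python sketch uses a hypothetical function name, build_set_cookie, and emits the options in a fixed order chosen for this example; these notes do not require any particular order.

```python
def build_set_cookie(name: str, value: str, **options: str) -> str:
    """Build a Set-Cookie header line with optional path/expires/domain."""
    parts = ["%s=%s" % (name, value)]
    # Only the three options discussed in these notes are handled here.
    for key in ("path", "expires", "domain"):
        if key in options:
            parts.append("%s=%s" % (key, options[key]))
    return "Set-Cookie: " + "; ".join(parts)


if __name__ == "__main__":
    print(build_set_cookie("Preferences", "NoFrames", path="/"))
    print(build_set_cookie("UserName", "DaveHollinger",
                           path="/", domain=".yahoo.com"))
```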
Cookies are commonly used in the following applications:
Amazon.com does this and tries to suggest books you might be interested in based on what books you've purchased in the past.
Some web search engines keep track of the number of hits you want to see, or whether you want verbose listings or concise listings.
Yahoo gives you this option when you log in with your Yahoo login name. On your next visit you can post messages, chat, check your calendar, check your favorite stock quotes, read personalized news, etc. without logging in.
Creation of a cookie by a CGI program is simple - the CGI simply
prints out the Set-Cookie header before the
Content-type header as the first part of any response.
For example, the following CGI program will create two cookies on the
client, one named Color with the value Red and
another cookie named BeenHere with the value YES:
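A minimal sketch of such a program, written in Python (an assumption made for these examples), might look like the following. It prints two Set-Cookie headers, then the Content-type header, then the blank line that ends the headers, then a small HTML body. The function name cgi_response and the body text are made up for illustration.

```python
import sys


def cgi_response() -> str:
    """Return a complete CGI response that sets two cookies."""
    headers = [
        "Set-Cookie: Color=Red",
        "Set-Cookie: BeenHere=YES",
        "Content-type: text/html",
    ]
    body = "<html><body><p>Two cookies were sent.</p></body></html>"
    # A blank line separates the headers from the body.
    return "\n".join(headers) + "\n\n" + body + "\n"


if __name__ == "__main__":
    sys.stdout.write(cgi_response())
```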
NOTE: The Set-Cookie header(s) must come before the
end of the headers (before the first blank line in the response)!
Part of the CGI protocol between the web server and the CGI program is
that a string containing all cookie name/value pairs is put in the
environment variable named HTTP_COOKIE. This string has the
form:
name1=value1; name2=value2; name3=value3 ...
The web server does part of the work - it combines all cookie header lines that come as part of the request and puts all the name/value pairs into an environment variable. Parsing this string and extracting the name/value pairs is similar to what we did with the query string - here is some code that will create an associative array containing all the cookie name/value pairs:
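One way to do this parsing, sketched in Python (where the associative array is a dict), is shown below. The function name get_cookies is hypothetical, and the sketch makes no attempt to decode quoted or escaped values.

```python
import os
from typing import Dict, Optional


def get_cookies(cookie_string: Optional[str] = None) -> Dict[str, str]:
    """Parse 'name1=value1; name2=value2' into a dict of name/value pairs."""
    if cookie_string is None:
        # The web server places the request's cookies in HTTP_COOKIE.
        cookie_string = os.environ.get("HTTP_COOKIE", "")
    cookies = {}
    for pair in cookie_string.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue  # skip empty or malformed pieces
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    print(get_cookies("name1=value1; name2=value2; name3=value3"))
```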
Using this subroutine is just like using the GetQuery subroutine. Here is an example CGI program that prints out all the cookies received (as an HTML table):
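A sketch of such a program in Python follows. So that it can run on its own it includes a compact cookie parser; the function names parse_cookies, cookie_table and cgi_page are hypothetical. Values are passed through html.escape so that a cookie containing HTML characters cannot break the page.

```python
import html
import os
from typing import Dict


def parse_cookies(cookie_string: str) -> Dict[str, str]:
    """Split 'name=value; name=value' into a dict."""
    cookies = {}
    for pair in cookie_string.split(";"):
        if "=" in pair:
            name, value = pair.strip().split("=", 1)
            cookies[name.strip()] = value.strip()
    return cookies


def cookie_table(cookies: Dict[str, str]) -> str:
    """Render the cookies as an HTML table, one row per cookie."""
    rows = ["<tr><th>Name</th><th>Value</th></tr>"]
    for name, value in cookies.items():
        rows.append("<tr><td>%s</td><td>%s</td></tr>"
                    % (html.escape(name), html.escape(value)))
    return '<table border="1">\n' + "\n".join(rows) + "\n</table>"


def cgi_page(cookie_string: str) -> str:
    """Return the full CGI response: headers, blank line, HTML body."""
    table = cookie_table(parse_cookies(cookie_string))
    return ("Content-type: text/html\n\n"
            "<html><body>\n" + table + "\n</body></html>\n")


if __name__ == "__main__":
    print(cgi_page(os.environ.get("HTTP_COOKIE", "")), end="")
```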
C:\Program Files\Netscape\Navigator\cookies.txt
C:\Windows\Cookies
Unless the expires cookie option specifies an expiration date, the
browser will just store cookies in memory - so they won't show up in the
above files (as far as I know).