CGI stands for Common Gateway Interface, a standard mechanism used by many Web servers to support the creation of dynamic documents by external programs.
There are many issues involved with the creation of CGI programs.
Many Web servers map a request for a resource that starts with /~yourname to a special
place within the home directory for the user whose name is
yourname. Typically the special place is a directory named
public.html or public_html. If you want a
file (or CGI program, or image, or whatever) to be available through
your Web server, you have to put the file in this special place.
Most Web servers are configured so that when a request maps to a directory name and that directory contains a file named
index.html, the contents of the file are sent to the browser.
To create your own home page, create the file ~/public.html/index.html, put some HTML
"stuff" in it, and make sure that the file is readable by anyone and that anyone can
search the directory public.html (and your home directory):
> cd ~
> mkdir public.html
> emacs public.html/index.html
> chmod go+x ~
> chmod go+rx public.html
> chmod go+r public.html/index.html
Now anyone in the world can view this home page by requesting the document at
http://yourmachine/~username/
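To make the mapping concrete, here is a hypothetical C sketch of the string rewriting a server might do for a /~yourname request, assuming the special directory is public.html. The name map_user_path is made up for illustration; real servers are configurable and also turn ~username into an actual home directory path, which this sketch leaves alone.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: rewrite "/~dave/pics/cat.gif" into
   "~dave/public.html/pics/cat.gif". Assumes the special directory is
   named public.html. Returns 0 on success, or -1 if the path is not a
   ~user request or the result does not fit in out. */
static int map_user_path(const char *path, char *out, size_t outsize)
{
    if (strncmp(path, "/~", 2) != 0)
        return -1;                          /* not a ~user request */
    const char *user = path + 2;
    size_t userlen = strcspn(user, "/");    /* user name ends at the next '/' */
    if (userlen == 0)
        return -1;
    const char *rest = user + userlen;      /* "" or "/..." */
    if (*rest == '/')
        rest++;
    int n = snprintf(out, outsize, "~%.*s/public.html/%s",
                     (int)userlen, user, rest);
    if (n < 0 || (size_t)n >= outsize)
        return -1;                          /* truncated: refuse */
    return 0;
}
```

For example, map_user_path("/~dave/pics/cat.gif", out, sizeof out) leaves "~dave/public.html/pics/cat.gif" in out.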
OK, back to the original question. A browser will send your Web server
an HTTP request (GET or POST) in which the resource name specified
corresponds to your executable CGI program. So if you have a CGI
program in your public.html directory named
mycgi.cgi the browser would send a request that asks for
/~username/mycgi.cgi. Some folks configure their Web server
to only allow requests for CGI programs in the directory
/cgi-bin. In this case you need to be able to put your
program there, otherwise the Web server will simply send back the
contents of your program (the file itself) rather than running your program and sending
back its output.
Most servers that are configured to allow users to have CGI programs
in their home directory (really somewhere in
~/public.html) require that the file name of a CGI
program ends in the suffix .cgi. The Web server looks for
this suffix and decides whether to run the program in the requested
file, or whether to simply return the contents of the file.
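A minimal sketch of that suffix test in C, assuming the server has already stripped any '?' and query from the resource name. has_cgi_suffix is a hypothetical name, not taken from any real server.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical server-side check: run the file only if its name ends in
   ".cgi"; otherwise the server just returns the file's contents. */
static bool has_cgi_suffix(const char *path)
{
    const char *suffix = ".cgi";
    size_t plen = strlen(path);
    size_t slen = strlen(suffix);
    return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}
```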
1)
If the query is created by the browser based on an ISINDEX tag
(where the user can enter a single line of text and press Enter to
submit the request), the browser submits a GET request specifying
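a resource (filename) that is either the resource named in the ACTION attribute of the ISINDEX tag or, if there is no ACTION attribute, the resource that the document containing the tag came from.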
In both cases the browser will append a '?' to the resource name, followed by the string the user typed in (possibly encoded; see the next section for details on the encoding).
Examples:
The following HTML contains an ISINDEX tag with an ACTION attribute:
<H2>Enter a search string and I'll find what you are looking for</H2>
<ISINDEX ACTION=http://foo.com/search.cgi><BR>
<CENTER>press Enter to submit your query</CENTER>
If the user types in "blahblah" and presses Enter, the browser will connect to the web server on foo.com and submit something like this:
GET /search.cgi?blahblah HTTP/1.0
The following HTML contains an ISINDEX tag with no ACTION attribute. The generated request will use the same resource name that the document itself came from. The document containing this HTML was retrieved from the URL http://foo.org/count.
<H2>Enter a string and I'll count the letters for you</H2>
<ISINDEX><BR>
<CENTER>press Enter to submit your query</CENTER>
If the user types in "abcdef" and presses Enter, the browser will connect to the web server on foo.org and submit something like this:
GET /count?abcdef HTTP/1.0
In this case the resource /count seems to refer to both a
document and to a CGI program. This can be accomplished by having a
CGI program that simply returns a document if no query is submitted
(an empty query).
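Here is a rough sketch of what the /count program might look like in C, assuming the query arrives in the QUERY_STRING environment variable (how that works is covered below). The names count_letters and count_page are made up for illustration, and for brevity the sketch counts letters in the raw query without decoding it first.

```c
#include <ctype.h>
#include <stdio.h>

/* Count alphabetic characters in the raw (still encoded) query. */
static size_t count_letters(const char *query)
{
    size_t letters = 0;
    for (const char *p = query; *p != '\0'; p++)
        if (isalpha((unsigned char)*p))
            letters++;
    return letters;
}

/* Hypothetical /count handler: an empty or missing query returns the
   ISINDEX document itself; any other query returns the letter count. */
void count_page(FILE *out, const char *query)
{
    fputs("Content-type: text/html\r\n\r\n", out);
    if (query == NULL || query[0] == '\0') {
        /* No query: behave like an ordinary document. */
        fputs("<H2>Enter a string and I'll count the letters for you</H2>\n"
              "<ISINDEX><BR>\n"
              "<CENTER>press Enter to submit your query</CENTER>\n", out);
        return;
    }
    fprintf(out, "<P>Your query contains %zu letters</P>\n",
            count_letters(query));
}
```

A real program would simply call count_page(stdout, getenv("QUERY_STRING")) from main. The page deliberately does not echo the query back, since users can type anything.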
Q: I thought you said my CGI program had to have a name ending in .cgi?
A: I said "Most servers that are configured ...", not all servers. Many CGI programs are named something else.
2)
If the query is constructed based on the content of an HTML form, the
form itself specifies whether the request will be a GET or a POST.
GET is usually used only for small requests, because the
mechanism used by the web server to send the query from a GET request
to the CGI program has size limitations (more on this later). If a GET method is specified in the HTML form, the browser creates a query string based on the values the user typed in the form fields and appends it to the resource name just like we saw with an ISINDEX tag. If a POST method is used, the browser creates a query string based on the values the user typed in the form fields and sends this string (which may be large) as the content part of an HTTP POST command.
The query string itself is more complicated than with an ISINDEX based
query since there may be many fields in the FORM. Each field in an
HTML form has a name which is specified in the form itself (whoever
created the form has to specify a name for each field in the
form). Each field also has a value that the user can change by typing
in a new value or by clicking on checkboxes or radio buttons or
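whatever. Once the user presses the SUBMIT button, the browser
constructs a query string that contains a sequence of name=value pairs, one per field, separated by '&' characters. Each name and value is encoded: spaces become '+', and other special characters (such as '=', '&' and '%') are replaced by '%' followed by the character's two-digit hexadecimal code.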
NOTE: The encoding described above is done by default; you can
override this encoding by specifying an alternative encoding type in
the form itself. To do this you set a value for the ENCTYPE attribute
of a FORM tag. As far as I know, only one other encoding type (given as a MIME type) is supported.
Examples:
The following HTML form contains 2 fields, one named fname that we hope the user will use to submit his first name, and a field named lname for his last name.
<FORM METHOD=GET ACTION=http://www.foo.com/register.cgi>
First Name: <INPUT TYPE=TEXT NAME=fname><BR>
Last Name: <INPUT TYPE=TEXT NAME=lname><BR>
<INPUT TYPE=SUBMIT VALUE="press to submit">
</FORM>
If the user types "dave or joe" as the first name and enters "lastname=foo" as the last name (remember that users can and will enter anything!), the browser will connect to the web server on www.foo.com and submit something like this:
GET /register.cgi?fname=dave+or+joe&lname=lastname%3Dfoo HTTP/1.0
The following HTML form contains the same 2 fields, but the method specified in the form is POST.
<FORM METHOD=POST ACTION=http://www.foo.com/register.cgi>
First Name: <INPUT TYPE=TEXT NAME=fname><BR>
Last Name: <INPUT TYPE=TEXT NAME=lname><BR>
<INPUT TYPE=SUBMIT VALUE="press to submit">
</FORM>
If the user types "John" as the first name and enters "Doe a Deer" as the last name, the browser will connect to the web server on www.foo.com and submit something like this:
POST /register.cgi HTTP/1.0
Content-length: 27
(other HTTP headers)

fname=John&lname=Doe+a+Deer
In this case the same encoding takes place, but the query string is submitted as the content of the request, not as part of the resource name. You might also notice that the request includes an HTTP header specifying the length of the content; this is important, as we'll see soon...
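Here is a sketch of how a CGI program might pull one field out of such a query string and undo the encoding, in C. get_field, url_decode and hexval are hypothetical helpers written for this example; field names are compared without decoding, which is fine for simple names like fname and lname.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode len bytes of encoded text: '+' becomes a space and "%XX" becomes
   the byte with hex value XX. Returns 0 on success, -1 if out is too small. */
static int url_decode(const char *in, size_t len, char *out, size_t outsize)
{
    size_t o = 0;
    if (outsize == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        int c = (unsigned char)in[i];
        if (c == '+') {
            c = ' ';
        } else if (c == '%' && i + 2 < len &&
                   hexval((unsigned char)in[i + 1]) >= 0 &&
                   hexval((unsigned char)in[i + 2]) >= 0) {
            c = hexval((unsigned char)in[i + 1]) * 16 +
                hexval((unsigned char)in[i + 2]);
            i += 2;                       /* skip the two hex digits */
        }
        if (o + 1 >= outsize)
            return -1;                    /* no room for c plus the NUL */
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return 0;
}

/* Find field `name` in "name1=value1&name2=value2..." and decode its value
   into out. Returns 0 if found, -1 if absent or if out is too small. */
static int get_field(const char *query, const char *name,
                     char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    const char *p = query;
    while (*p != '\0') {
        size_t pairlen = strcspn(p, "&");          /* one name=value pair */
        const char *eq = memchr(p, '=', pairlen);
        if (eq != NULL && (size_t)(eq - p) == namelen &&
            memcmp(p, name, namelen) == 0)
            return url_decode(eq + 1, pairlen - namelen - 1, out, outsize);
        p += pairlen;
        if (*p == '&')
            p++;
    }
    return -1;
}
```

With the GET example above, get_field(query, "lname", buf, sizeof buf) leaves "lastname=foo" in buf.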
GET method: When a GET request is received by the web server and the resource specified (everything before the '?') is your CGI program, the web server will grab everything after the '?' and stuff it into the environment variable named QUERY_STRING. The web server will also set the environment variable REQUEST_METHOD to the value "GET". Then the web server will start up your CGI program with its STDOUT connected to a pipe the server can read. Your program should get the query by reading the environment variable QUERY_STRING, then process the query and send the results to the web server by simply writing to STDOUT.
Many operating systems limit the size of environment variables. This might get in the way if you have large queries, since the entire query must fit in QUERY_STRING. So most non-trivial queries are submitted using the POST method.
POST method: When a POST request is received by the web server and the resource specified is your CGI program, the web server will read the HTTP headers (including the one specifying the content length) and set the environment variable CONTENT_LENGTH. The REQUEST_METHOD environment variable will be set to POST. Now your CGI program will be started up with STDIN and STDOUT attached to pipes going back to the web server. The server will now write the entire query string (the content of the POST) to the pipe connected to your STDIN.
Your program should get the length of the query string from the CONTENT_LENGTH environment variable so it knows how much to read from STDIN (coming from the web server). BE CAREFUL! Don't read into a fixed-size array unless you are willing to refuse queries that don't fit in it (a query might be larger than your array, which could screw up your program and make it possible for bad guys to break into your machine, delete all your files, send mail from you to the FBI suggesting that you might be someone they are looking for, and worst of all - create a really lousy project 4 and submit it to netprog-submit@cs.rpi.edu...).
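Here is one way to follow that advice in C, sketched under a few assumptions: the function name read_query and the max_len cap are made up, and the caller passes in the values of REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH (from getenv) along with STDIN_FILENO. Instead of a static array it allocates exactly CONTENT_LENGTH + 1 bytes, still refuses absurdly large requests, and loops because read may return fewer bytes than asked for.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return the query as a malloc'd, NUL-terminated
   string (caller frees it), or NULL on any error or oversized request. */
static char *read_query(const char *method, const char *query_string,
                        const char *content_length, int fd, size_t max_len)
{
    if (method == NULL)
        return NULL;
    if (strcmp(method, "GET") == 0) {
        /* GET: the server already put the query in QUERY_STRING. */
        if (query_string == NULL)
            query_string = "";
        size_t n = strlen(query_string);
        char *copy = malloc(n + 1);
        if (copy != NULL)
            memcpy(copy, query_string, n + 1);
        return copy;
    }
    if (strcmp(method, "POST") != 0 || content_length == NULL)
        return NULL;
    /* POST: the query arrives on fd; CONTENT_LENGTH says how much. */
    char *end;
    errno = 0;
    unsigned long len = strtoul(content_length, &end, 10);
    if (errno != 0 || end == content_length || *end != '\0' || len > max_len)
        return NULL;                 /* malformed or too large: refuse */
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    size_t got = 0;
    while (got < len) {              /* read() may return short counts */
        ssize_t r = read(fd, buf + got, len - got);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0) {                /* error or premature end of input */
            free(buf);
            return NULL;
        }
        got += (size_t)r;
    }
    buf[len] = '\0';
    return buf;
}
```

In a CGI program this might be called as read_query(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), getenv("CONTENT_LENGTH"), STDIN_FILENO, 100000), where 100000 is an arbitrary example limit.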
Your CGI program must send the header line
Content-type: text/html
before sending the content. This is actually an HTTP header you are sending, so assuming it is the only header you want to send, you need to follow it with a blank line (all header lines should end with \r\n).
To send this header and the content back to the browser you simply write to STDOUT, which actually goes back to the web server via a pipe, and the web server forwards it to the browser. The web server will probably add a bunch of headers as well; generally we don't need to worry about this, although there are ways to configure the server to not send any extra headers.
In short - just use printf to send the content back to
the browser.
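For example, a tiny helper (send_html is a made-up name) that sends the header, the blank line and the content with a single fprintf:

```c
#include <stdio.h>

/* Send the Content-type header, the blank line that ends the headers,
   then the HTML. Pass stdout in a real CGI program. Returns the number
   of characters written, or a negative value on error. */
static int send_html(FILE *out, const char *html)
{
    return fprintf(out, "Content-type: text/html\r\n\r\n%s", html);
}
```

A CGI program would call send_html(stdout, "<H1>Hello</H1>\n") and then exit; everything it writes goes through the pipe to the web server.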
Just about the worst thing you can do is to
blindly construct a Unix command line based on a request, and
give the command to a Unix shell to run (using popen, system, etc).
In the example shown in class, the CGI program expects a keyword as a
request, and the CGI program greps a dictionary for all words that contain
the keyword. So the intent was that if a user sends the request "foo",
the Unix command grep foo /usr/dict/words is constructed
and run (using popen). However, if a user enters the query "; rm *"
the resulting command would look like this:
grep ; rm * /usr/dict/words
and you might lose a bunch of files...
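A safer version of the dictionary example, sketched in C: check that the keyword contains only letters, and run grep directly with fork and execlp so no shell ever parses the query. keyword_is_safe and run_grep are hypothetical names, and the letters-only rule is just one possible policy.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Accept only non-empty keywords made of letters, so the query can never
   carry shell syntax such as ';', '*' or spaces. */
static bool keyword_is_safe(const char *kw)
{
    if (kw == NULL || kw[0] == '\0')
        return false;
    for (const char *p = kw; *p != '\0'; p++)
        if (!isalpha((unsigned char)*p))
            return false;
    return true;
}

/* Run grep without a shell: execlp passes the keyword as one argument,
   so it is never parsed as a command line. grep inherits our STDOUT,
   so its output goes back to the web server.
   Returns grep's exit status, or -1 on error. */
int run_grep(const char *kw)
{
    if (!keyword_is_safe(kw))
        return -1;
    fflush(stdout);              /* don't duplicate buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp("grep", "grep", "--", kw, "/usr/dict/words", (char *)NULL);
        _exit(127);              /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```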
One common theme among well-known cracks (of many Unix services) is to overflow an input buffer in the server. If all servers were written correctly, this would not be a problem. You must make sure that read is never asked to read more bytes than the buffer you give it can hold.
Here is a classic example of the problem. This code (from a CGI program) is handling a POST request, so it checks to see how large the request is (by getting the environment variable CONTENT_LENGTH) and then reads that much stuff from STDIN:
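#include <stdlib.h>
#include <unistd.h>

int main(void)
{
  char buff[1000];
  int len;
  char *cl;
  cl = getenv("CONTENT_LENGTH");
  if (cl==NULL) {
    /* Error: no CONTENT_LENGTH, so this is not a POST */
    exit(1);
  }
  len = atoi(cl);
  read(0, buff, len);   /* read from STDIN (fd 0); BUG: len is never checked! */
  return 0;
}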
This code never makes sure that len is less than 1000!!!!
At a minimum this makes it easy to crash your server; in the worst
case some clever hacker (with lots of spare time) could use this to
break into your computer.
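Here is a sketch of the same code with the missing checks added, keeping the 1000-byte buffer and simply refusing anything that won't fit. read_post is a hypothetical name; it also loops, since read can return fewer bytes than requested.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical fixed version: refuse any CONTENT_LENGTH that doesn't fit
   in buff (leaving room for a NUL), then read until all bytes arrive.
   Returns the number of bytes stored in buff, or -1 on any error. */
static long read_post(int fd, const char *cl, char *buff, size_t buffsize)
{
    if (cl == NULL)
        return -1;                      /* no CONTENT_LENGTH: not a POST */
    char *end;
    long len = strtol(cl, &end, 10);
    if (end == cl || *end != '\0' || len < 0 || (size_t)len >= buffsize)
        return -1;                      /* malformed, negative or too big */
    size_t got = 0;
    while (got < (size_t)len) {
        ssize_t r = read(fd, buff + got, (size_t)len - got);
        if (r <= 0)
            return -1;                  /* error or premature end of input */
        got += (size_t)r;
    }
    buff[got] = '\0';
    return len;
}
```

A CGI program would call it as read_post(0, getenv("CONTENT_LENGTH"), buff, sizeof buff) and send back an error page when it returns -1.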
Setting Cookies: To set a cookie, you need to include an HTTP header
line in the response. The header field name is
Set-cookie. The entire header line should look something
like this:
Set-cookie: SESSION_ID=018365; path=/; domain=.rpi.edu; expires=Sunday, 12-Apr-98 12:00:00 GMT
The first part of the cookie specifies the cookie name (in this case SESSION_ID) and value (in this case 018365). The domain and path control when the client will send the cookie along with a request. The domain is a DNS domain name (or hostname) and identifies those servers that should be sent the cookie: once the browser has received the cookie, it will only send it along with requests to web servers in the specified domain. The path allows us to further specify that only some of the entities in the domain indicated should receive the cookie. The expires field specifies how long the client should hold on to the cookie before tossing it. If there is no expires field, the browser will never save the cookie to disk, and the cookie will be gone once the browser exits. You should remember that all cookies are subject to removal by the user, and your service should never _require_ that cookies persist between sessions.
Since you need to use an HTTP header to set a cookie, remember that this header must come before any content is sent back by your CGI program, and before the blank line that ends the headers (a blank line tells the browser there are no more headers).
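A minimal C sketch that builds the headers in the right order: the cookie, then Content-type, then the blank line. format_cookie_headers is a made-up name; the cookie it builds has no expires field, so it lasts only until the browser exits.

```c
#include <stdio.h>

/* Hypothetical helper: write the Set-cookie header, the Content-type
   header and the terminating blank line into buf. Returns the length,
   or -1 if buf is too small. */
static int format_cookie_headers(char *buf, size_t size,
                                 const char *name, const char *value,
                                 const char *path, const char *domain)
{
    int n = snprintf(buf, size,
                     "Set-cookie: %s=%s; path=%s; domain=%s\r\n"
                     "Content-type: text/html\r\n"
                     "\r\n",
                     name, value, path, domain);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A CGI program would fputs the result to stdout before writing any content.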
Getting Cookies: First you must get the kids outside or otherwise occupied. It is also usually good to make sure the dog is not watching, since she might later attempt to get the cookies herself. You'll need a brown paper bag to hide the cookies; I suggest a grocery bag for large cookie excursions, although a small lunch style bag is perfect if you only plan on nabbing a few cookies at once. OK - head into the kitchen, and... Oops, sorry!
Getting Cookies from the browser: The browser sends its cookies as an HTTP header. Your web server will store the cookies in the environment variable "HTTP_COOKIE" before starting your CGI program. Since there may be many cookies, this environment variable will hold them all in the form "name1=value1; name2=value2; ...". We can see right away that the cookie name and value can't include the characters '=', ';' or ','; you also can't include whitespace in the name or value of a cookie. Anything else is OK.
You need to parse the HTTP_COOKIE string to get at the cookie you are interested in.
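Here is a simple sketch of that parsing in C. get_cookie is a hypothetical helper; it assumes the "name1=value1; name2=value2" layout described above and does no decoding of values.

```c
#include <string.h>

/* Hypothetical helper: find cookie `name` in an HTTP_COOKIE string and
   copy its value into out. Returns 0 if found, -1 if the string is
   missing, the cookie is absent, or out is too small. */
static int get_cookie(const char *cookies, const char *name,
                      char *out, size_t outsize)
{
    if (cookies == NULL || outsize == 0)
        return -1;
    size_t namelen = strlen(name);
    const char *p = cookies;
    while (*p != '\0') {
        while (*p == ' ' || *p == ';')      /* skip the "; " separators */
            p++;
        size_t pairlen = strcspn(p, ";");   /* one name=value pair */
        if (pairlen > namelen && p[namelen] == '=' &&
            strncmp(p, name, namelen) == 0) {
            size_t vlen = pairlen - namelen - 1;
            if (vlen >= outsize)
                return -1;
            memcpy(out, p + namelen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += pairlen;
    }
    return -1;
}
```

A CGI program would call get_cookie(getenv("HTTP_COOKIE"), "SESSION_ID", buf, sizeof buf); a NULL HTTP_COOKIE (no cookies sent) simply yields -1.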
A: The best way to get started is to copy one of the example CGI programs to your CS public.html directory, build it, and run it. Then play around, making changes to the HTML and the CGI program.
BUT: cgi.cs.rpi.edu is a Sun, so your cgi programs must be executable on a Sun. If you build your C program on monica or another BSD machine it won't run on cgi.cs.rpi.edu or any other Sun (nor will Microsoft Internet Explorer, but then - that's a feature...).