Web Programming
Based on Notes by D. Hollinger
Also Java Network Programming and Distributed Computing, Chs. 9,10
Also Online Java Tutorial, Sun.

World-Wide Web
(Tim Berners-Lee & Cailliau ’92)

Topics
HTTP – HyperText Transfer Protocol
HTML – HyperText Markup Language
URI – Uniform Resource Identifiers
URL – Uniform Resource Locators
URN – Uniform Resource Names
URC – Uniform Resource Citations
Server-Side Programming
HTML Forms

HTTP
Hypertext Transfer Protocol
Refs:
RFC 1945 (HTTP 1.0)
RFC 2616 (HTTP 1.1)

HTTP Usage
HTTP is the protocol that supports communication between web browsers and web servers.
A “Web Server” is a HTTP server
We will look at HTTP Version 1.0 +

From the RFC
“HTTP is an application-level protocol with the lightness and speed necessary for distributed, hypermedia information systems.”

Transport Independence
The RFC states that the HTTP protocol generally takes place over a TCP connection, but the protocol itself is not dependent on a specific transport layer.

Request - Response
HTTP has a simple structure:
client sends a request
server returns a reply.
HTTP can support multiple request-reply exchanges over a single TCP connection.

Well Known Address
The “well known” TCP port for HTTP servers is port 80.
Other ports can be used as well...

HTTP Versions
The original version now goes by the name “HTTP Version 0.9”
HTTP 0.9 was used for many years.
Starting with HTTP 1.0 the version number is part of every request.
HTTP is still changing...

HTTP 1.0+ Request
Lines of text (ASCII).
Lines end with CRLF   “\r\n”
First line is called “Request-Line”

Request Line
Method URI HTTP-Version \r\n
The request line contains 3 tokens (words).
space characters “ “ separate the tokens.
Newline (\n) seems to work by itself (but the protocol requires CRLF)

Request Method
The Request Method can be:
GET HEAD PUT
POST DELETE TRACE
OPTIONS
future expansion is supported

Methods
GET: retrieve information identified by the URI.
HEAD: retrieve meta-information about the URI.
POST: send information to a URI and retrieve result.

Methods (cont.)
PUT: Store information in location named by URI.
DELETE: remove entity identified by URI.

More Methods
TRACE: used to trace HTTP forwarding through proxies, tunnels, etc.
OPTIONS: used to determine the capabilities of the server, or characteristics of a named resource.

Common Usage
GET, HEAD and POST are supported everywhere.
HTTP 1.1 servers often support PUT, DELETE, OPTIONS & TRACE.

URI: Uniform Resource Identifier
URIs defined in RFC 2396.
Absolute URI: scheme://hostname[:port]/path
http://www.cs.rpi.edu:80/blah/foo
Relative URI: /path
/blah/foo

URI Usage
When dealing with a HTTP 1.1 server, only a path is used (no scheme or hostname).
HTTP 1.1 servers are required to be capable of handling an absolute URI, but there are still some out there that won’t…
When dealing with a proxy HTTP server, an absolute URI is used.
client has to tell the proxy where to get the document!
more on proxy servers in a bit….

HTTP Version Number
“HTTP/1.0”    or   “HTTP/1.1”
HTTP 0.9 did not include a version number in a request line.
If a server gets a request line with no HTTP version number, it assumes 0.9

The Header Lines
After the Request-Line come a number (possibly zero) of HTTP headers.
Each header line contains an attribute name followed by a “:” followed by the attribute value.

Headers
Request Headers provide information to the server about the client
what kind of client
what kind of content will be accepted
who is making the request
There can be 0 headers

Example HTTP Headers
Accept: text/html
From: neytmann@cybersurg.com
User-Agent: Mozilla/4.0
Referer: http://foo.com/blah

End of the Headers
Each header ends with a CRLF
The end of the header section is marked with a blank line.
just CRLF
For GET and HEAD requests, the end of the headers is the end of the request!

POST
A POST request includes some content (some data) after the headers (after the blank line).
There is no format for the data (just raw bytes).
A POST request must include a Content-Length line in the headers:
Content-Length: 267

Example GET Request
GET /~hollingd/testanswers.html HTTP/1.0
Accept: */*
User-Agent: Internet Explorer
From: cheater@cheaters.org
Referer: http://foo.com/

Slide 27

Typical Method Usage
GET used to retrieve an HTML document.
HEAD used to find out if a document has changed.
POST used to submit a form.

HTTP Response
ASCII Status Line
Headers Section
Content can be anything (not just text)
typically is HTML document or some kind of image.

Response Status Line
HTTP-Version    Status-Code    Message
Status Code is 3 digit number (for computers)
Message is text (for humans)

Status Codes
1xx Informational
2xx Success
3xx Redirection
4xx Client Error
5xx Server Error

Example Status Lines
HTTP/1.0 200 OK
HTTP/1.0 301 Moved Permanently
HTTP/1.0 400 Bad Request
HTTP/1.0 500 Internal Server Error

Response Headers
Provide the client with information about the returned entity (document).
what kind of document
how big the document is
how the document is encoded
when the document was last modified
Response headers end with blank line

Response Header Examples
Date: Wed, 30 Jan 2002 12:48:17 EST
Server: Apache/1.17
Content-Type: text/html
Content-Length: 1756
Content-Encoding: gzip

Content
Content can be anything (sequence of raw bytes).
Content-Length header is required for any response that includes content.
Content-Type header also required.

Single Request/Reply
The client sends a complete request.
The server sends back the entire reply.
The server closes it’s socket.
If the client needs another document it must open a new connection.

Persistent Connections
HTTP 1.1 supports persistent connections (this is supposed to be the default).
Multiple requests can be handled.
Most servers seem to close the connection after the first response…

Try it with telnet
> telnet www.cs.rpi.edu 80
GET / HTTP/1.0
HTTP/1.0 200 OK
Server: Apache
...

HTTP Proxy Server

Tyba: A simple (and incomplete) HTTP Server Implementation in Java
See:
http://yangtze.cs.uiuc.edu/~cvarela/tyba/

Server-Side Programming

Web Server Architecture
(Berners-Lee & Cailliau ’92)

Request Method: Get
GET requests can include a query string as part of the URL:
GET /program/finger?hollingd HTTP/1.0

/program/finger?hollingd
The web server treats everything before the ‘?’ delimiter as the resource name
In this case the resource name is the name of a program. (could be a CGI script, a servlet, or your own HTTP server)
Everything after the ‘?’ is a string that is passed to the server program (in the case of CGI and servlets)

Simple GET queries - ISINDEX
You can put an <ISINDEX> tag inside an HTML document.
The browser will create a text box that allows the user to enter a single string.
If an ACTION is specified in the ISINDEX tag, when the user presses Enter, a request will be sent to the server specified as the ACTION.

ISINDEX Example
Enter a string:
<ISINDEX ACTION=http://foo.com/search>
Press Enter to submit your query.
If you enter the string “blahblah”, the browser will send a request to the http server at foo.com that looks like this:
GET /search?blahblah HTTP/1.1

URL-encoding
Browsers use an encoding when sending query strings that include special characters.
Most nonalphanumeric characters are encoded as a ‘%’ followed by 2 ASCII encoded hex digits.
‘=‘ (which is hex 3D) becomes “%3D”
‘&’ becomes “%26”

More URL encoding
The space character ‘ ‘ is replaced by ‘+’.
Why?
The ‘+’ character is replaced by “%2B”
Example:
“foo=6 + 7” becomes “foo%3D6+%2B+7”

URL Encoding in Java
java.net.URLEncoder class
String original = “foo=6 + 7”;
System.out.println(
URLEncoder.encode(original));
foo%3D6+%2B+7

URL Decoding in Java
java.net.URLDecoder class
String encoded = “foo%3D6+%2B+7”;
System.out.println(
URLDecoder.decode(encoded));
foo=6 + 7

Beyond ISINDEX - Forms
Many Web services require more than a simple field in the web form.
HTML includes support for forms:
lots of field types
user answers all kinds of annoying questions
entire contents of form must be stuck together and put in the query by the web client.

Form Fields
Each field within a form has a name and a value.
The browser creates a query that includes a sequence of “name=value” sub-strings and sticks them together separated by the ‘&’ character.

Form fields and encoding
2 fields - name and occupation.
If user types in “Dave H.” as the name and “none” for occupation, the query would look like this:
“name=Dave+H%2E&occupation=none”

HTML Forms
Each form includes a METHOD that determines what http method is used to submit the request.
Each form includes an ACTION that determines where the request is made.

An HTML Form
<FORM METHOD=GET ACTION=http://foo.com/signup>
Name:
<INPUT TYPE=TEXT NAME=name><BR>
Occupation:
<INPUT TYPE=TEXT NAME=occupation><BR>
<INPUT TYPE=SUBMIT>
</FORM>

What the server will get
The query will be a URL-encoded string containing the name,value pairs of all form fields.
The server program (or a CGI script, or a servlet) must decode the query and separate the individual fields.

HTTP Method: POST
The HTTP POST method delivers data from the browser as the content of the request.
The GET method delivers data (query) as part of the URI.

GET vs. POST
When using forms it’s generally better to use POST:
 there are limits on the maximum size of a GET query string (environment variable)
a post query string doesn’t show up in the browser as part of the current URL.

HTML Form using POST
Set the form method to POST instead of GET.
<FORM METHOD=POST ACTION=…>
The browser will take care of the details...

Server reading POST
If the request is a POST, the query is coming in the body of the HTTP request.
The “Content-length” header tells us how much data to read.

HTML Forms (in more detail)

Form Elements
Each HTML form contains the following:
<FORM>, </FORM> tags
The <FORM> tag has two required attributes:
METHOD specifies the HTTP method used to send the request to the server (when the user submits the form).
ACTION specifies the URL the request is sent to.

FORM Method
We have seen the two common methods used:
GET: any user input is submitted as part of the URI following a “?”.
GET foo?name=joe&cookie=oreo HTTP/1.0
POST: any user input is submitted as the content of the request (after the HTTP headers).

Sample POST
Request
POST /dir/foo HTTP/1.0
User-Agent: Netscape
Content-Length: 20
Cookie: favorite=chocolatechip
ECACChamps: RPI
name=joe&cookie=oreo

Form ACTION
attribute
The ACTION attribute specifies the URL to which the request is sent. Some examples:
ACTION=“http://www.cs.rpi.edu/CGI_BIN/foo”
ACTION=“myprog”
ACTION=“mailto:hollingd@cs.rpi.edu”

<FORM>
Tag Examples
<FORM  METHOD=“POST”
 ACTION=“http://www.foo.com/cgi-bin/myprog”>
<FORM  METHOD=“GET” ACTION=“/cgi-bin/myprog”>
<FORM  METHOD=“POST”
 ACTION=“mailto:shirley@pres.rpi.edu”>

Inside a form
Between the <FORM> and </FORM> tags you define the text and fields that make up the form.
You can use normal HTML tags to format the text however you want.
The fields are defined using tags as well.

Form Fields
There are a variety of types of form fields:
text fields: text, password, textarea
radio buttons
checkboxs
buttons: user defined, submit, reset (clear)
hidden fields

Input Fields
There are a number of field types that allow the user to type in a string value as input.
Each field is created using an <INPUT> tag with the attribute TYPE.

Input Attributes
The TYPE attribute is used to specify what kind of input is allowed: TEXT, PASSWORD, FILE, ...
Every INPUT tag must have a NAME attribute.

TEXT Fields
TEXT is the most common type of input:
user can enter a single line of text.
Additional attributes can specify:
the maximum string length - MAXLENGTH
the size of the input box drawn by the browser - SIZE
a default value - VALUE

TEXT INPUT
 Examples
<INPUT TYPE=TEXT NAME=FOO>
<INPUT TYPE=“TEXT”
       NAME=“PIZZA”
       SIZE=10
       MAXLENGTH=20
       VALUE=“Pepperoni”>

An example form
<FORM METHOD=POST ACTION=cgi-bin/foo>
Your Name:
<INPUT TYPE=TEXT NAME=“Name”><BR>
Your Age:
<INPUT TYPE=TEXT NAME’”Age”><BR>
</FORM>

Submission
Buttons
Another type of INPUT field is the submission button.
When a user clicks on a submit button the browser submits the contents of all other fields to a web server using the METHOD and ACTION attributes.
<INPUT TYPE=SUBMIT VALUE=“press me”>

Reset Buttons
An INPUT of type RESET tells the browser to display a button that will clear all the fields in the form.
<INPUT TYPE=RESET
       VALUE=“press me to clear form”>

A Complete
Form Example
<FORM METHOD=POST ACTION=cgi-bin/foo>
Your Name:
<INPUT TYPE=TEXT NAME=“Name”><BR>
Your Age: <INPUT TYPE=TEXT NAME=”Age”><BR>
<INPUT TYPE=SUBMIT VALUE=“Submit”>
<INPUT TYPE=RESET>
</FORM>

Tables and Forms
Tables are often used to make forms look pretty - remember that you can use any HTML tags to control formatting of a form.

Table/Form example
<FORM METHOD=POST ACTION=cgi-bin/foo>
<TABLE><TR>
  <TD>Your Name: </TD>
  <TD><INPUT TYPE=TEXT NAME=“Name”></TD>
</TR><TR>
  <TD>Your Age:</TD>
  <TD> <INPUT TYPE=TEXT NAME=”Age”></TD>
</TR><TR>
  <TD><INPUT TYPE=SUBMIT VALUE=“Submit”></TD>
  <TD><INPUT TYPE=RESET></TD>
</TR></TABLE>
</FORM>

Other Inputs
Checkboxes
present user with items that can be selected or deselected. Each checkbox has a name and a value and can be initially selected/deselected
Example checkbox definitions:
<INPUT TYPE=checkbox name=chocchip value=1>
<INPUT TYPE=checkbox name=oreo value=1>

Checkbox example
<FORM METHOD=POST ACTION=cgi-bin/foo>
Select all the cookies you want to order:<BR>
<INPUT TYPE=CHECKBOX NAME=Oreo Value=1>
   Oreo<BR>
<INPUT TYPE=CHECKBOX NAME=Oatmeal Value=1>
  Oatmeal<BR>
<INPUT TYPE=CHECKBOX CHECKED NAME=ChocChip Value=1>
  Chocolate Chip<BR>
<INPUT TYPE=SUBMIT VALUE=Submit>
</FORM>

Radio Buttons
Radio Buttons are like checkbox except that the user can select only one item at a time.
All radio buttons in a group have the same NAME.
<INPUT TYPE=radio name=cookie value=chocchip>
<INPUT TYPE=radio name=cookie value=oreo>
<INPUT TYPE=radio name=cookie value=oatmeal>

Radio Button
Example
<FORM METHOD=POST ACTION=cgi-bin/foo>
Select all the cookies you want to order:<BR>
<INPUT TYPE=RADIO NAME=Cookie Value=Oreo> Oreo <BR>
<INPUT TYPE=RADIO NAME=Cookie Value=Oatmeal> Oatmeal <BR>
<INPUT TYPE=RADIO CHECKED NAME=Cookie Value=ChocChip> ChocolateChip<BR>
<INPUT TYPE=SUBMIT VALUE=Submit>
</FORM>

Multiline Text
The TEXTAREA tag creates an area where the user can submit multiple lines of text.
This is not another type of <INPUT> tag!

TEXTAREA
Attributes
Each TEXTAREA tag has attributes NAME, COLS and ROWS.
<TEXTAREA name=address rows=5 cols=40>
default text goes here (or can be empty)
</TEXTAREA>

TEXTAREA example
<FORM METHOD=POST ACTION=cgi-bin/foo>
Please enter your address in the space provided:<BR>
<TEXTAREA NAME=address COLS=40 ROWS=5>
</TEXTAREA>
<BR>
<INPUT TYPE=SUBMIT VALUE=Submit>
</FORM>

Form Submission
When the user presses on a SUBMIT button the following happens:
browser uses the FORM method and action attributes to construct a request.
A query string is built using the (name,value) pairs from each form element.
Query string is URL-encoded.

Input
Submissions
For each checkbox selected the name,value pair is sent.
For all checkboxes that are not selected - nothing is sent.
A single name,value pair is sent for each group of radio buttons.

Other Form
Field Types
There are other form field types:
SELECT - pulldown menu or scrolled list of choices.
Image Buttons
Push Buttons (choice of submit buttons)

Hidden
Fields
Nothing is displayed by the browser.
The name,value are sent along with the submission request.
<INPUT TYPE=HIDDEN
               NAME=SECRET
               VALUE=AGENT>

Hidden does not
mean secure!
Anyone can look at the source of an HTML document.
hidden fields are part of the document!
If a form uses GET, all the name/value pairs are sent as part of the URI
URI shows up in the browser as the location of the current page