|
|
|
Based on Notes by D. Hollinger |
|
Also Java Network Programming and Distributed
Computing, Chs. 9,10 |
|
Also Online Java Tutorial, Sun. |
|
|
|
|
|
|
|
|
HTTP – HyperText Transfer Protocol |
|
HTML – HyperText Markup Language |
|
URI – Uniform Resource Identifiers |
|
URL – Uniform Resource Locators |
|
URN – Uniform Resource Names |
|
URC – Uniform Resource Citations |
|
|
|
Server-Side Programming |
|
HTML Forms |
|
|
|
|
Refs: |
|
RFC 1945 (HTTP 1.0) |
|
RFC 2616 (HTTP 1.1) |
|
|
|
|
HTTP is the protocol that supports communication
between web browsers and web servers. |
|
|
|
A “Web Server” is an HTTP server. |
|
|
|
We will look at HTTP Version 1.0 + |
|
|
|
|
“HTTP is an application-level protocol with the
lightness and speed necessary for distributed, hypermedia information
systems.” |
|
|
|
|
|
|
|
|
The RFC states that the HTTP protocol generally
takes place over a TCP connection, but the protocol itself is not dependent
on a specific transport layer. |
|
|
|
|
|
|
|
HTTP has a simple structure: |
|
client sends a request |
|
server returns a reply. |
|
|
|
HTTP can support multiple request-reply
exchanges over a single TCP connection. |
|
|
|
|
The “well known” TCP port for HTTP servers is
port 80. |
|
|
|
Other ports can be used as well... |
|
|
|
|
|
The original version now goes by the name “HTTP
Version 0.9” |
|
HTTP 0.9 was used for many years. |
|
|
|
Starting with HTTP 1.0 the version number is
part of every request. |
|
HTTP is still changing... |
|
|
|
|
Lines of text (ASCII). |
|
|
|
Lines end with CRLF “\r\n” |
|
|
|
First line is called “Request-Line” |
|
|
|
|
Method URI HTTP-Version \r\n |
|
|
|
The request line contains 3 tokens (words). |
|
|
|
Space characters “ ” separate the tokens. |
|
|
|
Many servers also accept a bare newline (\n), but the
protocol requires CRLF. |
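A minimal Java sketch of splitting a Request-Line into its three tokens. The class name `RequestLineSketch` and its fields are illustrative only, not part of any standard API. As an assumption, a line with only two tokens is treated as an old-style request with no version and labeled HTTP/0.9.

```java
class RequestLineSketch {
    final String method;
    final String uri;
    final String version;

    RequestLineSketch(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    // Parses "Method URI HTTP-Version", tolerating a trailing CRLF or bare LF.
    static RequestLineSketch parse(String line) {
        String trimmed = line.trim();
        String[] tokens = trimmed.split(" +");
        if (tokens.length == 2) {
            // No version token: assume an HTTP 0.9 style request (assumption of this sketch).
            return new RequestLineSketch(tokens[0], tokens[1], "HTTP/0.9");
        }
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + trimmed);
        }
        return new RequestLineSketch(tokens[0], tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        RequestLineSketch r = parse("GET /blah/foo HTTP/1.0\r\n");
        System.out.println(r.method + " " + r.uri + " " + r.version);
    }
}
```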
|
|
|
|
The Request Method can be: |
|
|
|
GET HEAD PUT |
|
POST DELETE TRACE |
|
OPTIONS |
|
|
|
future expansion is supported |
|
|
|
|
GET: retrieve information identified by the URI. |
|
|
|
HEAD: retrieve meta-information about the URI. |
|
|
|
POST: send information to a URI and retrieve
result. |
|
|
|
|
PUT: Store information in location named by URI. |
|
|
|
DELETE: remove entity identified by URI. |
|
|
|
|
|
|
TRACE: used to trace HTTP forwarding through
proxies, tunnels, etc. |
|
|
|
OPTIONS: used to determine the capabilities of
the server, or characteristics of a named resource. |
|
|
|
|
|
|
GET, HEAD and POST are supported everywhere. |
|
|
|
HTTP 1.1 servers often support PUT, DELETE,
OPTIONS & TRACE. |
|
|
|
|
|
URIs defined in RFC 2396. |
|
|
|
Absolute URI: scheme://hostname[:port]/path |
|
http://www.cs.rpi.edu:80/blah/foo |
|
|
|
Relative URI: /path |
|
/blah/foo |
|
|
|
|
|
When dealing with an HTTP 1.1 server, only a path
is used (no scheme or hostname). |
|
HTTP 1.1 servers are required to be capable of
handling an absolute URI, but there are still some out there that won’t… |
|
|
|
When dealing with a proxy HTTP server, an
absolute URI is used. |
|
client has to tell the proxy where to get the
document! |
|
more on proxy servers in a bit…. |
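The difference between the two request forms can be sketched in Java as follows. The class and method names are hypothetical; the sketch only uses `java.net.URI` to pull the pieces apart and assumes a GET request with HTTP/1.0 as the version.

```java
import java.net.URI;

class RequestTargetSketch {
    // Builds a GET Request-Line: absolute URI for a proxy, path only for a direct request.
    static String requestLine(URI target, boolean viaProxy) {
        String path = target.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (target.getRawQuery() != null) {
            path += "?" + target.getRawQuery();
        }
        String requestUri = viaProxy
                ? target.getScheme() + "://" + target.getRawAuthority() + path
                : path;
        return "GET " + requestUri + " HTTP/1.0\r\n";
    }

    public static void main(String[] args) {
        URI u = URI.create("http://www.cs.rpi.edu:80/blah/foo");
        System.out.print(requestLine(u, false)); // direct to the web server
        System.out.print(requestLine(u, true));  // sent to a proxy
    }
}
```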
|
|
|
|
“HTTP/1.0”
or “HTTP/1.1” |
|
|
|
HTTP 0.9 did not include a version number in a
request line. |
|
|
|
If a server gets a request line with no HTTP
version number, it assumes 0.9 |
|
|
|
|
After the Request-Line come a number (possibly
zero) of HTTP headers. |
|
|
|
Each header line contains an attribute name
followed by a “:” followed by the attribute value. |
|
|
|
|
|
Request Headers provide information to the
server about the client |
|
what kind of client |
|
what kind of content will be accepted |
|
who is making the request |
|
|
|
There can be 0 headers |
|
|
|
|
Accept: text/html |
|
From: neytmann@cybersurg.com |
|
User-Agent: Mozilla/4.0 |
|
Referer: http://foo.com/blah |
|
|
|
|
|
|
|
Each header ends with a CRLF |
|
The end of the header section is marked with a
blank line. |
|
just CRLF |
|
|
|
For GET and HEAD requests, the end of the
headers is the end of the request! |
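One way a server might collect the header lines is sketched below in Java. It takes the text that follows the Request-Line and stops at the blank line. The class name `HeaderSketch` is hypothetical, and storing names in lower case is a choice of this sketch (header names are case-insensitive), not something the slides prescribe.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HeaderSketch {
    // Parses "Name: value" lines until the blank line that ends the header section.
    static Map<String, String> parseHeaders(String text) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : text.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // blank line: end of headers
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "name: value" line; ignored in this sketch
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String text = "Accept: text/html\r\nUser-Agent: Mozilla/4.0\r\n\r\n";
        System.out.println(parseHeaders(text));
    }
}
```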
|
|
|
|
A POST request includes some content (some data)
after the headers (after the blank line). |
|
|
|
There is no format for the data (just raw
bytes). |
|
|
|
A POST request must include a Content-Length
line in the headers: |
|
Content-Length: 267 |
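As a rough illustration, a client could assemble a POST request like this in Java, computing Content-Length from the byte count of the body. The class name `PostRequestSketch` and the path used in `main` are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class PostRequestSketch {
    // Returns request line + Content-Length header + blank line + raw body bytes.
    static byte[] build(String path, byte[] body) {
        String head = "POST " + path + " HTTP/1.0\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] request = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, request, 0, headBytes.length);
        System.arraycopy(body, 0, request, headBytes.length, body.length);
        return request;
    }

    public static void main(String[] args) {
        byte[] body = "name=joe&cookie=oreo".getBytes(StandardCharsets.US_ASCII);
        System.out.print(new String(build("/dir/foo", body), StandardCharsets.US_ASCII));
    }
}
```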
|
|
|
|
GET /~hollingd/testanswers.html HTTP/1.0 |
|
Accept: */* |
|
User-Agent: Internet Explorer |
|
From: cheater@cheaters.org |
|
Referer: http://foo.com/ |
|
|
|
|
|
|
|
GET used to retrieve an HTML document. |
|
|
|
HEAD used to find out if a document has changed. |
|
|
|
POST used to submit a form. |
|
|
|
|
|
ASCII Status Line |
|
|
|
Headers Section |
|
|
|
Content can be anything (not just text) |
|
typically is HTML document or some kind of
image. |
|
|
|
|
HTTP-Version
Status-Code Message |
|
|
|
Status Code is 3 digit number (for computers) |
|
|
|
Message is text (for humans) |
|
|
|
|
1xx Informational |
|
2xx Success |
|
3xx Redirection |
|
4xx Client Error |
|
5xx Server Error |
|
|
|
|
HTTP/1.0 200 OK |
|
|
|
HTTP/1.0 301 Moved Permanently |
|
|
|
HTTP/1.0 400 Bad Request |
|
|
|
HTTP/1.0 500 Internal Server Error |
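A small Java sketch of how a client might pull the status code out of a status line and map it to one of the five classes. The class and method names are illustrative only.

```java
class StatusLineSketch {
    // Extracts the 3 digit code from "HTTP-Version Status-Code Message".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Maps the first digit of the code to its class.
    static String category(int code) {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        int code = statusCode("HTTP/1.0 301 Moved Permanently\r\n");
        System.out.println(code + " " + category(code));
    }
}
```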
|
|
|
|
|
Provide the client with information about the
returned entity (document). |
|
what kind of document |
|
how big the document is |
|
how the document is encoded |
|
when the document was last modified |
|
|
|
Response headers end with blank line |
|
|
|
|
Date: Wed, 30 Jan 2002 12:48:17 EST |
|
Server: Apache/1.17 |
|
Content-Type: text/html |
|
Content-Length: 1756 |
|
Content-Encoding: gzip |
|
|
|
|
Content can be anything (sequence of raw bytes). |
|
|
|
A Content-Length header should accompany any
response that includes content (an HTTP 1.0 server that omits it must signal the end of the content by closing the connection). |
|
|
|
A Content-Type header should also be included. |
|
|
|
|
The client sends a complete request. |
|
The server sends back the entire reply. |
|
The server closes its socket. |
|
|
|
If the client needs another document it must
open a new connection. |
|
|
|
|
HTTP 1.1 supports persistent connections (this
is supposed to be the default). |
|
Multiple requests can be handled. |
|
Most servers seem to close the connection after
the first response… |
|
|
|
|
> telnet www.cs.rpi.edu 80 |
|
GET / HTTP/1.0 |
|
|
|
HTTP/1.0 200 OK |
|
Server: Apache |
|
... |
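The same exchange can be tried from Java with a plain socket. This is only a sketch: `SimpleGetSketch` is a hypothetical class name, the host is the one from the telnet example, and a server that hosts several sites may also expect a `Host:` header, which this sketch does not send. Because the request is HTTP/1.0, the client simply reads until the server closes the connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SimpleGetSketch {
    // Request line followed immediately by the blank line (zero headers).
    static String buildRequest(String path) {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }

    // Sends the request and returns everything the server sends back.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) { // read until the server closes its socket
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("www.cs.rpi.edu", 80, "/"));
    }
}
```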
|
|
|
|
|
|
|
|
|
|
See: |
|
http://yangtze.cs.uiuc.edu/~cvarela/tyba/ |
|
|
|
|
|
|
|
|
|
|
GET requests can include a query string as part
of the URL: |
|
|
|
GET /program/finger?hollingd HTTP/1.0 |
|
|
|
|
The web server treats everything before the ‘?’
delimiter as the resource name |
|
|
|
In this case the resource name is the name of a
program. (could be a CGI script, a servlet, or your own HTTP server) |
|
|
|
Everything after the ‘?’ is a string that is
passed to the server program (in the case of CGI and servlets) |
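A server written in Java could separate the two parts with a few lines like these. The class name is hypothetical, and returning an empty string when there is no ‘?’ is a choice made for this sketch.

```java
class QuerySplitSketch {
    // Returns {resource, query}; query is "" when the URI has no '?'.
    static String[] split(String requestUri) {
        int q = requestUri.indexOf('?');
        if (q < 0) {
            return new String[] { requestUri, "" };
        }
        return new String[] { requestUri.substring(0, q), requestUri.substring(q + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("/program/finger?hollingd");
        System.out.println("resource = " + parts[0]);
        System.out.println("query    = " + parts[1]);
    }
}
```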
|
|
|
|
You can put an <ISINDEX> tag inside an
HTML document. |
|
The browser will create a text box that allows
the user to enter a single string. |
|
If an ACTION is specified in the ISINDEX tag,
when the user presses Enter, a request will be sent to the server specified
as the ACTION. |
|
|
|
|
Enter a string: |
|
<ISINDEX ACTION=http://foo.com/search> |
|
Press Enter to submit your query. |
|
|
|
If you enter the string “blahblah”, the browser
will send a request to the http server at foo.com that looks like this: |
|
|
|
GET /search?blahblah HTTP/1.1 |
|
|
|
|
|
Browsers use an encoding when sending query
strings that include special characters. |
|
Most nonalphanumeric characters are encoded as a
‘%’ followed by 2 ASCII encoded hex digits. |
|
‘=’ (which is hex 3D) becomes “%3D” |
|
‘&’ becomes “%26” |
|
|
|
|
|
The space character ‘ ’ is replaced by ‘+’. |
|
Why? |
|
|
|
The ‘+’ character is replaced by “%2B” |
|
|
|
Example: |
|
“foo=6 + 7” becomes “foo%3D6+%2B+7” |
|
|
|
|
|
java.net.URLEncoder class |
|
|
|
String original = "foo=6 + 7"; |
|
// The one-argument encode() is deprecated, so the charset is named explicitly. |
System.out.println( |
|
URLEncoder.encode(original, "UTF-8")); |
|
|
|
foo%3D6+%2B+7 |
|
|
|
|
|
java.net.URLDecoder class |
|
|
|
String encoded = "foo%3D6+%2B+7"; |
|
// The one-argument decode() is deprecated, so the charset is named explicitly. |
System.out.println( |
|
URLDecoder.decode(encoded, "UTF-8")); |
|
|
|
foo=6 + 7 |
|
|
|
|
|
Many Web services require more than a simple
field in the web form. |
|
HTML includes support for forms: |
|
lots of field types |
|
user answers all kinds of annoying questions |
|
entire contents of form must be stuck together
and put in the query by the web client. |
|
|
|
|
Each field within a form has a name and a value. |
|
|
|
The browser creates a query that includes a
sequence of “name=value” sub-strings and sticks them together separated by
the ‘&’ character. |
|
|
|
|
2 fields - name and occupation. |
|
If user types in “Dave H.” as the name and
“none” for occupation, the query would look like this: |
|
|
|
“name=Dave+H.&occupation=none” |
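What the browser does here can be imitated in Java with `java.net.URLEncoder`. The sketch below is illustrative only (the class name `FormQuerySketch` is made up): it encodes each name and value separately and joins the pairs with ‘&’. Note that `URLEncoder` leaves `.`, `-`, `*` and `_` unencoded, so a trailing period stays as it is.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormQuerySketch {
    // Joins URL-encoded "name=value" pairs with '&', in insertion order.
    static String buildQuery(Map<String, String> fields) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Dave H.");
        fields.put("occupation", "none");
        System.out.println(buildQuery(fields));
    }
}
```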
|
|
|
|
Each form includes a METHOD that determines what
HTTP method is used to submit the request. |
|
|
|
Each form includes an ACTION that determines
where the request is made. |
|
|
|
|
<FORM METHOD=GET
ACTION=http://foo.com/signup> |
|
Name: |
|
<INPUT TYPE=TEXT NAME=name><BR> |
|
Occupation: |
|
<INPUT TYPE=TEXT
NAME=occupation><BR> |
|
<INPUT TYPE=SUBMIT> |
|
</FORM> |
|
|
|
|
The query will be a URL-encoded string
containing the name,value pairs of all form fields. |
|
|
|
The server program (or a CGI script, or a
servlet) must decode the query and separate the individual fields. |
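The decoding step might look like the following Java sketch. It splits on ‘&’ and ‘=’ first and only then URL-decodes each piece, so that an encoded “%26” or “%3D” inside a value is not mistaken for a separator. The class name is hypothetical, and keeping only the last value for a repeated name is a simplification of this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParseSketch {
    // Splits "a=1&b=2" into fields, decoding names and values after splitting.
    static Map<String, String> parse(String query) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (query.isEmpty()) {
            return fields;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(decode(name), decode(value));
        }
        return fields;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("name=Dave+H.&occupation=none"));
    }
}
```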
|
|
|
|
|
|
The HTTP POST method delivers data from the
browser as the content of the request. |
|
|
|
The GET method delivers data (query) as part of
the URI. |
|
|
|
|
|
|
|
When using forms it’s generally better to use
POST: |
|
there are limits on the maximum size of a GET query string (a CGI program receives it in an environment variable) |
|
a POST query string doesn’t show up in the
browser as part of the current URL. |
|
|
|
|
|
|
Set the form method to POST instead of GET. |
|
|
|
<FORM METHOD=POST ACTION=…> |
|
|
|
The browser will take care of the details... |
|
|
|
|
If the request is a POST, the query is coming in
the body of the HTTP request. |
|
|
|
The “Content-Length” header tells us how much
data to read. |
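On the server side, reading the body could be sketched in Java as below. A single `read()` call may return fewer bytes than requested, so the loop keeps reading until the full length has arrived. The class name is hypothetical, and wrapping `IOException` in `UncheckedIOException` is just a convenience for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BodyReaderSketch {
    // Reads exactly contentLength bytes of request body from the stream.
    static byte[] readBody(InputStream in, int contentLength) {
        byte[] body = new byte[contentLength];
        int off = 0;
        try {
            while (off < contentLength) {
                int n = in.read(body, off, contentLength - off);
                if (n < 0) {
                    throw new EOFException("Stream ended after " + off + " bytes");
                }
                off += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body;
    }

    public static void main(String[] args) {
        byte[] data = "name=joe&cookie=oreo".getBytes(StandardCharsets.US_ASCII);
        byte[] body = readBody(new ByteArrayInputStream(data), data.length);
        System.out.println(new String(body, StandardCharsets.US_ASCII));
    }
}
```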
|
|
|
|
|
|
|
Each HTML form contains the following: |
|
<FORM>, </FORM> tags |
|
The <FORM> tag has two important
attributes (METHOD defaults to GET if it is omitted): |
|
METHOD specifies the HTTP method used to send
the request to the server (when the user submits the form). |
|
ACTION specifies the URL the request is sent to. |
|
|
|
|
|
|
We have seen the two common methods used: |
|
GET: any user input is submitted as part of the
URI following a “?”. |
|
GET /foo?name=joe&cookie=oreo HTTP/1.0 |
|
POST: any user input is submitted as the content
of the request (after the HTTP headers). |
|
|
|
|
POST /dir/foo HTTP/1.0 |
|
User-Agent: Netscape |
|
Content-Length: 20 |
|
Cookie: favorite=chocolatechip |
|
ECACChamps: RPI |
|
|
|
name=joe&cookie=oreo |
|
|
|
|
The ACTION attribute specifies the URL to which
the request is sent. Some examples: |
|
|
|
ACTION="http://www.cs.rpi.edu/CGI_BIN/foo" |
|
|
|
ACTION="myprog" |
|
|
|
ACTION="mailto:hollingd@cs.rpi.edu" |
|
|
|
|
<FORM METHOD="POST" |
|
ACTION="http://www.foo.com/cgi-bin/myprog"> |
|
|
|
<FORM METHOD="GET" ACTION="/cgi-bin/myprog"> |
|
|
|
<FORM METHOD="POST" |
|
ACTION="mailto:shirley@pres.rpi.edu"> |
|
|
|
|
|
|
Between the <FORM> and </FORM> tags
you define the text and fields that make up the form. |
|
You can use normal HTML tags to format the text
however you want. |
|
The fields are defined using tags as well. |
|
|
|
|
|
There are a variety of types of form fields: |
|
text fields: text, password, textarea |
|
radio buttons |
|
checkboxes |
|
buttons: user defined, submit, reset (clear) |
|
hidden fields |
|
|
|
|
|
|
There are a number of field types that allow the
user to type in a string value as input. |
|
|
|
Each field is created using an <INPUT> tag
with the attribute TYPE. |
|
|
|
|
The TYPE attribute is used to specify what kind
of input is allowed: TEXT, PASSWORD, FILE, ... |
|
|
|
Every INPUT tag must have a NAME attribute. |
|
|
|
|
|
|
|
|
TEXT is the most common type of input: |
|
user can enter a single line of text. |
|
Additional attributes can specify: |
|
the maximum string length - MAXLENGTH |
|
the size of the input box drawn by the browser -
SIZE |
|
a default value - VALUE |
|
|
|
|
<INPUT TYPE=TEXT NAME=FOO> |
|
|
|
<INPUT TYPE="TEXT" |
|
NAME="PIZZA" |
|
SIZE=10 |
|
MAXLENGTH=20 |
|
VALUE="Pepperoni"> |
|
|
|
|
<FORM METHOD=POST ACTION=cgi-bin/foo> |
|
Your Name: |
|
<INPUT TYPE=TEXT NAME="Name"><BR> |
|
|
|
Your Age: |
|
<INPUT TYPE=TEXT NAME="Age"><BR> |
|
|
|
</FORM> |
|
|
|
|
Another type of INPUT field is the submission
button. |
|
When a user clicks on a submit button the
browser submits the contents of all other fields to a web server using the
METHOD and ACTION attributes. |
|
|
|
<INPUT TYPE=SUBMIT VALUE="press me"> |
|
|
|
|
An INPUT of type RESET tells the browser to
display a button that will clear all the fields in the form. |
|
|
|
<INPUT TYPE=RESET |
|
VALUE="press me to clear form"> |
|
|
|
|
<FORM METHOD=POST ACTION=cgi-bin/foo> |
|
Your Name: |
|
<INPUT TYPE=TEXT NAME="Name"><BR> |
|
|
|
Your Age: <INPUT TYPE=TEXT NAME="Age"><BR> |
|
|
|
<INPUT TYPE=SUBMIT VALUE="Submit"> |
|
<INPUT TYPE=RESET> |
|
</FORM> |
|
|
|
|
|
|
|
|
Tables are often used to make forms look pretty
- remember that you can use any HTML tags to control formatting of a form. |
|
|
|
|
<FORM METHOD=POST ACTION=cgi-bin/foo> |
|
<TABLE><TR> |
|
<TD>Your Name: </TD> |
|
<TD><INPUT TYPE=TEXT NAME="Name"></TD> |
|
</TR><TR> |
|
<TD>Your Age:</TD> |
|
<TD> <INPUT TYPE=TEXT NAME="Age"></TD> |
|
</TR><TR> |
|
<TD><INPUT TYPE=SUBMIT VALUE="Submit"></TD> |
|
<TD><INPUT TYPE=RESET></TD> |
|
</TR></TABLE> |
|
</FORM> |
|
|
|
|
|
Checkboxes |
|
present user with items that can be selected or
deselected. Each checkbox has a name and a value and can be initially
selected/deselected |
|
Example checkbox definitions: |
|
<INPUT TYPE=checkbox name=chocchip
value=1> |
|
<INPUT TYPE=checkbox name=oreo value=1> |
|
|
|
|
<FORM METHOD=POST ACTION=cgi-bin/foo> |
|
Select all the cookies you want to
order:<BR> |
|
|
|
<INPUT TYPE=CHECKBOX NAME=Oreo Value=1> |
|
Oreo<BR> |
|
<INPUT TYPE=CHECKBOX NAME=Oatmeal Value=1> |
|
Oatmeal<BR> |
|
<INPUT TYPE=CHECKBOX CHECKED NAME=ChocChip
Value=1> |
|
Chocolate Chip<BR> |
|
|
|
<INPUT TYPE=SUBMIT VALUE=Submit> |
|
</FORM> |
|
|
|
|
|
|
|
Radio buttons are like checkboxes except that the
user can select only one item at a time. |
|
All radio buttons in a group have the same NAME. |
|
|
|
<INPUT TYPE=radio name=cookie
value=chocchip> |
|
<INPUT TYPE=radio name=cookie value=oreo> |
|
<INPUT TYPE=radio name=cookie
value=oatmeal> |
|
|
|
|
<FORM METHOD=POST ACTION=cgi-bin/foo> |
|
Select the cookie you want to
order:<BR> |
|
|
|
<INPUT TYPE=RADIO NAME=Cookie Value=Oreo>
Oreo <BR> |
|
<INPUT TYPE=RADIO NAME=Cookie
Value=Oatmeal> Oatmeal <BR> |
|
<INPUT TYPE=RADIO CHECKED NAME=Cookie
Value=ChocChip> ChocolateChip<BR> |
|
|
|
<INPUT TYPE=SUBMIT VALUE=Submit> |
|
</FORM> |
|
|
|
|
The TEXTAREA tag creates an area where the user
can submit multiple lines of text. |
|
This is not another type of <INPUT> tag! |
|
|
|
|
Each TEXTAREA tag has attributes NAME, COLS and
ROWS. |
|
|
|
<TEXTAREA name=address rows=5 cols=40> |
|
default text goes here (or can be empty) |
|
</TEXTAREA> |
|
|
|
|
<FORM METHOD=POST ACTION=cgi-bin/foo> |
|
Please enter your address in the space
provided:<BR> |
|
<TEXTAREA NAME=address COLS=40 ROWS=5> |
|
</TEXTAREA> |
|
<BR> |
|
<INPUT TYPE=SUBMIT VALUE=Submit> |
|
</FORM> |
|
|
|
|
|
|
|
When the user presses a SUBMIT button the
following happens: |
|
browser uses the FORM method and action
attributes to construct a request. |
|
A query string is built using the (name,value)
pairs from each form element. |
|
Query string is URL-encoded. |
|
|
|
|
For each checkbox selected the name,value pair
is sent. |
|
For all checkboxes that are not selected -
nothing is sent. |
|
A single name,value pair is sent for each group
of radio buttons. |
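These rules can be modeled with a short Java sketch. It is not how any browser is actually implemented; the `Choice` class is hypothetical, and the names and values are assumed to need no URL-encoding so that the selection logic stays in focus.

```java
import java.util.ArrayList;
import java.util.List;

class FormSubmitSketch {
    // One checkbox or radio button: its NAME, its VALUE, and whether it is selected.
    static class Choice {
        final String name;
        final String value;
        final boolean selected;

        Choice(String name, String value, boolean selected) {
            this.name = name;
            this.value = value;
            this.selected = selected;
        }
    }

    // Only selected items contribute a name=value pair to the query.
    static String buildQuery(List<Choice> choices) {
        StringBuilder query = new StringBuilder();
        for (Choice c : choices) {
            if (!c.selected) {
                continue; // unselected checkboxes and radio buttons send nothing
            }
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(c.name).append('=').append(c.value);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        List<Choice> form = new ArrayList<>();
        form.add(new Choice("Oreo", "1", false));         // unchecked checkbox
        form.add(new Choice("ChocChip", "1", true));      // checked checkbox
        form.add(new Choice("Cookie", "Oreo", false));    // radio group "Cookie"
        form.add(new Choice("Cookie", "Oatmeal", true));  // the one selected radio button
        System.out.println(buildQuery(form));
    }
}
```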
|
|
|
|
|
There are other form field types: |
|
SELECT - pulldown menu or scrolled list of
choices. |
|
Image Buttons |
|
Push Buttons (choice of submit buttons) |
|
|
|
|
|
|
Nothing is displayed by the browser. |
|
The name,value pair is sent along with the
submission request. |
|
<INPUT TYPE=HIDDEN |
|
NAME=SECRET |
|
VALUE=AGENT> |
|
|
|
|
|
Anyone can look at the source of an HTML
document. |
|
hidden fields are part of the document! |
|
If a form uses GET, all the name/value pairs are
sent as part of the URI |
|
URI shows up in the browser as the location of
the current page |
|