EIW Fall 2003 Lecture Notes

Internet Application Protocols


Application Level Protocols

There are many applications and services in wide use on The Internet. We will look at the application level protocols that support some of these services. These protocols are defined at the application layer in the OSI reference model (the highest layer) and involve the actual data exchanged by applications. They say nothing about how individual chunks of data are delivered on a network or routed among networks - these issues are dealt with at lower layers. Here we are concerned only with the structure and content of the data that is sent from one process to another. This data is the "payload" of the lower layers.

Datagram (message oriented) vs. Stream

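We know TCP/IP supports two types of inter-process communication:

  • Datagram (message oriented): UDP delivers individual messages, with no guarantee of delivery or ordering.
  • Stream: TCP delivers a reliable, ordered stream of bytes over a connection.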

In situations that involve complex requests that cannot be structured to fit within a single datagram, a stream based protocol will be easier to use. If requests and replies are small, then a datagram protocol might simplify things and result in faster communication (less overhead).
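
The difference can be seen in a short Python sketch that runs over the loopback interface. The function names udp_demo and tcp_demo are made up for this illustration, and the sketch assumes loopback delivery loses nothing, which is typical but not something UDP promises.

```python
import socket

def udp_demo():
    """Send two datagrams; each receive returns exactly one whole message."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"hello", receiver.getsockname())
        sender.sendto(b"world", receiver.getsockname())
        return [receiver.recvfrom(1024)[0] for _ in range(2)]
    finally:
        sender.close()
        receiver.close()

def tcp_demo():
    """Send the same two messages over a stream; the boundaries disappear."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.settimeout(2.0)
    try:
        client.sendall(b"hello")
        client.sendall(b"world")
        client.close()                   # closing marks the end of the stream
        chunks = []
        while True:
            data = conn.recv(1024)       # may return any number of bytes
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        conn.close()
        listener.close()
```

With datagrams the receiver gets two separate messages. With the stream it gets ten bytes and must work out for itself where one request ends and the next begins.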


Trivial File Transfer Protocol (TFTP) ref: RFC 783

TFTP is a simple file transfer protocol that is designed to be easy to implement. At one time TFTP was often used to bootstrap diskless workstations, so the client program needed to be resident in ROM. TFTP supports basic file transfer and nothing else - it does not support user authentication and isn't particularly efficient. Although this protocol is not used much today, it is a good example of a datagram-oriented protocol, and it shows how we can build reliability into a UDP based service (UDP itself doesn't provide reliability).

TFTP supports two types of requests: Read File and Write File. A read request means that the client is asking the server to send a file; a write request indicates that the client would like to send a file to the server. Since this protocol uses UDP to deliver datagrams, each individual message is delivered unreliably (the sender doesn't know whether the message was received) and the messages are not necessarily delivered in the order they were sent. TFTP supports transfer of large files, so a file transfer usually involves many messages (a large file can't fit into a single datagram).

There are 5 different message types:

Read Request - sent by the client when asking the server to send a file.

Write Request - sent by the client when asking the server to accept a file.

Data - the message contains part of a file being transferred.

ACK - acknowledgment of a data message received.

Error - the message contains an error code.

TFTP messages are not ASCII encoded messages (although some parts of messages are ASCII strings). Data is sent as raw bytes.

Each time a process receives a message (datagram) it must look at the first 2 bytes to determine which kind of message is contained in the datagram. Shown below are graphical representations of the five message types:


TFTP READ REQUEST


TFTP WRITE REQUEST


TFTP DATA MESSAGE


TFTP ACKNOWLEDGEMENT MESSAGE


TFTP ERROR MESSAGE


TFTP Error Codes
00 Not defined
01 File not found
02 Access Violation
03 Disk Full
04 Illegal TFTP message type
05 Unknown port
06 File already exists
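
As a rough illustration of the five message types, the following Python sketch builds each one as raw bytes. The field layouts and opcode values follow RFC 1350 (the successor to RFC 783); the helper names are invented for this example and are not part of any library.

```python
import struct

# Opcodes: the first 2 bytes of every TFTP message (network byte order).
RRQ, WRQ, DATA, ACK, ERROR = 1, 2, 3, 4, 5

def build_request(opcode, filename, mode="octet"):
    """Read/Write Request: opcode | filename | 0 | mode | 0."""
    return (struct.pack("!H", opcode)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")

def build_data(block, payload):
    """Data: opcode | block number | up to 512 bytes of file data."""
    return struct.pack("!HH", DATA, block) + payload

def build_ack(block):
    """ACK: opcode | block number."""
    return struct.pack("!HH", ACK, block)

def build_error(code, message):
    """Error: opcode | error code | ASCII error message | 0."""
    return struct.pack("!HH", ERROR, code) + message.encode("ascii") + b"\0"

def message_type(datagram):
    """Look at the first 2 bytes to find out which kind of message this is."""
    (opcode,) = struct.unpack("!H", datagram[:2])
    return opcode
```

For example, build_request(RRQ, "boot.img") yields the bytes 00 01, the ASCII file name, a zero byte, the ASCII string "octet", and a final zero byte. The file name is hypothetical.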

Each datagram sent during a TFTP session must be one of the 5 types shown above; otherwise an error results and the receiving system will terminate the session (it will ignore subsequent messages). A typical exchange of messages begins with an initial message from the client that is either a read request or a write request. We will look at a sample message trace for a read request (a write request is similar, although the data messages are sent from the client to the server and the server must send an ACK for each Data message received).

Client                              Server
Send Read Request
                                    Receive Read Request
                                    Send Data block #0
Receive Data block #0
Send ACK block #0
                                    Receive ACK block #0
                                    Send Data block #1
Receive Data block #1
Send ACK block #1 (is lost)
                                    Timeout waiting for ACK!
                                    Resend Data block #1
Receive Data block #1
Send ACK block #1
                                    Receive ACK block #1
                                    Send Data block #2 (<512 bytes)
Receive Data block #2
Send ACK block #2
Done                                Receive ACK block #2
                                    Done
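
The client side of the trace above can be sketched in Python as a function that processes data messages in arrival order. This is a simplified model under stated assumptions: messages are given as (block number, data) pairs rather than read from a socket, and the name receive_file and its first_block parameter are illustrative. RFC 1350 numbers data blocks starting at 1; passing first_block=0 follows the numbering used in the trace.

```python
BLOCK_SIZE = 512  # a data message shorter than this marks the end of the file

def receive_file(incoming, first_block=1):
    """Return (file_bytes, acks_sent) for a sequence of (block, data) pairs."""
    expected = first_block
    received = bytearray()
    acks = []
    for block, data in incoming:
        if block == expected:
            received += data
            acks.append(block)          # acknowledge the new block
            expected += 1
            if len(data) < BLOCK_SIZE:
                break                   # short block: transfer complete
        elif block == expected - 1:
            # Duplicate of the previous block: our ACK was lost, so the
            # server timed out and resent. ACK again, but don't store twice.
            acks.append(block)
        # Any other block number is ignored in this sketch.
    return bytes(received), acks
```

Feeding it the four data messages from the trace (block #1 arrives twice) produces the ACK sequence 0, 1, 1, 2 and a file in which block #1 appears only once.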

TCP Based Application Protocols

Most application protocols used on The Internet are based on TCP, so applications don't need to take care of message ordering or reliability - the transport layer takes care of this. Many of the TCP applications are based on exchanges of ASCII requests and replies, which makes it easy for humans to "play" with servers directly by using a generic TCP client that forwards keystrokes to the server (as ASCII text) and sends everything sent by the server to the screen. One such generic TCP client is the telnet program.

telnet - a generic TCP client.

telnet is available on most Unix systems and is part of Windows 95/98/2000/NT. The telnet program allows the user to specify the address of a server (as an IP address and port number) and attempts to open a TCP connection to the specified process. Once a connection is established the user can send a request by typing commands and view anything sent back by the server. telnet is useful only for services that involve ASCII strings, since binary data cannot typically be entered via a keyboard or displayed in a window. You may have used telnet to login to a remote Unix computer - in this case you are simply accessing the "default" port on the server machine you tell telnet to connect to. This default port is connected to a remote session server that starts up a shell so you can type commands as if you were sitting at the console. Although this is the most common use of telnet, it is no different from connecting to any other type of network service that is available via TCP. We will play with some other types of network services using the telnet program, but first we need to learn the application level protocols so we know what to type.
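
The same idea can be sketched in a few lines of Python: open a TCP connection, send some bytes, and collect whatever comes back until the server closes the connection. The function name send_and_receive is an illustrative choice, and the sketch assumes the server closes the connection when it has finished replying.

```python
import socket

def send_and_receive(host, port, request, timeout=5.0):
    """Send request bytes to host:port and return everything the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:            # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Unlike telnet this is not interactive, so it only suits protocols where a whole request can be sent up front.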

HTTP – Hyper-Text Transfer Protocol ref: RFC 2068

HTTP is the application level protocol used to transfer hypertext documents on the WWW. The protocol itself is fairly simple: a client (typically a browser) establishes a TCP connection to an HTTP server, sends a request in the form of an ASCII string, and expects a reply. The reply is often also formatted as an ASCII string, although many other data formats can be returned by the server (for example, images are sent as binary data). For now all we need to know is the structure of the request:

HTTP Request: An HTTP request is a sequence of lines of text - each line is terminated by a CR LF pair. When using the telnet program you can send a CR LF pair by pressing the Enter key. The first line includes three parts:

  1. The request-method. There are a variety of methods supported by HTTP, including GET, HEAD, and POST.
  2. a resource-identifier. There is a strict format (syntax) for the resource identifier - strings that adhere to this format are called URIs (Uniform Resource Identifiers). URIs are composed of simple alphanumeric names (some punctuation characters are permitted) delimited by the character "/". A URI looks like a UNIX file path; for example, the following are valid URIs:
    • /~hollingd/eiw
    • /
    • /cgi-bin/pizza_server
    • /foo/bar/foo/bar/foo/bar/foo/bar/
  3. an HTTP version identifier that specifies the version of HTTP that the client understands. This string starts with the prefix "HTTP/" and is followed by a version number (for example "HTTP/1.1").

The three parts of the first line are separated by white space (blanks). Some examples of the first line of an HTTP request follow:

GET /~hollingd/eiw HTTP/1.1

GET / HTTP/1.1

GET /cgi-bin/signup?name=dave&address=amos+eaton+119 HTTP/1.1

POST /cgi-bin/signup.perl HTTP/1.1

HEAD / HTTP/1.1

The remaining lines of an HTTP request are called "header lines". Each header contains additional information about the client or the request that may help the server provide a response. Each header includes a field name, followed by a colon ":", followed by a field value. There are many predefined header fields; some typical fields are shown below:

User-Agent: generic browser

From: hollingd@cs.rpi.edu

Referer: http://badguy.com/easytargets

Accept: */*

Cookie: favorite=chocolatechip;

The (possibly empty) list of header lines is terminated by a blank line (just a CRLF pair). Once the server sees the blank line it knows it has the complete request and sends back a reply. Each reply starts with a single status line, followed by a list of header lines terminated with a blank line. If the reply includes some content (typically some HTML), it follows the blank line. Although there are mechanisms that the client can use to make multiple requests using a single TCP connection, in general the server closes the connection as soon as it has sent a complete reply.
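
The request format described above can be assembled mechanically. The following Python sketch builds the bytes a client would send; the function name build_http_request is illustrative and the header values in the usage note are placeholders.

```python
def build_http_request(method, uri, headers=None, version="HTTP/1.0"):
    """Build a request: request line, header lines, then a blank line."""
    lines = [f"{method} {uri} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CR LF; an extra CR LF (blank line) ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

For instance, build_http_request("GET", "/", {"Accept": "*/*"}) produces the request line, one header line, and the terminating blank line. With no headers the result is just the request line followed by two CR LF pairs.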

Example HTTP Session:

The following shows an example session using telnet to connect to an HTTP server. The request (typed in by the user) is the first line and includes no header lines; everything after the blank line that follows it was sent by the server:

GET / HTTP/1.0

HTTP/1.0 200 Ok
Server: Xitami
Content-Type: text/html
Content-Length: 313
Last-Modified: Sat, 02 Sep 2000 14:21:28 GMT

<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft Word 97">
</HEAD>
<BODY>
<P>Hi Dave</P>
</BODY>
</HTML>
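
A reply like the one above can be taken apart by splitting at the first blank line. The following Python sketch is a simplified parser, assuming the whole reply has already been read into memory and that lines end in CR LF; the name parse_http_response is illustrative.

```python
def parse_http_response(raw):
    """Split a raw reply into (status_code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")    # blank line ends the headers
    lines = head.decode("ascii").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")      # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body
```

Field names are lower-cased here because HTTP header names are case-insensitive. The body is left as bytes since it may not be text.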


Email protocols

There are a number of protocols in use on The Internet that support electronic mail, and there are now also a number of very popular Web based email systems. Before exploring the application level protocols that support email we will look at the structure of the Internet based email system.

Overview of The Internet email system.

MTA is a Message Transfer Agent (a.k.a. an SMTP server). The MTAs forward and/or store email messages.

UA is a User Agent (a mail client). User Agents provide the user interface, and sometimes also act as an SMTP client or POP client.

SMTP - Simple Mail Transfer Protocol ref: RFC 821

SMTP is the protocol used by mail servers to exchange email messages. SMTP supports sending of email, but does not support extraction of a user's email from a server - that function is supported by a number of other protocols including POP and IMAP. SMTP conversations take place over a TCP connection and are based on a series of command-reply exchanges. This type of protocol is known as a "lock-step" protocol: a number of exchanges must take place in the proper order, and the entire sequence of exchanges makes up a transaction. Although SMTP servers support a variety of commands, we will just look at a typical exchange that results in the creation of an email message.

SMTP exchanges are based on lines of ASCII text, just like with the HTTP protocol. Each line coming from the client starts with a request type and any parameters follow the request type on the same line. Below are listed some of the major request types:

HELO Establish SMTP connection, identify the client.
MAIL FROM: Tells the server the client wants to initiate the sequence of steps necessary to create an email message. Includes the email address that identifies the sender of the message.
RCPT TO: Tells the server who the message is for (the email address of the recipient).
DATA Tells the server that the content of the message follows (email message content can include a number of email headers).
VRFY Verify that an email address is valid (used for local email addresses).

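To exchange an email message (the client is the sender), the following exchange occurs (typically):

  1. The client opens a TCP connection and the server sends a greeting (status 220).
  2. The client sends HELO; the server replies with status 250.
  3. The client sends MAIL FROM: with the sender's address; the server replies with status 250.
  4. The client sends RCPT TO: with the recipient's address; the server replies with status 250.
  5. The client sends DATA; the server replies with status 354.
  6. The client sends the lines of the message followed by a line containing only "."; the server replies with status 250.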

This exchange takes place in the order shown, and the client should wait for each status response before proceeding. The only time a line is sent without receiving a status response is during the DATA command.

Question: Since a '.' is used to terminate the DATA command, is it possible to have a line in an email message with nothing except a "."?

Here is a sample trace containing a complete SMTP conversation. Lines beginning with a three-digit status code were sent by the server; all other lines were sent by the client:

220 cs.rpi.edu ESMTP Sendmail 8.8.8/8.8.8; Sat, 14 Mar 1998 21:28:41 -0500 (EST)
HELO foo.cs.rpi.edu
250 cs.rpi.edu Hello hollingd@foo.cs.rpi.edu [128.213.4.203], pleased to meet you
MAIL FROM: TheKingOfSiam@king.com
250 TheKingOfSiam@king.com... Sender ok
RCPT TO: hollingd@cs.rpi.edu
250 hollingd@cs.rpi.edu... Recipient ok (will queue)
DATA
354 Enter mail, end with "." on a line by itself
Hi Dave - this message is a test of SMTP
My name is really Joe Smith and I think you are going to get
fired for teaching us how to forge email! Do you have any idea who
will be your replacement?
.
250 VAA07541 Message accepted for delivery
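
The client half of a conversation like this one can be generated by a short Python sketch. It only produces the lines to send; a real client would wait for the server's status reply after each command. The function name smtp_client_lines is illustrative. The angle brackets around addresses and the doubling of a leading "." in message lines follow RFC 821 (the trace above omits the brackets, which this server accepted).

```python
def smtp_client_lines(helo_host, sender, recipient, body_lines):
    """Return, in order, the lines a client sends to deliver one message."""
    lines = [
        f"HELO {helo_host}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    for line in body_lines:
        # A message line that starts with "." gets one extra "." prepended,
        # so it cannot be mistaken for the end-of-message marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")       # a lone "." ends the DATA command
    lines.append("QUIT")
    return lines
```

The receiving server strips the extra "." again, so the message arrives as it was written.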

POP - Post Office Protocol (version 3) ref: RFC 1939

POP is similar to SMTP: it is a command/reply lockstep protocol. Unlike SMTP, POP is used to retrieve mail for a single user; typically the POP server has access to a database of email messages created by an SMTP server. POP connections require authentication - the user must somehow "prove" they are who they say they are. Typically this proof is in the form of a secret that is shared by the user and the POP server (a password).

POP commands and replies are formatted as ASCII lines, and all replies start with either "+OK" or "-ERR". Some of the commands that make up the POP protocol are listed below:

USER specify username
PASS specify password
STAT get mailbox status (number of messages in the mailbox)
LIST get a list of messages and sizes, one per line, termination line contains just a period.
RETR retrieve a message.
DELE mark a message for deletion.
QUIT remove marked messages and close the (TCP) connection
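
Because every reply starts with "+OK" or "-ERR", a client can classify replies with very little code. The following Python sketch parses a single reply line and a multi-line LIST reply. The function names are illustrative, and the sample reply text in the usage note is made up for the example.

```python
def parse_status(line):
    """Return (ok, text) for a single POP reply line."""
    if line.startswith("+OK"):
        return True, line[3:].strip()
    if line.startswith("-ERR"):
        return False, line[4:].strip()
    raise ValueError(f"not a POP reply: {line!r}")

def parse_list(lines):
    """Parse a LIST reply into {message_number: size_in_bytes}."""
    ok, text = parse_status(lines[0])
    if not ok:
        raise RuntimeError(text)
    sizes = {}
    for line in lines[1:]:
        if line == ".":                 # termination line: just a period
            break
        number, size = line.split()
        sizes[int(number)] = int(size)
    return sizes
```

Given the lines "+OK 2 messages", "1 120", "2 200" and ".", parse_list returns a dictionary mapping message 1 to 120 bytes and message 2 to 200 bytes.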

FTP - File Transfer Protocol ref: RFC 959

FTP is a protocol that supports the transfer of files. This protocol is more complex than TFTP, but provides a richer set of services including authentication. FTP is supported by most Web browsers, so you can retrieve files from an FTP server by simply clicking on an appropriate hyperlink (or by entering an ftp URL).

There are two TCP connections used to transfer a file using FTP: the initial connection is used for an exchange of commands and replies, and the second connection is used to transfer a file (FTP supports transfer in either direction). Once the initial connection is established the client must authenticate itself with the server by supplying a username and password. Once authentication has been done the server will accept requests for file transfer. A file transfer includes establishing a second TCP connection, which can be made on any port: the "server" end of the connection creates a TCP endpoint and then sends the peer the new port number.
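
One way the port number is communicated is the PORT command defined in RFC 959, which encodes an IP address and port as six decimal numbers separated by commas. The following Python sketch shows the encoding and decoding; the function names are illustrative and the address and port in the usage note are arbitrary.

```python
def format_port_command(ip, port):
    """Encode an endpoint as 'PORT h1,h2,h3,h4,p1,p2'."""
    p1, p2 = divmod(port, 256)          # high byte and low byte of the port
    return "PORT " + ",".join(ip.split(".") + [str(p1), str(p2)])

def parse_port_argument(arg):
    """Decode 'h1,h2,h3,h4,p1,p2' back into (ip, port)."""
    parts = [int(x) for x in arg.split(",")]
    ip = ".".join(str(x) for x in parts[:4])
    return ip, parts[4] * 256 + parts[5]
```

For example, the endpoint 128.213.4.203 port 5001 is sent as "PORT 128,213,4,203,19,137", since 19 * 256 + 137 = 5001.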

Standard FTP Connection Model (2 processes)

FTP can also be used by a client to transfer files between two remote computers. In this case, the client establishes connections to both servers and then tells the servers to exchange the desired file.

FTP Alternative Connection Model (3 processes)