CSCI-4220 Network Programming

Project 2 - Proxy Web Server
Frequently Asked Questions


karpes@rpi.edu suggests that browsing http://www.wired.com/ makes a good "torture test" for a proxy.

I can't verify this, since Netscape 4.07 for Linux crashes as soon as I try to visit the site, even without a proxy!

Q: I'm trying to use gethostbyaddr to determine the hostname of the client, but it always returns NULL:
  struct sockaddr_in from;      
  ...
  if ( (sd = accept( ld, (struct sockaddr*) &from, &addrlen)) < 0) {
  ...
  if((hptr=gethostbyaddr(&from,sizeof(from),AF_INET))==NULL){
  ...

A: gethostbyaddr expects only the IP address, not the entire sockaddr_in. You want something like this:

  if((hptr=gethostbyaddr(&from.sin_addr,sizeof(from.sin_addr),AF_INET))==NULL){
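
For reference, here is a minimal sketch of the whole lookup wrapped in a function. The name client_name and the fallback to the dotted-decimal address are assumptions made for illustration, not requirements of the assignment.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* client_name() is a hypothetical helper, not part of the assignment.
 * It writes the client's hostname into buf, falling back to the
 * dotted-decimal address when the reverse lookup fails. */
const char *client_name(const struct sockaddr_in *from, char *buf, size_t len)
{
    struct hostent *hptr;

    assert(from != NULL && buf != NULL && len > 0);

    /* Pass only the 4-byte address, never the whole sockaddr_in. */
    hptr = gethostbyaddr((const char *)&from->sin_addr,
                         sizeof(from->sin_addr), AF_INET);
    snprintf(buf, len, "%s",
             hptr != NULL ? hptr->h_name : inet_ntoa(from->sin_addr));
    return buf;
}
```

Once accept() has filled in from, a call such as client_name(&from, name, sizeof(name)) yields a printable name even when the reverse lookup fails.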

Q: Could you give us some reachable URLs that include a port number, for testing purposes?

A: I found one: http://www.lpwa.com:8000/

Q: When using a browser connected to my proxy I sometimes see a bunch of GET requests even though I've only told the browser to get a single document - is this normal?

A: Yes, this is normal. Once the browser tries to render some HTML it may find that it needs some images, so for each image it has to make another GET request.

Q: How important is the RFC931 authentication lookup (how many points)?

A: The RFC931 lookup is worth 5 points (out of 20).

Q: The Textbook talks about inet_ntop and inet_pton functions, but I can't find them.

A: These functions were added alongside IPv6 support and are not available on most machines. You can use the IPv4-specific functions inet_ntoa and inet_aton instead. The CS BSD machines do seem to support inet_ntop and inet_pton, although there are no man pages for them.
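
A minimal sketch of the IPv4-only pair, in case it helps. The helper name roundtrip_ipv4 is made up for this example; inet_aton and inet_ntoa are the standard calls.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* roundtrip_ipv4() is a hypothetical helper: it parses a dotted-decimal
 * string with inet_aton and formats it back with inet_ntoa.
 * Returns 0 on success, -1 if text is not a valid IPv4 address. */
int roundtrip_ipv4(const char *text, char *out, size_t len)
{
    struct in_addr addr;

    assert(text != NULL && out != NULL);

    if (inet_aton(text, &addr) == 0)   /* inet_aton returns 0 on failure */
        return -1;

    /* inet_ntoa returns a pointer to a static buffer, so copy it at once. */
    snprintf(out, len, "%s", inet_ntoa(addr));
    return 0;
}
```

Keep in mind that inet_ntoa reuses one static buffer, so a second call overwrites the result of the first.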

Q: Can I see the source code for the sample proxy web server?

A: No. Only the executable is available.

Q: I get unresolved references when trying to build my project on a machine running Solaris.

A: You need to add the sockets library and the name server library - something like this:

    gcc -o server server.o -lsocket -lnsl

Q: Is this a valid URL to test our project on:
       GET http://www.rpi.edu:12345/~blah/foo.html HTTP/1.0
where the port is given and there is a path as well?

A: Yes - you need to be able to handle any URL, including those that specify port numbers.
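
One possible way to split such a URL is sketched below. The function parse_url, its argument list, the default port of 80 and the default path of "/" are assumptions for illustration. It only accepts a lowercase "http://" prefix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* parse_url() is a hypothetical helper. It splits an absolute http:// URL
 * into host, port (80 if none is given) and path ("/" if none is given).
 * Returns 0 on success, -1 if the URL is malformed or a field does not fit. */
int parse_url(const char *url, char *host, size_t hostlen,
              int *port, char *path, size_t pathlen)
{
    static const char prefix[] = "http://";
    const size_t plen = sizeof(prefix) - 1;
    const char *h, *end, *rest;
    size_t n;

    assert(url != NULL && host != NULL && port != NULL && path != NULL);

    if (strncmp(url, prefix, plen) != 0)
        return -1;
    h = url + plen;                      /* start of the hostname */
    end = h + strcspn(h, ":/");          /* first ':' or '/' ends the host */
    n = (size_t)(end - h);
    if (n == 0 || n >= hostlen)
        return -1;
    memcpy(host, h, n);
    host[n] = '\0';

    *port = 80;
    if (*end == ':') {                   /* explicit port number */
        char *after;
        long p = strtol(end + 1, &after, 10);
        if (after == end + 1 || p < 1 || p > 65535)
            return -1;
        if (*after != '/' && *after != '\0')
            return -1;
        *port = (int)p;
        end = after;
    }

    rest = (*end == '/') ? end : "/";    /* everything after host[:port] */
    if (strlen(rest) >= pathlen)
        return -1;
    strcpy(path, rest);
    return 0;
}
```

For the URL in the question this gives host "www.rpi.edu", port 12345 and path "/~blah/foo.html".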

Q: My program works perfectly for some URLs, but it doesn't work for others.

A: You may be bumping into problems with the transition between HTTP 1.0 and HTTP 1.1. According to the HTTP 1.1 spec, proxy servers must remove the "http://hostname:port" part of the URI, although the spec also requires that all HTTP servers be able to deal with absolute URIs (those that include the "http://hostname:port" part). It appears that not all the servers on the WWW can deal with absolute URIs, so the best thing to do is to make sure you don't forward the "http://hostname:port" part of the request. My proxy does this and seems to work fine.
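
As an illustration only, the rewrite of the request line could look like the sketch below. The name forward_request_line is hypothetical, and the code assumes the line starts with exactly "GET http://". This is not the code my proxy uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* forward_request_line() is a hypothetical helper. Given a request line
 * such as "GET http://host:port/path HTTP/1.0", it writes the line to send
 * to the real server, "GET /path HTTP/1.0\r\n", into out.
 * Returns 0 on success, -1 if the line is not in the expected form. */
int forward_request_line(const char *line, char *out, size_t len)
{
    static const char prefix[] = "GET http://";
    const size_t plen = sizeof(prefix) - 1;
    const char *rest, *sp, *slash;
    size_t vlen;
    int n;

    assert(line != NULL && out != NULL);

    if (strncmp(line, prefix, plen) != 0)
        return -1;
    rest = line + plen;                  /* host[:port][/path] HTTP/x.y */
    sp = strchr(rest, ' ');              /* space before the HTTP version */
    if (sp == NULL)
        return -1;
    slash = memchr(rest, '/', (size_t)(sp - rest));  /* start of the path */
    vlen = strcspn(sp, "\r\n");          /* " HTTP/x.y" without line ending */

    if (slash != NULL)
        n = snprintf(out, len, "GET %.*s%.*s\r\n",
                     (int)(sp - slash), slash, (int)vlen, sp);
    else                                 /* no path given: ask for "/" */
        n = snprintf(out, len, "GET /%.*s\r\n", (int)vlen, sp);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The hostname and port still have to be extracted from the original line before it is rewritten, since they determine where the proxy connects.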

Q: Help! I don't understand what a proxy web server is!!!

A: A proxy web server accepts HTTP requests through a known TCP port and forwards each request to the real server, sending any reply back to the client. So your program must be able to act as a server (to receive HTTP requests) and as a client (to make the HTTP requests to the real server). You will need to write code that does something like this:

  1. Establish a passive mode TCP socket and print out the port number bound to the socket.
  2. Accept a TCP connection from an HTTP client.
  3. Read the first line sent by the client and parse it. If it is not a GET request you can ignore it (close the connection and go back to step 2). If it is a GET request you need to parse the URL to determine the hostname and port number. Here is an example URL:
    http://www.foo.com:1234/funny/pages
    In this case your program needs to know that the HTTP server it should contact is running on the host www.foo.com on port 1234.
  4. Establish a TCP connection to the host and port specified in the URL (creating a new TCP socket and calling connect()).
  5. Forward the GET request (and any following HTTP header lines) to the HTTP server.
  6. Read everything sent back by the HTTP server and send it back through the socket to the client.
  7. Close the connections to the server and client.
  8. Go back to step 2 to handle the next client.

These steps make up a minimal, iterative proxy HTTP server. For this assignment you also need to do an RFC931 lookup on the client and print the result of the lookup along with the HTTP request.
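
Step 1 is the one that most often causes confusion, so here is a rough sketch of it. The helper name open_listener is made up, and binding to port 0 (letting the kernel choose a free port) is an assumption. The steps above only require that you print whichever port ends up bound.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* open_listener() is a hypothetical helper for step 1. It creates a passive
 * TCP socket on a kernel-chosen port and stores that port in *port.
 * Returns the listening descriptor, or -1 on error. */
int open_listener(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int ld;

    assert(port != NULL);

    ld = socket(AF_INET, SOCK_STREAM, 0);
    if (ld < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);            /* 0 = let the kernel pick a port */

    if (bind(ld, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(ld, 5) < 0 ||
        getsockname(ld, (struct sockaddr *)&addr, &len) < 0) {
        close(ld);
        return -1;
    }

    *port = ntohs(addr.sin_port);        /* the port actually bound */
    return ld;
}
```

The caller would print *port and then loop on accept(ld, ...) for steps 2 through 8.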

Q: Regarding the auth protocol, the RFC gives the format of the response as:
<local_port>, <foreign_port> : <message> : <additional-info>
Obviously the local and foreign ports cannot contain white space. Should we expect the message or additional-info to contain white space, or can we assume they do not?

A: The RFC states (end of page 4):

Notes on Syntax:

      1)  White space (blanks and tab characters) between tokens is
      not important and may be ignored.

      2)  White space, the token separator character (":"), and the
      port pair separator character (",") must be quoted if used within a
      token.  The quote character is a back-slash, ASCII 92 (decimal)
      ("\").  For example, a quoted colon is "\:".  The back-slash
      must also be quoted if its needed to represent itself ("\\")."

It is safe to assume that the message and additional-info don't contain (quoted) white space. Quoted white space is legal and a real application should handle it, but we won't check for it during grading.
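
For anyone who does want to handle the quoting rules, the sketch below shows one way to pull tokens out of a reply. The function next_token is hypothetical. It treats both ":" and "," as separators and drops only unquoted surrounding white space, which is my reading of the syntax notes above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* next_token() is a hypothetical helper. It copies the next token of a reply
 * line into out, stopping at an unquoted ':' or ',' (or at the end of the
 * line), dropping unquoted surrounding white space and removing the
 * back-slash from quoted characters. It returns a pointer just past the
 * separator, or NULL if the token does not fit in out. */
const char *next_token(const char *p, char *out, size_t len)
{
    size_t n = 0;      /* characters stored so far */
    size_t keep = 0;   /* length up to the last character that must be kept */

    assert(p != NULL && out != NULL && len > 0);

    while (*p == ' ' || *p == '\t')      /* skip leading white space */
        p++;

    while (*p != '\0' && strchr(":,\r\n", *p) == NULL) {
        int quoted = (*p == '\\' && p[1] != '\0');
        if (quoted)
            p++;                         /* drop the back-slash itself */
        if (n + 1 >= len)
            return NULL;
        out[n++] = *p;
        if (quoted || (*p != ' ' && *p != '\t'))
            keep = n;                    /* not trailing white space */
        p++;
    }
    out[keep] = '\0';                    /* cut unquoted trailing white space */

    if (*p == ':' || *p == ',')
        p++;                             /* step over the separator */
    return p;
}
```

Calling it repeatedly on a reply yields the local port, the foreign port, the message, and then whatever tokens make up the additional-info.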

Q: The authentication daemon isn't working. What am I doing wrong?

A: Double check RFC 931 to make sure you understand how authd works. I suggest using telnet to play with the authentication daemon until you get a good feel for what is going on. In particular, make sure you are connecting to port 113 when talking to the authentication daemon. Make sure your proxy server is talking to the authentication daemon on the client rather than the proxy server's host. Make sure you are sending the ports to the authentication server in the correct order. If none of these solve your problem then make sure that authd is actually running on the client host. This can be done by opening a telnet connection to port 113 on the client's host and entering any two values and seeing if you get a response (even an error message) from the authentication daemon. If the authentication server is not running on the client host then run the client from a different machine. Most of the RCS machines have an authentication daemon.
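
Port order is the most common mistake, so here is a small sketch of building the query. It rests on my reading of RFC 931: the pair is given from the point of view of the host running authd, so the client's port comes first and the proxy's port second. The helper name format_auth_query is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* format_auth_query() is a hypothetical helper. It writes the query
 * "<client_port> , <proxy_port>\r\n" into buf and returns its length,
 * or -1 if buf is too small.
 * Assumption: client_port is the client's end of the accepted connection
 * and proxy_port is the port the proxy accepted that connection on. */
int format_auth_query(unsigned client_port, unsigned proxy_port,
                      char *buf, size_t len)
{
    int n;

    assert(buf != NULL);

    n = snprintf(buf, len, "%u , %u\r\n", client_port, proxy_port);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The client's port is available as ntohs(from.sin_port) after accept(), and the proxy's port can be read with getsockname() on the accepted socket.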

Q: If I run into an error with the authentication daemon but can still get the hostname correctly, do I still print out the hostname?

A: You should still print out the hostname. For example, if the user connects from pres.gov and there is an error communicating with authd on pres.gov, your proxy should print out:

    unknown@pres.gov

rather than just:

    unknown