| Question:   | When my server is running and the browser sends a crawl request to it, suppose the depth requirement is 3. While my server is still crawling the web, before it has finished, the browser disconnects from my server. What should happen to my server? In this case, I get a "broken pipe" error and my server crashes. I guess the reason is that before my server gets the 'FIN' signal, it still tries to send more information to the browser, but fails. Is that correct? How should I deal with this problem and keep my server from crashing? |
| Answer:   |
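By default, the SIGPIPE signal kills a process. Your process receives this signal when it writes to a socket that has been closed by the other endpoint. To handle this, you should either ignore SIGPIPE or install a handler for it, and then check the return value of every write(). When write() returns -1 with errno set to EPIPE, the client has gone away, so close that socket and move on to the next client.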
The book contains a complete description of this situation, and sample code for dealing with it. |
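One common way to do this, sketched here in C and assuming a POSIX system, is to ignore SIGPIPE once at startup so that a write to a disconnected client fails with EPIPE instead of killing the server. The helper names ignore_sigpipe() and write_all() are hypothetical, not taken from the book:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE so that writing to a socket whose peer has closed
 * returns -1 with errno == EPIPE instead of terminating the process.
 * Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Write all len bytes of buf to fd, retrying after short writes and
 * EINTR. Returns 0 on success, or -1 with errno set (for example EPIPE
 * when the client has disconnected). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;      /* EPIPE, ECONNRESET, ...: caller gives up */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In the crawl loop, a failed write_all() is the signal to stop crawling for that client, close its socket, and go back to accept().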
| | |
| Question:   | What kind of URI should we expect from the client (browser)? |
| Answer:   | The URI sent by the client as part of the HTTP request line will
be a string starting with '/' followed by a hostname and path. Following
the path will be a single '?', followed by a number. The hostname and
path are to be used as the starting point for your crawl. For example,
if the URI received is:
/www.foo.com/blah/foo.html?
then the first page you should fetch is http://www.foo.com/blah/foo.html. You do not need to worry about URIs that contain multiple '?'s as part of the request you receive from the client. |
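A minimal sketch of splitting such a URI in C is shown below. The function name parse_crawl_uri() and its conventions are assumptions for illustration: the path keeps its leading '/', a URI with no path gets "/", and anything other than digits after the '?' is rejected.

```c
#include <stdlib.h>
#include <string.h>

/* Split a request URI of the form "/host/path?number" into its parts.
 * Returns 0 on success, -1 if the URI is malformed or a buffer is too
 * small. (Hypothetical helper, not part of the assignment handout.) */
int parse_crawl_uri(const char *uri, char *host, size_t hostsz,
                    char *path, size_t pathsz, long *number)
{
    if (uri[0] != '/')
        return -1;
    const char *start = uri + 1;          /* skip the leading '/' */
    const char *q = strchr(start, '?');
    if (q == NULL)
        return -1;                        /* no "?number" part */

    /* The host ends at the first '/' before the '?', or at the '?'. */
    const char *slash = memchr(start, '/', (size_t)(q - start));
    const char *host_end = slash ? slash : q;
    size_t hostlen = (size_t)(host_end - start);
    if (hostlen == 0 || hostlen >= hostsz)
        return -1;
    memcpy(host, start, hostlen);
    host[hostlen] = '\0';

    if (slash) {
        size_t pathlen = (size_t)(q - slash);
        if (pathlen >= pathsz)
            return -1;
        memcpy(path, slash, pathlen);
        path[pathlen] = '\0';
    } else {
        if (pathsz < 2)
            return -1;
        strcpy(path, "/");                /* no path given: use the root */
    }

    char *end;
    *number = strtol(q + 1, &end, 10);
    if (end == q + 1 || *end != '\0')
        return -1;                        /* missing number or trailing junk */
    return 0;
}
```

Given the example URI above with a number after the '?', this fills host with "www.foo.com" and path with "/blah/foo.html", which together form the first page to fetch.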
| | |
| Question:   | What should we do with HTTP response codes other than "OK"? |
| Answer:   | If an HTTP server sends back any response code other than 200 ("OK"), you can treat the response as a blank page: you don't need to follow any links in the content of the page, and you don't need to follow redirects. |
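As a sketch, the check can be made on the first line of the response (the status line, e.g. "HTTP/1.0 200 OK"). The helper names below are hypothetical, and the parsing with sscanf() is deliberately simple:

```c
#include <stdio.h>

/* Extract the numeric status code from an HTTP status line such as
 * "HTTP/1.0 200 OK". Returns the code, or -1 if the line is malformed. */
int http_status_code(const char *status_line)
{
    int major, minor, code;
    if (sscanf(status_line, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Anything but 200 is treated as a blank page: no links are extracted,
 * and redirects (3xx) are not followed. */
int should_parse_links(const char *status_line)
{
    return http_status_code(status_line) == 200;
}
```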
| | |
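| Question:   | When read() reads data into a character buffer, is a null byte appended to the end of the data? |
| Answer:   | No. read() doesn't know anything about C strings or null terminators. It returns the number of bytes it placed in the buffer; if you want to treat the data as a string, you must add the '\0' yourself (and leave room for it). |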
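A minimal sketch of doing that yourself: ask read() for one byte less than the buffer size, then store the terminator at the returned count. The name read_as_string() is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Read up to bufsz - 1 bytes from fd and null-terminate the result so it
 * can be used with C string functions. Returns the count from read()
 * (0 at end of file), or -1 on error, in which case buf is left as "". */
ssize_t read_as_string(int fd, char *buf, size_t bufsz)
{
    if (bufsz == 0)
        return -1;
    ssize_t n = read(fd, buf, bufsz - 1);   /* leave room for the '\0' */
    if (n < 0) {
        buf[0] = '\0';
        return -1;
    }
    buf[n] = '\0';                          /* read() never does this for you */
    return n;
}
```

Note that if the data itself contains a '\0' byte, string functions will stop there; the return value is the true byte count.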
| | |
| Question:   | I am unsure what type of input our server has to handle. What if someone is using a Telnet client (or the browser sends a corrupted request)? Also, my server will sit there and wait until read returns a string with '\r\n'. This is OK as long as more data is on the way, but if there is no end-of-line marker the server will wait forever. Is this acceptable? |
| Answer:   | Your server expects an HTTP request, which requires end-of-line markers (either "\n" or "\r\n"). It's fine if it sits there until it gets an end-of-line marker (it can't do anything else!). Your server should not crash if I use telnet and give it a messed-up request, but it doesn't need to handle the request other than to close the connection and move on to the next client. In general, it's best to send an HTTP response that indicates there was an error (but this is not required for this assignment). |
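One way to read the request line without crashing on bad input is sketched below, assuming a fixed-size buffer: read one byte at a time (simple, not fast), accept "\n" or "\r\n", and report failure if the client disconnects or the line does not fit. The names read_request_line() and reject_request() are hypothetical, and the 400 response is the optional courtesy mentioned above.

```c
#include <sys/types.h>
#include <unistd.h>

/* Read one line ending in "\n" or "\r\n" from fd into buf. The terminator
 * is stripped and buf is null-terminated. Returns the line length, or -1
 * if the peer closed the connection, read() failed, or the line did not
 * fit in buf. */
ssize_t read_request_line(int fd, char *buf, size_t bufsz)
{
    size_t len = 0;
    if (bufsz == 0)
        return -1;
    for (;;) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n <= 0)
            return -1;          /* EOF or error before end of line */
        if (c == '\n')
            break;
        if (len + 1 >= bufsz)
            return -1;          /* too long: treat as a bad request */
        buf[len++] = c;
    }
    if (len > 0 && buf[len - 1] == '\r')
        len--;                  /* accept "\r\n" as well as "\n" */
    buf[len] = '\0';
    return (ssize_t)len;
}

/* Optional: tell the client its request was bad, then close the socket.
 * Assumes SIGPIPE is ignored or handled, since the client may be gone. */
void reject_request(int fd)
{
    static const char resp[] =
        "HTTP/1.0 400 Bad Request\r\nContent-Length: 0\r\n\r\n";
    if (write(fd, resp, sizeof resp - 1) < 0) {
        /* best effort only: nothing more to do */
    }
    close(fd);
}
```

On -1 from read_request_line(), the server can call reject_request() (or just close()) and return to accept() for the next client.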
| | |
| Question:   | What if the server sends more than Content-Length bytes in
the response? |
| Answer:   | The server could do this; remember that even the server can be written
by a bad guy! You should use only the first Content-Length
bytes sent, but your program should not crash or allow a buffer to be overflowed
if the server sends more than it says it will! |
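A sketch of a bounded body read, assuming the headers have already been consumed and Content-Length has been parsed into a number (and that no body bytes are sitting in an earlier header buffer). The name read_body() is hypothetical. It never asks read() for more than the smaller of Content-Length and the free space in the buffer, so extra bytes from a lying server are simply left unread:

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the response body from fd into buf, storing at most
 * min(content_length, bufsz - 1) bytes and null-terminating the result.
 * Returns the number of bytes stored, or -1 on a read error. A body that
 * ends early (server closed the connection) is not treated as an error. */
ssize_t read_body(int fd, char *buf, size_t bufsz, size_t content_length)
{
    if (bufsz == 0)
        return -1;
    size_t want = content_length < bufsz - 1 ? content_length : bufsz - 1;
    size_t got = 0;
    while (got < want) {
        /* Never request more than still fits within the limit. */
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* server closed before Content-Length */
        got += (size_t)n;
    }
    buf[got] = '\0';
    return (ssize_t)got;
}
```

After this returns, the crawler can close the connection; anything the server sent beyond the limit is discarded with the socket.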