Java Programming Spring 2006 Homework #5

RSS Parsing

Due Date: 4/13

Assignment

This assignment involves writing a Java program that can fetch and parse any number of RSS 2.0 files.

NOTE: Feel free to handle other RSS formats if you want; only RSS 2.0 is required.

An RSS (Really Simple Syndication) file is a summary of the content available on a web site. Many web sites publish RSS files so that RSS readers can present a concise summary of the content available at any given time.

Your job is to write an RSS reader in Java. Your program will be given any number of URLs on the command line; assume that each URL is the location of an RSS file. Your program should fetch each RSS file (use a URL object for this!) and parse it, printing the title of each item it finds.
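The fetching step can be sketched as follows. This is only an illustration, not the required solution: the class name RssFetcher and the method fetch are hypothetical, and the parsing step is left as a placeholder. Each argument is turned into a java.net.URL and read through openStream(). The raw bytes are returned instead of a String so that the XML parser, not our code, decides the character encoding declared in the file. A bad URL is reported and skipped so the remaining URLs are still processed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;

/**
 * Hypothetical sketch of the fetching step: every command-line argument is
 * treated as the location of an RSS file and downloaded through a URL object.
 */
public class RssFetcher {

    /**
     * Downloads the resource named by spec and returns its raw bytes.
     *
     * @param spec an absolute URL, e.g. an http: or file: URL
     * @return the raw contents of the resource
     * @throws IOException if the resource cannot be opened or read
     * @throws IllegalArgumentException if spec is not a valid absolute URI
     */
    public static byte[] fetch(String spec) throws IOException {
        // URI.create(...).toURL() yields a java.net.URL without the
        // deprecated URL(String) constructor
        URL url = URI.create(spec).toURL();
        try (InputStream in = url.openStream()) {
            return in.readAllBytes();
        }
    }

    /**
     * Fetches each URL given on the command line.
     *
     * @param args the URLs of the RSS files
     */
    public static void main(String[] args) {
        for (String spec : args) {
            try {
                byte[] rss = fetch(spec);
                // Placeholder: hand the bytes to a DOM or SAX parser here
                System.out.println(spec + ": fetched " + rss.length + " bytes");
            } catch (IOException | IllegalArgumentException e) {
                // Report the failure and continue with the remaining URLs
                System.err.println("Could not fetch " + spec + ": " + e.getMessage());
            }
        }
    }
}
```

Usage: java RssFetcher http://www.cs.rpi.edu/~hollingd/opsys/rss/opsys.rss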

To do this, you need to know what an RSS 2.0 file looks like (you need to understand the tags and the tag hierarchy). There are many references available on the web; Wikipedia is a good starting point for information about RSS file formats. In general, each RSS file looks something like this (available at the URL http://www.cs.rpi.edu/~hollingd/opsys/rss/opsys.rss):

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
 <channel>
  <title>OpSys Spring 2006</title>
  <description>OpSys Spring 2006</description>
  <link>http://www.cs.rpi.edu/~hollingd/opsys</link>

  <item>
   <title>Course Syllabus</title>
   <link>http://www.cs.rpi.edu/~hollingd/opsys/syllabus.pdf</link>
  </item>

  <item>
   <title>Test #2 is on Mar 31st</title>
   <link>http://www.cs.rpi.edu/~hollingd/opsys/tests/test2-topics.html</link>
  </item>

  <item>
   <title>Test #1 is on Feb 24th</title>
   <link>http://www.cs.rpi.edu/~hollingd/opsys/tests/test1-topics.html</link>
  </item>

  <item>
   <title>HW 1 (due 2/7)</title>
   <link>http://www.cs.rpi.edu/~hollingd/opsys/hw/hw1/hw1.html</link>
   <description>C Programming, readline and regular expressions.</description>
  </item>

  <item>
   <title>Lecture Notes: Threads Programming</title>
   <link>http://www.cs.rpi.edu/~hollingd/opsys/notes/ThreadsProg/ThreadsProg.pdf</link>
  </item>

  <item>
   <title>Lecture Notes: Chapter 2: Processes and Threads
</title>
   <link>http://www.cs.rpi.edu/~hollingd/opsys/notes/Chapter2/Chapter2.pdf</link>
  </item>

  <item>
   <title>Lecture Notes: C Programming for C++ Programmers (review)</title>
   <link>http://www.cs.rpi.edu/~hollingd/opsys/notes/cprogcc/cprogcc.pdf</link>
  </item>

 </channel>
</rss>

As you can see, the format is quite simple. Your job is to print the title of each item (the <title> element inside each <item>), not the title of the channel itself. Given the above RSS, your program should print this:

   Course Syllabus
   Test #2 is on Mar 31st
   Test #1 is on Feb 24th
   HW 1 (due 2/7)
   Lecture Notes: Threads Programming
   Lecture Notes: Chapter 2: Processes and Threads
   Lecture Notes: C Programming for C++ Programmers (review)
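One way to do this with a DOM parser is sketched below, assuming the JDK's built-in javax.xml.parsers package; the class DomRssReader and the method itemTitles are hypothetical names, not a required interface. It finds every <item> element and reads only that item's direct <title> child, so the channel's own title is never printed. Calling trim() removes the stray line break inside the "Chapter 2" title in the sample above, which is why the expected output shows it on one line.

```java
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Hypothetical DOM-based RSS 2.0 reader that prints the title of each item.
 */
public class DomRssReader {

    /**
     * Parses an RSS 2.0 document and collects the title of every item.
     *
     * @param in the RSS document
     * @return the trimmed item titles, in document order
     * @throws Exception if the document cannot be read or parsed
     */
    public static List<String> itemTitles(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            // Only direct children of <item> are examined, so the
            // channel-level <title> can never be picked up
            for (Node child = items.item(i).getFirstChild(); child != null;
                 child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "title".equals(child.getNodeName())) {
                    titles.add(child.getTextContent().trim());
                    break;
                }
            }
        }
        return titles;
    }

    /**
     * Fetches each URL given on the command line and prints its item titles.
     *
     * @param args the URLs of the RSS files
     * @throws Exception if a URL cannot be fetched or parsed
     */
    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            try (InputStream in = URI.create(arg).toURL().openStream()) {
                for (String title : itemTitles(in)) {
                    System.out.println(title);
                }
            }
        }
    }
}
```

DOM builds the whole document tree in memory, which is convenient here because feeds like the one above are small.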

You can use either a DOM or SAX parser.
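With SAX, the parser streams events instead of building a tree, so the program has to remember where it is. The sketch below assumes the JDK's javax.xml.parsers.SAXParser with a non-namespace-aware factory (the default), so element names arrive in qName. The names SaxRssReader, ItemTitleHandler, inItem and inItemTitle are hypothetical. Text is accumulated in a StringBuilder because a parser may deliver the text of one element across several characters() calls.

```java
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX-based RSS 2.0 reader that prints the title of each item.
 */
public class SaxRssReader {

    /** Collects item titles as SAX events stream past. */
    static class ItemTitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inItem = false;
        private boolean inItemTitle = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("item")) {
                inItem = true;
            } else if (inItem && qName.equals("title")) {
                // A <title> only counts when it sits inside an <item>
                inItemTitle = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign
            if (inItemTitle) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inItemTitle && qName.equals("title")) {
                titles.add(text.toString().trim());
                inItemTitle = false;
            } else if (qName.equals("item")) {
                inItem = false;
            }
        }
    }

    /**
     * Parses an RSS 2.0 document and collects the title of every item.
     *
     * @param in the RSS document
     * @return the trimmed item titles, in document order
     * @throws Exception if the document cannot be read or parsed
     */
    public static List<String> itemTitles(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ItemTitleHandler handler = new ItemTitleHandler();
        parser.parse(in, handler);
        return handler.titles;
    }

    /**
     * Fetches each URL given on the command line and prints its item titles.
     *
     * @param args the URLs of the RSS files
     * @throws Exception if a URL cannot be fetched or parsed
     */
    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            try (InputStream in = URI.create(arg).toURL().openStream()) {
                for (String title : itemTitles(in)) {
                    System.out.println(title);
                }
            }
        }
    }
}
```

The trade-off is that SAX never holds the whole document in memory, but the handler must track its own state (here, whether it is inside an <item> and inside that item's <title>).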

Submitting

Submit your program to the WebCT dropbox for HW5. Make sure all of your code is commented with JavaDoc comments!