WWWPal Client-Server System for Webgraphs

John Punin, Yongxing Wang, and Mukkai Krishnamoorthy

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA.

Abstract

We describe a three-tier client-server system for displaying and manipulating Webgraphs. A Webgraph is a graph whose nodes are the URLs of a user's, department's, or organization's Web site and whose edges are the links that interconnect them. We also provide an XML design and a DTD for describing Webgraphs.

1. System Architecture

Our client-server system has a three-tier architecture consisting of a Graph Client, a Web Server, and a Graph Server. The Web Server acts as the "agent" between the Graph Client and the Graph Server. We use the Graph Visualizer of the WWWPal System [3] as the Graph Client. The Graph Client communicates with the Web Server over HTTP, which it implements using the W3C libwww library [5]. The Web Server communicates with the Graph Server either through the Web Server's API or through a CGI program. The Graph Visualizer displays the graphs that the server sends back as responses, and it interacts with the Graph Library to analyze the received graphs further. Graphs are exchanged in XGMML (eXtensible Graph Modeling and Markup Language) [6], a new graph language based on XML [4].

Figure 1: System Architecture

 

2. System Design

The Graph Client is an updated version of the Graph Visualizer of the WWWPal System. The purpose of the WWWPal System is to organize Web documents, and the Graph Visualizer is its component for displaying Webgraphs. We added to WWWPal a communication module based on the W3C libwww library [5]. This module enables the Graph Visualizer to send a request to the Web Server over HTTP. The Web Server forwards the request to the Graph Server, which processes it and sends a Webgraph back to the Graph Client so that the user can visualize and analyze it.
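The request step described above can be sketched as follows. This is only an illustration in Python, not the libwww-based C code of the Graph Client; the endpoint path `/cgi-bin/graphserver`, the host `localhost:8080`, and the query parameter name `root` are assumptions made for the example and do not come from the system itself.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import urlopen


def build_graph_request(server_base: str, root_url: str) -> str:
    """Build the URL a client could GET to ask for the Webgraph of root_url.

    The parameter name "root" is hypothetical.
    """
    return f"{server_base}?{urlencode({'root': root_url})}"


def fetch_webgraph(request_url: str) -> str:
    """Send the HTTP request and return the response body as text.

    Not called below, because it needs a running server.
    """
    with urlopen(request_url) as response:
        return response.read().decode("utf-8")


# Hypothetical Web Server endpoint that forwards requests to the Graph Server.
request_url = build_graph_request(
    "http://localhost:8080/cgi-bin/graphserver",
    "http://www.cs.rpi.edu/~puninj/JAVA/",
)
print(request_url)
```

Encoding the site URL as a query parameter keeps the request a plain HTTP GET, so any Web Server can pass it on unchanged.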

The Graph Server communicates directly with the Web Server using the Web Server's API. We also provide a CGI interface so that the Graph Server can interact with any Web Server. We use the W3C webbot to explore a Web site and save its structure as a Webgraph. The Graph Server reads the Webgraph of a Web site and can then answer queries about its structure. One example of such a request is to retrieve all Web pages under a given URL.
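A minimal sketch of how a CGI-style Graph Server could answer a query for all pages under a URL prefix is given below. The in-memory adjacency dictionary, the `example.org` data, the function names, and the `root` parameter are all assumptions for illustration; the actual Graph Server and its query interface are not specified here.

```python
import os
from urllib.parse import parse_qs

# Toy Webgraph (made-up data): page URL -> URLs that the page links to.
WEBGRAPH = {
    "http://example.org/": ["http://example.org/a/", "http://example.org/b.html"],
    "http://example.org/a/": ["http://example.org/a/x.html", "http://example.org/"],
    "http://example.org/a/x.html": [],
    "http://example.org/b.html": ["http://example.org/a/"],
}


def pages_under(graph, prefix):
    """Return the subgraph induced by the pages whose URL starts with prefix."""
    nodes = {url for url in graph if url.startswith(prefix)}
    return {url: [v for v in graph[url] if v in nodes] for url in sorted(nodes)}


def handle_query(query_string, graph=WEBGRAPH):
    """Answer one request, given the text a CGI program finds in QUERY_STRING."""
    params = parse_qs(query_string)
    prefix = params.get("root", [""])[0]  # "root" is a hypothetical parameter
    return pages_under(graph, prefix)


# Under CGI the Web Server passes the query in the QUERY_STRING variable;
# a fixed example query is used when the variable is not set.
query = os.environ.get("QUERY_STRING") or "root=http%3A%2F%2Fexample.org%2Fa%2F"
print(handle_query(query))
```

Edges that leave the requested prefix are dropped, so the reply is a self-contained graph the client can display.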

Webgraphs are written in XGMML (eXtensible Graph Modeling and Markup Language), which is based on XML (eXtensible Markup Language) and GML (Graph Modeling Language) [1]. A DTD [6] is provided so that graph files can be validated. We chose GML because it is a powerful and general language for describing graphs, and we expressed it in XML so that any XML parser can parse graphs written in XGMML.
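To illustrate the point that a generic XML parser is enough to read such a file, the sketch below parses a small hand-written graph with Python's standard `xml.etree.ElementTree` module. The `graph`, `node`, and `edge` elements and their `id`, `label`, `source`, and `target` attributes follow our reading of the XGMML description [6]; the sample data is made up, and the DTD should be consulted for the authoritative definitions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample document (made-up data, no DTD validation performed).
XGMML_DOC = """<graph label="example" directed="1">
  <node id="1" label="http://example.org/"/>
  <node id="2" label="http://example.org/a.html"/>
  <edge source="1" target="2"/>
</graph>
"""


def parse_xgmml(text):
    """Return (nodes, edges): node id -> label, and (source, target) pairs."""
    root = ET.fromstring(text)
    nodes = {n.get("id"): n.get("label") for n in root.iter("node")}
    edges = [(e.get("source"), e.get("target")) for e in root.iter("edge")]
    return nodes, edges


nodes, edges = parse_xgmml(XGMML_DOC)
for source, target in edges:
    print(nodes[source], "->", nodes[target])
```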

Figure 2 shows the Graph Visualizer displaying a Webgraph written in XGMML. This Webgraph was obtained from the Graph Server in response to a request for all Web pages under the URL http://www.cs.rpi.edu/~puninj/JAVA/

Figure 2: Graph Visualizer

References

  1. Michael Himsolt, "GML: A portable Graph File Format", Technical Report, Universität Passau, 94030 Passau, Germany, 1997.
  2. J. Kleinberg, "Authoritative sources in a hyperlinked environment", In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Also appears as IBM Research Report RJ 10076, May 1997.
  3. John Punin, Mukkai S. Krishnamoorthy, "WWWPal System - A System for analysis and synthesis of Web pages", In Proceedings of the WebNet 98 Conference, Orlando, November, 1998.
  4. Extensible Markup Language W3C Working Draft at http://www.w3.org/TR/REC-xml
  5. Libwww - The W3C Sample Code Library at http://www.w3c.org/Library/
  6. XGMML - eXtensible Graph Modeling and Markup Language at http://www.cs.rpi.edu/~puninj/XGMML/