WWWPal Client-Server System for Webgraphs
John Punin, Yongxing Wang, and Mukkai Krishnamoorthy
Computer Science Department, Rensselaer Polytechnic Institute,
Troy, NY, USA.
Abstract
We describe a three-tier client-server system for displaying and
manipulating Webgraphs. A Webgraph is a graph whose nodes are the URLs of a
user's, a department's or an organization's Web site and whose edges are the
hyperlinks that connect them. We also provide an XML design and a DTD for
describing Webgraphs.
1. System Architecture
Our client-server system has three tiers: the Graph Client, the Web Server
and the Graph Server. The Web Server is the "agent" between the Graph Client
and the Graph Server. We use the Graph Visualizer of the WWWPal System [3] as
the Graph Client.
The Graph Client communicates with the Web Server over HTTP, which it
implements with the W3C libwww library [5]. The Web Server communicates with
the Graph Server either through the Web Server's API or through a CGI program.
The Graph Visualizer displays the graphs that the server sends back as
responses and interacts with the Graph Library to analyze them further.
Graphs are encoded in XGMML (eXtensible Graph Modeling and Markup Language) [6],
a new graph language based on XML [4].
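The request path through the three tiers can be illustrated with an in-process sketch in which plain function calls stand in for the HTTP and CGI hops. The CGI path `/cgi-bin/graphserver`, the parameter names `query` and `url`, and the one-element XGMML reply are hypothetical; in the actual system the Graph Client uses libwww for HTTP and the Graph Server answers with a full Webgraph.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from xml.sax.saxutils import quoteattr

# Hypothetical CGI path on the Web Server; the paper does not name one.
CGI_PATH = "/cgi-bin/graphserver"


def graph_client_request(host, query, url):
    """Tier 1 (Graph Client): encode a request as an HTTP GET URL."""
    return f"http://{host}{CGI_PATH}?" + urlencode({"query": query, "url": url})


def graph_server(params):
    """Tier 3 (Graph Server): reply with a toy XGMML document naming the request."""
    label = f"{params.get('query', '')} {params.get('url', '')}"
    # quoteattr escapes the label and wraps it in quotes for use as an attribute
    return f'<graph label={quoteattr(label)} directed="1"/>'


def web_server(request_url):
    """Tier 2 (Web Server): the agent that hands the CGI query string to the Graph Server."""
    parts = urlsplit(request_url)
    if parts.path != CGI_PATH:
        return 404, "Not Found"
    # parse_qs maps each parameter to a list of values; keep the first one
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return 200, graph_server(params)


request = graph_client_request("localhost", "pages_under",
                               "http://www.cs.rpi.edu/~puninj/JAVA/")
status, body = web_server(request)
print(status, body)
```

Keeping the Web Server as a thin relay mirrors its role as an "agent": it only routes the request, while all graph logic stays in the Graph Server.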
Figure 1: System Architecture
2. System Design
The Graph Client of the system is an updated version of the Graph Visualizer
of the WWWPal System, whose purpose is to organize Web documents. The Graph
Visualizer is the WWWPal component that displays Webgraphs. We extended WWWPal
with a communication module based on the W3C Libwww [5], which lets the Graph
Visualizer send requests to the Web Server over HTTP. The Web Server forwards
each request to the Graph Server, where it is processed. The Graph Server then
sends a Webgraph back to the Graph Client so the user can visualize and
analyze it.
The Graph Server communicates directly with the Web Server through the Web
Server's API. We also provide a CGI interface so that the Graph Server can work
with any Web Server. We use the W3C webbot to explore a Web site and save its
structure as a Webgraph. The Graph Server reads the Webgraph of a Web site and
can then answer inquiries about its structure. Example requests include:
- Pages of a user (e.g., John Punin) or of a directory (e.g., the Guide of the Computer Science Department).
- Pages found by searching for a word in the title, URL or keywords.
- The most visited Web pages, the most visited hyperlinks and the most visited paths of a Web site.
- Pages with the highest authority and hub scores in a Webgraph. The Graph Server applies Kleinberg's algorithm [2] to the Webgraph.
- Pages with problems such as broken links, dead ends and huge files.
- Pages on the most expensive paths of a Webgraph. The Graph Server can compute the cost of navigating from the root of the Webgraph to any node in the graph.
- Pages in a Web collection. A Web collection is found by following the <LINK> tags in the Web documents.
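The paper does not say where the visit counts behind the most-visited requests come from. Assuming they are derived from access-log sessions, each a hypothetical ordered list of the URLs one visitor requested, the most visited pages and hyperlinks can be tallied as sketched below. Reading two consecutive requests as one traversal of the hyperlink between them is a simplification, since a visitor may also type a URL or use the back button.

```python
from collections import Counter


def visit_counts(sessions):
    """Tally page visits and hyperlink traversals.

    sessions: list of sessions, each an ordered list of requested URLs
    (a hypothetical input format assumed to come from access logs).
    """
    pages = Counter()
    links = Counter()
    for session in sessions:
        pages.update(session)
        # simplification: consecutive requests count as one traversal of a hyperlink
        links.update(zip(session, session[1:]))
    return pages, links


sessions = [["/", "a.html", "b.html"], ["/", "a.html"], ["/", "c.html"]]
pages, links = visit_counts(sessions)
print(pages.most_common(1))  # [('/', 3)]
print(links.most_common(1))  # [(('/', 'a.html'), 2)]
```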
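The authority and hub request listed above relies on Kleinberg's algorithm [2]. The following is a minimal power-iteration sketch over a Webgraph held as an adjacency dictionary; the dictionary representation, the fixed iteration count and the unit-length normalization are illustrative assumptions, not details of the Graph Server. Kleinberg originally applies the iteration to a query-focused subgraph; here it runs on the whole Webgraph, as the request is described.

```python
import math


def _normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {n: (v / norm if norm else 0.0) for n, v in scores.items()}


def hits(graph, iterations=50):
    """Kleinberg's hubs-and-authorities scores by power iteration.

    graph maps each page to the list of pages it links to.
    Returns (authority, hub) dictionaries.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority of p = sum of hub scores of the pages that link to p
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # hub score of p = sum of authority scores of the pages p links to
        new_hub = {n: sum(new_auth[dst] for dst in graph.get(n, ())) for n in nodes}
        auth, hub = _normalize(new_auth), _normalize(new_hub)
    return auth, hub


webgraph = {
    "index.html": ["a.html", "b.html"],
    "links.html": ["a.html", "b.html"],
    "a.html": ["b.html"],
}
auth, hub = hits(webgraph)
print(max(auth, key=auth.get))  # b.html
```

Here `b.html` is the strongest authority because every other page links to it, while `index.html` and `links.html` tie as the best hubs.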
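The problem-page request can be sketched as simple checks over per-page crawl data. The `Page` fields (`status`, `size`, `links`), the rule that an HTTP status of 400 or above marks a broken link, and the 1,000,000-byte "huge file" threshold are all hypothetical choices; the paper does not define how the Graph Server classifies these problems.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the paper does not define what makes a file "huge".
HUGE_FILE_BYTES = 1_000_000


@dataclass
class Page:
    url: str
    status: int   # HTTP status recorded when the page was crawled (assumed attribute)
    size: int     # document size in bytes (assumed attribute)
    links: list = field(default_factory=list)  # URLs of outgoing hyperlinks


def find_problems(pages):
    """Return (broken_links, dead_ends, huge_files) for a dict mapping URL -> Page."""
    broken = []
    for page in pages.values():
        for target in page.links:
            t = pages.get(target)
            # broken: the target was crawled and returned an HTTP error (4xx/5xx);
            # targets absent from the Webgraph are left unclassified
            if t is not None and t.status >= 400:
                broken.append((page.url, target))
    # dead ends: pages retrieved without error that have no outgoing links
    dead_ends = [p.url for p in pages.values() if p.status < 400 and not p.links]
    huge = [p.url for p in pages.values() if p.size > HUGE_FILE_BYTES]
    return broken, dead_ends, huge


site = {
    "/": Page("/", 200, 5_000, ["a.html", "missing.html"]),
    "a.html": Page("a.html", 200, 2_500_000),
    "missing.html": Page("missing.html", 404, 0),
}
print(find_problems(site))
# ([('/', 'missing.html')], ['a.html'], ['a.html'])
```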
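The paper does not give the cost model behind the most-expensive-path request. Under one plausible reading, each hyperlink carries a non-negative cost (a hypothetical per-link weight, such as one unit per click or the size of the target page), the cost of navigating to a page is the cheapest total cost from the root, and the most expensive pages are those whose cheapest cost is highest. Dijkstra's algorithm computes those costs, as in this sketch:

```python
import heapq


def navigation_costs(graph, root):
    """Dijkstra's algorithm: cheapest navigation cost from root to each reachable page.

    graph maps a page to a list of (target, cost) pairs with non-negative costs
    (hypothetical per-link weights). Returns (cost, parent) dictionaries.
    """
    cost = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    while heap:
        c, page = heapq.heappop(heap)
        if c > cost[page]:
            continue  # stale heap entry; a cheaper route was already found
        for target, w in graph.get(page, ()):
            nc = c + w
            if target not in cost or nc < cost[target]:
                cost[target] = nc
                parent[target] = page
                heapq.heappush(heap, (nc, target))
    return cost, parent


def path_to(parent, page):
    """Follow parent links back to the root and return the path root-first."""
    path = []
    while page is not None:
        path.append(page)
        page = parent[page]
    return path[::-1]


site = {
    "/": [("a.html", 1), ("b.html", 4)],
    "a.html": [("b.html", 1), ("c.html", 5)],
    "b.html": [("c.html", 1)],
}
cost, parent = navigation_costs(site, "/")
worst = max(cost, key=cost.get)
print(worst, cost[worst], path_to(parent, worst))
# c.html 3 ['/', 'a.html', 'b.html', 'c.html']
```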
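For the Web collection request, the <LINK> tags can be collected with Python's standard `html.parser` and followed breadth-first. In this sketch an in-memory `documents` dictionary stands in for real HTTP fetches, and restricting the walk to the HTML 4 navigational link types in `NAVIGATION_RELS` (so that, for example, stylesheets are not pulled in) is a hypothetical choice rather than the Graph Server's documented rule.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical choice: HTML 4 navigational link types that define a collection.
NAVIGATION_RELS = {"start", "next", "prev", "contents", "index",
                   "chapter", "section", "subsection", "appendix"}


class LinkTagParser(HTMLParser):
    """Collects the href of every <link> element whose rel is navigational."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if attrs.get("href") and NAVIGATION_RELS.intersection(rels):
            self.hrefs.append(attrs["href"])


def web_collection(start_url, documents):
    """Breadth-first walk over <link> tags; documents maps URL -> HTML text.

    Returns the set of URLs reached, including linked pages not in documents.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        parser = LinkTagParser()
        parser.feed(documents.get(url, ""))
        parser.close()
        for href in parser.hrefs:
            target = urljoin(url, href)  # resolve relative hrefs against the page URL
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


docs = {
    "http://example.org/guide/ch1.html":
        '<head><link rel="next" href="ch2.html"><link rel="stylesheet" href="style.css"></head>',
    "http://example.org/guide/ch2.html":
        '<head><link rel="prev" href="ch1.html"><link rel="next" href="ch3.html"></head>',
}
print(sorted(web_collection("http://example.org/guide/ch1.html", docs)))
```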
Webgraphs are written in XGMML, a language based on XML (eXtensible Markup
Language) and GML (Graph Modeling Language) [1]. A DTD [6] is provided to
validate graph files. We chose GML because it is a powerful, general language
for describing graphs, and we expressed it in XML so that any XML parser can
parse graphs written in XGMML.
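Because XGMML is XML, a standard parser is enough to read a Webgraph. The sketch below uses Python's `xml.etree.ElementTree` on a small, simplified XGMML-style document with made-up example.org pages. It reads only node `id`/`label` and edge `source`/`target` attributes and assumes no XML namespace, so a real XGMML file, whose full vocabulary is fixed by the DTD [6], may need more handling.

```python
import xml.etree.ElementTree as ET

# Simplified XGMML-style Webgraph (illustrative only; no namespace declared).
SAMPLE = """<graph label="Webgraph" directed="1">
  <node id="1" label="http://www.example.org/"/>
  <node id="2" label="http://www.example.org/a.html"/>
  <edge source="1" target="2"/>
</graph>"""


def parse_webgraph(text):
    """Return (nodes, edges): nodes maps id -> label, edges lists (source, target) pairs."""
    root = ET.fromstring(text)
    nodes = {n.get("id"): n.get("label") for n in root.iter("node")}
    edges = [(e.get("source"), e.get("target")) for e in root.iter("edge")]
    return nodes, edges


nodes, edges = parse_webgraph(SAMPLE)
print(nodes)
print(edges)  # [('1', '2')]
```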
Figure 2 shows the Graph Visualizer displaying a Webgraph written in XGMML.
This Webgraph was obtained from the Graph Server in response to a request for
all Web pages under the URL http://www.cs.rpi.edu/~puninj/JAVA/
Figure 2: Graph Visualizer
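The request behind Figure 2 (all pages under a given URL) and the title, URL and keyword search among the example requests both reduce to filters over the Webgraph's nodes. A minimal sketch follows, assuming each node carries hypothetical `url`, `title` and `keywords` fields; the prefix query returns the induced subgraph so that the client still receives the edges between the selected pages.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    url: str
    title: str = ""
    keywords: list = field(default_factory=list)  # hypothetical per-page keywords


def pages_under(nodes, edges, prefix):
    """Induced subgraph of the pages whose URL starts with prefix."""
    keep = {n.url for n in nodes if n.url.startswith(prefix)}
    return ([n for n in nodes if n.url in keep],
            [(s, t) for s, t in edges if s in keep and t in keep])


def search(nodes, word):
    """Pages whose title, URL or keywords contain word (case-insensitive)."""
    w = word.lower()
    return [n for n in nodes
            if w in n.title.lower() or w in n.url.lower()
            or any(w in k.lower() for k in n.keywords)]


nodes = [
    Node("http://www.example.org/java/", "Java Notes", ["applet"]),
    Node("http://www.example.org/java/applet.html", "Applets"),
    Node("http://www.example.org/c/", "C Notes"),
]
edges = [("http://www.example.org/java/", "http://www.example.org/java/applet.html"),
         ("http://www.example.org/java/", "http://www.example.org/c/")]
sub_nodes, sub_edges = pages_under(nodes, edges, "http://www.example.org/java/")
print(len(sub_nodes), len(sub_edges))  # 2 1
print([n.url for n in search(nodes, "notes")])
```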
References
[1] Michael Himsolt, "GML: A Portable Graph File Format", Technical Report, Universität Passau, 94030 Passau, Germany, 1997.
[2] J. Kleinberg, "Authoritative Sources in a Hyperlinked Environment", Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Also appears as IBM Research Report RJ 10076, May 1997.
[3] John Punin and Mukkai S. Krishnamoorthy, "WWWPal System - A System for Analysis and Synthesis of Web Pages", in Proceedings of the WebNet 98 Conference, Orlando, November 1998.
[4] Extensible Markup Language (XML), W3C Recommendation, http://www.w3.org/TR/REC-xml
[5] Libwww - The W3C Sample Code Library, http://www.w3c.org/Library/
[6] XGMML - eXtensible Graph Modeling and Markup Language, http://www.cs.rpi.edu/~puninj/XGMML/