W3Pal Browser Tutorial





1.   Document Scope

The target audience for this tutorial includes Web developers, editors, site administrators, and researchers. Experience with computers, software, the World Wide Web, HTML, and web design is expected.

This tutorial describes the basic features of the W3Pal Browser Interface. It is broken into four major parts. The first part gives a simple overview of the W3Pal web mining software suite and explains how the browser is utilized. The following three parts focus on the Browser itself. The first of these explains the visualization functions of the Browser. The second focuses on grouping graph sections for clarity. The final part describes the Collapse feature which aids in visualizing web site structure.

After reviewing this section of the manual you should be familiar with the Browser interface and some of its useful features. You should be able to load graphs into the W3Pal browser and perform basic web mining techniques with ease.

2.   W3Pal System Overview

The W3Pal system is designed for the analysis and synthesis of web pages.

W3Pal is a suite of Web Applications for researchers, web masters, and anyone interested in web site analysis. W3Pal offers a variety of programs for:

W3Pal makes it possible to view an entire web site, at a glance, as a graph which maps out every page, object, and hyperlink. The system quickly reveals broken links, orphan pages, poorly structured sections of the site, and many other aspects that are important to web developers and site managers.

The system is also highly interactive: graph objects can be clicked and opened in a web browser. Apache log file information can be added to web site graphs, allowing web page and hyperlink hits to be viewed. In addition, individual user sessions can be viewed and mapped so that users' actions on a web site can be analyzed. The system also has functions that allow the most traversed path in a web site to be viewed on its own.

3.   W3Pal Browser Overview

The W3Pal Browser is the part of the W3Pal system that is responsible for displaying graph information. The Browser will display any graph formatted in XGMML or LOGML. XGMML, the Extensible Graph Markup and Modeling Language, is an XML language for describing graph structures. LOGML extends XGMML by adding log file information. The Browser has a variety of functions and algorithms for visualizing, analyzing, and traversing XGMML and LOGML graph structures.
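To make the idea of a graph file concrete, the following minimal sketch reads a simplified XGMML-style document with Python's standard library. The sample data and the `parse_graph` helper are illustrative assumptions, not part of W3Pal; real XGMML files normally declare an XML namespace and carry more attributes than the `id`, `label`, `source`, and `target` shown here.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XGMML-style sample (namespace omitted for brevity).
SAMPLE = """
<graph label="example site" directed="1">
  <node id="1" label="http://example.org/index.html"/>
  <node id="2" label="http://example.org/about.html"/>
  <node id="3" label="http://example.org/slides.pdf"/>
  <edge source="1" target="2"/>
  <edge source="1" target="3"/>
</graph>
"""

def parse_graph(xml_text):
    """Return (labels, adjacency) built from <node> and <edge> elements."""
    root = ET.fromstring(xml_text.strip())
    labels = {n.get("id"): n.get("label") for n in root.iter("node")}
    adjacency = {node_id: [] for node_id in labels}
    for edge in root.iter("edge"):
        adjacency[edge.get("source")].append(edge.get("target"))
    return labels, adjacency

labels, adjacency = parse_graph(SAMPLE)
print(adjacency)
```

The adjacency dictionary maps each node ID to the IDs it links to, which is the kind of structure a visualization or selection step could then work on.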

Figure 1 shows a small web site graph loaded into the W3Pal Browser window. The Browser displays broken links as triangles, objects (such as PowerPoint slides and PDF files) as colored ovals, internal HTML documents as squares, and external HTML documents as octagons. In Figure 1, the Browser is displaying the titles of the objects. This is one of the many options available. The Browser can also display file names, paths, URLs, and object IDs. These features are described in the Options Dialog section.



Figure 1:   Sample of a typical web site viewed in the Browser window

The image in Figure 1 is of a very small website. W3Pal works very well with small websites, but most sites of interest are very large, with thousands of documents and objects. It is very easy to see the structure of the website in Figure 1, but what about larger websites? The following sections describe how W3Pal is designed to analyze extremely large graphs.

4.   Getting Started

This section starts by describing the process for loading graph files into the W3Pal Browser and displaying them. It then gives a brief overview of the basic visualization methods.

4.1   Load and Display Graph

Loading graph files into the W3Pal Browser is a simple task. First we must select the location of the file we want to load. From the File menu, we can select Load Graph or Load Graph (URL) to load our graph file from a local disk or a URL, respectively. This assumes that we have already generated a graph file, or have the internet location of one. The option to load a graph from a URL gives site designers the ability to provide the public with structured graphs of their websites. (Instead of traversing a website by visiting all the pages, a user could download the site graph and go directly to the page of interest.)

Once we have loaded a graph into the Browser, we display it by clicking the "Display Graph" button. This instructs the Browser to display the graph's nodes and edges. The Browser does not display the graph automatically, because it could be very large. We may want to perform optimizations before displaying a graph with hundreds of thousands of nodes and edges (a process that could take a long time).

The browser initially dumps all the graph's nodes and edges into the viewer window with no formatting. (They will be clumped at the top of the browser workspace.) In order to get a useful display of the graph's structure we need to apply a visualization algorithm to it. Figure 2 was created using the Radial Tree algorithm starting from the root node. The root node, in this case, happens to be the index page of the website. We can select any node to start from, but the index page is usually the best node to choose as the root.


Figure 2:   Graph of XMLJ Web site
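The general idea behind a radial tree layout can be sketched as follows. This is a generic illustration under stated assumptions, not W3Pal's own algorithm: it builds a breadth-first spanning tree from the chosen root, places each node on a ring whose radius is its depth, and gives each subtree a slice of the circle proportional to its number of leaves. The function name, the `ring_spacing` parameter, and the sample `site` graph are hypothetical.

```python
import math
from collections import deque

def radial_tree_layout(adjacency, root, ring_spacing=1.0):
    """Return {node: (x, y)} for every node reachable from root."""
    # Breadth-first spanning tree: depth of each node and its tree children.
    depth = {root: 0}
    children = {root: []}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adjacency.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                children[nxt] = []
                children[node].append(nxt)
                queue.append(nxt)

    # Leaf count per subtree, computed from the deepest nodes upward.
    leaf_count = {}
    for node in reversed(order):
        leaf_count[node] = sum(leaf_count[c] for c in children[node]) or 1

    # Each node sits in the middle of its angular wedge, on the ring for its depth.
    wedge = {root: (0.0, 2 * math.pi)}
    positions = {}
    for node in order:
        start, end = wedge[node]
        angle = (start + end) / 2
        radius = depth[node] * ring_spacing
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
        cursor = start
        for child in children[node]:
            span = (end - start) * leaf_count[child] / leaf_count[node]
            wedge[child] = (cursor, cursor + span)
            cursor += span
    return positions

site = {"index": ["a", "b"], "a": ["a1"], "b": [], "a1": []}
positions = radial_tree_layout(site, "index")
print(positions["index"])
```

Choosing a different root changes every depth and therefore the whole picture, which is one reason the index page is usually a sensible starting node.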



Without any further modification we can already see the web site's basic structure. However, the Browser has many features to give us a much clearer picture of what the actual structure of the site is. Before we get into those, we should go over the other visualization algorithms provided by the W3Pal Browser. The Radial Tree algorithm is one of many.


4.2   Draw Modes

Web sites generally fall into different structure categories. For example, a web site describing a book or online slide presentation would have a much different structure than a personal or corporate homepage. The following algorithms are included in the W3Pal Browser to better visualize different types of graphs:

The above algorithms are extremely useful for partitioning graphs into identifiable sections. However, that is only the beginning: graphs with hundreds of thousands of nodes need more work to reveal their structure. W3Pal can apply the above algorithms to subsections of a graph, so we could start by visualizing a whole graph as a Radial Tree and then select a few of its branches and render them as Circular, or with whichever algorithm we see fit. The next section explains how to group and render graph subsections.

5.   Grouping Graph Subsections

W3Pal makes it very easy to partition graphs into subsections. Figure 3, shown below, is the graph of a website that has been partitioned into three different sections, each rendered individually. The first step in this process is to select the portions of the graph that we wish to render separately. Once the portions have been selected, we choose the algorithm to use on the selection and redraw the image. The selected portions are rendered without affecting the rest of the image.



Figure 3:   Graph of XMLJ Website with Subsections Individually Grouped


The resulting image clearly shows us that the graph of this website has three major sections, two of which have a specific structure. The ability to group and render graph subsections makes large websites easier to understand and helps to reveal the overall structure of a graph. As mentioned earlier, certain types of structures are better viewed with different algorithms. Now we can split a graph into subsections and use the most appropriate algorithm for each individual subsection. Section 5.1 explains how we select and group graph subsections, and Section 5.2 explains the mouse commands and operations that are helpful in this process.

5.1   Group Subsections

The first task is to select the nodes we wish to render. This process is usually quite straightforward. Looking back to Figure 2, we see that, aside from the root node, there are two nodes that have several children branching off. The first of these nodes is located on the far left side of the graph and the second is located in the lower right of the graph. We begin by right-clicking one of these nodes and selecting the option Select Tree. Next, we right-click on the workspace window and select the option Highlight Selected Vertices. Now all the nodes in the desired subsection are highlighted. Next we open the Tools menu and select the Options item. This will open the W3Pal Options Dialog (Figure 6). Place a check mark in the Draw Highlight check box and then click the OK button. Finally, select the visualization algorithm to use on the highlighted nodes from the Draw Modes menu button. This action will cause the Browser to reorganize the selected nodes.


5.2   Mouse Commands & Operations


Figure 4:   Mouse Command Menu


The mouse command menu (Figure 4) has a variety of options for selecting and highlighting vertices and edges. Single Vertex Selection Mode and Single Vertex Deselection Mode are used to select and deselect vertices. Once one of these modes is selected, left-clicking a node in the Browser window will select or deselect it. The user can also draw a box with the mouse to select or deselect all the nodes that fall within the box.

The following selection options are for selecting specific types of nodes. Leaves are nodes that have no children. Sources are nodes that have no parents. Objects are nodes that represent non-HTML documents. Documents are nodes that represent HTML documents. The Complement is the set of all nodes that are not currently selected. Neighbors are all nodes connected to the selected nodes by one edge. Finally, Shape selects nodes that have specific shapes. Different node types are represented by different shapes; for example, a mailto link is represented by an oval, normal HTML documents are represented by squares, and non-HTML documents are represented by circles.
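The structural selections described above (Leaves, Sources, Complement, and Neighbors) can be expressed as simple set operations on an adjacency dictionary. The sketch below is only an illustration: the function names and the sample `site` graph are hypothetical, every node is assumed to appear as a key, and Neighbors is assumed to exclude nodes that are already selected, which may differ from what the Browser does.

```python
def leaves(adjacency):
    """Nodes with no children (no outgoing edges)."""
    return {node for node, targets in adjacency.items() if not targets}

def sources(adjacency):
    """Nodes with no parents (no incoming edges)."""
    has_parent = {t for targets in adjacency.values() for t in targets}
    return set(adjacency) - has_parent

def complement(adjacency, selected):
    """All nodes that are not currently selected."""
    return set(adjacency) - set(selected)

def neighbors(adjacency, selected):
    """Nodes one edge away from the selection, in either direction."""
    selected = set(selected)
    result = set()
    for node, targets in adjacency.items():
        for target in targets:
            if node in selected:
                result.add(target)
            if target in selected:
                result.add(node)
    return result - selected

site = {"index": ["a", "b"], "a": ["c"], "b": [], "c": [], "lost": ["b"]}
print(sorted(leaves(site)), sorted(sources(site)))
```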

The next set of options is for highlighting vertices. Highlight Vertex Mode is used for highlighting vertices, regardless of whether they are selected or not. Clicking a vertex will toggle its highlight, and drawing a box will highlight all vertices within it.

The final set of options is for selecting and highlighting edges. Select Edge Mode works the same way for edges as Single Vertex Selection Mode works for nodes. Highlight Edge Mode works for edges the same way Highlight Vertex Mode works for nodes. Select Edge by Class is used to select forward, backward, and tree edges by their class. Select Shortest is used to select the shortest or longest path between vertices. Deselect All Edges simply deselects all the currently selected edges.
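Edge classes such as tree, forward, and backward are commonly defined relative to a depth-first traversal from a root node. The sketch below shows that textbook classification; it is an assumption that W3Pal uses comparable definitions, and the function name, the extra "cross" class, and the sample `site` graph are illustrative only. The recursion is fine for small examples but would need an explicit stack for very deep graphs.

```python
def classify_edges(adjacency, root):
    """Label each edge reachable from root as tree, forward, backward, or cross."""
    discovered = {}   # node -> discovery order
    finished = set()  # nodes whose subtree has been fully explored
    classes = {}
    clock = 0

    def visit(node):
        nonlocal clock
        discovered[node] = clock
        clock += 1
        for target in adjacency.get(node, []):
            if target not in discovered:
                classes[(node, target)] = "tree"
                visit(target)
            elif target not in finished:
                # Target is an ancestor still being explored.
                classes[(node, target)] = "backward"
            elif discovered[node] < discovered[target]:
                # Target is an already finished descendant.
                classes[(node, target)] = "forward"
            else:
                classes[(node, target)] = "cross"
        finished.add(node)

    visit(root)
    return classes

site = {"index": ["a", "b", "c"], "a": ["c"], "b": ["index", "c"], "c": []}
edge_classes = classify_edges(site, "index")
for edge, kind in edge_classes.items():
    print(edge, kind)
```

On a web site graph, backward edges typically correspond to "return to home" style links, while tree edges trace the hierarchy a visitor would follow from the index page.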

6.   Collapsing Subgraph Sections

Large web graphs tend to have many subsections which can be grouped into single entities. The option to collapse sections is designed to make these types of graphs clearer. Consider the XMLJ website displayed in Figure 3: the two large circular sections are the slide presentations for class 1 and class 2. If there were 40 or 50 classes, the graph would be difficult to work with. However, W3Pal gives us the ability to collapse each of these sections into a single node. Figure 5 shows the XMLJ website where the slide presentations have been collapsed into single nodes. The two nodes for the class lectures are now represented by pentagons in the image.



Figure 5:   Graph of XMLJ Website with Subsections Collapsed


6.1   Collapse Section

The first step to collapse a group of nodes is to select the desired nodes. The best way to do this is to right click a node and choose the Select Children option. Once the nodes are selected, simply click the Collapse button located on the tool bar of the W3Pal browser window. This will cause the selected nodes to be shrunk into a single node. The inward and outward edges of each collapsed node will now be connected to the newly created pentagon node.
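The effect of collapsing can be sketched on an adjacency dictionary: every edge that enters or leaves the selected group is redirected to a single new node, and edges that lie entirely inside the group disappear. This is a minimal illustration under stated assumptions, not W3Pal's implementation; the `collapse` function, the `new_id` parameter, and the sample `site` graph are hypothetical.

```python
def collapse(adjacency, selected, new_id):
    """Return a new adjacency dict with the selected nodes merged into new_id."""
    selected = set(selected)
    merged = {new_id: []}
    for node, targets in adjacency.items():
        source = new_id if node in selected else node
        out = merged.setdefault(source, [])
        for target in targets:
            dest = new_id if target in selected else target
            if source == new_id and dest == new_id:
                continue  # edge internal to the collapsed group
            if dest not in out:
                out.append(dest)  # keep one edge per (source, dest) pair
    return merged

site = {
    "index": ["c1", "about"],
    "c1": ["s1", "s2"],
    "s1": ["s2"],
    "s2": ["index"],
    "about": [],
}
collapsed = collapse(site, {"c1", "s1", "s2"}, "class1")
print(collapsed)
```

Here the three slide nodes become the single node `class1`, which keeps the group's one outward link to `index` and receives the inward link from it.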

6.2   Options Dialog

The final display of the XMLJ website (Figure 5) is different from the other images in that the vertex titles are now displayed. The vertex titles could have been displayed at any other time, but that would have made the images appear cluttered. In Figure 3, we chose to display only the three hub nodes' labels. These nodes were handled individually by first highlighting them and then choosing the Draw Highlight option in the W3Pal options dialog. The W3Pal options dialog, shown in Figure 6, has the various options for displaying vertex information. ID displays the vertices' unique identifiers, Label displays the vertices' URLs, Short Label displays the file or directory name of the vertices, Title displays the titles of the HTML pages, and Short Title displays the first ten characters of the title.



Figure 6:   W3Pal Browser Options Dialog
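The five display options can be summarized as a small function that maps a vertex record to the text shown next to it. The sketch below is illustrative only: the `vertex_text` function, the dictionary keys `id`, `url`, and `title`, and the sample page are assumptions, and the rule used for Short Label (the last path segment of the URL, falling back to the host name) is a guess at the behavior described above.

```python
from urllib.parse import urlparse

def vertex_text(vertex, mode):
    """Return the display text for a vertex under the given option name."""
    if mode == "ID":
        return vertex["id"]
    if mode == "Label":
        return vertex["url"]
    if mode == "Short Label":
        parsed = urlparse(vertex["url"])
        # Last path segment: a file name, or a directory name if the URL ends in "/".
        name = parsed.path.rstrip("/").rsplit("/", 1)[-1]
        return name or parsed.netloc
    if mode == "Title":
        return vertex["title"]
    if mode == "Short Title":
        return vertex["title"][:10]  # first ten characters of the title
    raise ValueError(f"unknown display mode: {mode}")

page = {
    "id": "n7",
    "url": "http://example.org/classes/lecture1.html",
    "title": "Lecture 1: Introduction to XML",
}
for mode in ("ID", "Label", "Short Label", "Title", "Short Title"):
    print(mode, "->", vertex_text(page, mode))
```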


7.   Glossary of Terms

W3Pal
World Wide Web Pal, Graph software designed for the analysis and synthesis of web pages.
W3Pal Browser
Application for displaying and rendering web site graphs
Web Mining
Information and pattern discovery on the World Wide Web: the discovery of locations of unfamiliar files on the network, the acquisition of useful information from the WWW, and the discovery of information patterns from these resources.
Orphan Pages
HTML pages that have no hyperlinks pointing to them.
User Session
The set of HTML pages visited and hyperlinks traversed by one user during a visit to a web site.
GML
Graph Modelling Language, a text-based language for describing graph information.
XGMML
Extensible Graph Markup and Modeling Language, XML vocabulary for describing graph information.
LOGML
Log Markup Language, XML vocabulary for describing web site statistics
URL
Uniform Resource Locator, a unique location on the world wide web