W3Pal Browser Tutorial
The target audience for this tutorial includes web developers, editors, site administrators, and researchers. Experience with computers, software, the World Wide Web, HTML, and web design is expected.
This tutorial describes the basic features of the W3Pal Browser interface. It is divided into four major parts. The first part gives a simple overview of the W3Pal web mining software suite and explains how the Browser is used. The following three parts focus on the Browser itself. The first of these explains the visualization functions of the Browser. The second focuses on grouping graph sections for clarity. The final part describes the Collapse feature, which aids in visualizing web site structure.
After reviewing this section of the manual you should be familiar with the Browser interface and some of its useful features. You should be able to load graphs into the W3Pal browser and perform basic web mining techniques with ease.
The W3Pal system is designed for the analysis and synthesis of web pages.
W3Pal is a suite of web applications for researchers, webmasters, and anyone interested in web site analysis. W3Pal offers a variety of programs for:
W3Pal makes it possible to view an entire web site, at a glance, as a graph which maps out every page, object, and hyperlink. The system quickly reveals broken links, orphan pages, poorly structured sections of the site, and many other aspects that are important to web developers and site managers.
The system is also highly interactive: graph objects can be clicked and opened in a web browser. Apache log file information can be added to web site graphs so that the hits on each web page and hyperlink can be viewed. In addition, individual user sessions can be viewed and mapped so that users' actions on a web site can be analyzed. The system also has functions that allow the most traversed path in a website to be viewed individually.
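As a rough illustration of the kind of log information that can be attached to a graph, the sketch below counts successful hits per page from Apache access log lines in Common Log Format. It is not W3Pal code: the function name `count_page_hits`, the regular expression, and the sample log lines are all assumptions made for this example.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path protocol" status size
CLF_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+'
)


def count_page_hits(log_lines):
    """Return a Counter mapping each requested path to its number of 2xx hits."""
    hits = Counter()
    for line in log_lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like Common Log Format
        _host, _method, path, status = match.groups()
        if status.startswith("2"):
            hits[path] += 1
    return hits


# Hypothetical sample data, invented for this sketch.
sample_log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:40 -0700] "GET /slides/class1.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [10/Oct/2000:13:57:02 -0700] "GET /missing.html HTTP/1.0" 404 210',
]
page_hits = count_page_hits(sample_log)
```

Counts like these could then be stored on the matching graph nodes; how W3Pal itself merges them into a graph is not described here.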
The W3Pal Browser is the part of the W3Pal system that is responsible for displaying graph information. The Browser will display any graph formatted in XGMML or LOGML. XGMML, the eXtensible Graph Markup and Modeling Language, is an XML language for describing graph structures. LOGML extends XGMML's features by adding log file information. The Browser has a variety of functions and algorithms for visualizing, analyzing, and traversing XGMML and LOGML graph structures.
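To give a feel for what an XGMML file contains, the sketch below reads nodes and edges from a tiny graph using Python's standard library. The sample document is hypothetical; the `graph`, `node`, and `edge` elements and the `id`, `label`, `source`, and `target` attributes follow the XGMML format as generally published, but the exact files W3Pal produces may carry more information. The helper names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-page site, written as a minimal XGMML graph.
SAMPLE_XGMML = """
<graph label="Example site" directed="1" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="http://www.example.org/index.html"/>
  <node id="2" label="http://www.example.org/about.html"/>
  <node id="3" label="http://www.example.org/slides.pdf"/>
  <edge source="1" target="2"/>
  <edge source="1" target="3"/>
</graph>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_xgmml(text):
    """Return (nodes, edges): nodes maps id -> label, edges is a list of pairs."""
    root = ET.fromstring(text)
    nodes = {}
    edges = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            nodes[element.get("id")] = element.get("label")
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return nodes, edges


nodes, edges = read_xgmml(SAMPLE_XGMML)
```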
Figure 1 shows a small web site graph loaded into the W3Pal Browser window. The Browser displays broken links as triangles, objects (such as PowerPoint slides and PDF files) as colored ovals, internal HTML documents as squares, and external HTML documents as octagons. In Figure 1, the Browser is displaying the titles of the objects. This is one of the many options available. The Browser can also display file names, paths, URLs, and object IDs. These features are described in the Options Dialog section.
The image in Figure 1 is of a very small website. W3Pal works very well with small websites, but most sites of interest are very large, with thousands of documents and objects. It is very easy to see the structure of the website in Figure 1, but what about larger websites? The following subsections describe how W3Pal is designed to analyze extremely large graphs.
This subsection starts by describing the process for loading graph files into the W3Pal Browser and displaying them. Afterwards, the basic visualization methods are briefly reviewed.
Loading graph files into the W3Pal Browser is a simple task. First we must
select the location of the file we want to load. From the File menu, we
can select Load Graph or Load Graph (URL) to load our
graph file from a local disk or a URL, respectively. This assumes that we have
already generated a graph file, or have the internet location of one. The
option to load a graph from a URL gives site designers the ability to provide the
public with structured graphs of their websites. Instead of traversing a website
by hitting all the pages, a user could download the site graph and go directly to the
page of interest.
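The two menu options differ only in where the graph file comes from. A minimal sketch of that distinction, assuming a hypothetical helper `load_graph_text` that is not part of W3Pal, could look like this:

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_graph_text(source):
    """Return the text of a graph file given a local path or a URL."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https", "ftp", "file"):
        # Remote (or file://) location: fetch the document over that scheme.
        with urlopen(source) as response:
            return response.read().decode("utf-8")
    # Anything else is treated as a path on the local disk.
    with open(source, encoding="utf-8") as handle:
        return handle.read()


# Usage: write a placeholder file, then load it by path and by file:// URL.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "site.xgmml"
    path.write_text("<graph/>", encoding="utf-8")
    from_disk = load_graph_text(str(path))
    from_url = load_graph_text(path.as_uri())
```

The usage lines stay on the local machine so the sketch runs without network access; an `http` or `https` URL takes the same branch as the `file://` URL.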
Once we have loaded a graph into the Browser, we display it by clicking
the "Display Graph" button. This instructs the Browser to display
the graph's nodes and edges. Initially, the Browser does not display the
graph, because it could be very large. We may want to do optimizations
before displaying a graph with hundreds of thousands of nodes and edges (a
process that could take a long time).
The Browser initially places all of the graph's nodes and edges in the
viewer window with no formatting. (They will be clumped at the top of the
Browser workspace.) To get a useful display of the graph's structure,
we need to apply a visualization algorithm to it. Figure 2 was created
using the Radial Tree algorithm starting from the root node. The
root node, in this case, happens to be the index page of the website. We
can select any node to start from, but the index page is usually the best
node to choose as the root.
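The general idea behind a radial tree layout can be sketched as follows: the root sits at the center, each level of the tree is placed on a larger ring, and each subtree receives an angular wedge proportional to its number of leaves. This is a generic textbook-style sketch, not W3Pal's implementation; the function name, the `ring_spacing` parameter, and the sample site are assumptions.

```python
import math
from collections import deque


def radial_tree_layout(adjacency, root, ring_spacing=1.0):
    """Return {node: (x, y)} for the nodes reachable from root."""
    # Breadth-first search builds a spanning tree and records each node's depth.
    depth = {root: 0}
    children = {root: []}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                children[neighbor] = []
                children[node].append(neighbor)
                order.append(neighbor)
                queue.append(neighbor)

    # Count leaves per subtree, visiting children before their parents.
    leaves = {}
    for node in reversed(order):
        leaves[node] = sum(leaves[child] for child in children[node]) or 1

    # Hand out angular wedges from the root outwards.
    wedge = {root: (0.0, 2 * math.pi)}
    positions = {}
    for node in order:
        start, end = wedge[node]
        angle = (start + end) / 2
        radius = depth[node] * ring_spacing
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
        cursor = start
        for child in children[node]:
            span = (end - start) * leaves[child] / leaves[node]
            wedge[child] = (cursor, cursor + span)
            cursor += span
    return positions


# Hypothetical site: the index page links to two pages, one with two children.
site = {"index": ["a", "b"], "a": ["a1", "a2"], "b": []}
layout = radial_tree_layout(site, "index")
```

Choosing a different root changes which node ends up at the center, which is why the index page is usually the most readable starting point.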
Without any further modification we can already see the web site's basic
structure. However, the Browser has many features to give us a much clearer
picture of what the actual structure of the site is. Before we get into
those, we should go over the other visualization algorithms provided by
the W3Pal Browser. The Radial Tree algorithm is one of many.
Web sites generally fall into different structure categories. For example, a web site describing a book or online slide presentation would have a much different structure than a personal or corporate homepage. The following algorithms are included in the W3Pal Browser to better visualize different types of graphs:
The above algorithms are extremely useful in partitioning graphs into identifiable
sections. However, that is only the beginning: graphs with hundreds of
thousands of nodes need more work to reveal their structure. W3Pal
has the ability to apply the above algorithms to subsections of
a graph. So we could start by visualizing a whole graph as a Radial
Tree and then select a few of its branches and render them as Circular
or whatever else we see fit. The next section explains how to group and
render graph subsections.
W3Pal makes it very easy to partition graphs into subsections. Figure 3, shown below, is the graph of a website that has been partitioned into three different sections, each rendered individually. The first step in this process is to select the portions of the graph that we wish to render separately. Once the portions have been selected, we choose the algorithm to use on the selection and redraw the image. The selected portions are rendered without affecting the rest of the image.
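The key property of subsection rendering is that only the selected nodes move. The sketch below illustrates this with a simple circular layout applied to a selection and centered on the selection's current centroid. The function names and the centering choice are assumptions for illustration, not a description of how W3Pal positions a rendered subsection.

```python
import math


def circular_layout(nodes, center=(0.0, 0.0), radius=1.0):
    """Place the given nodes evenly on a circle around center."""
    nodes = list(nodes)
    positions = {}
    for index, node in enumerate(nodes):
        angle = 2 * math.pi * index / len(nodes)
        positions[node] = (
            center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle),
        )
    return positions


def relayout_selection(positions, selected, layout=circular_layout):
    """Return a copy of positions in which only the selected nodes have moved."""
    wanted = set(selected)
    chosen = [node for node in positions if node in wanted]
    if not chosen:
        return dict(positions)
    # Keep the subsection roughly where it was: center it on its centroid.
    center_x = sum(positions[node][0] for node in chosen) / len(chosen)
    center_y = sum(positions[node][1] for node in chosen) / len(chosen)
    updated = dict(positions)
    updated.update(layout(chosen, center=(center_x, center_y)))
    return updated


# Hypothetical starting positions: three slide pages bunched in a row.
before = {
    "index": (0.0, 0.0),
    "about": (-3.0, 1.0),
    "s1": (5.0, 5.0),
    "s2": (6.0, 5.0),
    "s3": (7.0, 5.0),
}
after = relayout_selection(before, {"s1", "s2", "s3"})
```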
The resulting image clearly shows that the graph of this website has three major sections, two of which have a specific structure. The ability to group and render graph subsections makes large websites easier to understand and helps to reveal the overall structure of a graph. It is what allows us to create graphs like the one shown in Figure 3. As mentioned earlier, certain types of structures are better viewed with different algorithms. Now we can split a graph into subsections and use the most appropriate algorithm for each one. Section 5.1 explains how to select and group graph subsections, and Section 5.2 explains the mouse commands and operations that are helpful in this process.
The first task is to select the nodes we wish to render. This process is
usually quite straightforward. Looking back to Figure 2, we see that,
aside from the root node, there are two nodes that have several children
branching off. The first of these nodes is located on the far left side
of the graph and the second is located in the lower right of the graph.
We begin by right-clicking one of these nodes and selecting the option
Select Tree. Next, we right-click on the workspace window
and select the option Highlight Selected Vertices. Now all the
nodes in the desired subsection are highlighted. Next we open the Tools
menu and select the Options item. This opens the W3Pal Options
Dialog (Figure 6). Place a check mark in the Draw Highlight
check box and then click the OK button. Finally, from the Draw Modes menu
button, select the visualization algorithm to use on the highlighted nodes.
This action causes the Browser to reorganize the selected nodes.
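The tutorial does not spell out exactly which nodes Select Tree picks up. Assuming it selects the clicked node together with everything reachable through its child links, the selection could be sketched as below; `select_tree` and the sample graph are hypothetical.

```python
def select_tree(children, start):
    """Return start plus every node reachable by following child links."""
    selected = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in selected:
            continue  # already visited; also guards against link cycles
        selected.add(node)
        stack.extend(children.get(node, []))
    return selected


# Hypothetical site: "class1" is a hub whose slides link back to it.
children = {
    "index": ["class1", "about"],
    "class1": ["slide1", "slide2"],
    "slide1": ["slide2"],
    "slide2": ["class1"],
    "about": [],
}
subtree = select_tree(children, "class1")
```

The visited check matters on real web graphs, where pages commonly link back to their parents.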
The mouse command menu (Figure 4) has a variety of options for selecting
and highlighting vertices and edges.
The following selection options are for selecting specific types of
nodes.
The next set of options is for highlighting vertices.
The final set of options is for selecting and highlighting edges.
Large web graphs tend to have many subsections that can be grouped into single entities. The option to collapse sections is designed to make these types of graphs clearer. Consider the XMLJ website displayed in Figure 4: the two large circular sections are the slide presentations for class 1 and class 2. If there were 40 or 50 classes, the graph would be difficult to work with. However, W3Pal gives us the ability to collapse each of these sections into a single node. Figure 5 shows the XMLJ website with the slide presentations collapsed into single nodes. The two nodes for the class lectures are now represented by pentagons in the image.
The first step in collapsing a group of nodes is to select the desired nodes.
The best way to do this is to right-click a node and choose the Select
Children option. Once the nodes are selected, simply click the Collapse
button located on the toolbar of the W3Pal Browser window. This causes
the selected nodes to be shrunk into a single node. The inward and outward
edges of each collapsed node are now connected to the newly created
pentagon node.
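The edge rewiring described above can be sketched on a plain edge list: every edge that touches a collapsed node is redirected to the new node, and edges that ran entirely inside the group disappear. This is an illustrative sketch under those assumptions, not W3Pal's code; `collapse_nodes` and the node names are hypothetical.

```python
def collapse_nodes(edges, group, collapsed_id):
    """Replace every node in group with collapsed_id and rewire the edges."""
    group = set(group)
    rewired = set()
    for source, target in edges:
        new_source = collapsed_id if source in group else source
        new_target = collapsed_id if target in group else target
        if new_source == new_target == collapsed_id:
            continue  # the edge was internal to the collapsed group
        rewired.add((new_source, new_target))  # the set merges duplicate edges
    return sorted(rewired)


# Hypothetical site: one class section with two slides.
edges = [
    ("index", "class1"),
    ("class1", "slide1"),
    ("class1", "slide2"),
    ("slide1", "slide2"),
    ("slide2", "index"),
    ("index", "about"),
]
collapsed = collapse_nodes(edges, {"class1", "slide1", "slide2"}, "class1_group")
```

Here the six original edges reduce to three: one into the collapsed node, one out of it, and the untouched link from the index page to the about page.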
The final display of the XMLJ website (Figure 5) is different from the
other images in that the vertex titles are now displayed. The vertex titles
could have been displayed at any other time, but it would have made the
images appear cluttered. In Figure 3, we chose to display only the three
hub nodes' labels. These nodes were handled individually by first highlighting
them and then choosing the Draw Highlight option in the W3Pal
Options Dialog. The W3Pal Options Dialog, shown in Figure 6, has
various options for displaying vertex information. ID displays
the vertices' unique identifiers, Label displays the vertices'
URLs, Short Label displays the file or directory name of the vertices,
Title displays the title of the HTML pages, and Short Title
displays the first ten characters of the title.
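The five display options can be summarized as a small function that maps a vertex and a mode to the text shown beside it. The vertex fields (`id`, `url`, `title`) and the function name are assumptions made for this sketch; only the meaning of each mode comes from the description above.

```python
import posixpath
from urllib.parse import urlparse


def vertex_label(vertex, mode):
    """Return the text to draw for a vertex under the given display mode."""
    if mode == "ID":
        return vertex["id"]
    if mode == "Label":
        return vertex["url"]
    if mode == "Short Label":
        # Last path component: a file name, or a directory name for ".../dir/".
        path = urlparse(vertex["url"]).path.rstrip("/")
        return posixpath.basename(path)
    if mode == "Title":
        return vertex["title"]
    if mode == "Short Title":
        return vertex["title"][:10]  # first ten characters of the title
    raise ValueError(f"unknown display mode: {mode}")


# Hypothetical vertex, invented for this example.
page = {
    "id": "n42",
    "url": "http://www.example.org/slides/class1/index.html",
    "title": "Class 1: Introduction to XML",
}
short_label = vertex_label(page, "Short Label")
short_title = vertex_label(page, "Short Title")
```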
