Visual Web Mining


 

Amir H. Youssefi (RPI), David Duke (Univ. of Leeds, UK), Mohammed J. Zaki (RPI) and Ephraim P. Glinert (National Science Foundation, RPI)
 
 

Abstract

Analysis of web site usage data involves two significant challenges: first, the volume of data arising from the growth of the web, and second, the structural complexity of web sites. In this project we apply Data Mining and Information Visualization techniques to the web domain in order to benefit from the power of both human visual perception and computing; we term this Visual Web Mining. In response to the two challenges, we propose a generic framework in which we apply Data Mining techniques to large web data sets and use Information Visualization methods on the results. In a prototype implementation we correlate the outcomes of mining Web Usage Logs and the extracted Web Structure by visually superimposing the results. We propose several new information visualization diagrams and analyze their utility. We also elaborate on the loosely coupled architecture, which integrates data exchanged in XML languages from Data/Web Mining and Link Analysis suites. Our interactive 3D visualization in vtkGraph scales up to hundreds of thousands of nodes.

Publications:

 - A. H. Youssefi, David J. Duke, Mohammed J. Zaki, Ephraim P. Glinert. Toward Visual Web Mining, Proc. of Int'l Visual Data Mining, IEEE Int'l Conf. on Data Mining ICDM, Florida, Nov 2003

 - A. H. Youssefi, David J. Duke, Mohammed J. Zaki. Visual Web Mining, submitted to International World Wide Web Conference, New York, May 2004. Please cite published version as Technical Report 03-16 Department of Computer Science, Rensselaer Polytechnic Institute, 17 Nov 2003.

 - D. J. Duke, A. H. Youssefi, M. J. Zaki. Unraveling the Web: A Modular Fusion of Visualization and Data Mining, submitted to Eurographics/IEEE TCVG Symposium on Visualization, 2004.

 - A. H. Youssefi, Olfa Nasraoui, David J. Duke, M. J. Zaki. Using Profiles in Visual Web Mining, in preparation.

Video Clip of VWM Tool in Action: Demo.avi (67.7 MB) (please download the whole movie before playing it, i.e. right-click, then choose "save target as" from the menu)


System Architecture

We gratefully acknowledge Prof. Mukkai Krishnamoorthy and Dr. John Punin for their help with the W3Pal Suite.

 

2D layout as the base for visualization of the web site of the Computer Science Dept. of RPI




The cylinder in depth is the cluster to which the web site's hits go.


Phenomena: amplification of a user session clickstream (bottom left) in the drill-down cylinder, Cone Scatter (top right) and Funnel Backoff to the main page of the website (top right)




Superimposition of Frequent Patterns extracted from Web Mining on top of Web Usage.




Aggregation of Sequence/Tree Mining support values, visually amplified by the thickness of stream tubes and highlighted in white, later superimposed on top of Web Usage.
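The mapping from support values to tube thickness can be sketched as follows. This is a minimal illustration only, not the tool's actual code: it assumes that support is aggregated by summing over every consecutive page pair of each frequent sequence, and that the radius is a linear rescaling. The function names, page paths, support values and radius range are all hypothetical.

```python
from collections import defaultdict

def aggregate_edge_support(patterns):
    """Sum the support of each frequent sequence over its consecutive page pairs.

    Summation is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    for pages, support in patterns:
        for src, dst in zip(pages, pages[1:]):
            totals[(src, dst)] += support
    return dict(totals)

def tube_radii(edge_support, min_radius=0.5, max_radius=4.0):
    """Linearly rescale aggregated support to a tube radius (hypothetical range)."""
    if not edge_support:
        return {}
    lo = min(edge_support.values())
    hi = max(edge_support.values())
    span = hi - lo
    radii = {}
    for edge, support in edge_support.items():
        # If all edges share one support value, draw them at full thickness.
        t = (support - lo) / span if span else 1.0
        radii[edge] = min_radius + t * (max_radius - min_radius)
    return radii

# Made-up frequent sequences with their support counts.
patterns = [
    (["/", "/courses", "/courses/cs101"], 40),
    (["/", "/courses"], 25),
    (["/", "/people"], 10),
]
edge_support = aggregate_edge_support(patterns)
radii = tube_radii(edge_support)
```

A renderer would then draw each edge as a tube with the computed radius, so that heavily supported paths stand out from the rest of the usage graph.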



Higher Order layout for clear visualization and easier analysis.



Finally, superimposition of Web Usage (colored) on top of Web Structure (gray). The graph layout of Web Usage is ignored and the layout of the Web Structure graph is applied to it.
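The layout transfer described in this caption can be sketched in a few lines. This is an assumed, simplified model rather than the prototype's implementation: the structure layout is taken to be a mapping from page URL to 3D coordinates, and usage edges are taken to be (source, destination, hits) triples. All names and values are hypothetical.

```python
def superimpose(structure_layout, usage_edges):
    """Position usage edges using the node coordinates of the structure layout.

    Returns the drawable segments and the edges whose endpoints are missing
    from the structure graph (those cannot be placed).
    """
    segments = []
    skipped = []
    for src, dst, hits in usage_edges:
        if src in structure_layout and dst in structure_layout:
            segments.append((structure_layout[src], structure_layout[dst], hits))
        else:
            skipped.append((src, dst))
    return segments, skipped

# Hypothetical coordinates produced by laying out the Web Structure graph.
structure_layout = {
    "/": (0.0, 0.0, 0.0),
    "/courses": (1.0, 0.0, 0.0),
    "/people": (0.0, 1.0, 0.0),
}
# Hypothetical usage edges; the usage graph's own layout is never consulted.
usage_edges = [
    ("/", "/courses", 120),
    ("/", "/people", 30),
    ("/courses", "/old-page", 2),
]
segments, skipped = superimpose(structure_layout, usage_edges)
```

Because both graphs share one set of coordinates, a colored usage edge lands exactly on top of the gray structure edge it corresponds to, which is what makes the two directly comparable.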







Web Structure is removed for clarity, but the layout of Web Usage is still taken from Web Structure.



Zoom on a cluster.








Glyphs for easy picking/clicking with the mouse.






Reverse superimposition gives very bad occlusion side effects, yet the falling strings show documents with sequential hyperlinks (documents converted from LaTeX to HTML).