W3Pal Webbot Tutorial
Webbot is the web robot that W3Pal uses to gather web site information. It is supported by the W3C and ships with the W3C libwww code base. It can check links, find bad HTML, download images, and map out web sites, among other tasks.
For more information on Webbot, visit the W3C webbot page.
The W3C libwww robot, webbot, has been extended for use with the W3Pal software. The new
version of webbot is controlled through the W3Pal webbot dialog window. To open the dialog from
the Editor, Navigator, or Browser, select the File menu, expand the New submenu, and select the
Webbot menu item. The webbot dialog window (Figure 1) then appears.
Several fields must be filled out to instruct webbot to scan a web site. The URL
field specifies the start page (root) that webbot uses when creating the webgraph. The Depth
field specifies the maximum depth, measured from the root node, to which webbot should scan.
Likewise, the Num Docs field specifies the maximum number of documents the webgraph can contain;
leaving this field blank specifies no maximum. The Prefix field specifies the location of the
web site: every document that falls within the given prefix is considered internal to the web site,
and any document outside the prefix is treated as an external document.
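The way these fields interact can be illustrated with a small sketch. This is not webbot's actual implementation; it is a minimal Python model that assumes the root sits at depth 0, that a document counts as internal when its URL starts with the prefix, and that external documents are recorded in the graph but not followed. The `scan` and `is_internal` functions, the `SITE` link table, and the example URLs are all hypothetical, and the link table stands in for pages that a real robot would fetch.

```python
from collections import deque


def is_internal(url, prefix):
    """Assumed rule: a URL is internal when it starts with the prefix."""
    return url.startswith(prefix)


def scan(links, root, prefix, max_depth, max_docs=None):
    """Breadth-first scan from root, limited by depth and document count.

    links    -- dict mapping a URL to the URLs it links to (stand-in for fetching)
    max_docs -- None models a blank Num Docs field (no maximum)
    Returns a dict mapping each URL to (depth, "internal" or "external").
    """
    def kind_of(url):
        return "internal" if is_internal(url, prefix) else "external"

    graph = {root: (0, kind_of(root))}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        depth, kind = graph[url]
        # Assumption: external documents and documents already at the
        # maximum depth are kept in the graph but their links are not followed.
        if kind == "external" or depth >= max_depth:
            continue
        for target in links.get(url, []):
            if target in graph:
                continue
            # Stop as soon as the document limit is reached.
            if max_docs is not None and len(graph) >= max_docs:
                return graph
            graph[target] = (depth + 1, kind_of(target))
            queue.append(target)
    return graph


# Hypothetical site used only for illustration.
SITE = {
    "http://example.org/": ["http://example.org/a.html", "http://other.example/x"],
    "http://example.org/a.html": ["http://example.org/b.html"],
    "http://example.org/b.html": ["http://example.org/c.html"],
}

if __name__ == "__main__":
    result = scan(SITE, "http://example.org/", "http://example.org/", max_depth=2)
    for url, (depth, kind) in result.items():
        print(depth, kind, url)
```

With a depth of 2 in this model, `c.html` never enters the graph because it sits three links from the root, while the page on `other.example` is recorded as external and not explored further.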
To specify a filename for the graph that webbot creates, select the Log Files option and
enter the filename of the XGMML graph in the Graph field. The files are saved automatically
in the directory shown at the top of the dialog box, next to the Directory For Files label.
The file directory can be changed by opening the |