The Fuzzy Clustering Applet

Written by Michael Kondorf
























Based upon the Fuzzy Clustering research of Mary Anne Egan.

Table of Contents

1.0 The Problem

2.0 The Java Language

2.1 A Java Applet

2.2 Java Tools

3.0 The Fuzzy Clustering Applet

4.0 The Original Source Application - Fitter

5.0 Loading Data Files Into The Applet

6.0 Running the Algorithms

7.0 Saving the Data Points

8.0 The Server Side Application

8.1 The FuzzyClustInit CGI Script

8.2 The FuzzyClustResults CGI Script

8.3 The FuzzyClustSave CGI Script

8.4 The FuzzyClustHelp.html File

9.0 Development Problems Encountered

10.0 The Source Code Description

11.0 The Source Code Listings

1.0 The Problem

There exists a very useful fuzzy-logic clustering simulation program that depends heavily on the UNIX operating system, the X Windows interface and the SUIT prototyping tool. It is highly unlikely that any group lacking these three tools can use the simulation program successfully. There is therefore a strong incentive to port the application to a more readily accessible platform in order to better demonstrate its usefulness and promote its acceptance. Additionally, it would be highly desirable to make the application accessible via the World Wide Web, offering the broadest possible audience. The platform of choice, then, is clearly the Java language.

2.0 The Java Language

For the first time in my recollection there exists a practical programming tool that allows for effortless portability among diverse hardware and OS platform combinations. But this platform independence does not come without penalty. Java is an interpreted language, and interpreted code runs slower than natively compiled code. To maintain compatibility, Java forces the programmer to the lowest common denominator among all the platforms it supports. Thread scheduling, memory allocation and memory deallocation are all left to the run-time interpreter. This limits the developer's ability to optimize, in the name of compatibility and portability.

Writing applications in Java is quite simple compared to other contemporary languages. Java is object-oriented by design. Unlike C++, Java does not support pointers and does not have a preprocessor, which makes the language easier to debug. The Java compiler enforces strict type checking, trapping many potential problems at compile time, and the run-time system checks array indexes. The default packages include a full graphics library and a powerful networking library, making Java well positioned in today's development environment.

2.1 A Java Applet

A final feature of the language is its most exciting. Programs written in Java that extend (are children of) the class Applet can be executed inside a web browser. This simple feature opens a whole world of opportunities for developers. At the same time, however, the user must be protected: a malicious program must not be able to harm the client's computer or browser. Because of this concern, Java applets must operate under two major restrictions. First, an applet cannot read from or write to the client's local disk storage. This prevents snooping attacks and the renaming or removal of files. Second, an applet can only open a network connection to the server from which it was downloaded. This prevents an applet from bypassing network firewall security. While these security measures clearly protect the user, they make it very difficult to develop interactive applications like fuzzy clustering. To overcome this problem I chose to use a combination of HTML forms and CGI scripts, which I discuss in sections 5.0 and 8.0.

2.2 Java Tools

The Java language is relatively young. The current release is version 1.1.x. Surprisingly, there already exists quite a range of development tools. I had a chance to evaluate two integrated development environment (IDE) packages: Microsoft's J++ and SUN Microsystems' Java Workshop. Both tools were very good, but they imposed upon the resulting applications the added overhead of loading extra class files from the IDE. These files, some of which were over 200K in size, seemed like overkill for our applet. If the Fuzzy Clustering Applet was to be truly web based, it had to be as small as possible to minimize the download time. Therefore I ruled out using an IDE.

The Java language was upgraded from version 1.0.2 to version 1.1 earlier this year. Because of this, the major browser suppliers, Netscape and Microsoft, do not yet fully support the new Java. Therefore, to minimize incompatibilities, I chose to write the applet using the older Java release. The Java compiler I used comes directly from SUN Microsystems: the Java Development Kit (JDK) version 1.0.2.

3.0 The Fuzzy Clustering Applet

Figure 1. The Fuzzy Clustering Applet

Figure 1 above is a screen shot of the Fuzzy Clustering Applet. The generic controls of the application are straightforward. The white square with a cube in it is the plotting canvas where all data points are plotted. This surface is also sensitive to mouse clicks: the user can add points to the canvas by simply clicking the left mouse button on it. The points on the plotting canvas may be manipulated using the controls under the heading Point Marker Control (see the section labeled 5 in figure 1). To change the point size, use the size controller. To change the point color, use the color controller. To hide the cube, display the cube, or display a truncated cube, use the cube controller.

At the bottom of the applet there are a number of buttons whose functions are self-explanatory. To run the selected algorithm, press the Run Algorithm button. To save the data points, press the Save Points button. To clear the computed clusters or computed shells, press the Clear Clusters button. To delete all the data points, press the Clear Points button. For a help screen, press the Help button.

The remaining inputs of the applet control the parameters of the different algorithms. In figure 1, the section labeled 4 contains two parameters that are valid for all algorithms. The approximate number of clusters is an integer value representing the number of groups of points in the data set. The weight assignment strategy can be alternate, random or in order. It specifies the way in which cluster membership weights are initialized for every point in the graph.

The section labeled 1 in figure 1 is where the user chooses the clustering process: Robust clustering or Shell clustering. If the user selects Shell clustering, they can then select a Shell clustering algorithm to run in the section labeled 2. The choices for Shell clustering algorithms are AFCES Simple, AFCES Newton, and AFCES U. The section labeled 3 in figure 1 contains all the controls for the Robust clustering algorithm. The Lambda and Fuzziness Value must each be a real number greater than zero. The Initial Membership Weight is a percentage expressed as a fraction: it must be greater than zero and must not exceed one.
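The constraints on the Robust clustering inputs can be sketched as a small validation class. This is only an illustration written against a current JDK, not code from the applet: the class name RobustParams, its field names and the error messages are all assumptions.

```java
// Hypothetical holder for the three Robust clustering inputs described above.
final class RobustParams {
    final double lambda;
    final double fuzziness;
    final double initialWeight;

    RobustParams(double lambda, double fuzziness, double initialWeight) {
        this.lambda = requirePositive(lambda, "Lambda");
        this.fuzziness = requirePositive(fuzziness, "Fuzziness Value");
        // The Initial Membership Weight must lie in the interval (0, 1].
        if (!(initialWeight > 0.0 && initialWeight <= 1.0)) {
            throw new IllegalArgumentException(
                "Initial Membership Weight must be greater than zero and at most one");
        }
        this.initialWeight = initialWeight;
    }

    // Rejects zero, negative, NaN and infinite values.
    private static double requirePositive(double value, String name) {
        if (!(value > 0.0) || Double.isInfinite(value)) {
            throw new IllegalArgumentException(name + " must be a real number greater than zero");
        }
        return value;
    }

    // Parses the text of the three input fields. A malformed number raises
    // NumberFormatException, which is itself an IllegalArgumentException.
    static RobustParams parse(String lambdaText, String fuzzinessText, String weightText) {
        return new RobustParams(
            Double.parseDouble(lambdaText.trim()),
            Double.parseDouble(fuzzinessText.trim()),
            Double.parseDouble(weightText.trim()));
    }
}
```

For example, RobustParams.parse("0.5", "2.0", "1.0") is accepted, while a weight of "1.5" or a Lambda of "0" is rejected with an IllegalArgumentException.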

Figure 2. Display Clusters

The radio buttons Display Clusters and Display Confidence control how the Robust clusters are plotted. If the former is selected the points that are members of the cluster are colored a unique color. If the latter is selected the points surrounding the cluster are colored a unique color depending upon the probability that they belong to that cluster (see figures 2 and 3).

If the Run Once option is selected the Robust clustering algorithm will compute one iteration using the given number of clusters. Otherwise, the algorithm will first compute the optimum number of clusters then compute the clustering. This process takes a significant amount of time.

Figure 3. Display Confidence

The second-to-last option, Display After Each Iteration, plots the intermediate clustering results while computing. Finally, the last option, Show All Computed Clusters, plots all clusters even if they fall below the noise threshold.






4.0 The Original Source Application - Fitter

Figure 4. The Fitter Application

For the reader's reference, here is a screen snapshot of the original C program, Fitter (see figure 4), written by Mary Anne Egan. I took care to reproduce the controls and functionality of the original within my applet.


5.0 Loading Data Files Into The Applet

Figure 5. The HTML Data and Image Loading Form

Figure 5 is a screen shot of the loading HTML form. This is the first screen the user sees upon starting the applet. As discussed in section 2.1, a Java applet cannot read from the local disk of a client's computer. This is a real problem: how are you going to give the applet your own test data? I solved the problem by using a little-known extension of HTML, the File Selection field. When an HTML form containing a File Selection field is submitted, the browser automatically uploads the user-specified file to the server.

Once the file is safely on the server, I spool it until the Fuzzy Clustering Applet calls for it. So in a roundabout way, data files are transferred from the user of the applet into the applet using the server as the transfer medium. The only drawback of this procedure is that to load another data set it is necessary to return to the HTML form, specify the new data file, then reload the applet again.

To load a file into the applet, you simply have to enter its name in the appropriate field on the form. The first field is for points files, and the second is for image files. If you don't know a filename, select the browse button to search your local directories.

The layout of a points file is simple. All you have to do is create an ASCII file with the following:

The format of this file is quite liberal. You can use any white space character, a comma, or even a newline interchangeably as field delimiters.
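The liberal delimiter rule can be illustrated with a short tokenizer. This is a sketch under assumptions, not the applet's own parser: the class and method names are invented, it is written against a current JDK, and it assumes every field is a number, since the required fields themselves are not listed in the text above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer: splits points-file text into numeric fields.
final class PointsFileTokenizer {
    static double[] parseFields(String text) {
        List<Double> values = new ArrayList<>();
        // Any run of white space (including newlines) and commas is one delimiter.
        for (String token : text.split("[\\s,]+")) {
            if (token.isEmpty()) {
                continue; // a leading delimiter produces one empty token
            }
            values.add(Double.parseDouble(token));
        }
        double[] fields = new double[values.size()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = values.get(i);
        }
        return fields;
    }
}
```

With this rule the texts "1 2 3", "1,2,3" and "1\n2\n3" all yield the same three fields.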

If you wish to load an image, the applet supports GIF and JPEG encoded image files. The plotting canvas is 300 by 300 pixels, so remember to keep your images small.

6.0 Running the Algorithms

Figure 6. The Algorithm Results Dialog (Running)

Figure 7. The View Results Button

As described earlier, to start an algorithm you first load some data points, then set the computation controls, then select the Run Algorithm button. When an algorithm runs, two things happen: a new thread is created to control the computation, and a Results Dialog is created (see figure 6). Any textual output from the running algorithm is collected in the scrollable area of the dialog box. While an algorithm is active, the Results Dialog appears as in figure 6. If you select the Hide button, the dialog disappears from view. To restore the dialog, press the View Results button (see figure 7). You can also stop the computation before it completes by pressing the Stop button. This discards any work that has not completed and kills the thread.
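The run and stop behaviour described above can be sketched as a worker thread that collects its text output. This is an illustration only: the class AlgorithmRunner and its methods are invented names, the "computation" is a placeholder loop, and the sketch uses a cooperative stop flag rather than killing the thread outright, because forcibly stopping a thread is deprecated in current Java. How the applet itself stops its thread is not shown here.

```java
// Hypothetical worker: runs a placeholder computation on its own thread.
final class AlgorithmRunner implements Runnable {
    private final int iterations;
    private final StringBuffer results = new StringBuffer(); // text for the dialog
    private volatile boolean stopRequested = false;
    private volatile boolean completed = false;
    private Thread worker;

    AlgorithmRunner(int iterations) {
        this.iterations = iterations;
    }

    // Corresponds to pressing Run Algorithm: start the computation thread.
    void start() {
        worker = new Thread(this, "algorithm");
        worker.start();
    }

    // Corresponds to pressing Stop: ask the loop to end early.
    void stop() {
        stopRequested = true;
    }

    // Blocks until the worker thread has terminated.
    void awaitFinish() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isCompleted() {
        return completed;
    }

    String resultsText() {
        return results.toString();
    }

    @Override
    public void run() {
        int i = 1;
        while (i <= iterations && !stopRequested) {
            results.append("iteration ").append(i).append('\n'); // stand-in for real output
            i++;
        }
        completed = i > iterations; // false when the loop was cut short
    }
}
```

A caller creates the runner, calls start(), and may call stop() at any time; once awaitFinish() returns, isCompleted() tells whether the run finished or was stopped.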

When the algorithm has finished computing, the thread terminates. The Results Dialog appearance then changes to something similar to figure 8. To close this dialog without saving the text output, press the Close button. To save the captured textual output, select the Save Results button. If you opted to save, the next screen you will see is similar to figure 9. The saving process is similar to the loading process. The applet must upload its information to the server then instruct the user's browser to fetch this information back from the server. To complete the saving process, the user must select the save option from the browser's file menu, writing to local disk.

Figure 9. The Save Results Screen

Figure 8. The Algorithm Results Dialog (Finished)

7.0 Saving the Data Points

Figure 10. The Save Data Points Screen

If you have entered data points on the canvas manually through clicking the mouse, you should save your changes. To save the data points select the Save Points button (see figure 1). The screen in figure 10 is similar to what you will see next. It is a new browser window in which the data points in the applet are exported. As in all saving and loading operations, the applet must first upload the data to the server then instruct the browser to fetch it from the server. To complete the saving process, the user must select the save option from the browser's file menu, writing to local disk.

8.0 The Server Side Application

The previous sections have been solely about the client side Java applet. This section presents the data handling and initialization scripts present upon the server which are needed for the applet to properly function as an application.

The first piece of the server side application is the input form InitFC.html. This file generates the web page seen in figure 5. It is the entry point to the applet. This HTML form calls the initialization script FuzzyClustInit.cgi. This file with a .cgi extension is called a CGI script. CGI stands for Common Gateway Interface. The power of a CGI file is that when it is accessed by the web server, it is executed as if it were a program. The actual content of the FuzzyClustInit.cgi script is written in Perl.

8.1 The FuzzyClustInit CGI Script

This script's job is twofold: it is responsible for both spooling and despooling the user's input data files. Its operation is simple. If the script is called using an HTTP "post" request, it attempts to spool data. If the script is called using an HTTP "get" request, it attempts to despool data. The script assumes that a "post" request originated from a user's browser submitting the InitFC.html form. This form is configured to call the script when the user selects the Start Applet button (see figure 5). The script checks the "post" request for the proper format, and then spools any data files that it might contain. At this point the client's browser is waiting for a response from the server. The script dynamically generates an HTML web page and passes it to the browser. Embedded in this web page are the applet, a data file key, and an image file key.

When the applet is fully loaded in the client's browser, it searches for these file keys. The presence of a file key indicates that there is data on the server to be downloaded. For example, if the file key for an image exists, then the server has spooled an image file. To retrieve the spooled data, the applet opens a connection to the server, then generates an HTTP "get" request. Inside this request is the key to the file the applet needs. When the FuzzyClustInit script is called with an HTTP "get" request, it searches for the spooled file that matches the key. If the file is found, the script returns it to the applet and then deletes it.
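The "get" request that carries a file key might be built as in the sketch below. Several details are assumptions made for illustration: the query parameter name key, the host example.edu, and the class and method names are all hypothetical, and the code targets a current JDK (Java 10 or later) rather than JDK 1.0.2. Resolving the script name against the address the applet was loaded from keeps the request on the same server, as the applet security rules require.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the address of a despool ("get") request.
final class SpoolRequest {
    static URI despoolUri(URI codeBase, String scriptName, String fileKey) {
        // Encode the key so that characters such as '&' or spaces survive the query string.
        String query = "key=" + URLEncoder.encode(fileKey, StandardCharsets.UTF_8);
        // Resolve relative to the applet's own server and directory.
        return codeBase.resolve(scriptName + "?" + query);
    }
}
```

For example, with a code base of http://example.edu/fuzzy/ and the script name FuzzyClustInit.cgi, the resulting address stays on example.edu and ends in FuzzyClustInit.cgi?key= followed by the encoded key.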

8.2 The FuzzyClustResults CGI Script

This script is very similar to the script of the previous section and, for that matter, the script of the next section. It performs two different operations depending upon how it is called. This script differs in that it assumes that it will always be called from inside the Fuzzy Clustering Applet. Like the previous script, an HTTP "post" request means spool data and an HTTP "get" request means despool data. The applet invokes this script when the user selects the Save Results button in the Results Dialog (see section 6.0). The applet opens a "post" connection to the script and uploads the data. When the upload is finished, the server sends the file key back to the applet. The applet then opens a new browser window with an HTTP "get" request that includes the key. The script fetches the appropriate file, returns it, and then deletes it from the spool directory.
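The spool and despool cycle shared by the scripts can be modelled in a few lines. The real scripts are written in Perl and keep their files in a spool directory on the server; the sketch below is an in-memory Java stand-in meant only to show the protocol. The class name Spool and the sequential key scheme are assumptions, and a real server would want keys that are hard to guess.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the server's spool directory.
final class Spool {
    private final Map<String, byte[]> files = new HashMap<>();
    private long nextKey = 1;

    // Models a "post" request: store the uploaded data and hand back its file key.
    synchronized String spool(byte[] data) {
        String key = "k" + nextKey++;
        files.put(key, data.clone());
        return key;
    }

    // Models a "get" request: return the data for the key and delete it.
    // Returns null when no spooled file matches the key.
    synchronized byte[] despool(String key) {
        return files.remove(key);
    }
}
```

Because despool removes the entry, each key can be redeemed only once, which matches the fetch-then-delete behaviour described above.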

8.3 The FuzzyClustSave CGI Script

This is the last of the server scripts, all of which operate in virtually the same way. Like the two previous scripts, it performs dual operations depending upon how it is called. It also assumes that it will always be called from inside the Fuzzy Clustering Applet. This script differs internally because it has to parse and format the input files while spooling them. Like the other scripts, an HTTP "post" request means spool data and an HTTP "get" request means despool data. The applet invokes this script when the user selects the Save Points button (see section 7.0). The applet opens a "post" connection to the script and uploads the data. The script reads in the data. Before spooling it, the script adds a header and then formats the numbers ten to a line, separated by commas. When the upload is finished, the server sends the file key back to the applet. The applet then opens a new browser window with an HTTP "get" request that includes the key. The script fetches the appropriate file, returns it, and then deletes it from the spool directory.

8.4 The FuzzyClustHelp.html File
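The "ten to a line separated by commas" layout can be sketched as follows. The real formatting is done by the Perl script on the server; this Java version is only an illustration with invented names. It leaves out the header, whose content is not described here, and it assumes that a line ends without a trailing comma.

```java
// Hypothetical formatter: writes numbers ten to a line, separated by commas.
final class PointsFormatter {
    static String format(double[] values) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            out.append(values[i]);
            // End the line after every tenth number and after the last one.
            boolean endOfLine = (i % 10 == 9) || (i == values.length - 1);
            out.append(endOfLine ? "\n" : ",");
        }
        return out.toString();
    }
}
```

Twelve values therefore produce two lines: the first holds ten numbers and the second holds the remaining two.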

The last piece of the server side application is the help file FuzzyClustHelp.html. This file is retrieved by the applet when the "help" button is selected.

9.0 Development Problems Encountered

The first problem I encountered in the project was applet security. As stated in section 2.1, the framers of the Java language wanted the applet mechanism to be hacker proof. While they have succeeded in doing so, they have made it very difficult to write an applet that supports the loading and saving of data. For the project I was able to code an acceptable compromise solution using server side CGI scripts. An unfortunate consequence of this is that only the browsers Netscape Navigator version 3.x and newer and Microsoft's Internet Explorer version 4.x and newer can run the Fuzzy Clustering Applet.

Another problem, which turned out to be very difficult to fix, was the inconsistency of the applet's layout between different browsers on different OS platforms. Netscape implements spacing more liberally than Microsoft does. On the PC platform the browsers render controls that look totally different: radio buttons, choice boxes and text fields all appear differently. Finally, the default font sizes and screen sizes wreaked havoc. The appearance of the final version of my applet should be acceptable on all major OS/browser combinations.

A final problem, and the most shocking one I encountered with the Java language, is threads. I discovered that on UNIX platforms, particularly Solaris, the Java virtual machine implementations are cooperatively threaded. This means that if an executing thread does not yield the processor, all the other Java threads will starve. I could not find a clean solution to the problem, which impacted my application tremendously. Having to code around this problem decreased the applet's performance by as much as 20 percent.
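One common way to keep other threads alive under cooperative scheduling is to yield the processor at regular intervals inside a long computation. The sketch below shows that pattern on a placeholder loop. It is not taken from the applet, whose actual workaround is not described here, and the names YieldingLoop, sumWithYields and yieldEvery are invented.

```java
// Hypothetical compute loop that periodically gives other threads a chance to run.
final class YieldingLoop {
    static long sumWithYields(int n, int yieldEvery) {
        if (yieldEvery <= 0) {
            throw new IllegalArgumentException("yieldEvery must be positive");
        }
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i; // stand-in for one step of real work
            if (i % yieldEvery == 0) {
                Thread.yield(); // hint to the scheduler that other threads may run
            }
        }
        return sum;
    }
}
```

The trade-off is the one noted above: every yield costs time, so yielding too often slows the computation, while yielding too rarely leaves other threads, such as the one that repaints the screen, waiting.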

10.0 The Source Code Description

The Fuzzy Clustering Applet version 2.0b contains the following source files:

The Java source code:

The build utilities:

The server application (CGI) files:

The server HTML files:


11.0 The Source Code Listings