|
Marcus Raitner Lehrstuhl für Theoretische Informatik Universität Passau 94032 Passau raitner@fmi.uni-passau.de |
Michael Himsolt Lehrstuhl für Theoretische Informatik Universität Passau 94032 Passau himsolt@fmi.uni-passau.de |
GML_scanner procedure
GML_token structure
GML_value enumeration (scanner related
entries)
GML_tok_val structure
GML_parser procedure
GML_stat structure
GML_pair structure
GML_value enumeration (parser related
entries)
GML_pair_val structure
GML_free_list procedure
GML_print_list procedure
gml_demo programm
GML, the Graph Modelling Language, is a portable file format for graphs. GML has been developed as part of the Graphlet system, and has been implemented in several other systems, including LEDA, GraVis and VGJ.
This document describes a sample scanner and parser for GML.
Unlike other implementations, this one uses ANSI C and does not rely
on external tools such as lex and yacc.
This implementation is also designed to be highly portable and can be
used as a library.
The procedure GML_scanner implements
the scanner for GML files:
struct GML_token GML_scanner (FILE*
file);
GML_scanner reads the next input token from
file and returns it in a GML_token
structure. file must be open for read access; the caller
is responsible for opening anc closing the file.
The type GML_token is defined as follows:
struct GML_token {
GML_value kind;
union tok_val value;
};
Where kind determines the type of the token and
value is its value.
kind is of type GML_value, which is listed in Table
1.
GML_KEY |
token is the name of a key |
data in value.string |
GML_INT |
token is integer number |
data in value.integer |
GML_DOUBLE |
token is floating point number |
data in value.floating |
GML_STRING |
token is a string |
data in value.string |
GML_L_BRACKET |
token is left bracket |
no data in value |
GML_R_BARCKET |
token is right bracket |
no data in value |
GML_END |
EOF was reached (value undefined) |
no data in value |
GML_ERROR |
error occured while parsing |
additional information in value.error |
The value field in GML_token is of type
GML_tok_val, which is defined as follows:
union GML_tok_val {
long integer; // used with GML_INT
double floating; // used with GML_DOUBLE
char* string; // used with GML_STRING, GML_KEY
struct GML_error err; // used with GML_ERROR
};
The procedure GML_parser
implements the parser for GML files:
struct GML_pair* GML_parser (FILE* file,
struct GML_stat* stat,
int mode);
Input parameters for GML_parser are the
file, a pointer to a GML_stat structure and
and the operations mode. file must be open
for reading; the caller is responsible for opening and closing the
file. stat must point to a structure of type
GML_stat, which is defined as follows:
struct GML_stat {
struct GML_error err;
struct GML_list_elem* key_list;
};
The variable err is used to report errors (for information on
GML_error see below)
If an error occurs during parsing,
stat->err.err_num is set to the corresponding error
code, and additional information is written into the data
structure pointed to by stat->err. If no error occurs, then
stat->err.err_num has the value GML_OK.
mode is almost always 0.
The other field in GML_stat is key_list, a
pointer to a singly linked list of the strings used for keys. You can access
the first key-string with key_list->key and the next node with
key_list->next.
The parameter mode needs further clarification.
GML_parser parses lists recursively. Therefore, A
closing square bracket (]) means the end of a list in a
recusive call and a syntax error (GML_TOO_MANY_BRACKETS)
at the top level. Therefore, mode is used to
discriminate between the top level (mode == 0) and a
recursion step (mode == 1). 0 at the top
level and and 1 in a recursion step.
GML_parser returns a structure of type
GML_pair, which is defined as follows:
struct GML_pair {
char* key;
GML_value kind;
union pair_val value;
struct GML_pair* next;
};
Each object in a GML_pair structure corresponds to a
key-value pair in the GML file, where key is a pointer to
the key, and kind and value hold the value.
For example, the sequence "id 42" translates into a
GML_pair structure where key is "id",
kind is GML_INT and
value.integer is 42. next
implements GML lists. next is a pointer to the next
element within the current in the list, or NULL if there
are no more elements in the
list.
The field kind determines which of the fields in
value is used. kind is of type
GML_value, which is listed in Table 2.
GML_INT |
value is a integer number |
data in value.integer |
GML_DOUBLE |
value is a floating point number |
data in value.floating |
GML_STRING |
value is a string |
data in value.string |
GML_LIST |
value is a list of key-value pairs |
data in value.list |
The data structure GML_pair_val is defined as
follows:
union GML_pair_val {
long integer; // kind is GML_INT
double floating; // kind is GML_DOUBLE
char* string; // kind is GML_STRING
struct GML_pair* list; // kind is GML_LIST
};
Note: string contains no characters with
ASCII code greater than 127 because these are converted into the
iso8859-1-format. See the GML Manual for details.
The following auxiliary procedures are defined for
GML_pair:
void GML_free_list (struct GML_pair* list, struct GML_list_elem* key_list)
Frees recursivly all storage allocated for list and for
key_list (which is decribed above).
void GML_print_list (struct GML_pair* list, int level)
Writes list to stdout, using level
for indentation. This meant for debugging only.
The currently read line and column are stored in the global
variables GML_line and GML_column. If you
are interested in where an error occured, you should call
before calling parser or scanner the first time. It will set both variables to 1.void GML_init ()
The procedures GML_scanner and
GML_parser read until they find an
error, or an end of file. If the parser encounters
an error, it returns the GML structure parsed so far and provides error
information in its error parameter.
The structure GML_error reports scanner and parser errors:
struct GML_error {
GML_error_value err_num;
int line;
int column;
};
line is the input line in which the error
occured, and column is the corresponding column.
Both line and column start at 1. err_num is of type
GT_error_value, which is listed in Table 3.
GML_UNEXPECTED |
unexpected charcter was found |
GML_SYNTAX |
Broken key-value structure |
GML_PREMATURE_EOF |
End of file encountered while reading a string |
GML_TOO_MANY_DIGITS |
A number has too many digits (that is, more than 1024 in the current implementation). |
GML_OPEN_BRACKET |
At least one bracket not closed at EOF |
GML_OK |
No errors occured |
The following example
reads a GML file (filename is specified in the command line) and writes the
parsed key-value pairs and the list of keys to standard
output.
#include "gml_parser.h"
#include <stdio.h>
#include <stdlib.h>
void print_keys (struct GML_list_elem* list) {
while (list) {
printf ("%s\n", list->key);
list = list->next;
}
}
void main (int argc, char* argv[]) {
struct GML_pair* list;
struct GML_stat* stat=(struct GML_stat*)malloc(sizeof(struct GML_stat));
stat->key_list = NULL;
if (argc != 2) printf ("Usage: gml_demo <gml_file> \n");
else {
FILE* file = fopen (argv[1], "r");
if (file == 0) printf ("\n No such file: %s", argv[1]);
else {
GML_init ();
list = GML_parser (file, stat, 0);
if (stat->err.err_num != GML_OK) {
printf ("An error occured while reading line %d column
%d of %s:\n", stat->err.line, stat->err.column, argv[1]);
switch (stat->err.err_num) {
case GML_UNEXPECTED:
printf ("UNEXPECTED CHARACTER");
break;
case GML_SYNTAX:
printf ("SYNTAX ERROR");
break;
case GML_PREMATURE_EOF:
printf ("PREMATURE EOF IN STRING");
break;
case GML_TOO_MANY_DIGITS:
printf ("NUMBER WITH TOO MANY DIGITS");
break;
case GML_OPEN_BRACKET:
printf ("OPEN BRACKETS LEFT AT EOF");
break;
case GML_TOO_MANY_BRACKETS:
printf ("TOO MANY CLOSING BRACKETS");
break;
default:
break;
}
printf ("\n");
}
GML_print_list (list, 0);
printf ("Keys used in %s: \n", argv[1]);
print_keys (stat->key_list);
GML_free_list (list, stat->key_list);
}
}
}