There are now a number of libraries for processing some of the technologies upon which RDF/XML is built. Particularly, Closure XML is a robust, mature XML parser, and manipulation of URIs is easy using Puri.
This RDF/XML parser builds upon these libraries so as to minimize the amount of code that is not directly related to parsing RDF/XML, and provides a simple interface for extracting triples from RDF/XML documents.
This parser is designed to be easy to use. It is distributed as an ASDF package, so getting it up and running should be as simple as:
CL-USER> (load #P".../cl-rdfxml.asd")
CL-USER> (asdf:oos 'asdf:load-op '#:cl-rdfxml)
Parsing a document is performed using cl-rdfxml:parse-document, which
takes two arguments: the first is a designator for a function that
accepts three arguments, and the second is a value suitable for passing
to cxml:make-source.
For each triple in the graph, the provided function (the first
argument) is called with three arguments, the subject, predicate, and
object of the triple.
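As a minimal sketch of such a callback, the following collects every triple into a list. The helper make-triple-collector is hypothetical (it is not part of the parser), and the file name in the usage comment is only a placeholder. The call itself is shown as a comment so that the block loads even when the parser is not installed.

```lisp
;; Hypothetical helper, not part of the parser: builds a callback that
;; can be handed to the parser as its first argument.
(defun make-triple-collector ()
  "Return two closures: a three-argument callback that records each triple,
and a zero-argument function that returns the triples in arrival order."
  (let ((triples '()))
    (values (lambda (subject predicate object)
              (push (list subject predicate object) triples))
            (lambda ()
              (reverse triples)))))

;; Usage, assuming the cl-rdfxml system is loaded and "example.rdf" is a
;; placeholder for a real RDF/XML file:
;;
;;   (multiple-value-bind (collect result) (make-triple-collector)
;;     (cl-rdfxml:parse-document collect #P"example.rdf")
;;     (funcall result))
```

Because the second argument is whatever cxml:make-source accepts, the same callback should work unchanged whether the document comes from a pathname, a stream, or a string.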
URIrefs in the graph are interned PURI URIs, and are comparable
with eq. Blank nodes are represented by blank node objects, which may
have an id that is local to a single call to parse-document (i.e.,
local to a single graph). Blank nodes are comparable under object
equality, that is, using eq. Literals, both plain (with optional
language tags) and typed, are interned globally, and so are comparable
by eq, even across graphs.
Get the latest version, or browse the source.
This previously unnamed RDF/XML parser is now called CL-RDFXML, and is
also ASDF-Installable, by virtue of a Cliki page. This means that
(asdf-install:install '#:cl-rdfxml) should be enough to get the parser
installed. The basic usage is still the same. Documentation within
the code is fairly thorough, and external documentation, probably
HTML, will hopefully be available soon.
Added a bugfix contributed by Red Daly: determining whether an element
without an rdf:parseType attribute was an empty element, a resource
element, or a literal element was incorrect in the presence of
comments. I also split up the system into several files, which makes
loading a bit more straightforward, and added the proper wrappers for
constant definitions (SBCL behaves a bit differently than some Lisps).
Still to come: proper handling of DTD content, and of other things
that can appear in XML documents in positions I might not have
considered. (Comments within elements were one such case.)
Went through some RDF files pulled from the web (notably, the dump
from Profiles In Terror) and compared the results of parsing with this
system against the results of parsing with Jena. Jena parsed a bit
faster, so some profiling may be in order soon. More importantly, Jena
doesn't seem to validate URIs. For instance, no warning or error is
triggered by
<rdf:Description rdf:about="some invalid URI"/>.
On the other hand, some rdf:_0 elements were flagged as unrecognized
terms in the RDF namespace. This latest version provides restarts to
try reparsing malformed URIs. The keyword argument
parse-uris-strictly, when nil, establishes a handler for
puri:uri-parse-error that retries the parsing with puri:*strict-parse*
bound to nil. Flagging unrecognized terms in the RDF namespace is on
the list of things to do.
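The handler-and-restart arrangement described above can be illustrated with a small, self-contained sketch. Everything in it is a stand-in: *strict*, toy-parse-error, toy-parse, and the retry-leniently restart are hypothetical names that only mimic the roles of puri:*strict-parse*, puri:uri-parse-error, and URI parsing. The parser's actual restart names and internals are not shown here.

```lisp
;; Stand-in for puri:*strict-parse* (hypothetical).
(defvar *strict* t)

;; Stand-in for puri:uri-parse-error (hypothetical).
(define-condition toy-parse-error (error)
  ((input :initarg :input :reader toy-parse-error-input)))

(defun toy-parse (string)
  "Signal TOY-PARSE-ERROR for strings containing a space, unless *STRICT* is NIL."
  (if (and *strict* (find #\Space string))
      (error 'toy-parse-error :input string)
      string))

(defun parse-with-retry (string)
  "Parse STRING, offering a RETRY-LENIENTLY restart around the attempt."
  (restart-case (toy-parse string)
    (retry-leniently ()
      :report "Retry with strict parsing disabled."
      (let ((*strict* nil))
        (toy-parse string)))))

(defun parse-leniently (string)
  "Parse STRING with a handler that always picks the lenient retry."
  (handler-bind ((toy-parse-error
                   (lambda (condition)
                     (declare (ignore condition))
                     (invoke-restart 'retry-leniently))))
    (parse-with-retry string)))
```

Calling parse-with-retry directly leaves the decision to whoever handles the error (or to the debugger), while parse-leniently plays the part that a nil parse-uris-strictly is described as playing: it installs the handler up front so the retry happens automatically.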
Worked through a number of the RDF test cases, and made sure that they
were correctly handled. Particularly, rdf:IDs should be unique within
a document, and rdf:IDs and rdf:nodeIDs should both have values that
are valid XML NCNames. Though the symbols aren't exported, there are
now functions for printing triples in N-Triples format. In particular,
look at rdfxml::make-n-triple-printer.
This version adds error checking and restarts, and provides (optional) support for datatyped literals with empty lexical forms (which are, strictly speaking, prohibited by the RDF/XML specification, but appear quite frequently in some RDF/XML documents). Language tag conformance to RFC 3066 is also (optionally) enforced. XML Literals are now handled, though container support is still weak (see notes from prior version).
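To make the RFC 3066 check concrete, here is a minimal sketch of a syntactic test for language tags. It follows the RFC 3066 grammar (a primary subtag of 1 to 8 ASCII letters, followed by any number of hyphen-separated subtags of 1 to 8 ASCII letters or digits). The function names are hypothetical, and this is not necessarily how the parser performs its own check.

```lisp
(defun split-on-hyphen (string)
  "Return the substrings of STRING separated by hyphens; empty pieces are kept."
  (loop with start = 0
        for pos = (position #\- string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun ascii-alpha-p (char)
  "True if CHAR is one of the 52 ASCII letters."
  (find char "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"))

(defun ascii-alphanumeric-p (char)
  "True if CHAR is an ASCII letter or an ASCII digit."
  (or (ascii-alpha-p char) (find char "0123456789")))

(defun rfc3066-language-tag-p (tag)
  "True if TAG matches the RFC 3066 Language-Tag syntax."
  (let ((subtags (split-on-hyphen tag)))
    (flet ((well-formed-p (subtag char-ok-p)
             ;; Every subtag is 1 to 8 characters from the allowed set.
             (and (<= 1 (length subtag) 8)
                  (every char-ok-p subtag))))
      (and (well-formed-p (first subtags) #'ascii-alpha-p)
           (every (lambda (subtag)
                    (well-formed-p subtag #'ascii-alphanumeric-p))
                  (rest subtags))))))
```

Under this sketch, tags such as "en" and "en-US" are accepted, while the empty string, a tag with a trailing hyphen, or a primary subtag containing digits is rejected. The check is purely syntactic and does not consult any registry of language codes.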
The initial release. This version lacks a great deal of error
checking, does not handle XML Literals, and does not have great
support for some of the RDF collections. (In particular, the
attribute/value rdf:parseType="Collection" will create a list, but the
rdf:li and rdf:_n attributes aren't handled.)