There are now a number of libraries for processing some of the technologies upon which RDF/XML is built. In particular, Closure XML is a robust, mature XML parser, and Puri makes manipulating URIs easy.
This RDF/XML parser builds upon these libraries so as to minimize the amount of code that is not directly related to parsing RDF/XML, and provides a simple interface for extracting triples from RDF/XML documents.
GitHub: CL-RDFXML source is hosted on GitHub. Visit the project page or clone:
git clone https://github.com/tayloj/cl-rdfxml.git
CL-RDFXML can be installed using ASDF-Install, or downloaded from http://www.cs.rpi.edu/~tayloj/CL-RDFXML/cl-rdfxml_0.9.tar.gz. The current version is 0.9.
CL-RDFXML comes with an ASDF system definition, and depends on CXML and PURI.
You can also browse the source.
[Function]
parse-document function input => |
function---a function of three arguments
input---an input suitable for cxml:make-source
Parse-document parses input, binds *triple-receiver* to function, and calls emit-triple with each triple that can be extracted from the input. *blank-nodes* is rebound to a new equal hash table that maps blank node identifiers to their blank nodes. If the document element is rdf:RDF, then its children are processed as node elements; otherwise, the body of the document is parsed as a sequence of node elements.
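As a rough illustration of the calling convention described above, the following sketch collects every triple from a small inline document into a list. It assumes that the ASDF system and the package are both named cl-rdfxml, that parse-document is exported from that package, and that the three arguments passed to the function are the subject, predicate, and object, in that order; this reference does not state any of these details. collect-triples and *example-document* are hypothetical names introduced only for this example.

```lisp
;; Assumption: the ASDF system and the package are both named CL-RDFXML.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf")
  (asdf:load-system "cl-rdfxml"))

(defparameter *example-document*
  "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
            xmlns:dc='http://purl.org/dc/elements/1.1/'>
     <rdf:Description rdf:about='http://example.org/book'>
       <dc:title>An Example</dc:title>
     </rdf:Description>
   </rdf:RDF>"
  "A small RDF/XML document, passed as a string to the parser.")

(defun collect-triples (input)
  "Hypothetical helper: return the triples extracted from INPUT as a list
of three-element lists, in the order in which they were emitted."
  (let ((triples '()))
    (cl-rdfxml:parse-document
     ;; Assumption: the arguments arrive as subject, predicate, object.
     (lambda (subject predicate object)
       (push (list subject predicate object) triples))
     input)
    (nreverse triples)))

(collect-triples *example-document*)
```

Because the receiver is an ordinary function, the same pattern could just as well write each triple to a store or count them instead of consing a list.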
[Special variable]
*warn-on-non-namespaced-names*
A boolean (whose default value is true) that controls whether a warning is signalled when a permitted non-namespaced attribute is encountered. The only attributes which may appear without namespaces are ID, about, resource, parseType, and type. New documents should not use unqualified forms, though they may appear in legacy documents. See Section 6.1.4 of the RDF/XML Syntax Specification.
[Special variable]
*warn-on-parse-type-other*
A boolean (whose default value is false) that controls whether a warning is signalled when an element is encountered that specifies the rdf:parseType attribute with a value other than "Literal", "Resource", or "Collection". Such an element is treated as though the value were "Literal", and this situation is not an error. Nonetheless, it seems likely that one might be interested in knowing when it occurs.
[Special variable]
*warn-on-rdf-prefixed-non-rdf-names*
According to Section 5.1, The RDF Namespace and Vocabulary, of the RDF/XML Syntax Specification, warnings SHOULD be generated when a name is encountered that begins with the RDF namespace name but is not an RDF name. If *warn-on-rdf-prefixed-non-rdf-names* is true (the default), then such warnings are generated; otherwise they are muffled.
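Since these are special variables, one plausible way to change the warning behaviour for a single parse is to rebind them dynamically around the call. The sketch below assumes the variables and parse-document are exported from a package named cl-rdfxml; parse-with-warning-settings is a hypothetical helper, not part of the library.

```lisp
;; Assumption: the ASDF system and the package are both named CL-RDFXML.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf")
  (asdf:load-system "cl-rdfxml"))

(defun parse-with-warning-settings (function input)
  "Hypothetical helper: parse INPUT with the two default-on warnings
switched off and the rdf:parseType warning switched on.  The bindings
last only for the dynamic extent of this call."
  (let ((cl-rdfxml:*warn-on-non-namespaced-names* nil)
        (cl-rdfxml:*warn-on-rdf-prefixed-non-rdf-names* nil)
        (cl-rdfxml:*warn-on-parse-type-other* t))
    (cl-rdfxml:parse-document function input)))
```

Rebinding with let, rather than assigning with setf, leaves the global defaults untouched for any other parse.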
[Standard class]
literal
The literal class is the superclass of both plain-literal and typed-literal. Every literal has a lexical form; the slot storing this form is defined in the literal class, and may be read with literal-string.
[Generic function]
literal-string literal => result
literal-string literal => string
literal---a literal
string---a string
Literal-string returns the lexical form of the literal.
[Standard class]
plain-literal
The plain-literal class is the class comprising all plain-literals. These literals have a lexical form, inherited from the superclass literal, and an optional language tag. The language tag, when provided, should be of the form specified by RFC 3066, and is normalized to lowercase.
[Function]
intern-plain-literal string &optional language => result
intern-plain-literal string [language] => plain-literal
string, language---strings
plain-literal---a plain literal
Intern-plain-literal returns a literal with the specified string and language. Calls to intern-plain-literal with strings that are equal and languages that are equal return the same literal object.
[Generic function]
literal-language plain-literal => result
literal-language plain-literal => result
plain-literal---a plain-literal
result---a string or nil
Literal-language returns the language tag of the plain-literal, if there is one, and nil if no language tag is associated with the literal.
[Standard class]
typed-literal
The typed-literal class is the class comprising all typed-literals. These literals have a lexical form, inherited from the superclass literal, and a required datatype. The datatype is a puri:uri.
[Function]
intern-typed-literal string datatype => result
intern-typed-literal string datatype => typed-literal
string---a string
datatype---a URI designator
typed-literal---a typed literal
Intern-typed-literal returns a literal with the specified string and datatype. Calls to intern-typed-literal with strings that are equal and designators for the same URI return the same literal object.
[Generic function]
literal-datatype typed-literal => result
literal-datatype typed-literal => datatype
typed-literal---a typed-literal
datatype---an interned PURI uri
Literal-datatype returns the datatype of a typed-literal. The datatype URI is interned, and may be compared with eq.
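The interning behaviour described for plain and typed literals can be exercised with a short sketch such as the one below. It assumes that the functions are exported from a package named cl-rdfxml and that a string naming a URI counts as a URI designator; literal-interning-demo is a hypothetical name.

```lisp
;; Assumption: the ASDF system and the package are both named CL-RDFXML.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf")
  (asdf:load-system "cl-rdfxml"))

(defun literal-interning-demo ()
  "Hypothetical demo: intern a few literals and report, as a list, whether
equal inputs gave the same object, the lexical form, the language tag, and
whether two typed literals share an identical datatype URI object."
  (let* ((xsd-integer "http://www.w3.org/2001/XMLSchema#integer")
         (plain-1 (cl-rdfxml:intern-plain-literal "chat" "fr"))
         (plain-2 (cl-rdfxml:intern-plain-literal "chat" "fr"))
         ;; Assumption: a string is accepted as a URI designator.
         (typed-1 (cl-rdfxml:intern-typed-literal "42" xsd-integer))
         (typed-2 (cl-rdfxml:intern-typed-literal "7" xsd-integer)))
    (list (eq plain-1 plain-2)
          (cl-rdfxml:literal-string plain-1)
          (cl-rdfxml:literal-language plain-1)
          (eq (cl-rdfxml:literal-datatype typed-1)
              (cl-rdfxml:literal-datatype typed-2)))))
```

If the library behaves as documented above, the two eq comparisons should both be true, which is what makes literals and datatypes cheap to compare in client code.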
[Standard class]
blank-node
The blank-node class represents blank nodes in a graph. Blank nodes are local to a graph, and can be identified within a graph by their ID. The blank-node id is used for referring to the same blank node in an RDF/XML document, and so in general, blank-nodes ought to be compared using object equality, i.e., eq.
[Function]
blank-node &optional id namespace => result
blank-node [id [namespace]] => blank-node
id---a string
namespace---an equal hash-table
Blank-node returns a blank node. If id is specified, then if there is already a blank node in namespace whose id is equal to id, then that blank node is returned. Otherwise, a new blank node is created, inserted into namespace, and returned. If id is not specified, then a new blank node is returned, and namespace is not modified.
[Generic function]
blank-node-id blank-node => result
blank-node-id blank-node => id
blank-node---a blank-node
id---a string
Blank-node-id returns the ID of the blank-node. Blank-node-ids are intended to be used for readability purposes. Blank-nodes should be compared directly using object equality. That two blank-nodes have ids that are string= does not mean that they represent the same RDF blank-node.
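To make the distinction between object identity and ID equality concrete, the sketch below creates blank nodes in two separate namespaces. It assumes blank-node and blank-node-id are exported from a package named cl-rdfxml; blank-node-demo is a hypothetical name, and both arguments are always supplied so that nothing is assumed about the default namespace.

```lisp
;; Assumption: the ASDF system and the package are both named CL-RDFXML.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf")
  (asdf:load-system "cl-rdfxml"))

(defun blank-node-demo ()
  "Hypothetical demo: return three values describing blank nodes that
share the ID \"b1\": whether two lookups in one namespace are EQ, whether
nodes from different namespaces are EQ, and whether their IDs are STRING=."
  (let* ((graph-1 (make-hash-table :test 'equal))
         (graph-2 (make-hash-table :test 'equal))
         (node-a (cl-rdfxml:blank-node "b1" graph-1))
         (node-b (cl-rdfxml:blank-node "b1" graph-1))
         (node-c (cl-rdfxml:blank-node "b1" graph-2)))
    (values (eq node-a node-b)
            (eq node-a node-c)
            (string= (cl-rdfxml:blank-node-id node-a)
                     (cl-rdfxml:blank-node-id node-c)))))
```

Going by the description above, the same ID in the same namespace should yield the same object, while the node from the second namespace should be a different object even though its ID is string= to the first.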
[Condition type]
rdfxml-warning
The class of warnings signalled by the RDF/XML parser.
[Condition type]
rdf-prefixed-non-rdf-name
According to "Section 5.1 The RDF Namespace and Vocabulary" of the RDF/XML Syntax Specification, certain names are defined as RDF names, and these begin with the RDF namespace name, but "any other names [beginning with the RDF namespace name] are not defined and SHOULD generate a warning when encountered, but should otherwise behave normally." rdf-prefixed-non-rdf-name is the class of warnings that are signalled in such situations.
[Condition type]
non-namespaced-name
According to 6.1.4 of the RDF/XML Syntax Specification, the attributes ID, about, resource, parseType, and type may appear without a namespace prefix, and are interpreted as the corresponding RDF names. Also, "new documents SHOULD NOT use these unqualified attributes, and applications MAY choose to warn when the unqualified form is seen in a document." non-namespaced-name is the class of warnings that are signalled in such situations.
[Condition type]
other-parse-type
The rdf:parseType attribute has three explicitly meaningful values: "Resource", "Literal", and "Collection". If rdf:parseType is encountered with a different value, the element is processed as though the value had been "Literal". The specification does not indicate that a warning should be signalled, and so such warnings are not generated in the default case, but if the user requests warnings on such attribute values, a warning of type other-parse-type is signalled.
[Condition type]
rdfxml-error
The class of errors signalled by the RDF/XML parser.
[Condition type]
invalid-attribute-value
Conditions of type invalid-attribute-value are signalled when an attribute value is not appropriate for the particular attribute. This kind of situation may happen, for instance, if an xml:lang value is not RFC 3066 compliant, or if an rdf:ID or rdf:nodeID value is not an XML NCName. Note that these situations are distinct from those in which an attribute appears where it should not.
[Condition type]
repeated-id
Errors of type repeated-id are signalled when the value of an rdf:ID attribute on an element is the same as the value of an rdf:ID attribute on another element. rdf:IDs should be unique within a document.
[Condition type]
non-nc-name-id
Errors of type non-nc-name-id are signalled when the attributes rdf:ID or rdf:nodeID appear with values that are not valid NCNames.
[Condition type]
invalid-language-tag
Language tags in RDF/XML (and more generally, XML) must be in accordance with RFC 3066. When a language tag is specified that is not of the proper form, an error of type invalid-language-tag is signalled.
[Condition type]
unexpected-characters
Excess whitespace is always permitted between elements, but arbitrary character data is not. When non-whitespace character data is encountered where whitespace is expected, an error of type unexpected-characters is signalled.
[Condition type]
duplicate-attribute
Errors of type duplicate-attribute are signalled when attributes are specified more than once and the XML parser did not flag the error. This happens when, according to the RDF/XML specification, certain non-namespaced attributes are interpreted as being in the RDF namespace. A duplicate attribute can appear, for instance, when rdf:ID is specified in conjunction with ID, which is interpreted as rdf:ID.
[Condition type]
prohibited-attribute
At various places in RDF/XML, the set of attributes permissible on an element is restricted. When an attribute appears on an element but is not allowed there, a prohibited-attribute error is signalled.
[Condition type]
datatyped-empty-property
Errors of type datatyped-empty-property are signalled when the parser is attempting to parse an empty property, but an rdf:datatype attribute is specified. rdf:datatype is not a permissible attribute on empty properties, but it is expected that this case will most likely arise when the intent was to generate a literal element with an empty string. When this type of error is signalled, it is expected that one of the available restarts will make an attempt to parse the element as a literal with a null string. This issue is also in the errata of the RDF/XML syntax specification.
[Condition type]
non-namespaced-attribute
Certain attributes, namely rdf:ID, rdf:about, rdf:resource, rdf:parseType, and rdf:type, are permitted to appear without a namespace specified. These attributes are automatically treated as though they had appeared with the RDF namespace prefix. Any other attributes without namespaces, however, must not appear.
[Condition type]
mutually-exclusive-attributes
Some elements are permitted to contain one of a set of attributes, but no more than one of the set. That is, there are attributes that are permitted on an element, but are mutually exclusive. This class of error is signalled when more than one such attribute is encountered on the same element.
[Condition type]
bad-uri
Errors that are subclasses of bad-uri are signalled when a URI appears where a URI is required, but the provided URI is not valid for that position.
[Condition type]
bad-node-element-uri
Errors of type bad-node-element-uri are signalled when a URI that is not a node-element-uri is specified in a position where a node-element-uri is expected.
[Condition type]
bad-property-element-uri
Errors of type bad-property-element-uri are signalled when a URI that is not a property-element-uri is specified in a position where a property-element-uri is expected.
[Function]
ignore-attribute &optional condition => result
ignore-attribute [condition] => |
condition---a condition
Ignore-attribute attempts to invoke the restart named ignore-attribute. This is intended for use when an attribute appears in a place where it is prohibited, but parsing would continue successfully if the attribute had not been specified.
[Function]
ignore-attributes &optional condition => result
ignore-attributes [condition] => |
condition---a condition
Ignore-attributes attempts to invoke the restart named ignore-attributes. This is intended for use when duplicate attributes are provided and all can be ignored, or when mutually exclusive attributes appear, and all can be ignored.
[Function]
ignore-characters &optional condition => result
ignore-characters [condition] => |
condition---a condition
Ignore-characters attempts to invoke the restart named ignore-characters. This is intended for use when character data appears in a place that should have been whitespace.
[Function]
ignore-language &optional condition => result
ignore-language [condition] => |
condition---a condition
Ignore-language treats an xml:lang attribute whose value was not a language tag conforming to RFC 3066 as though the attribute had not been specified. This occurs by invoking the restart ignore-language.
[Function]
parse-as-typed-literal &optional condition => result
parse-as-typed-literal [condition] => |
condition---a condition
Parse-as-typed-literal attempts to parse an empty property element as a typed literal. This is intended to be used when an rdf:datatype attribute is present on an empty property element. Strictly speaking, this is prohibited by the RDF/XML specification (see errata), but some RDF/XML parsers output it anyway.
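Because each of these restart functions accepts an optional condition, they can be installed directly as handlers with handler-bind. The sketch below is one possible lenient parsing wrapper; the pairing of condition types with restart functions is inferred from the descriptions in this reference rather than taken from the library source, and it assumes that each restart is active when the paired condition is signalled. parse-leniently is a hypothetical helper, and the package name cl-rdfxml is likewise an assumption.

```lisp
;; Assumption: the ASDF system and the package are both named CL-RDFXML.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf")
  (asdf:load-system "cl-rdfxml"))

(defun parse-leniently (function input)
  "Hypothetical helper: parse INPUT, recovering from some errors by
invoking the restart function whose description matches the condition.
The condition/restart pairings below are inferred, not documented."
  (handler-bind
      (;; Listed first so that it is tried before any more general handler.
       (cl-rdfxml:datatyped-empty-property #'cl-rdfxml:parse-as-typed-literal)
       (cl-rdfxml:prohibited-attribute #'cl-rdfxml:ignore-attribute)
       (cl-rdfxml:unexpected-characters #'cl-rdfxml:ignore-characters)
       (cl-rdfxml:invalid-language-tag #'cl-rdfxml:ignore-language))
    (cl-rdfxml:parse-document function input)))
```

A handler that returns without transferring control simply declines, so an error whose restart is unavailable would still propagate to the caller.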
Each of the following so-called constants is actually a variable defined with defvar, so there will be no error from the Lisp implementation if you try to rebind it. However, these should not be modified (as their surrounding +'s indicate). They are defvars rather than defconstants only because of limitations in the serialization of interned URIs and the desire for fast URI comparison. Modifying these values will break CL-RDFXML. (So don't modify them.)
[Constant]
+rdf-about+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#about .
[Constant]
+rdf-about-each+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#aboutEach .
[Constant]
+rdf-about-each-prefix+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#aboutEachPrefix .
[Constant]
+rdf-alt+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#Alt .
[Constant]
+rdf-bag+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag .
[Constant]
+rdf-bag-id+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#bagID .
[Constant]
+rdf-datatype+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#datatype .
[Constant]
+rdf-description+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#Description .
[Constant]
+rdf-id+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#ID .
[Constant]
+rdf-li+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#li .
[Constant]
+rdf-list+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#List .
[Constant]
+rdf-namespace+
The string prefix of RDF terms: http://www.w3.org/1999/02/22-rdf-syntax-ns# .
[Constant]
+rdf-nil+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#nil .
[Constant]
+rdf-node-id+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#nodeID .
[Constant]
+rdf-parse-type+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#parseType .
[Constant]
+rdf-property+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property .
[Constant]
+rdf-rdf+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#RDF .
[Constant]
+rdf-resource+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#resource .
[Constant]
+rdf-seq+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq .
[Constant]
+rdf-statement+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement .
[Constant]
+rdf-xml-literal+
An interned PURI uri: http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral .
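Since both the constants above and the datatypes returned by literal-datatype are described as interned PURI uris, a client could plausibly test for a particular datatype with eq. The following sketch assumes the symbols are exported from a package named cl-rdfxml and that the constant and the literal's datatype are interned in the same place; xml-literal-p is a hypothetical predicate, not part of the library.

```lisp
;; Assumption: the ASDF system and the package are both named CL-RDFXML.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf")
  (asdf:load-system "cl-rdfxml"))

(defun xml-literal-p (object)
  "Hypothetical predicate: true if OBJECT is a typed-literal whose datatype
is the rdf:XMLLiteral URI.  EQ is used because both URIs are described as
interned; this assumes they share one interning table."
  (and (typep object 'cl-rdfxml:typed-literal)
       (eq (cl-rdfxml:literal-datatype object)
           cl-rdfxml:+rdf-xml-literal+)))
```

The constant is only read here, never rebound or modified, in keeping with the warning above.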
This documentation was prepared with DOCUMENTATION-TEMPLATE.