With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and inferencing fall short when faced with web-scale data. We present a system that combines the computational power of large clusters, used for large-scale inferencing and data access, with an efficient data structure for storing and querying the extracted data on a traditional personal computer or a smaller embedded device. We report results from using this system to load the Billion Triples Challenge dataset, fully materialize RDFS inferences, and extract an ``interesting'' subset of the data on a large cluster, and then to analyze the extracted subset further on a traditional personal computer.
| Step | Time |
|---|---|
| Inferencing | 349 sec |
| Extracting/Reducing | 940 sec |
| BitMat Creation | 25 sec |
| TOTAL | 1,314 sec (~22 min) |
(Note that the closure was computed without rules lg, gl, rdfs1, rdfs4a, rdfs4b, rdfs6, rdfs8, and rdfs10, and that it also ignores extensions of RDF/S terms such as [my:subclass rdfs:subPropertyOf rdfs:subClassOf].)
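To make the rule set in the note concrete, the following is a minimal single-machine sketch of a fixpoint closure over six of the RDFS entailment rules that remain after the listed exclusions (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, and rdfs11). It is not the cluster implementation used for the timings above. The function name `rdfs_closure`, the representation of triples as tuples of prefixed strings, the `ex:` example terms, and the convention that literals start with a double quote are all assumptions made for illustration. Rules rdfs12 and rdfs13 are omitted for brevity. Unlike the closure described in the note, a naive fixpoint of this kind would also follow extensions of RDF/S terms, because triples derived by rdfs7 feed back into later iterations.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def is_literal(term):
    # Illustrative convention only: literals are written with a leading quote.
    return term.startswith('"')


def rdfs_closure(triples):
    """Return the input triples plus everything derivable by
    rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, and rdfs11 (naive fixpoint)."""
    closure = set(triples)
    while True:
        # Schema triples visible in the current iteration.
        sub_prop = {(s, o) for s, p, o in closure if p == SUBPROP}
        sub_class = {(s, o) for s, p, o in closure if p == SUBCLASS}
        domain = {(s, o) for s, p, o in closure if p == DOMAIN}
        rng = {(s, o) for s, p, o in closure if p == RANGE}

        new = set()
        for s, p, o in closure:
            # rdfs2: (p domain C), (s p o) => (s type C)
            for prop, cls in domain:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))
            # rdfs3: (p range C), (s p o) => (o type C), o not a literal
            for prop, cls in rng:
                if p == prop and not is_literal(o):
                    new.add((o, RDF_TYPE, cls))
            # rdfs7: (p subPropertyOf q), (s p o) => (s q o)
            for sub, sup in sub_prop:
                if p == sub:
                    new.add((s, sup, o))
            # rdfs9: (C subClassOf D), (s type C) => (s type D)
            if p == RDF_TYPE:
                for sub, sup in sub_class:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))

        # rdfs5 and rdfs11: transitivity of subPropertyOf and subClassOf
        for pairs, pred in ((sub_prop, SUBPROP), (sub_class, SUBCLASS)):
            for a, b in pairs:
                for c, d in pairs:
                    if b == c:
                        new.add((a, pred, d))

        if new <= closure:  # nothing new was derived: fixpoint reached
            return closure
        closure |= new


# Small hypothetical dataset for illustration.
example = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:hasDog", SUBPROP, "ex:hasPet"),
    ("ex:hasPet", DOMAIN, "ex:Person"),
    ("ex:hasPet", RANGE, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:alice", "ex:hasDog", "ex:rex"),
}
inferred = rdfs_closure(example) - example
```

Here `inferred` holds only the newly derived triples, for instance that `ex:rex` is an `ex:Animal` and that `ex:alice` is an `ex:Person`. The quadratic loops are chosen for readability; a version meant for large inputs would index triples by predicate and distribute the joins.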