Version 5 (modified by anonymous, 19 years ago)

The Use of RDF in Aperture

We should discuss where and how RDF is used in this framework. In earlier email discussions we considered having an Extractor output its extracted information as RDF, because of the flexibility RDF provides:

  • no assumptions about what the metadata looks like; it can be very simple or very complex
  • easy to store in RDF stores, with no transformation necessary (provided that the store supports named graphs)

This would also be a unique selling point, setting Aperture apart from projects like Nutch, Zilverline, etc., which also provide frameworks for extracting and handling full-text and metadata.

This design decision also applies to DataObjects, which currently use a Map with dedicated keys, defined per DataObject type. I would be in favour of changing this to "something RDF", as that would considerably ease development.

Leo came up with an idea that allows delivering RDF while at the same time providing a simpler interface to programmers who are not familiar with RDF. The idea is to create a class that implements both the org.openrdf.model.Graph interface and the java.util.Map interface. The effect of

result.put(authorURI, "chris");

with authorURI being the URI of the author predicate, would then be equivalent to

result.add(documentURI, authorURI, "chris");

In other words, you can use the Map methods to insert simple resource-predicate-literal statements (the majority of cases), which is easy to document and understand, whereas people who know what they are doing can also add arbitrary RDF statements.
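
To make the idea more concrete, here is a minimal sketch of such a hybrid class. It deliberately does not use the Sesame/Rio API: Triple, RdfMap, and the triples() accessor are hypothetical stand-ins, URIs and literals are plain Strings, and the example URIs are made up. It also assumes that put() replaces any earlier value for the same predicate (to honour the Map contract), which is a choice the discussion above has not settled.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an RDF statement (not the Sesame Statement type).
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical hybrid container: a Map view over the statements about one
// resource, plus a graph-style add() for arbitrary statements.
class RdfMap extends AbstractMap<String, String> {
    private final String documentURI;
    private final List<Triple> triples = new ArrayList<>();

    RdfMap(String documentURI) {
        this.documentURI = documentURI;
    }

    // Graph-style access: add any statement, about any subject.
    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    // Map-style access: the key is the predicate, the subject is implied.
    // Assumption: an earlier value for the same predicate is replaced.
    @Override
    public String put(String predicate, String value) {
        String previous = get(predicate);
        triples.removeIf(t -> t.subject.equals(documentURI)
                && t.predicate.equals(predicate));
        add(documentURI, predicate, value);
        return previous;
    }

    // Snapshot (not a live view) of the statements about documentURI.
    // If a predicate has several values, the last one added wins.
    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        Map<String, String> view = new LinkedHashMap<>();
        for (Triple t : triples) {
            if (t.subject.equals(documentURI)) {
                view.put(t.predicate, t.object);
            }
        }
        return view.entrySet();
    }

    List<Triple> triples() {
        return triples;
    }
}

class RdfMapDemo {
    public static void main(String[] args) {
        String documentURI = "urn:example:doc1";  // made-up URIs
        String authorURI = "urn:example:author";

        RdfMap result = new RdfMap(documentURI);
        result.put(authorURI, "chris");                          // simple Map call
        result.add("urn:example:chris", "urn:example:knows",
                "urn:example:leo");                              // arbitrary statement

        for (Triple t : result.triples()) {
            System.out.println(t.subject + " " + t.predicate + " " + t.object);
        }
    }
}
```

Note that the Map view only exposes statements whose subject is the document itself; statements about other resources are reachable only through the graph-style methods. Whether put() should replace or accumulate values for a predicate would need to be decided for a real implementation.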

Unfortunately, Graph will soon be removed (or has already been removed) from the Rio library, meaning that using it would make us dependent on the entire Sesame library.