= The Use of RDF in Aperture =

We should discuss where and how RDF is used in this framework. In previous email discussions we already considered using RDF as the way for an Extractor to output its extracted information, because of the flexibility it provides:

 * no assumptions about what the metadata looks like; it can be very simple or very complex
 * easy to store in RDF stores, no transformation necessary (provided that you have named graph support)

Also, this would be a unique selling point, making Aperture stand apart from projects like [http://lucene.apache.org/nutch/ Nutch], [http://www.zilverline.org/zilverlineweb/space/home Zilverline], [http://www.bibl.ulaval.ca/lius/index.en.html Lius] etc., which also provide frameworks for extracting and handling full-text and metadata.

This design decision also applies to !DataObjects, which currently use a Map with dedicated keys, defined per !DataObject type. I would be in favour of changing this to "something RDF", as it considerably eases development.

Leo came up with an idea that allows delivering RDF while at the same time providing a simpler interface to programmers not knowledgeable in RDF. The idea is to create a class that implements both the org.openrdf.model.Graph interface and the java.util.Map interface. The effect of {{{result.put(authorURI, "chris");}}}, with authorURI being equal to the URI of the author predicate, would then be the same as {{{result.add(documentURI, authorURI, "chris");}}}. In other words, you can use the Map methods to insert simple resource-predicate-literal statements (the majority of cases), which is simple to document and understand, whereas people who know what they are doing can also add arbitrary RDF statements. The concrete manifestation of these ideas can now be found in '''wiki:ApertureRDFMap'''.

== Sesame 2 ==

Some notes on the development of Sesame 2 and how it applies to Aperture.

The Sesame guys are wrapping up their last efforts before releasing an alpha version. This means the code is still under development, although the core interfaces are stabilizing.

One change is that the model and model.impl packages have been removed from Rio. Furthermore, Graph and !GraphImpl have been removed from these packages, as well as all methods that change something in the RDF structure (e.g. Resource.addProperty). Arjohn explained this decision to me as follows. Have a look at the architecture graphic on http://www.openrdf.org/doc/sesame2/system/ch02.html. The RDF Model at the bottom is the foundation for the rest of the system to manipulate RDF information. It is very awkward and can lead to problems when you are able to manipulate the RDF at the model level, as it bypasses the Sail stack and any inferencing, security restrictions, etc. that take place in it. Therefore, these interfaces now provide read-only access. This way it is, for example, not possible to add properties to Resources obtained from a query result, which could otherwise lead to undefined behaviour.

If you want to manipulate statements, the Repository class is the way to go (see the sketch below). It contains methods for adding triples as well as functionality for posing queries, extracting RDF, etc. Furthermore, a number of utility classes (URIVertex, !LiteralVertex, ...) are provided that take a Repository as argument and let you treat the RDF statements as a graph data structure.

The only drawback of the Repository class is that it is quite a big class (note that it is a class and not an interface!). Also, just creating a Repository is not enough: it always operates on top of a Sail.
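To make the contrast between the old model-level manipulation and the new Repository-centred style concrete, here is a minimal sketch. The call shapes are assumptions, borrowed from the Extractor examples further down this page rather than from the final Sesame 2 API.

{{{
#!java
// Sesame 1.x style (removed in Sesame 2): manipulate the RDF directly at the
// model level, bypassing the Sail stack and whatever inferencing or security
// handling takes place in it.
// docResource.addProperty(Vocabulary.titleURI, titleLiteral);  // no longer possible

// Sesame 2 style: every change goes through the Repository, which routes it
// down the Sail stack (call shape assumed, based on the examples below).
repository.add(docURI, Vocabulary.titleURI, new LiteralVertex(repository, titleString));
}}}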
This architecture provides great flexibility at the cost of more code complexity. Example: you want to create an in-memory RDF "container" that you can pass to an Extractor:

{{{
#!java
Repository repository = new Repository(new MemoryStore());
repository.initialize();
extractor.doYourWork(docURI, repository);
}}}

Since we're now passing the Repository, we should also pass the document URI so that the Extractor knows around which resource it has to create a CBD (Concise Bounded Description). The Extractor may then do something like the following, assuming the full text is stored as a literal in the RDF:

{{{
#!java
repository.add(docURI, Vocabulary.titleURI, new LiteralVertex(repository, titleString));
repository.add(docURI, Vocabulary.fullTextURI, new LiteralVertex(repository, fullText));
}}}

The following code uses a graph-oriented approach, but its effect is exactly the same:

{{{
#!java
URIVertex docVertex = new URIVertex(repository, docURI);
// ... (waiting for the Sesame 2 Javadoc to be updated ;) )
}}}

Since the Repository is specified as a parameter to the Extractor, starting and committing any transactions is the responsibility of the integrator. In the case of the memory store, this can even be omitted.

= Conclusion Leo =

Since the Sesame 2 Repository API provides less usability than Leo wishes to offer developers, I would suggest sticking to the RDFMap idea and providing about 10-20 methods there that "do the trick" (a rough sketch follows below). In these methods I would mix resource-based and model-based methods, for ease of implementation. Although it would be fine to have a real "abstract" layer, I still hesitate to use rdf2go, because it requires Java 1.5 and is not adapted to Sesame 2 yet. I will ask Max, though, whether he can address these two things.
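To make the RDFMap idea a little more tangible, here is a rough, non-authoritative sketch of what such a convenience class could look like on top of the Sesame 2 Repository. The class and method names, the idea of fixing one subject per instance, and the argument types are all assumptions made for illustration; the actual design is described in '''wiki:ApertureRDFMap''', and implementing the full java.util.Map and org.openrdf.model.Graph interfaces is omitted here.

{{{
#!java
// Imports omitted: the Sesame 2 alpha package layout is still in flux.

/**
 * Hypothetical convenience wrapper around a Repository that describes a single
 * subject resource with simple Map-like calls, while still allowing arbitrary
 * statements to be added.
 */
public class RDFMap {

    private final Repository repository;  // every statement ends up in this store
    private final URIVertex subject;      // the resource this "map" describes

    public RDFMap(Repository repository, URIVertex subject) {
        this.repository = repository;
        this.subject = subject;
    }

    // Map-style convenience: put(authorURI, "chris") results in a
    // (subject, authorURI, "chris") statement in the repository.
    public void put(URIVertex predicate, String literalValue) {
        repository.add(subject, predicate, new LiteralVertex(repository, literalValue));
    }

    // Model-style escape hatch for people who know what they are doing:
    // add an arbitrary statement that need not be about the fixed subject.
    public void add(URIVertex subj, URIVertex predicate, LiteralVertex object) {
        repository.add(subj, predicate, object);
    }
}
}}}

An Extractor could then be handed an RDFMap centred on the document URI and call put() for the common case, while the underlying Repository (or a Graph view) remains available for more complex statements.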