Changes between Initial Version and Version 1 of ApertureRdfContainerProblems


Timestamp: 02/01/06 10:51:42
Author: sauermann

= Problems with RDFContainer =

= Solution: our own in-memory sail? =

Leo: How about writing our own in-memory sail that does not support transactions, contexts, etc.?
It is only needed to ship RDF from the DataAccessor to the CrawlerHandler.
The CrawlerHandler then has to read it again and copy all the triples to another repository.

Then the default RDFContainer (in-memory Sesame) need no longer contain "context". Context is only used when the RDFContainer is created via the factory methods of the CrawlerHandler.

 * pro: for 80% of the use cases (extracting data into things like gnowsis or Lucene), it gets simpler.
 * pro: for the 20% of the use cases where the DataAccessor streams directly into the RDFContainer (i.e. the Sesame database), the CrawlerHandler can provide a context-aware RDFContainer.
 * con: we have to write it.
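A minimal sketch of the proposal, assuming statements are reduced to plain strings. `SimpleTripleContainer`, `CopyingCrawlerHandler`, `Triple` and `Quad` are hypothetical names, not Aperture or Sesame classes. The container has no transactions and no context, so put() always sees and overwrites the previous value; a context is attached only when the handler copies the triples into its own store. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InMemoryContainerSketch {

    // A statement without any context (hypothetical type).
    record Triple(String subject, String predicate, String object) {}

    // A statement as stored in the handler's repository, with a context attached.
    record Quad(String context, String subject, String predicate, String object) {}

    // Hypothetical context-free, non-transactional container, used only to
    // ship RDF from a DataAccessor to a CrawlerHandler.
    static class SimpleTripleContainer {
        private final String describedUri;
        private final List<Triple> triples = new ArrayList<>();

        SimpleTripleContainer(String describedUri) {
            this.describedUri = describedUri;
        }

        // No transactions: the previous value is always visible,
        // so put() can reliably remove it before adding the new one.
        void put(String property, String value) {
            triples.removeIf(t -> t.subject().equals(describedUri)
                    && t.predicate().equals(property));
            triples.add(new Triple(describedUri, property, value));
        }

        List<Triple> getTriples() {
            return Collections.unmodifiableList(triples);
        }
    }

    // Hypothetical handler: context is added only here, when the
    // triples are copied into the handler's own repository.
    static class CopyingCrawlerHandler {
        final List<Quad> repository = new ArrayList<>();

        void store(String context, SimpleTripleContainer container) {
            for (Triple t : container.getTriples()) {
                repository.add(new Quad(context, t.subject(), t.predicate(), t.object()));
            }
        }
    }

    public static void main(String[] args) {
        SimpleTripleContainer container = new SimpleTripleContainer("http://example.org/doc");
        container.put("title", "draft");
        container.put("title", "final"); // replaces "draft" immediately
        CopyingCrawlerHandler handler = new CopyingCrawlerHandler();
        handler.store("http://example.org/source", container);
        System.out.println(handler.repository);
    }
}
```

The repository ends up with a single statement whose object is "final", tagged with the context supplied by the handler.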

= Problem: autocommit =

When you use a SesameRDFContainer, its Repository typically has its
auto-commit mode switched off for performance reasons. However, this
breaks the contract of the RDFContainer API, because statements
in the repository are not visible until a commit is performed. So when
you do a put() for a certain property and later do a put() for the
same property with a different value, you expect the latter put()
to overwrite the former value. However, as long as the repository has not been
committed, replaceInternal won't see the first value. Likewise, the
get methods will not return a value for that property. When you finally
commit, you end up with both values stored, leading to a
MultipleValuesException upon retrieval.
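The failure mode can be reproduced with a small simulation. This is not Sesame code: `BufferedRepository` is a hypothetical stand-in for a repository with auto-commit off, in which writes stay invisible to reads until commit(), and `put()` mimics replaceInternal by removing only the values it can see. The local `MultipleValuesException` is likewise a stand-in for Aperture's exception of that name.

```java
import java.util.ArrayList;
import java.util.List;

class AutoCommitSimulation {

    // Stand-in for the MultipleValuesException mentioned in the text.
    static class MultipleValuesException extends RuntimeException {
        MultipleValuesException(String property) {
            super("multiple values for " + property);
        }
    }

    // Hypothetical repository with auto-commit off. Each statement is a
    // {property, value} pair about one implicit resource.
    static class BufferedRepository {
        private final List<String[]> committed = new ArrayList<>();
        private final List<String[]> pending = new ArrayList<>();

        void add(String property, String value) {
            pending.add(new String[] {property, value});
        }

        // Reads and removals only see committed statements.
        void removeVisible(String property) {
            committed.removeIf(s -> s[0].equals(property));
        }

        List<String> visibleValues(String property) {
            List<String> values = new ArrayList<>();
            for (String[] s : committed) {
                if (s[0].equals(property)) values.add(s[1]);
            }
            return values;
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    // put() in the style of replaceInternal: remove the old value, add the new one.
    static void put(BufferedRepository repo, String property, String value) {
        repo.removeVisible(property); // cannot see an uncommitted earlier value
        repo.add(property, value);
    }

    static String getString(BufferedRepository repo, String property) {
        List<String> values = repo.visibleValues(property);
        if (values.size() > 1) throw new MultipleValuesException(property);
        return values.isEmpty() ? null : values.get(0);
    }

    public static void main(String[] args) {
        BufferedRepository repo = new BufferedRepository();
        put(repo, "title", "first");
        System.out.println(getString(repo, "title")); // not committed yet
        put(repo, "title", "second");                 // "first" is not removed
        repo.commit();
        try {
            getString(repo, "title");
        } catch (MultipleValuesException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Running main prints `null` (the first value is not yet visible to get) followed by `caught: multiple values for title`, because the second put() could not remove the uncommitted first value.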

My example classes are already crowded with commits to make sure
certain put and get methods work correctly. There is something to be said
for this, as these classes also provide the CrawlerHandler implementation
that manages the repository. However, I now had to add Sesame-specific
code to WebCrawler, as it depends on the overwriting behavior of the
put method.

This is a problem with the SesameRDFContainer, but I can imagine that
other implementations backed by persistent storage will
have similar issues.

Although I still like the simplicity of the RDFContainer API, this is
another item on the list of problems I've had with it. Does anyone see a
simple solution for this, other than adding a commit() method to
RDFContainer or using the Repository's auto-commit mode (which hurts
performance badly)?
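The first option, adding a commit() method to RDFContainer, might look roughly like this. `Container` is a hypothetical, heavily reduced interface, not the real RDFContainer API, and both implementations are illustrative only. `BufferedContainer` models a store with auto-commit off, where put() can remove only committed values; `InMemoryContainer` makes every write visible at once, so its commit() is a no-op. Crawler code such as the hypothetical `crawl()` then commits through the generic interface instead of calling Sesame directly, although the caller still carries the burden of committing before it relies on overwriting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommitMethodSketch {

    // Hypothetical reduced container API with the proposed commit() method.
    interface Container {
        void put(String property, String value);
        String getString(String property);
        void commit();
    }

    // Writes are visible immediately, so commit() has nothing to do.
    static class InMemoryContainer implements Container {
        private final Map<String, String> values = new HashMap<>();
        public void put(String property, String value) { values.put(property, value); }
        public String getString(String property) { return values.get(property); }
        public void commit() { }
    }

    // Models a store with auto-commit off: writes are queued until commit().
    static class BufferedContainer implements Container {
        private final List<String[]> committed = new ArrayList<>();
        private final List<String[]> pending = new ArrayList<>();

        // Like replaceInternal: only committed values can be seen and removed.
        public void put(String property, String value) {
            committed.removeIf(s -> s[0].equals(property));
            pending.add(new String[] {property, value});
        }

        public String getString(String property) {
            List<String> found = new ArrayList<>();
            for (String[] s : committed) {
                if (s[0].equals(property)) found.add(s[1]);
            }
            if (found.size() > 1) throw new IllegalStateException("multiple values for " + property);
            return found.isEmpty() ? null : found.get(0);
        }

        public void commit() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Crawler-side code written only against the interface: it commits
    // before relying on overwrite semantics, with no store-specific calls.
    static void crawl(Container container) {
        container.put("title", "draft");
        container.commit();
        container.put("title", "final");
        container.commit();
    }

    public static void main(String[] args) {
        Container a = new InMemoryContainer();
        Container b = new BufferedContainer();
        crawl(a);
        crawl(b);
        System.out.println(a.getString("title") + " " + b.getString("title"));
    }
}
```

Running main prints `final final`. Dropping the first commit() call in crawl() would make the buffered container fail in getString(), which is exactly the kind of caller-side discipline the question hopes to avoid.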