= Problems with RDFContainer =

= Solution: own in-mem sail? =

Leo: How about making our own in-memory sail that does not support transactions, contexts, etc.?
It is only needed to ship RDF from the DataAccessor to the CrawlerHandler.
The CrawlerHandler then has to read it again and copy all the triples to another repository.
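
A minimal sketch of what such a store might look like, assuming statements can be modelled as plain string triples instead of Sesame `Value` objects. The names `SimpleTripleStore`, `Triple` and `copyTo` are hypothetical and not part of Aperture or Sesame. There are no transactions and no contexts: every write is visible immediately, so `put()` can overwrite reliably, and the CrawlerHandler side only has to iterate over the triples and copy them into its own repository.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical context-free, non-transactional in-memory triple store. */
public class SimpleTripleStore {

    /** An immutable subject-predicate-object triple; note: no context field. */
    public record Triple(String subject, String predicate, String object) {
        public Triple {
            Objects.requireNonNull(subject, "subject");
            Objects.requireNonNull(predicate, "predicate");
            Objects.requireNonNull(object, "object");
        }
    }

    // Insertion-ordered set: duplicate triples are ignored, iteration order is stable.
    private final Set<Triple> triples = new LinkedHashSet<>();

    /** Adds a triple; it is visible to reads immediately (no commit step). */
    public void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    /** Removes every triple with the given subject and predicate. */
    public void removeAll(String s, String p) {
        triples.removeIf(t -> t.subject().equals(s) && t.predicate().equals(p));
    }

    /** put() semantics: replace all existing values of (s, p) with o. */
    public void put(String s, String p, String o) {
        removeAll(s, p);
        add(s, p, o);
    }

    /** Returns all objects stored for (s, p), in insertion order. */
    public List<String> getAll(String s, String p) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject().equals(s) && t.predicate().equals(p)) {
                result.add(t.object());
            }
        }
        return result;
    }

    /** Hands every triple to a sink, e.g. code that writes into a real repository. */
    public void copyTo(Consumer<Triple> sink) {
        triples.forEach(sink);
    }

    public int size() {
        return triples.size();
    }

    public static void main(String[] args) {
        // DataAccessor side: fill the store.
        SimpleTripleStore store = new SimpleTripleStore();
        store.put("file:/a.txt", "title", "Draft");
        store.put("file:/a.txt", "title", "Final"); // overwrites immediately
        store.add("file:/a.txt", "size", "1024");

        // CrawlerHandler side: copy everything into another repository
        // (a plain list stands in for that repository here).
        List<Triple> target = new ArrayList<>();
        store.copyTo(target::add);

        System.out.println(store.getAll("file:/a.txt", "title")); // [Final]
        System.out.println(target.size());                        // 2
    }
}
```

Because nothing is buffered, the put()/get() contract holds without any commit calls; the cost is that everything is copied once more on the CrawlerHandler side.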

Then the default RDFContainer (in-memory Sesame) would no longer contain a "context". Context is only used when the RDFContainer is created via the factory methods of CrawlerHandler.

* pro: for 80% of the use cases (extracting data into things like gnowsis or Lucene), it gets simpler.
* pro: for the 20% of the use cases where the DataAccessor streams directly into the RDFContainer (i.e. the Sesame database), the CrawlerHandler can provide a context-aware RDFContainer.
* con: we have to write it.

= Problem: auto-commit =

When you use a SesameRDFContainer, its Repository typically has its
auto-commit mode switched off for performance reasons. However, this
breaks the contract of RDFContainer's API, because statements added to
the repository are not visible until a commit is performed. When you do
a put() for a certain property and later do a put() for the same
property with a different value, you expect the latter put() to
overwrite the former value. However, if the repository hasn't been
committed yet, replaceInternal won't see the first value. Likewise, the
get methods will not return a value for that property. When you finally
commit, you end up with both values being stored, leading to a
MultipleValuesException upon retrieval.
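
To make the failure mode concrete, here is a toy model of it. It is not Sesame's API: `ToyRepository` and `AutoCommitDemo` are hypothetical, and the model simply assumes, as described above, that reads and the replace step only see committed statements. Its `put()` imitates what replaceInternal is described as doing: remove the values it can see, then add the new one.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model (not Sesame's API): uncommitted statements are invisible to reads. */
public class AutoCommitDemo {

    record Stmt(String s, String p, String o) {}

    static class ToyRepository {
        private final Set<Stmt> committed = new LinkedHashSet<>();
        private final List<Stmt> pendingAdds = new ArrayList<>();
        private final List<Stmt> pendingRemoves = new ArrayList<>();

        // Reads only see committed statements.
        List<String> get(String s, String p) {
            List<String> out = new ArrayList<>();
            for (Stmt st : committed) {
                if (st.s().equals(s) && st.p().equals(p)) out.add(st.o());
            }
            return out;
        }

        // Imitates replaceInternal: schedule removal of the *visible* values, then add.
        void put(String s, String p, String o) {
            for (Stmt st : committed) {
                if (st.s().equals(s) && st.p().equals(p)) pendingRemoves.add(st);
            }
            pendingAdds.add(new Stmt(s, p, o));
        }

        void commit() {
            committed.removeAll(pendingRemoves);
            committed.addAll(pendingAdds);
            pendingRemoves.clear();
            pendingAdds.clear();
        }
    }

    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();
        repo.put("doc", "title", "Draft");
        repo.put("doc", "title", "Final");            // cannot see "Draft", removes nothing
        System.out.println(repo.get("doc", "title")); // [] : nothing visible before commit
        repo.commit();
        System.out.println(repo.get("doc", "title")); // [Draft, Final] : both values stored

        // Workaround: commit after every put, restoring overwrite semantics.
        ToyRepository fixed = new ToyRepository();
        fixed.put("doc", "title", "Draft");
        fixed.commit();
        fixed.put("doc", "title", "Final");           // now sees and removes "Draft"
        fixed.commit();
        System.out.println(fixed.get("doc", "title")); // [Final]
    }
}
```

In RDFContainer terms, the two surviving values are what a single-valued getter then reports as a MultipleValuesException; committing between puts avoids it, at the price of one commit per change.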

My example classes are already crowded with commit() calls to make sure
certain put and get methods work correctly. There is something to be said
for this, as these classes also provide the CrawlerHandler implementations
that manage the repository. However, I now had to add Sesame-specific
code to WebCrawler, as it depends on the overwriting behavior of the
put method.

This is a problem with the SesameRDFContainer, but I can imagine that
other implementations working with persistent storage facilities will
have similar issues.

Although I still like the simplicity of the RDFContainer API, this is
another item on the list of problems I've had with it. Does anyone see a
simple solution for this, other than adding a commit() method to
RDFContainer or using the Repository's auto-commit mode (which hurts
performance badly)?
