| 1 | = Problems with RDFContainer = |
| 2 | |
| 3 | = Solution: own in-mem sail? = |
| 4 | |
| 5 | Leo: How about making our own in-memory sail that does not support transactions, contexts, etc.? |
| 6 | It is only needed to ship RDF from the DataAccessor to the CrawlerHandler. |
| 7 | The CrawlerHandler then has to read it again and copy all the triples to another repository. |
| 8 | |
| 9 | Then the default RDFContainer (in-memory Sesame) would no longer contain a context. A context is only used when the RDFContainer is created via the factory methods of the CrawlerHandler. |
| 10 | |
| 11 | * pro: for 80% of the use cases, it gets simpler (extracting data to things like gnowsis or Lucene) |
| 12 | * pro: for the 20% of the use cases where the DataAccessor streams directly into the RDFContainer (i.e. the Sesame database), the CrawlerHandler can provide a context-aware RDFContainer. |
| 13 | * con: we have to write it. |
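To get a feel for how small such a container could be, here is a minimal sketch in plain Java. It is only an illustration under assumptions: `SimpleTripleContainer`, `TripleSink` and `copyTo` are hypothetical names, the class implements neither the real RDFContainer interface nor a Sesame sail, and URIs and values are plain strings to keep it self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: no transactions, no contexts, everything kept in memory.
// It is NOT the Aperture RDFContainer interface and NOT a Sesame sail.
final class SimpleTripleContainer {

    // Callback used by the receiving side to copy statements elsewhere.
    interface TripleSink {
        void accept(String subject, String property, String value);
    }

    // The container describes exactly one resource, so a statement
    // is reduced to a (property, value) pair.
    private final String describedUri;
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    SimpleTripleContainer(String describedUri) {
        this.describedUri = describedUri;
    }

    // put() overwrites any earlier value; the change is visible immediately.
    void put(String property, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(property, single);
    }

    // add() appends a further value for a multi-valued property.
    void add(String property, String value) {
        values.computeIfAbsent(property, k -> new ArrayList<>()).add(value);
    }

    // Returns the single value, or null if the property is not set.
    String get(String property) {
        List<String> found = values.get(property);
        if (found == null || found.isEmpty()) {
            return null;
        }
        if (found.size() > 1) {
            throw new IllegalStateException("multiple values for " + property);
        }
        return found.get(0);
    }

    // The handler reads everything back and copies it to its own repository.
    void copyTo(TripleSink sink) {
        for (Map.Entry<String, List<String>> entry : values.entrySet()) {
            for (String value : entry.getValue()) {
                sink.accept(describedUri, entry.getKey(), value);
            }
        }
    }
}
```

A handler would call `copyTo` with a callback that writes each statement into its own repository, adding a context there if it needs one.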
| 14 | |
| 15 | = Problem: autocommit = |
| 16 | |
| 17 | When you use a SesameRDFContainer, its Repository typically has its |
| 18 | auto-commit mode switched off for performance reasons. However, this |
| 19 | breaks the contract of RDFContainer's API. The reason is that statements |
| 20 | in the repository are not visible until a commit is performed. So when |
| 21 | you do a put() for a certain property and you later do a put() for the |
| 22 | same property with a different value, you expect that the latter put() |
| 23 | overwrites the former value. However, when the repository hasn't been |
| 24 | committed yet, replaceInternal won't see the first value. Likewise, the |
| 25 | get methods will not return a value for that property. When you finally |
| 26 | commit, you end up with both values being stored, leading to a |
| 27 | MultipleValuesException upon retrieval. |
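The effect can be reproduced with a toy model, sketched below. This is a simplification under assumptions and not the Sesame API: `DeferredCommitStore` is a hypothetical class in which added statements stay invisible until `commit()`, and `put` removes only the values it can currently see before queuing the new one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a repository with auto-commit switched off.
// Hypothetical class for illustration only; NOT the Sesame API.
final class DeferredCommitStore {

    // Statements as {property, value} pairs.
    private final List<String[]> committed = new ArrayList<>();   // visible to reads
    private final List<String[]> pendingAdds = new ArrayList<>(); // invisible until commit()

    // Reads only see committed statements.
    List<String> values(String property) {
        List<String> result = new ArrayList<>();
        for (String[] statement : committed) {
            if (statement[0].equals(property)) {
                result.add(statement[1]);
            }
        }
        return result;
    }

    // Replace semantics: drop the values that are visible right now,
    // then queue the new value. Pending values cannot be seen or dropped.
    void put(String property, String value) {
        committed.removeIf(statement -> statement[0].equals(property));
        pendingAdds.add(new String[] {property, value});
    }

    // Makes all queued statements visible.
    void commit() {
        committed.addAll(pendingAdds);
        pendingAdds.clear();
    }
}
```

In this model, two `put` calls for the same property followed by a single `commit()` leave both values in the store, while committing between the two calls leaves only the second value. That is why the example classes need the extra commits.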
| 28 | |
| 29 | My example classes are already crowded with commits to make sure |
| 30 | certain put and get methods work correctly. There is something to be said |
| 31 | for this, as these classes also provide the CrawlerHandler implementation |
| 32 | that manages the repository. However, I now had to add Sesame-specific |
| 33 | code in WebCrawler as it depends on the overwriting capability of the |
| 34 | put method. |
| 35 | |
| 36 | This is a problem with the SesameRDFContainer but I can imagine that |
| 37 | other implementations working with persistent storage facilities will |
| 38 | have similar issues. |
| 39 | |
| 40 | Although I still like the simplicity of the RDFContainer API, this is |
| 41 | another item on the list of problems I've had with it. Does anyone see a |
| 42 | simple solution for this? Other than adding a commit() method to |
| 43 | RDFContainer or using the Repository's auto-commit mode (hurts |
| 44 | performance badly)? |
| 45 | |