Version 9 (modified by sauermann, 19 years ago)
This will be implemented by Gunnar Grimnes and Daniel Burkhart in ticket:92; we will make more tickets for subtasks.
We need JenaSesameCrossovers.
Needed Services
We need RDF storage in different parts of gnowsis. Each store serves a different purpose and has different requirements.
- configuration store: part of the gnowsis-config API. It is used to configure the services (hostname, passwords, services, etc.). It has to be stored as N3 RDF so that it can be edited by hand if something goes wrong. This relates to ticket:107.
- pimo-store: all things of the user, i.e. the PIMO ontology itself, the imported domain ontologies, the user's things, links to occurrences, wiki pages, wiki names, etc. Features: text indexing (using the Lucene SAIL, configured to control what is indexed). Named graphs are probably not really used (open question). Reification is used to record when a triple was inserted, if it is a user triple such as pimo:occurrence. The pimo-store replaces the Central Repository!
- resource-store: stores all resources (pimo:ResourceManifestation) of the user. It is filled by Aperture and sometimes by applications that want to add a few resources before they are annotated (using pimo:occurrence). No other parts of PIMO go there, especially no pimo:occurrence links. Needs fulltext indexing using LuceneSail (configured by the Aperture ontology and PIMO-ResourceManifestation) OR Catwiesel. Named graphs are used to separate individual resources: each ResourceManifestation gets its own named graph.
- service-data-store: used by services like Thumbnailservice, IconService, ContextService, etc. to store the RDF data they need to work. Named graphs are used to separate the services. No text indexing is needed (no LuceneSail). Interfaces: Sesame2, Jena ModelDatabase (?)
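The store requirements above rely on two RDF mechanisms: named graphs (one per ResourceManifestation in the resource-store, one per service in the service-data-store) and reification (recording when a user triple such as pimo:occurrence was inserted in the pimo-store). The following is a minimal Python sketch of both on a toy in-memory quad store. It is not the Sesame2 API. The class, the `ex:insertedAt` property, and all URIs are hypothetical.

```python
from collections import namedtuple
from datetime import datetime, timezone
from itertools import count

# A quad is an RDF triple plus the named graph (context) it belongs to.
Quad = namedtuple("Quad", "s p o g")
ANY = object()  # wildcard marker for match()


class QuadStore:
    """Toy in-memory quad store; illustrates the idea, it is not the Sesame2 API."""

    def __init__(self):
        self._quads = set()
        self._stmt_ids = count(1)

    def add(self, s, p, o, g=None):
        # g=None means "no named graph" (the default graph).
        self._quads.add(Quad(s, p, o, g))

    def match(self, s=ANY, p=ANY, o=ANY, g=ANY):
        # Any argument left as ANY matches every value, including g=None.
        return [q for q in self._quads
                if all(want is ANY or want == got
                       for want, got in zip((s, p, o, g), q))]

    def clear_graph(self, g):
        # Replacing one resource only touches its own named graph.
        self._quads = {q for q in self._quads if q.g != g}

    def graphs(self):
        return {q.g for q in self._quads if q.g is not None}

    def add_reified(self, s, p, o, g=None):
        """Add (s, p, o) plus an rdf:Statement recording when it was inserted."""
        self.add(s, p, o, g)
        stmt = f"_:stmt{next(self._stmt_ids)}"
        inserted = datetime.now(timezone.utc).isoformat()
        for pred, obj in (("rdf:type", "rdf:Statement"), ("rdf:subject", s),
                          ("rdf:predicate", p), ("rdf:object", o),
                          ("ex:insertedAt", inserted)):  # ex:insertedAt is hypothetical
            self.add(stmt, pred, obj, g)
        return stmt


# resource-store: one named graph per ResourceManifestation (URIs are made up).
resources = QuadStore()
for uri, title in (("file:///home/user/a.txt", "A"), ("file:///home/user/b.txt", "B")):
    resources.add(uri, "rdf:type", "pimo:ResourceManifestation", g=uri)
    resources.add(uri, "dc:title", title, g=uri)
resources.clear_graph("file:///home/user/a.txt")  # e.g. before Aperture re-crawls a.txt

# service-data-store: one named graph per service.
services = QuadStore()
services.add("urn:icon:pdf", "ex:iconFile", "pdf.png", g="urn:service:IconService")

# pimo-store: a user triple such as pimo:occurrence, reified with its insertion time.
pimo = QuadStore()
stmt = pimo.add_reified("ex:Project1", "pimo:occurrence", "file:///home/user/b.txt")
```

Giving each resource its own graph makes re-importing a single resource a matter of clearing one graph and refilling it, without touching other data.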
The startup sequence for the stores is:
- first the config store, to get the system up
- then the service-data-store, to make the configuration data for the services available
- then the resource-store and the pimo-store.
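The startup sequence is a dependency order, so it can be derived instead of hard-coded. Below is a small sketch using Python's `graphlib` (standard library since 3.9). The store names come from the list above. The `start`/`stop` hooks are hypothetical placeholders for whatever opens and closes a repository. Stopping the already started stores in reverse order when one fails is an assumed design choice, not something the plan specifies.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Store -> stores that must already be running (order taken from the list above).
DEPENDENCIES = {
    "config-store": set(),
    "service-data-store": {"config-store"},
    "resource-store": {"service-data-store"},
    "pimo-store": {"service-data-store"},
}


def startup_order(deps):
    """Return an order in which every store starts after all of its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


def start_all(deps, start, stop):
    """Start stores in dependency order; if one fails, stop the started ones in reverse."""
    started = []
    try:
        for name in startup_order(deps):
            start(name)
            started.append(name)
    except Exception:
        for name in reversed(started):
            stop(name)
        raise
    return started


log = []
started = start_all(DEPENDENCIES,
                    start=lambda name: log.append(f"start {name}"),
                    stop=lambda name: log.append(f"stop {name}"))
print(started)
```

The resource-store and the pimo-store have no dependency on each other, so their relative order is left to the sorter.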
Common interfaces we need in different services:
- Named graphs: in Sesame2 they are provided by the Repository/Sail, in Jena by the SPARQL DataSource
Logical Services to provide:
- big trick: write a wrapper that exposes Sesame2 as a Jena named graph (NG4J) and as a normal Jena Model. Vice versa: provide a Sesame2 repository that wraps Jena Models. The native Jena Model may be needed for inference. One possible setup is that the pimo-store is stacked with one Jena model in between for the PIMO inference.
- Gnowsis CentralHub API: implement the usual features; see the current CentralHub.
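The "big trick" is the adapter pattern applied in both directions. The following is a minimal Python sketch, assuming two toy interfaces: a Jena-Model-like triple container and a Sesame2-repository-like quad container. All class and method names are hypothetical. The real wrappers would be Java classes implementing the actual Jena and Sesame2 interfaces, which are not reproduced here.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def _matches(pattern, triple):
    # None in the pattern is a wildcard.
    return all(want is None or want == got for want, got in zip(pattern, triple))


class SimpleModel:
    """Jena-Model-like triple container (toy stand-in, not the Jena API)."""

    def __init__(self):
        self._triples: Set[Triple] = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def triples(self, s=None, p=None, o=None) -> Iterator[Triple]:
        return (t for t in self._triples if _matches((s, p, o), t))


class SimpleRepository:
    """Sesame2-repository-like quad container (toy stand-in, not the Sesame2 API)."""

    def __init__(self):
        self._graphs: Dict[str, Set[Triple]] = {}

    def add(self, s, p, o, context):
        self._graphs.setdefault(context, set()).add((s, p, o))

    def statements(self, s=None, p=None, o=None, context=None) -> Iterator[Triple]:
        # context=None reads the union of all named graphs, without duplicates.
        if context is not None:
            graphs = [self._graphs.get(context, set())]
        else:
            graphs = list(self._graphs.values())
        seen = set()
        for graph in graphs:
            for t in graph:
                if t not in seen and _matches((s, p, o), t):
                    seen.add(t)
                    yield t


class RepositoryAsModel:
    """Adapter: one named graph (or the read-only union of all) seen as a model."""

    def __init__(self, repo: SimpleRepository, context: Optional[str] = None):
        self.repo, self.context = repo, context

    def add(self, s, p, o):
        if self.context is None:
            raise ValueError("the union view is read-only; choose a named graph")
        self.repo.add(s, p, o, self.context)

    def triples(self, s=None, p=None, o=None) -> Iterator[Triple]:
        return self.repo.statements(s, p, o, self.context)


class ModelAsRepository:
    """Adapter in the other direction: a model exposed as a one-graph repository."""

    def __init__(self, model: SimpleModel, context: str):
        self.model, self.context = model, context

    def add(self, s, p, o, context):
        if context != self.context:
            raise ValueError(f"this repository only holds {self.context}")
        self.model.add(s, p, o)

    def statements(self, s=None, p=None, o=None, context=None) -> Iterator[Triple]:
        if context not in (None, self.context):
            return iter(())
        return self.model.triples(s, p, o)


# Sesame2-side data seen through the Jena-side interface.
repo = SimpleRepository()
repo.add("ex:Paris", "rdf:type", "ex:City", context="urn:graph:pimo")
pimo_view = RepositoryAsModel(repo, "urn:graph:pimo")
pimo_view.add("ex:Paris", "rdfs:label", "Paris")

# Jena-side model (e.g. one used for inference) seen through the repository interface.
inference_model = SimpleModel()
inferred = ModelAsRepository(inference_model, "urn:graph:inferred")
inferred.add("ex:Paris", "rdf:type", "ex:Place", "urn:graph:inferred")
```

Making the union view read-only sidesteps the question of which named graph a write should go to. A real wrapper has to make the same decision.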
Approach
At the moment, we have three major features and key factors in gnowsis:
- fast storage of large amounts of RDF in a quad store (context-aware triple store), available through SPARQL
- fulltext search
- publishing this RDF store with an API that is easy to program against.
At the moment, these three goals are realised using Jena and home-grown software. The first feature is implemented using Jena and SPARQL2SQL. SPARQL2SQL is no longer maintained and has serious performance problems on insert, especially when we activate our bad hack for MySQL. The second feature is currently realised with a MySQL hack. We also had to implement client and server APIs to support the third feature.
At the moment we have:
- Jena and SPARQL2SQL
- a MySQL hack that crashes from time to time
- an API that is expensive to maintain and to use: the gnowsis Repository and CentralHub API
So, by switching to Sesame2 we hope to have a more durable solution in the future, because:
- Sesame2 supports both a triple store and a quad store
- Sesame supports Lucene SAILs (or will support them soon) to enable fulltext search
- Sesame has a well-known API that can be used from many applications
- Sesame does not depend on third-party applications like MySQL; its native SAIL is said to be fast and scalable
Especially the last point should spare us some trouble. We will still need something for the CentralHub API, but the repository and ontManager APIs can be completely replaced or enhanced using Sesame2.
Another reason to switch to Sesame2 is our commitment to Aperture.
BEFORE switching, we evaluate Sesame2: it has to be stress-tested regarding its SPARQL capabilities, etc.:
- SPARQL
- quads
- fulltext search (or with Catwiesel)
- fast insertion of data
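For the "fast insertion" and "fulltext search" items, a stress test needs a repeatable harness: generate data, time the inserts, then check that fulltext lookups find what was inserted. The following is a minimal Python sketch of such a harness. The data is synthetic. `ToyIndexedStore` uses a naive inverted index as a swapped-in stand-in for a Lucene SAIL and is not its API. In a real evaluation, `store` would wrap a Sesame2 repository instead. Timings depend entirely on the machine.

```python
import time
from collections import defaultdict


def synthetic_triples(n):
    """Generate n made-up triples; every 10th one has a text literal as object."""
    for i in range(n):
        if i % 10 == 0:
            yield (f"ex:doc{i}", "ex:text", f"meeting notes number {i}")
        else:
            yield (f"ex:doc{i}", "ex:linksTo", f"ex:doc{(i + 1) % n}")


class ToyIndexedStore:
    """Triple set plus a naive inverted index over literal words.

    A stand-in for a store with fulltext indexing (such as a Lucene SAIL); not its API.
    """

    def __init__(self):
        self.triples = set()
        self.index = defaultdict(set)  # word -> subjects whose literal contains it

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        if not o.startswith("ex:"):  # crude "is a literal" test, good enough here
            for word in o.lower().split():
                self.index[word].add(s)

    def search(self, *words):
        """Subjects whose literals contain all given words."""
        hits = [self.index.get(w.lower(), set()) for w in words]
        return set.intersection(*hits) if hits else set()


def stress_insert(store, n):
    """Insert n synthetic triples; return (elapsed seconds, triples per second)."""
    start = time.perf_counter()
    for triple in synthetic_triples(n):
        store.add(*triple)
    elapsed = time.perf_counter() - start
    return elapsed, (n / elapsed if elapsed > 0 else float("inf"))


store = ToyIndexedStore()
seconds, rate = stress_insert(store, 10_000)
print(f"inserted {len(store.triples)} triples in {seconds:.3f}s ({rate:,.0f} per second)")
print(len(store.search("meeting", "notes")), "fulltext hits")
```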
Steps to do - Sesame2
- learning Sesame2
- which parts of gnowsis have to be replaced? (which functions are important?)
- which parts of gnowsis can be deleted?
- We need an architecture overview (graphical)
- do everything from scratch
- APIs have to be implemented
(struck-through items have been done)
learning Sesame2 (4 hours -> 12.01.2006)
- get it from SourceForge CVS, project: openRDF, http://cvs.sourceforge.net/viewcvs.py/sesame/openrdf/
- using Sesame Server or Sesame Library?
- useful documentation: http://www.openrdf.org/doc/sesame/users/userguide.html#chapter-api; chapter 7 seems especially interesting
- writing some basic examples like:
- creating/accessing a repository
- adding RDF data to repository
- querying a repository
- update from 12.01.2006:
- Sesame 2.0-alpha-1 is now available for download from http://www.openrdf.org/download.jsp
- a documentation draft is also online now
- done:
- to learn the Sesame2 architecture, I've worked through the documentation and added some code examples to the gnowsis exampleservice
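The three basic exercises listed above are creating/accessing a repository, adding RDF data to it, and querying it. Here they are sketched in Python against a toy repository. The real exercises use the Sesame2 Java API, which is not reproduced here. `Repository` and `get_repository` are hypothetical names. The loader only understands a simplified subset of N-Triples (no blank nodes, escapes, language tags or datatypes).

```python
import re

# Simplified N-Triples: <s> <p> <o> .  or  <s> <p> "literal" .
# Blank nodes, escapes, language tags and datatypes are NOT handled (sketch only).
NT_LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


class Repository:
    """Toy repository for the three exercises; not the Sesame2 Repository API."""

    def __init__(self, name):
        self.name = name
        self.statements = set()

    def add_ntriples(self, text):
        """Parse simplified N-Triples, add the statements, return the lines parsed."""
        added = 0
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            m = NT_LINE.match(line)
            if m is None:
                raise ValueError(f"unsupported N-Triples line: {line}")
            s, p, iri, literal = m.groups()
            # Literals keep their quotes so they cannot be confused with IRIs.
            obj = iri if iri is not None else f'"{literal}"'
            self.statements.add((s, p, obj))
            added += 1
        return added

    def query(self, s=None, p=None, o=None):
        """Triple-pattern query (None is a wildcard); results sorted for stable output."""
        return sorted(t for t in self.statements
                      if all(w is None or w == v for w, v in zip((s, p, o), t)))


_repositories = {}


def get_repository(name):
    """Create the named repository on first access, return the same one afterwards."""
    if name not in _repositories:
        _repositories[name] = Repository(name)
    return _repositories[name]


# 1. create/access a repository, 2. add RDF data, 3. query it
repo = get_repository("gnowsis-test")
added = repo.add_ntriples('''
<http://example.org/gnowsis> <http://purl.org/dc/elements/1.1/title> "Gnowsis" .
<http://example.org/gnowsis> <http://example.org/uses> <http://example.org/sesame2> .
''')
print(repo.query(s="http://example.org/gnowsis"))
```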
which parts of gnowsis have to be replaced? (2 hours - 12.01.2006)
Currently gnowsis uses Jena as storage (persisted in a MySQL database) and SPARQL for querying it. In which packages is this implemented? What is still needed?
which parts of gnowsis can be deleted? (2 hours - 12.01.2006)
Most code inside this package can be removed: source:trunk/gnowsis/src/org/gnowsis/repository
graphical architecture overview (4 hours - 18.01.2006)
do everything from scratch (? hours - Gnowsis weekend)
- how to start?
- implement the APIs
- integration to Gnowsis/GUI
APIs have to be implemented (? hours - Gnowsis weekend)
- Central Repository: store triples persistently. Create new models, delete models.
- Ontology Manager: Helps to access ontology information.
- Gnowsis Search:
- Central Hub: The CentralHub is a Model that contains many other models. It combines all the adapter models and the central repository model.
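The Central Repository (create and delete models) and the CentralHub (one model that contains the repository models and the adapter models) can be sketched together. The following Python sketch assumes that the hub is a read-only union view with duplicates removed. That is one interpretation of "contains many other models", not a description of the existing code. Models are kept in memory rather than persisted, and the Outlook adapter model is hypothetical.

```python
from typing import Dict, Iterator, Set, Tuple

Triple = Tuple[str, str, str]


class Model:
    """A set of triples (toy stand-in for a Jena Model, not its API)."""

    def __init__(self):
        self.triples: Set[Triple] = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def find(self, s=None, p=None, o=None) -> Iterator[Triple]:
        # None is a wildcard.
        for t in self.triples:
            if all(w is None or w == v for w, v in zip((s, p, o), t)):
                yield t


class CentralRepository:
    """Creates and deletes named models; kept in memory here instead of persisted."""

    def __init__(self):
        self.models: Dict[str, Model] = {}

    def create_model(self, name: str) -> Model:
        if name in self.models:
            raise KeyError(f"model {name!r} already exists")
        self.models[name] = Model()
        return self.models[name]

    def delete_model(self, name: str) -> None:
        del self.models[name]


class CentralHub:
    """Read-only union of all repository models and all adapter models."""

    def __init__(self, repository: CentralRepository, adapters: Dict[str, Model]):
        self.repository = repository
        self.adapters = adapters

    def find(self, s=None, p=None, o=None) -> Iterator[Triple]:
        # Each query reads the current set of models, so created/deleted models show up.
        seen = set()
        for model in [*self.repository.models.values(), *self.adapters.values()]:
            for t in model.find(s, p, o):
                if t not in seen:
                    seen.add(t)
                    yield t


repo = CentralRepository()
pimo = repo.create_model("pimo")
pimo.add("ex:Paul", "rdf:type", "pimo:Person")
outlook = Model()  # hypothetical adapter model, e.g. filled by an Outlook adapter
outlook.add("ex:Paul", "ex:email", "paul@example.org")
hub = CentralHub(repo, {"outlook": outlook})

before = set(hub.find(s="ex:Paul"))
repo.delete_model("pimo")
after = set(hub.find(s="ex:Paul"))
```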
Steps to do - Aperture
- learning aperture
- Aperture will displace some packages from Gnowsis
- start, stop, crawl?
- integration with/to Sesame?
- Outlook adapter
learning aperture (4 hours)
project page: http://aperture.sourceforge.net/
Aperture will displace some packages from Gnowsis
These will be the packages source:trunk/gnowsis/src/org/gnowsis/adapters and source:trunk/gnowsis/src/org/gnowsis/data
start, stop, crawl
integration with/to Sesame
Outlook adapter
Result
Everything into the new Gnowsis Server project