PIMO Service

The PimoService is an implementation of the PIMO (see PIMO Technical Report), which is an improved manifestation of the Wikitology idea.

It allows the user to create things, classes and properties in his personal information model and link them together or to other (external) resources and things.
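
As a rough sketch of what this looks like in code: the PimoService method names and property URIs used below are assumptions for illustration only, not necessarily the actual gnowsis API.

{{{
#!java
// Hypothetical usage sketch; method names and property URIs are placeholders.
PimoService pimo = Gnowsis.getPimoService();

// Create a personal class and a thing that is an instance of it.
String projectClass = pimo.createClass("Project");
String romePlan = pimo.createThing("Rome Business Plan", projectClass);

// Link the new thing to another thing and to an external resource.
String paul = pimo.createThing("Paul", pimo.getClassUri("Person"));
pimo.add(romePlan, "pimo:isRelated", paul);
pimo.add(romePlan, "pimo:occurrence", "file:/home/paul/rome-plan.doc");
}}}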

Tasks of the PIMO-Service:

  • during the first start of the system, the PIMO-Service creates an instance of Pimo-Person for the user and attaches it to the PIMO model.

Some useful facts:

  • While the given labels are preserved, the URIs may differ due to syntax restrictions.
  • It is possible to create different things with the same name.
  • It is NOT possible to create different classes with the same name.

Managing Domain Ontologies

Adding, removing, and updating ontologies is implemented in the PimoService. A convenient interface to these functions is provided in the web GUI.

A list of ontologies that work with gnowsis is at DomainOntologies.

The implementation of domain ontologies is done using named graphs in Sesame. Read on at Named graphs in Pimo.

Domain ontologies are added, deleted, and updated using methods of the PimoService. You can also interact directly with the triples of an ontology in the store, but then you have to take care of inference and the correct context yourself.
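
For illustration, a minimal sketch of such direct triple access with contexts (named graphs), assuming the Sesame 2 repository API; the in-memory repository and the URIs are placeholders, the gnowsis store is wired up differently, and no inference is handled here:

{{{
#!java
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class NamedGraphSketch {
    public static void main(String[] args) throws Exception {
        // In-memory Sesame repository standing in for the PIMO store.
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
            ValueFactory vf = repo.getValueFactory();
            // The ontology identifier doubles as the context (named graph) URI.
            URI context = vf.createURI("http://example.org/ontologies/doap");
            URI project = vf.createURI("http://example.org/ontologies/doap#Project");
            URI label   = vf.createURI("http://www.w3.org/2000/01/rdf-schema#label");

            // Add a triple directly into the ontology's context; inference
            // and consistency are now the caller's responsibility.
            con.add(project, label, vf.createLiteral("Project"), context);

            // Removing the whole context removes the ontology's triples.
            con.clear(context);
        } finally {
            con.close();
        }
        repo.shutDown();
    }
}
}}}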

Validation of PIMO Models - PimoChecker

The semantics of the PIMO language allow us to verify the integrity of the data. In normal RDF/S semantics, such verification is not possible. For example, setting the domain of the property knows to the class Person and then using this property on an instance Rome Business Plan of class Document would, under RDF/S, simply create the new information that the Document is also a Person. In the PIMO language, domain and range restrictions are instead used to validate the data. The PIMO is checked using a Java object called PimoChecker, which encapsulates a Jena reasoner to do the checking and also does some additional tricks.

The following rules describe what is validated in the PIMO; a formal description is given in the gnowsis implementation's PIMO rule file.

  • All relating properties need inverse properties.
  • Check domain and range of relating and describing properties.
  • Check domain and range for rdf:type statements.
  • Cardinality restrictions (expressed using the Protégé statements) are checked.
  • rdfs:label is mandatory for instances of Thing and for classes.
  • Every resource that is used as the object of a triple has to have an rdf:type set. This is a prerequisite for checking domains and ranges.

The above rules check for semantic modeling errors made by programmers or human users. The following rules check whether the inference engine correctly created the closure of the model:

  • All statements whose predicate has an inverse defined require another triple in the model representing the inverse statement.
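
A minimal sketch of the Jena side of such a checker, assuming the validation rules sit in a rule file and that PIMO-Basic and PIMO-Upper are loaded together with the user's model (all file names below are placeholders):

{{{
#!java
import java.util.Iterator;
import java.util.List;

import com.hp.hpl.jena.rdf.model.InfModel;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.reasoner.ValidityReport;
import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
import com.hp.hpl.jena.reasoner.rulesys.Rule;

public class PimoCheckerSketch {
    public static void main(String[] args) {
        // The model under test must contain PIMO-Basic and PIMO-Upper,
        // otherwise the basic classes and properties are undefined.
        Model data = ModelFactory.createDefaultModel();
        data.read("file:pimo-basic.rdfs");
        data.read("file:pimo-upper.rdfs");
        data.read("file:pauls-pimo.rdf");

        // Load the validation rules (in gnowsis they live in the PIMO rule
        // file; the path here is a placeholder) into a rule reasoner.
        List rules = Rule.rulesFromURL("file:pimo.rules");
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);

        // Bind the reasoner to the data and ask Jena for a validity report.
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        ValidityReport report = inf.validate();
        if (!report.isValid()) {
            for (Iterator it = report.getReports(); it.hasNext(); ) {
                System.out.println(it.next());   // one entry per violation
            }
        }
    }
}
}}}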

The rules work only when the language constructs and the upper ontology are part of the model that is validated. For example, validating Paul's PIMO is only possible when PIMO-Basic and PIMO-Upper are available to the inference engine; otherwise the definitions of the basic classes and properties are missing.

The validation can be used to restrict updates to the data model so that only valid data can be stored in the database, or the model can be validated on a regular basis after changes were made. In the gnowsis prototype, validation was activated during automatic tests of the system to verify that the software generates valid data in different situations.

Ontologies are also validated during import into the ontology store. Before a new ontology is validated, its import declarations have to be satisfied. The test begins by building a temporary ontology model, to which first the ontology under test and then all imported ontologies are added. If an import cannot be satisfied because the required ontology is not already part of the system, the missing part can either be fetched from the internet using the ontology identifier as URL, or the user can be prompted to import the missing part first. When all imports are satisfied, the new ontology under test is validated and added to the system. A common mistake at this point is to omit the PIMO-Basic and PIMO-Upper import declarations. This strict testing of ontologies makes conceptual errors show up at an early stage. Strict usage of import declarations also makes dependencies between ontologies explicit, whereas current practice in the RDF/S-based semantic web community leaves many imports implicit.
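
The import-time check could look roughly like this. Only the Jena Model calls are real API; store, checker and declaredImports() are assumed names standing in for the actual gnowsis code:

{{{
#!java
// Sketch of validating a new ontology before it enters the store.
// 'store', 'checker' and declaredImports() are placeholders.
Model temp = ModelFactory.createDefaultModel();
temp.add(newOntology);                        // the ontology under test
for (String importUri : declaredImports(newOntology)) {
    Model imported = store.getOntology(importUri);
    if (imported == null) {
        // Not in the system yet: try the identifier as URL,
        // or prompt the user to import the missing part first.
        imported = ModelFactory.createDefaultModel().read(importUri);
    }
    temp.add(imported);
}
ValidityReport report = checker.check(temp);  // see the PimoChecker sketch above
if (report.isValid()) {
    store.addOntology(newOntology);           // only valid ontologies are added
}
}}}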

A vision to publish and download valid PIMO ontologies

This is an idea that should be implemented on top of Nepomuk to keep our ontologies valid.

Users want to import as much RDF as possible into their semantic desktop. For example, they send each other RDF via e-mail, download descriptions of projects (DOAP) from websites, download FOAF files, etc. We do not want to forbid the import of any RDF, but on the other hand invalid RDF breaks the inferencer and other parts (for example, invalid files cannot be removed so easily). So Leo recommends using a quarantine for imported RDF, a contamination barrier in front of the desktop store that keeps invalid RDF out. The quarantine works as follows: users put imported RDF into the quarantine and let a program run on it until the new RDF is valid.

The approach to import ontologies from outside sources would be:

  • Test if the new RDF is valid PIMO RDF. If yes, allow the import, using the provenance information of the graph as the named graph in the PimoStore.
  • If the new file is invalid, start a semi-automatic import assistant. This assistant tries to fix as many things as possible using the following heuristics:
    • check which ontology language the new RDF uses, OWL or RDFS; if this is detected, use some RDFS- and OWL-specific transformation scripts as preprocessing
    • for any invalidity, see if there is a default way to fix it
    • determine the imported ontologies by looking at the namespaces
    • download imported ontologies by the namespaces
  • Present the user with a status report of the import assistant, saying what steps were taken to make the RDF valid.
  • If the RDF is still invalid, offer actions to make the RDF valid. Such actions can be
    • write an import script that fixes the errors
    • search for import scripts that fixed these bugs before
    • express graph transformations using a SPARQL CONSTRUCT-like language. This should allow triples to be replaced, deleted, or added (see the sketch after this list).
  • at the end, save all actions taken into an import script that summarizes what needs to be done to import RDF from the given source.
  • the valid RDF is imported.
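
As an illustration of what such a graph transformation could look like, here is a sketch that runs a SPARQL CONSTRUCT query with Jena ARQ to map FOAF persons onto a PIMO person class; the file name, the PIMO namespace URI and the chosen mapping are placeholders, not an existing import script:

{{{
#!java
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class FoafImportSketch {
    public static void main(String[] args) {
        // RDF sitting in the quarantine, e.g. a downloaded FOAF file.
        Model quarantined = ModelFactory.createDefaultModel();
        quarantined.read("file:downloaded-foaf.rdf");

        // One possible transformation step: map foaf:Person instances onto
        // a PIMO person class and give them the rdfs:label the checker
        // requires. The pimo namespace below is a placeholder.
        String construct =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "PREFIX pimo: <http://example.org/pimo#> " +
            "CONSTRUCT { ?p a pimo:Person . ?p rdfs:label ?name . } " +
            "WHERE { ?p a foaf:Person . ?p foaf:name ?name . }";

        Query query = QueryFactory.create(construct);
        QueryExecution qe = QueryExecutionFactory.create(query, quarantined);
        Model transformed = qe.execConstruct();
        qe.close();

        // The transformed graph would be validated again before import.
        transformed.write(System.out, "RDF/XML");
    }
}
}}}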

As the user is assisted through all these steps and intelligent default values are entered beforehand, many RDF graphs may be made valid through this import assistant. We hope that this approach keeps invalid RDF out of the store while still letting users import as many external RDF sources as possible. Similar to Piggy Bank, scripts are needed for this task.

The interesting part is that this import assistant can be realised as a centralised online (Web 2.0-like) application. Users can let the online service, the "pimo transformation server", run its magic on any RDF they find on the net. If one user writes a useful import script for, say, FOAF, then the script is stored at the transformation server. The transformation scripts are thus "user-generated content" and shared within the Gnowsis/Nepomuk community. Core elements of this approach are tools like the Exobot (source:branches/gnowsis0.9/gnowsis-server/src/java/org/gnowsis/exobot/Exobot.java) or Haystack's Adenine programming language. The scripts that transform RDF from one state to another can be written using a combination of SPARQL, inference rules, and other operations. There is no need to install a runtime for this language; one installation of the transformation engine at the transformation server is a good start.

The goal here is that users can import as much RDF as possible, and if one user has found out how to transform data into valid PIMO, then this "how-to" knowledge is stored in a script that can be used by the next user. Based on the paths of source files or the classes found inside the new RDF, it is easy to program a case-based reasoning machine that suggests which script may be used to make a file valid.

The question is: does this approach bring us to our goal?