wiki:DynamicDatasources

Version 24 (modified by nadeem, 18 years ago)

Adding new datasources to Gnowsis is easy. A new datasource needs to define at least three classes:

  1. A class implementing Aperture:DataSourceFactory
  2. A class implementing Aperture:CrawlerFactory
  3. A class extending Gnowsis:ConfigPanel

The factory classes must have constructors that take no parameters, so that Gnowsis can instantiate them with Class.newInstance().
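The no-argument constructor requirement can be illustrated with a minimal sketch. The `Factory` interface and the two factory classes below are hypothetical stand-ins, not Aperture or Gnowsis types, and the loading code is an assumption about the general technique rather than the actual Gnowsis loader. It only shows why reflective instantiation fails when a class has no no-argument constructor.

```java
import java.util.Optional;

public class FactoryLoadingSketch {

    /** Hypothetical stand-in for a factory interface such as DataSourceFactory. */
    public interface Factory {
        String describe();
    }

    /** Hypothetical factory that keeps the implicit public no-argument constructor. */
    public static class RssFactory implements Factory {
        @Override
        public String describe() {
            return "rss";
        }
    }

    /** Hypothetical factory that only offers a constructor with a parameter. */
    public static class BrokenFactory implements Factory {
        private final String name;

        public BrokenFactory(String name) {
            this.name = name;
        }

        @Override
        public String describe() {
            return name;
        }
    }

    /**
     * Loads a class by name and instantiates it through its no-argument
     * constructor. Returns an empty Optional if that is not possible.
     */
    public static Optional<Factory> instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // Same idea as the older cls.newInstance() call.
            Object instance = cls.getDeclaredConstructor().newInstance();
            return Optional.of((Factory) instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("RssFactory created: "
                + instantiate(RssFactory.class.getName()).isPresent());
        System.out.println("BrokenFactory created: "
                + instantiate(BrokenFactory.class.getName()).isPresent());
    }
}
```

A factory that needs configuration should therefore read it after construction rather than through constructor arguments.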

Pack these classes into a jar, write a suitable manifest, and put the jar in ~/.gnowsis_beta/datasources

A quick guide to creating your own Aperture datasource (in Eclipse)

Here I will walk through the steps required to create a new Aperture datasource and package it so that it can be used with Gnowsis. I will create a datasource that wraps an RSS/Atom feed, and I will base it on rome.

  • Create a new Eclipse project, copy the main Aperture jar (mine is aperture-2006.1-alpha-2.jar) and the 3 Sesame jars into the project, and add them to the build path.
  • Create a new class that implements CrawlerFactory and tell Eclipse to add all unimplemented methods.

  • Create a new class that implements DataSourceFactory, add the unimplemented methods, and make newInstance return a new instance of your datasource class (created in the next step).

  • Create a new class that extends DataSourceBase. Give it a public static URI called TYPE and return it from getType(). Make DataSourceFactory.getSupportedType and CrawlerFactory.getSupportedTypes return the same URI.
    public static URI TYPE=new URIImpl("http://example.org/RSSDataSource");
    
  • Create a new class that extends CrawlerBase, and add a constructor that takes a DataSource, calls super(), and then calls setDataSource with the argument. Use this constructor to return a new crawler instance in your CrawlerFactory class.
  • Now the CrawlerFactory, DataSource and DataSourceFactory classes are all finished. Only the meat remains: implementing the crawlObjects method.
  • First you have to decide what configuration options your datasource will take. Have a look at the Aperture datasource schema for a selection. In my case I will use source: rootURI to specify which RSS feed to crawl. Use the Aperture utility method to get this:
    RDFContainer config=source.getConfiguration();
    String root=ConfigurationUtil.getRootUrl(config);
    
    You are of course free to make up any config properties you want, but then the ConfigurationUtil class might not help you.
  • Implementing the crawlObjects method is clearly quite datasource dependent. In my case I add the rome and jdom jars and copy an example of how to read a feed. Some other hints:
    • Gnowsis uses java.util.logging for logging, so to get useful debugging messages add this to the top of your file:
      Logger log=Logger.getLogger(RSSCrawler.class.getName());
      
    • The return value of crawlObjects is taken from ExitCode, which has predefined values for you.
    • The CrawlerBase does many things for you. For example, it provides a dataAccess object, which you use to check whether things are new. This is essentially a hash table for each URI, letting you store key/value pairs for each ID, and it has predefined keys for date and size.
    • For each item you crawl, call handler.objectNew, objectNotModified or objectChanged.
    • The handler object can also supply you with RDFContainers for storing info about your items:
      RDFContainer rdf = handler.getRDFContainerFactory(this,e.getUri()).getRDFContainer(uri);
      DataObject res=new DataObjectBase(uri,source,rdf);
      		
      rdf.add(DATA.name,e.getTitle());
      rdf.add(DATA.description,e.getDescription().toString());
      
      This RDFContainer also has nice utility methods for creating the rdf content, and the DATA schema has handy properties.
    • Remember to add either Data:Name or RDFS:label to your dataobjects, otherwise Gnowsis will display "NO LABEL (default)" which is ugly!
  • Finally, create a TestMyDataSource class that implements CrawlerHandler and RDFContainerFactory. You only need to fill in a few of the many unimplemented methods. This makes testing easier than running all of Gnowsis. Look at my example for details.
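The hints about checking whether items are new and then calling objectNew, objectChanged or objectNotModified can be sketched as follows. This is a self-contained illustration, not Aperture code: the nested map is a hypothetical stand-in for the per-URI key/value store that CrawlerBase provides, the `Status` enum and the "date" key name are invented for the example, and a real crawler would call the matching handler method instead of returning a value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

public class ChangeDetectionSketch {

    private static final Logger LOG =
            Logger.getLogger(ChangeDetectionSketch.class.getName());

    /** The three outcomes a crawler reports for each item. */
    public enum Status { NEW, CHANGED, NOT_MODIFIED }

    /** Hypothetical key name; the real store has its own predefined keys. */
    private static final String DATE_KEY = "date";

    /** Hypothetical stand-in for the per-URI key/value store. */
    private final Map<String, Map<String, String>> store = new HashMap<>();

    /** Compares the stored date of an item with the current one, then records it. */
    public Status check(String uri, long lastModifiedMillis) {
        String current = Long.toString(lastModifiedMillis);
        Map<String, String> entry = store.get(uri);
        Status status;
        if (entry == null) {
            // Never seen before: a real crawler would call handler.objectNew.
            entry = new HashMap<>();
            store.put(uri, entry);
            status = Status.NEW;
        } else if (!current.equals(entry.get(DATE_KEY))) {
            // Seen before with a different date: handler.objectChanged.
            status = Status.CHANGED;
        } else {
            // Same date as last crawl: handler.objectNotModified.
            status = Status.NOT_MODIFIED;
        }
        entry.put(DATE_KEY, current);
        LOG.fine(uri + " -> " + status);
        return status;
    }

    public static void main(String[] args) {
        ChangeDetectionSketch sketch = new ChangeDetectionSketch();
        String item = "http://example.org/feed#item1";
        System.out.println(sketch.check(item, 1000L)); // NEW
        System.out.println(sketch.check(item, 1000L)); // NOT_MODIFIED
        System.out.println(sketch.check(item, 2000L)); // CHANGED
    }
}
```

Comparing a stored date is only one option. A feed that does not publish reliable dates could store a content hash under its own key instead.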

Making your datasource play with Gnowsis

ConfigPanel

Make a class that extends JPanel and implements org.gnogno.datasource.cfgpanels.ConfigPanel.

This can be really simple if your datasource needs very little configuration. Some things to look out for:

  • A ConfigPanel keeps track of whether changes are made - either add your components using setEditOnChange or use setEdit()
  • The newConfig method is never called so leave it blank :)

Manifest

Gnowsis needs your jar to have a manifest file with special fields to know what classes to deploy.

An example MANIFEST.MF:

Manifest-Version: 1.0
Created-By: 1.4.2_09 (Apple Computer, Inc.)
Gnowsis-DS-CrawlerFactory: org.gnowsis.data.datasource.rss.RSSCrawlerFactory
Gnowsis-DS-DataSourceFactory: org.gnowsis.data.datasource.rss.RSSDataSourceFactory
Gnowsis-DS-ConfigPanel: org.gnowsis.data.datasource.rss.RSSConfigPanel
Gnowsis-DS-Icon: /org/gnowsis/data/datasource/rss/rssicon.png
Gnowsis-DS-Label: RSS Datasource
Gnowsis-DS-Desc: A datasource for crawling RSS feeds.

Note that blank lines are not allowed.
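To check a manifest before deploying, the fields can be read back with the standard java.util.jar.Manifest class. The sketch below is an assumption about how such a check could look, not the way Gnowsis itself reads the file. The field names are the ones from the example manifest, and `ManifestFieldsSketch` is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestFieldsSketch {

    /** Field names taken from the example MANIFEST.MF. */
    static final String[] FIELDS = {
        "Gnowsis-DS-CrawlerFactory",
        "Gnowsis-DS-DataSourceFactory",
        "Gnowsis-DS-ConfigPanel",
        "Gnowsis-DS-Icon",
        "Gnowsis-DS-Label",
        "Gnowsis-DS-Desc"
    };

    /** Parses manifest text and returns the Gnowsis-DS-* fields that are present. */
    public static Map<String, String> readFields(String manifestText) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            Attributes main = manifest.getMainAttributes();
            Map<String, String> found = new LinkedHashMap<>();
            for (String field : FIELDS) {
                String value = main.getValue(field);
                if (value != null) {
                    found.put(field, value);
                }
            }
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Every line, including the last one, ends with a newline.
        String text =
              "Manifest-Version: 1.0\n"
            + "Gnowsis-DS-CrawlerFactory: org.gnowsis.data.datasource.rss.RSSCrawlerFactory\n"
            + "Gnowsis-DS-Label: RSS Datasource\n";
        for (Map.Entry<String, String> e : readFields(text).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Any of the expected fields missing from the returned map points to a typo in the field name or a field that was left out of the manifest.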

Deployment

Copy the jar with your datasource class files and the correct manifest into ~/.gnowsis_beta/datasources. Copy any libraries your datasource needs (that are not already in Gnowsis!) into ~/.gnowsis_beta/datasources-libs.

Download

A working example speaks louder than a lot of words, so download the example I constructed on this page: http://www.dfki.uni-kl.de/~grimnes/2006/04/RSSExampleDataSource/RSSExampleDataSourceDist.zip
