wiki:DynamicDatasources

Version 11 (modified by grimnes, 19 years ago)

--

Adding new datasources to Gnowsis is easy. A new datasource needs to define at minimum three classes:

  1. A class implementing Aperture:DataSourceFactory
  2. A class implementing Aperture:CrawlerFactory
  3. A class extending Gnowsis:ConfigPanel

The factory classes must have no-argument constructors, so that Gnowsis can instantiate them with Class.newInstance().
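A minimal sketch of why the no-arg constructor matters, using plain JDK reflection. FactoryLoadingDemo and MyCrawlerFactory are hypothetical stand-ins, not real Gnowsis classes; the real loader resolves the class names from the manifest entries shown below.

```java
// Sketch: factory classes need a public no-arg constructor because
// the datasource loader instantiates them reflectively by class name.
public class FactoryLoadingDemo {

    // Hypothetical stand-in for a real CrawlerFactory implementation.
    public static class MyCrawlerFactory {
        public MyCrawlerFactory() {}                 // the required no-arg constructor
        public String describe() { return "crawler-factory"; }
    }

    // Roughly what the loader does with a class name from the manifest.
    public static Object instantiate(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance(); // fails without a no-arg constructor
    }

    public static void main(String[] args) throws Exception {
        Object factory = instantiate("FactoryLoadingDemo$MyCrawlerFactory");
        System.out.println(((MyCrawlerFactory) factory).describe()); // crawler-factory
    }
}
```

If your factory only has constructors with parameters, the reflective instantiation throws at load time, so the datasource never appears in Gnowsis.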

Pack these classes into a jar, write a sensible manifest, and put the jar in ~/.gnowsis-beta/datasources

An example MANIFEST.MF:

Manifest-Version: 1.0
Created-By: 1.4.2_09 (Apple Computer, Inc.)
Gnowsis-DS-CrawlerFactory: org.gnowsis.data.datasource.rss.RSSCrawlerFactory
Gnowsis-DS-DataSourceFactory: org.gnowsis.data.datasource.rss.RSSDataSourceFactory
Gnowsis-DS-ConfigPanel: org.gnowsis.data.datasource.rss.RSSConfigPanel
Gnowsis-DS-Icon: /org/gnowsis/data/datasource/rss/rssicon.png
Gnowsis-DS-Label: RSS Datasource
Gnowsis-DS-Desc: A datasource for crawling RSS feeds.

Note that blank lines are not allowed.
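For reference, a manifest like the one above can be read back with the standard java.util.jar.Manifest class. This is only a sketch of how the Gnowsis-DS-* entries are accessible; how Gnowsis itself reads them is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.util.jar.Manifest;

// Sketch: reading Gnowsis-DS-* manifest entries with java.util.jar.Manifest.
public class ManifestDemo {
    static final String MF =
        "Manifest-Version: 1.0\n" +
        "Gnowsis-DS-CrawlerFactory: org.gnowsis.data.datasource.rss.RSSCrawlerFactory\n" +
        "Gnowsis-DS-Label: RSS Datasource\n";   // the manifest text must end with a newline

    public static String attribute(String name) throws Exception {
        Manifest mf = new Manifest(new ByteArrayInputStream(MF.getBytes("UTF-8")));
        return mf.getMainAttributes().getValue(name);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(attribute("Gnowsis-DS-Label"));          // RSS Datasource
        System.out.println(attribute("Gnowsis-DS-CrawlerFactory"));
    }
}
```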

A quick guide to creating your own Aperture datasource (in Eclipse)

Here I will step through everything required to create a new Aperture datasource and package it so that it can be used with Gnowsis. As a running example I will create a datasource that wraps an RSS/Atom feed, based on the Rome library.

  • Create a new Eclipse project, copy the main Aperture jar (mine is aperture-2006.1-alpha-2.jar) and the 3 Sesame jars into the project, and add them to the build path.
  • Create a new class that implements CrawlerFactory, tell Eclipse to add all unimplemented methods.
  • Create a new class that implements DataSourceFactory, add unimplemented methods, make newInstance return a new instance of the same class.
  • Create a new class that extends DataSourceBase - create a public static URI called TYPE, and return it from getType(). Make DataSourceFactory.getSupportedType and CrawlerFactory.getSupportedTypes return the same type.
    public static URI TYPE = new URIImpl("http://example.org/RSSDataSource");
    
  • Create a new class that extends CrawlerBase, and add a constructor that takes a DataSource, calls super(), and then calls setDataSource with the argument. Use this constructor to return new crawler instances from your CrawlerFactory class.
  • Now the CrawlerFactory, DataSource and DataSourceFactory classes are all finished; only the meat remains: implementing the crawlObjects method.
  • First you have to decide what configuration options your datasource will take. Have a look at the Aperture datasource schema for a selection. In my case I will use source:rootURI to specify which RSS feed to crawl. Use the Aperture utility method to get it:
    RDFContainer config = source.getConfiguration();
    String root = ConfigurationUtil.getRootUrl(config);
    
    You are of course free to make up any config properties you want, but then the ConfigurationUtil class may not be able to help you.
  • Implementing the crawlObjects method is clearly quite datasource-dependent; in my case I add the Rome and JDOM jars and copy some example code for reading a feed. Some other hints:
    • Gnowsis uses java.util.logging for logging, so to get useful debugging messages add this at the top of your class:
      Logger log = Logger.getLogger(RSSCrawler.class.getName());
      
    • The return value of crawlObjects is an ExitCode, which has predefined values for you to choose from.
    • The CrawlerBase does many things for you; for example, it provides a dataAccess object, which you use to check whether things are new. It is essentially a hash-table per URI, allowing you to store key=>value pairs for each id.
    • For each item you crawl, call handler.objectNew, handler.objectNotModified or handler.objectChanged.
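The change-detection hints above can be sketched as follows. A plain Map stands in for the dataAccess store, and returned string labels stand in for the three CrawlerHandler callbacks; all names here are illustrative, not the actual Aperture API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-item decision inside crawlObjects: compare what you see
// now against what was stored for the same URI on the previous crawl.
public class CrawlDecisionDemo {
    // Stand-in for the dataAccess object: uri -> hash of the item's last seen content.
    private final Map<String, String> dataAccess = new HashMap<>();

    // Decide which handler callback an item warrants, and remember its new state.
    public String report(String uri, String contentHash) {
        String previous = dataAccess.put(uri, contentHash);
        if (previous == null) return "objectNew";
        if (previous.equals(contentHash)) return "objectNotModified";
        return "objectChanged";
    }

    public static void main(String[] args) {
        CrawlDecisionDemo crawler = new CrawlDecisionDemo();
        System.out.println(crawler.report("http://example.org/feed#item1", "abc")); // objectNew
        System.out.println(crawler.report("http://example.org/feed#item1", "abc")); // objectNotModified
        System.out.println(crawler.report("http://example.org/feed#item1", "def")); // objectChanged
    }
}
```

In a real crawler you would call handler.objectNew / objectNotModified / objectChanged at these three points, and return an ExitCode when the loop over feed items completes.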
