Version 10 (modified by grimnes, 19 years ago)
Adding a new datasource to Gnowsis is easy. A new datasource needs to define at least three classes:
- A class implementing Aperture:DataSourceFactory
- A class implementing Aperture:CrawlerFactory
- A class extending Gnowsis:ConfigPanel
The factory classes have to have constructors that take no parameters, so that we can instantiate them with class.newInstance().
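To illustrate why the no-argument constructor matters, here is a minimal sketch of reflective instantiation. ExampleFactory, RssExampleFactory, NeedsArgumentFactory and FactoryLoader are hypothetical names made up for this sketch and are not Aperture or Gnowsis types. How Gnowsis actually loads factories is not shown here. The sketch uses getDeclaredConstructor().newInstance(), the current replacement for the deprecated Class.newInstance().

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a factory interface; not an Aperture or Gnowsis type.
interface ExampleFactory {
    String describe();
}

// Has an implicit no-argument constructor, so it can be created reflectively.
class RssExampleFactory implements ExampleFactory {
    public String describe() {
        return "rss";
    }
}

// Only has a one-argument constructor, so creation without arguments fails.
class NeedsArgumentFactory implements ExampleFactory {
    private final String name;

    NeedsArgumentFactory(String name) {
        this.name = name;
    }

    public String describe() {
        return name;
    }
}

class FactoryLoader {
    // Loads a class by name and instantiates it through its no-argument constructor.
    static ExampleFactory load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor = cls.getDeclaredConstructor();
            return (ExampleFactory) ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers a missing class as well as a missing no-argument constructor.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

Loading RssExampleFactory by name succeeds, while loading NeedsArgumentFactory throws, because there is no constructor that can be called without arguments.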
Pack these classes up in a jar, write a sensible manifest, and put the jar in ~/.gnowsis-beta/datasources.
An example MANIFEST.MF:
Manifest-Version: 1.0
Created-By: 1.4.2_09 (Apple Computer, Inc.)
Gnowsis-DS-CrawlerFactory: org.gnowsis.data.datasource.rss.RSSCrawlerFactory
Gnowsis-DS-DataSourceFactory: org.gnowsis.data.datasource.rss.RSSDataSourceFactory
Gnowsis-DS-ConfigPanel: org.gnowsis.data.datasource.rss.RSSConfigPanel
Gnowsis-DS-Icon: /org/gnowsis/data/datasource/rss/rssicon.png
Gnowsis-DS-Label: RSS Datasource
Gnowsis-DS-Desc: A datasource for crawling RSS feeds.
Note that blank lines are not allowed.
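If you want to check a manifest before dropping the jar into the datasources folder, the standard java.util.jar classes can read the attributes back. This is only a sketch: DatasourceManifestReader is a hypothetical helper name, and I am assuming the Gnowsis-DS-* entries live in the main section of the manifest. In the JAR manifest format a blank line ends the main section, which is probably why blank lines are not allowed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for inspecting a datasource manifest; not part of Gnowsis.
class DatasourceManifestReader {

    // The example manifest from above; every line, including the last, ends with a newline.
    static String exampleManifest() {
        return String.join("\n",
                "Manifest-Version: 1.0",
                "Gnowsis-DS-CrawlerFactory: org.gnowsis.data.datasource.rss.RSSCrawlerFactory",
                "Gnowsis-DS-DataSourceFactory: org.gnowsis.data.datasource.rss.RSSDataSourceFactory",
                "Gnowsis-DS-ConfigPanel: org.gnowsis.data.datasource.rss.RSSConfigPanel",
                "Gnowsis-DS-Icon: /org/gnowsis/data/datasource/rss/rssicon.png",
                "Gnowsis-DS-Label: RSS Datasource",
                "Gnowsis-DS-Desc: A datasource for crawling RSS feeds.") + "\n";
    }

    // Parses manifest text and returns the attributes of its main section.
    static Attributes mainAttributes(String manifestText) {
        try {
            byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
            Manifest manifest = new Manifest(new ByteArrayInputStream(bytes));
            return manifest.getMainAttributes();
        } catch (IOException e) {
            throw new IllegalArgumentException("Invalid manifest", e);
        }
    }

    public static void main(String[] args) {
        Attributes attrs = mainAttributes(exampleManifest());
        System.out.println(attrs.getValue("Gnowsis-DS-Label"));
        System.out.println(attrs.getValue("Gnowsis-DS-CrawlerFactory"));
    }
}
```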
A quick guide to creating your own Aperture datasource (in Eclipse)
Here I will walk through the steps required to create a new Aperture datasource and package it so that it can be used with Gnowsis. I will create a datasource that wraps an RSS/Atom feed, and I will base it on the ROME library.
- Create a new Eclipse project, copy the main aperture jar (mine is aperture-2006.1-alpha-2.jar) and the 3 sesame jars into the project and add them to the build-path.
- Create a new class that implements CrawlerFactory and tell Eclipse to add all unimplemented methods.
- Create a new class that implements DataSourceFactory, add unimplemented methods, make newInstance return a new instance of the same class.
- Create a new class that extends DataSourceBase. Create a public static URI called TYPE and return it from getType(). Make DataSourceFactory.getSupportedType and CrawlerFactory.getSupportedTypes report the same type.
public static URI TYPE=new URIImpl("http://example.org/RSSDataSource");
- Create a new class that extends CrawlerBase, and add a constructor that takes a DataSource, calls super() and then calls setDataSource with the argument. Use this constructor to return a new crawler instance in your CrawlerFactory class.
- Now the CrawlerFactory, DataSource and DataSourceFactory classes are all finished. Only the meat remains: implementing the crawlObjects method.
- First you have to decide what configuration options your datasource will take. Have a look at the aperture datasource schema for a selection. In my case I will use source:rootURI to specify what RSS feed to crawl. Use the aperture utility method to get this:
RDFContainer config = source.getConfiguration();
String root = ConfigurationUtil.getRootUrl(config);
You are of course free to make up any config properties you want, but then the ConfigurationUtil class might not help you.
- Implementing the crawlObjects method is clearly quite datasource dependent. In my case I add the ROME and JDOM jars and copy an example of how to read a feed. Some other hints:
- Gnowsis uses java.util.logging for logging, so to get useful debugging messages add this at the top of your class:
Logger log=Logger.getLogger(RSSCrawler.class.getName());
- The return value of crawlObjects is taken from ExitCode, which has predefined values for you.
- The CrawlerBase does many things for you, for example it provides a dataAccess object, which you use to check if things are new.
- For each item you crawl, call handler.objectNew, objectNotModified or objectChanged.
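The hints above can be put together in a rough sketch of the crawl loop. To keep it runnable without the Aperture jars, every type here is a hypothetical stand-in: SketchExitCode plays the role of ExitCode, SketchHandler the role of the crawler handler, and a plain Map replaces the dataAccess object from CrawlerBase. The real Aperture signatures are different, so treat this as an outline of the new / not modified / changed decision, not as Aperture code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for Aperture's ExitCode.
enum SketchExitCode { COMPLETED, FATAL_ERROR }

// Hypothetical stand-in for the handler that receives crawl results.
interface SketchHandler {
    void objectNew(String id);
    void objectChanged(String id);
    void objectNotModified(String id);
}

// Records every callback so the outcome of a crawl can be inspected.
class RecordingHandler implements SketchHandler {
    final List<String> events = new ArrayList<>();

    public void objectNew(String id) { events.add("new:" + id); }
    public void objectChanged(String id) { events.add("changed:" + id); }
    public void objectNotModified(String id) { events.add("notModified:" + id); }
}

class SketchCrawler {
    private static final Logger log = Logger.getLogger(SketchCrawler.class.getName());

    // Stand-in for the access data: item id -> stamp seen during the previous crawl.
    private final Map<String, String> seen = new HashMap<>();

    // items maps an item id (for example a feed entry link) to a stamp such as its date.
    SketchExitCode crawlObjects(Map<String, String> items, SketchHandler handler) {
        try {
            for (Map.Entry<String, String> item : items.entrySet()) {
                String id = item.getKey();
                String stamp = item.getValue();
                String previous = seen.put(id, stamp);
                if (previous == null) {
                    log.fine("New item: " + id);
                    handler.objectNew(id);
                } else if (previous.equals(stamp)) {
                    handler.objectNotModified(id);
                } else {
                    log.fine("Changed item: " + id);
                    handler.objectChanged(id);
                }
            }
            return SketchExitCode.COMPLETED;
        } catch (RuntimeException e) {
            log.warning("Crawl failed: " + e);
            return SketchExitCode.FATAL_ERROR;
        }
    }
}
```

Crawling the same items twice with one SketchCrawler reports them as new the first time and as not modified or changed the second time, depending on whether the stamp differs.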