Adding a new datasource to Gnowsis is easy. A new datasource needs to define at minimum three classes:

 1. A class implementing Aperture:DataSourceFactory
 1. A class implementing Aperture:CrawlerFactory
 1. A class extending Gnowsis:ConfigPanel

The factory classes must have constructors that take no parameters, so that we can instantiate them with class.newInstance().

Pack these classes up in a jar, write a sensible manifest and put the jar in ~/.gnowsis-beta/datasources

An example MANIFEST.MF:
{{{
Manifest-Version: 1.0
Created-By: 1.4.2_09 (Apple Computer, Inc.)
Gnowsis-DS-CrawlerFactory:
Gnowsis-DS-DataSourceFactory:
Gnowsis-DS-ConfigPanel:
Gnowsis-DS-Icon: /org/gnowsis/data/datasource/rss/rssicon.png
Gnowsis-DS-Label: RSS Datasource
Gnowsis-DS-Desc: A datasource for crawling RSS feeds.
}}}
Note that blank lines are not allowed in the manifest.

= A quick guide to creating your own aperture datasource (in eclipse) =

Here I will walk through the steps required to create a new aperture datasource and to package it so that it can be used with gnowsis. I will create a datasource that wraps an RSS/Atom feed, and I will base it on [ rome].

 * Create a new Eclipse project, copy the main aperture jar (mine is aperture-2006.1-alpha-2.jar) and the 3 sesame jars into the project and add them to the build path.
 * Create a new class that implements CrawlerFactory and tell Eclipse to add all unimplemented methods.
 * Create a new class that implements DataSourceFactory, add the unimplemented methods, and make newInstance return a new instance of your datasource class (created in the next step).
 * Create a new class that extends DataSourceBase. Give it a public static URI called TYPE and return it from getType(). Make DataSourceFactory.getSupportedType and CrawlerFactory.getSupportedTypes report this same TYPE.
{{{
public static URI TYPE=new URIImpl("");
}}}
 * Create a new class that extends CrawlerBase and add a constructor that takes a DataSource, calls super() and then calls setDataSource with the argument. Use this constructor to return a new crawler instance in your CrawlerFactory class.
 * Now the CrawlerFactory, DataSource and DataSourceFactory classes are all finished. Only the meat remains: implementing the crawlObjects method.
 * First you have to decide what configuration options your datasource will take. Have a look at [ the aperture datasource schema] for a selection. In my case I will use source:rootURI to specify which RSS feed to crawl. Use the aperture utility method to get it:
{{{
RDFContainer config=source.getConfiguration();
String root=ConfigurationUtil.getRootUrl(config);
}}}
 You are of course free to make up any config properties you want, but then the ConfigurationUtil class might not help you.
 * Implementing the crawlObjects method is clearly quite datasource dependent. In my case I add the rome jar and jdom, and copy an example of how to read a feed. Some other hints:
   * Gnowsis uses java.util.logging for logging, so to get useful debugging messages add this to the top of your class:
{{{
Logger log=Logger.getLogger(RSSCrawler.class.getName());
}}}
   * The return value of crawlObjects is taken from ExitCode, which has predefined values for you.
   * The CrawlerBase does many things for you. For example, it provides a dataAccess object, which you use to check whether things are new. This is essentially a hash table for each URI, allowing you to store key=>value pairs for each id.
   * For each item you crawl, call handler.objectNew, objectNotModified or objectChanged.
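
The last two hints (the per-URI hash table and the three handler callbacks) fit together roughly as sketched below. This is only an illustration of the decision logic and does not use the real Aperture classes: `accessData`, `events`, `report` and the `fingerprint` key are hypothetical stand-ins that I made up so the example runs on its own. In a real crawler the store would be the dataAccess object and the callbacks would go to the handler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrawlDecisionSketch {

    // Hypothetical stand-in for the per-URI key/value store (not the Aperture API).
    static final Map<String, Map<String, String>> accessData = new HashMap<>();

    // Hypothetical stand-in for the handler: records which callback would be called.
    static final List<String> events = new ArrayList<>();

    // Decide which callback applies to one crawled item, then remember its fingerprint.
    // The fingerprint can be anything that changes when the item changes,
    // for example a publication date or a content hash.
    static void report(String uri, String fingerprint) {
        Map<String, String> stored = accessData.get(uri);
        if (stored == null) {
            events.add("objectNew " + uri);          // never seen before
        } else if (fingerprint.equals(stored.get("fingerprint"))) {
            events.add("objectNotModified " + uri);  // seen, same fingerprint
        } else {
            events.add("objectChanged " + uri);      // seen, fingerprint differs
        }
        Map<String, String> updated = new HashMap<>();
        updated.put("fingerprint", fingerprint);
        accessData.put(uri, updated);
    }

    public static void main(String[] args) {
        report("urn:item:1", "2006-05-01"); // first crawl
        report("urn:item:1", "2006-05-01"); // second crawl, nothing changed
        report("urn:item:1", "2006-05-02"); // third crawl, item was updated
        for (String event : events) {
            System.out.println(event);
        }
    }
}
```

The important part is that the store is updated after every item, so the next crawl compares against the latest state rather than the first one.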
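
To see why the factory classes need a no-argument constructor, here is a small sketch of loading a class by the name found in a manifest and instantiating it reflectively. How Gnowsis itself reads the manifest is not shown on this page, so treat this as an assumption about the general mechanism. `ExampleFactory`, `RssExampleFactory`, `readAttribute` and `instantiate` are hypothetical names of mine, and the class name written into the manifest text is only there to make the example run. Note that on newer Java versions class.newInstance() is deprecated, so the sketch uses getDeclaredConstructor().newInstance() instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

public class FactoryLoadingSketch {

    /** Hypothetical stand-in for a factory interface such as CrawlerFactory. */
    public interface ExampleFactory {
        String describe();
    }

    /** Declares no constructor, so the compiler gives it a public no-argument one. */
    public static class RssExampleFactory implements ExampleFactory {
        @Override
        public String describe() {
            return "rss";
        }
    }

    /** Reads one attribute from the main section of the given manifest text. */
    public static String readAttribute(String manifestText, String key) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the named class and calls its no-argument constructor. */
    public static ExampleFactory instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (ExampleFactory) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // A missing class or a missing no-argument constructor ends up here.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // In-memory manifest; the factory class name is illustrative only.
        String manifestText =
                "Manifest-Version: 1.0\n"
                + "Gnowsis-DS-CrawlerFactory: " + RssExampleFactory.class.getName() + "\n"
                + "Gnowsis-DS-Label: RSS Datasource\n";
        String className = readAttribute(manifestText, "Gnowsis-DS-CrawlerFactory");
        ExampleFactory factory = instantiate(className);
        System.out.println(readAttribute(manifestText, "Gnowsis-DS-Label") + ": " + factory.describe());
    }
}
```

If the factory only had a constructor with parameters, the reflective call would fail, which is why the requirement is stated up front.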