Adding new datasources to Gnowsis is easy. A new datasource needs to define at minimum three classes:
 1. A class implementing Aperture:DataSourceFactory
 1. A class implementing Aperture:CrawlerFactory
 1. A class extending Gnowsis:ConfigPanel

The factory classes must have constructors that take no parameters, so that we can instantiate them with class.newInstance().

Pack these classes up in a jar, write a sensible manifest and put the jar in ~/.gnowsis-beta/datasources

An example MANIFEST.MF:
{{{
Manifest-Version: 1.0
Created-By: 1.4.2_09 (Apple Computer, Inc.)
Gnowsis-DS-CrawlerFactory: org.gnowsis.data.datasource.rss.RSSCrawlerFactory
Gnowsis-DS-DataSourceFactory: org.gnowsis.data.datasource.rss.RSSDataSourceFactory
Gnowsis-DS-ConfigPanel: org.gnowsis.data.datasource.rss.RSSConfigPanel
Gnowsis-DS-Icon: /org/gnowsis/data/datasource/rss/rssicon.png
Gnowsis-DS-Label: RSS Datasource
Gnowsis-DS-Desc: A datasource for crawling RSS feeds.
}}}
Note that blank lines are not allowed in the manifest.

= A quick guide to creating your own Aperture datasource (in Eclipse) =

Here I will walk through all the steps required to create a new Aperture datasource and to package it so that it can be used with Gnowsis. I will create a datasource that wraps an RSS/Atom feed, and I will base it on [https://rome.dev.java.net/ ROME].

 * Create a new Eclipse project, copy the main Aperture jar (mine is aperture-2006.1-alpha-2.jar) and the three Sesame jars into the project, and add them to the build path.
 * Create a new class that implements CrawlerFactory, and tell Eclipse to add all unimplemented methods.
 * Create a new class that implements DataSourceFactory, add the unimplemented methods, and make newInstance return a new instance of the same class.
 * Create a new class that extends DataSourceBase. Create a public static URI called TYPE, and return it from getType(). Make DataSourceFactory.getSupportedType and CrawlerFactory.getSupportedTypes return this same type.
   {{{
public static final URI TYPE = new URIImpl("http://example.org/RSSDataSource");
   }}}
 * Create a new class that extends CrawlerBase, and add a constructor that takes a DataSource, calls super() and then calls setDataSource with the argument. Use this constructor to return a new crawler instance in your CrawlerFactory class.
 * Now the CrawlerFactory, DataSource and DataSourceFactory classes are all finished. Only the meat remains: implementing the crawlObjects method.
 * First you have to decide what configuration options your datasource will take. Have a look at [http://aperture.sourceforge.net/ontology/source.rdfs the aperture datasource schema] for a selection. In my case I will use source:rootURI to specify which RSS feed to crawl. Use the Aperture utility method to get this:
   {{{
RDFContainer config = source.getConfiguration();
String root = ConfigurationUtil.getRootUrl(config);
   }}}
   You are of course free to make up any config properties you want, but then the ConfigurationUtil class might not help you.
 * Implementing the crawlObjects method is clearly quite datasource dependent. In my case I added the ROME and JDOM jars and copied an example of how to read a feed. Some other hints:
   * Gnowsis uses java.util.logging for logging, so to get useful debugging messages add this at the top of your class:
     {{{
Logger log = Logger.getLogger(RSSCrawler.class.getName());
     }}}
   * The return value of crawlObjects is an ExitCode, which has predefined values for you.
   * CrawlerBase does many things for you. For example, it provides a dataAccess object, which you use to check whether things are new. This is essentially a hash table for each URI, allowing you to store key => value pairs for each ID, and it has predefined keys for date and size.
   * For each item you crawl, call handler.objectNew, objectNotModified or objectChanged.
   * The handler object can also supply you with RDFContainers for storing info about your items:
     {{{
// Ask the handler for a container to hold the metadata of this item
RDFContainer rdf = handler.getRDFContainerFactory(this, e.getUri()).getRDFContainer(uri);
DataObject res = new DataObjectBase(uri, source, rdf);
// Describe the item using properties from the DATA schema
rdf.add(DATA.name, e.getTitle());
rdf.add(DATA.description, e.getDescription().toString());
     }}}
     This RDFContainer also has nice utility methods for creating the RDF content, and the [http://aperture.sourceforge.net/ontology/data.rdfs DATA schema] has handy properties.
 * Finally, create a TestMyDataSource class which implements CrawlerHandler and RDFContainerFactory. You can leave most of the unimplemented methods empty. This makes testing easier than running all of Gnowsis. Look at my example for details.
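
Returning to the manifest and the no-argument constructor requirement described at the top of this page: the sketch below uses only the JDK (java.util.jar.Manifest and reflection) to show how the Gnowsis-DS-* attributes could be read back and how a class named there could be instantiated. It is an illustration under assumptions, not the way Gnowsis is confirmed to load datasources. ManifestCheck, DemoFactory, mainAttribute, instantiate and sampleManifest are hypothetical names made up for this example, and the sample manifest is a shortened stand-in for the real one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

public class ManifestCheck {

    /** Hypothetical stand-in for a factory class; it has an implicit public no-argument constructor. */
    public static class DemoFactory {
    }

    /** Returns the value of a main-section manifest attribute, or null if it is absent. */
    public static String mainAttribute(String manifestText, String key) {
        try {
            Manifest mf = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return mf.getMainAttributes().getValue(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Instantiates a class by name through its no-argument constructor.
     * Class.newInstance() is deprecated since Java 9, so this goes through getDeclaredConstructor().
     */
    public static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    /** Builds a small sample manifest; every line, including the last, ends with a newline. */
    public static String sampleManifest() {
        return "Manifest-Version: 1.0\n"
                + "Gnowsis-DS-DataSourceFactory: " + DemoFactory.class.getName() + "\n"
                + "Gnowsis-DS-Label: RSS Datasource\n";
    }

    public static void main(String[] args) {
        String text = sampleManifest();
        // Look up the factory class name, then create it without passing any arguments
        String factoryName = mainAttribute(text, "Gnowsis-DS-DataSourceFactory");
        Object factory = instantiate(factoryName);
        System.out.println(mainAttribute(text, "Gnowsis-DS-Label"));
        System.out.println(factory.getClass().getSimpleName());
    }
}
```

In the JDK manifest format a blank line ends the main section, so attributes placed after one are no longer returned by getMainAttributes(). That is one plausible reason for the rule that the manifest must not contain blank lines.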