Version 13 (modified by mylka, 18 years ago)

--

The Use of OSGi in Aperture

Both Aduna and DFKI are in favour of using OSGi as a way to bundle Aperture's components. It seems to be a good platform for running and maintaining applications with a large set of plugin-like components.

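Furthermore, the use of bundle-local classpaths may prove beneficial. Consider, for example, the TextMining library for Word extraction, which bundles a partial and unknown version of the POI libraries. With a bundle-local classpath, that copy stays private to the bundle and cannot clash with a POI version used elsewhere.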

At Aduna we have followed a specific way of modeling services, using a factory for every implementation of a service, and a separate registry that registers all implementations of a specific service. It is the responsibility of the bundle activator of a service to register an instance of a service implementation's factory with the service registry.

This has two benefits:

  • It allows for a very light-weight initialization of the system, provided that creation of a factory instance is very light-weight.
  • All bundle activation code is kept outside of the factory, meaning that the code/bundle jar file is also usable in a non-OSGi application.
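
The pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: the interfaces below are simplified, hypothetical stand-ins for the Extractor, ExtractorFactory and ExtractorRegistry types, and PlainTextActivatorSketch merely plays the role of a bundle activator. A real activator would implement org.osgi.framework.BundleActivator and obtain the registry from the framework instead of receiving it as a parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins for the Aperture interfaces.
interface Extractor {
    String extract(String input);
}

interface ExtractorFactory {
    Set<String> getSupportedMimeTypes();
    Extractor get();
}

/** Registry that keeps every registered factory, indexed by MIME type. */
class ExtractorRegistrySketch {
    private final Map<String, List<ExtractorFactory>> factories = new HashMap<>();

    void add(ExtractorFactory factory) {
        for (String mimeType : factory.getSupportedMimeTypes()) {
            factories.computeIfAbsent(mimeType, k -> new ArrayList<>()).add(factory);
        }
    }

    void remove(ExtractorFactory factory) {
        for (String mimeType : factory.getSupportedMimeTypes()) {
            List<ExtractorFactory> list = factories.get(mimeType);
            if (list != null) {
                list.remove(factory);
            }
        }
    }

    List<ExtractorFactory> get(String mimeType) {
        return factories.getOrDefault(mimeType, Collections.emptyList());
    }
}

/** Cheap to construct: the extractor itself is only created on demand. */
class PlainTextExtractorFactory implements ExtractorFactory {
    public Set<String> getSupportedMimeTypes() {
        return Collections.singleton("text/plain");
    }

    public Extractor get() {
        // Toy extraction logic, assumed for illustration only.
        return input -> input.trim();
    }
}

/** Plays the activator role: all registration code lives outside the factory. */
class PlainTextActivatorSketch {
    private final ExtractorFactory factory = new PlainTextExtractorFactory();

    void start(ExtractorRegistrySketch registry) {
        registry.add(factory);
    }

    void stop(ExtractorRegistrySketch registry) {
        registry.remove(factory);
    }
}
```

Because PlainTextExtractorFactory has no dependency on the activator, a non-OSGi application could create the registry itself and call add(new PlainTextExtractorFactory()) directly.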

For Aperture, this means that the project is divided into several bundles:

  • one bundle with the core engine including the registries and all interfaces
  • one bundle for each ApertureExtractor (or one bundle grouping several for convenience, such as an "extractor pack")

Chris> I think the DataSource API, a DataSource implementation, the DataCrawler API and the DataCrawler implementation should all be separate bundles. This is the only way to ensure that you can use different crawlers for the same DataSource.
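
The separation Chris argues for could look roughly like the sketch below, where two independent crawler factories serve the same data source type. Everything here is an assumption made for illustration: the interfaces are simplified, hypothetical stand-ins rather than the actual Aperture API, the class names are invented, and the crawl results are made-up strings instead of real file listings. In an OSGi setup, each commented group would live in its own bundle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// --- "DataSource API" bundle (hypothetical, simplified) ---
interface DataSource {
    String getRootUrl();
}

// --- "DataCrawler API" bundle (hypothetical, simplified) ---
interface Crawler {
    List<String> crawl();
}

interface CrawlerFactory {
    Class<? extends DataSource> getSupportedType();
    Crawler getCrawler(DataSource source);
}

/** Registry that allows several crawler factories per data source type. */
class CrawlerRegistrySketch {
    private final Map<Class<? extends DataSource>, List<CrawlerFactory>> byType = new HashMap<>();

    void add(CrawlerFactory factory) {
        byType.computeIfAbsent(factory.getSupportedType(), k -> new ArrayList<>()).add(factory);
    }

    List<CrawlerFactory> get(Class<? extends DataSource> type) {
        return byType.getOrDefault(type, Collections.emptyList());
    }
}

// --- "DataSource implementation" bundle ---
class FileSystemSourceSketch implements DataSource {
    private final String rootUrl;

    FileSystemSourceSketch(String rootUrl) {
        this.rootUrl = rootUrl;
    }

    public String getRootUrl() {
        return rootUrl;
    }
}

// --- first "DataCrawler implementation" bundle: reports only the root ---
class RootOnlyCrawlerFactory implements CrawlerFactory {
    public Class<? extends DataSource> getSupportedType() {
        return FileSystemSourceSketch.class;
    }

    public Crawler getCrawler(DataSource source) {
        return () -> Collections.singletonList(source.getRootUrl());
    }
}

// --- second "DataCrawler implementation" bundle: reports made-up children ---
class ChildListingCrawlerFactory implements CrawlerFactory {
    public Class<? extends DataSource> getSupportedType() {
        return FileSystemSourceSketch.class;
    }

    public Crawler getCrawler(DataSource source) {
        String root = source.getRootUrl();
        return () -> Arrays.asList(root + "/a.txt", root + "/b.txt");
    }
}
```

Neither crawler factory is referenced by FileSystemSourceSketch, so either crawler bundle could be installed, replaced, or removed without touching the data source bundle.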

Currently, Aduna and DFKI think that we should base our code only on the pure OSGi API (i.e. org.osgi.*) and not use any other utilities such as the DependencyManager that's currently used in the Aduna code. Perhaps Herko can tell us more about what we're in for, because we both have hardly any experience with OSGi yet.

Comments by Herko:

Ideally, libraries like this should contain no OSGi-specifics whatsoever. I think of OSGi as just one of many possible runtime environments. However, it is beneficial to design the software with OSGi in mind, as the programming model of OSGi (or SOA in general) enforces a clean separation of interface and implementation and loose coupling between modules (bundles).

Having said that, I see no reason NOT to use the DependencyManager when using the library in an OSGi context. The DependencyManager provides support for a number of common scenarios when dealing with interdependent services, in addition to the syntactic dependency management of the OSGi framework itself.

Bundles

Core

accessor.AccessData
accessor.DataAccessor
accessor.DataAccessorFactory
accessor.DataAccessorRegistry
accessor.DataObject
accessor.FileDataObject
accessor.FolderDataObject
accessor.RDFContainerFactory
accessor.URLNotFoundException
accessor.impl.DataAccessorRegistryImpl

crawler.Crawler
crawler.CrawlerFactory
crawler.CrawlerHandler
crawler.CrawlerRegistry
crawler.CrawlReport
crawler.ExitCode
crawler.impl.CrawlerRegistryImpl

datasource.DataSource
datasource.DataSourceFactory
datasource.DataSourceRegistry
datasource.impl.DataSourceRegistryImpl

extractor.Extractor
extractor.ExtractorException
extractor.ExtractorFactory
extractor.ExtractorRegistry
extractor.impl.ExtractorRegistryImpl

hypertext.linkextractor.LinkExtractor
hypertext.linkextractor.LinkExtractorRegistry
hypertext.linkextractor.LinkExtractorFactory
hypertext.linkextractor.impl.LinkExtractorRegistryImpl

mime.identifier.MimeTypeIdentifier
mime.identifier.MimeTypeIdentifierFactory
mime.identifier.MimeTypeIdentifierRegistry
mime.identifier.impl.MimeTypeIdentifierRegistryImpl

opener.DataOpener
opener.DataOpenerFactory
opener.DataOpenerRegistry
opener.impl.DataOpenerRegistryImpl

rdf.MultipleValuesException
rdf.RDFContainer
rdf.RDFContainerFactory
rdf.UpdateException
rdf.ValueFactory

Exported packages:
accessor
crawler
datasource
extractor
hypertext.linkextractor
mime.identifier
opener
rdf

Services
DataAccessorRegistry
CrawlerRegistry
DataSourceRegistry
ExtractorRegistry
LinkExtractorRegistry
MimeTypeIdentifierRegistry
DataOpenerRegistry

Aperture-impls

accessor.file.FileAccessor
accessor.file.FileAccessorFactory

accessor.http.ContentType
accessor.http.HttpAccessor
accessor.http.HttpAccessorFactory

addressbook.AddressbookCrawler
addressbook.AddressbookCrawlerFactory
addressbook.AddressbookDataSource
addressbook.AddressbookDataSourceFactory
addressbook.AppleAddressbookCrawler
addressbook.ThunderbirdCrawler

crawler.filesystem.FileSystemCrawler
crawler.filesystem.FileSystemCrawlerFactory
datasource.filesystem.FileSystemDataSource
datasource.filesystem.FileSystemDataSourceFactory

crawler.ical.IcalCrawler
crawler.ical.IcalCrawlerFactory
crawler.ical.IcalDataType
datasource.ical.IcalDataSource
datasource.ical.IcalDataSourceFactory

crawler.imap.DataObjectFactory
crawler.imap.ImapCrawler
crawler.imap.ImapCrawlerFactory
datasource.imap.ImapDataSource
datasource.imap.ImapDataSourceFactory

crawler.web.CrawlJob
crawler.web.WebCrawler
crawler.web.WebCrawlerFactory
datasource.web.WebDataSource
datasource.web.WebDataSourceFactory

extractor.excel.*
extractor.html.*
extractor.mime.*
extractor.office.*
extractor.opendocument.*
extractor.openxml.*
extractor.pdf.*
extractor.plaintext.*
extractor.powerpoint.*
extractor.presentations.*
extractor.publisher.*
extractor.quattro.*
extractor.rdf.*
extractor.rtf.*
extractor.visio.*
extractor.word.*
extractor.wordperfect.*
extractor.works.*
extractor.xml.*
extractor.util.* (if the extractors were split into multiple bundles, these utilities would have to be included in every bundle that uses them)

hypertext.linkextractor.http.*

mime.identifier.magic.*

opener.file.*
opener.http.*

outlook.*

Aperture-helpers.jar

accessor.base.AccessDataImpl
accessor.base.DataObjectBase
accessor.base.FileAccessData
accessor.base.FileDataObjectBase
accessor.base.FilterAccessData
accessor.base.FolderDataObjectBase
accessor.base.ModelAccessData
accessor.base.RepositoryAccessData

crawler.base.CrawlerBase
crawler.base.CrawlerHandlerBase
crawler.base.CrawlReportBase

datasource.base.DataSourceBase
datasource.config.ConfigurationUtil
datasource.config.DomainBoundaries
datasource.config.RegExpPattern
datasource.config.SubStringCondition
datasource.config.SubStringPattern
datasource.config.UrlPattern

security.*

util.*

vocabulary.*

rdf.rdf2go.RDF2GoRDFContainer
rdf.rdf2go.RDF2GoRDFContainerFactory
rdf.rdf2go.RDF2GoValueFactory

rdf2go-bundle (reusable by other rdf2go-dependent projects; independent of aperture and sesame2)

ClassPath: rdf2go.jar
Exposes:

org.ontoware.rdf2go.model
org.ontoware.rdf2go.exception

rdf2go-sesamedriver (independent from aperture)

org.ontoware.rdf2go.impl.sesame2.*

sesame2 (independent from aperture)

org.openrdf.*

We assume that the user of the application will create the models somehow. Models can be initialized with many different attributes, so providing a generic ModelFactory might not be feasible (or even possible).

Classes not needed anymore

accessor.impl.DefaultDataAccessorRegistry
crawler.impl.DefaultCrawlerRegistry
extractor.impl.DefaultExtractorRegistry
hypertext.linkextractor.impl.DefaultLinkExtractorRegistry
mime.identifier.impl.DefaultMimeTypeIdentifierRegistry
opener.impl.DefaultDataOpenerRegistry

rdf.sesame.SesameRDFContainer
rdf.sesame.SesameRDFContainerFactory