Version 14 (modified by mylka, 18 years ago)
The Use of OSGi in Aperture
Both Aduna and DFKI are in favour of using OSGi as a way to bundle these components. It seems to be a good platform for running and maintaining applications with a large set of plugin-like components.
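Furthermore, the use of bundle-local classpaths may prove beneficial. Consider, for example, the TextMining library for Word extraction, which bundles a partial and unknown version of the POI libraries. A bundle-local classpath would keep that copy from clashing with any other POI version used elsewhere in the application.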
At Aduna we have followed a specific way of modeling services, using a factory for every implementation of a service, and a separate registry that registers all implementations of a specific service. It is the responsibility of the bundle activator of a service to register an instance of a service implementation's factory with the service registry.
This has two benefits:
- It allows for a very light-weight initialization of the system, provided that creation of a factory instance is very light-weight.
- All bundle activation code is kept outside of the factory, meaning that the code/bundle jar file is also usable in a non-OSGi application.
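The factory-and-registry pattern described above can be sketched in plain Java, without any OSGi imports. This is only an illustration under stated assumptions: the method signatures of `Extractor`, `ExtractorFactory` and `ExtractorRegistry` are simplified stand-ins rather than the actual Aperture API, and `PlainTextExtractorFactory` and `RegistryDemo` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for a service interface (signature is an assumption).
interface Extractor {
    String extract(String input);
}

// One factory per service implementation; creating a factory must stay cheap.
interface ExtractorFactory {
    Set<String> getSupportedMimeTypes();
    Extractor get();
}

// Registry that collects all factories registered for one kind of service.
class ExtractorRegistry {
    private final Map<String, Set<ExtractorFactory>> byMimeType =
            new HashMap<String, Set<ExtractorFactory>>();

    void add(ExtractorFactory factory) {
        for (String type : factory.getSupportedMimeTypes()) {
            Set<ExtractorFactory> factories = byMimeType.get(type);
            if (factories == null) {
                factories = new HashSet<ExtractorFactory>();
                byMimeType.put(type, factories);
            }
            factories.add(factory);
        }
    }

    void remove(ExtractorFactory factory) {
        for (String type : factory.getSupportedMimeTypes()) {
            Set<ExtractorFactory> factories = byMimeType.get(type);
            if (factories != null) {
                factories.remove(factory);
                if (factories.isEmpty()) {
                    byMimeType.remove(type);
                }
            }
        }
    }

    // Returns a copy so callers cannot modify the registry's internal state.
    Set<ExtractorFactory> get(String mimeType) {
        Set<ExtractorFactory> found = byMimeType.get(mimeType);
        return found == null
                ? new HashSet<ExtractorFactory>()
                : new HashSet<ExtractorFactory>(found);
    }
}

// Hypothetical implementation: the factory holds no heavy state, and the
// extractor itself is only created when get() is called.
class PlainTextExtractorFactory implements ExtractorFactory {
    public Set<String> getSupportedMimeTypes() {
        return Collections.singleton("text/plain");
    }

    public Extractor get() {
        return input -> input.trim();
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        // In an OSGi deployment this call would sit in the bundle activator.
        registry.add(new PlainTextExtractorFactory());
        for (ExtractorFactory factory : registry.get("text/plain")) {
            System.out.println(factory.get().extract("  hello  ")); // prints "hello"
        }
    }
}
```

Because nothing here refers to `org.osgi.*`, the same classes could be wired together by ordinary startup code in a non-OSGi application, which is the second benefit listed above.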
For Aperture, this means that the project is divided into several bundles:
- one bundle with the core engine including the registries and all interfaces
- one bundle for each ApertureDataSource implementation together with its related crawlers
- one bundle for each ApertureExtractor (or one for several extractors grouped for convenience, such as an "extractor pack")
Chris> I think the DataSource API, a DataSource implementation, the DataCrawler API and the DataCrawler implementation should all be separate bundles. This is the only way to ensure that you can use different crawlers for the same DataSource.
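Chris's point can be illustrated with a small sketch: when the crawler API is kept apart from any particular crawler implementation, a registry can hold several crawler factories for the same data source type and the application can choose among them. The types below are simplified, hypothetical stand-ins (the real Aperture interfaces have different signatures), and `FileSystemSource`, `RecursiveCrawlerFactory` and `IncrementalCrawlerFactory` are invented for illustration. The comments mark which bundle each type would belong to under this proposal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Would live in the DataSource API bundle (simplified signature).
interface DataSource {
    String getType();
}

// Would live in the DataCrawler API bundle.
interface Crawler {
    String describe();
}

interface CrawlerFactory {
    String getSupportedType();
    Crawler getCrawler(DataSource source);
}

class CrawlerRegistry {
    private final Map<String, List<CrawlerFactory>> byType =
            new HashMap<String, List<CrawlerFactory>>();

    void add(CrawlerFactory factory) {
        List<CrawlerFactory> factories = byType.get(factory.getSupportedType());
        if (factories == null) {
            factories = new ArrayList<CrawlerFactory>();
            byType.put(factory.getSupportedType(), factories);
        }
        factories.add(factory);
    }

    // Returns every factory registered for the given data source type.
    List<CrawlerFactory> get(String type) {
        List<CrawlerFactory> found = byType.get(type);
        return found == null
                ? new ArrayList<CrawlerFactory>()
                : new ArrayList<CrawlerFactory>(found);
    }
}

// Would live in a DataSource implementation bundle.
class FileSystemSource implements DataSource {
    public String getType() {
        return "filesystem";
    }
}

// Two independent crawler implementation bundles for the same source type.
class RecursiveCrawlerFactory implements CrawlerFactory {
    public String getSupportedType() {
        return "filesystem";
    }

    public Crawler getCrawler(DataSource source) {
        return () -> "recursive crawler for " + source.getType();
    }
}

class IncrementalCrawlerFactory implements CrawlerFactory {
    public String getSupportedType() {
        return "filesystem";
    }

    public Crawler getCrawler(DataSource source) {
        return () -> "incremental crawler for " + source.getType();
    }
}
```

With both factories added to a `CrawlerRegistry`, `get("filesystem")` returns two entries, and neither crawler bundle needs to know about the other.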
Currently, Aduna and DFKI think that we should base our code only on pure OSGi code (i.e. org.osgi.*) and not use any other utilities such as the DependencyManager that's currently used in the Aduna code. Perhaps Herko can tell us more about what we're in for, because we both have hardly any experience with OSGi yet.
Comments by Herko:
Ideally, libraries like this should contain no OSGi-specifics whatsoever. I think of OSGi as just one of many possible runtime environments. However, it is beneficial to design the software with OSGi in mind, as the programming model of OSGi (or SOA in general) enforces a clean separation of interface and implementation and loose coupling between modules (bundles).
Having said that, I see no reason NOT to use the DependencyManager when using the library in an OSGi context. The DependencyManager provides support for a number of common scenarios when dealing with interdependent services, in addition to the syntactic dependency management of the OSGi framework itself.
Bundles
Core
Classes:
- accessor.AccessData, accessor.DataAccessor, accessor.DataAccessorFactory, accessor.DataAccessorRegistry, accessor.DataObject, accessor.FileDataObject, accessor.FolderDataObject, accessor.RDFContainerFactory, accessor.URLNotFoundException, accessor.impl.DataAccessorRegistryImpl
- crawler.Crawler, crawler.CrawlerFactory, crawler.CrawlerHandler, crawler.CrawlerRegistry, crawler.CrawlReport, crawler.ExitCode, crawler.impl.CrawlerRegistryImpl
- datasource.DataSource, datasource.DataSourceFactory, datasource.DataSourceRegistry, datasource.impl.DataSourceRegistryImpl
- extractor.Extractor, extractor.ExtractorException, extractor.ExtractorFactory, extractor.ExtractorRegistry, extractor.impl.ExtractorRegistryImpl
- hypertext.linkextractor.LinkExtractor, hypertext.linkextractor.LinkExtractorRegistry, hypertext.linkextractor.LinkExtractorFactory, hypertext.linkextractor.impl.LinkExtractorRegistryImpl
- mime.identifier.MimeTypeIdentifier, mime.identifier.MimeTypeIdentifierFactory, mime.identifier.MimeTypeIdentifierRegistry, mime.identifier.impl.MimeTypeIdentifierRegistryImpl
- opener.DataOpener, opener.DataOpenerFactory, opener.DataOpenerRegistry, opener.impl.DataOpenerRegistryImpl
- rdf.MultipleValuesException, rdf.RDFContainer, rdf.RDFContainerFactory, rdf.UpdateException, rdf.ValueFactory
Exported packages:
- accessor
- crawler
- datasource
- extractor
- hypertext.linkextractor
- mime.identifier
- opener
- rdf
Services:
- DataAccessorRegistry
- CrawlerRegistry
- DataSourceRegistry
- ExtractorRegistry
- LinkExtractorRegistry
- MimeTypeIdentifierRegistry
- DataOpenerRegistry
Aperture-impls
Classes:
- accessor.file.FileAccessor, accessor.file.FileAccessorFactory
- accessor.http.ContentType, accessor.http.HttpAccessor, accessor.http.HttpAccessorFactory
- addressbook.AddressbookCrawler, addressbook.AddressbookCrawlerFactory, addressbook.AddressbookDataSource, addressbook.AddressbookDataSourceFactory, addressbook.AppleAddressbookCrawler, addressbook.ThunderbirdCrawler
- crawler.filesystem.FileSystemCrawler, crawler.filesystem.FileSystemCrawlerFactory, datasource.filesystem.FileSystemDataSource, datasource.filesystem.FileSystemDataSourceFactory
- crawler.ical.IcalCrawler, crawler.ical.IcalCrawlerFactory, crawler.ical.IcalDataType, datasource.ical.IcalDataSource, datasource.ical.IcalDataSourceFactory
- crawler.imap.DataObjectFactory, crawler.imap.ImapCrawler, crawler.imap.ImapCrawlerFactory, datasource.imap.ImapDataSource, datasource.imap.ImapDataSourceFactory
- crawler.web.CrawlJob, crawler.web.WebCrawler, crawler.web.WebCrawlerFactory, datasource.web.WebDataSource, datasource.web.WebDataSourceFactory
- extractor.excel.*, extractor.html.*, extractor.mime.*, extractor.office.*, extractor.opendocument.*, extractor.openxml.*, extractor.pdf.*, extractor.plaintext.*, extractor.powerpoint.*, extractor.presentations.*, extractor.publisher.*, extractor.quattro.*, extractor.rdf.*, extractor.rtf.*, extractor.visio.*, extractor.word.*, extractor.wordperfect.*, extractor.works.*, extractor.xml.*
- extractor.util.* (if the extractors were separated into multiple bundles, these utils would have to be included in every bundle that uses them)
- hypertext.linkextractor.http.*
- mime.identifier.magic.*
- opener.file.*, opener.http.*
- outlook.*
Aperture-helpers.jar
Classes:
- accessor.base.AccessDataImpl, accessor.base.DataObjectBase, accessor.base.FileAccessData, accessor.base.FileDataObjectBase, accessor.base.FilterAccessData, accessor.base.FolderDataObjectBase, accessor.base.ModelAccessData, accessor.base.RepositoryAccessData
- crawler.base.CrawlerBase, crawler.base.CrawlerHandlerBase, crawler.base.CrawlReportBase
- datasource.base.DataSourceBase
- datasource.config.ConfigurationUtil, datasource.config.DomainBoundaries, datasource.config.RegExpPattern, datasource.config.SubStringCondition, datasource.config.SubStringPattern, datasource.config.UrlPattern
- security.*
- util.*
- vocabulary.*
- rdf.rdf2go.RDF2GoRDFContainer, rdf.rdf2go.RDF2GoRDFContainerFactory, rdf.rdf2go.RDF2GoValueFactory
rdf2go-bundle (independent of aperture and sesame2; reused by other rdf2go-dependent projects)
ClassPath: rdf2go.jar
Exposes:
- org.ontoware.rdf2go.model
- org.ontoware.rdf2go.exception
rdf2go-sesamedriver (independent from aperture)
org.ontoware.rdf2go.impl.sesame2.*
sesame2 (independent from aperture)
org.openrdf.*
We assume that the user of the application will create the models themselves. Models can be initialized with many different attributes, so providing a generic ModelFactory might not be feasible (or even possible).
Classes not needed anymore
- accessor.impl.DefaultDataAccessorRegistry
- crawler.impl.DefaultCrawlerRegistry
- extractor.impl.DefaultExtractorRegistry
- hypertext.linkextractor.impl.DefaultLinkExtractorRegistry
- mime.identifier.impl.DefaultMimeTypeIdentifierRegistry
- opener.impl.DefaultDataOpenerRegistry
- rdf.sesame.SesameRDFContainer
- rdf.sesame.SesameRDFContainerFactory