== Aperture: Semantic Data Access by Aduna & DFKI ==

Goals:
 * To extract individual data objects (e.g. documents, emails, ...) from various data sources
 * To extract all possible information from the binary content of these data objects (e.g. full text, titles, authors, ...)
 * To deliver a storage back-end in which this information can be stored and made queryable
 * To deliver an architecture that can easily be extended by others, e.g. with new document formats, data source types, ...

The software distribution package will contain all relevant information about semantic data extraction and everything needed to get started with a full-text and metadata extraction framework. Our intent is that developers can download a single distribution file with a fully working environment that also includes all available adapter and extractor implementations.

== General ==

[wiki:ApertureOverview Project Overview]
[wiki:ApertureLicense License]
[wiki:ApertureCredits Credits]

== Architecture ==

 * [wiki:ApertureArchitecture DataSource Architecture]
 * [wiki:ApertureExtractor Extractors]
 * [wiki:ApertureArchives Archives]
 * [wiki:ApertureEmailInterpretation Email Interpretation]
 * [wiki:ApertureOpeningDocuments Opening Documents]
 * [wiki:ApertureRDF The Use of RDF]
 * [wiki:ApertureOSGi The Use of OSGi]

== API Development ==

 * [wiki:ApertureDataSource ApertureDataSource]
 * [wiki:ApertureDataObject ApertureDataObject]
 * [wiki:ApertureDataCrawler ApertureDataCrawler]
 * [wiki:ApertureDataCrawlerListener ApertureDataCrawlerListener]
 * [wiki:ApertureDataAccessor ApertureDataAccessor]
 * [wiki:ApertureAccessData ApertureAccessData]
 * [wiki:ApertureDataOpener ApertureDataOpener] - suggested!
 * [wiki:ApertureScanReport ApertureScanReport]
 * [wiki:ArchiveExtractor ApertureArchiveExtractor] - suggested!
 * [wiki:ApertureExtractor ApertureExtractor] - suggested!
 * [wiki:ApertureSimpleDataCrawler] - The other extreme: a simple data crawler that leaves the detection of changes to the outside.
= todo Daniel =

 * source:trunk/gnowsis/src/org/gnowsis/data/datasource/DataSource.java - equal to ApertureDataSource
 * org.gnowsis.data.adapter.CommandExecutor - equal to ApertureDataOpener
 * org.gnowsis.data.adapter.CBDAdapter - equal to DataAccessor
 * org.gnowsis.data.structured.StructuredAdapter - equal to DataCrawler
 * org.gnowsis.data.extractor.ExtractorPlaintext - equal to ApertureExtractor (new)
 * source:/trunk/gnoDesktopSearch/src/java/org/gnowsis/desktopsearch/extractor/DocumentExtractor.java - merge with ApertureExtractor
 * source:/trunk/gnoDesktopSearch/src/java/org/gnowsis/desktopsearch/crawler - merge this with ApertureDataCrawler and ApertureDataCrawlerListener (find in codebase2)