== Aperture: Semantic Data Access by Aduna & DFKI ==

Goal: to extract data and fulltext from various data sources and store them in systems like gnowsis or Aduna Metadata Server.

[wiki:ApertureLicense License] [wiki:ApertureCredits Credits]

== SourceForge Project ==

Administrators: Christiaan Fluit & Leo Sauermann

Source Code: Interfaces and standard implementations of the SeDAF

The source will contain all relevant information about semantic data extraction: everything that is needed to get started with a fulltext and metadata extraction framework. Our intent is that developers can download a single distribution file with a fully working environment that also includes adapter and extractor implementations. Developers can use this package to fill their Lucene-based applications or other data stores.

The features of the framework will be:

* Easy to use: easy to learn, easy to code, easy to deploy in industrial projects
* Extracts fulltext from many common file formats and from information systems like IMAP email servers
* Extracts metadata like author, date, subject and more from the data sources
* Opens the data objects for viewing
* Fully configurable framework: storing and editing config files is done through a Swing GUI
* Pluggable architecture: can be easily extended and easily integrated into other projects
* Architecture based on the industry standard OSGi
* Compatible with RDF, but not solely based on it

Components in the framework are:

* DataSource interface
* TextExtractor interface
* DataSource implementation for the filesystem
* DataSource implementation for IMAP mail servers
* TextExtractor implementations for every format we know: PDF, Word, plain text, Excel
* OSGi bindings and connector code
* Configuration GUI
* Sample application with a GUI showing how to use the framework (either Autofocus, Sesame, or Gnowsis)
* Metadata format description (RDFS schema) and an example file for the metadata

Right from the beginning we will support the following file types:

* Plain text
* HTML
* XML
* PDF (Portable Document Format)
* RTF (Rich Text Format)
* Microsoft Word 97+
* Microsoft Excel 97+
* Microsoft PowerPoint 97+
* Microsoft Works
* OpenOffice 1.0+: Writer, Calc, Impress, Draw
* StarOffice 6.0+: Writer, Calc, Impress, Draw
* WordPerfect 5.x
* Emails
* IMAP servers
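
To make the split between the DataSource and TextExtractor components more concrete, here is a minimal Java sketch of how the two could fit together. It is not the Aperture API: the method names (`listObjects`, `extractText`), the `DataObject` holder, the `PlainTextExtractor` class, and the lookup of extractors by MIME type are all assumptions made for illustration, and the data source is an in-memory stand-in for a filesystem or IMAP crawler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractionSketch {

    /** Hypothetical data object: an identifier, a MIME type, and the raw bytes. */
    static final class DataObject {
        final String uri;
        final String mimeType;
        final byte[] content;

        DataObject(String uri, String mimeType, byte[] content) {
            this.uri = uri;
            this.mimeType = mimeType;
            this.content = content;
        }
    }

    /** Hypothetical DataSource: lists the objects it can reach (files, mails, ...). */
    interface DataSource {
        List<DataObject> listObjects();
    }

    /** Hypothetical TextExtractor: turns a byte stream of one format into plain text. */
    interface TextExtractor {
        String extractText(InputStream in) throws IOException;
    }

    /** Extractor for plain text; assumes UTF-8 encoded input. */
    static final class PlainTextExtractor implements TextExtractor {
        @Override
        public String extractText(InputStream in) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /**
     * Walks a data source and extracts text from every object whose MIME type
     * has a registered extractor. Objects of unknown types are skipped.
     */
    static Map<String, String> extractAll(DataSource source,
                                          Map<String, TextExtractor> extractorsByMimeType) {
        Map<String, String> textByUri = new LinkedHashMap<>();
        for (DataObject object : source.listObjects()) {
            TextExtractor extractor = extractorsByMimeType.get(object.mimeType);
            if (extractor == null) {
                continue; // no extractor registered for this type
            }
            try (InputStream in = new ByteArrayInputStream(object.content)) {
                textByUri.put(object.uri, extractor.extractText(in));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return textByUri;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a filesystem or IMAP data source.
        DataSource source = () -> Arrays.asList(
                new DataObject("file:notes.txt", "text/plain",
                        "hello aperture".getBytes(StandardCharsets.UTF_8)),
                new DataObject("file:image.bin", "application/octet-stream",
                        new byte[] {1, 2, 3}));

        Map<String, TextExtractor> extractors = new HashMap<>();
        extractors.put("text/plain", new PlainTextExtractor());

        extractAll(source, extractors)
                .forEach((uri, text) -> System.out.println(uri + " -> " + text));
    }
}
```

Keeping the source and the extractor behind separate interfaces is what would let a new file type or a new information system be plugged in without touching the other side. The resulting map of identifiers to text is the kind of output an application could then hand to a Lucene index or another data store.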