| | 1 | = Aperture Overview = |
| | 2 | |
| | 3 | Administrators: Christiaan Fluit & Leo Sauermann |
| | 4 | Source Code: Interfaces and standard implementations of the SeDAF |
| | 5 | |
| | 6 | The source will contain everything relevant to semantic data extraction: all that is needed to get started with a full-text and metadata extraction framework. Our intent is that developers can download a single distribution file containing a fully working environment, including adapter and extractor implementations. Developers can use this package to populate their Lucene-based applications or other data stores. |
| | 7 | |
| | 8 | The features of the framework will be: |
| | 9 | |
| | 10 | * Easy to use: easy to learn, easy to code, easy to deploy in industrial projects |
| | 11 | * Extract fulltext from many common file formats and information systems like IMAP email servers |
| | 12 | * Extract metadata like author, date, subject and more from the data sources |
| | 13 | * Open the data objects for viewing |
| | 14 | * Fully configurable framework: config files are stored and edited through a Swing GUI |
| | 15 | * Pluggable architecture: can be easily extended and easily integrated into other projects |
| | 16 | * Architecture based on the industry standard OSGi |
| | 17 | * Compatible with RDF, but not solely based on it |
| | 18 | |
| | 19 | Components in the framework are: |
| | 20 | |
| | 21 | * DataSource Interface |
| | 22 | * TextExtractor Interface |
| | 23 | * DataSource implementation for Filesystem |
| | 24 | * DataSource implementation for IMAP mail servers |
| | 25 | * TextExtractor implementations for every format we know: PDF, Word, plain text, Excel |
| | 26 | * OSGi bindings and connector code |
| | 27 | * Configuration GUI |
| | 28 | * Sample application with a GUI showing how to use it (either Autofocus, Sesame, or Gnowsis) |
| | 29 | * Metadata format description (RDFS schema) and example file for the metadata |
| | 30 | |
| | 31 | Right from the beginning we will support the following file types: |
| | 32 | |
| | 33 | * Plain text |
| | 34 | * HTML |
| | 35 | * XML |
| | 36 | * PDF (Portable Document Format) |
| | 37 | * RTF (Rich Text Format) |
| | 38 | * Microsoft Word 97+ |
| | 39 | * Microsoft Excel 97+ |
| | 40 | * Microsoft PowerPoint 97+ |
| | 41 | * Microsoft Works |
| | 42 | * OpenOffice 1.0+: Writer, Calc, Impress, Draw |
| | 43 | * StarOffice 6.0+: Writer, Calc, Impress, Draw |
| | 44 | * WordPerfect 5.x |
| | 45 | * Emails |
| | 46 | * IMAP Servers |
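
As a rough illustration of how the TextExtractor component and the list of supported file types could fit together, the sketch below registers one extractor per MIME type and looks it up at extraction time. Every name in it (`TextExtractor.extract`, `PlainTextExtractor`, `ExtractorRegistry`, the use of MIME types as keys, UTF-8 decoding) is an assumption made for this example and is not taken from the actual Aperture interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface: turns the raw bytes of one data object into full text.
interface TextExtractor {
    String extract(InputStream in) throws IOException;
}

// Hypothetical extractor for plain text; assumes the input is UTF-8 encoded.
class PlainTextExtractor implements TextExtractor {
    @Override
    public String extract(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}

// Hypothetical registry: one extractor per MIME type, so that support for a
// new file format can be plugged in without touching the calling code.
class ExtractorRegistry {
    private final Map<String, TextExtractor> byMimeType = new HashMap<>();

    void register(String mimeType, TextExtractor extractor) {
        byMimeType.put(mimeType.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns an empty Optional when no extractor is registered for the type.
    Optional<TextExtractor> find(String mimeType) {
        return Optional.ofNullable(byMimeType.get(mimeType.toLowerCase(Locale.ROOT)));
    }
}

class ExtractorDemo {
    public static void main(String[] args) throws IOException {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("text/plain", new PlainTextExtractor());

        byte[] data = "Hello, Aperture".getBytes(StandardCharsets.UTF_8);
        Optional<TextExtractor> extractor = registry.find("text/plain");
        if (extractor.isPresent()) {
            // Extracted text could then be handed to Lucene or another data store.
            System.out.println(extractor.get().extract(new ByteArrayInputStream(data)));
        } else {
            System.out.println("No extractor registered for text/plain");
        }
    }
}
```

Under these assumptions, supporting another format from the list (PDF, RTF, Word, and so on) would mean writing one more `TextExtractor` implementation and registering it under its MIME type, which is one way to read the pluggable architecture goal stated in the feature list.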