Changes between Initial Version and Version 1 of ApertureOverview


Timestamp:
10/12/05 10:26:39 (19 years ago)
Author:
anonymous
Comment:

--

= Aperture Overview =

Administrators: Christiaan Fluit & Leo Sauermann
Source Code: Interfaces and standard implementations of the SeDAF

The source will contain everything needed to get started with a full-text and metadata extraction framework for semantic data. Our intent is that developers can download a single distribution file containing a fully working environment, including adapter and extractor implementations. Developers can use this package to populate their Lucene-based applications or other data stores.

The features of the framework will be:

    * Easy to use: easy to learn, easy to code, easy to deploy in industrial projects
    * Extracts full text from many common file formats and information systems such as IMAP email servers
    * Extracts metadata such as author, date, subject and more from the data sources
    * Opens the data objects for viewing
    * Fully configurable: configuration files are stored and edited through a Swing GUI
    * Pluggable architecture: easy to extend and easy to integrate into other projects
    * Architecture based on the industry-standard OSGi
    * Compatible with RDF, but not solely based on it

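As a rough illustration of "compatible with RDF, but not solely based on it", extracted metadata could be kept as plain key/value pairs and only written out as RDF on request. The sketch below is an assumption for discussion, not the framework's API: the class `DocumentMetadata`, its methods, and the `http://example.org/meta#` namespace are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical metadata record: plain properties that can also be serialised as RDF. */
class DocumentMetadata {
    private final String uri;
    private final Map<String, String> properties = new LinkedHashMap<>();

    DocumentMetadata(String uri) {
        this.uri = uri;
    }

    /** Stores one property such as "author" or "subject"; returns this for chaining. */
    DocumentMetadata put(String property, String value) {
        properties.put(property, value);
        return this;
    }

    /** Plain access for applications that do not want to deal with RDF at all. */
    String get(String property) {
        return properties.get(property);
    }

    /** Writes each property as one N-Triples line under the given (placeholder) namespace. */
    String toNTriples(String namespace) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            out.append('<').append(uri).append("> <")
               .append(namespace).append(e.getKey()).append("> \"")
               .append(escape(e.getValue())).append("\" .\n");
        }
        return out.toString();
    }

    /** Escapes backslashes, quotes and line breaks inside a literal. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }
}
```

An application that fills a Lucene index could call `get("author")` directly, while an RDF store could consume the output of `toNTriples`.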
Components in the framework are:

    * DataSource interface
    * TextExtractor interface
    * DataSource implementation for the filesystem
    * DataSource implementation for IMAP mail servers
    * TextExtractor implementations for every format we know: PDF, Word, full text, Excel
    * OSGi bindings and connector code
    * Configuration GUI
    * Sample application with a GUI showing how to use the framework (either Autofocus, Sesame or Gnowsis)
    * Metadata format description (RDFS schema) and an example metadata file

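To make the split between the two interfaces concrete, here is a minimal sketch of how a DataSource and a TextExtractor might fit together. Only the names `DataSource` and `TextExtractor` come from the list above; every method signature, the classes `FileSystemDataSource` and `PlainTextExtractor`, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical: a source that can enumerate and open the data objects it holds. */
interface DataSource {
    List<String> listObjectUris() throws IOException;

    InputStream open(String uri) throws IOException;
}

/** Hypothetical: turns the raw bytes of one data object into plain text. */
interface TextExtractor {
    String extractText(InputStream in) throws IOException;
}

/** Sketch of a filesystem-backed source: every regular file under a root folder. */
class FileSystemDataSource implements DataSource {
    private final Path root;

    FileSystemDataSource(Path root) {
        this.root = root;
    }

    public List<String> listObjectUris() throws IOException {
        List<String> uris = new ArrayList<>();
        // Files.walk must be closed, hence the try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .sorted()
                 .forEach(p -> uris.add(p.toUri().toString()));
        }
        return uris;
    }

    public InputStream open(String uri) throws IOException {
        return Files.newInputStream(Paths.get(URI.create(uri)));
    }
}

/** Sketch of the simplest extractor: reads the stream and decodes it as UTF-8 (an assumption). */
class PlainTextExtractor implements TextExtractor {
    public String extractText(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Keeping the source and the extractor behind separate interfaces means an IMAP source could reuse the same extractors, since an extractor only ever sees an `InputStream`.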
Right from the beginning we will support the following file types and sources:

    * Plain text
    * HTML
    * XML
    * PDF (Portable Document Format)
    * RTF (Rich Text Format)
    * Microsoft Word 97+
    * Microsoft Excel 97+
    * Microsoft PowerPoint 97+
    * Microsoft Works
    * OpenOffice 1.0+: Writer, Calc, Impress, Draw
    * StarOffice 6.0+: Writer, Calc, Impress, Draw
    * WordPerfect 5.x
    * Emails
    * IMAP Servers
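
One possible way to support a growing list of formats like the one above is a registry that picks an extractor by MIME type, so a new format only needs one more registration. This is a sketch under assumptions, not the planned design: the class `ExtractorRegistry` and its methods are hypothetical, and extractors are modelled as plain functions from bytes to text to keep the example short.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical registry that maps a MIME type to the extractor responsible for it. */
class ExtractorRegistry {
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    /** Registers (or replaces) the extractor for one MIME type, e.g. "application/pdf". */
    void register(String mimeType, Function<byte[], String> extractor) {
        extractors.put(normalize(mimeType), extractor);
    }

    /** Returns the extracted text, or an empty Optional if no extractor is registered. */
    Optional<String> extract(String mimeType, byte[] content) {
        Function<byte[], String> extractor = extractors.get(normalize(mimeType));
        return extractor == null ? Optional.empty() : Optional.of(extractor.apply(content));
    }

    /** Drops parameters such as "; charset=utf-8" and lower-cases the type. */
    private static String normalize(String mimeType) {
        int semi = mimeType.indexOf(';');
        String base = semi >= 0 ? mimeType.substring(0, semi) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

A caller would register one function per format at startup, for example `registry.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8))`, and treat an empty result as "format not supported yet".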