Changes between Version 9 and Version 10 of SemanticDataIntegrationFramework


Timestamp:
10/12/05 10:26:47 (18 years ago)
Author:
anonymous

  • SemanticDataIntegrationFramework

    v9 v10  
    8 8 
    9 9 [wiki:ApertureCredits Credits] 
    10  
    11  
    12 == Sourceforge Project == 
    13  
    14 Administrators: Christiaan Fluit & Leo Sauermann 
    15 Source Code: Interfaces and standard implementations of the SeDAF 
    16  
    17 The source will contain everything needed to get started with a fulltext and metadata extraction framework, including all relevant information about semantic data extraction. Our intent is that developers can download a single distribution file containing a fully working environment that also includes adapter and extractor implementations. Developers can use this package to fill their Lucene-based applications or other data stores. 
    18  
    19 The features of the framework will be: 
    20  
    21     * Easy to use: easy to learn, easy to code, easy to deploy in industrial projects 
    22     * Extract fulltext from many common file formats and information systems like IMAP email servers 
    23     * Extract metadata like author, date, subject and more from the data sources 
    24     * Open the data objects for viewing 
    25     * Fully configurable framework: config files are stored and edited through a Swing GUI 
    26     * Pluggable architecture: can be easily extended and easily integrated into other projects 
    27     * Architecture based on the industry standard OSGi 
    28     * Compatible with RDF, but not solely based on it 
    29  
    30 Components in the framework are: 
    31  
    32     * DataSource Interface 
    33     * TextExtractor Interface 
    34     * DataSource implementation for Filesystem 
    35     * DataSource implementation for IMAP mail servers 
    36     * TextExtractor implementation for every format we know: PDF, Word, Fulltext, Excel 
    37     * OSGi bindings and connector code 
    38     * Configuration GUI 
    39     * Sample application with a GUI showing how to use it (either Autofocus, Sesame, or Gnowsis) 
    40     * Metadata format description (RDFS schema) and example file for the metadata 
    41  
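The relationship between the DataSource and TextExtractor components listed above can be sketched roughly as follows. This is only an illustrative sketch, not the framework's actual API: the type names `DataItem`, `InMemoryDataSource`, `PlainTextExtractor`, and `ExtractionPipeline`, as well as every method signature, are assumptions made for this example. It assumes a data source hands out items with a MIME type, and an extractor declares which MIME types it can handle.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type: an identifier, a MIME type, and the raw bytes.
final class DataItem {
    final String id;
    final String mimeType;
    final byte[] content;

    DataItem(String id, String mimeType, byte[] content) {
        this.id = id;
        this.mimeType = mimeType;
        this.content = content;
    }
}

// Hypothetical shape of a data source: it enumerates the items it can reach.
interface DataSource {
    List<DataItem> items();
}

// Hypothetical shape of a text extractor: it turns raw bytes into fulltext.
interface TextExtractor {
    boolean supports(String mimeType);

    String extractText(byte[] content);
}

// Stand-in for a filesystem or IMAP source; it serves items held in memory.
final class InMemoryDataSource implements DataSource {
    private final List<DataItem> items;

    InMemoryDataSource(List<DataItem> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public List<DataItem> items() {
        return items;
    }
}

// Simplest possible extractor: plain text is decoded as UTF-8 (an assumption).
final class PlainTextExtractor implements TextExtractor {
    @Override
    public boolean supports(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    @Override
    public String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

// Glue code: the first extractor that supports an item's MIME type handles it.
final class ExtractionPipeline {
    static Map<String, String> extractAll(DataSource source, List<TextExtractor> extractors) {
        Map<String, String> textById = new LinkedHashMap<>();
        for (DataItem item : source.items()) {
            for (TextExtractor extractor : extractors) {
                if (extractor.supports(item.mimeType)) {
                    textById.put(item.id, extractor.extractText(item.content));
                    break;
                }
            }
            // Items without a matching extractor are skipped in this sketch.
        }
        return textById;
    }
}
```

Keeping sources and extractors behind separate interfaces is what would let new file formats or information systems be plugged in independently; the resulting map of identifier to fulltext could then be fed into a Lucene index or another data store.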
    42 Right from the beginning we will support the following file types: 
    43  
    44     * Plain text 
    45     * HTML 
    46     * XML 
    47     * PDF (Portable Document Format) 
    48     * RTF (Rich Text Format) 
    49     * Microsoft Word 97+ 
    50     * Microsoft Excel 97+ 
    51     * Microsoft PowerPoint 97+ 
    52     * Microsoft Works 
    53     * OpenOffice 1.0+: Writer, Calc, Impress, Draw 
    54     * StarOffice 6.0+: Writer, Calc, Impress, Draw 
    55     * WordPerfect 5.x 
    56     * Emails 
    57     * IMAP Servers 
    58