Changes between Version 1 and Version 2 of SemanticDataIntegrationFramework


Timestamp: 10/07/05 11:09:39
Author: Leo Sauermann <leo.sauermann@…>

== Semantic Data Access by Aduna & DFKI ==

The goal of the framework is to extract data and fulltext from various data sources and to store them in systems like gnowsis or the Aduna Metadata Server.

== SourceForge Project ==

   * Administrators: Christiaan Fluit & Leo Sauermann
   * Source code: interfaces and standard implementations of the SeDAF

The source will contain all relevant information about semantic data extraction: everything needed to get started with a fulltext and metadata extraction framework. Our intent is that developers can download a single distribution file with a fully working environment that also includes adapter and extractor implementations. Developers can use this package to fill their Lucene-based applications or other data stores.

The features of the framework will be:

   * Easy to use: easy to learn, easy to program with and easy to deploy in industrial projects
   * Extract fulltext from many common file formats and from information systems such as IMAP email servers
   * Extract metadata such as author, date, subject and more from the data sources
   * Open the data objects for viewing
   * Fully configurable framework: configuration files are stored and edited through a Swing GUI
   * Pluggable architecture: easy to extend and easy to integrate into other projects
   * Architecture based on the industry-standard OSGi
   * Compatible with RDF, but not solely based on it
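
The feature list promises metadata such as author, date and subject, plus RDF compatibility without a hard dependency on RDF. A minimal sketch of one way to reconcile the two, assuming metadata is held in a plain Java map and serialized to RDF (as N-Triples) only on request. The class `MetadataToNTriples` and the mapping of keys to Dublin Core property URIs are hypothetical illustrations; this page does not describe the framework's own RDFS schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: metadata stays a plain Java map (no RDF library needed)
// and is turned into RDF only when a caller asks for N-Triples.
class MetadataToNTriples {

    // Assumed mapping of metadata keys to Dublin Core property URIs.
    static final Map<String, String> PROPERTY_URIS = Map.of(
            "author", "http://purl.org/dc/elements/1.1/creator",
            "date", "http://purl.org/dc/elements/1.1/date",
            "subject", "http://purl.org/dc/elements/1.1/subject",
            "title", "http://purl.org/dc/elements/1.1/title");

    // Escapes backslashes, quotes and line breaks inside an N-Triples literal.
    static String escapeLiteral(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '"' -> sb.append("\\\"");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // Emits one triple per mapped key; unmapped keys remain available only in the map.
    static String toNTriples(String documentUri, Map<String, String> metadata) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String property = PROPERTY_URIS.get(entry.getKey());
            if (property == null) {
                continue;
            }
            out.append('<').append(documentUri).append("> <")
               .append(property).append("> \"")
               .append(escapeLiteral(entry.getValue())).append("\" .\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("author", "Jane Doe");
        metadata.put("subject", "Semantic data extraction");
        metadata.put("pageCount", "12"); // no RDF mapping: skipped by toNTriples
        System.out.print(toNTriples("file:///home/user/report.pdf", metadata));
    }
}
```

Running `main` prints one triple each for `author` and `subject`; `pageCount` has no mapping and is left out. Applications that do not care about RDF can keep working with the map directly.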

The components of the framework are:

   * DataSource interface
   * TextExtractor interface
   * DataSource implementation for the file system
   * DataSource implementation for IMAP mail servers
   * TextExtractor implementations for all formats we know: PDF, Word, plain text, Excel
   * OSGi bindings and connector code
   * Configuration GUI
   * Sample application with a GUI showing how to use the framework (either Autofocus, Sesame or Gnowsis)
   * Metadata format description (RDFS schema) and an example metadata file
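
The two interfaces at the top of the component list could fit together roughly as follows. This is a minimal sketch, assuming a push-style crawl in which a data source hands each data object (an identifier plus its raw bytes) to a callback. The `DataObject` record, the method names `crawl` and `extractText` and the helper `extractAll` are hypothetical, not the framework's published API. The file system implementation uses only `java.nio.file`; an IMAP implementation could implement the same interface on top of JavaMail.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical sketch of the two core interfaces; all names and signatures
// are illustrative assumptions, not the framework's actual API.
class CoreInterfacesSketch {

    /** One item delivered by a data source: an identifier plus its raw bytes. */
    record DataObject(String uri, byte[] content) {}

    /** A source of data objects, e.g. a directory tree or an IMAP account. */
    interface DataSource {
        void crawl(Consumer<DataObject> handler) throws IOException;
    }

    /** Converts the raw bytes of one data object into plain fulltext. */
    interface TextExtractor {
        String extractText(byte[] content) throws IOException;
    }

    /** DataSource that walks a directory tree and reports every regular file. */
    static final class FileSystemDataSource implements DataSource {
        private final Path root;

        FileSystemDataSource(Path root) {
            this.root = root;
        }

        @Override
        public void crawl(Consumer<DataObject> handler) throws IOException {
            List<Path> files;
            // Files.walk holds directory handles open, so close the stream promptly.
            try (Stream<Path> paths = Files.walk(root)) {
                files = paths.filter(Files::isRegularFile).sorted().toList();
            }
            for (Path file : files) {
                handler.accept(new DataObject(file.toUri().toString(), Files.readAllBytes(file)));
            }
        }
    }

    /** TextExtractor for plain-text files, assuming UTF-8 encoding. */
    static final class PlainTextExtractor implements TextExtractor {
        @Override
        public String extractText(byte[] content) {
            return new String(content, StandardCharsets.UTF_8);
        }
    }

    /** Crawls a source and extracts every object's fulltext (all in memory; demo only). */
    static List<String> extractAll(DataSource source, TextExtractor extractor) throws IOException {
        List<DataObject> objects = new ArrayList<>();
        source.crawl(objects::add);
        List<String> texts = new ArrayList<>();
        for (DataObject object : objects) {
            texts.add(extractor.extractText(object.content()));
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sedaf-demo");
        Files.writeString(dir.resolve("notes.txt"), "hello semantic desktop");
        List<String> texts = extractAll(new FileSystemDataSource(dir), new PlainTextExtractor());
        // A real application would add these texts to a Lucene index or another store.
        System.out.println(texts);
    }
}
```

Because the data source only knows about the callback, the same crawl can feed a Lucene index, an RDF store or any other sink, which is the kind of decoupling a pluggable architecture needs.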

Right from the beginning, we will support the following file types:

   * Plain text
   * HTML
   * XML
   * PDF (Portable Document Format)
   * RTF (Rich Text Format)
   * Microsoft Word 97+
   * Microsoft Excel 97+
   * Microsoft PowerPoint 97+
   * Microsoft Works
   * OpenOffice 1.0+: Writer, Calc, Impress, Draw
   * StarOffice 6.0+: Writer, Calc, Impress, Draw
   * WordPerfect 5.x
   * Emails
   * IMAP servers
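
Supporting this many formats means choosing a suitable extractor for each data object. A minimal sketch, assuming the choice is made by MIME type and that the MIME type is guessed from the file extension. `ExtractorRegistry`, its methods and the extension table are hypothetical; only a plain-text extractor is registered so the example needs no third-party libraries (real PDF or Office extraction would build on libraries such as PDFBox or Jakarta POI from the credits list).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: pick a text extractor by MIME type, guessing the MIME
// type from the file extension. Names and tables are illustrative assumptions.
class ExtractorRegistry {
    private final Map<String, String> mimeByExtension = new HashMap<>();
    private final Map<String, Function<byte[], String>> extractorByMime = new HashMap<>();

    void registerExtension(String extension, String mimeType) {
        mimeByExtension.put(extension.toLowerCase(Locale.ROOT), mimeType);
    }

    void registerExtractor(String mimeType, Function<byte[], String> extractor) {
        extractorByMime.put(mimeType, extractor);
    }

    /** Returns the MIME type registered for the file's extension, if any. */
    Optional<String> mimeTypeOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(mimeByExtension.get(extension));
    }

    /** Extracts fulltext, or returns empty if the type is unknown or unsupported. */
    Optional<String> extract(String fileName, byte[] content) {
        return mimeTypeOf(fileName)
                .map(extractorByMime::get)
                .map(extractor -> extractor.apply(content));
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.registerExtension("txt", "text/plain");
        registry.registerExtension("pdf", "application/pdf");
        registry.registerExtractor("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));
        // No extractor is registered for application/pdf, so the PDF yields Optional.empty.
        System.out.println(registry.extract("README.TXT", "plain text".getBytes(StandardCharsets.UTF_8)));
        System.out.println(registry.extract("paper.pdf", new byte[0]));
    }
}
```

Extension lookup is the simplest option; because extensions can be missing or wrong, a more robust registry would likely also inspect the first bytes of the content before choosing an extractor.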

== Credits ==

The following third-party libraries have helped make the metadata framework the success that it is. These freely available libraries deserve a lot of credit for that, and we highly recommend them to others as well!

   * Gnowsis: http://www.gnowsis.org/
   * HtmlParser: http://htmlparser.sourceforge.net/
   * Idmeta: http://www.geocities.com/marcoschmidt.geo/
   * Jakarta Commons FileUpload: http://jakarta.apache.org/commons/fileupload/
   * Jakarta Lucene: http://jakarta.apache.org/lucene/
   * Jakarta POI: http://jakarta.apache.org/poi/
   * Java Look and Feel Graphics Repository: http://java.sun.com/developer/techDocs/hi/repository/
   * JavaBeans Activation Framework: http://java.sun.com/products/javabeans/glasgow/jaf.html
   * JavaMail API: http://java.sun.com/products/javamail/
   * JGoodies Looks: http://www.jgoodies.com/freeware/looks/
   * NGramJ: http://ngramj.sourceforge.net/
   * PDFBox: http://www.pdfbox.org/
   * Sesame: http://www.openrdf.org/
   * WinLAF: https://winlaf.dev.java.net/
   * Xpdf: http://www.foolabs.com/xpdf/

== License ==

The SeDAF is published under a BSD- or CPL-compatible license.