Changes between Initial Version and Version 1 of SemanticDataIntegrationFramework


Timestamp:
10/07/05 11:08:05 (19 years ago)
Author:
Leo Sauermann <leo.sauermann@…>

  • SemanticDataIntegrationFramework

    v1 v1  
     1<h1>Semantic Data Access by Aduna &amp; DFKI <br> 
     2</h1> 
     3This framework extracts data and full text from various data sources and 
     4stores them in systems like gnowsis or Aduna Metadata Server.<br> 
     5<h2>SourceForge Project</h2> 
     6Administrators: Christiaan Fluit &amp; Leo Sauermann<br> 
     7Source Code: Interfaces and standard implementations of the SeDAF<br> 
     8<br> 
     9The source will contain all relevant information about semantic data 
     10extraction: everything that is needed to get started with a full-text 
     11and metadata extraction framework. Our intent is that developers can 
     12download a single distribution file with a fully working environment 
     13that also includes adapter and extractor implementations. Developers 
     14can use this package to populate their Lucene-based applications or other 
     15data stores.<br> 
     16<br> 
     17The features of the framework will be:<br> 
     18<ul> 
     19  <li>Easy to use: easy to learn, easy to code, easy to deploy in 
     20industrial projects<br> 
     21  </li> 
     22  <li>Extract fulltext from many common file formats and information 
     23systems like IMAP email servers</li> 
     24  <li>Extract metadata like author, date, subject and more from the 
     25data sources</li> 
     26  <li>Open the data objects for viewing<br> 
     27  </li> 
     28  <li>Fully configurable framework: storing and editing config files is 
     29done through a Swing GUI.</li> 
     30  <li>Pluggable architecture: can be easily extended and easily 
     31integrated into other projects. <br> 
     32  </li> 
     33  <li>Architecture based on the industry standard OSGi</li> 
     34  <li>Compatible with RDF, but not solely based on it</li> 
     35</ul> 
     36Components in the framework are:<br> 
     37<ul> 
     38  <li>DataSource Interface</li> 
     39  <li>TextExtractor Interface</li> 
     40  <li>DataSource implementation for Filesystem</li> 
     41  <li>DataSource implementation for IMAP mail servers</li> 
     42  <li>TextExtractor implementations for every format we know: PDF, Word, 
     43Fulltext, Excel</li> 
     44  <li>OSGi bindings and connector code<br> 
     45  </li> 
     46  <li>Configuration GUI</li> 
     47  <li>Sample application showing how to use it, with a GUI (either 
     48Autofocus, Sesame, or Gnowsis)</li> 
     49  <li>Metadata format description (RDFS schema) and example file for 
     50the metadata<br> 
     51  </li> 
     52</ul> 
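The relationship between the TextExtractor interface and its per-format implementations can be sketched roughly as follows. This is only an illustrative sketch, not the SeDAF API: the names SedafSketch, TextExtractor.extract, PlainTextExtractor and ExtractorRegistry are assumptions made for this example, as is the idea of selecting an extractor by MIME type. It also uses a current Java runtime (InputStream.readAllBytes needs Java 9 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SedafSketch {

    /** Hypothetical extractor contract: turn a raw stream into full text. */
    public interface TextExtractor {
        String extract(InputStream in) throws IOException;
    }

    /** Hypothetical plain-text extractor: decodes the whole stream as UTF-8. */
    public static class PlainTextExtractor implements TextExtractor {
        @Override
        public String extract(InputStream in) throws IOException {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical registry that picks an extractor by MIME type. */
    public static class ExtractorRegistry {
        private final Map<String, TextExtractor> byMimeType = new HashMap<>();

        public void register(String mimeType, TextExtractor extractor) {
            byMimeType.put(mimeType, extractor);
        }

        public String extract(String mimeType, InputStream in) throws IOException {
            TextExtractor extractor = byMimeType.get(mimeType);
            if (extractor == null) {
                // Unknown formats are rejected rather than silently skipped.
                throw new IllegalArgumentException("No extractor for " + mimeType);
            }
            return extractor.extract(in);
        }
    }

    public static void main(String[] args) throws IOException {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("text/plain", new PlainTextExtractor());
        InputStream in = new ByteArrayInputStream(
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(registry.extract("text/plain", in));
    }
}
```

In a layout like this, a new file format would be supported by registering one more TextExtractor implementation, which is one possible reading of the pluggable architecture described above.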
     53Right from the beginning we will support the following file types:<br> 
     54<ul> 
     55  <li>Plain text</li> 
     56  <li>HTML</li> 
     57  <li>XML</li> 
     58  <li>PDF (Portable Document Format)</li> 
     59  <li>RTF (Rich Text Format)</li> 
     60  <li>Microsoft Word 97+</li> 
     61  <li>Microsoft Excel 97+</li> 
     62  <li>Microsoft Powerpoint 97+</li> 
     63  <li>Microsoft Works</li> 
     64  <li>OpenOffice 1.0+: Writer, Calc, Impress, Draw</li> 
     65  <li>StarOffice 6.0+: Writer, Calc, Impress, Draw</li> 
     66  <li>WordPerfect 5.x</li> 
     67  <li>Emails</li> 
     68  <li>IMAP Servers</li> 
     69</ul> 
     70<h2>Credits<br> 
     71</h2> 
     72The following third-party libraries have helped make the metadata 
     73framework<br> 
     74the success that it is. These freely available libraries deserve<br> 
     75a lot of credit for that, and we highly recommend them to others<br> 
     76as well!<br> 
     77<ul> 
     78  <li>Gnowsis: http://www.gnowsis.org/</li> 
     79  <li>HtmlParser: http://htmlparser.sourceforge.net/</li> 
     80  <li>Idmeta: http://www.geocities.com/marcoschmidt.geo/</li> 
     81  <li>Jakarta Commons FileUpload: 
     82http://jakarta.apache.org/commons/fileupload/</li> 
     83  <li>Jakarta Lucene: http://jakarta.apache.org/lucene/</li> 
     84  <li>Jakarta POI: http://jakarta.apache.org/poi/</li> 
     85  <li>Java Look and Feel Graphics Repository: 
     86http://java.sun.com/developer/techDocs/hi/repository/</li> 
     87  <li>JavaBeans Activation Framework: 
     88http://java.sun.com/products/javabeans/glasgow/jaf.html</li> 
     89  <li>JavaMail API: http://java.sun.com/products/javamail/</li> 
     90  <li>JGoodies Looks: http://www.jgoodies.com/freeware/looks/</li> 
     91  <li>NGramJ: http://ngramj.sourceforge.net/</li> 
     92  <li>PDFBox: http://www.pdfbox.org/</li> 
     93  <li>Sesame: http://www.openrdf.org/</li> 
     94  <li>WinLAF: https://winlaf.dev.java.net/</li> 
     95  <li>Xpdf: http://www.foolabs.com/xpdf/</li> 
     96</ul> 
     97<h2>License</h2> 
     98The SeDAF is published under a BSD- or CPL-compatible license.<br>