<h1>Semantic Data Access by Aduna & DFKI</h1>
A framework for extracting data and fulltext from various data sources and
storing them in systems like gnowsis or Aduna Metadata Server.<br>
<h2>SourceForge Project</h2>
Administrators: Christiaan Fluit & Leo Sauermann<br>
Source Code: Interfaces and standard implementations of the SeDAF<br>
<br>
The source will contain all relevant information about semantic data
extraction: everything that is needed to get started with a fulltext
and metadata extraction framework. Our intent is that developers can
download a single distribution file with a fully working environment
that also includes adapter and extractor implementations. Developers
can use this package to fill their Lucene-based applications or other
data stores.<br>
<br>
The features of the framework will be:<br>
<ul>
  <li>Easy to use: easy to learn, easy to code, easy to deploy in
industrial projects</li>
  <li>Extracts fulltext from many common file formats and information
systems such as IMAP email servers</li>
  <li>Extracts metadata such as author, date, subject and more from the
data sources</li>
  <li>Opens the data objects for viewing</li>
  <li>Fully configurable: storing and editing config files is
done through a Swing GUI</li>
  <li>Pluggable architecture: easily extended and easily
integrated into other projects</li>
  <li>Architecture based on the industry standard OSGi</li>
  <li>Compatible with RDF, but not solely based on it</li>
</ul>
Components in the framework are:<br>
<ul>
  <li>DataSource interface</li>
  <li>TextExtractor interface</li>
  <li>DataSource implementation for the filesystem</li>
  <li>DataSource implementation for IMAP mail servers</li>
  <li>TextExtractor implementations for every format we know: PDF, Word,
Fulltext, Excel</li>
  <li>OSGi bindings and connector code</li>
  <li>Configuration GUI</li>
  <li>Sample application with a GUI showing how to use it (either
Autofocus, Sesame or Gnowsis)</li>
  <li>Metadata format description (RDFS schema) and an example file for
the metadata</li>
</ul>
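As a rough illustration of how the DataSource and TextExtractor components listed above could fit together, here is a minimal Java sketch. All type names other than <code>DataSource</code> and <code>TextExtractor</code>, and every method signature shown, are assumptions made for this example and are not taken from the SeDAF source. The in-memory source merely stands in for the filesystem and IMAP implementations.<br>

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these signatures come from the SeDAF code base.

/** One item delivered by a data source: an identifier, a MIME type and the raw bytes. */
final class DataObject {
    final String uri;
    final String mimeType;
    final byte[] content;

    DataObject(String uri, String mimeType, byte[] content) {
        this.uri = uri;
        this.mimeType = mimeType;
        this.content = content;
    }
}

/** A source of data objects, for example a filesystem folder or an IMAP mailbox. */
interface DataSource {
    List<DataObject> listObjects();
}

/** Turns the raw bytes of a supported MIME type into plain fulltext. */
interface TextExtractor {
    boolean supports(String mimeType);

    String extract(DataObject object);
}

/** Extractor for plain text; a PDF or Word extractor would delegate to a parsing library. */
final class PlainTextExtractor implements TextExtractor {
    public boolean supports(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    public String extract(DataObject object) {
        return new String(object.content, StandardCharsets.UTF_8);
    }
}

/** In-memory stand-in for a filesystem or IMAP source, used only for this demo. */
final class InMemoryDataSource implements DataSource {
    private final List<DataObject> objects = new ArrayList<>();

    InMemoryDataSource add(String uri, String mimeType, String text) {
        objects.add(new DataObject(uri, mimeType, text.getBytes(StandardCharsets.UTF_8)));
        return this;
    }

    public List<DataObject> listObjects() {
        return new ArrayList<>(objects);
    }
}

/** Walks a source and maps each URI to its fulltext, using the first extractor that supports it. */
final class ExtractionPipeline {
    static Map<String, String> run(DataSource source, List<TextExtractor> extractors) {
        Map<String, String> fulltextByUri = new LinkedHashMap<>();
        for (DataObject object : source.listObjects()) {
            for (TextExtractor extractor : extractors) {
                if (extractor.supports(object.mimeType)) {
                    fulltextByUri.put(object.uri, extractor.extract(object));
                    break; // objects without a matching extractor are skipped
                }
            }
        }
        return fulltextByUri;
    }
}

final class SedafSketchDemo {
    public static void main(String[] args) {
        DataSource source = new InMemoryDataSource()
                .add("file:///notes.txt", "text/plain", "hello world")
                .add("file:///image.png", "image/png", "not text");
        List<TextExtractor> extractors = new ArrayList<>();
        extractors.add(new PlainTextExtractor());
        // prints {file:///notes.txt=hello world}
        System.out.println(ExtractionPipeline.run(source, extractors));
    }
}
```

Keeping the source and the extractor behind separate interfaces is one way to get the pluggable behaviour described in the feature list: a new file format only needs a new <code>TextExtractor</code>, and a new information system only needs a new <code>DataSource</code>. The resulting map could then be handed to a Lucene index or another data store.<br>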
Right from the beginning we will support the following file types and sources:<br>
<ul>
  <li>Plain text</li>
  <li>HTML</li>
  <li>XML</li>
  <li>PDF (Portable Document Format)</li>
  <li>RTF (Rich Text Format)</li>
  <li>Microsoft Word 97+</li>
  <li>Microsoft Excel 97+</li>
  <li>Microsoft PowerPoint 97+</li>
  <li>Microsoft Works</li>
  <li>OpenOffice 1.0+: Writer, Calc, Impress, Draw</li>
  <li>StarOffice 6.0+: Writer, Calc, Impress, Draw</li>
  <li>WordPerfect 5.x</li>
  <li>Emails</li>
  <li>IMAP Servers</li>
</ul>
<h2>Credits</h2>
The following third-party libraries have helped make the metadata
framework the success that it is. These freely available libraries deserve
a lot of credit for that, and we highly recommend them to others
as well!<br>
<ul>
  <li>Gnowsis: http://www.gnowsis.org/</li>
  <li>HtmlParser: http://htmlparser.sourceforge.net/</li>
  <li>Idmeta: http://www.geocities.com/marcoschmidt.geo/</li>
  <li>Jakarta Commons FileUpload: http://jakarta.apache.org/commons/fileupload/</li>
  <li>Jakarta Lucene: http://jakarta.apache.org/lucene/</li>
  <li>Jakarta POI: http://jakarta.apache.org/poi/</li>
  <li>Java Look and Feel Graphics Repository: http://java.sun.com/developer/techDocs/hi/repository/</li>
  <li>JavaBeans Activation Framework: http://java.sun.com/products/javabeans/glasgow/jaf.html</li>
  <li>JavaMail API: http://java.sun.com/products/javamail/</li>
  <li>JGoodies Looks: http://www.jgoodies.com/freeware/looks/</li>
  <li>NGramJ: http://ngramj.sourceforge.net/</li>
  <li>PDFBox: http://www.pdfbox.org/</li>
  <li>Sesame: http://www.openrdf.org/</li>
  <li>WinLAF: https://winlaf.dev.java.net/</li>
  <li>Xpdf: http://www.foolabs.com/xpdf/</li>
</ul>
<h2>License</h2>
The SeDAF is published under a BSD- or CPL-compatible license.<br>