 * Extract metadata like author, date, subject and more from data sources and file formats
 * Open data objects for viewing
 * Fully configurable framework: config files are stored and edited through a Swing GUI
 * Pluggable architecture: easy to extend and easy to integrate into other projects
 * !DataSource interface
 * !DataSource implementations for file systems, websites (or, more generally, any hypertext source) and IMAP servers
 * Planned for the near future: !OutlookSource, !MozillaSource/!ThunderbirdSource

 * !DataAccessor interface
 * !DataAccessor implementations for the file, http(s) and imap URI schemes

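To make the !DataAccessor idea more concrete, here is a minimal Java sketch. Only the name `DataAccessor` comes from the list above; the methods `supports` and `getData`, the `DataObject` holder and the `FileDataAccessor` class are hypothetical names chosen for illustration, not a decided API, and the sketch covers only the `file` scheme.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical result type: the URI that was accessed plus a stream over its content.
final class DataObject {
    final URI uri;
    final InputStream content; // the caller is responsible for closing this stream

    DataObject(URI uri, InputStream content) {
        this.uri = uri;
        this.content = content;
    }
}

// Hypothetical DataAccessor contract: fetch the data behind a URI of a supported scheme.
interface DataAccessor {
    boolean supports(URI uri);

    DataObject getData(URI uri) throws IOException;
}

// Sketch of a DataAccessor for the "file" scheme only.
final class FileDataAccessor implements DataAccessor {
    @Override
    public boolean supports(URI uri) {
        // getScheme() is null for relative URIs; equalsIgnoreCase(null) is false
        return "file".equalsIgnoreCase(uri.getScheme());
    }

    @Override
    public DataObject getData(URI uri) throws IOException {
        if (!supports(uri)) {
            throw new IllegalArgumentException("unsupported scheme: " + uri.getScheme());
        }
        // Paths.get(URI) maps a file: URI onto the default file system.
        return new DataObject(uri, Files.newInputStream(Paths.get(uri)));
    }
}
```

Accessors for http(s) and imap would implement the same interface; an http(s) accessor could, for example, build on `java.net.HttpURLConnection`. Keeping the scheme check behind `supports` lets the framework pick an accessor by URI scheme without knowing the concrete classes.
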
 * !DataCrawler interface
 * One basic !DataCrawler implementation for every !DataSource type
 * Later, possibly more specialized !DataCrawler implementations, e.g. a !WindowsFileSystemCrawler with OS-specific optimizations

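A basic file system !DataCrawler could look roughly like the following Java sketch. `DataCrawler`, `CrawlerHandler` and `FileSystemCrawler` are hypothetical names, and reporting plain `Path` objects is a simplification; a real crawler would presumably hand richer data objects to its handler.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical callback notified for every data object the crawler finds.
interface CrawlerHandler {
    void objectFound(Path file);
}

// Hypothetical DataCrawler contract: walk a data source and report its objects.
interface DataCrawler {
    void crawl(CrawlerHandler handler) throws IOException;
}

// Basic file system crawler: reports every regular file below a root folder.
final class FileSystemCrawler implements DataCrawler {
    private final Path root;

    FileSystemCrawler(Path root) {
        this.root = root;
    }

    @Override
    public void crawl(CrawlerHandler handler) throws IOException {
        // Files.walk visits the tree depth-first; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(handler::objectFound);
        }
    }
}
```

`Files.walk` does not follow symbolic links unless `FileVisitOption.FOLLOW_LINKS` is passed, which avoids endless loops through cyclic links. An OS-specific crawler such as the !WindowsFileSystemCrawler mentioned above could replace the generic tree walk while keeping the same interface.
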
 * Extractor interface
 * Extractor implementations for everything we can easily support: PDF, Word, Excel, HTML, plain text, ...
 * A new domain for us, but probably very doable as well: PNG, JPG, AVI, ...

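The Extractor interface, and the way an implementation is chosen per format, might be sketched as follows. Everything here is hypothetical: the `extract` signature, the `fullText` property key, the `PlainTextExtractor` and the MIME-type keyed `ExtractorRegistry`. Plain text is used because it needs no third-party library; PDF or Word support would wrap a dedicated parser behind the same interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical Extractor contract: turn a document stream into named properties.
// The charset only matters for text formats; binary formats would ignore it.
interface Extractor {
    Map<String, String> extract(InputStream in, Charset charset) throws IOException;
}

// Simplest possible Extractor: for plain text, the full text is the whole content.
final class PlainTextExtractor implements Extractor {
    @Override
    public Map<String, String> extract(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        Map<String, String> properties = new HashMap<>();
        properties.put("fullText", new String(buffer.toByteArray(), charset));
        return properties;
    }
}

// Registry mapping MIME types to Extractors, so new formats can be plugged in later.
final class ExtractorRegistry {
    private final Map<String, Extractor> byMimeType = new HashMap<>();

    void register(String mimeType, Extractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    Optional<Extractor> lookup(String mimeType) {
        return Optional.ofNullable(byMimeType.get(mimeType));
    }
}
```

Looking up extractors by MIME type is one way to keep the architecture pluggable: supporting a new format means registering one more implementation, without touching the calling code.
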
 * !ArchiveExtractor interface
 * !ArchiveExtractor implementations for Zip and Gzip

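For the Zip case, a hypothetical !ArchiveExtractor can be built on `java.util.zip.ZipInputStream` from the JDK. The `ArchiveExtractor` and `EntryHandler` interfaces below are illustrative assumptions; the idea is that each contained file is handed to a callback, which could in turn pass it on to the matching Extractor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical callback receiving each file stored in an archive.
interface EntryHandler {
    // 'content' is positioned at the entry's data; the handler must not close it.
    void entry(String name, InputStream content) throws IOException;
}

// Hypothetical ArchiveExtractor contract: hand every contained file to a handler.
interface ArchiveExtractor {
    void extract(InputStream archive, EntryHandler handler) throws IOException;
}

// Zip implementation on top of java.util.zip; directory entries are skipped.
final class ZipArchiveExtractor implements ArchiveExtractor {
    @Override
    public void extract(InputStream archive, EntryHandler handler) throws IOException {
        // Closing the ZipInputStream also closes the underlying archive stream.
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // Reads from 'zip' return only the current entry's bytes.
                    handler.entry(entry.getName(), zip);
                }
                zip.closeEntry();
            }
        }
    }
}
```

Gzip compresses a single stream rather than storing named entries, so a Gzip implementation could simply wrap its input in `java.util.zip.GZIPInputStream` and report one entry.
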
 * !LinkExtractor interface
 * !LinkExtractor implementation for HTML and XHTML
 * Later, possibly PDF, Flash, ...

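A first HTML !LinkExtractor could be sketched as below. The `LinkExtractor` interface and the `RegexHtmlLinkExtractor` class are hypothetical, and scanning the `href` attributes of `<a>` tags with a regular expression is a deliberate simplification: it misses unquoted attributes, links in other elements and anything generated by scripts, so a real implementation should use a proper HTML parser. Relative links are resolved against the document's own URI with `java.net.URI.resolve`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical LinkExtractor contract: find outgoing links in a document.
interface LinkExtractor {
    List<URI> extractLinks(String document, URI base);
}

// Simplified HTML link extractor: scans quoted href attributes of <a> tags with a
// regex and resolves them against the document URI. Not a real HTML parser.
final class RegexHtmlLinkExtractor implements LinkExtractor {
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    @Override
    public List<URI> extractLinks(String document, URI base) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(document);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // skip href values that are not valid URIs (e.g. containing spaces)
            }
        }
        return links;
    }
}
```

Resolving links to absolute URIs at this point means a crawler could hand them straight to the !DataAccessor for their scheme.
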
 * !MimeTypeIdentifier interface
 * Basic !MimeTypeIdentifier implementation based on magic numbers; it is essential for choosing the right Extractor, !LinkExtractor or !ArchiveExtractor implementation for a given file

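Identification by magic numbers compares the first bytes of a file against known signatures. The Java sketch below assumes a hypothetical `MimeTypeIdentifier` interface that receives those leading bytes; `MagicNumberIdentifier` and its small signature table are illustrative only. ZIP-based formats such as OpenOffice and OpenDocument files show why a magic number alone is not always enough.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical MimeTypeIdentifier contract: guess a MIME type from a file's leading bytes.
interface MimeTypeIdentifier {
    // Returns the detected MIME type, or null when no known signature matches.
    String identify(byte[] firstBytes);
}

// Magic-number identifier: compares leading bytes against a table of known signatures.
final class MagicNumberIdentifier implements MimeTypeIdentifier {
    private static final class Signature {
        final byte[] magic;
        final String mimeType;

        Signature(byte[] magic, String mimeType) {
            this.magic = magic;
            this.mimeType = mimeType;
        }
    }

    // A small, illustrative table; a real one would be much larger and configurable.
    private static final List<Signature> SIGNATURES = Arrays.asList(
        new Signature("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"),
        new Signature(new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'}, "image/png"),
        new Signature(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, "image/jpeg"),
        new Signature(new byte[] {0x1F, (byte) 0x8B}, "application/gzip"),
        // ZIP local file header; OpenOffice and OpenDocument files are ZIP packages too,
        // so a real identifier would have to look inside before settling on this type.
        new Signature(new byte[] {'P', 'K', 0x03, 0x04}, "application/zip"),
        // OLE2 compound document, the container of legacy Word and Excel files.
        new Signature(new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                  (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1},
                      "application/x-ole-storage"));

    @Override
    public String identify(byte[] firstBytes) {
        for (Signature s : SIGNATURES) {
            if (startsWith(firstBytes, s.magic)) {
                return s.mimeType;
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

The detected type can then be used to look up the Extractor, !LinkExtractor or !ArchiveExtractor for the file. When no signature matches, a fallback such as the file name extension could be consulted.
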
 * [http://www.osgi.org/ OSGi] bindings and connector code (can be written so that the code is also usable outside an OSGi-based application)
 * Configuration GUI (what needs to be configured? Isn't this very application-specific?)
 * Sample GUI application showing how to use the framework; it can also serve as a test application, e.g. when developing new Extractor implementations
 * Metadata format descriptions (RDFS schema) and example metadata files

== Supported File Formats ==

Right from the beginning, we will support these file formats:
 * !OpenOffice 1.0+: Writer, Calc, Impress, Draw
 * !StarOffice 6.0+: Writer, Calc, Impress, Draw
 * !OpenDocument (!OpenOffice 2.0+)
 * !WordPerfect 5.x
 * Emails (.eml files)