= Archives =

One piece of functionality that is still missing, and that we at Aduna would really like to have, is support for handling archives such as zip and rar files.

The interface for archive extraction will probably be a mixture of Extractor and DataSource/DataCrawler. Like Extractor, implementations will be MIME-type-specific and will operate on an InputStream (or perhaps a DataObject). Unlike Extractor, they will deliver a stream of new DataObjects.
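
To make the idea above a bit more concrete, here is a minimal sketch of what such an interface could look like. Nothing in it is existing API: `Archiver`, `EntryHandler`, `ZipArchiver` and their methods are hypothetical names chosen for illustration, and a plain path plus `InputStream` stands in for a real DataObject. The zip implementation only uses java.util.zip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// All type and method names in this sketch are hypothetical, not existing API.
class ArchiverSketch {

    /** Receives each object found inside an archive. */
    interface EntryHandler {
        // The stream is only valid during this call and must not be closed.
        void entry(String path, InputStream content) throws IOException;
    }

    /** MIME-type-specific like an Extractor, but produces a stream of new objects. */
    interface Archiver {
        boolean supports(String mimeType);

        void unpack(InputStream archive, EntryHandler handler) throws IOException;
    }

    /** Zip implementation backed by java.util.zip. */
    static class ZipArchiver implements Archiver {
        @Override
        public boolean supports(String mimeType) {
            return "application/zip".equals(mimeType);
        }

        @Override
        public void unpack(InputStream archive, EntryHandler handler) throws IOException {
            try (ZipInputStream zip = new ZipInputStream(archive)) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    if (!entry.isDirectory()) {
                        // Reads on `zip` stop at the end of the current entry.
                        handler.entry(entry.getName(), zip);
                    }
                }
            }
        }
    }

    /** Builds a small two-entry zip in memory for the demo. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/b.txt"));
            zip.write("world".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Unpacks the sample archive and returns one "path=content" string per entry. */
    static List<String> demo() {
        List<String> found = new ArrayList<>();
        try {
            new ZipArchiver().unpack(new ByteArrayInputStream(sampleZip()), (path, content) -> {
                ByteArrayOutputStream copy = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = content.read(buffer)) != -1) {
                    copy.write(buffer, 0, n);
                }
                found.add(path + "=" + new String(copy.toByteArray(), StandardCharsets.UTF_8));
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return found;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

A callback is used here instead of returning a list so that large archives do not have to be held in memory. Whether the real interface should push entries to a handler or let the caller pull them is an open design choice.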

A URI scheme also has to be developed for such nested objects, so that a stream packed inside an archive can be identified.
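
One possible shape for such a scheme, sketched below, borrows the `jar:<url>!/<entry>` convention that Java already uses for entries in jar files. The `zip:` prefix, the `NestedUriSketch` class and its `nest`/`split` methods are assumptions made for this example only, and the escaping rule (`%` and `!` in entry paths are percent-encoded) is just one way to keep the `!/` separator unambiguous.

```java
// Hypothetical scheme: zip:<archive-uri>!/<entry-path>, modelled on Java's jar: URLs.
class NestedUriSketch {

    static final String SCHEME = "zip:";
    static final String SEPARATOR = "!/";

    /** Builds the URI of an entry inside the archive identified by archiveUri. */
    static String nest(String archiveUri, String entryPath) {
        // Escape '%' first, then '!', so the entry part never contains a literal '!'.
        String escaped = entryPath.replace("%", "%25").replace("!", "%21");
        return SCHEME + archiveUri + SEPARATOR + escaped;
    }

    /** Splits a nested URI into { archiveUri, entryPath }. */
    static String[] split(String nestedUri) {
        int cut = nestedUri.lastIndexOf(SEPARATOR);
        if (!nestedUri.startsWith(SCHEME) || cut < 0) {
            throw new IllegalArgumentException("Not a nested archive URI: " + nestedUri);
        }
        String archiveUri = nestedUri.substring(SCHEME.length(), cut);
        // Undo the escaping in reverse order.
        String entryPath = nestedUri.substring(cut + SEPARATOR.length())
                .replace("%21", "!")
                .replace("%25", "%");
        return new String[] { archiveUri, entryPath };
    }

    public static void main(String[] args) {
        String inner = nest("file:/data/outer.zip", "inner.zip");
        String doc = nest(inner, "docs/readme.txt");
        System.out.println(doc);
        // prints zip:zip:file:/data/outer.zip!/inner.zip!/docs/readme.txt
    }
}
```

Because the entry part never contains a literal `!`, splitting at the last `!/` always recovers the enclosing archive's URI, which may itself be a nested URI. That is what allows archives inside archives to be addressed.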

Support for zip and gzip is probably trivial, as these formats are already accessible through java.util.zip. Rar is another format we encounter sometimes. As far as I know there is no Java library available for it, but it is an open format, i.e. the specs are available.
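
As a rough illustration of how little code the java.util.zip route needs, the sketch below decompresses a gzip stream with `GZIPInputStream`. Unlike zip, a gzip stream wraps a single payload, so an archive handler for it would deliver exactly one new object. The `GzipSketch` class and its method names are made up for this example; only the java.util.zip and java.io classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; only the java.util.zip and java.io calls are real API.
class GzipSketch {

    /** Decompresses a gzip stream into a byte array. */
    static byte[] gunzip(InputStream compressed) throws IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return plain.toByteArray();
    }

    /** Compresses bytes with gzip; used here only to produce demo input. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    /** Compresses and decompresses a string entirely in memory. */
    static String roundTrip(String text) {
        try {
            byte[] packed = gzip(text.getBytes(StandardCharsets.UTF_8));
            return new String(gunzip(new ByteArrayInputStream(packed)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("archived text")); // prints "archived text"
    }
}
```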