= Archives =

Some functionality that is still missing, but that we at Aduna would really like to have (customer demand!), is support for handling archives such as zip, gzip and rar files.

The interface for doing archive extraction will probably be a mixture of Extractor and !DataSource/!DataCrawler. On the one hand, archive extractors will be mimetype-specific and will operate on an !InputStream (perhaps a !DataObject), just like Extractor; on the other hand, they deliver a stream of new !DataObjects.

A URI scheme also has to be developed for such nested objects, so that you can identify a stream packed inside an archive.

== Supported Formats ==

Support for zip and gzip is probably trivial, as these formats are already accessible through java.util.zip.

Rar is another format we encounter sometimes. As far as I know, there is no Java library available for it. It is an open format though, i.e. the specs are available ([http://schmidt.devlib.org/file-formats/rar-archive-file-format.html link1], [http://schmidt.devlib.org/file-formats/rar-archive-file-format.html link2]).

== Opening Resources ==

Opening these resources also gets rather tricky, e.g. how to open a text file in a zip file on a website. Good thinking required!
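
To make the discussion above a bit more concrete, here is a minimal sketch of the zip case: it reads an !InputStream with !ZipInputStream from java.util.zip and delivers one byte array per packed file, keyed by a nested URI. Everything named here is an assumption for illustration only: the class !ZipArchiveSketch, the methods nestedUri, extract and sampleZip, and in particular the "zip:<parent-uri>!/<entry>" URI form, which is simply modelled on the jar: URL syntax that Java already uses. None of it is an existing or agreed-upon API, and the try-with-resources syntax assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipArchiveSketch {

    /** Hypothetical nested URI form: "zip:" + parent URI + "!/" + entry name. */
    static String nestedUri(String parentUri, String entryName) {
        return "zip:" + parentUri + "!/" + entryName;
    }

    /**
     * Reads a zip stream and returns the bytes of every file entry,
     * keyed by its (hypothetical) nested URI. Directory entries are skipped.
     */
    static Map<String, byte[]> extract(String parentUri, InputStream in) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // read() returns -1 at the end of the current entry, not of the whole archive
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                result.put(nestedUri(parentUri, entry.getName()), buffer.toByteArray());
            }
        }
        return result;
    }

    /** Builds a one-file zip in memory, for demonstration purposes only. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("docs/readme.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The parent URI is an example value; the archive itself is built in memory.
        Map<String, byte[]> objects =
                extract("http://example.com/a.zip", new ByteArrayInputStream(sampleZip()));
        for (Map.Entry<String, byte[]> e : objects.entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue().length + " bytes)");
        }
    }
}
```

Buffering every entry in memory keeps the sketch short, but a real implementation would more likely hand out each entry as a stream (or !DataObject) while iterating, since archives can be large. A URI of this form also carries the location of the enclosing archive, which is one possible answer to the "text file in a zip file on a website" question: fetch the parent URI first, then look up the entry after the "!/" separator.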