[[PageOutline]]
= Open Issues in Aperture =

== java.io.File-Based Extractors ==

Some pieces of code (such as MP3 extraction) work only on files, not on InputStreams.

There are several approaches to solving this:
=== ideaA: rewrite all File-Based extractors using InputStream ===
Somebody writes new Extractors implementing the InputStream-based extraction interface.
 * issue: do these have to be written completely from scratch?
  * idea: have somebody else write them.
 * pro: they are probably faster than the existing ones and have less overhead
 * con: they have to be written from scratch

=== ideaB: add a new method to Extractor, passing in the file as argument ===
This is the existing method:
{{{
extract(URI id, InputStream stream, Charset charset,
        String mimeType, RDFContainer result)
}}}
We could add a new one to the Extractor interface:
{{{
extract(URI id, File file, Charset charset,
        String mimeType, RDFContainer result)
}}}

 * pro: no new interface
 * issue: looking at the interface, it is not clear which method to call and which one is actually implemented. Should a caller first try the InputStream method and see whether it fails?
 * issue: this depends on ideaC

=== ideaB1: create a new interface FileExtractor, passing in the file as argument ===
Create a new interface FileExtractor that declares only one method. Document that this interface should be used only when no InputStream-based extraction library is available, and that a FileExtractor is considered inferior to a normal Extractor.
{{{
extract(URI id, File file, Charset charset,
        String mimeType, RDFContainer result)
}}}

 * pro: developers can determine which kind of Extractor they face and which method to call
 * con: we need a new registry for FileExtractors
 * issue: this depends on ideaC
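
A minimal sketch of how ideaB1 could look, under stated assumptions: `java.net.URI` and the one-method `RDFContainer` below are placeholders for the real Aperture types, and `ExtractorLookup` with its method names is hypothetical. It only illustrates the two separate registries and the rule that stream-based extractors are preferred.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/** Placeholder for Aperture's RDFContainer; only here so the sketch compiles. */
interface RDFContainer {
    void add(String property, String value);
}

/** Stream-based extractor, mirroring the existing method signature. */
interface Extractor {
    void extract(URI id, InputStream stream, Charset charset,
                 String mimeType, RDFContainer result) throws Exception;
}

/** Proposed file-based extractor (ideaB1). */
interface FileExtractor {
    void extract(URI id, File file, Charset charset,
                 String mimeType, RDFContainer result) throws Exception;
}

/** Hypothetical lookup holding one registry per extractor kind. */
final class ExtractorLookup {
    private final Map<String, Extractor> streamExtractors = new HashMap<>();
    private final Map<String, FileExtractor> fileExtractors = new HashMap<>();

    void registerStream(String mimeType, Extractor extractor) {
        streamExtractors.put(mimeType, extractor);
    }

    void registerFile(String mimeType, FileExtractor extractor) {
        fileExtractors.put(mimeType, extractor);
    }

    /** True only when no stream-based extractor exists but a file-based one does. */
    boolean needsFile(String mimeType) {
        return !streamExtractors.containsKey(mimeType)
                && fileExtractors.containsKey(mimeType);
    }
}
```

A caller would check `needsFile(mimeType)` first and request a File from the data object (ideaC) only in that case.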

=== ideaC: Add a new method getFile() to FileDataObject ===
Add a new method getFile(), returning a File, to FileDataObject. This is easy to implement for File-based data objects (crawling the local file system). For remote FileDataObjects, the method will be implemented by buffering the InputStream to a local file. ideaB and ideaB1 depend on this getFile() method.

 * pro: optimizes the implementation for the file system crawler
 * issue: in some setups (crawling remote MP3s), there will only be FileExtractors and everything will be buffered on the local hard disk
   * idea: this is not much of an issue; the benefit to the end user of having more data outweighs it
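
The buffering step for remote data objects could be sketched as follows. `StreamBuffer` and `bufferToTempFile` are hypothetical names, not part of Aperture; the sketch assumes a temporary file that is deleted when the JVM exits is acceptable as the result of getFile().

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: spools an InputStream to a temporary local file. */
final class StreamBuffer {
    private StreamBuffer() {}

    static File bufferToTempFile(InputStream in) throws IOException {
        // createTempFile already creates an empty file, hence REPLACE_EXISTING below.
        File tmp = File.createTempFile("aperture-buffer-", ".tmp");
        tmp.deleteOnExit();
        Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

A local FileDataObject would simply return its existing File, so only remote sources pay the cost of the copy.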

== DataOpener.open(uri) ==
DataOpener.open(uri) should throw a "NotFoundException" if the element does not exist.
This enables a fallback: if the element was moved, the calling GUI that uses DataOpener could search for the element's new location and suggest it. Opening the new URI may then succeed.
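
The fallback could be sketched like this. All types are simplified stand-ins: the `DataOpener` here has a single method and uses `java.net.URI`, `NotFoundException` is modelled as a checked exception, and `OpenWithFallback` plus the `relocator` function are hypothetical names for the search that the calling GUI would provide.

```java
import java.net.URI;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical checked exception, named after the proposal above. */
class NotFoundException extends Exception {
    NotFoundException(String message) {
        super(message);
    }
}

/** Simplified stand-in for Aperture's DataOpener. */
interface DataOpener {
    void open(URI uri) throws NotFoundException;
}

final class OpenWithFallback {
    private OpenWithFallback() {}

    /**
     * Tries to open uri. If it is not found, asks the relocator for a new
     * location and retries once. Returns the URI that was actually opened.
     */
    static URI open(DataOpener opener, URI uri,
                    Function<URI, Optional<URI>> relocator) throws NotFoundException {
        try {
            opener.open(uri);
            return uri;
        } catch (NotFoundException e) {
            Optional<URI> moved = relocator.apply(uri);
            if (!moved.isPresent()) {
                throw e; // no new location known, report the original failure
            }
            opener.open(moved.get());
            return moved.get();
        }
    }
}
```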

This blocks:
 * https://gnowsis.opendfki.de/repos/gnowsis/branches/gnowsis0.9/enquire2006/src/java/org/gnogno/enquire2006/api/impl/EditorApiImpl.java

== Add rdfs:label to everything ==
Ensure that every resource has an rdfs:label, in addition to dc:title, etc.

== Relate DataOpeners to DataSources ==
In the end, DataOpeners are tied to DataSources, not to URI schemes, so the method DataOpenerRegistry.get(String urischeme) is not a good fit.

For example, the URI scheme "gnowsis://" is used for Outlook, Thunderbird, and several other sources.

Still, opening by URI scheme is useful when I have a resource at hand for which I know only the URI (and not the originating datasource), so I would keep it as a fallback.

Idea:
 * Add a new method getDataOpener to DataSource, which returns an instance of DataOpener (or uses the DataOpenerRegistry internally when the DataSource does not define its own DataOpener).
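
A rough sketch of this idea, with simplified stand-ins for the real classes: the `DataOpener`, `DataOpenerRegistry`, and `DataSource` below are reduced to what the example needs, and the constructor arguments and the `add` method are assumptions made for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for Aperture's DataOpener. */
interface DataOpener {
    void open(URI uri);
}

/** Simplified stand-in for the scheme-based registry. */
final class DataOpenerRegistry {
    private final Map<String, DataOpener> byScheme = new HashMap<>();

    void add(String uriScheme, DataOpener opener) {
        byScheme.put(uriScheme, opener);
    }

    DataOpener get(String uriScheme) {
        return byScheme.get(uriScheme);
    }
}

/** Hypothetical DataSource carrying the proposed getDataOpener() method. */
final class DataSource {
    private final String uriScheme;
    private final DataOpener ownOpener; // null when the source defines none
    private final DataOpenerRegistry registry;

    DataSource(String uriScheme, DataOpener ownOpener, DataOpenerRegistry registry) {
        this.uriScheme = uriScheme;
        this.ownOpener = ownOpener;
        this.registry = registry;
    }

    /** Prefers the source's own opener; falls back to the lookup by URI scheme. */
    DataOpener getDataOpener() {
        return ownOpener != null ? ownOpener : registry.get(uriScheme);
    }
}
```

With this shape, two datasources that share the "gnowsis" scheme can still return different openers, while a source without its own opener keeps the scheme-based behaviour.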

This blocks:
 * https://gnowsis.opendfki.de/repos/gnowsis/branches/gnowsis0.9/enquire2006/src/java/org/gnogno/enquire2006/api/impl/EditorApiImpl.java

== Use reusable web GUIs to configure datasources, reusable crawlers, reusable registry ==

Gnowsis, Autofocus, and possibly the Aduna Metadata Server all need GUIs to configure datasources (restrictions, passwords, setting up and enabling datasources).

 * we could decide that datasource configuration is done using servlets, which would make it easier to reuse
 * we could use the same crawler / registration classes in Nepomuk, Autofocus, and AMS (these are ApertureCrawler and ApertureDataSourceRegistry in gnowsis)

== Vocabulary: use DC instead of data ==

We use many Dublin Core properties, but redefine them. For compatibility's sake, we should include the real DC vocabularies right from the start, and not use our own URIs.

== build.xml ==
Remove the "init" target; everything in it can be top-level. This simplifies the Ant file.


= old aperture pages =
 * ApertureArchitecture