[[PageOutline]]

= Open Issues in Aperture =

== java.io.File-Based Extractors ==
Some of our code (such as the MP3 extraction) does not work on InputStreams but only on files. There are different approaches to solve that:

=== ideaA: rewrite all File-Based extractors using inputstream ===
Somebody writes new Extractors implementing the InputStream-based extraction interface.
 * issue: do these have to be written completely from scratch?
 * idea: have somebody else write them.
 * pro: they are probably faster than the existing ones and have less overhead
 * con: they have to be written from scratch

=== ideaB: add a new method to Extractor, passing in the file as argument ===
This is the existing method:

{{{extract(URI id, InputStream stream, Charset charset, String mimeType, RDFContainer result)}}}

We could add a new one to the Extractor interface:

{{{extract(URI id, File file, Charset charset, String mimeType, RDFContainer result)}}}

 * pro: no new interface
 * issue: looking at the interface, it is not clear which method to use and which one is implemented. Should I first call the method with the InputStream and see if it fails?
 * issue: this depends on ideaC

=== ideaB1: create a new Interface FileExtractor, passing in the file as argument ===
Create a new interface FileExtractor that declares only one method. State that this interface should only be used when no InputStream-based extraction library is available, and that a FileExtractor is inferior to the normal Extractor.

{{{extract(URI id, File file, Charset charset, String mimeType, RDFContainer result)}}}

 * pro: developers can determine which kind of Extractor they face and which method to call
 * con: we need a new registry for FileExtractors
 * issue: this depends on ideaC

=== ideaC: Add a new method getFile() to FileDataObject ===
Add a new method getFile(), returning a file, to FileDataObject. This is easily implemented on File-based data objects (crawling the local file system).
For remote FileDataObjects, the method will be implemented by buffering the InputStream on the local hard disk. ideaB and ideaB1 depend on this getFile() method.
 * pro: optimizes the implementation for the file system crawler
 * issue: in some constellations (crawling remote MP3s), there will only be FileExtractors and everything will be buffered on the local hard disk
 * idea: this is not so much of an issue, the benefit for the end user of having more data outweighs it

== DataOpener.open(uri) ==
DataOpener.open(uri) should throw a "NotFoundException" if the element does not exist. This enables a fallback: if the element was moved, the calling GUI that uses DataOpener could search for the element's new location and suggest it. Opening the new URI could then succeed.

This blocks:
 * https://gnowsis.opendfki.de/repos/gnowsis/branches/gnowsis0.9/enquire2006/src/java/org/gnogno/enquire2006/api/impl/EditorApiImpl.java

== Add rdfs:label to everything ==
Make sure that every resource has an rdfs:label, in addition to dc:title, etc.

== Relate DataOpeners to DataSources ==
In the end, DataOpeners are tightly bound to DataSources, not to URI schemes. The method DataOpenerRegistry.get(String urischeme) is not good, because we use the URI scheme "gnowsis://" quite often: for Outlook, Thunderbird, and some other sources. Still, opening by URI scheme is a good fallback when I have a resource at hand of which I know only the URI (and not the originating DataSource), so I would keep it as a fallback.

Idea:
 * Add a new method getDataOpener to DataSource, which returns an instance of DataOpener (or uses the DataOpenerRegistry internally when the DataSource does not define its own DataOpeners).
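
The lookup order proposed above could be sketched roughly as follows. This is only an illustration, not the actual Aperture API: the interfaces are simplified stand-ins, the helper `resolve`, the class `DataOpenerLookupSketch`, the `register` method, and the example URI are all made up for this sketch, and letting getDataOpener() return null for "no own opener" is an assumption.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration; NOT the real Aperture interfaces.
interface DataOpener {
    void open(URI uri);
}

interface DataSource {
    // Proposed method. Assumption: returns null if the source defines no opener of its own.
    DataOpener getDataOpener();
}

// Hypothetical minimal registry keyed by URI scheme.
final class DataOpenerRegistry {
    private final Map<String, DataOpener> byScheme = new HashMap<>();

    void register(String uriScheme, DataOpener opener) {
        byScheme.put(uriScheme, opener);
    }

    DataOpener get(String uriScheme) {
        return byScheme.get(uriScheme);
    }
}

final class DataOpenerLookupSketch {
    // Prefer the opener of the originating DataSource; fall back to the
    // URI scheme when the source is unknown or defines no opener.
    static DataOpener resolve(DataSource source, URI uri, DataOpenerRegistry registry) {
        if (source != null) {
            DataOpener own = source.getDataOpener();
            if (own != null) {
                return own;
            }
        }
        return registry.get(uri.getScheme());
    }

    public static void main(String[] args) {
        DataOpenerRegistry registry = new DataOpenerRegistry();
        registry.register("gnowsis", uri -> System.out.println("scheme fallback opens " + uri));

        // A source with its own opener, sharing the "gnowsis" scheme.
        DataSource outlook = () -> uri -> System.out.println("Outlook opener opens " + uri);

        URI item = URI.create("gnowsis://outlook/item/1"); // made-up example URI
        resolve(outlook, item, registry).open(item); // originating source is known
        resolve(null, item, registry).open(item);    // only the URI is known
    }
}
```

With this ordering, two datasources that share the "gnowsis://" scheme can still get different openers, while a caller that knows only the URI keeps the scheme-based behaviour.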

This blocks:
 * https://gnowsis.opendfki.de/repos/gnowsis/branches/gnowsis0.9/enquire2006/src/java/org/gnogno/enquire2006/api/impl/EditorApiImpl.java

== use reusable web-guis to configure datasources, reusable crawlers, reusable registry ==
Gnowsis, Autofocus, and possibly the Aduna metadata server all need GUIs to configure the datasources (restrictions, passwords, settings, and enabling datasources).
 * we could say that datasource config is done using servlets - then it is easier to reuse
 * we could use the same crawler / registration classes in Nepomuk, Autofocus, and AMS (these are ApertureCrawler and ApertureDataSourceRegistry in gnowsis)

== Vocabulary: use DC instead of data ==
We are using many properties of Dublin Core, but redefining them. For compatibility's sake, we should include the real DC vocabularies right from the start, and not use our own URIs.

== build.xml ==
Remove the "init" target; everything in there can be top-level. This simplifies the ant file.

= old aperture pages =
 * ApertureArchitecture