Changes between Version 8 and Version 9 of ApertureDiscussion

07/04/06 11:15:35




= Open Issues in Aperture =

== Extractors ==

Some of our extraction code (for example, MP3 extraction) works only on files, not on InputStreams.

There are different approaches to solving this:

=== ideaA: rewrite all file-based extractors to use InputStreams ===
Someone writes new extractors that implement the InputStream-based extraction interface.
 * issue: do these have to be written completely from scratch?
  * idea: have somebody else write them.
 * pro: they are probably faster than the existing ones and have less overhead.
 * con: they have to be written from scratch.

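As a rough illustration of ideaA, the following Java sketch reads an ID3v1 tag from an InputStream in a single pass. It assumes the usual ID3v1 layout (the last 128 bytes of the file, starting with "TAG", followed by fixed-width title, artist, album and year fields); `StreamingId3v1Reader` and `read` are hypothetical names, not part of Aperture. Because ID3v1 sits at the end of the file, a stream-based reader keeps a sliding window of the last 128 bytes instead of seeking, which presumably is part of why file-based MP3 libraries are convenient.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: extracts an ID3v1 tag (the last 128 bytes of an MP3)
// from an InputStream in one pass, so no File is needed.
final class StreamingId3v1Reader {
    private static final int TAG_SIZE = 128;

    static Map<String, String> read(InputStream in) throws IOException {
        // Ring buffer that always holds the most recent TAG_SIZE bytes.
        byte[] ring = new byte[TAG_SIZE];
        long total = 0;
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                ring[(int) (total % TAG_SIZE)] = chunk[i];
                total++;
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        if (total < TAG_SIZE) {
            return result; // too short to carry an ID3v1 tag
        }
        // Unroll the ring into stream order; the oldest byte sits at total % TAG_SIZE.
        byte[] tag = new byte[TAG_SIZE];
        int start = (int) (total % TAG_SIZE);
        for (int i = 0; i < TAG_SIZE; i++) {
            tag[i] = ring[(start + i) % TAG_SIZE];
        }
        if (tag[0] != 'T' || tag[1] != 'A' || tag[2] != 'G') {
            return result; // no ID3v1 tag present
        }
        result.put("title", field(tag, 3, 30));
        result.put("artist", field(tag, 33, 30));
        result.put("album", field(tag, 63, 30));
        result.put("year", field(tag, 93, 4));
        return result;
    }

    // ID3v1 text fields are fixed-width and padded with NULs or spaces.
    private static String field(byte[] tag, int offset, int length) {
        String s = new String(tag, offset, length, StandardCharsets.ISO_8859_1);
        int nul = s.indexOf('\0');
        return (nul >= 0 ? s.substring(0, nul) : s).trim();
    }
}
```

The sketch returns an empty map when the stream is shorter than 128 bytes or has no tag; ID3v2 tags, which sit at the start of the file, are not handled.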
=== ideaB: add a new method to Extractor, passing in the file as an argument ===
This is the existing method:
{{{
extract(URI id, InputStream stream, Charset charset,
        String mimeType, RDFContainer result)
}}}
We could add a new one to the Extractor interface:
{{{
extract(URI id, File file, Charset charset,
        String mimeType, RDFContainer result)
}}}

 * pro: no new interface is needed.
 * issue: looking at the interface, it is not clear which method to call or which one an implementation actually supports. Should a caller first try the InputStream method and see whether it fails?
 * issue: this depends on ideaC.

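The following Java sketch illustrates the first issue raised for ideaB. It uses `java.net.URI` and a minimal stand-in `RDFContainer` class instead of the real Aperture types, and `FileOnlyMp3Extractor` and `CallerSide` are hypothetical names. A file-only extractor still has to implement the InputStream method, here by throwing `UnsupportedOperationException`, so the caller ends up trying one method and falling back to the other.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Aperture's RDFContainer: collects property/value pairs.
final class RDFContainer {
    final Map<String, String> values = new HashMap<>();
    void add(String property, String value) { values.put(property, value); }
}

// ideaB: a single Extractor interface carrying both entry points.
interface Extractor {
    void extract(URI id, InputStream stream, Charset charset,
                 String mimeType, RDFContainer result) throws IOException;

    void extract(URI id, File file, Charset charset,
                 String mimeType, RDFContainer result) throws IOException;
}

// A file-only extractor must still "implement" the stream method, here by refusing.
final class FileOnlyMp3Extractor implements Extractor {
    public void extract(URI id, InputStream stream, Charset charset,
                        String mimeType, RDFContainer result) {
        throw new UnsupportedOperationException("MP3 extraction needs a File");
    }

    public void extract(URI id, File file, Charset charset,
                        String mimeType, RDFContainer result) {
        // Placeholder: a real implementation would hand the file to an MP3 library.
        result.add("fileName", file.getName());
    }
}

final class CallerSide {
    // The interface does not say which method works, so the caller
    // tries the stream variant first and falls back to the file variant.
    static String extractWithFallback(Extractor extractor, URI id, InputStream stream,
                                      File file, RDFContainer result) throws IOException {
        try {
            extractor.extract(id, stream, null, "audio/mpeg", result);
            return "stream";
        } catch (UnsupportedOperationException e) {
            extractor.extract(id, file, null, "audio/mpeg", result);
            return "file";
        }
    }
}
```

In general an extractor could also fail after partly consuming the stream, which makes this trial-and-error dispatch even less attractive.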
=== ideaB1: create a new interface FileExtractor, passing in the file as an argument ===
Create a new interface, FileExtractor, that declares only one method. Document that this interface should be used only when no InputStream-based extraction library is available, and that a FileExtractor is a fallback, inferior to a normal Extractor.
{{{
extract(URI id, File file, Charset charset,
        String mimeType, RDFContainer result)
}}}

 * pro: developers can tell which kind of extractor they are dealing with and which method to call.
 * con: we need a new registry for FileExtractors.
 * issue: this depends on ideaC.

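For contrast, here is a minimal Java sketch of ideaB1 with two separate registries keyed by MIME type. The `ExtractorRegistries` class, its `extract` method, and the use of `Supplier` to defer opening the stream or obtaining the File are assumptions made for illustration, as is the stand-in `RDFContainer`. Since the interface type tells the dispatcher what each extractor supports, no trial and error is needed, and the File is only requested (for instance through ideaC's getFile()) when a FileExtractor is actually chosen.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for Aperture's RDFContainer.
final class RDFContainer {
    final Map<String, String> values = new HashMap<>();
    void add(String property, String value) { values.put(property, value); }
}

// The existing, preferred stream-based interface.
interface Extractor {
    void extract(URI id, InputStream stream, Charset charset,
                 String mimeType, RDFContainer result) throws IOException;
}

// ideaB1: a separate interface, used only when no stream-based library exists.
interface FileExtractor {
    void extract(URI id, File file, Charset charset,
                 String mimeType, RDFContainer result) throws IOException;
}

// The "con" of ideaB1: a second registry alongside the first.
final class ExtractorRegistries {
    final Map<String, Extractor> streamExtractors = new HashMap<>();
    final Map<String, FileExtractor> fileExtractors = new HashMap<>();

    // Prefers a stream-based Extractor; falls back to a FileExtractor.
    // fileSupplier is invoked only when a FileExtractor is used.
    String extract(URI id, String mimeType, Supplier<InputStream> streamSupplier,
                   Supplier<File> fileSupplier, RDFContainer result) throws IOException {
        Extractor extractor = streamExtractors.get(mimeType);
        if (extractor != null) {
            try (InputStream in = streamSupplier.get()) {
                extractor.extract(id, in, null, mimeType, result);
            }
            return "stream";
        }
        FileExtractor fileExtractor = fileExtractors.get(mimeType);
        if (fileExtractor != null) {
            fileExtractor.extract(id, fileSupplier.get(), null, mimeType, result);
            return "file";
        }
        return "none";
    }
}
```

Checking the stream-based registry first reflects the rule above that FileExtractor is meant only for cases without an InputStream-based library.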
=== ideaC: add a new method getFile() to FileDataObject ===
Add a new method, getFile(), which returns a File, to FileDataObject. This is easy to implement for file-based data objects (crawling the local file system). For remote FileDataObjects, the method will be implemented by buffering the InputStream into a local file. ideaB and ideaB1 depend on this getFile() method.

 * pro: optimizes the implementation for the file-system crawler.
 * issue: in some configurations (crawling remote MP3s), there will only be FileExtractors, so everything will be buffered on the local hard disk.
  * idea: this is not much of an issue; the benefit to the end user of having more data outweighs it.

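A possible shape for ideaC, sketched in Java. The `FileDataObject` interface below is a simplified stand-in rather than Aperture's actual interface, and `LocalFileDataObject` and `RemoteFileDataObject` are hypothetical names. The local variant returns its file directly; the remote variant copies the InputStream into a temporary file on the first getFile() call and reuses it afterwards, which is the local hard-disk buffering mentioned in the issue above.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Simplified, hypothetical stand-in for FileDataObject with the proposed getFile().
interface FileDataObject {
    InputStream getContent() throws IOException;
    File getFile() throws IOException;
}

// Local file-system crawler: the file already exists, so getFile() costs nothing.
final class LocalFileDataObject implements FileDataObject {
    private final File file;

    LocalFileDataObject(File file) { this.file = file; }

    public InputStream getContent() throws IOException { return Files.newInputStream(file.toPath()); }

    public File getFile() { return file; }
}

// Remote data object: getFile() buffers the stream into a temporary file once.
final class RemoteFileDataObject implements FileDataObject {
    private final InputStream content;
    private File buffered; // created lazily on the first getFile() call

    RemoteFileDataObject(InputStream content) { this.content = content; }

    public InputStream getContent() throws IOException {
        // After buffering, the original stream is consumed, so read from the copy.
        return buffered != null ? Files.newInputStream(buffered.toPath()) : content;
    }

    public File getFile() throws IOException {
        if (buffered == null) {
            Path tmp = Files.createTempFile("fdo-", ".tmp");
            tmp.toFile().deleteOnExit();
            // createTempFile already created the file, hence REPLACE_EXISTING.
            Files.copy(content, tmp, StandardCopyOption.REPLACE_EXISTING);
            content.close();
            buffered = tmp.toFile();
        }
        return buffered;
    }
}
```

The sketch relies on deleteOnExit(); a real crawler would more likely delete the temporary file explicitly once extraction for that data object has finished.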