= Extractor =

== Java Interface ==

Probably equal to:
 * source:trunk/gnowsis/src/org/gnowsis/data/extractor/ExtractorPlaintext.java and
 * source:/trunk/gnoDesktopSearch/src/java/org/gnowsis/desktopsearch/extractor/DocumentExtractor.java

{{{
#!java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

import org.apache.lucene.document.Document;

// ExtractionException and DocumentExtractorException are project classes;
// import them from wherever they live in the gnowsis source tree.

/**
 * Extractors are used to extract structured information from various data sources.
 */
public interface Extractor {

    /**
     * Return a plaintext representation of the file.
     *
     * @param source the file to look into
     * @param mimetype the MIME type that gnowsis has identified for this file
     * @return the plaintext, or null if the file contains no plaintext. If the
     *         plaintext exists but cannot be extracted, an exception is thrown instead.
     * @throws ExtractionException when something goes wrong during extraction
     * @throws FileNotFoundException when the file does not exist
     */
    public String getPlaintext(File source, String mimetype)
            throws FileNotFoundException, ExtractionException;

    /**
     * Create a Lucene document.
     * To see what fields would be needed, look at the top of this class.
     *
     * @param file the file to create the document from
     * @param uri the URI identifying the passed file. You may need it when you add
     *        sophisticated RDF information.
     * @param mimetype the MIME type of the passed file/stream. If your extractor can
     *        handle multiple MIME types, this can be handy.
     * @param options optional options that may help you
     * @return a Lucene document
     */
    public Document createLuceneDocument(File file, String uri, String mimetype, Object options)
            throws IOException, DocumentExtractorException;

    /**
     * Same as above, but reads the content from a stream instead of a file.
     */
    public Document createLuceneDocument(InputStream stream, String uri, String mimetype, Object options)
            throws IOException, DocumentExtractorException;
}
}}}
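
To illustrate the `getPlaintext` contract described above (return the text, return null when there is no plaintext, throw when the file is missing or extraction fails), here is a minimal sketch. It is not taken from the gnowsis sources: the class name `PlainTextExtractorSketch` is hypothetical, the nested `ExtractionException` is only a stand-in for the project's own exception class, and the two `createLuceneDocument` methods are left out so that the example compiles without Lucene. It also assumes that files are UTF-8 encoded and that an unsupported MIME type counts as "no plaintext"; the real extractors may decide both points differently.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical example class, not part of gnowsis. */
public class PlainTextExtractorSketch {

    /** Stand-in for the project's ExtractionException (assumption). */
    public static class ExtractionException extends Exception {
        public ExtractionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Mirrors the signature of Extractor.getPlaintext for text/plain files only. */
    public String getPlaintext(File source, String mimetype)
            throws FileNotFoundException, ExtractionException {
        // A missing file is reported with FileNotFoundException.
        if (!source.isFile()) {
            throw new FileNotFoundException(source.getPath());
        }
        // Assumption: a MIME type this sketch cannot handle means "no plaintext".
        if (mimetype == null || !mimetype.startsWith("text/plain")) {
            return null;
        }
        try {
            // Assumption: the file is UTF-8 encoded.
            return new String(Files.readAllBytes(source.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Any read failure is wrapped so callers see a single extraction error type.
            throw new ExtractionException("could not read " + source, e);
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("extractor", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "hello gnowsis".getBytes(StandardCharsets.UTF_8));

        PlainTextExtractorSketch extractor = new PlainTextExtractorSketch();
        System.out.println(extractor.getPlaintext(tmp, "text/plain")); // prints "hello gnowsis"
        System.out.println(extractor.getPlaintext(tmp, "image/png"));  // prints "null"
    }
}
```

A caller therefore has to handle three outcomes: a string, null, or an exception. Keeping "no plaintext" (null) separate from "extraction failed" (exception) lets an indexer skip binary files quietly while still logging real errors.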