= Extractor =

Notes:
 * The mimetype is specified because the same Extractor can be used for several mimetypes (e.g. !OpenOfficeExtractor or !OpenDocumentExtractor), while the handling may differ slightly between mimetypes.
 * A Sesame Repository is passed to the Extractor to put its results in. The repository is wrapped as an RDFMap. The application can then decide whether this is an in-memory repository on which only this Extractor operates, whether the statements go directly into persistent storage, whether an application-specific optimized Repository implementation is used, etc.

TODO: Leo questions whether the RDFMap should be passed in or not. Chris: specifying it as a parameter relieves the implementor of the burden of instantiating one himself, which may not be trivial, depending on the chosen RDF interface.

== Java Interface ==

{{{
#!java
/**
 * Extractors are used to extract metadata and fulltext from InputStreams.
 * The format of the InputStream is given by the passed MIME type.
 * Extractors can produce RDFMaps.
 */
public interface Extractor {

    /**
     * Writes the extracted information into the passed RDFMap called "result".
     * To see which fields are needed and which must be added, look at the
     * comments above.
     *
     * @param id the URI identifying the passed object. You may need it when you add sophisticated RDF information. It is also the topResource in the passed result.
     * @param stream an opened InputStream which you can read exclusively. You must call stream.close() when you are finished extracting.
     * @param charset the charset in which the InputStream is encoded
     * @param mimetype the mimetype of the passed file/stream. If your extractor can handle multiple mimetypes, this can be handy.
     * @param result the place where the extracted data is to be written to
     * @throws IOException when problems arise reading the stream.
     * @throws DocumentExtractorException when the metadata of the stream cannot be extracted,
     *         or when the stream does not conform to the norms of its MIME type.
     */
    public void extract(URI id, InputStream stream, Charset charset, String mimetype, RDFMap result)
            throws IOException, DocumentExtractorException;

    /*
     * Inferior ALTERNATIVE:
     *
     * public RDFMap extract(URI id, InputStream stream, Charset charset, String mimetype)
     *         throws IOException, DocumentExtractorException;
     *
     * Inferior because with the first signature implementors only need to know the
     * interface, while with the alternative they have to know how to instantiate an RDFMap.
     * Performance of the first is also better if the RDF store is wrapped and passed
     * through the method.
     */
}
}}}
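
== Example implementation (sketch) ==

To make the contract above more concrete, here is a minimal sketch of how an implementor might satisfy it for plain text. Everything outside the `extract` signature is an assumption made for illustration: the `RDFMap` class is a simple in-memory stand-in (the real one wraps a Sesame Repository), `DocumentExtractorException` is reduced to a bare checked exception, `URI` is taken to be `java.net.URI`, and the property names `fullText` and `mimeType` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PlainTextExtractorSketch {

    /** Hypothetical stand-in for RDFMap: property name -> values of the top resource. */
    public static class RDFMap {
        private final Map<String, List<String>> properties = new LinkedHashMap<>();

        public void put(String property, String value) {
            properties.computeIfAbsent(property, k -> new ArrayList<>()).add(value);
        }

        public List<String> get(String property) {
            return properties.getOrDefault(property, new ArrayList<>());
        }
    }

    /** Simplified stand-in for the exception named in the interface. */
    public static class DocumentExtractorException extends Exception {
        public DocumentExtractorException(String message) {
            super(message);
        }
    }

    /** The interface from this page, repeated so the sketch compiles on its own. */
    public interface Extractor {
        void extract(URI id, InputStream stream, Charset charset, String mimetype, RDFMap result)
                throws IOException, DocumentExtractorException;
    }

    /** Assumed example extractor: stores the decoded text and the mimetype. */
    public static class PlainTextExtractor implements Extractor {
        @Override
        public void extract(URI id, InputStream stream, Charset charset, String mimetype, RDFMap result)
                throws IOException, DocumentExtractorException {
            // try-with-resources closes the stream on every path, as the contract requires.
            try (InputStream in = stream;
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
                if (!"text/plain".equals(mimetype)) {
                    throw new DocumentExtractorException("Unsupported mimetype: " + mimetype);
                }
                StringBuilder text = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    if (text.length() > 0) {
                        text.append('\n');
                    }
                    text.append(line);
                }
                // The caller owns "result"; the extractor only adds to it.
                result.put("fullText", text.toString());
                result.put("mimeType", mimetype);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The application decides which RDFMap to pass; here it is a throwaway in-memory one.
        RDFMap result = new RDFMap();
        InputStream in = new ByteArrayInputStream("hello\nworld".getBytes(StandardCharsets.UTF_8));
        new PlainTextExtractor().extract(
                URI.create("file:///tmp/example.txt"), in, StandardCharsets.UTF_8, "text/plain", result);
        System.out.println(result.get("fullText"));
    }
}
```

Note that the extractor never instantiates an RDFMap itself. This is the point Chris makes above: the caller chooses the backing store, and the implementor only needs to know the interface.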