= DataObject =

'''ToDo''': the getMetadata method should probably return some kind of RDF statement container (the same interface that will be used in Extractor) instead of a Map with key-value pairs, the keys of which are specific to the type of !DataObject/!DataSource.

Leo about TODO: ok with me to return an RDFMap

ChangeLog:
 * 'Metadata' is one word, hence getMetadata, not getMetaData.

== Java Interface ==

{{{
#!java
/**
 * A general interface for data objects. A data object consists of an identifier,
 * binary content and metadata. The object is used primarily to extract
 * information from datasources. For the extraction, both the InputStream
 * returned by getContent() and
 * the RDF metadata returned by getMetadata() are important.
 * In structured data sources that are not file-based, the getContent() method
 * will return null, and all structured data of the object are represented
 * in the getMetadata object.
 */
public interface DataObject {

    /**
     * Gets the data object's primary identifier.
     *
     * @return An identifier for this data object.
     */
    public URI getID();

    /**
     * Returns the byte size of the represented resource. This has been defined at
     * this global level due to the importance of this attribute for performance reasons.
     *
     * @return the size of the binary resource in bytes, or a negative value when the
     *         size is unknown or does not make sense for this particular DataObject implementation.
     */
    public long getSize();

    /**
     * Gets the DataSource from which this DataObject conceptually originated.
     *
     * @return The DataSource from which this DataObject conceptually originated.
     */
    public DataSource getDataSource();

    /**
     * Gets the data object's parent, if any.
     *
     * @return the parent DataObject, or null when this DataObject has no parent.
     */
    public DataObject getParent();

    /**
     * Gets the data object's children, if any. This may be null to indicate that there
     * are no children.
     */
    public Iterator getChildren();

    /**
     * Gets an InputStream containing the content represented by the DataObject.
     * The returned InputStream is required to support marking (markSupported()
     * returns true). Calling this method multiple times may return references to
     * one and the same InputStream instance. Care should therefore be taken to mark
     * and reset the stream when the stream's content is to be read again later.
     *
     * @return An InputStream from which the content of the data object can be read.
     * @throws IOException If an I/O error occurred.
     */
    public InputStream getContent() throws IOException;

    /**
     * Instructs the DataObject that its content stream will most likely be used multiple
     * times in its entirety, which makes the mark-and-reset procedure impractical,
     * and that it should therefore cache the entire contents.
     *
     * @throws IOException when an IOException occurred during caching of the content.
     */
    public void cacheContent() throws IOException;

    /**
     * Gets the source-specific metadata and data.
     * The keys and values used are implementation-dependent.
     * For Java 1.4 compatibility reasons, the map is untyped.
     * It is already titled RDFMap to reflect our ideas regarding RDF.
     *
     * @return The scheme-specific metadata.
     */
    public RDFMap getMetadata();

    /**
     * Returns the MIME type of the content, if there is content.
     * This is set by the DataAccessor.
     *
     * @return null or a MIME type identifier like "text/plain"
     */
    public String getContentMimeType();

    /**
     * Returns the character encoding (using ansi identifiers like "UTF-8"
     * or "ISO-8859-1") of the content, if there is content. Will
     * return null if not known or if content is null.
     * This is set by the DataAccessor.
     *
     * @return null or an encoding identifier like "UTF-8"
     */
    public String getContentEncoding();
}
}}}
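
The getContent() contract above asks callers to mark and reset the stream whenever its content will be read again later. The following is a minimal sketch of that procedure, not part of the interface. It assumes a plain ByteArrayInputStream as a stand-in for the stream a !DataObject would return, and the class name MarkResetSketch and the helper peek are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; not part of the DataObject API.
class MarkResetSketch {

    /**
     * Reads at most 'limit' bytes from the stream and then rewinds it,
     * so that a later reader sees the stream from the same position.
     */
    public static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(limit);                       // remember the current position
        byte[] buf = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(buf, total, limit - total);
            if (n < 0) {
                break;                        // end of stream reached early
            }
            total += n;
        }
        in.reset();                           // rewind to the marked position
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: stands in for the stream returned by DataObject.getContent().
        InputStream content =
                new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8));

        byte[] first = peek(content, 5);
        byte[] second = peek(content, 5);     // same bytes, because peek() rewound the stream

        System.out.println(new String(first, StandardCharsets.UTF_8));
        System.out.println(Arrays.equals(first, second));
    }
}
```

Peeking like this only works while the number of bytes read stays within the limit passed to mark(). A consumer that needs the whole stream several times would call cacheContent() instead.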