Version 8 (modified by sauermann, 20 years ago)
DataObject
ToDo: the getMetadata method should probably return some kind of RDF statement container (the same interface that will be used in Extractor) instead of a Map with key-value pairs, the keys of which are specific to the type of DataObject/DataSource.
Leo on the ToDo: OK with me to return an RDFMap.
ChangeLog:
- 'Metadata' is one word, hence getMetadata, not getMetaData.
Java Interface
/**
* A general interface for data objects. A data object consists of an identifier,
* binary content and metadata. The object is used primarily to extract
* information from data sources. For the extraction, both the InputStream
* returned by getContent() and
* the RDF metadata returned by getMetadata() are important.
* In structured data sources that are not file-based, the getContent() method
* will return null, and all structured data of the object is represented
* in the object returned by getMetadata().
*
*/
public interface DataObject {
/**
* Gets the data object's primary identifier.
*
* @return An identifier for this data object.
*/
public URI getID();
/**
* Returns the byte size of the represented resource. It is defined at this
* global level because the size is important for performance.
* @return the size of the binary resource in bytes, or a negative value when the
* size is unknown or does not make sense for this particular DataObject implementation.
*/
public long getSize();
/**
* Gets the DataSource from which this DataObject conceptually originated.
*
* @return The DataSource from which this DataObject conceptually originated.
*/
public DataSource getDataSource();
/**
* Gets the data object's parent, if any.
*
* @return the parent DataObject, or null when this DataObject has no parent.
*/
public DataObject getParent();
/**
* Gets the data object's children, if any.
*
* @return an Iterator over the child DataObjects, or null to indicate that there
* are no children.
*/
public Iterator<DataObject> getChildren();
/**
* Gets an InputStream containing the content represented by the DataObject.
* The returned InputStream is required to support marking (markSupported()
* returns true). Calling this method multiple times may return references to
* one and the same InputStream instance. Care should therefore be taken to mark
* and reset the stream when the stream's content is to be read again later.
*
* @return An InputStream from which the content of the data object can be read.
* @throws IOException If an I/O error occurred.
*/
public InputStream getContent() throws IOException;
/**
* Instructs the DataObject that its content stream will most likely be read in its
* entirety multiple times, which makes the mark-and-reset procedure impractical,
* and that it should therefore cache the entire content.
* @throws IOException If an I/O error occurred while caching the content.
*/
public void cacheContent() throws IOException;
/**
* Gets the source-specific metadata and data.
* The keys and values used are implementation-dependent.
* For Java 1.4 compatibility reasons, the map is untyped.
* It is already named RDFMap to reflect our ideas regarding RDF.
*
* @return The scheme-specific metadata.
*/
public RDFMap getMetadata();
/**
* Gets the MIME type of the content, if there is content.
* This is set by the DataAccessor.
* @return null or a MIME type identifier like "text/plain"
*/
public String getContentMimeType();
/**
* Gets the character encoding of the content (a charset name such as "UTF-8"
* or "ISO-8859-1"), if there is content. Returns null if the encoding is
* not known or if the content is null.
* This is set by the DataAccessor.
* @return null or an encoding identifier like "UTF-8"
*/
public String getContentEncoding();
}
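Example: mark and reset on the content stream

The getContent() contract above says the stream supports marking and that repeated calls may hand back the same instance. The following is a minimal sketch of how a caller might peek at the first bytes without consuming them for later readers. The class MarkResetSketch and the method peek are hypothetical names for this illustration only, and a BufferedInputStream over a byte array stands in for the stream a real DataObject would return.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetSketch {

    /**
     * Reads at most 'limit' bytes from the stream and then rewinds it to the
     * position it had on entry. Hypothetical helper, not part of DataObject.
     */
    public static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(limit); // remember the current position for up to 'limit' bytes
        try {
            byte[] buffer = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buffer, total, limit - total);
                if (n < 0) {
                    break; // end of stream reached before 'limit' bytes
                }
                total += n;
            }
            byte[] result = new byte[total];
            System.arraycopy(buffer, 0, result, 0, total);
            return result;
        } finally {
            in.reset(); // rewind so the next reader sees the same bytes
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for DataObject.getContent(): any stream with markSupported() == true.
        InputStream content = new BufferedInputStream(
                new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8)));
        String first = new String(peek(content, 5), StandardCharsets.UTF_8);
        String second = new String(peek(content, 5), StandardCharsets.UTF_8);
        System.out.println(first + " / " + second); // prints "hello / hello"
    }
}
```

This only works while the number of bytes read stays within the limit passed to mark(). When the whole stream has to be read several times, that is the case cacheContent() is meant for.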

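Example: walking children with a null check

Since getChildren() may return null instead of an empty Iterator, callers have to check for null before iterating. The sketch below shows one way to do a depth-first walk under that contract. TreeNode and SimpleNode are hypothetical, trimmed-down stand-ins that only model the ID and the children. The real DataObject returns a URI from getID() and has further methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChildrenWalkSketch {

    /** Hypothetical stand-in for the tree-related part of DataObject. */
    public interface TreeNode {
        String getID();                   // the real interface returns a URI
        Iterator<TreeNode> getChildren(); // may be null when there are no children
    }

    /** Minimal in-memory implementation, used only for this example. */
    public static class SimpleNode implements TreeNode {
        private final String id;
        private final List<TreeNode> children;

        public SimpleNode(String id, TreeNode... children) {
            this.id = id;
            this.children = Arrays.asList(children);
        }

        public String getID() {
            return id;
        }

        public Iterator<TreeNode> getChildren() {
            // Mirror the documented contract: null signals "no children".
            return children.isEmpty() ? null : children.iterator();
        }
    }

    /** Collects the IDs of a node and all its descendants, depth first. */
    public static List<String> collectIds(TreeNode root) {
        List<String> ids = new ArrayList<String>();
        walk(root, ids);
        return ids;
    }

    private static void walk(TreeNode node, List<String> ids) {
        ids.add(node.getID());
        Iterator<TreeNode> it = node.getChildren();
        if (it == null) {
            return; // leaf node: nothing to iterate
        }
        while (it.hasNext()) {
            walk(it.next(), ids);
        }
    }

    public static void main(String[] args) {
        TreeNode root = new SimpleNode("folder",
                new SimpleNode("a.txt"),
                new SimpleNode("sub", new SimpleNode("b.txt")));
        System.out.println(collectIds(root)); // prints [folder, a.txt, sub, b.txt]
    }
}
```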