Version 8 (modified by sauermann, 19 years ago)

DataObject

ToDo: the getMetadata method should probably return some kind of RDF statement container (the same interface that will be used in Extractor) instead of a Map with key-value pairs, the keys of which are specific to the type of DataObject/DataSource.

Leo, regarding the ToDo: returning an RDFMap is fine with me.

ChangeLog:

  • 'Metadata' is one word, hence getMetadata, not getMetaData.

Java Interface

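import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Iterator;

/**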
 * A general interface for data objects. A data object consists of an identifier,
 * binary content and metadata. The object is used primarily to extract
 * information from data sources. For the extraction, both the InputStream
 * returned by getContent() and the RDF metadata returned by getMetadata()
 * are important. In structured data sources that are not file-based,
 * getContent() will return null, and all structured data of the object is
 * represented in the object returned by getMetadata().
 * 
 */
public interface DataObject {

        /**
         * Gets the data object's primary identifier.
         * 
         * @return An identifier for this data object.
         */
        public URI getID();

        /**
         * Returns the byte size of the represented resource. It is defined at this
         * general level because knowing the size is important for performance.
         * @return the size of the binary resource in bytes, or a negative value when the
         * size is unknown or does not make sense for this particular DataObject implementation.
         */
        public long getSize();

        /**
         * Gets the DataSource from which this DataObject conceptually originated.
         * 
         * @return The DataSource from which this DataObject conceptually originated.
         */
        public DataSource getDataSource();
    
        /**
         * Gets the data object's parent, if any.
         * 
         * @return the parent DataObject, or null when this DataObject has no parent.
         */
        public DataObject getParent();
    
        /**
         * Gets the data object's children, if any.
         *
         * @return an Iterator over the child DataObjects, or null to indicate that
         * there are no children.
         */
        public Iterator<DataObject> getChildren();
    
        /**
         * Gets an InputStream containing the content represented by the DataObject.
         * The returned InputStream is required to support marking (markSupported()
         * returns true). Calling this method multiple times may return references to
         * one and the same InputStream instance. Care should therefore be taken to mark
         * and reset the stream when the stream's content is to be read again later.
         * 
         * @return An InputStream from which the content of the data object can be read.
         * @throws IOException If an I/O error occurred.
         */
        public InputStream getContent() throws IOException;

        /**
         * Instructs the DataObject that its content stream will most likely be read
         * multiple times in its entirety, which makes the mark-and-reset procedure
         * impractical, and that it should therefore cache the entire content.
         * @throws IOException when an I/O error occurred during caching of the content.
         */
        public void cacheContent() throws IOException;
    
        /**
         * Gets the source-specific metadata and data.
         * The keys and values used are implementation-dependent.
         * For Java 1.4 compatibility reasons, the map is untyped.
         * It is already named RDFMap to reflect our ideas regarding RDF.
         * 
         * @return The scheme-specific metadata.
         */
        public RDFMap getMetadata();

        /**
         * Gets the MIME type of the content, if there is content.
         * This is set by the DataAccessor.
         * @return null or a MIME type identifier like "text/plain"
         */
        public String getContentMimeType();

        /**
         * Gets the character encoding of the content (using IANA charset names
         * such as "UTF-8" or "ISO-8859-1"), if there is content. Will
         * return null if not known or if content is null.
         * This is set by the DataAccessor.
         * @return null or an encoding identifier like "UTF-8"
         */
        public String getContentEncoding();
}
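
The getContent() contract above asks callers to mark and reset the stream when its content will be read again later. The following is a minimal sketch of that procedure, not part of Aperture itself: the class MarkResetSketch and its peek helper are hypothetical names, and a ByteArrayInputStream stands in for the stream a DataObject would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class MarkResetSketch {

    /**
     * Reads up to 'limit' bytes from the stream and then rewinds it, so that
     * the next reader of the same InputStream instance sees the same bytes.
     */
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            // The DataObject contract requires mark support; fail loudly otherwise.
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(limit); // remember the current position for up to 'limit' bytes
        byte[] buffer = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(buffer, total, limit - total);
            if (n < 0) {
                break; // end of stream reached before 'limit' bytes
            }
            total += n;
        }
        in.reset(); // rewind to the marked position
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for dataObject.getContent(); the real stream comes from a DataAccessor.
        InputStream content =
                new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8));

        byte[] first = peek(content, 5);
        byte[] second = peek(content, 5); // same bytes again, because peek() reset the stream

        System.out.println(new String(first, StandardCharsets.UTF_8));  // prints "hello"
        System.out.println(new String(second, StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

This approach only suits bounded look-ahead, because the mark is valid for at most the number of bytes passed to mark(). When the whole content is needed more than once, cacheContent() is the method the interface provides for that case.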