wiki:ApertureDataAccessor

Version 5 (modified by sauermann, 18 years ago)

--

DataAccessors

  • Resources are identified by URI rather than String, which is more type-safe; hence also the UriNotFoundException. This parallels ApertureDataObject, which is also based on java.net.URI.
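To illustrate the type-safety argument, here is a minimal sketch using only java.net.URI from the JDK. The class UriIdentifierSketch and its method toIdentifier are hypothetical names invented for this example and are not part of Aperture.

```java
import java.net.URI;

// Hypothetical helper, not part of Aperture: shows what a URI-typed identifier buys.
final class UriIdentifierSketch {

    // Parses and normalizes an identifier. Malformed input fails here,
    // at the boundary, instead of deep inside an accessor.
    static URI toIdentifier(String raw) {
        return URI.create(raw).normalize();
    }

    public static void main(String[] args) {
        // Dot segments are resolved by normalize().
        System.out.println(toIdentifier("http://example.org/docs/../index.html"));
        try {
            toIdentifier("not a uri"); // spaces are illegal in a URI
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a plain String parameter, both inputs would be accepted unchecked and the two spellings of the same resource would count as different identifiers.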

TODO: Leo: from my perspective, DataAccessor, DataCrawler and CrawlData are too tightly coupled. The return value is defined in a far too complicated way: "@return A DataObject for the specified URI, or null when an AccessData instance has been specified and the binary resource has not been modified since the last access." A single return value carries too much meaning here. If this is a generic framework, change detection could be left entirely to the DataCrawler, if it is implemented in a datasource-specific way.
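The alternative suggested in this TODO could look roughly like the sketch below: the accessor always returns the resource, and the crawler alone decides whether it has changed. All names here (FetchedResource, PlainAccessor, ChangeDetectingCrawler, visit) are hypothetical and only serve the illustration; this is not the Aperture API.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified result of an access that does no change detection.
final class FetchedResource {
    final URI id;
    final long lastModified;

    FetchedResource(URI id, long lastModified) {
        this.id = id;
        this.lastModified = lastModified;
    }
}

// Hypothetical accessor contract: always returns the resource, never null.
interface PlainAccessor {
    FetchedResource fetch(URI uri);
}

// The crawler owns the change-detection state and decides what counts as modified.
final class ChangeDetectingCrawler {
    private final PlainAccessor accessor;
    private final Map<URI, Long> lastSeen = new HashMap<>();

    ChangeDetectingCrawler(PlainAccessor accessor) {
        this.accessor = accessor;
    }

    // Returns true when the resource is new or modified since the previous visit.
    boolean visit(URI uri) {
        FetchedResource resource = accessor.fetch(uri);
        Long previous = lastSeen.put(resource.id, resource.lastModified);
        return previous == null || previous != resource.lastModified;
    }
}
```

One trade-off of this variant is that the accessor has to touch every resource on every crawl, even unchanged ones. An accessor that sees the previous access data can sometimes skip that work, which may be a reason for the coupling criticized above.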

Java Interface

Probably equal to source:trunk/gnowsis/src/org/gnowsis/data/adapter/CBDAdapter.java

import java.io.IOException;
import java.net.URI;
import java.util.Map;

 /**
  * A DataAccessor provides access to physical resources by creating DataObjects
  * representing the resource, based on a URI and optionally data about a previous access
  * and other parameters.
  */
public interface DataAccessor {

        /**
         * Get a DataObject for the specified URI. The resulting DataObject's ID may differ
         * from the specified URI due to normalization schemes, following of redirected URLs, etc.
         *
         * An AccessData instance can optionally be specified with which the DataAccessor can store
         * and retrieve information about previous accesses to resources. This is mostly useful
         * for DataCrawlers that want to be able to incrementally scan a DataSource.
         * When an AccessData instance is specified, the resulting DataObject can be null,
         * indicating that the binary resource has not been modified since the last access.
         * 
         * A DataAccessor is always required to store something in the AccessData when a
         * URI is accessed, so that afterwards AccessData.isKnownId will return true.
         * 
         * Specific DataAccessor implementations may accept additional parameters through the params Map.
         * 
         * @param uri         The uri used to address the resource.
         * @param source      The source that will be registered as the source of the DataObject.
         * @param accessData  Optional database containing information about previous accesses.
         * @param params      Optional additional parameters needed to access the physical resource.
         * @return A DataObject for the specified URI, or null when an AccessData instance has been
         * specified and the binary resource has not been modified since the last access.
         * @throws UriNotFoundException When the binary resource could not be found.
         * @throws IOException When any other kind of I/O error occurs.
         */
        public DataObject get(URI uri, DataSource source,
            AccessData accessData, Map<?,?> params) throws UriNotFoundException, IOException;
}
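The contract described in the Javadoc can be sketched with a small in-memory implementation. Everything besides the get signature is an assumption made for illustration: DataObject, DataSource, AccessData and UriNotFoundException are reduced to minimal stand-ins (the real Aperture types are richer), the interface is repeated in compact form so the example compiles on its own, and InMemoryAccessor, setResource and the AccessData get/put methods are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins, assumed for illustration only.
class UriNotFoundException extends Exception {
    UriNotFoundException(String message) { super(message); }
}

class DataSource { }

class DataObject {
    final URI id;
    final DataSource source;
    DataObject(URI id, DataSource source) { this.id = id; this.source = source; }
}

// Assumed shape: remembers one "last modified" stamp per id.
class AccessData {
    private final Map<String, Long> stamps = new HashMap<>();
    boolean isKnownId(String id) { return stamps.containsKey(id); }
    Long get(String id) { return stamps.get(id); }
    void put(String id, long stamp) { stamps.put(id, stamp); }
}

// Compact copy of the interface above, repeated so the sketch is self-contained.
interface DataAccessor {
    DataObject get(URI uri, DataSource source, AccessData accessData, Map<?, ?> params)
            throws UriNotFoundException, IOException;
}

// Hypothetical accessor over an in-memory table of URI -> last-modified time.
class InMemoryAccessor implements DataAccessor {
    private final Map<URI, Long> resources = new HashMap<>();

    void setResource(URI uri, long lastModified) {
        resources.put(uri.normalize(), lastModified);
    }

    @Override
    public DataObject get(URI uri, DataSource source, AccessData accessData, Map<?, ?> params)
            throws UriNotFoundException, IOException {
        URI id = uri.normalize(); // the resulting ID may differ from the requested URI
        Long modified = resources.get(id);
        if (modified == null) {
            throw new UriNotFoundException("No resource at " + id);
        }
        if (accessData != null) {
            Long previous = accessData.get(id.toString());
            accessData.put(id.toString(), modified); // always record the access
            if (modified.equals(previous)) {
                return null; // unchanged since the last access
            }
        }
        return new DataObject(id, source);
    }
}
```

A caller therefore has to distinguish three outcomes: a DataObject (new or modified), null (unchanged, only possible when an AccessData was passed), and UriNotFoundException (resource missing).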