Changes between Version 3 and Version 4 of ApertureDataSource

Timestamp:: 10/12/05 13:04:33
Author:: anonymous
Comment:: --

ApertureDataSource

-                      v3
+                      v4
 A !DataCrawler is responsible for actually accessing the physical source and
+reporting the individual information items as !DataObjects. Each !DataObject
 contains all metadata provided by the data source, such as file names,
+modification dates, etc., as well as the !InputStream providing access to
 the physical resource.
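To make the description of a `DataObject` concrete, here is a minimal Java sketch, assuming metadata can be modelled as a plain `Map<String, String>`. The class `SketchDataObject`, its methods and the metadata keys are hypothetical names chosen for illustration; Aperture's actual `DataObject` interface is not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a DataObject (NOT Aperture's actual
// interface): an identifier, the metadata reported by the data source, and a
// stream giving access to the physical resource.
class SketchDataObject {
    private final String id;
    private final Map<String, String> metadata;
    private final InputStream content;

    SketchDataObject(String id, Map<String, String> metadata, InputStream content) {
        this.id = id;
        // Read-only copy, so later changes to the caller's map are not visible here.
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
        this.content = content;
    }

    String getId() { return id; }
    Map<String, String> getMetadata() { return metadata; }
    InputStream getContent() { return content; }
}

class SketchDataObjectDemo {
    public static void main(String[] args) throws IOException {
        Map<String, String> meta = new HashMap<>();
        meta.put("fileName", "notes.txt"); // hypothetical metadata keys
        meta.put("size", "5");
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        SketchDataObject obj = new SketchDataObject("file:/tmp/notes.txt", meta, in);
        System.out.println(obj.getMetadata().get("fileName"));
        // Consuming the stream is left to the caller (readAllBytes needs Java 9+).
        System.out.println(new String(obj.getContent().readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Running the demo prints `notes.txt` followed by `hello`. Keeping the stream separate from the metadata means a consumer can inspect the metadata without reading the content at all.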
+We have chosen to distinguish between a !DataSource and a !DataCrawler as there
+may be several alternative crawling strategies for a single !DataSource type.
+Consider, for example, a generic !FileSystemCrawler that handles any kind of
+file system accessible through java.io.File versus a !WindowsFileSystemCrawler
 using OS-native functionality to get notified about file additions, deletions
+and changes. Another possibility is to have several !DataCrawler implementations
 that offer different trade-offs between speed and accuracy.
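The split between a source description and its crawling strategies can be sketched as follows. This is a hypothetical illustration, not Aperture's API: `FolderSource` plays the role of a `DataSource` (it only says *what* to crawl), and the two `FolderCrawler` implementations are alternative strategies for *how* to crawl it, trading completeness for speed. The native-notification crawler mentioned above is not modelled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a DataSource: it only describes WHAT to crawl.
class FolderSource {
    final Path root;

    FolderSource(Path root) {
        this.root = root;
    }
}

// Hypothetical stand-in for a DataCrawler: one strategy for HOW to crawl a FolderSource.
interface FolderCrawler {
    List<Path> crawl(FolderSource source);
}

// Strategy 1: full recursive traversal; complete, but slower on large trees.
class RecursiveFolderCrawler implements FolderCrawler {
    @Override
    public List<Path> crawl(FolderSource source) {
        try (Stream<Path> paths = Files.walk(source.root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Strategy 2: top-level listing only; faster, but misses files in subfolders.
class ShallowFolderCrawler implements FolderCrawler {
    @Override
    public List<Path> crawl(FolderSource source) {
        try (Stream<Path> paths = Files.list(source.root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a folder containing `a.txt` and `sub/b.txt`, the recursive crawler reports both files while the shallow one reports only `a.txt`; the `FolderSource` itself is identical in both cases.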
+Currently, a !DataSource also supports writing its configuration
 to, and initializing it from, an XML file. We might consider moving this into a
 separate utility class, because the best way to store such information is
 often application-dependent.
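A separate utility class of the kind suggested above could look like the following minimal sketch. It assumes the configuration can be flattened to string key/value pairs and uses the standard `java.util.Properties` XML format (`storeToXML` / `loadFromXML`); the class name `DataSourceConfigStore` and the example keys are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical utility that keeps XML persistence out of the DataSource classes.
// An application that prefers another storage format can replace this class
// without touching the data sources themselves.
final class DataSourceConfigStore {
    private DataSourceConfigStore() {}

    // Serializes string key/value configuration using the Properties XML format.
    static byte[] toXml(Map<String, String> config) throws IOException {
        Properties props = new Properties();
        props.putAll(config);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.storeToXML(out, "data source configuration");
        return out.toByteArray();
    }

    // Parses XML produced by toXml back into a plain map.
    static Map<String, String> fromXml(byte[] xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml));
        Map<String, String> config = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            config.put(key, props.getProperty(key));
        }
        return config;
    }
}
```

A round trip such as `fromXml(toXml(cfg))` returns a map equal to `cfg`, so the data source only needs to expose its settings as a map.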
+A !DataCrawler creates !DataObjects for the individual information items it
+encounters in the data source. These !DataObjects are reported to
+!DataCrawlerListeners registered with the !DataCrawler. An abstract base class
+(!DataCrawlerBase) provides common functionality for
 maintaining information about which files have been reported in the past,
 allowing for incremental scanning.
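The listener mechanism and the bookkeeping for incremental scanning described above can be sketched like this. The names `CrawlListener` and `IncrementalCrawlerBase`, and the idea of a per-item "version" such as a last-modified time, are hypothetical simplifications; they are not Aperture's `DataCrawlerListener` or `DataCrawlerBase`. Deleted items are not detected in this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener interface, notified by a crawler for every item it sees.
interface CrawlListener {
    void objectNew(String id);
    void objectChanged(String id);
    void objectUnchanged(String id);
}

// Hypothetical base class: remembers what was reported on earlier runs
// (id -> version) so that subclasses only need to enumerate current items.
abstract class IncrementalCrawlerBase {
    private final List<CrawlListener> listeners = new ArrayList<>();
    private final Map<String, Long> reported = new HashMap<>();

    void addListener(CrawlListener listener) {
        listeners.add(listener);
    }

    // Subclasses return the items currently in the source as id -> version.
    protected abstract Map<String, Long> listItems();

    void crawl() {
        for (Map.Entry<String, Long> item : listItems().entrySet()) {
            String id = item.getKey();
            // put() returns the version recorded on the previous crawl, or null.
            Long previous = reported.put(id, item.getValue());
            for (CrawlListener listener : listeners) {
                if (previous == null) {
                    listener.objectNew(id);
                } else if (!previous.equals(item.getValue())) {
                    listener.objectChanged(id);
                } else {
                    listener.objectUnchanged(id);
                }
            }
        }
    }
}
```

Crawling the same item three times, with its version bumped before the second run, yields the events new, changed and unchanged, in that order.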
+In order to create a !DataObject for a single resource encountered by the
+!DataCrawler, a !DataAccessor is used. This functionality is kept out of the
+!DataCrawler implementations on purpose because there may be several crawlers
 that can make good use of the same data access functionality. For
+example, the !FileSystemCrawler and !HypertextCrawler both use
+the !FileDataAccessor. Although they arrive at the physical resource in
 different ways (by traversing folder trees vs. following links from other
 documents), they can use the same functionality to turn a java.io.File into a
+!FileDataObject.
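The reuse described above, where two crawlers reach files differently but share one accessor, can be sketched as follows. All names (`FileRecord`, `FileAccessor`, `TreeCrawler`, `LinkListCrawler`) are hypothetical; `LinkListCrawler` takes a ready-made list of files as a stand-in for real link extraction, which is outside the scope of this sketch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a FileDataObject.
class FileRecord {
    final String url;
    final long size;
    final long lastModified;

    FileRecord(String url, long size, long lastModified) {
        this.url = url;
        this.size = size;
        this.lastModified = lastModified;
    }
}

// Hypothetical stand-in for a FileDataAccessor: the single place that knows
// how to turn a java.io.File into a FileRecord.
class FileAccessor {
    FileRecord access(File file) {
        return new FileRecord(file.toURI().toString(), file.length(), file.lastModified());
    }
}

// Crawler A reaches files by traversing a folder tree.
class TreeCrawler {
    private final FileAccessor accessor;

    TreeCrawler(FileAccessor accessor) {
        this.accessor = accessor;
    }

    List<FileRecord> crawl(File folder) {
        List<FileRecord> records = new ArrayList<>();
        File[] children = folder.listFiles();
        if (children == null) {
            return records; // not a folder, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                records.addAll(crawl(child));
            } else {
                records.add(accessor.access(child));
            }
        }
        return records;
    }
}

// Crawler B reaches files another way; a plain list stands in for links
// extracted from other documents. It reuses the same accessor.
class LinkListCrawler {
    private final FileAccessor accessor;

    LinkListCrawler(FileAccessor accessor) {
        this.accessor = accessor;
    }

    List<FileRecord> crawl(List<File> linkedFiles) {
        List<FileRecord> records = new ArrayList<>();
        for (File file : linkedFiles) {
            records.add(accessor.access(file));
        }
        return records;
    }
}
```

Whichever crawler finds a given file, the resulting `FileRecord` is built by the same code, so both produce the same URL and size for it.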
+It should now be clear that a !DataCrawler is specific to the kind of
+!DataSource it supports, whereas a !DataAccessor is specific to the URL
 scheme(s) it supports.
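Because accessors are tied to URL schemes rather than to source types, a crawler can look one up from the URL it has in hand. The registry below is a minimal sketch of that lookup using `java.net.URI.getScheme()`; `SchemeAccessor` and `AccessorRegistry` are hypothetical names, and whether Aperture performs this lookup through a comparable registry is not stated here.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical accessor contract: does something with a resource given its URI.
interface SchemeAccessor {
    String describe(URI uri);
}

// Hypothetical registry that selects an accessor by URL scheme, so any crawler
// can hand it whatever URLs it encounters.
class AccessorRegistry {
    private final Map<String, SchemeAccessor> byScheme = new HashMap<>();

    void register(String scheme, SchemeAccessor accessor) {
        // URI schemes are case-insensitive, so normalize the key.
        byScheme.put(scheme.toLowerCase(Locale.ROOT), accessor);
    }

    SchemeAccessor forUri(URI uri) {
        String scheme = uri.getScheme();
        SchemeAccessor accessor =
                scheme == null ? null : byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (accessor == null) {
            throw new IllegalArgumentException("No accessor for scheme: " + scheme);
        }
        return accessor;
    }
}
```

Registering one accessor under `file` and another under `http` (and `https`) lets a file-system crawler and a hypertext crawler share the same lookup, with an exception for schemes nobody has registered.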
+The !AccessData instance used in !DataCrawlerBase maintains the information
 about which objects have been scanned before. This instance is passed to the
+!DataAccessor, as it is the class best placed to detect changes. For example,
+this allows the !HttpDataAccessor to use HTTP-specific functionality to let
 the web server decide whether the resource has changed since the last scan,
 preventing an unchanged file from being transported to the crawling side in
 …
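The change detection described above can be illustrated for local files with a minimal sketch: the accessor reads the previously stored modification time from the access data and skips the file if it is unchanged. `SimpleAccessData`, `ChangeAwareFileAccessor` and the `lastModified` key are hypothetical. For HTTP, the analogous step would be a conditional request (for instance via `URLConnection.setIfModifiedSince`, with the server answering 304 Not Modified), which is not shown here.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AccessData: per-resource key/value memory kept between scans.
class SimpleAccessData {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    String get(String id, String key) {
        Map<String, String> values = store.get(id);
        return values == null ? null : values.get(key);
    }

    void put(String id, String key, String value) {
        store.computeIfAbsent(id, k -> new HashMap<>()).put(key, value);
    }
}

// Hypothetical file accessor that consults the access data to skip unchanged files.
class ChangeAwareFileAccessor {
    static final String LAST_MODIFIED = "lastModified"; // hypothetical key

    // Returns true (and records the current timestamp) if the file is new or has a
    // different modification time than on the last scan; false if it is unchanged.
    boolean needsProcessing(File file, SimpleAccessData accessData) {
        String id = file.toURI().toString();
        String current = Long.toString(file.lastModified());
        String previous = accessData.get(id, LAST_MODIFIED);
        if (current.equals(previous)) {
            return false;
        }
        accessData.put(id, LAST_MODIFIED, current);
        return true;
    }
}
```

The first scan of a file reports it as needing processing, a second scan without modifications does not, and any change to the file's modification time makes it eligible again.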
 == HypertextCrawler ==
+The !HypertextCrawler uses two external components: a MIME type
 identifier and a hypertext link extractor. The latter component is required
 to know which resources are linked from a specific resource and should be