Changes between Version 3 and Version 4 of ApertureDataSource


Timestamp: 10/12/05 13:04:33 (19 years ago)
Author: anonymous
Comment: --

  • ApertureDataSource

    --- ApertureDataSource (v3)
    +++ ApertureDataSource (v4)
    @@ -11,47 +11,47 @@
     
     A !DataCrawler is responsible for actually accessing the physical source and
    -reporting the individual information items as DataObjects. Each DataObject
    +reporting the individual information items as !DataObjects. Each !DataObject
     contains all metadata provided by the data source, such as file names,
    -modification dates, etc., as well as the InputStream providing access to
    +modification dates, etc., as well as the !InputStream providing access to
     physical resource.
     
    -We have chosen to distinguish between a DataSource and a DataCrawler as there
    -may be several alternative crawling strategies for a single DataSource type.
    -Consider for example a generic FileSystemCrawler that handles any kind of
    -file system accessible through java.io.File versus a WindowsFileSystemCrawler
    +We have chosen to distinguish between a !DataSource and a !DataCrawler as there
    +may be several alternative crawling strategies for a single !DataSource type.
    +Consider for example a generic !FileSystemCrawler that handles any kind of
    +file system accessible through java.io.File versus a !WindowsFileSystemCrawler
     using OS-native functionality to get notified about file additions, deletions
    -and changes. Another possibility is various DataCrawler implementations that
    +and changes. Another possibility is various !DataCrawler implementations that
     have different trade-offs in speed and accuracy.
     
    -Currently, A DataSource also contains support for writing its configuration
    +Currently, A !DataSource also contains support for writing its configuration
     to or initializing it from an XML file. We might consider putting this in a
     separate utility class, because the best way to store such information is
     often application dependent.
     
    -A DataCrawler creates DataObjects for the individual information items it
    -encounters in the data source. These DataObjects are reported to
    -DataCrawlerListeners registered at the DataCrawler. An abstract base class
    -(DataCrawlerBase) is provided that provides base functionality for
    +A !DataCrawler creates !DataObjects for the individual information items it
    +encounters in the data source. These !DataObjects are reported to
    +!DataCrawlerListeners registered at the !DataCrawler. An abstract base class
    +(!DataCrawlerBase) is provided that provides base functionality for
     maintaining information about which files have been reported in the past,
     allowing for incremental scanning.
     
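Editorial sketch (not part of either revision): the listener and incremental-scanning idea in the paragraph above could look roughly like the Java below. Every name here (SketchDataObject, SketchCrawlerListener, SketchCrawlerBase, InMemorySketchCrawler, RecordingSketchListener) is hypothetical and chosen for illustration; the actual Aperture interfaces may differ. The change test based on a modification timestamp is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DataObject: an identifier plus a modification time.
final class SketchDataObject {
    final String id;
    final long lastModified;

    SketchDataObject(String id, long lastModified) {
        this.id = id;
        this.lastModified = lastModified;
    }
}

// Hypothetical listener; the real DataCrawlerListener interface may differ.
interface SketchCrawlerListener {
    void objectNew(SketchDataObject object);
    void objectChanged(SketchDataObject object);
}

// Hypothetical base class that remembers what was reported in earlier scans.
abstract class SketchCrawlerBase {
    private final List<SketchCrawlerListener> listeners = new ArrayList<>();
    private final Map<String, Long> reported = new HashMap<>();

    void addListener(SketchCrawlerListener listener) {
        listeners.add(listener);
    }

    // Subclasses enumerate the items currently present in the source.
    protected abstract List<SketchDataObject> enumerate();

    // One scan: only new or modified items are reported to the listeners.
    void crawl() {
        for (SketchDataObject object : enumerate()) {
            Long previous = reported.put(object.id, object.lastModified);
            for (SketchCrawlerListener listener : listeners) {
                if (previous == null) {
                    listener.objectNew(object);
                } else if (previous.longValue() != object.lastModified) {
                    listener.objectChanged(object);
                }
            }
        }
    }
}

// Toy crawler over an in-memory list, standing in for a real data source.
final class InMemorySketchCrawler extends SketchCrawlerBase {
    final List<SketchDataObject> items = new ArrayList<>();

    @Override
    protected List<SketchDataObject> enumerate() {
        return new ArrayList<>(items);
    }
}

// Listener that records the events it receives as strings.
final class RecordingSketchListener implements SketchCrawlerListener {
    final List<String> events = new ArrayList<>();

    @Override
    public void objectNew(SketchDataObject object) {
        events.add("new:" + object.id);
    }

    @Override
    public void objectChanged(SketchDataObject object) {
        events.add("changed:" + object.id);
    }
}

final class IncrementalCrawlDemo {
    public static void main(String[] args) {
        InMemorySketchCrawler crawler = new InMemorySketchCrawler();
        RecordingSketchListener listener = new RecordingSketchListener();
        crawler.addListener(listener);

        crawler.items.add(new SketchDataObject("a.txt", 1L));
        crawler.crawl();                 // first scan reports a.txt as new
        crawler.crawl();                 // unchanged, so nothing is reported
        crawler.items.set(0, new SketchDataObject("a.txt", 2L));
        crawler.crawl();                 // timestamp differs, reported as changed

        System.out.println(listener.events);
    }
}
```

Keeping the bookkeeping in the base class means a concrete crawler only has to enumerate its source; whether an item counts as new or changed is decided in one place.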
    -In order to create a DataObject for a single resource encountered by the
    -DataCrawler, a DataAccessor is used. This functionality is kept out of the
    -DataCrawler implementations on purpose because there may be several crawlers
    +In order to create a !DataObject for a single resource encountered by the
    +!DataCrawler, a !DataAccessor is used. This functionality is kept out of the
    +!DataCrawler implementations on purpose because there may be several crawlers
     who can make good use of the same data accessing functionality. A good
    -example is the FileSystemCrawler and HypertextCrawler, which both make use of
    -the FileDataAccessor. Although they arrive at the physical resource in
    +example is the !FileSystemCrawler and !HypertextCrawler, which both make use of
    +the !FileDataAccessor. Although they arrive at the physical resource in
     different ways (by traversing folder trees vs. following links from other
     documents), they can use the same functionality to turn a java.io.File into a
    -FileDataObject.
    +!FileDataObject.
     
    -It should be clear now that a DataCrawler is specific for the kind of
    -DataSource it supports, whereas a DataAccessor is specific for the url
    +It should be clear now that a !DataCrawler is specific for the kind of
    +!DataSource it supports, whereas a !DataAccessor is specific for the url
     scheme(s) it supports.
     
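Editorial sketch (not part of either revision): the split described above, with crawlers tied to a source type and accessors tied to a URL scheme, suggests a lookup keyed on the scheme that any crawler can share. The Java below is a minimal illustration under that assumption. SketchAccessor and SketchAccessorRegistry are hypothetical names, and the String result stands in for a real DataObject.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical accessor: turns a URL into a description of the resource.
// A real accessor would return a DataObject; a String keeps the sketch short.
interface SketchAccessor {
    String access(URI uri);
}

// Hypothetical registry that maps URL schemes to accessors, so that
// different crawlers can reuse the same accessing functionality.
final class SketchAccessorRegistry {
    private final Map<String, SketchAccessor> byScheme = new HashMap<>();

    void register(String scheme, SketchAccessor accessor) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), accessor);
    }

    // Picks the accessor by the URI's scheme and delegates to it.
    String access(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        SketchAccessor accessor = byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (accessor == null) {
            throw new IllegalArgumentException("No accessor for scheme: " + scheme);
        }
        return accessor.access(uri);
    }
}

final class AccessorRegistryDemo {
    public static void main(String[] args) {
        SketchAccessorRegistry registry = new SketchAccessorRegistry();
        registry.register("file", uri -> "file object for " + uri.getPath());
        registry.register("http", uri -> "http object from " + uri.getHost());

        // A file system crawler and a hypertext crawler could both call this.
        System.out.println(registry.access(URI.create("file:///tmp/a.txt")));
        System.out.println(registry.access(URI.create("http://example.org/index.html")));
    }
}
```

With this arrangement a crawler never needs to know how a resource is fetched; it only hands over the URL it arrived at.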
    -The AccessData instance used in DataCrawlerBase maintains the information
    +The !AccessData instance used in !DataCrawlerBase maintains the information
     about which objects have been scanned before. This instance is passed to the
    -DataAccessor as this is the best class to do this detection. For example,
    -this allows the HttpDataAccessor to use HTTP-specific functionality to let
    +!DataAccessor as this is the best class to do this detection. For example,
    +this allows the !HttpDataAccessor to use HTTP-specific functionality to let
     the webserver decide on whether the resource has changed since the last scan,
     preventing an unchanged file from being transported to the crawling side in
    @@ -60,5 +60,5 @@
     == HypertextCrawler ==
     
    -The HypertextCrawler makes use of two external compoments: a mime type
    +The !HypertextCrawler makes use of two external compoments: a mime type
     identifier and a hypertext link extractor. The latter component is required
     to know which resources are linked from a specific resource and should be