Changes between Version 3 and Version 4 of ApertureDataSource
- Timestamp: 10/12/05 13:04:33
ApertureDataSource
 v3  v4
 11  11
 12  12    A !DataCrawler is responsible for actually accessing the physical source and
 13      - reporting the individual information items as DataObjects. Each DataObject
     13  + reporting the individual information items as !DataObjects. Each !DataObject
 14  14    contains all metadata provided by the data source, such as file names,
 15      - modification dates, etc., as well as the InputStream providing access to
     15  + modification dates, etc., as well as the !InputStream providing access to
 16  16    physical resource.
 17  17
 18      - We have chosen to distinguish between a DataSource and a DataCrawler as there
 19      - may be several alternative crawling strategies for a single DataSource type.
 20      - Consider for example a generic FileSystemCrawler that handles any kind of
 21      - file system accessible through java.io.File versus a WindowsFileSystemCrawler
     18  + We have chosen to distinguish between a !DataSource and a !DataCrawler as there
     19  + may be several alternative crawling strategies for a single !DataSource type.
     20  + Consider for example a generic !FileSystemCrawler that handles any kind of
     21  + file system accessible through java.io.File versus a !WindowsFileSystemCrawler
 22  22    using OS-native functionality to get notified about file additions, deletions
 23      - and changes. Another possibility is various DataCrawler implementations that
     23  + and changes. Another possibility is various !DataCrawler implementations that
 24  24    have different trade-offs in speed and accuracy.
 25  25
 26      - Currently, A DataSource also contains support for writing its configuration
     26  + Currently, A !DataSource also contains support for writing its configuration
 27  27    to or initializing it from an XML file. We might consider putting this in a
 28  28    separate utility class, because the best way to store such information is
 29  29    often application dependent.
 30  30
 31      - A DataCrawler creates DataObjects for the individual information items it
 32      - encounters in the data source. These DataObjects are reported to
 33      - DataCrawlerListeners registered at the DataCrawler. An abstract base class
 34      - (DataCrawlerBase) is provided that provides base functionality for
     31  + A !DataCrawler creates !DataObjects for the individual information items it
     32  + encounters in the data source. These !DataObjects are reported to
     33  + !DataCrawlerListeners registered at the !DataCrawler. An abstract base class
     34  + (!DataCrawlerBase) is provided that provides base functionality for
 35  35    maintaining information about which files have been reported in the past,
 36  36    allowing for incremental scanning.
 37  37
 38      - In order to create a DataObject for a single resource encountered by the
 39      - DataCrawler, a DataAccessor is used. This functionality is kept out of the
 40      - DataCrawler implementations on purpose because there may be several crawlers
     38  + In order to create a !DataObject for a single resource encountered by the
     39  + !DataCrawler, a !DataAccessor is used. This functionality is kept out of the
     40  + !DataCrawler implementations on purpose because there may be several crawlers
 41  41    who can make good use of the same data accessing functionality. A good
 42      - example is the FileSystemCrawler and HypertextCrawler, which both make use of
 43      - the FileDataAccessor. Although they arrive at the physical resource in
     42  + example is the !FileSystemCrawler and !HypertextCrawler, which both make use of
     43  + the !FileDataAccessor. Although they arrive at the physical resource in
 44  44    different ways (by traversing folder trees vs. following links from other
 45  45    documents), they can use the same functionality to turn a java.io.File into a
 46      - FileDataObject.
     46  + !FileDataObject.
 47  47
 48      - It should be clear now that a DataCrawler is specific for the kind of
 49      - DataSource it supports, whereas a DataAccessor is specific for the url
     48  + It should be clear now that a !DataCrawler is specific for the kind of
     49  + !DataSource it supports, whereas a !DataAccessor is specific for the url
 50  50    scheme(s) it supports.
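The division of labour described in the page text (a crawler decides which resources to visit, an accessor chosen by URL scheme turns each one into a DataObject, and listeners receive the result) can be illustrated with a small Java sketch. It is not the Aperture API: only the role names DataObject, DataAccessor and DataCrawlerListener come from the text, while every signature, the classes SketchCrawler, TreeCrawler, LinkCrawler and CrawlSketch, and the in-memory maps standing in for a file system and for hyperlinks are assumptions made for illustration.

```java
import java.util.*;

// All signatures here are hypothetical; only the role names come from the text.

/** Simplified DataObject: just the URL and the name of the accessor that built it. */
final class DataObject {
    final String url;
    final String createdBy;
    DataObject(String url, String createdBy) { this.url = url; this.createdBy = createdBy; }
}

/** Turns one URL into a DataObject; tied to a URL scheme, not to a crawler. */
interface DataAccessor { DataObject get(String url); }

/** Receives the DataObjects that a crawler reports. */
interface DataCrawlerListener { void objectFound(DataObject object); }

/** Shared plumbing: listener registration and scheme-based accessor lookup. */
abstract class SketchCrawler {
    private final List<DataCrawlerListener> listeners = new ArrayList<>();
    private final Map<String, DataAccessor> accessorsByScheme;

    SketchCrawler(Map<String, DataAccessor> accessorsByScheme) {
        this.accessorsByScheme = accessorsByScheme;
    }

    void addListener(DataCrawlerListener listener) { listeners.add(listener); }

    /** Picks the accessor by URL scheme and reports the object it creates. */
    protected void report(String url) {
        int colon = url.indexOf(':');
        DataAccessor accessor = colon < 0 ? null : accessorsByScheme.get(url.substring(0, colon));
        if (accessor == null) return; // no accessor for this scheme: skip the resource
        DataObject object = accessor.get(url);
        for (DataCrawlerListener listener : listeners) listener.objectFound(object);
    }

    abstract void crawl(String startUrl);
}

/** Strategy 1: walk a folder tree (folder URL -> child URLs) depth-first. */
final class TreeCrawler extends SketchCrawler {
    private final Map<String, List<String>> children;

    TreeCrawler(Map<String, DataAccessor> accessors, Map<String, List<String>> children) {
        super(accessors);
        this.children = children;
    }

    @Override void crawl(String url) {
        List<String> kids = children.get(url);
        if (kids == null) { report(url); return; } // not a folder, so report it as a file
        for (String kid : kids) crawl(kid);
    }
}

/** Strategy 2: follow links between documents breadth-first, visiting each URL once. */
final class LinkCrawler extends SketchCrawler {
    private final Map<String, List<String>> links;

    LinkCrawler(Map<String, DataAccessor> accessors, Map<String, List<String>> links) {
        super(accessors);
        this.links = links;
    }

    @Override void crawl(String startUrl) {
        Set<String> seen = new HashSet<>(Collections.singleton(startUrl));
        Deque<String> queue = new ArrayDeque<>(Collections.singleton(startUrl));
        while (!queue.isEmpty()) {
            String url = queue.remove();
            report(url);
            for (String next : links.getOrDefault(url, Collections.<String>emptyList())) {
                if (seen.add(next)) queue.add(next);
            }
        }
    }
}

final class CrawlSketch {
    /** One accessor for the "file" scheme, shared by both crawlers. */
    static Map<String, DataAccessor> fileAccessors() {
        Map<String, DataAccessor> accessors = new HashMap<>();
        accessors.put("file", url -> new DataObject(url, "file accessor"));
        return accessors;
    }

    /** Runs a crawler and returns the URLs of the reported objects, in order. */
    static List<String> collect(SketchCrawler crawler, String startUrl) {
        List<String> urls = new ArrayList<>();
        crawler.addListener(object -> urls.add(object.url));
        crawler.crawl(startUrl);
        return urls;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("file:/root", Arrays.asList("file:/root/a.txt", "file:/root/sub"));
        tree.put("file:/root/sub", Arrays.asList("file:/root/sub/b.txt"));
        Map<String, List<String>> links = new HashMap<>();
        links.put("file:/root/a.txt", Arrays.asList("file:/root/sub/b.txt"));
        links.put("file:/root/sub/b.txt", Arrays.asList("file:/root/a.txt"));
        System.out.println(collect(new TreeCrawler(fileAccessors(), tree), "file:/root"));
        System.out.println(collect(new LinkCrawler(fileAccessors(), links), "file:/root/a.txt"));
    }
}
```

In this sketch the two crawlers reach the same two resources by different routes, yet neither contains any code for building a DataObject; supporting another URL scheme would only mean registering another accessor in the map.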
 51  51
 52      - The AccessData instance used in DataCrawlerBase maintains the information
     52  + The !AccessData instance used in !DataCrawlerBase maintains the information
 53  53    about which objects have been scanned before. This instance is passed to the
 54      - DataAccessor as this is the best class to do this detection. For example,
 55      - this allows the HttpDataAccessor to use HTTP-specific functionality to let
     54  + !DataAccessor as this is the best class to do this detection. For example,
     55  + this allows the !HttpDataAccessor to use HTTP-specific functionality to let
 56  56    the webserver decide on whether the resource has changed since the last scan,
 57  57    preventing an unchanged file from being transported to the crawling side in
  …   …
 60  60    == HypertextCrawler ==
 61  61
 62      - The HypertextCrawler makes use of two external compoments: a mime type
     62  + The !HypertextCrawler makes use of two external compoments: a mime type
 63  63    identifier and a hypertext link extractor. The latter component is required
 64  64    to know which resources are linked from a specific resource and should be