Changes between Version 13 and Version 14 of ApertureSimpleDataCrawler


Timestamp:
10/20/05 11:42:35
Author:
sauermann
Comment:

--

  • ApertureSimpleDataCrawler

    v13 v14  
    73 73 Here's a new idea that in my opinion merges this idea with our own architecture. Create a super interface of !DataObject (Resource? - has a strong RDF association. Entity? - has other associations here at Aduna). !DataObject then gets a sibling named Folder. Crawlers do not only produce !DataObject instances, they produce instances of its supertype. This way, crawlers that crawl data sources with an intrinsic hierarchy can return Folder instances, which contain all metadata of the Folder, similar to how !DataObjects contain metadata of that object. Similarly, we can introduce other !DataObject siblings for capturing table- or graph-related metadata that is not specific to a single !DataObject. Crawler-using applications that have no interest in this information can simply ignore these events. Also, the crawler interface itself does not need to specify folder-/graph-/table-specific information.
    74 74
    75    '''Leo> I like this very much, the superclass and sibling idea. I will create objects accordingly'''
       75 '''Leo> I like this very much, the superclass and sibling idea. I will create objects accordingly'''. We had exactly the same problem of the "graph" structure that was hidden somewhere inside file or folder objects. If we use something like ApertureDataObjectFile and ApertureDataObjectFolder objects to capture and divide this semantic information, perfect. I would argue for ApertureDataObjectFile for things that are 'like a file', so attachments, web pages, web files and local files would all fall into this category. I would suggest restricting ApertureDataObjectFolder to real folders, such as file folders, Outlook folders or IMAP folders. For things like "attachments inside an email" I would still use the getChildren() idea, and not the Folder thing, although it may be nice if an email with attachments is both a Folder and a File.
    76 76
    77 77 In our use case this also facilitates metadata indexing, because currently our !MetadataFetcher (the class transforming the information inside a !DataObject to RDF statements) interprets the document URIs and "reinvents" the folder hierarchy, modeling it as Resources with a partOf relation. This would then no longer be necessary: the Folder instance would already contain all necessary information.
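
The supertype-and-sibling proposal discussed above could look roughly like the following Java sketch. It is only an illustration under stated assumptions, not the Aperture API: the names CrawledEntity, CrawlerListener, SimpleEntity, SimpleDataObject, SimpleFolder, DataObjectOnlyListener and SketchCrawler are hypothetical, and DataObject here is an empty stand-in for the real interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical supertype; the discussion leaves its name open (Resource? Entity?).
interface CrawledEntity {
    String getUri();
    Map<String, String> getMetadata();
}

// Stand-in for the existing DataObject; the real interface is not reproduced here.
interface DataObject extends CrawledEntity { }

// Proposed sibling of DataObject that carries the folder's own metadata.
interface Folder extends CrawledEntity { }

// Hypothetical callback: crawlers report the supertype, not only DataObjects.
interface CrawlerListener {
    void entityFound(CrawledEntity entity);
}

// Shared plumbing for the two illustrative implementations below.
abstract class SimpleEntity implements CrawledEntity {
    private final String uri;
    private final Map<String, String> metadata = new LinkedHashMap<>();

    SimpleEntity(String uri) { this.uri = uri; }

    public String getUri() { return uri; }
    public Map<String, String> getMetadata() { return metadata; }
}

final class SimpleDataObject extends SimpleEntity implements DataObject {
    SimpleDataObject(String uri) { super(uri); }
}

final class SimpleFolder extends SimpleEntity implements Folder {
    SimpleFolder(String uri) { super(uri); }
}

// An application with no interest in folders simply skips those events.
final class DataObjectOnlyListener implements CrawlerListener {
    final List<String> indexedUris = new ArrayList<>();

    public void entityFound(CrawledEntity entity) {
        if (entity instanceof DataObject) {
            indexedUris.add(entity.getUri());
        }
    }
}

// Toy crawler: replays a prepared list instead of walking a real data source.
final class SketchCrawler {
    static void crawl(List<? extends CrawledEntity> source, CrawlerListener listener) {
        for (CrawledEntity entity : source) {
            listener.entityFound(entity);
        }
    }
}
```

In this sketch the crawler interface itself never mentions folders; a consumer such as the !MetadataFetcher could instead check for Folder instances and read the hierarchy from their metadata rather than reconstructing it from document URIs.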