64 | | The !HypertextCrawler makes use of two external compoments: a mime type |
65 | | identifier and a hypertext link extractor. The latter component is required |
66 | | to know which resources are linked from a specific resource and should be |
67 | | crawled next. This functionality is realized as a separate component/service |
68 | | as there are many document types that support links (PDF might be a nice one |
69 | | to support next). A specific link extractor is thus mimetype-specific. |
70 | | However, in order to know which link extractor to use, one first needs to |
71 | | know the mime type of the starting resource, which is handled by the first |
72 | | component. |
| 64 | The !HypertextCrawler makes use of two external components: a mime type |
| 65 | identifier and a hypertext link extractor. |
| 66 | |
| 67 | The latter component is required to know which resources are linked from a specific resource and should be crawled next. This functionality is realized as a separate component/service as there are many document types that support links (PDF might be a nice one to support next). |
| 68 | |
| 69 | A specific link extractor is therefore mimetype-specific. In order to know which link extractor to use, one first needs to know the mime type of the starting resource, which is handled by the first component. |
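
The two-step lookup described above (identify the mime type first, then pick the matching link extractor) can be sketched as follows. This is only an illustrative sketch, not the !HypertextCrawler's actual code: the names `identify_mime_type`, `extract_html_links`, `LINK_EXTRACTORS` and `links_to_crawl` are hypothetical, and the mime type detection is a deliberately naive content sniff.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


def identify_mime_type(data: bytes) -> str:
    """Hypothetical mime type identifier: naive sniffing of the first bytes."""
    head = data[:512].lstrip().lower()
    if head.startswith(b"%pdf-"):
        return "application/pdf"
    if head.startswith(b"<!doctype html") or b"<html" in head:
        return "text/html"
    return "application/octet-stream"


class _AnchorCollector(HTMLParser):
    """Collects the href values of all <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_html_links(data: bytes, base_url: str) -> List[str]:
    """Hypothetical link extractor for text/html; resolves relative links."""
    parser = _AnchorCollector()
    parser.feed(data.decode("utf-8", errors="replace"))
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]


# Registry of link extractors, keyed by mime type (assumed design).
# Supporting PDF later would mean adding an "application/pdf" entry.
LINK_EXTRACTORS: Dict[str, Callable[[bytes, str], List[str]]] = {
    "text/html": extract_html_links,
}


def links_to_crawl(data: bytes, base_url: str) -> List[str]:
    """Identify the mime type, then delegate to the matching extractor."""
    mime_type = identify_mime_type(data)
    extractor = LINK_EXTRACTORS.get(mime_type)
    if extractor is None:
        return []  # no extractor registered for this mime type
    return extractor(data, base_url)
```

Keeping the extractors in a registry keyed by mime type mirrors the reasoning in the text: the crawler itself stays unaware of document formats, and a new format only needs a new extractor entry.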