Web Crawler Test

I am the seed / home page: index.html

The purpose of this site is to make it easy to see what effect changing Heritrix crawl settings has on what is captured. The filenames listed in the log file, together with a site diagram, should be enough to tell you exactly what the crawler did and did not capture.
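Comparing a crawl against the site diagram amounts to a set difference between the files you expected and the files that were captured. The following is a minimal sketch that assumes you have already pulled the captured URLs out of the log file (for example, the URI column of a Heritrix crawl log). The function name `compare_capture` and the example.com URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def compare_capture(expected_files, captured_urls):
    """Split the expected files into captured and missed, and flag extras.

    expected_files: relative paths from the site diagram (hypothetical input).
    captured_urls: absolute URLs already extracted from the crawl log.
    """
    # Reduce each captured URL to a relative path such as "dir1/hop1.html".
    captured_paths = {urlparse(u).path.lstrip("/") for u in captured_urls}
    expected = set(expected_files)
    return {
        "captured": sorted(expected & captured_paths),
        "missed": sorted(expected - captured_paths),
        "unexpected": sorted(captured_paths - expected),
    }


if __name__ == "__main__":
    report = compare_capture(
        ["index.html", "dir1/hop1.html", "dir1/hop2.html"],
        ["http://example.com/index.html", "http://example.com/dir1/hop1.html"],
    )
    print(report["missed"])  # → ['dir1/hop2.html']
```

Working on paths rather than full URLs means a capture from a different host alias still lines up with the diagram. That is a simplifying assumption you may want to drop.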

I link to:

Also, just for kicks, I link to:

Robots Exclusions

Robots exclusion steps have been taken. These are the files you should NOT be able to capture:
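A crawler that obeys robots exclusions checks every URL against the site's rules before fetching it. The following is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the exclusions are expressed in a robots.txt file, because the actual steps used on this site are not listed here. The robots.txt content, the `/robots/excluded/` path, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the real file on this site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /robots/excluded/
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs a robots.txt-compliant crawler should skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    urls = [
        "http://example.com/index.html",
        "http://example.com/robots/excluded/secret.html",
    ]
    print(blocked_urls(ROBOTS_TXT, urls))
```

If a crawl with robots compliance switched on still captures any of the URLs this kind of check flags, the compliance setting is not doing what you expect.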

Links

Each directory contains 7 files, each linked one hop further from the home page than the last. Additionally, there are a few files that use other linking patterns.
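Hop distance is the number of links followed from the seed, and a breadth-first traversal computes it. The following is a minimal sketch of how a crawl scope that limits hops from the seed would decide which files in one directory get captured. The linear chain and the `dir1/hopN.html` filenames are hypothetical stand-ins for the real site layout. The sketch also ignores the other linking patterns and any embedded-resource rules a real crawler applies.

```python
from collections import deque


def chain_site(directory="dir1", n=7):
    """Hypothetical link graph: index.html -> hop1 -> hop2 -> ... -> hopN."""
    links = {"index.html": [f"{directory}/hop1.html"]}
    for i in range(1, n):
        links[f"{directory}/hop{i}.html"] = [f"{directory}/hop{i + 1}.html"]
    links[f"{directory}/hop{n}.html"] = []
    return links


def captured(links, seed, max_hops):
    """Breadth-first crawl from seed, keeping pages within max_hops link hops.

    Returns a dict mapping each captured page to its hop distance.
    """
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # do not follow links beyond the hop limit
        for target in links.get(page, []):
            if target not in hops:
                hops[target] = hops[page] + 1
                queue.append(target)
    return hops


if __name__ == "__main__":
    site = chain_site()
    print(len(captured(site, "index.html", 3)))  # → 4
```

Lowering the hop limit by one should drop exactly one more file from each chain. This one-file-per-hop behavior is what lets the filenames in the log show where the limit cut off.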