The purpose of this site is to make it easy to see what effect changing Heritrix crawl settings has on what is captured. The filenames listed in the log file, along with a site diagram, should be enough to tell you exactly what the crawler did and did not capture.
I link to:
- Same directory as home: 1hop.html
- 1 subdirectory below home: 1SubDirA_1hop.html
- 1 subdirectory below home: 1SubDirB_1hop.html
- 2 subdirectories below home: 2SubDirA1_1hop.html
- 2 subdirectories below home: 2SubDirA2_1hop.html
- 2 subdirectories below home: 2SubDirB_1hop.html
- 3 subdirectories below home: 3SubDirA_1hop.html
- 3 subdirectories below home (1 below me): 3SubDirB_1hop.html
- 4 subdirectories below home: 4SubDir_1hop.html
- 5 subdirectories below home: 5SubDir_1hop.html
- 6 subdirectories below home: 6SubDirA_1hop.html
- 6 subdirectories below home: 6SubDirB_1hop.html
Also, just for kicks, I link to:
Robots Exclusions
The following robots exclusion steps have been taken:
- A META tag has been placed directly in 7hops.html preventing it from being captured. It does not link to anything.
- A META tag has been placed in 1subDirA/2subDirA2/2subDirA2_1hop.html preventing the capture of this file and any dependent files.
- A robots.txt file has been placed in the top directory preventing the capture of all files from 1subDirB/2subDirB/, and any dependent files.
- 7hops.html
- 1subDirB/2subDirB/2subDirB_1hop.html
- 1subDirB/2subDirB/2subDirB_2hops.html
- 1subDirB/2subDirB/2subDirB_3hops.html
- 1subDirB/2subDirB/2subDirB_4hops.html
- 1subDirB/2subDirB/2subDirB_5hops.html
- 1subDirB/2subDirB/2subDirB_6hops.html
- 1subDirB/2subDirB/2subDirB_7hops.html
- 1subDirA/2subDirA1/3subDirA/4subDir/5subDir/6subDirA/6subDirA_4hops.html
- 1subDirA/2subDirA1/3subDirA/4subDir/5subDir/6subDirA/6subDirA_5hops.html
- 1subDirA/2subDirA1/3subDirA/4subDir/5subDir/6subDirA/6subDirA_6hops.html
- 1subDirA/2subDirA1/3subDirA/4subDir/5subDir/6subDirA/6subDirA_7hops.html
- 1subDirA/2subDirA2/2subDirA2_1hop.html
- 1subDirA/2subDirA2/2subDirA2_2hops.html
- 1subDirA/2subDirA2/2subDirA2_3hops.html
- 1subDirA/2subDirA2/2subDirA2_4hops.html
- 1subDirA/2subDirA2/2subDirA2_5hops.html
- 1subDirA/2subDirA2/2subDirA2_6hops.html
- 1subDirA/2subDirA2/2subDirA2_7hops.html
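The two exclusion mechanisms described in this section (a robots.txt rule and a robots META tag) can be checked with a short Python sketch that uses only the standard library. The robots.txt contents, the META tag attributes, and the helper names below are assumptions made for illustration; the site's actual files may be worded differently.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: the text only says that everything under
# 1subDirB/2subDirB/ is excluded, not how the rule is written.
ROBOTS_TXT = """\
User-agent: *
Disallow: /1subDirB/2subDirB/
"""

# Assumed META tag: the exact attribute values are not given in the text.
PAGE_WITH_META = (
    '<html><head><meta name="robots" content="noindex,nofollow">'
    "</head><body></body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_txt_allows(path, robots_txt=ROBOTS_TXT, agent="heritrix"):
    """Return True if the robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


def meta_directives(html):
    """Return the set of robots META directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(robots_txt_allows("/1subDirB/2subDirB/2subDirB_1hop.html"))
    print(robots_txt_allows("/1subDirA/1subDirA_1hop.html"))
    print(sorted(meta_directives(PAGE_WITH_META)))
```

A crawler that honors both mechanisms would skip any path for which `robots_txt_allows` returns False, and would neither store nor follow links from a page whose META directives include `noindex` and `nofollow`.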
Links
Each directory contains 7 files, each linked a successive number of hops (1 through 7) from the home page. Additionally, there are a few files that use other linking patterns.
- Each directory contains a "not linked" file that is not directly or indirectly linked to the home page. The "not linked" file in 2subDirA1 links to the "not linked" file in the top directory as well as the one in 6subDirA, forming a small web not connected to the home page.
- 2subDirB_3hops.html links to 6subDirB_4hops.html. 6subDirB_3hops.html does not.
- 4subDir_1hop.html links to 6subDirB_2hops.html. 6subDirB_1hop.html does not.
- 2subDirA1_2hops.html links to 2subDirB_3hops.html. So does 2subDirB_2hops.html.
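The link structure described above can be modelled as a small directed graph and searched breadth-first to predict how many hops a crawler needs to reach each file. The Python sketch below is illustrative only. The home page name `index.html`, the helper names, and the assumption that each N-hop file links to the (N+1)-hop file in the same directory (except where the list above says otherwise) are not stated in the original. The "not linked" files are left out because their file names are not given.

```python
from collections import deque

HOME = "index.html"  # hypothetical name for the home page

# File-name prefixes, one per directory ("" is the top directory).
PREFIXES = ["", "1subDirA", "1subDirB", "2subDirA1", "2subDirA2", "2subDirB",
            "3subDirA", "3subDirB", "4subDir", "5subDir", "6subDirA", "6subDirB"]


def hop_file(prefix, n):
    """Build a name such as '2subDirB_3hops.html' ('3hops.html' at the top)."""
    suffix = "1hop.html" if n == 1 else f"{n}hops.html"
    return f"{prefix}_{suffix}" if prefix else suffix


def build_links():
    """Return {page: set of pages it links to} under the stated assumptions."""
    links = {HOME: set()}
    for prefix in PREFIXES:
        # The home page links to the 1-hop file of every directory.
        links[HOME].add(hop_file(prefix, 1))
        # Assumed default chain: N hops -> N+1 hops within a directory.
        for n in range(1, 7):
            links.setdefault(hop_file(prefix, n), set()).add(hop_file(prefix, n + 1))
        links.setdefault(hop_file(prefix, 7), set())

    # Exceptions listed in the text.
    links["6subDirB_1hop.html"].discard("6subDirB_2hops.html")
    links["4subDir_1hop.html"].add("6subDirB_2hops.html")
    links["6subDirB_3hops.html"].discard("6subDirB_4hops.html")
    links["2subDirB_3hops.html"].add("6subDirB_4hops.html")
    links["2subDirA1_2hops.html"].add("2subDirB_3hops.html")
    return links


def hop_counts(links, start=HOME):
    """Breadth-first search: minimum number of hops from `start` to each page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in sorted(links.get(page, ())):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


if __name__ == "__main__":
    for page, hops in sorted(hop_counts(build_links()).items()):
        print(hops, page)
```

Under these assumptions each file's computed hop count still matches the number in its name, even for the 6subDirB files that are reached through another directory. Comparing such a table with a crawl's maximum-hops setting gives the set of files the crawler would be expected to reach, before robots exclusions are applied.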