Unless I am mistaken, there is no support for the Robots Exclusion Protocol when crawling a site.
The Robots Exclusion Protocol, or robots.txt protocol, is a convention that lets a website ask cooperating web spiders and other web robots not to access all or part of the site, even though it is otherwise publicly viewable.
It would be interesting to have an easy-to-use class to work with robots.txt files (a robots.txt parser).
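As an illustration of what such an easy-to-use class could look like, here is a short sketch using Python's standard-library `urllib.robotparser` (the sample rules and the `MyCrawler` user-agent name are made up for the example):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (normally fetched from https://example.com/robots.txt)
rules = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A crawler would consult the parser before fetching each URL
print(parser.can_fetch("MyCrawler", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("MyCrawler", "https://example.com/public/page.html"))   # True
```

A class along these lines, wired into the crawler so each candidate URL is checked before it is fetched, would cover the common case.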