Wednesday, November 7, 2007

Spider.NET - a .NET console application which crawls websites


Spider is a .NET console application which crawls websites and saves their content and links to a Microsoft SQL Server database.
This database can then be full-text indexed and queried by ASP (or other applications), for example to power a site search. Multiple sites can be crawled, and each has its own settings (defined as a Project), such as how deep to crawl and what rules to follow. The crawler can also run in a special mode that crawls a site without saving the content, which is useful for link checking.
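
To make the idea of a Project with a crawl depth and a no-save link-checking mode more concrete, here is a rough sketch in Python. It is not Spider.NET's code: the names `Project`, `CrawlResult`, `crawl`, `max_depth` and `save_content` are all made up for illustration, the "site" is a small in-memory dictionary rather than real HTTP requests, and results are kept in memory instead of being written to SQL Server.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical per-site settings (names are illustrative, not Spider.NET's)."""
    start_url: str
    max_depth: int = 2          # how many link hops to follow from start_url
    save_content: bool = True   # False = link-checking mode, content is discarded


@dataclass
class CrawlResult:
    pages: dict = field(default_factory=dict)   # url -> content
    links: list = field(default_factory=list)   # (source_url, target_url)
    broken: list = field(default_factory=list)  # (source_url, target_url)


def crawl(project, fetch):
    """Breadth-first crawl.

    fetch(url) must return (content, links) or None if the page is unreachable.
    Each URL is visited once, so a broken URL is reported for the first page
    that linked to it.
    """
    result = CrawlResult()
    seen = {project.start_url}
    queue = deque([(project.start_url, 0, None)])
    while queue:
        url, depth, source = queue.popleft()
        page = fetch(url)
        if page is None:
            result.broken.append((source, url))
            continue
        content, links = page
        if project.save_content:
            result.pages[url] = content
        for target in links:
            result.links.append((url, target))
            # Only follow links while still within the configured depth.
            if depth < project.max_depth and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1, url))
    return result


# A tiny fake site standing in for real HTTP fetches.
SITE = {
    "/": ("home", ["/a", "/b"]),
    "/a": ("page a", ["/b", "/missing"]),
    "/b": ("page b", ["/"]),
}


def fake_fetch(url):
    return SITE.get(url)


if __name__ == "__main__":
    check = crawl(Project("/", max_depth=2, save_content=False), fake_fetch)
    print("broken links:", check.broken)
    print("pages saved:", len(check.pages))
```

In this sketch the same crawl loop serves both purposes: with `save_content=True` it collects page content and links for later indexing, and with `save_content=False` it only records links and any targets that could not be fetched.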