I came across Carlos Perez’s blog, manageability.org, while Googling for some research today. Carlos had a great list of open source web crawlers that included JSpider, a tool I have used for error checking on web sites.
JSpider is written entirely in Java and can be configured extensively for spidering, error checking and downloading. It obeys robots.txt files, of course (http://www.robotstxt.org/wc/norobots-rfc.txt), and honors the additional options set in its configuration.
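To make the robots.txt point concrete, here is a minimal sketch of the kind of check a polite crawler runs before fetching a page. It is not JSpider's code: the class `RobotsCheck` and its methods are hypothetical names made up for illustration, it assumes a modern Java compiler rather than J2SE 1.3, and it only handles `User-agent: *` groups with plain `Disallow` prefixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of JSpider.
public class RobotsCheck {

    /** Collects the Disallow prefixes from groups that include "User-agent: *". */
    public static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inWildcardGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Drop trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) {
                    inWildcardGroup = false; // a new group of rules starts here
                }
                if (value.equals("*")) {
                    inWildcardGroup = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    /** A path is allowed unless it starts with one of the disallowed prefixes. */
    public static boolean isAllowed(String path, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = "User-agent: *\nDisallow: /private/\nDisallow: /tmp/ # scratch\n";
        List<String> prefixes = disallowedPrefixes(robotsTxt);
        System.out.println("/index.html allowed? " + isAllowed("/index.html", prefixes));
        System.out.println("/private/report.html allowed? " + isAllowed("/private/report.html", prefixes));
    }
}
```

A real crawler would also match its own user-agent name and fetch robots.txt once per host; the sketch leaves both out to keep the prefix check visible.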
I thought the added downloading option was nice as I had been using a separate application to pull down entire web sites for offline use. Now this can be accomplished with the JSpider engine.
The tool has a plug-in architecture, so users can develop custom extensions that fit JSpider to their needs (and perhaps contribute them back to the project). JSpider is released under the LGPL.
JSpider requires a J2SE 1.3+ runtime and an installed XML parser (Xerces, …); JDK 1.4 comes with one. The app will run on any system that supports Java and meets these requirements.
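If you are unsure whether your machine meets these requirements, a quick check along the following lines may help. This is only a sketch: `EnvCheck` is a made-up class name, and it assumes the XML parser is located through the standard JAXP lookup, which the JSpider documentation would need to confirm.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical environment check for illustration; not shipped with JSpider.
public class EnvCheck {

    /** Returns the class name of the XML parser factory that JAXP resolves. */
    public static String describeParser() {
        // newInstance() throws FactoryConfigurationError if no parser is available.
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("JAXP parser factory: " + describeParser());
    }
}
```

If the second line prints a factory class name rather than failing with an error, an XML parser is present on the classpath or bundled with the runtime.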
JSpider even provides a simple sample site to test against once you are up and running. A fairly comprehensive 120-page user manual is also available in PDF format.
Original article: https://www.sitepoint.com/run-your-own-spider/