Are web crawlers illegal?

Can I use Octoparse for free?

Yes!

You can download the Octoparse mobile application on your iPhone, iPad, or Android device for free. It is available from the App Store and Google Play, and you can also use the mobile app while travelling abroad to find accommodation or other locations on the map. The mobile version is easy to operate and has a few more features than the desktop version, which makes it a better choice than relying on Google Maps alone.

Can I update/install my GPS device? Yes. Octoparse provides some standard apps that come with the GPS device, such as Navigation (Android only), or you can download the app from another app store and install it on the device.

Does it support multiple languages? Octoparse offers support for more than 200 countries and regions. English is the default language of the application, but the other supported languages are also available.

How does it work? When you start the Android app, you will be guided to the settings.

Can I update without an Internet connection? Only partly: the download itself needs a data connection.

What is the difference between GPS and waypoints? Waypoints are recorded and stored along a navigation route, while GPS records location updates continuously as you follow the route.

Can I import GPS files into the app? Yes, but the files must be in GPX format. If you have GPS files in other formats, such as CSV or KML, convert them to GPX before importing them.

Are web crawlers illegal?

I'm a big proponent of making websites accessible, but I'm also a huge believer that websites should have a human-readable front end.

If all of the time and effort spent making websites accessible is going to be replaced by an automated web crawler, there's no point. The only thing that a web crawler will do is scrape pages for other people, without really doing anything useful for the owner of the site.

I'm sure it's already been covered here before, but is a web crawler considered illegal?

@curtis-nelson:disqus I see your point, but I think that it could still be done legally, as long as you give users a link to the page you've copied. Of course, I can't guarantee that Google, Bing, or any other search engine would do that.

I don't know why people think web scrapers are bad. I'm sure they could benefit many businesses in various ways. It's always struck me as a "the user never knew that their data was being copied" problem more than anything else.

Well, that's one way to look at it. If the consumer doesn't know that they've been copied, then it's perfectly legal (unless something like this happens).

However, if the consumer does know about the scraping, then it's not so legal. Also, I think the law may vary a lot from country to country.

Does Google have a web crawler?

I'm looking for some documentation/proof of the existence of a web crawler within Google.

Does anyone have any knowledge as to whether or not Google does have such a thing? I'm looking for proof because I'm trying to justify why we're using Google's search engine for our web crawler and I want to make sure that Google has a web crawler before spending time implementing one ourselves.

Google does have a web crawler (Googlebot); that's how it finds the pages that show up in its search results. I just found this by Googling it.

Which web crawler is best?

I have a very small website I'd like to crawl.

The goal of the crawler is to extract information from the pages, but not to read or download the content of the page. The crawler should be able to crawl every page in the site and has to stop if a certain page or directory is found.

I've been looking at spidermonkey and lscrawlr, but I'm not sure which one I should use. Are there any other options?

Well, the obvious answer is "the one you like the most". There's no such thing as "best". But I'd say you're in for a challenge here, as the whole point of a crawler is to be able to search for information. It's a bit like the difference between a car and a map - the former is supposed to get you from A to B, while the latter is just supposed to give you directions.

If you don't want to search for information, don't use a web crawler. If you're doing a small site with only a few hundred pages, then spidermonkey might be OK. But if it were me, I'd be looking at something like Nutch.
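For what it's worth, the behaviour described in the question (visit every page on one site, and stop as soon as a certain page or directory turns up) is small enough to sketch with just the Python standard library. This is only a rough sketch under some assumptions, not how any of the tools named above work: the names crawl, LinkParser, fetch and stop_prefix are made up for illustration, links are only taken from plain <a href> tags, and there is no politeness delay or robots.txt handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Default fetcher (an assumption): download a page and return it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, stop_prefix, fetch_page=fetch):
    """Breadth-first crawl of a single site.

    Stops as soon as a URL starting with stop_prefix comes up for processing.
    Returns (visited_urls, stop_url); stop_url is None if the whole site
    was crawled without meeting the stop condition.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []

    while queue:
        url = queue.popleft()
        if url.startswith(stop_prefix):
            return visited, url  # stop condition met; this page is not fetched

        html = fetch_page(url)
        visited.append(url)  # information extraction would happen here

        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and skip anything already queued.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)

    return visited, None
```

You'd call it as crawl("http://example.test/", "http://example.test/private/") and get back the list of pages it visited plus the URL that triggered the stop (or None). Passing your own fetch_page makes it easy to try against a dict of canned pages before pointing it at a real site.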