How does ScrapingBee work?

How to scrape data using ScrapingBee?

(A basic Guide)

How do you scrape data from a website with Scrapy? It's actually quite easy. All you need is an API URL that lets you fetch the dataset in JSON, XML, or CSV format.
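To make the three formats concrete, here is a minimal sketch of turning a response body into a list of records. The function name parse_payload and the sample payloads are illustrative assumptions, and the XML branch assumes one child element per record. Fetching is left out so the example runs offline; in practice the text would come from the body of an HTTP response.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_payload(text, fmt):
    """Turn a raw API response body into a list of dict records."""
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        # The first CSV row is treated as the header.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "xml":
        root = ET.fromstring(text)
        # Assumes one child element per record, one sub-element per field.
        return [{field.tag: field.text for field in record} for record in root]
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    print(parse_payload('[{"name": "bee", "count": 2}]', "json"))
    print(parse_payload("name,count\nbee,2\n", "csv"))
    print(parse_payload(
        "<rows><row><name>bee</name><count>2</count></row></rows>", "xml"))
```

Note that CSV and XML values come back as strings, while JSON keeps numbers as numbers, so downstream code may need to convert types.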

You could follow the tutorial here, but if you are still new to scraping, I suggest you use the Scrapy library instead. Open your command prompt (not the Python interpreter) and run pip install scrapy.

Scrapy has some prerequisites, such as Twisted, pyOpenSSL, and lxml (for parsing the response from the target website). pip normally installs them for you; if you already have them, please update them first. Then you can run the spider like this:

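scrapy crawl spiderName -a domains=www.example.com

This is the same as running the spider, just from the command line inside your project directory. Scrapy has no --allow-domain option; -a passes an argument to the spider (here a hypothetical one named domains), and it is up to the spider to read it and restrict itself, normally through its allowed_domains attribute. Scraping does not require authentication by default. The argument can hold a single domain or a comma-separated list of several.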

As a last step before we can use Scrapy to retrieve data from the target website, we need to configure the starting URLs. In Scrapy these normally live in the spider's start_urls attribute, but you could also keep them in a settings file inside the directory where you are placing your project. Here is an example configuration that sets the starting URLs for the target website:

from scrapy.settings import Settings

# Custom names used by this guide; they are not built-in Scrapy settings.
DSPIDER = 'spiderName'
STARTURL = 'www.com'
DBSPIDER = 'dbSpider'
DB = 'example'
EXCEPTIONSTARTURL = 'www.exception.com'

Just like this, we have configured the URLs and their corresponding spiders. These are the spiders we will use to parse data from the target website.

Let's look at the function that scrapy uses to crawl data.

How does ScrapingBee work?

ScrapingBee is a fully featured scraper designed to let a user search and extract information from websites (e.g. Wikipedia) that would otherwise be very difficult to scrape. It takes care of the technical side of fetching pages; it does not by itself resolve the privacy, copyright, and licensing issues around the content, which are covered later in this article.

The ScrapingBee software is distributed as freeware. What's in the box? The ScrapingBee box includes the following items.

ScrapingBee Server: The ScrapingBee Server is what enables a user to search and scrape various sites. It is built around a headless browser such as PhantomJS (or another JavaScript engine) and is designed to work either locally or in a local network where a number of users share a single server.

The ScrapingBee Server automatically detects the operating system and architecture of the computer running it, and can be used from either a web browser or a terminal application. It comes with a configuration manager that lets you enable, disable, and configure features.

It is possible to start your ScrapingBee instance listening on a specific IP address and/or port number. The instance can also optionally listen on a second port, which can be useful if you are in a private network where the scraped site is hosted on another box, or when you are running your own server farm.

PhantomJS: The ScrapingBee Server relies on a headless browser called PhantomJS. PhantomJS is an open source, scriptable browser built on WebKit, created by Ariya Hidayat and first released in January 2011; its development was suspended in 2018. It runs a full browser engine without a GUI, which makes it a good candidate for a real-time or in-process page viewer.

PhantomJS is used by ScrapingBee Server to load and parse external pages on behalf of users or your code. It is made specifically to interact with all kinds of websites, and it features built-in networking support, so no extra plugins are required: just send a URL to it and you're done.
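As an illustration of "send a URL and you're done", here is a minimal sketch of building such a request. It assumes ScrapingBee's hosted HTTP API rather than a locally installed server: the endpoint and the api_key, url, and render_js parameter names are my understanding of that API and should be checked against the official documentation. The function name build_request_url is made up for this example, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint of ScrapingBee's hosted HTTP API; verify before use.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_request_url(api_key, target_url, render_js=True):
    """Build the GET URL that asks the service to fetch target_url."""
    params = {
        "api_key": api_key,
        "url": target_url,  # urlencode percent-encodes the target URL
        "render_js": "true" if render_js else "false",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR-API-KEY" is a placeholder, not a working credential.
    print(build_request_url(
        "YOUR-API-KEY", "https://en.wikipedia.org/wiki/Web_scraping"))
```

Passing the resulting URL to any HTTP client would perform the actual call; the response body would then be the HTML of the target page.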

Is data scraping legal?

Is it ethical?

And how about privacy?

In the past I've written about the importance of getting data out of websites. So why is it that we're still having so much trouble with data scraping, and what can we do to make it legal, ethical and safe for you to scrape the data you want? This article is an attempt to provide a few answers to these questions, and a little guidance on where to start.

The Legal Basics. The first thing to say is that scraping data from web pages is not illegal in itself. If you're just pulling a little text, or a few images that you own the rights to, then you're generally fine.

However, if you scrape or mine data without permission, you may be breaking the law. You don't need to scrape every site on the internet: just scrape what's useful for your project, and leave the rest alone.

So how does this work? Well, let's consider a few ways. Data Mining. If you scrape data for a data mining project, you're usually not going to be running the scraper by hand. Instead you're likely to have scripts sitting in a virtual machine on your network doing the scraping for you.

I'll talk more about this later, but you're most likely going to be running these scripts on your own server. You may not be 'mining' anything at all. A lot of projects just create maps or timelines using information already scraped from other websites. But you may be trying to generate some new kind of data, such as a map of all the public transportation options in your city, or a timeline of how many tickets people buy.

Either way, you're generating new data from existing data. It's unlikely to be illegal.

Web Scraping. You might also be interested in a project where you take an existing dataset and use it to build something new. This is where you can start to run into problems.

Sometimes this kind of project isn't really trying to create any new data. It's just copying data from a website and pasting it into a spreadsheet.