Forbes is one of the most well-known and trusted sources for business news, insights, and data. The Forbes website contains a wealth of valuable information on companies, people, markets and trends that can provide a competitive advantage if leveraged effectively.
Web scraping lets us extract this data programmatically and at scale so it can be analyzed. Common applications include market research, lead generation, financial analysis, and brand monitoring. Taken together, Forbes data on companies, people, and markets gives a broad view of the business world.
However, scraping data from Forbes is not always straightforward. The dynamic nature of modern websites, anti-bot measures like CAPTCHAs, and legal grey areas around web scraping pose challenges that need to be navigated.
In this guide, we'll dive into the different ways to scrape data from Forbes, with a focus on using the official Forbes APIs. You'll learn the step-by-step process and technical details for extracting clean, structured data that is ready for analysis. Let's get started!
Methods for Scraping Forbes Data
There are three main ways to scrape data from Forbes and other websites:
- Manual Scraping – Copying and pasting data from web pages into a spreadsheet. This is only feasible for small amounts of data.
- Web Scraping Software – Using tools like Octoparse, Parsehub, and Mozenda to automate the process of navigating websites and extracting specific data points. This is a good choice for non-technical users.
- APIs – Accessing structured data directly from Forbes databases via Application Programming Interfaces (APIs). This is the most efficient and reliable way, when APIs are available.
Since Forbes provides official APIs, that will be the focus of this guide. APIs allow you to retrieve data in a standardized format like JSON or XML, which is much easier to parse and use than raw HTML.
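To illustrate why JSON is easier to work with than raw HTML, here is a minimal sketch that parses a small payload with Python's standard library. The payload is made up for illustration. Its field names (companies, name, rank, marketValue) mirror the ones used in this guide's examples, but the real response shape may differ.

```python
import json

# Hypothetical payload; the real API response may be shaped differently.
sample = '{"companies": [{"name": "Example Corp", "rank": 1, "marketValue": 123.4}]}'

# One call turns the text into ordinary dicts and lists
data = json.loads(sample)
first = data["companies"][0]
print(first["name"], first["rank"], first["marketValue"])
```

No selectors or HTML traversal are needed: each value is reached by key.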
Forbes APIs Overview
Forbes offers several RESTful APIs that provide access to their data on companies, people, lists, articles and more. Here are some of the key Forbes APIs:
- Forbes Company API – Provides data on over 2,000 global companies including location, industry, executives, financials and social media accounts.
- Forbes People API – Provides data on Forbes lists of people including billionaires, highest-paid athletes, 30 under 30, and more.
- Forbes Lists API – Metadata about 50+ Forbes lists on companies, people, colleges, places and more.
- Forbes Articles API – Provides headlines, descriptions, authors, published dates, images and more for articles published on Forbes.com.
To access any of the Forbes APIs, you first need to sign up for a free API key. Visit the Forbes developer portal, create an account, and generate your API key. You'll use this key to authenticate every request to the Forbes API. The examples in this guide pass it as the apikey query parameter.
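One way to keep the key out of your source code is to read it from an environment variable. This is a minimal sketch. The variable name FORBES_API_KEY and the helper load_api_key are chosen here for illustration and are not defined by Forbes.

```python
import os


def load_api_key(env_var: str = "FORBES_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

Calling load_api_key() at the start of a script fails fast with a clear message if the key is missing, instead of sending an unauthenticated request.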
Making API Requests to Forbes
The basic format for Forbes API requests is:
https://api.forbes.com/v1/resource?apikey=YOUR_API_KEY
Let's look at an example Python code snippet that retrieves the Forbes Global 2000 company list and extracts some key data points:
import requests

api_key = "YOUR_API_KEY"
api_url = f"https://api.forbes.com/v1/company/global2000/2022?apikey={api_key}"

# Request the list and stop early on an HTTP error
response = requests.get(api_url, timeout=30)
response.raise_for_status()
data = response.json()

# Print the rank, name, and market value of each company
for company in data["companies"]:
    name = company["name"]
    rank = company["rank"]
    market_value = company["marketValue"]
    print(f"{rank}. {name} - ${market_value} B")
This code does the following:
- Sets the API key and constructs the URL for the Forbes Global 2000 list
- Makes a GET request to the API endpoint
- Parses the JSON response
- Loops through each company and extracts/prints the name, rank, and market value
You can customize the fields retrieved by specifying them in the URL parameters. For example, to get more details on each company, you could use:
api_url = f"https://api.forbes.com/v1/company/global2000/2022?fields=name,rank,uri,country,industry,marketValue,sales,profits,assets&limit=2000&apikey={api_key}"
Refer to the Forbes API documentation for the full list of available fields, parameters, and example requests for each API.
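Hand-writing long query strings is error-prone, so the URL can also be assembled from its parts. The sketch below assumes the base URL and the fields, limit, and apikey parameters shown in this guide. The helper name build_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.forbes.com/v1"  # base URL as used in this guide


def build_url(resource, api_key, fields=None, limit=None):
    """Build a request URL for a resource path such as 'company/global2000/2022'."""
    params = {}
    if fields:
        params["fields"] = ",".join(fields)  # comma-separated field list
    if limit is not None:
        params["limit"] = limit
    params["apikey"] = api_key
    # safe="," keeps the commas in the field list readable instead of %2C
    query = urlencode(params, safe=",")
    return f"{BASE_URL}/{resource.strip('/')}?{query}"


print(build_url("company/global2000/2022", "YOUR_API_KEY", ["name", "rank"], 10))
```

Keeping the field list as a Python list makes it easy to add or remove fields without touching the rest of the URL.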
Handling Rate Limits
To prevent abuse and ensure fair usage, the Forbes APIs are rate limited. Free API keys are limited to 100 requests per day across all Forbes APIs.
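If you exceed the rate limit, your API requests will return a 429 "Too Many Requests" error. To reduce the risk, pause between requests. Keep in mind that pausing only spaces requests out. It does not raise the daily quota, so you still need to keep your total request count under the limit.
Here's an example of adding a 5-second wait between each request: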
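import time
import requests

api_url = "https://api.forbes.com/v1/company/global2000/2022?apikey=YOUR_API_KEY"

for _ in range(100):
    response = requests.get(api_url, timeout=30)
    data = response.json()
    time.sleep(5)  # wait 5 seconds before the next request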
For higher rate limits, contact the Forbes team to discuss enterprise licensing. Always comply with Forbes' API terms of use.
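A fixed pause does not react to a 429 once it happens. One common alternative is to retry with a growing delay, as in the sketch below. It assumes the server may send a Retry-After header with a number of seconds, which this guide has not confirmed for Forbes. The helper get_with_retry is hypothetical, and its get argument is any callable shaped like requests.get.

```python
import time


def get_with_retry(get, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call get(url), retrying with a growing delay while the status is 429."""
    for attempt in range(max_attempts):
        response = get(url, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()  # surface other HTTP errors
            return response
        # Prefer the server's Retry-After hint when it is a whole number of
        # seconds; otherwise double the delay on every attempt.
        hint = str(response.headers.get("Retry-After", ""))
        delay = float(hint) if hint.isdigit() else base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")


# Usage with the requests library (not executed here):
#   import requests
#   response = get_with_retry(requests.get, api_url)
```

Backoff helps with short bursts. If the daily quota is already used up, retrying will keep failing until the quota resets.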
Storing and Analyzing Forbes API Data
Once you've extracted the desired data from the Forbes APIs, you'll likely want to store it for future analysis. Some common storage options are:
- CSV files – For small to medium data sets, saving the data to a CSV file is quick and easy
- SQL databases – For larger data sets, using a SQL database like MySQL or PostgreSQL provides fast querying
- NoSQL databases – If your data is unstructured or semi-structured, a NoSQL database like MongoDB may be a better fit
- Cloud storage – For easy scaling and access from anywhere, you can store your data in cloud platforms like AWS S3, Google Cloud Storage or Azure Blob Storage
The best choice depends on the structure, volume, and querying needs of your Forbes data.
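The first two options above can be sketched with Python's standard library alone. SQLite stands in here for a server database such as MySQL or PostgreSQL. The record fields (rank, name, marketValue) follow this guide's earlier example, and the function names and table layout are assumptions made for illustration.

```python
import csv
import sqlite3

FIELDS = ["rank", "name", "marketValue"]  # fields assumed from this guide's example


def write_csv(companies, fileobj):
    """Write company dicts as CSV rows, ignoring any fields not in FIELDS."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(companies)


def write_sqlite(companies, conn):
    """Insert or update company rows in a 'companies' table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS companies ("
            "list_rank INTEGER PRIMARY KEY, name TEXT, market_value REAL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO companies VALUES (:rank, :name, :marketValue)",
            companies,
        )


# Usage (not executed here):
#   with open("global2000.csv", "w", newline="", encoding="utf-8") as f:
#       write_csv(data["companies"], f)
#   write_sqlite(data["companies"], sqlite3.connect("forbes.db"))
```

Because the rank is the primary key, running the SQLite step again updates existing rows instead of duplicating them.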
With the data extracted and stored, you can begin analyzing it for insights. Some examples of analyses you could perform:
- Identifying the fastest growing companies or industries
- Analyzing diversity among executives and boards
- Comparing company financials across regions
- Sentiment analysis on company news and articles
- Building machine learning models to predict stock prices or corporate actions
Popular analytics tools for working with web-scraped data include Excel, R, Python (pandas, NumPy, Matplotlib), Tableau, and Power BI.
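As a small example of comparing company financials across regions, the sketch below averages one numeric field per country using only the standard library. The country and marketValue keys follow the field names used earlier in this guide, and the sample records are invented. In pandas the same idea would typically be a groupby followed by a mean.

```python
from collections import defaultdict
from statistics import mean


def average_by_country(companies, field="marketValue"):
    """Return the mean of `field` per country, skipping records without a value."""
    groups = defaultdict(list)
    for company in companies:
        value = company.get(field)
        if value is not None:
            groups[company.get("country", "Unknown")].append(value)
    return {country: round(mean(values), 2) for country, values in groups.items()}


# Invented sample records, for illustration only
sample = [
    {"name": "A", "country": "US", "marketValue": 10.0},
    {"name": "B", "country": "US", "marketValue": 20.0},
    {"name": "C", "country": "JP", "marketValue": 5.0},
    {"name": "D", "country": "JP", "marketValue": None},
]
print(average_by_country(sample))
```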
Conclusion
Web scraping Forbes data offers immense value for business intelligence when done properly. While there are challenges to extracting data from dynamic websites, the Forbes APIs provide a convenient way to access clean, structured data at scale.
By following this guide and the Forbes API documentation, you can efficiently scrape valuable data on companies, markets and business leaders. As always, be sure to respect any website's terms of service, robots.txt and applicable laws when scraping.
We'd love to hear how you're using Forbes data! Share your applications and insights in the comments.