Web scraping often requires logging into websites in order to access pages and data only available to authenticated users. However, programmatically logging into websites can be challenging. You need to handle form submissions, manage cookies and authentication tokens, and deal with CAPTCHAs and other anti-bot measures.
Fortunately, the ScrapingBee web scraping API service provides some handy features to simplify logging into websites when scraping with Node.js. In this tutorial, we'll walk through three different methods to log in to websites using ScrapingBee and Node.js:
- Automating logins with a JavaScript scenario
- Logging in via a direct POST request
- Authenticating by attaching cookies
We'll provide detailed, up-to-date code examples for each approach. To follow along, you'll need a ScrapingBee API key.
Method 1: JavaScript Login Scenario
With this method, we instruct ScrapingBee to run a series of actions to automate interacting with a website's login form, just like a human user would. Here's an example using the official scrapingbee Node.js SDK:
const scrapingbee = require('scrapingbee');
const fs = require('fs');

const client = new scrapingbee.ScrapingBeeClient('YOUR_API_KEY');

// The sequence of actions ScrapingBee's headless browser runs on the page
const loginScenario = {
  instructions: [
    {fill: ["#username", "your-username"]},
    {fill: ["input[name='password']", "your-password"]},
    {click: "button[type='submit']"},
    {wait: 1000}
  ]
};

client.get({
  url: 'https://example.com/login',
  params: {
    js_scenario: loginScenario,
    screenshot: 'true'
  }
})
  .then((response) => fs.writeFileSync('after-login.png', response.data))
  .catch(console.error);
This tells ScrapingBee to:
- Navigate to the login page
- Fill in the username and password fields
- Click the submit button
- Wait 1 second for the page to load
- Take a screenshot to confirm we're logged in
The CSS selectors used to target the form fields and button may need to be adapted for the specific site you're logging into.
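For example, if a site uses dynamic IDs, a sketch like the following may be more robust. The selectors here are hypothetical placeholders, and it swaps the fixed wait for ScrapingBee's wait_for instruction, which pauses until a given selector appears on the page:
// Hypothetical selectors - inspect the target's login form for the real ones
const loginScenario = {
  instructions: [
    {fill: ["input[name='email']", "your-username"]},
    {fill: ["input[name='password']", "your-password"]},
    {click: "form button[type='submit']"},
    // Wait for an element that only exists once you're logged in
    {wait_for: "#account-menu"}
  ]
};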
Pros:
- Fully automates the login process
- Mimics normal user behavior, less likely to be detected as a bot
Cons:
- May not work if the login form is complex or uses dynamic IDs
- Slow, as it loads pages in a full browser environment
Method 2: Direct Login POST Request
If you inspect the network activity while logging in, you'll see that the login form ultimately submits a POST request with the username and password to a specific URL. We can replicate that POST request to log in more directly.
Here's how to implement this using the node-fetch library. Note that ScrapingBee expects its own parameters (such as api_key and url) in the query string, and forwards the method and body of your request on to the target page:
const fetch = require('node-fetch');

// ScrapingBee's own parameters go in the query string; the request
// method and body are forwarded to the target URL.
const apiParams = new URLSearchParams({
  api_key: 'YOUR_API_KEY',
  url: 'https://example.com/login'
});

fetch(`https://app.scrapingbee.com/api/v1/?${apiParams}`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/x-www-form-urlencoded'
  },
  // The login credentials, form-encoded as a browser would submit them
  body: new URLSearchParams({
    username: 'your-username',
    password: 'your-password'
  })
})
  .then(response => response.text())
  .then(body => console.log(body))
  .catch(console.error);
We send a POST request to the ScrapingBee API with our API key and the login URL in the query string; ScrapingBee then replays our POST, including the form-encoded login credentials, against the login page.
The HTTP response will include any cookies set, which you could extract and pass to subsequent requests to stay logged in.
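As a rough sketch, here's one way you might capture those cookies with node-fetch. Whether the target site's Set-Cookie headers are passed through on the ScrapingBee response depends on the API's header-forwarding behavior, so treat that as an assumption to verify against the docs:
const fetch = require('node-fetch');

const apiParams = new URLSearchParams({
  api_key: 'YOUR_API_KEY',
  url: 'https://example.com/login'
});

fetch(`https://app.scrapingbee.com/api/v1/?${apiParams}`, {
  method: 'POST',
  headers: {'Content-Type': 'application/x-www-form-urlencoded'},
  body: new URLSearchParams({username: 'your-username', password: 'your-password'})
})
  .then(response => {
    // node-fetch exposes repeated headers (like Set-Cookie) via headers.raw().
    // Assumption: the target's Set-Cookie headers survive on this response.
    const setCookies = response.headers.raw()['set-cookie'] || [];
    // Keep just the name=value pairs, ready to send back as a Cookie header
    const cookieHeader = setCookies.map(c => c.split(';')[0]).join('; ');
    console.log('Cookies to reuse:', cookieHeader);
  })
  .catch(console.error);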
Pros:
- Faster, no need to load pages or run JavaScript
- More reliable for simple login forms
Cons:
- Requires figuring out the exact POST parameters to send
- Easier for sites to flag as bot traffic, so riskier
Method 3: Login with Cookies
If you can get hold of the session cookie a site sets when you log in, you can attach it to your scraping requests to authenticate them.
First, log into the target site manually with your browser. Then use the developer tools to find and copy the value of the session cookie. It will likely have a name like sessid or auth_token.
Here's how to use that cookie value with ScrapingBee to access a page requiring authentication:
const scrapingbee = require('scrapingbee');

const client = new scrapingbee.ScrapingBeeClient('YOUR_API_KEY');

client.get({
  url: 'https://example.com/private',
  // Cookies are passed as name/value pairs and attached to the request
  cookies: {session_id: 'your-session-cookie-value'}
})
  .then(response => console.log(response.data))
  .catch(console.error);
Pros:
- Very simple, no need to fiddle with complex login sequences
- Efficient, just attach cookie to requests as needed
Cons:
- Requires manually logging in and extracting the session cookie
- Session cookies can expire, requiring you to repeat the process
Tips for Handling Login Challenges
Some websites employ additional login challenges to prevent bots and other abuse. Here are some common ones and how to deal with them:
- CAPTCHAs – ScrapingBee has built-in support for solving reCAPTCHA v2 and v3 challenges. Enable it by adding solve_recaptcha: true to your requests.
- Two-Factor Authentication – If you have access to the email/phone number, you can manually retrieve the code and provide it as part of a JavaScript scenario. ScrapingBee also supports automating 2FA with its 2captcha integration.
- Rare User Agents/Devices – ScrapingBee allows you to specify a custom User-Agent header and even a specific device to emulate, via the user_agent and device parameters (see the sketch after this list).
- JavaScript Rendering – Some sites require JavaScript to be enabled to work properly. Make sure JavaScript rendering (the render_js parameter, enabled by default) is on in your ScrapingBee requests.
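As a quick illustration, here's how such parameters are passed with the Node SDK. device and render_js are documented ScrapingBee parameters; treat the exact names of the CAPTCHA-, 2FA-, and User-Agent-related options above as things to verify against the current docs:
const scrapingbee = require('scrapingbee');

const client = new scrapingbee.ScrapingBeeClient('YOUR_API_KEY');

client.get({
  url: 'https://example.com/login',
  params: {
    device: 'desktop',  // or 'mobile'
    render_js: 'true'   // JS rendering; on by default, shown for clarity
  }
})
  .then(response => console.log(response.data))
  .catch(console.error);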
Choosing a Login Method
Which login approach to use depends on the complexity of the site you're logging into. In general, start by trying the direct POST method, as it's the simplest and most efficient.
If the login requires executing JavaScript or has additional fields and interactions, try the automated JavaScript scenario approach.
Use the cookies approach for simple authentication if you‘re able to easily grab a session cookie manually.
Mix and match these techniques as needed. For example, you could log in with a JavaScript scenario once to get a session cookie, then attach that cookie to future requests.
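Here's a sketch of that combined flow using the Node SDK. The cookie name session_id is hypothetical, and it assumes the target's Set-Cookie header is visible on the ScrapingBee response, which you should verify against the API docs:
const scrapingbee = require('scrapingbee');

const client = new scrapingbee.ScrapingBeeClient('YOUR_API_KEY');

async function scrapePrivatePage() {
  // Step 1: log in once with a JavaScript scenario
  const login = await client.get({
    url: 'https://example.com/login',
    params: {
      js_scenario: {
        instructions: [
          {fill: ["#username", "your-username"]},
          {fill: ["input[name='password']", "your-password"]},
          {click: "button[type='submit']"},
          {wait: 1000}
        ]
      }
    }
  });

  // Step 2: pull the (hypothetical) session cookie out of the response headers.
  // Assumption: the target's Set-Cookie header is passed through by ScrapingBee.
  const setCookie = String(login.headers['set-cookie'] || '');
  const match = /session_id=([^;]+)/.exec(setCookie);
  if (!match) throw new Error('No session cookie found - check the cookie name');

  // Step 3: attach the cookie to follow-up requests
  const page = await client.get({
    url: 'https://example.com/private',
    cookies: {session_id: match[1]}
  });
  console.log(page.data);
}

scrapePrivatePage().catch(console.error);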
With the flexibility of ScrapingBee and Node.js, you can handle logging in to most websites to scrape data behind authentication.
Happy scraping!