Scrapy is an open source, collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

Note: This tutorial assumes prior hands-on experience with Scrapy.

  1. Let's start by installing Scrapy.
    pip install Scrapy

  2. Generate a new Scrapy project.
    scrapy startproject githubspider
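The startproject command creates a directory layout like the following (file names per the standard Scrapy project template):

```
githubspider/
    scrapy.cfg            # deploy configuration
    githubspider/
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # your spiders live here
            __init__.py
```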

  3. Generate the spider. genspider takes the spider name and the domain to crawl.
    scrapy genspider githublogin github.com

  4. Open the generated spider file inside the spiders folder in your favourite code editor, and replace its contents with the code below.

# -*- coding: utf-8 -*-
import scrapy


class GithubloginSpider(scrapy.Spider):
    name = 'githublogin'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Log in to GitHub: pull the hidden CSRF fields out of the login form
        utf8 = response.css('#login > form > input[type="hidden"]:nth-child(1)::attr(value)').extract_first()
        authenticity_token = response.css('#login > form > input[type="hidden"]:nth-child(2)::attr(value)').extract_first()
        return scrapy.FormRequest.from_response(
            response,
            formdata={
                'utf8': utf8,
                'authenticity_token': authenticity_token,
                'login': 'githubuser',
                'password': 'githubpass',
                'commit': 'Sign in'
            },
            callback=self.scrape_homepage
        )

    def scrape_homepage(self, response):
        # yield {'response': response.text}
        # Fill in the URL of the profile page you want to scrape
        yield scrapy.Request('', callback=self.scrape_profilepage)

    def scrape_profilepage(self, response):
        yield {'response': response.text}
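The two `response.css(...)` calls in `parse` extract GitHub's hidden CSRF fields from the login form. Outside of Scrapy, the same extraction can be sketched with only the standard library's `html.parser` (the HTML snippet and field values below are illustrative, not fetched from GitHub):

```python
from html.parser import HTMLParser

# Illustrative login-form markup with two hidden inputs, like GitHub's form
LOGIN_FORM = """
<div id="login">
  <form action="/session" method="post">
    <input type="hidden" name="utf8" value="check">
    <input type="hidden" name="authenticity_token" value="abc123token">
    <input type="text" name="login">
    <input type="password" name="password">
  </form>
</div>
"""


class HiddenInputParser(HTMLParser):
    """Collects name -> value for every <input type="hidden">."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden':
            self.hidden[attrs.get('name')] = attrs.get('value')


parser = HiddenInputParser()
parser.feed(LOGIN_FORM)
print(parser.hidden['authenticity_token'])  # abc123token
```

Note that `FormRequest.from_response` already copies hidden form fields into the request automatically, so extracting them by hand is mostly useful for inspecting what the form actually submits.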

  5. Replace 'githubuser' and 'githubpass' with your GitHub username and password.

  6. Run the spider using the command below.
    scrapy crawl githublogin -o githublogin.json
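The `-o` flag writes the scraped items to `githublogin.json` as a JSON array, one entry per yielded dict, which you can inspect with a few lines of standard-library Python (the sample data below is made up for illustration):

```python
import json

# Normally you would read the real output file:
# with open('githublogin.json') as f:
#     items = json.load(f)

# Illustrative stand-in for the file's contents: a JSON array of items
sample = '[{"response": "<html>profile page</html>"}]'
items = json.loads(sample)

print(len(items))            # 1
print(items[0]['response'])  # <html>profile page</html>
```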


Please share this post if you enjoyed reading it, and subscribe to get updates on the latest blogs.