Blinkit is an instant delivery service that brings groceries and everyday essentials to customers’ doorsteps in minutes. It has transformed the unorganised grocery landscape through technology and innovation, and it now covers numerous cities to meet customer demand. In the evolving retail market, data is currency, and this quick commerce platform is a rich source of large-scale datasets. Scraping it is difficult, however: Blinkit serves JavaScript-rendered content, manages users through sessions, and its location lock means the catalogue you see depends on the customer’s PIN code.
In today’s blog, we will use Python because it is easy to learn, has a rich standard library, and has a strong data science ecosystem. To overcome the issues above, we will leverage a handful of Python libraries to scrape Blinkit product data.
Importance of Scraping Blinkit Product Data
Scraping Blinkit has many advantages, which are outlined below:
Competitive Pricing
Blinkit is one of the largest online grocery delivery platforms. Scraping Blinkit product pricing helps you monitor market rates effectively, track discounts, and collect live offers. Extracting data from this quick delivery platform helps you build a dynamic strategy and adjust promotions quickly. It also contains a massive amount of consumer insight that can be used to analyse discount trends.
Product Catalogue
A Blinkit product page contains information such as names, sizes, specifications, and more, which helps you manage your inventory without hassle. Blinkit’s product catalogue makes it easy for consumers to find the products they need, and scraping the product detail specifications lets you gather pack size information.
Customer Sentiment
Knowing customer preferences about your products or services is important. Scraping reviews helps you analyse consumer feedback and track sentiment patterns. Blinkit is also a useful site for gauging customer satisfaction, and evaluating delivery performance helps you understand service quality.
Category Coverage
Offering a wider variety of products gives users more choices. Collecting products from high-demand categories helps you identify what consumers are actually buying.
Delivery Benchmarks
You can use Blinkit’s delivery benchmarks to measure service speed and check whether orders arrive within the promised time slot. Comparing these times helps you spot bottleneck areas and build brand loyalty.
Technical Essentials for Scraping Blinkit Product Data
- Python Latest Version: Use a recent Python version to scrape Blinkit product data; it gives you access to the latest and most reliable library releases.
- Selenium: It is a robust library to handle dynamic content and PIN code entry.
- WebDriver: The browser driver (ChromeDriver here) that Selenium automates; we use the free, open-source webdriver-manager package to download and manage it automatically.
- BeautifulSoup: It will help us parse the captured HTML and extract product details.
- CSV/JSON Handling: We will store scraped data in a CSV file or JSON array.
- Proxy/Headers Setup: Mimicking normal browser behaviour (custom headers, optionally a proxy) helps you extract product data from Blinkit without interruption; a brief sketch follows this list.
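As a rough, hedged sketch of the headers/proxy idea (the user-agent string and proxy address below are placeholders, and whether you need a proxy at all depends on your volume), you can pass extra Chrome options when building the driver in Step 3:

from selenium.webdriver.chrome.options import Options

opts = Options()
# Present a common desktop browser user agent instead of the default automation one (placeholder string)
opts.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36")
# Optional: route traffic through a proxy (placeholder address)
# opts.add_argument("--proxy-server=http://127.0.0.1:8080")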
Step-by-Step Approach To Scrape Blinkit Product Information
Blinkit’s product pages are loaded dynamically to give consumers an interactive, personalised experience, and the catalogue is location-aware. For reliable data scraping, we will build a scraper that handles the PIN flow, scrolls the catalogue, parses the HTML, and stores the results. Let’s move forward.
Step 1: Install Selenium
The first step is to install the necessary Python libraries.
pip install selenium webdriver-manager beautifulsoup4 lxml
This command installs beautifulsoup4, lxml, Selenium, and webdriver-manager; webdriver-manager will automatically download the matching ChromeDriver.
Step 2: Imports and configuration
In the second step, we define configuration constants, logging, and a Product dataclass that will hold the scraped Blinkit product data.
import time, logging
from dataclasses import dataclass, asdict
from typing import List, Optional

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Config
START_URL = "https://blinkit.com/"
PIN_CODE = "380001"        # Example: Ahmedabad
CATEGORY_HINT = "Dairy"    # Example category
WAIT_SEC = 15
SCROLL_PAUSE_SEC = 0.8
MAX_SCROLL_BATCHES = 30

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

@dataclass
class Product:
    product_name: str
    brand: Optional[str]
    price: Optional[str]
    mrp: Optional[str]
    quantity: Optional[str]
    delivery_eta: Optional[str]
    product_url: Optional[str]
    image_url: Optional[str]
    category: Optional[str]
    location_pin: str
The config section defines the parameters used throughout the scraper: the start URL, PIN code, category, wait time, and scroll limits.
Step 3: Build a Selenium driver
In the 3rd step, we will build a Selenium driver.
def build_driver(headless: bool = True) -> webdriver.Chrome:
    opts = Options()
    if headless:
        opts.add_argument("--headless=new")
    opts.add_argument("--window-size=1440,900")
    opts.add_argument("--disable-blink-features=AutomationControlled")
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=opts)
    driver.set_page_load_timeout(30)
    return driver
Step 4: Open Blinkit and set PIN
def open_site(driver):
    driver.get(START_URL)
    WebDriverWait(driver, WAIT_SEC).until(EC.presence_of_element_located((By.CSS_SELECTOR, "body")))

def enter_pin_code(driver, pin_code):
    wait = WebDriverWait(driver, WAIT_SEC)
    try:
        btn = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,
            "button[data-testid='location-button'], [aria-label*='location']")))
        btn.click()
    except Exception:
        logging.info("Location button not found; modal may already be open.")
    try:
        pin_input = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,
            "input[placeholder*='PIN'], input[type='tel']")))
        pin_input.clear()
        pin_input.send_keys(pin_code)
        confirm = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,
            "button[type='submit'], [data-testid*='confirm']")))
        confirm.click()
    except Exception:
        logging.warning("PIN entry flow not matched; proceeding if inventory shows.")
Here, open_site(driver) opens the Blinkit homepage and waits for the page to load. The enter_pin_code(driver, pin_code) function finds and clicks the location button and then enters the PIN code. The try/except blocks ensure that if Selenium cannot find the location button or the PIN input, the function logs an info or warning message instead of crashing and proceeds if the inventory is already visible.
Step 5: Navigate to the category
In this step, we will navigate to the product category. To be specific, we will navigate to the category “Dairy”.
def navigate_to_category(driver, category_hint: Optional[str]):
    wait = WebDriverWait(driver, WAIT_SEC)
    try:
        search = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,
            "input[placeholder*='Search'], input[type='search']")))
        search.clear()
        search.send_keys(category_hint)
        time.sleep(0.4)
        search.submit()
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-testid*='product-card']")))
    except Exception:
        logging.warning("Category navigation failed; provide direct category URL if needed.")
Step 6: Infinite scroll and capture HTML
The next step is to keep scrolling the product listing until no new products appear, then capture the full rendered HTML.
def infinite_scroll_and_capture_html(driver) -> str:
    last_count, same_count_streak = 0, 0
    for i in range(MAX_SCROLL_BATCHES):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(SCROLL_PAUSE_SEC)
        cards = driver.find_elements(By.CSS_SELECTOR, "[data-testid*='product-card']")
        count = len(cards)
        logging.info(f"Scroll batch {i+1}: {count} products")
        if count == last_count:
            same_count_streak += 1
        else:
            same_count_streak = 0
        last_count = count
        if same_count_streak >= 2:
            break
    return driver.page_source
Step 7: Parse HTML with BeautifulSoup
Now it is time to parse the captured HTML with BeautifulSoup and extract the product details.
def parse_products(html: str, category: str, pin_code: str) -> List[Product]:
    soup = BeautifulSoup(html, "lxml")
    cards = soup.select("[data-testid*='product-card']")
    products = []
    for c in cards:
        name = c.select_one("[data-testid*='name']")
        price = c.select_one("[data-testid*='price']")
        brand = c.select_one("[data-testid*='brand']")
        mrp = c.select_one("[data-testid*='mrp']")
        qty = c.select_one("[data-testid*='quantity']")
        eta = c.select_one("[data-testid*='delivery']")
        purl = c.select_one("a[href]")
        img = c.select_one("img[src]")
        products.append(Product(
            product_name=name.get_text(strip=True) if name else None,
            brand=brand.get_text(strip=True) if brand else None,
            price=price.get_text(strip=True) if price else None,
            mrp=mrp.get_text(strip=True) if mrp else None,
            quantity=qty.get_text(strip=True) if qty else None,
            delivery_eta=eta.get_text(strip=True) if eta else None,
            product_url=purl["href"] if purl else None,
            image_url=img["src"] if img else None,
            category=category,
            location_pin=pin_code
        ))
    return products
The above code creates a BeautifulSoup object, finds all product cards, loops through each card, builds a Product dataclass instance, and returns a list of Product objects. It transforms the raw HTML into a clean, structured dataset.
Step 8: Save scraped results
Now, we will save our extracted data.
def save_to_csv(products: List[Product], filename: str):
    import csv
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(products[0]).keys()))
        writer.writeheader()
        for p in products:
            writer.writerow(asdict(p))
    logging.info(f"Saved {len(products)} products to {filename}")
The code written in this step stores the data in a CSV file using Python’s built-in csv module. If you prefer JSON output, a small variation is sketched below.
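As a minimal, hypothetical variation (not part of the original script, and assuming it lives in the same file as the Step 2 imports), a save_to_json helper could write the same records as a JSON array using Python’s built-in json module:

def save_to_json(products: List[Product], filename: str):
    import json
    with open(filename, "w", encoding="utf-8") as f:
        # asdict() converts each Product dataclass instance into a plain dictionary
        json.dump([asdict(p) for p in products], f, ensure_ascii=False, indent=2)
    logging.info(f"Saved {len(products)} products to {filename}")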
Step 9: Orchestrate the run
This is our last step. It brings all the steps above together and saves the results to a CSV file.
def run(pin_code: str, category_hint: str, out_csv: str):
    driver = build_driver(headless=True)
    try:
        open_site(driver)
        enter_pin_code(driver, pin_code)
        navigate_to_category(driver, category_hint)
        html = infinite_scroll_and_capture_html(driver)
        products = parse_products(html, category_hint, pin_code)
        if products:
            save_to_csv(products, out_csv)
    finally:
        driver.quit()

if __name__ == "__main__":
    run(PIN_CODE, CATEGORY_HINT, "blinkit_dairy.csv")
Code Limitations
Our code can efficiently extract the important product data from Blinkit. However, note that it has some limitations.
- Inventory is tied to the PIN code, so you have to run the scraper once per PIN if you want to cover more than one region (see the sketch after this list).
- If your session expires, PIN re‑entry will be required.
- Blinkit renders product pages dynamically, which makes the needed data tricky to scrape, and the scraper may run slowly when collecting large datasets.
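As a rough illustration of the first limitation, here is a minimal sketch that reuses the run() function from Step 9 once per region; the PIN codes below are placeholders, not recommendations, and the loop assumes it sits in the same script:

# Hypothetical list of PIN codes, one per region you want to cover
PIN_CODES = ["380001", "110001", "560001"]

for pin in PIN_CODES:
    # Each run launches a fresh browser session, so the PIN entry flow repeats every time
    run(pin, CATEGORY_HINT, f"blinkit_dairy_{pin}.csv")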
Important Legal and Ethical Considerations
Scraping Blinkit product data should be done ethically, mainly to avoid straining its servers and to maintain brand trust.
- Ensure Data Accuracy: Ensure that you get trustworthy insights to make better business decisions. Prevent errors to avoid wrong analysis.
- Protect User Privacy: You should not scrape any personal consumer or supplier data, such as name, email, address, etc., to prevent identity theft and financial fraud.
- Respect Site Terms of Service: Whatever website you scrape, make sure you adhere to its data usage policy. Scrape thoughtfully and follow its ToS and robots.txt (see the sketch after this list) to keep the process smooth.
- Stay Compliant: While scraping Blinkit data, stay compliant with data protection laws such as the CCPA and GDPR. Do not scrape copyrighted images; extracting copyrighted product photos may lead to expensive lawsuits.
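As a small illustration of checking a site’s crawling policy before scraping, here is a hedged sketch using Python’s built-in urllib.robotparser. The actual rules, and therefore the answer, depend on Blinkit’s current robots.txt, so treat this only as a starting point:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://blinkit.com/robots.txt")
rp.read()

# can_fetch() reports whether the given user agent may crawl the path
if rp.can_fetch("*", "https://blinkit.com/"):
    print("Crawling the homepage is allowed by robots.txt")
else:
    print("robots.txt disallows this path; reconsider or contact the site")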
Best Practices
If you want accurate results, it is good to follow a few proven practices. Let’s discuss them.
- Use Broad Selectors: Use general CSS selectors that match elements flexibly so your scraping logic stays stable. Include multiple CSS paths in the code to increase resilience, and avoid brittle auto-generated IDs; this helps prevent quick failures.
- Add Error Handling: Wrap risky steps in try/except blocks to prevent the whole run from failing. This catches unexpected errors, improves scraper resilience, and lets you trace the source of problems such as parsing issues.
- Limit Scraping Frequency: Throttle your requests to avoid overloading the server; overload or obvious bot-like traffic can get your IP banned, after which you cannot scrape Blinkit at all. Always control the rate at which your scraper extracts data.
- Validate Scraped Data: After scraping, validate the data to detect wrong values, remove duplicates, and check for missing fields. You can also standardise outputs by confirming format consistency (see the sketch after this list).
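To make the last two points concrete, here is a minimal, hypothetical sketch (the helper names polite_pause and validate_products are not part of the original script, and it assumes the same file as the Step 2 imports): a randomised delay to keep request frequency low, and a validation pass that drops incomplete rows and deduplicates by product URL before saving:

import random

def polite_pause(base: float = SCROLL_PAUSE_SEC):
    # Sleep for a randomised interval to keep the scrape rate low and less bot-like
    time.sleep(base + random.uniform(0.5, 1.5))

def validate_products(products: List[Product]) -> List[Product]:
    seen, clean = set(), []
    for p in products:
        # Drop rows missing a name or price, and deduplicate by product URL (fall back to name)
        if not p.product_name or not p.price:
            continue
        key = p.product_url or p.product_name
        if key in seen:
            continue
        seen.add(key)
        clean.append(p)
    return clean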
Wrapping Up
It’s time to conclude the blog. In this comprehensive post, you learned how to scrape Blinkit, a popular quick commerce website. We explained why scraping Blinkit matters, listed the technical essentials, and wrote a simple script to extract the product name, brand name, price, product image, product URL, and more.
iWeb Scraping is a trusted organisation that helps you ethically extract data from quick commerce sites, empowering businesses to stay ahead in a hyper-competitive market landscape. Want to discuss your data needs with iWeb Scraping? Just contact them.
Parth Vataliya