How Is Web Scraping Used to Extract LinkedIn Company Posts Using Selenium and BeautifulSoup?


When starting a new project, web scraping is one of the best ways to build data sets from the information that various sites publish. In this tutorial, the goal is to scrape a specific number of posts from each company's LinkedIn page and then run Machine Learning methods on the results.

What Kind of Information Will You Scrape from the Website?

  • Name
  • Date
  • Post
  • Likes

Examine the Web Page

Open the browser's developer tools by pressing F12, or right-click on the page and choose Inspect.

A basic understanding of HTML is required. However, you can click on almost any piece of data on the page that interests you, and the developer tools will jump to the exact spot in the HTML where it is defined.

Python Script

The magic is performed by two Python libraries: BeautifulSoup and Selenium. Follow their installation instructions to configure Selenium and the Chrome web driver.

The first step is to import the libraries into Python.

from selenium import webdriver   # browser automation
from bs4 import BeautifulSoup    # HTML parsing
from time import sleep           # simple waits between actions
import pandas as pd              # tabular data handling

We'll start by creating a driver instance, then launch the browser in private (incognito) mode and maximize the window.

options = webdriver.ChromeOptions()
options.add_argument('--incognito')            # private browsing mode
driver = webdriver.Chrome(options=options)
driver.get('https://www.linkedin.com/uas/login')
driver.maximize_window()

To log in, we’ll send our credentials (username and password).

username = driver.find_element_by_id('username')
username.send_keys('your_username')            # replace with your LinkedIn email
password = driver.find_element_by_id('password')
password.send_keys('your_password')            # replace with your LinkedIn password
log_in_button = driver.find_element_by_class_name('from__button--floating')
log_in_button.click()

Next, add the URLs of the companies that you want to scrape to a list.

urls = [
    'https://www.linkedin.com/company/alicorp-saa/posts/?feedView=all',
    'https://www.linkedin.com/company/backus/posts/?feedView=all'
]
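
The original snippet jumps straight to scrolling, so as a bridge, here is a minimal sketch (not in the original post) that opens the first company's posts page before the scroll loop runs:

driver.get(urls[0])    # open the first company's posts page
sleep(3)               # give the feed time to render before scrolling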
Let us create a dictionary for saving the data:

data = {
    "name": [],
    "date": [],
    "post": [],
    "likes": [],
    "count_posts": []
}

You'll need to adjust the number of scrolls for your case, and include a short delay between scrolls so the site has time to load more posts.

for i in range(40):                              # adjust the scroll count as needed
    driver.execute_script('window.scrollBy(0, 500)')
    sleep(1)                                     # give the page time to load
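
If you don't know the right scroll count in advance, a common alternative is to scroll until the page height stops growing. This is a sketch under that assumption, not part of the original script:

last_height = driver.execute_script('return document.body.scrollHeight')
while True:
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    sleep(2)                                     # wait for new posts to load
    new_height = driver.execute_script('return document.body.scrollHeight')
    if new_height == last_height:                # no new content appeared
        break
    last_height = new_height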

Once the driver has finished scrolling, BeautifulSoup can parse the loaded HTML, allowing us to collect whatever we require. For instance, let's say we want the post content.

soup = BeautifulSoup(driver.page_source, 'html.parser')
posts = soup.find_all('div', {'class': "occludable-update ember-view"})
post = posts[2].find('div', {'class': "feed-shared-update-v2__description-wrapper ember-view"}).span.get_text()
if post:
    print(post)
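
The `data` dictionary defined earlier is filled the same way. The sketch below shows one possible approach; the class names used for name, date, and likes are assumptions about LinkedIn's markup at the time of writing (LinkedIn changes them often), so verify them in the developer tools first:

for p in posts:
    body = p.find('div', {'class': "feed-shared-update-v2__description-wrapper ember-view"})
    if body is None or body.span is None:
        continue                                 # skip updates with no text body
    name = p.find('span', {'class': "update-components-actor__title"})                    # assumed class
    date = p.find('span', {'class': "update-components-actor__sub-description"})          # assumed class
    likes = p.find('span', {'class': "social-details-social-counts__reactions-count"})    # assumed class
    data["name"].append(name.get_text(strip=True) if name else "")
    data["date"].append(date.get_text(strip=True) if date else "")
    data["post"].append(body.span.get_text(strip=True))
    data["likes"].append(likes.get_text(strip=True) if likes else "0")
    data["count_posts"].append(len(posts))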


Put Data into DataFrame

df = pd.DataFrame(data)
df.head(10)
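
To hand the data off to the Machine Learning step, persisting the DataFrame is usually the last move. A minimal sketch, assuming a local CSV file is acceptable:

df.to_csv('linkedin_posts.csv', index=False)     # hypothetical output path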

For Any Queries, Contact iWeb Scraping!!

import time
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup

url_sets=["https://www.walmart.com/browse/tv-video/all-tvs/3944_1060825_447913",
"https://www.walmart.com/browse/computers/desktop-computers/3944_3951_132982",
"https://www.walmart.com/browse/electronics/all-laptop-computers/3944_3951_1089430_132960",
"https://www.walmart.com/browse/prepaid-phones/1105910_4527935_1072335",
"https://www.walmart.com/browse/electronics/portable-audio/3944_96469",
"https://www.walmart.com/browse/electronics/gps-navigation/3944_538883/",
"https://www.walmart.com/browse/electronics/sound-bars/3944_77622_8375901_1230415_1107398",
"https://www.walmart.com/browse/electronics/digital-slr-cameras/3944_133277_1096663",
"https://www.walmart.com/browse/electronics/ipad-tablets/3944_1078524"]

categories=["TVs","Desktops","Laptops","Prepaid_phones","Audio","GPS","soundbars","cameras","tablets"]


# scraper
final_results = []                               # accumulates product IDs across all categories
for pg in range(len(url_sets)):
    # number of pages per category
    top_n = ["1","2","3","4","5","6","7","8","9","10"]
    url_category = url_sets[pg]
    print("Category:", categories[pg])
    # extract each page number within the sub-category
    for i_1 in range(len(top_n)):
        print("Page number within category:", i_1)
        url_cat = url_category + "?page=" + top_n[i_1]
        driver = webdriver.Chrome(executable_path='C:/Drivers/chromedriver.exe')
        driver.get(url_cat)
        body_cat = driver.find_element_by_tag_name("body").get_attribute("innerHTML")
        driver.quit()
        soupBody_cat = BeautifulSoup(body_cat, 'html.parser')

        # collect the product IDs listed on this results page
        for tmp in soupBody_cat.find_all('div', {'class': 'search-result-gridview-item-wrapper'}):
            final_results.append(tmp['data-id'])

# save the final set of results as a de-duplicated list
codelist = list(set(final_results))
print("Total number of prods:", len(codelist))
# base URL for product page
url1 = "https://walmart.com/ip"

# Data Headers
WLMTData = [["Product_code","Product_name","Product_description","Product_URL",
"Breadcrumb_parent","Breadcrumb_active","Product_price", 
"Rating_Value","Rating_Count","Recommended_Prods"]]

for i in range(len(codelist)):
    # iterate over every product code collected in the first loop
    print(i)
    item_wlmt = codelist[i]
    url2 = url1 + "/" + item_wlmt
    # print(url2)

    try:
        driver = webdriver.Chrome(executable_path='C:/Drivers/chromedriver.exe')  # Chrome driver is being used.
        print("Requesting URL: " + url2)

        driver.get(url2)                         # URL requested in browser.
        print("Webpage found ...")
        time.sleep(3)
        # Find the document body and get its inner HTML for processing in the BeautifulSoup parser.
        body = driver.find_element_by_tag_name("body").get_attribute("innerHTML")
        print("Closing Chrome ...")              # No more usage needed.
        driver.quit()                            # Browser closed.

        print("Getting data from DOM ...")
        soupBody = BeautifulSoup(body, 'html.parser')  # Parse the inner HTML using BeautifulSoup

        h1ProductName = soupBody.find("h1", {"class": "prod-ProductTitle prod-productTitle-buyBox font-bold"})
        divProductDesc = soupBody.find("div", {"class": "about-desc about-product-description xs-margin-top"})
        liProductBreadcrumb_parent = soupBody.find("li", {"data-automation-id": "breadcrumb-item-0"})
        liProductBreadcrumb_active = soupBody.find("li", {"class": "breadcrumb active"})
        spanProductPrice = soupBody.find("span", {"class": "price-group"})
        spanProductRating = soupBody.find("span", {"itemprop": "ratingValue"})
        spanProductRating_count = soupBody.find("span", {"class": "stars-reviews-count-node"})

        ################# exceptions #########################
        # fall back to placeholder text when an element is missing on the page
        divProductDesc = divProductDesc.text if divProductDesc else "Not Available"
        liProductBreadcrumb_parent = liProductBreadcrumb_parent.text if liProductBreadcrumb_parent else "Not Available"
        liProductBreadcrumb_active = liProductBreadcrumb_active.text if liProductBreadcrumb_active else "Not Available"
        spanProductPrice = spanProductPrice.text if spanProductPrice else "NA"

        if spanProductRating is None or spanProductRating_count is None:
            spanProductRating = 0.0
            spanProductRating_count = "0 ratings"
        else:
            spanProductRating = spanProductRating.text
            spanProductRating_count = spanProductRating_count.text

        ### Recommended Products
        reco_prods = []
        for tmp in soupBody.find_all('a', {'class': 'tile-link-overlay u-focusTile'}):
            reco_prods.append(tmp['data-product-id'])
        if len(reco_prods) == 0:
            reco_prods = ["Not available"]

        WLMTData.append([codelist[i], h1ProductName.text, divProductDesc, url2,
                         liProductBreadcrumb_parent, liProductBreadcrumb_active,
                         spanProductPrice, spanProductRating,
                         spanProductRating_count, reco_prods])

    except Exception as e:
        print(str(e))

# save final result as dataframe
df = pd.DataFrame(WLMTData)
df.columns = df.iloc[0]       # first row holds the column headers
df = df.drop(df.index[0])     # drop the header row from the data
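
Before pushing to a database, it can be safer to keep a local copy in case the connection fails. A minimal sketch, with a hypothetical file name:

df.to_csv('walmart_products.csv', index=False)   # hypothetical local backup of the scraped rows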

# Export dataframe to SQL
import sqlalchemy

database_username = 'ENTER USERNAME'
database_password = 'ENTER USERNAME PASSWORD'
database_ip = 'ENTER DATABASE IP'
database_name = 'ENTER DATABASE NAME'
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username, database_password, database_ip, database_name))
df.to_sql(con=database_connection, name='product_info', if_exists='replace')
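
As a quick sanity check that the export worked, you can read the row count back through the same engine. A sketch, assuming the product_info table created above:

check = pd.read_sql('SELECT COUNT(*) AS row_count FROM product_info', con=database_connection)
print(check)   # should match len(df)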

You can always add more complexity to this code to customize the scraper. For example, the scraper above already handles missing data in attributes such as pricing, description, or reviews. Data can be missing for many reasons: a product may be out of stock or sold out, the data may have been entered improperly, or the product may be too new to have collected any ratings yet.

To adapt to different web structures, you will need to keep updating your web scraper so it stays functional as a webpage changes. This scraper gives you a base template for a Python scraper on Walmart.

Want to extract data for your business? Contact iWeb Scraping, your data scraping professional!

