As one of the most recognized restaurant reservation and review websites, OpenTable hosts opinions from millions of diners worldwide. Its wealth of reviews opens up opportunities for data analysis, from measuring customer satisfaction and service quality to gathering feedback on individual dishes. The data has many uses: restaurants can use it to maintain what works or improve what doesn't, analysts can use it to track industry trends, and researchers can use it to understand consumer behaviour.
This case study demonstrates how to scrape and analyze restaurant reviews from OpenTable in Python. We'll design a scraper, collect the data, clean it, save it, and perform a simple sentiment analysis. The goal is not just to scrape the data but to derive insights from it.
What Is the Step-by-Step Guide to Scraping and Analyzing OpenTable Reviews with Python?
Understanding OpenTable’s Website Structure
To make our case study tangible, we need a specific restaurant listing from OpenTable. We will scrape a single public listing and only a handful of reviews, roughly the first three pages, to keep the demonstration representative and ethical. The process is the same regardless of which restaurant you use.
However, ethical considerations require you to obtain permission or confirm you are using this case study in a fair-use educational environment.
OpenTable restaurant pages follow a URL pattern like:
https://www.opentable.com/r/-?reviews=true
The ?reviews=true parameter will take you directly to the social proof (review) portion of the page.
Before undertaking the scrape, we need to inspect the page in our browser with the developer tools (right-click on the page -> Inspect). In this view, you can identify the HTML tags and classes that contain the reviewer's name, date, rating, and review text. You'll also see whether the page is rendered as static HTML (which is easier to scrape) or whether the review content is loaded dynamically with JavaScript (dynamic content requires Selenium or another browser automation tool).
We’ll document these findings before writing any code. This preparation step is critical for ensuring our scraper targets the correct elements, handles pagination correctly, and collects clean, structured data that’s ready for analysis.
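One lightweight way to record these findings is a small selector map that the rest of the scraper can import. The class names below are hypothetical placeholders for whatever you observe during inspection; confirm them against the live page before relying on them:
# selectors.py - a hypothetical selector map recorded during page inspection
# (all class names are placeholders; verify them in your browser's dev tools)
SELECTORS = {
    "review_block": "review",
    "reviewer_name": "reviewer-name",
    "review_date": "review-date",
    "star_rating": "star-rating",
    "review_text": "review-text",
    "next_button": "pagination-next",
}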
Setting Up Your Python Scraping Environment
Before we dive into coding, we need to set up a Python environment with all the necessary libraries for scraping, cleaning, and analysis. We will be using the following stack:
- Selenium – for automating the browser and loading dynamically rendered reviews
- BeautifulSoup – for parsing the HTML once Selenium has loaded the page
- Pandas – for cleaning and storing our data
- TextBlob – for sentiment analysis
- Matplotlib and Seaborn – for visualization
You can install them all in one command:
pip install selenium beautifulsoup4 pandas textblob matplotlib seaborn fake-useragent
Selenium requires a browser driver, so you also need to download ChromeDriver (or GeckoDriver if you use Firefox) matching your browser version. Put it on your PATH (preferred) or point Selenium to it when you initialize the driver. Also consider creating a virtual environment to avoid dependency conflicts:
python -m venv opentable_env
source opentable_env/bin/activate   # Mac/Linux
opentable_env\Scripts\activate      # Windows
Once the environment is ready, test the imports in Python to make sure everything works as expected; it eliminates surprises at runtime. A well-prepared environment makes development smoother and the scraper easier to port if you or a colleague need to run it on a different machine.
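A quick smoke test, nothing more than the imports themselves, is enough to confirm that the packages installed correctly:
# smoke_test.py - confirm every library in the stack imports cleanly
import selenium
import bs4
import pandas as pd
import textblob
import matplotlib
import seaborn

print("All imports succeeded. Selenium", selenium.__version__, "| Pandas", pd.__version__)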
Inspecting the Page Structure
Before we write our first line of scraping code, we need to understand how OpenTable formats its review pages. It's like drawing a map before starting a trip: we learn where the data lives and how to navigate to it.
Open the review page of the target restaurant in Chrome (or another browser), right-click on a review, and select Inspect. This brings up Developer Tools, which shows the HTML structure. You will probably find that each review sits inside a tag with a class name like "review". Within each review block, you may see elements along these lines:
- Reviewer name: <span class="reviewer-name">
- Review date: <span class="review-date">
- Star rating: <div class="star-rating" aria-label="5 stars">
- Review text: <p class="review-text">
Be sure to scroll to see whether the reviews load automatically or only after you click a "More Reviews" button. If the initial page doesn't contain all the reviews in the HTML, that's an indication the reviews are loaded dynamically with JavaScript.
Finally, check whether pagination applies. Some review pages use query parameters such as ?page=2; others require an interaction that loads a new page. This tells you whether a static requests scraper is enough or whether you need Selenium to scrape the data dynamically.
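One rough way to check, assuming the example URL used later in this article, is to fetch the page with requests and look for review content in the raw HTML; if it is missing, the reviews are almost certainly rendered by JavaScript:
# static_check.py - rough test for static vs. dynamically rendered reviews
import requests

url = "https://www.opentable.com/r/example-restaurant?reviews=true"  # placeholder URL
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text

# If a phrase you saw in a review on the page is absent from the raw HTML,
# the content is injected by JavaScript and Selenium is the better tool.
print("review" in html.lower())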
Building a Selenium Web Scraper for OpenTable
Because many OpenTable reviews are loaded dynamically, we will use Selenium to control an actual browser and scrape the fully rendered HTML. Selenium mimics what a human user would do: opening the page, waiting for elements to render, and clicking buttons to paginate.
Our scraper is going to:
- Open the restaurant’s review pages.
- Wait until the elements for the reviews can load on the page.
- For each review block, our scraper will extract the reviewer name, date, rating, and text.
- Click the “Next” button, scroll through the page, and extract more reviews until we have scraped the number of pages we want.
Example Setup:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Launch Chrome and open the restaurant's review page
driver = webdriver.Chrome()
driver.get("https://www.opentable.com/r/example-restaurant?reviews=true")

# Wait up to 10 seconds for the review blocks to be rendered
WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "review"))
)
We also want to add random delays between actions so the traffic looks more like a human browsing than a machine, which lessens the chances of being blocked by OpenTable's anti-bot measures.
Store the data as a list of lists, then load it into a Pandas DataFrame for cleaning and analysis. Now we can move on to automating the pagination.
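A minimal sketch of the extraction step might look like the following; the class names are the placeholders noted during inspection and should be verified against the live page:
# Parse the rendered HTML with BeautifulSoup and collect one row per review
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(driver.page_source, "html.parser")
rows = []
for block in soup.find_all(class_="review"):  # placeholder class names throughout
    name = block.find(class_="reviewer-name")
    date = block.find(class_="review-date")
    rating = block.find(class_="star-rating")
    text = block.find(class_="review-text")
    rows.append([
        name.get_text(strip=True) if name else "Anonymous",
        date.get_text(strip=True) if date else "",
        rating.get("aria-label", "") if rating else "",
        text.get_text(strip=True) if text else "",
    ])

df = pd.DataFrame(rows, columns=["Name", "Date", "Rating", "Review"])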
Handling Pagination to Capture Multiple Review Pages
Most of the time, OpenTable restaurant pages won't show every review in one go; the site usually displays a fixed number of reviews per page, using a "Next" button or infinite scroll. If you do not handle pagination, you will only pull the first page of reviews and miss a lot of valuable data.
With Selenium, pagination is straightforward: locate the "Next" button (often it has a class such as "pagination-next") and simulate a click. After clicking, wait for the old reviews to go stale and the new ones to load; this prevents extracting the same slice of reviews repeatedly.
Here’s an example for grabbing pages:
from selenium.common.exceptions import NoSuchElementException, TimeoutException

# Keep clicking "Next" until the button can no longer be found
while True:
    try:
        reviews_elements = driver.find_elements(By.CLASS_NAME, "review")
        next_button = driver.find_element(By.CLASS_NAME, "pagination-next")
        driver.execute_script("arguments[0].click();", next_button)
        # Wait for the old reviews to go stale before parsing the new page
        WebDriverWait(driver, 10).until(EC.staleness_of(reviews_elements[0]))
    except (NoSuchElementException, TimeoutException):
        break
We also limit the number of pages we scrape so that we respect OpenTable's rate limits for the browsing session, and between click actions we add random delays (`time.sleep(random.uniform(2, 4))`) to imitate human browsing behaviour.
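A hedged variation of the loop above, with the page cap and random delays added (the three-page limit is simply the figure used in this case study):
import random
import time

MAX_PAGES = 3  # pages scraped in this case study; adjust as needed

for _ in range(MAX_PAGES - 1):
    try:
        reviews_elements = driver.find_elements(By.CLASS_NAME, "review")
        next_button = driver.find_element(By.CLASS_NAME, "pagination-next")
        driver.execute_script("arguments[0].click();", next_button)
        WebDriverWait(driver, 10).until(EC.staleness_of(reviews_elements[0]))
        time.sleep(random.uniform(2, 4))  # pause like a human reader would
    except (NoSuchElementException, TimeoutException):
        break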
This pagination logic lets us collect enough reviews for a substantive analysis while keeping the review volume modest and staying respectful of the OpenTable website.
Cleaning and Preprocessing the Scraped Data
Once we scrape our reviews into a Pandas DataFrame, the raw data will most likely require cleaning: review text might contain extraneous spaces, newline characters, and non-UTF-8 symbols; ratings might be stored as text instead of integers; and dates might not be in a consistent format.
The basic steps we would want to take when cleaning would include:
df["Review"] = df["Review"].str.replace("\n", " ").str.strip()
df["Rating"] = df["Rating"].str.extract(r"(\d+)").astype(int)
df["Date"] = pd.to_datetime(df["Date"])
These steps remove line breaks, clean up spaces, extract numeric ratings, and convert dates into a datetime format that lets us sort, filter, and apply time-based functions.
We would consider normalizing reviewer names if necessary (e.g., using “Anonymous” for missing values) and removing duplicate rows that may have resulted from pagination overlap.
Clean, normalized data is the foundation of trustworthy analysis. An inconsistently cleaned dataset can yield inaccurate visualizations and misleading sentiment results. Taking the time to clean the data means the insights are built on solid ground. Cleaning also prepares the dataset for export as a CSV or JSON file for storage or further processing.
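A short sketch of those finishing touches, assuming the column names used earlier:
# Fill missing reviewer names, drop pagination duplicates, and export the results
df["Name"] = df["Name"].fillna("Anonymous")
df = df.drop_duplicates(subset=["Name", "Date", "Review"])

df.to_csv("opentable_reviews.csv", index=False)
df.to_json("opentable_reviews.json", orient="records")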
Performing Sentiment Analysis with Python
Now that our data is clean, we can run sentiment analysis to see how positive or negative the overall tone of the reviews is. We will use TextBlob, a Python library that assigns a polarity score from -1 (negative) to +1 (positive) to each piece of text.
To get TextBlob installed:
pip install textblob
The function to score the sentiment:
from textblob import TextBlob

def get_sentiment(text):
    return TextBlob(text).sentiment.polarity

df["Sentiment"] = df["Review"].apply(get_sentiment)
Classify scores:
def label_sentiment(score):
    if score > 0.1:
        return "Positive"
    elif score < -0.1:
        return "Negative"
    else:
        return "Neutral"

df["SentimentLabel"] = df["Sentiment"].apply(label_sentiment)
This gives us a numeric sentiment measure and a label we can easily summarize. While TextBlob is quick and easy to use, more sophisticated models such as VADER or transformer-based approaches can give better accuracy on nuanced restaurant sentiment.
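As one hedged alternative, NLTK's VADER analyzer can be dropped in with a small change (this assumes the nltk package is installed and the vader_lexicon resource has been downloaded):
# Optional: swap TextBlob for NLTK's VADER sentiment analyzer
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the lexicon
sia = SentimentIntensityAnalyzer()

# The "compound" score also ranges from -1 (negative) to +1 (positive)
df["Sentiment"] = df["Review"].apply(lambda t: sia.polarity_scores(t)["compound"])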
Visualizing Restaurant Review Insights
The numbers themselves can be informative, but visualizing the data helps us see the patterns more clearly. We will use Seaborn and Matplotlib to create three quick plots:
Distribution of Star Ratings – plots how many of each star rating were present in the reviews.
sns.countplot(x="Rating", data=df, palette="viridis")
plt.title("Star Rating Distribution")
plt.show()
Sentiment Trend Over Time – plots how review sentiment polarity changes over time.
sns.lineplot(x="Date", y="Sentiment", data=df)
plt.title("Sentiment Trend Over Time")
plt.show()
Sentiment Category Counts – plots the counts of positive, neutral, and negative reviews.
sns.countplot(x="SentimentLabel", data=df, palette="coolwarm")
plt.title("Sentiment Categories")
plt.show()
Visualizations help us spot seasonal spikes in sentiment, shifts in the rating distribution, or increases in the number of negative reviews. Exploratory visualizations also make the results more engaging for stakeholders who are less inclined to comb through raw tables.
What Are The Key Findings from the Sample Data?
After looking at three pages of feedback for our sample restaurant, we found some useful insights:
- High Satisfaction: Roughly 80% of the review ratings are five stars, indicating that customers are happy.
- Positive Tone: Most reviews have a positive or neutral tone, with very few negative comments.
- Seasonal Trends: The sentiment score varies considerably by season. Scores are highest in December, suggesting that holiday specials or events lead to especially positive experiences.
- Recurring Themes: Reviews repeatedly praise friendly staff, and a few recurring themes highlight the restaurant's signature dishes.
While this is just a small amount of data, we can still draw a fair number of insights. Management can focus on strengths, such as staff training and popular dishes, while also working on weaknesses, such as improving service speed.
If we apply this process to more restaurants, we could find broader trends, such as cities that consistently get higher ratings or seasonal changes in customer satisfaction linked to outside factors. This data helps both as a performance measure and as a plan for improving operations.
Expanding the Case Study: Scaling and Advanced Techniques
Our framework lends itself well to larger-scale projects. For example, imagine scraping not just one restaurant but several listings and comparing sentiment trends, ratings, and commonly appearing keywords among competing restaurants.
We could add keyword extraction (for example with collections.Counter, or with NLP libraries such as spaCy) to pull out common phrases such as "great service" or "too noisy" and count how often they appear. We can then evaluate how frequently these keywords occur alongside the overall sentiment score, which points to the topics carrying the most positive or negative weight.
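A minimal sketch of that idea with collections.Counter, counting two-word phrases across the scraped reviews (the small stop-word list is an illustrative assumption):
# Count the most common two-word phrases across all reviews
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "we", "to", "of"}

bigrams = Counter()
for review in df["Review"].str.lower():
    words = [w for w in re.findall(r"[a-z']+", review) if w not in STOPWORDS]
    bigrams.update(zip(words, words[1:]))

print(bigrams.most_common(10))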
We could also develop an automated system that scrapes updates weekly or monthly and saves the results to a database, letting us monitor sentiment and rating trends over time. For example, we could build a performance dashboard showing the restaurant's ratings, sentiment trends, and the keywords appearing in customer feedback on an ongoing basis. By incorporating geospatial data, we could also surface regional dining trends that inform the restaurant's strategy both locally and nationally.
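As a sketch of the storage step, each scraping run could append its DataFrame to a local SQLite database (the file and table names here are arbitrary):
# Append each scraping run to a local SQLite table for long-term trend tracking
import sqlite3

conn = sqlite3.connect("opentable_reviews.db")
df.to_sql("reviews", conn, if_exists="append", index=False)
conn.close()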
Because the scraper is modular, the expansion opportunities are virtually limitless: we can scale it up or apply it to other review platforms with minimal adjustments. That makes this method widely applicable for anyone working in hospitality analytics or market intelligence.
Conclusion
In this case study, we took you through an end-to-end workflow for scraping and analyzing OpenTable restaurant reviews. Starting with an inspection of the site structure, we built a scraper with Selenium to handle the dynamically loaded content, cleaned and formatted the scraped results, and finally performed sentiment analysis with TextBlob.
The resulting visualizations let us make simple assessments of the rating distribution, sentiment over time, and the tone of the reviews. We worked with a tiny, ethically collected sample of data, yet we were able to extract valuable business intelligence, demonstrating how customer feedback data can inform an actionable strategy.
Final points:
- Always read the terms of service and scrape ethically.
- Make sure to set up your environment and inspect the HTML structure before you code anything.
- Clean your data thoroughly before you analyze anything.
- Use sentiment scoring and visualizations to turn raw review text into insights.
At iWeb Scraping, we position ourselves as builders of ethical and robust data collection solutions like this, helping businesses convert raw online reviews into actionable insights. From here, you could extend the project to scrape more restaurants, perform deeper NLP analysis, track keywords, or automate dashboards, depending on the data you collect. The same tools and techniques apply to a wide range of other review platforms.
Parth Vataliya
