Scraping the Web in Python — IMDb

Hoda Saiful
2 min read · May 16, 2021


In this post, we scrape an IMDb web page and extract the name, runtime and genre of each movie using the urllib and Beautiful Soup libraries in Python.

For an understanding of HTML tags, refer to the attached reference.

Layout of the page.

Filter movies by release date between January 1950 and December 2012, ordered by “Number of Votes” descending.
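The same filter can also be expressed as URL query parameters. Here is a minimal sketch, assuming IMDb's title search endpoint and its `release_date` and `sort` parameters. The base URL, the parameter names and the helper `build_search_url` are assumptions made for illustration, not taken from the original post.

```python
from urllib.parse import urlencode

# Assumed base URL of IMDb's title search page (not given in the post).
BASE_URL = "https://www.imdb.com/search/title/"


def build_search_url(start="1950-01-01", end="2012-12-31", sort="num_votes,desc"):
    """Build a search URL for titles released between start and end.

    The parameter names release_date and sort are assumptions about
    how the search page encodes the filter.
    """
    query = {"release_date": f"{start},{end}", "sort": sort}
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{BASE_URL}?{urlencode(query, safe=',')}"


url = build_search_url()
print(url)
```

Building the URL from a dictionary keeps the date range and sort order easy to change without editing a long string by hand.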

The first results page lists 50 movies out of a total of 3,711,876.

Inspect the HTML and locate the data of interest, i.e. name, runtime and genre.

Import the libraries
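A plausible set of imports for this walkthrough is sketched below. urllib and csv ship with Python. Beautiful Soup is a third-party package, assumed here to be installed as `beautifulsoup4`.

```python
import csv                                   # standard library: writes the .csv output
from urllib.request import Request, urlopen  # standard library: sends the HTTP request

from bs4 import BeautifulSoup                # third-party: pip install beautifulsoup4
```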

Request data from the URL and dump the HTML into a variable page_html
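One way to do this with urllib is sketched below. The helper names `build_request` and `fetch_html`, the User-Agent value and the timeout are assumptions for illustration. Some sites reject urllib's default User-Agent, so the sketch sends a browser-like one.

```python
from urllib.request import Request, urlopen


def build_request(url):
    """Wrap the URL in a Request that carries a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch_html(url, timeout=30):
    """Download the page and return its HTML as a string."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With these helpers, `page_html = fetch_html(url)` stores the HTML dump in the variable `page_html`, where `url` is the address of the filtered results page.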

Extract the value from the tags and direct the output to a .csv file

Parse the HTML dump, iterate through the items and extract the name, runtime and genre tags. Direct the output to a .csv file.
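The parsing step might look like the sketch below, which extracts the name, runtime and genre named in the introduction. The CSS class names (`lister-item`, `lister-item-header`, `runtime`, `genre`) are assumptions about the page markup and should be checked against what the browser inspector shows. `SAMPLE_HTML` is a made-up stand-in for the real `page_html`, so the example runs without a network connection.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical markup mimicking one result item; the real page has 50 of them.
SAMPLE_HTML = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header"><a href="/title/example/">Example Movie</a></h3>
  <p class="text-muted">
    <span class="runtime">120 min</span>
    <span class="genre">
      Drama
    </span>
  </p>
</div>
"""

FIELDS = ["name", "runtime", "genre"]


def text_or_blank(parent, selector):
    """Return the stripped text of the first match, or "" if the tag is missing."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag else ""


def parse_movies(page_html):
    """Extract one dict per movie from the HTML dump."""
    soup = BeautifulSoup(page_html, "html.parser")
    rows = []
    for item in soup.select("div.lister-item"):
        rows.append({
            "name": text_or_blank(item, "h3.lister-item-header a"),
            "runtime": text_or_blank(item, "span.runtime"),
            "genre": text_or_blank(item, "span.genre"),
        })
    return rows


def write_csv(rows, path):
    """Write the extracted rows to a .csv file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


movies = parse_movies(SAMPLE_HTML)
print(movies)
```

Calling `write_csv(movies, "movies.csv")` then saves the rows to disk. Returning an empty string for a missing tag keeps the columns aligned when a listing has no runtime or genre.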

Check the contents of the .csv file; it should contain the extracted data.
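A quick way to check the file from Python is to read back the first few rows. This is a small sketch. The helper name `preview_csv` is hypothetical, and it assumes the file has a header line.

```python
import csv
from itertools import islice


def preview_csv(path, limit=5):
    """Return up to `limit` rows of the CSV as a list of dicts keyed by the header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(islice(csv.DictReader(f), limit))
```

For example, `preview_csv("movies.csv")` returns the first five rows, assuming the output was saved under that file name.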
