Python Web Scraping – Scraping JavaScript-based Websites

Eric Cheng — Sat, 14 Nov 2020 18:37:23 +0000

Web scraping with Python is an important skill for data scientists. Most modern websites are built with a JavaScript framework or use JavaScript to generate their content. If we would like to scrape data from them, we have to know how to scrape JavaScript-based websites.

The code itself is not difficult. We just need to be aware that the site is JavaScript-based and choose appropriate tools to scrape it efficiently.

In this article, I will use Beautiful Soup and Selenium to scrape a simple JavaScript website. I hope it helps you.

Prepare development environment

Install required libraries

You also have to install a web driver, so that Selenium can control a real browser that executes the JavaScript.

https://pypi.org/project/selenium/

Download the web driver you want and place it in a directory on your PATH (e.g. /usr/bin or /usr/local/bin on Linux and macOS, or any directory listed in the PATH environment variable on Windows).
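Before writing any scraping code, it can help to confirm that the setup above worked. The sketch below is only one way to do that, using the standard library. It assumes the libraries are importable as requests, bs4 and selenium (usually installed with pip as requests, beautifulsoup4 and selenium) and that the driver executable is named chromedriver; the helper names are made up for this example, so adjust them to your own setup.

```python
import importlib.util
import shutil

# Import names of the libraries used in this article (an assumption).
# They are usually installed with: pip install requests beautifulsoup4 selenium
REQUIRED_MODULES = ["requests", "bs4", "selenium"]


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def find_driver(executable="chromedriver"):
    """Return the full path of the driver if it is on PATH, else None."""
    return shutil.which(executable)


def report():
    """Print a short summary of what is installed and what is missing."""
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing libraries:", ", ".join(missing) if missing else "none")
    print("Web driver found at:", find_driver() or "not found on PATH")
```

Calling report() prints which libraries are missing and where the driver was found, if anywhere.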

Ready to code

Import required libraries

If we use the “requests” library, we only get the HTML source that the server returns, not the content that JavaScript generates afterwards in the browser.
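To illustrate the point, here is a minimal sketch rather than the original code from this article. The function names and the sample page are hypothetical. fetch_static_html downloads a page with requests, and SAMPLE_SOURCE stands in for what such a download might look like on a JavaScript-based page: the text is put together by a script, so it is not in the source we receive.

```python
def fetch_static_html(url, timeout=10):
    """Download a page with requests; no JavaScript is executed."""
    # Imported here so the helper below can be used without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def contains_text(html, expected_text):
    """Report whether a piece of text we expect on the page is in the HTML."""
    return expected_text in html


# Hypothetical page source: the <div> stays empty until a browser runs the
# script, and the script assembles the sentence from two parts, so the full
# sentence never appears in the downloaded source.
SAMPLE_SOURCE = """
<html><body>
  <div id="content"></div>
  <script>
    document.getElementById("content").innerText = "Hello " + "from JavaScript";
  </script>
</body></html>
"""
```

With this sample, contains_text(SAMPLE_SOURCE, "Hello from JavaScript") returns False, because the sentence only exists after a browser has run the script. For a real page you would pass the result of fetch_static_html(your_url) instead of the sample.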

If we use Selenium with the Chrome web driver, we can get the JavaScript-generated content.
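A rough sketch of this step could look like the code below. It is not the original code from this article. The function names, the window size and the 10 second timeout are my own choices, and it assumes Chrome and a matching driver are available. It waits until the browser reports that the document has finished loading, which may not be enough for pages that load data later; in that case, waiting for a specific element is usually the safer option.

```python
def chrome_arguments(headless=True):
    """Build the command-line switches passed to Chrome."""
    arguments = ["--window-size=1280,800"]
    if headless:
        arguments.append("--headless")  # run without opening a window
    return arguments


def fetch_rendered_html(url, wait_seconds=10, headless=True):
    """Load a page in Chrome and return the HTML after JavaScript has run."""
    # Imported here so this code loads even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for argument in chrome_arguments(headless):
        options.add_argument(argument)

    driver = webdriver.Chrome(options=options)  # expects a Chrome driver to be available
    try:
        driver.get(url)
        # Wait until the browser reports that the document finished loading.
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even after an error
```

The string returned by fetch_rendered_html(your_url) is the page as the browser sees it after the scripts have run, which is what we want to parse.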

Then we can pass the rendered content to a Beautiful Soup object and scrape the data we want.
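As a small sketch of this last step (again not the original code), the function below parses an HTML string with Beautiful Soup and collects the text of the elements that match a CSS selector. The function name, the sample markup and the selector are hypothetical; in practice the HTML would be the page source returned by Selenium.

```python
def extract_texts(rendered_html, css_selector):
    """Return the stripped text of every element matching a CSS selector."""
    # Imported here so this code loads even where Beautiful Soup is missing.
    from bs4 import BeautifulSoup

    # "html.parser" is the parser from the standard library; no extra install.
    soup = BeautifulSoup(rendered_html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css_selector)]


# Hypothetical rendered page, standing in for the page source from Selenium.
RENDERED_SAMPLE = """
<html><body>
  <div id="content">
    <p class="item">First item</p>
    <p class="item">Second item</p>
  </div>
</body></html>
"""
```

With this sample, extract_texts(RENDERED_SAMPLE, "#content p.item") returns ["First item", "Second item"].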

I hope this guide helps you scrape web content from JavaScript-based websites.

If you would like to learn more about Beautiful Soup and Selenium, you can find their documentation below or leave a comment here.

Both are powerful libraries for web scraping and browser automation projects.

Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Selenium: https://www.selenium.dev/documentation/en/
