<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Web Scrapping &#8211; Eric Cheng&#8217;s Home page</title>
	<atom:link href="https://techhkg.com/tag/web-scrapping/feed/" rel="self" type="application/rss+xml" />
	<link>https://techhkg.com</link>
	<description></description>
	<lastBuildDate>Tue, 24 Nov 2020 16:50:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Python Web Scrapping &#8211; Scrap Javascript-based websites</title>
		<link>https://techhkg.com/2020/11/15/python-web-scrapping-scrap-data-from-javascript-based-websites/</link>
					<comments>https://techhkg.com/2020/11/15/python-web-scrapping-scrap-data-from-javascript-based-websites/#respond</comments>
		
		<dc:creator><![CDATA[Eric Cheng]]></dc:creator>
		<pubDate>Sat, 14 Nov 2020 18:37:23 +0000</pubDate>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Web Scrapping]]></category>
		<guid isPermaLink="false">https://techhkg.com/?p=13233</guid>

					<description><![CDATA[Python web scraping is an important skill for data scientists. Most websites are built with a JavaScript framework or use JavaScript to generate their content. To scrape data from a modern website, we have to know how to scrape data from JavaScript-based websites. The code itself is not difficult.  [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1216.8px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-1"><p class="p1">Python web scraping is an important skill for data scientists. Most websites are built with a JavaScript framework or use JavaScript to generate their content. To scrape data from a modern website, we have to know how to scrape data from JavaScript-based websites.</p>
<p class="p1">The code itself is not difficult. We just need to be aware that the site is JavaScript-based and choose appropriate tools to scrape it efficiently.</p>
<p class="p1">In this article, I will use Beautiful Soup and Selenium to scrape a simple JavaScript website. I hope it helps you.</p>
</div><div class="fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><p class="p1">Prepare development environment</p></h1></div><div class="fusion-text fusion-text-2"><p>Install required libraries</p>
</div><script src="https://gist.github.com/ericcheng201168/b4ceae4e02f9771bd562b9fa765752a1.js"></script><div class="fusion-text fusion-text-3"><p class="p1">You also have to install a web driver so that Selenium can control a real browser, which executes the page's JavaScript.</p>
<p class="p2"><a href="https://pypi.org/project/selenium/">https://pypi.org/project/selenium/</a></p>
<p class="p1">Download the web driver you want and place it in a directory on your PATH (e.g. /usr/bin or /usr/local/bin on Linux and macOS, or any directory listed in %PATH% on Windows).</p>
</div><div class="fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><p class="p1">Ready to code</p></h1></div><div class="fusion-text fusion-text-4"><p class="p1">Import required libraries</p>
</div><script src="https://gist.github.com/ericcheng201168/5334209b4537937b02bba1ed912bad21.js"></script><div class="fusion-text fusion-text-5"><p>If we use the &#8220;requests&#8221; library, we only get the HTML source that the server returns, not the content that JavaScript generates in the browser.</p>
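<p>As a rough illustration (a separate sketch, not the code in the embedded gist), fetching a page with requests might look like the following. The helper name <code>fetch_source</code>, the 10-second timeout, and the placeholder URL are assumptions made for this sketch.</p>

```python
def fetch_source(session, url, timeout=10):
    """Return the HTML exactly as the server sends it (no JavaScript is run)."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def demo():
    # requests is imported here so fetch_source can be tried with any
    # session-like object; the URL below is only a placeholder.
    import requests

    with requests.Session() as session:
        html = fetch_source(session, "https://example.com/")
    # On a JavaScript-based page, the generated elements are missing here.
    print(html[:200])
```

<p>Calling <code>demo()</code> prints the start of the raw source. Any element that the page fills in with JavaScript will still be empty in that text.</p>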
</div><script src="https://gist.github.com/ericcheng201168/e5f7f07e06c77b1c657370f6e316c9d0.js"></script><div class="fusion-text fusion-text-6"><p>Output from the code above</p>
</div><script src="https://gist.github.com/ericcheng201168/6d1ace205fd373be822a22fdf2d64419.js"></script><div class="fusion-text fusion-text-7"><p>If we use Selenium with the Chrome web driver, we can get the JavaScript-generated content.</p>
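<p>A minimal sketch of this step, assuming Chrome and its driver are installed and on the PATH. It is not the code in the embedded gist: the helper name <code>fetch_rendered_html</code>, the headless flag, and the placeholder URL are choices made for illustration.</p>

```python
def fetch_rendered_html(driver, url):
    """Load url in a Selenium driver and return the page after scripts ran."""
    driver.get(url)            # the browser loads the page and runs its JavaScript
    return driver.page_source  # the DOM serialized back to HTML


def demo():
    # Selenium is imported here so fetch_rendered_html can be tried with
    # any driver-like object; the URL below is only a placeholder.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # assumption: no browser window needed
    driver = webdriver.Chrome(options=options)
    try:
        html = fetch_rendered_html(driver, "https://example.com/")
        print(html[:200])
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

<p>Note that <code>page_source</code> reflects the DOM at the moment it is read. A page that loads its data asynchronously may need an explicit wait, for example with Selenium's <code>WebDriverWait</code>, before the content appears.</p>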
</div><script src="https://gist.github.com/ericcheng201168/7d1e23012654f7d3d4d44b540d73808f.js"></script><div class="fusion-text fusion-text-8"><p>Output from the code above</p>
</div><script src="https://gist.github.com/ericcheng201168/e29ab528918193cb270f4a1521d65fcd.js"></script><div class="fusion-text fusion-text-9"><p>Then we can pass the rendered content to a Beautiful Soup object and scrape the data we want.</p>
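<p>One way this parsing step could look, as a sketch rather than the code in the embedded gist. The helper name <code>extract_texts</code> and the idea of selecting by CSS selector are assumptions. The built-in <code>html.parser</code> backend is used so that no extra parser has to be installed.</p>

```python
from bs4 import BeautifulSoup


def extract_texts(rendered_html, css_selector):
    """Return the stripped text of every element matching css_selector."""
    soup = BeautifulSoup(rendered_html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css_selector)]
```

<p>For example, <code>extract_texts(driver.page_source, "p.item")</code> would return the text of every paragraph with the hypothetical class <code>item</code> in the rendered page.</p>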
</div><script src="https://gist.github.com/ericcheng201168/c08b9c5e8ff1445ea3365dba71d026e3.js"></script><div class="fusion-text fusion-text-10"><p>Output from the code above</p>
</div><script src="https://gist.github.com/ericcheng201168/d4f5e9d5f9bad8cebc8273510858baca.js"></script><div class="fusion-text fusion-text-11"><p>I hope this guide helps you scrape web content from JavaScript-based websites.</p>
<p>If you would like to learn more about Beautiful Soup and Selenium, you can find their documentation below or leave a comment here.</p>
<p>Both are powerful libraries for web scraping and browser automation projects.</p>
<p>Beautiful Soup: <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">https://www.crummy.com/software/BeautifulSoup/bs4/doc/</a></p>
<p>Selenium: <a href="https://www.selenium.dev/documentation/en/">https://www.selenium.dev/documentation/en/</a></p>
</div></div></div></div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://techhkg.com/2020/11/15/python-web-scrapping-scrap-data-from-javascript-based-websites/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
