In this tutorial, you will learn to install Beautiful Soup and parse any web page you like.
Python & Beautiful Soup Installation On Ubuntu 12.04:
1. Open your terminal (Ctrl + Alt + T). Install Python and Beautiful Soup using these commands:
sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python2.7
sudo apt-get install python-bs4
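To confirm the installation worked, a quick check like the following may help. This is a minimal sketch and not part of the original steps: the helper name `is_installed` is hypothetical, and it only tells you whether the `bs4` package can be imported by the interpreter you run it with.

```python
def is_installed(package_name):
    """Return True if the named package can be imported, else False."""
    try:
        __import__(package_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "bs4" is the import name used by Beautiful Soup 4 (the python-bs4 package).
    if is_installed("bs4"):
        print("Beautiful Soup is installed")
    else:
        print("Beautiful Soup is missing; re-run the install commands")
```

Save it under any file name and run it with the same Python you plan to use for scraping.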
Scraping A Web Page: Let's start our program by importing Beautiful Soup. Since we are going to open a web page, we also need urllib2 (this is for Python 2; for Python 3, see urllib.request), so import that library as well.
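If you are not sure which Python version the script will run under, one possible approach is to try the Python 2 import first and fall back to the Python 3 location. This is only a sketch of that idea, not something the rest of this tutorial relies on.

```python
try:
    # Python 2: urlopen lives in the urllib2 module.
    from urllib2 import urlopen
except ImportError:
    # Python 3: the same function is found in urllib.request.
    from urllib.request import urlopen
```

With this import you would call `urlopen(url)` directly instead of `urllib2.urlopen(url)`.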
Select any URL to parse. I have selected the home page of this blog and opened it with urlopen().
Pass the contents of the web page, read from the 'page' variable, to Beautiful Soup.
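This step can also be tried offline. The sketch below feeds Beautiful Soup a made-up HTML string in place of the downloaded page; the sample markup is an assumption for illustration, and "html.parser" is Python's built-in parser, chosen here so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the result of page.read().
html = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

# "html.parser" selects the parser that ships with Python.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # text inside the <title> tag
print(soup.p.get_text())  # text inside the first <p> tag
```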
Let's print all the links present on this page. Save the final code below to a file and run it. I have named the file scrap_web_page.py.
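import urllib2  # Python 2; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup
url = "https://www.goodreads.com/quotes/tag/love"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
# Print the href attribute of every <a> tag on the page
for anchor in soup.find_all('a'):
    print(anchor.get('href'))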
This prints all the links to the terminal. I have added a screenshot of the output I received.
anchor.get('href') is just one way to read the links. You can grab any element from the page by tag, class, name, or other attribute. For complete details, read the Beautiful Soup documentation.
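As a rough illustration of grabbing things other than links, the sketch below looks up elements by class, by id, and by CSS selector. The HTML snippet and the names in it ("quote", "main", "nav") are made up for this example and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = """
<div id="main">
  <p class="quote">First quote</p>
  <p class="quote">Second quote</p>
  <a href="/about" class="nav">About</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# By tag name and CSS class (class_ is used because "class" is reserved in Python).
quotes = [p.get_text() for p in soup.find_all("p", class_="quote")]

# By id attribute.
main_div = soup.find(id="main")

# By CSS selector.
nav_links = [a.get("href") for a in soup.select("a.nav")]

print(quotes)
print(main_div.name)
print(nav_links)
```

find_all() returns every match, find() returns only the first one, and select() accepts CSS selectors, so you can pick whichever reads best for the page you are scraping.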