[Python] Iterate Over All Elements In BeautifulSoup!


Most of the time BeautifulSoup is used to parse HTML, and it is really good at it. But occasionally, instead of just parsing HTML, you may need to iterate over it and do something with every element. For this there is a useful method, recursiveChildGenerator(), which does the job.
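To make "do something with every element" concrete, here is a minimal sketch that walks a small, made-up HTML snippet and sorts each node into either a tag name or a piece of text. It uses the `.descendants` generator. In bs4, `recursiveChildGenerator()` is an older alias that returns this same generator, and recent releases may warn about the old name. The snippet and the `html.parser` backend are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical input; "html.parser" does not add <html>/<body> wrappers
html = '<div><p>Hello <b>world</b></p><p>Bye</p></div>'
soup = BeautifulSoup(html, 'html.parser')

tag_names = []
strings = []
# .descendants yields every node below soup, depth-first, in document order
for node in soup.descendants:
    if isinstance(node, Tag):
        tag_names.append(node.name)       # element nodes
    elif isinstance(node, NavigableString):
        strings.append(str(node))         # text nodes

print(tag_names)  # ['div', 'p', 'b', 'p']
print(strings)    # ['Hello ', 'world', 'Bye']
```

Because the walk is depth-first, the `<b>` tag and its text appear before the second `<p>`.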

For example, if you want to get the text out of HTML content as a list, you can't just call the get_text() method on the soup object, because it returns all of the text joined into a single string. Instead, you can do this:
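```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = 'some text <p> more </p> <code> x = 2 </code> more text'
# Name the parser explicitly; otherwise BeautifulSoup guesses one and warns
soup = BeautifulSoup(html, 'html.parser')
# Walk every node in the tree and keep only the plain text nodes
text = [i for i in soup.recursiveChildGenerator() if type(i) is NavigableString]
```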
This iterates over every node in the parsed HTML and, whenever a node is a string (a NavigableString) rather than a tag, adds it to the list.
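A minimal sketch contrasting the two approaches on the same HTML, with an HTML comment added as an assumption. It also strips whitespace and drops whitespace-only strings such as the lone space between `</p>` and `<code>`. The exact `type(...) is NavigableString` check skips `Comment` nodes, because `Comment` is a subclass of `NavigableString`; an `isinstance` check would include them.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Same HTML as above, plus a hypothetical comment to show the type-check difference
html = 'some text <p> more </p><!-- a comment --> <code> x = 2 </code> more text'
soup = BeautifulSoup(html, 'html.parser')

# get_text() joins all the text into one str
whole = soup.get_text()

# Exact type check: plain text only, comments are skipped
only_text = [s.strip() for s in soup.descendants
             if type(s) is NavigableString and s.strip()]

# isinstance check: subclasses such as Comment are included too
with_comments = [s.strip() for s in soup.descendants
                 if isinstance(s, NavigableString) and s.strip()]

print(only_text)      # ['some text', 'more', 'x = 2', 'more text']
print(with_comments)  # ['some text', 'more', 'a comment', 'x = 2', 'more text']
```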



I am Chillar Anand. I daydream a lot and write about the things that interest me here.
