Did you know that you can fetch data from web pages using Selenium in Python? Are you new to web scraping? Don’t worry! In this article, we will explore what web scraping is, why it matters, and how to do it using Python with Selenium.
Web Scraping and Its Importance
Web scraping means getting information from websites. Imagine the internet is like a huge library full of information. Web scraping is like a robot that can go into this library, find specific books (or data) you’re interested in, and bring them back to you.
Now, why is this robot, or web scraping, important? Well, think about all the useful information available online: prices of products, weather forecasts, news articles, and much more. Web scraping helps us access and use this information in different ways. For example, businesses can scrape data to monitor their competitors’ prices, researchers can gather data for analysis, and developers can build applications that rely on up-to-date information from the web.
Selenium lets us control web browsers programmatically, so our code can interact with websites much like a human would. This is quite helpful for web scraping because we can navigate through web pages, fill out forms, click buttons, and extract the data we need, all automatically. It is especially handy for pages that only show their content after JavaScript has run in the browser.
Setting Up for Web Scraping
Before we start extracting data from web pages, we need to install Selenium. To do so, run the following command in your command prompt or terminal. It downloads and installs Selenium for you.
pip install selenium
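With Selenium installed, you have most of what you need. To make it work with your web browser, you also need a web driver, i.e. a small program that lets Selenium control a specific browser (for example, ChromeDriver for Chrome or geckodriver for Firefox). Recent Selenium releases (4.6 and later) include Selenium Manager, which can fetch a matching driver automatically, so a manual download is often unnecessary.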
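To confirm that the installation worked, you can ask Python which Selenium version it sees. This is a minimal sketch that uses only the standard library; the helper name installed_selenium_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_selenium_version():
    """Return the installed Selenium version string, or None if it is missing."""
    try:
        return version("selenium")
    except PackageNotFoundError:
        return None


found = installed_selenium_version()
if found is None:
    print("Selenium is not installed; run: pip install selenium")
else:
    print(f"Selenium {found} is installed")
```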
Fetching Data from Website Using Selenium
Data scraping becomes much easier when we use Selenium. Let’s learn with an example. Suppose we want to fetch the ‘List of highly paid models in the world’ from a web page and print it in our console. Let’s see how we can program this idea.
Step 1: First, we need to import Selenium so we can use it in our Python code.
from selenium import webdriver
Step 2: We need to set up a web driver for the browser we want to use. In this example, we’re using Chrome.
driver = webdriver.Chrome()
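If you do not want a browser window to pop up while scraping, Chrome can be started in headless mode. The sketch below is one possible setup, not the only one: make_chrome_driver is a hypothetical helper, and the --headless=new flag assumes a reasonably recent Chrome. The Selenium import sits inside the function so that the helper can be defined before any browser is involved.

```python
def make_chrome_driver(headless=True):
    """Create a Chrome driver, optionally without a visible window."""
    # Imported here so that defining the helper does not require a browser.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        # Assumption: a recent Chrome that understands the new headless mode.
        options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


# Usage (starts a real browser, so it is left commented out here):
# driver = make_chrome_driver()
```

The returned driver can be used exactly like the one created in Step 2.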
Step 3: Now, we tell the web driver to open the webpage from which we want to scrape data. For example, here we are using ‘lofficielusa.com’.
driver.get('https://www.lofficielusa.com/fashion/highest-paid-models-in-the-world-kendall-jenner-gisele-bundchen')
Step 4: We use Selenium to find the HTML elements that contain the data we want. In this case, we’re looking for <figcaption> elements. You can use Chrome’s developer tools to inspect the HTML element from which you want to fetch data.
from selenium.webdriver.common.by import By
model_elements = driver.find_elements(By.TAG_NAME, 'figcaption')
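Some pages load their content with JavaScript, so the elements may not exist yet at the moment we look for them. A common remedy is an explicit wait. The sketch below assumes the driver from Step 2; the name wait_for_captions and the 10-second timeout are choices made for this example.

```python
def wait_for_captions(driver, timeout=10):
    """Wait until at least one <figcaption> is present, then return them all."""
    # Imported inside the function so the helper can be defined without a browser.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Raises selenium.common.exceptions.TimeoutException if nothing appears in time.
    return wait.until(
        EC.presence_of_all_elements_located((By.TAG_NAME, "figcaption"))
    )


# Usage, after driver.get(...):
# model_elements = wait_for_captions(driver)
```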
Step 5: Once we’ve found the elements, we extract the text from them.
models_list = [model_element.text for model_element in model_elements]
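Scraped text often comes with blank entries or stray whitespace, for instance when a <figcaption> is empty. A small cleaning step keeps the list tidy before printing. The helper clean_texts and the sample strings below are made up for illustration.

```python
def clean_texts(texts):
    """Strip surrounding whitespace and drop entries that end up empty."""
    stripped = (text.strip() for text in texts)
    return [text for text in stripped if text]


# Made-up sample values, standing in for the scraped captions.
sample = ["  First caption ", "", "Second caption\n", "   "]
print(clean_texts(sample))  # ['First caption', 'Second caption']
```

In the script above, this could be applied as models_list = clean_texts(models_list).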
Step 6: Finally, we print the extracted data to the console.
print("Found model elements:")
for model_text in models_list:
    print(model_text)
Step 7: After we’re done, we need to close the web driver to free up resources.
driver.quit()
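If an error occurs halfway through the script, the line driver.quit() is never reached and the browser keeps running. One way to guard against that is try/finally. The function below is a sketch of that pattern: scrape_texts and its parameters are invented for this example, and the locator string "tag name" is the value behind Selenium's By.TAG_NAME.

```python
def scrape_texts(make_driver, url, tag="figcaption"):
    """Open url, collect the text of every <tag> element, and always quit."""
    driver = make_driver()
    try:
        driver.get(url)
        # "tag name" is the string that By.TAG_NAME stands for in Selenium.
        elements = driver.find_elements("tag name", tag)
        return [element.text for element in elements]
    finally:
        # Runs on success and on error, so the browser is always closed.
        driver.quit()


# Usage with a real browser (url is the page address from Step 3):
# from selenium import webdriver
# models_list = scrape_texts(webdriver.Chrome, url)
```

Passing the driver constructor instead of a ready-made driver lets the function own the whole lifecycle, from opening the browser to closing it.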
Output: The result of this code would be a list of names and earnings of highly paid models, printed line by line in the console.
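Printing is only one option. If you would rather keep the results, the list can be written to a file. This sketch uses Python's built-in csv module; the function name save_to_csv, the file name models.csv, and the column header are placeholders chosen for this example.

```python
import csv


def save_to_csv(texts, path):
    """Write one scraped text per row, under a single 'caption' header."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["caption"])
        for text in texts:
            writer.writerow([text])


# Usage with the list built in Step 5:
# save_to_csv(models_list, "models.csv")
```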
Summary
And yes, it’s the end! I hope you enjoyed this blog and now have a clearer picture of how to fetch data from the web using Selenium. We learned the basics and importance of web scraping and also saw an example where we fetched data from a web page and stored it in a Python list. It was easy, wasn’t it? You are not limited to lists; you can store the data in any format you want. Feel free to experiment and try!
Reference
https://www.selenium.dev/documentation/webdriver/elements/information