Wikipedia Module in Python: An In-Depth Guide

Welcome to a new tutorial. Here we will learn about the Wikipedia module in Python and see how to scrape data using the Wikipedia API. Data scraping is the automated extraction of information from websites or other sources on the internet. Let’s see how we can use Wikipedia, one of the most informative sites on the internet, in our Python applications.

An Introduction to Wikipedia Module in Python

Wikipedia is a large online encyclopedia where people collaborate to write and edit articles on many subjects. It’s a widely used reference site available in multiple languages. The wikipedia package is a Python library that makes it easy to work with Wikipedia from code. It helps you find articles, get content and summaries, and access other details about Wikipedia entries, so you can include Wikipedia information in your Python programs. Let’s learn how to install and import it.

Installation Statement

To get data from Wikipedia, start by installing the wikipedia library, which wraps the MediaWiki API that Wikipedia exposes. Use the command below in your command prompt or terminal to install it:

pip install wikipedia

After installing, we can utilize the Wikipedia API in Python to gather information from Wikipedia. To access the methods of the Wikipedia module, simply import it using the following command:

import wikipedia

Getting Started with Wikipedia Module

Let’s now look at different use cases of what we can do using this Wikipedia module.

Getting Wikipedia Articles Summary

Let’s start with the basics. The summary() function returns the summary of a Wikipedia article. Pass the article title as the first argument; to limit how much text comes back, pass the desired number of sentences through the sentences parameter, as shown in the following code.

Example:

import wikipedia

Title = "A. P. J. Abdul Kalam"

# Extract the summary with a specified number of sentences
Summary = wikipedia.summary(Title, sentences=5)
print("According to wikipedia : ")
print(Summary)

Output:

[Screenshot: summary of the given title]

The summary for the given title is printed using the requested number of sentences, five in this case.

However, it is important that the title matches the wording of the Wikipedia page’s title. If no page matches, the module raises a PageError, which means that the page could not be found. For example, if the title were:

Title = "Dr APJ Abdul Kalam"

The output will display a page error saying that the given title does not match any pages.

If a title such as ‘com’ could refer to several different articles, Wikipedia treats it as a disambiguation page, and the module raises a DisambiguationError that lists the possible matches.

Title = "com"

Many results match the title ‘com’. Suppose we want to summarize ‘Center of mass’, then, we need to specify it in the title to get accurate results, like:

Title = "Center of mass"

Now, with a more specific query, the output displays the accurate summary.
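The two failure cases discussed above can be handled in code rather than left to crash the program. The following is a minimal sketch, not part of the original example: safe_summary is a hypothetical helper name, and its wiki parameter is an assumption added only so that a stand-in object can be passed in for testing without network access. It relies on the exceptions DisambiguationError (which carries an options list) and PageError from wikipedia.exceptions.

```python
def safe_summary(title, sentences=2, wiki=None):
    """Return (summary, alternatives) without raising on common lookup errors.

    `wiki` is a hypothetical hook for passing a stand-in module in tests;
    when it is omitted, the real `wikipedia` package is imported.
    """
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia

    try:
        return wiki.summary(title, sentences=sentences), []
    except wiki.exceptions.DisambiguationError as err:
        # The title is ambiguous; err.options lists candidate page titles.
        return None, list(err.options)
    except wiki.exceptions.PageError:
        # No page matches the title at all.
        return None, []
```

Calling safe_summary("com") should then hand back None together with a list of candidate titles, from which a more specific one such as "Center of mass" can be chosen and passed in again.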

Customizing the Page Language

The set_lang function in the Python Wikipedia module is used to choose the language for future queries. You can specify the Wikipedia edition’s language from which you want to get information.

Example:

import wikipedia
# Set the language to Hindi
wikipedia.set_lang("hi")

Summary = wikipedia.summary("Tiger")
print("According to wikipedia : ")
print(Summary)

Output:

[Screenshot: summary returned from the Hindi edition]

In this example, wikipedia.set_lang("hi") sets the language to Hindi. Afterwards, every query made through the Wikipedia module is served from the Hindi edition, so the summary function fetches the summary of the Hindi Wikipedia page titled “Tiger.”
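Because set_lang changes the language for every later call, it can be convenient to switch back once a single lookup is done. The sketch below shows one way to do that. The helper name summary_in_language, the wiki parameter (a hook for passing a stand-in object in tests), and the default_lang value of "en" are all assumptions made for illustration.

```python
def summary_in_language(title, lang, sentences=2, wiki=None, default_lang="en"):
    """Fetch a summary from one language edition, then switch back to default_lang."""
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia

    wiki.set_lang(lang)  # applies to every later call, so it is undone below
    try:
        return wiki.summary(title, sentences=sentences)
    finally:
        # Runs even if the lookup raised, so later queries use default_lang again.
        wiki.set_lang(default_lang)
```

A call such as summary_in_language("Tiger", "hi") would fetch from the Hindi edition and leave later queries on the English one. Note that page titles differ between editions, so a title that works in English may not resolve in another language.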

Getting Wikipedia Page Data

We use the page function to obtain an entire Wikipedia page by providing the page title as a parameter. It returns a page object, and we read specific information from that object through its attributes. The page object lets us retrieve contents, categories, coordinates, images, links, and other metadata from a Wikipedia page. Let’s look at these attributes one by one.

1) .content

The .content attribute of the object returned by the page function holds the text of the Wikipedia page. Keep in mind that this content may include not only the main text but also sections, references, and other information from the page.

Example:

import wikipedia

Title = "William Shakespeare"

Content = wikipedia.page(Title).content
print("According to wikipedia : ")
print(Content)

Output:

[Screenshot: printed page content]
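Since .content is one long string, it often helps to break it into sections before working with it. The sketch below assumes that headings appear in the text as lines such as "== History ==" or "=== Early life ===". The helper name split_sections and the label "Lead" for the opening text are choices made for this illustration.

```python
import re

# Matches heading lines such as "== History ==" or "=== Early life ===".
HEADING = re.compile(r"^(={2,})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def split_sections(content):
    """Split page text into (heading, body) pairs; the opening text is labelled 'Lead'."""
    sections = []
    heading, start = "Lead", 0
    for match in HEADING.finditer(content):
        # Close the section that was open before this heading.
        sections.append((heading, content[start:match.start()].strip()))
        heading, start = match.group(2), match.end()
    sections.append((heading, content[start:].strip()))
    return sections
```

With the earlier example, split_sections(wikipedia.page("William Shakespeare").content) would give a list of pairs that can be looped over or filtered by heading.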

2) .url

If you wish to obtain the URL of the given page, you can use the .url attribute to fetch and display it.

Example:

import wikipedia

Title = "Walt Disney World"

URL = wikipedia.page(Title).url
print("According to wikipedia : ")
print(URL)

Output:

[Screenshot: printed page URL]

3) .references

The .references attribute of the page object returns a list of the external URLs found on a Wikipedia page, which is where its reference links and citations appear.

Example:

import wikipedia

Title = "International Women's Day"

References = wikipedia.page(Title).references
print("According to wikipedia : ")
print(References)

Output:

[Screenshot: printed list of references]

In this example, the result is a list of URLs for the external links on the Wikipedia page for “International Women’s Day,” which cover its references and citations. This information is helpful if you want to analyze or display the sources used in creating the Wikipedia page content.
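As one way of analyzing the sources, the URLs can be grouped by the site they point to. The sketch below uses only the standard library. The helper name count_reference_domains is hypothetical, and it assumes the input is a list of URL strings like the one .references returns.

```python
from collections import Counter
from urllib.parse import urlparse


def count_reference_domains(references):
    """Count how many reference URLs point at each host name."""
    counts = Counter()
    for url in references:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.org and example.org as one site
        if host:  # entries without a recognizable host are skipped
            counts[host] += 1
    return counts
```

For the page above, count_reference_domains(References).most_common(5) would list the five most frequently cited sites.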

4) .links

The .links attribute is used to retrieve a list of links present on a Wikipedia page.

Example:

import wikipedia

Title = "Santorini"

Connected_links = wikipedia.page(Title).links
print("According to wikipedia : ")
print(Connected_links)

Output:

[Screenshot: printed list of links]

In this example, wikipedia.page(Title).links gives you a list of links from the Wikipedia page for “Santorini.” Each element in the list is the title of another Wikipedia article that the page links to. This information is helpful if you want to extract and analyze the links within the Wikipedia page or explore related topics. Note that the list holds internal links to other Wikipedia articles only; external URLs are available through the .references attribute instead.

5) .categories

The .categories attribute is used to get a list of categories to which a Wikipedia page belongs.

Example:

import wikipedia

Title = "Hill Forts of Rajasthan"

Belonged_categories = wikipedia.page(Title).categories
print("According to wikipedia : ")
print(Belonged_categories)

Output:

[Screenshot: printed list of categories]

In this example, using wikipedia.page(Title).categories gives you a list of categories related to the Wikipedia page for “Hill Forts of Rajasthan.” Each element in the list represents a category to which the page belongs. This is useful if you want to categorize Wikipedia pages based on their topics. Remember, the list of categories reflects how the Wikipedia community has organized and tagged the page’s content.
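Each example above calls wikipedia.page(Title) again for a single attribute. When several attributes of the same page are needed, the page object can be fetched once and reused. The sketch below illustrates that idea. The helper name page_overview and its wiki parameter (a hook for passing a stand-in object in tests) are assumptions, while the attributes it reads are the ones covered in this section.

```python
def page_overview(title, wiki=None):
    """Fetch a page once and collect a few of its attributes in a dict."""
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia

    page = wiki.page(title)  # one lookup, reused for every attribute below
    return {
        "title": page.title,
        "url": page.url,
        "content_chars": len(page.content),
        "reference_count": len(page.references),
        "link_count": len(page.links),
        "categories": list(page.categories),
    }
```

For instance, page_overview("Santorini") would return a small dictionary that is easy to print or store alongside other pages.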

Getting a Random Wikipedia Page

The random method in Python’s Wikipedia module is used to get a random Wikipedia page. When you use wikipedia.random(), it gives you the title of a randomly chosen Wikipedia page, letting you explore various topics.

Example:

import wikipedia

Random = wikipedia.random()

Title = wikipedia.page(Random).title
Summary = wikipedia.summary(Random)
print("According to wikipedia : ")
print(Title)
print(Summary)

The code randomly selects a Wikipedia page title using the wikipedia module, retrieves the title and a summary of the corresponding Wikipedia page, and prints this information to the console. It provides a quick way to explore diverse topics from Wikipedia.

Output:

[Screenshot: title and summary of the randomly selected page]

The above result shows that the program has randomly selected the topic “Stielgranate 41” and has displayed the summary for the same.
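A randomly drawn title is not guaranteed to resolve cleanly, since the lookup that follows can still raise the disambiguation or page errors described earlier. The sketch below retries with a fresh title when that happens. The helper name random_summary, the max_tries limit, and the wiki parameter (a hook for passing a stand-in object in tests) are assumptions made for illustration.

```python
def random_summary(max_tries=5, sentences=2, wiki=None):
    """Return (title, summary) for a random page, skipping titles that fail to resolve."""
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia

    for _ in range(max_tries):
        title = wiki.random()  # a single random page title
        try:
            # auto_suggest=False looks up exactly the title that was drawn.
            return title, wiki.summary(title, sentences=sentences, auto_suggest=False)
        except (wiki.exceptions.DisambiguationError, wiki.exceptions.PageError):
            continue  # ambiguous or missing page: draw another title
    return None, None
```

If every attempt fails, the function returns (None, None), so the caller can decide what to do instead of handling an exception.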

Getting a List of Titles

The search method in Python’s Wikipedia module is used for searching on Wikipedia and getting a list of titles that match the query. It helps find Wikipedia pages related to a specific topic.

Example:

import wikipedia

Query = wikipedia.search("Rajasthan")

print("Search results:")
for result in Query:
    print(result)

Output:

[Screenshot: printed search results]

In this example, wikipedia.search("Rajasthan") gives a list of Wikipedia page titles related to “Rajasthan.” This is helpful when you want to find relevant Wikipedia pages on a specific topic. The obtained titles can then be used to get more detailed information about the corresponding pages.
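To illustrate that last step, the sketch below feeds each search result into summary() and collects the results in a dictionary. The helper name search_and_summarize and its wiki parameter (a hook for passing a stand-in object in tests) are assumptions. The results argument of search() and the auto_suggest argument of summary() are existing parameters of the library.

```python
def search_and_summarize(query, results=3, sentences=1, wiki=None):
    """Search for a query and map each resolvable result title to a short summary."""
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia

    summaries = {}
    for title in wiki.search(query, results=results):
        try:
            # auto_suggest=False asks for exactly the title that search returned.
            summaries[title] = wiki.summary(title, sentences=sentences, auto_suggest=False)
        except (wiki.exceptions.DisambiguationError, wiki.exceptions.PageError):
            continue  # skip disambiguation pages and titles that do not resolve
    return summaries
```

A call such as search_and_summarize("Rajasthan") would return up to three titles with a one-sentence summary each.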

Conclusion

And here we are at the end of this article. I hope the role of the Wikipedia module in Python programming is now clear. You can confirm your understanding by using the module in your own programs.



Snigdha Keshariya