Python PDFKit Module: Convert HTML, URL, and Text to PDFs

Generating PDFs from Python is straightforward with the pdfkit module, a wrapper around the wkhtmltopdf command-line tool. PDF keeps a document looking the same on any device, which makes it a widely used and reliable format for sharing information. In this article, let us explore how to convert HTML files, web pages (URLs), and text strings into PDF.

Installing and Setting Up PDFKit in Python

Before using the module, we need to install it on our system. We can do so with pip:

pip install pdfkit

Along with the pdfkit module, we need to install wkhtmltopdf, a free tool that converts HTML to PDF and various image formats using the Qt WebKit rendering engine.

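To install wkhtmltopdf on Windows:

  • Download the Windows installer from the official wkhtmltopdf downloads page and run it.
  • Add the folder that contains ‘wkhtmltopdf.exe’ (the ‘bin’ folder of the installation) to the PATH environment variable so the executable can be found. Alternatively, pass the full path to pdfkit.configuration(), as the examples in this article do.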

To install wkhtmltopdf on Ubuntu, run the following command:

sudo apt-get install wkhtmltopdf

To install wkhtmltopdf on macOS with Homebrew, run the following command:

brew install homebrew/cask/wkhtmltopdf
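
After installing, it can help to confirm that Python can actually see the wkhtmltopdf binary. The following is a minimal sketch that uses only the standard library; the helper name find_wkhtmltopdf is made up for illustration and is not part of pdfkit.

```python
import shutil
from typing import Optional


def find_wkhtmltopdf() -> Optional[str]:
    """Return the full path of the wkhtmltopdf executable, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    return shutil.which("wkhtmltopdf")


if __name__ == "__main__":
    path = find_wkhtmltopdf()
    if path is None:
        print("wkhtmltopdf was not found on PATH; pass its location to pdfkit.configuration() instead.")
    else:
        print(f"wkhtmltopdf found at: {path}")
```

If the function returns None, either the PATH variable needs fixing or the full path to the executable has to be passed to pdfkit.configuration() explicitly.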

Converting to PDF with PDFKit in Python

Now that we are done with our setup, let’s see what types of PDF conversions are possible. PDFKit offers three: from an HTML file, from a URL, and from a string.

Let’s discuss each of the conversions one by one with code examples.

1. Converting HTML to PDF

We use ‘pdfkit.from_file()’ to convert an HTML file to PDF. Converting HTML to PDF produces a document that looks the same on any platform.

Example Code:

<!-- index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Demo</title>
</head>
<body>
    <h1>Python Programming Language</h1>
    <p>Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured, object-oriented and functional programming.</p>
    
</body>
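</html>

import pdfkit

# Point pdfkit at the wkhtmltopdf executable (adjust the path for your system)
config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')

# Convert the local HTML file into a PDF
pdfkit.from_file('index.html', 'converted.pdf', configuration=config)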

Working:

  • The code imports the pdfkit library for PDF conversion in Python.
  • It creates a configuration that points to ‘wkhtmltopdf.exe’. The backslashes in the Windows path are doubled so that Python does not treat them as escape sequences.
  • It converts the content of ‘index.html’ into a PDF named ‘converted.pdf’ using the configured settings.

Output:

A new PDF file is created in the same directory once we run the program.
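
The example above assumes that ‘index.html’ exists and that wkhtmltopdf lives at a fixed Windows path. As a hedged sketch of a slightly more defensive variant, the helper below (html_file_to_pdf is a hypothetical name, not part of pdfkit) checks the input file first and only builds a configuration when a path is supplied; when no configuration is passed, pdfkit tries to locate wkhtmltopdf on its own.

```python
from pathlib import Path
from typing import Optional


def html_file_to_pdf(html_path: str, pdf_path: str,
                     wkhtmltopdf_path: Optional[str] = None) -> None:
    """Convert a local HTML file to PDF, failing early if the input is missing."""
    if not Path(html_path).is_file():
        raise FileNotFoundError(f"HTML file not found: {html_path}")

    # Imported here so the input check above runs even if pdfkit is not installed
    import pdfkit

    # Build a configuration only when an explicit executable path was given
    config = (pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
              if wkhtmltopdf_path else None)
    pdfkit.from_file(html_path, pdf_path, configuration=config)


# Example call (requires pdfkit, wkhtmltopdf, and an existing index.html):
# html_file_to_pdf("index.html", "converted.pdf")
```

Checking the input up front gives a clearer error message than letting the external tool fail on a missing file.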


2. Converting URL to PDF

We use ‘pdfkit.from_url()’ to convert a web page, identified by its URL, to PDF. Turning a URL into a PDF saves web content for offline use, keeps the formatting consistent, and makes sharing the information straightforward.

Example Code:

import pdfkit

config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')

pdfkit.from_url('https://www.visitgreece.gr/islands/cyclades/santorini/', 'converted2.pdf', configuration=config)

Working:

  • The code uses the pdfkit library for working with PDFs in Python.
  • It sets the path to the ‘wkhtmltopdf.exe’ executable in the pdfkit configuration.
  • It converts the content from the specified URL (‘https://www.visitgreece.gr/islands/cyclades/santorini/’) into a PDF named ‘converted2.pdf’ using the specified configuration.

Output:

The program creates a new PDF file in the current directory when it runs.
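
URL conversion depends on the network and on the external wkhtmltopdf process, so it can fail at run time. The sketch below shows one way to guard against that; is_http_url and url_to_pdf are hypothetical helper names, and catching OSError reflects my understanding that pdfkit reports a missing binary or a failed wkhtmltopdf run as IOError/OSError.

```python
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Return True only for absolute http(s) URLs that include a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def url_to_pdf(url: str, pdf_path: str) -> bool:
    """Try to save the page at `url` as a PDF; return True on success."""
    if not is_http_url(url):
        raise ValueError(f"Not an http(s) URL: {url}")

    # Imported lazily so the URL check works even without pdfkit installed
    import pdfkit

    try:
        pdfkit.from_url(url, pdf_path)
        return True
    except OSError as exc:
        # Assumption: conversion problems surface as OSError (IOError)
        print(f"Conversion failed: {exc}")
        return False


# Example call (requires pdfkit, wkhtmltopdf, and network access):
# url_to_pdf("https://www.visitgreece.gr/islands/cyclades/santorini/", "converted2.pdf")
```

Validating the URL before calling pdfkit avoids starting the external process for input that can never succeed.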


3. Converting Text to PDF

We use ‘pdfkit.from_string()’ to convert a string to PDF. The string is interpreted as HTML, so it can hold plain text or markup. Turning text into PDF makes it easy to share and ensures consistent viewing.

Example Code:

import pdfkit

config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')

# from_string takes the content itself; from_file would look for a file with that name
pdfkit.from_string('Hello and welcome to CodeForGeek', 'converted3.pdf', configuration=config)

Working:

  • The code uses the pdfkit library to perform PDF operations in Python.
  • It sets up the path for the ‘wkhtmltopdf.exe’ executable in the pdfkit configuration.
  • It converts the content “Hello and welcome to CodeForGeek” into a PDF named ‘converted3.pdf’ using the specified configuration.

Output:

When the program runs, a new PDF file is generated in the same directory.
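
Because the string is rendered as HTML, plain text that contains characters such as ‘<’ or ‘&’, or that spans several lines, may not come out as written. The following is a minimal sketch of one way to prepare such text first; text_to_html and text_to_pdf are hypothetical helper names, not part of pdfkit.

```python
import html


def text_to_html(text: str) -> str:
    """Wrap plain text in a minimal HTML page, escaping special characters."""
    escaped = html.escape(text)               # turns <, >, & and quotes into entities
    body = escaped.replace("\n", "<br>\n")    # keep the original line breaks
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="UTF-8"></head>'
        f"<body><p>{body}</p></body></html>"
    )


def text_to_pdf(text: str, pdf_path: str) -> None:
    """Render plain text to a PDF file via pdfkit.from_string."""
    # Imported lazily so text_to_html can be used without pdfkit installed
    import pdfkit

    pdfkit.from_string(text_to_html(text), pdf_path)


# Example call (requires pdfkit and wkhtmltopdf):
# text_to_pdf("Hello and welcome to CodeForGeek", "converted3.pdf")
```

The meta charset tag is included so that non-ASCII characters in the text are more likely to render correctly.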


Conclusion

So, that’s it for this article. I hope you are clear about the three types of PDF conversions using the Python PDFKit module: ‘from_file’ converts a local HTML file, ‘from_url’ saves a web page for offline access with its layout preserved, and ‘from_string’ turns text or HTML held in a string into a PDF that is easy to share and print.

Reference

https://stackoverflow.com/questions/62438296/creating-pdf-file-with-python-using-pdfkit

Snigdha Keshariya