Generating PDFs in Python is straightforward with the pdfkit module, a wrapper around the wkhtmltopdf tool. PDF is a widely used and reliable format for sharing information because documents look the same on any device, so converting content to PDF is a common task. In this article, let us explore how to convert HTML files, web pages (URLs), and text strings into PDF.
Installing and Setting Up PDFKit in Python
Before using the module, we need to install it on our system. We can do so with the following command:
pip install pdfkit
Along with the pdfkit module, we need to install wkhtmltopdf, a free tool that converts HTML to PDF and various image formats using the Qt WebKit rendering engine.
To install wkhtmltopdf in Windows:
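- Download the Windows installer from the official wkhtmltopdf website (wkhtmltopdf.org) and run it.
- Add the installation’s bin folder (the one containing wkhtmltopdf.exe) to the PATH environment variable. If the tool is not on PATH, pdfkit raises an error unless its full path is passed explicitly through a configuration object.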
To install wkhtmltopdf on Ubuntu, run the following command:
sudo apt-get install wkhtmltopdf
To install wkhtmltopdf on macOS, run the following command:
brew install homebrew/cask/wkhtmltopdf
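Once wkhtmltopdf is installed, it can help to confirm that Python can actually find it before running any conversion. The following is a minimal sketch using only the standard library; the helper name find_wkhtmltopdf is our own and is not part of pdfkit.

```python
import shutil


def find_wkhtmltopdf(name="wkhtmltopdf"):
    """Return the full path of the wkhtmltopdf executable, or None if it is not on PATH.

    Hypothetical helper for illustration; it is not part of pdfkit.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_wkhtmltopdf()
    if path is None:
        print("wkhtmltopdf was not found on PATH; install it or pass "
              "its full path to pdfkit.configuration().")
    else:
        print(f"wkhtmltopdf found at: {path}")
```

If the helper returns None, the PATH setup above has probably not taken effect yet (on Windows, a new terminal session is usually needed after changing environment variables).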
Converting to PDF with PDFKit in Python
Now that we are done with our setup, let’s see what types of PDF conversions are possible: an HTML file, a URL, or a string of text can each be converted to PDF.
Let’s discuss each of the conversions one by one with code examples.
1. Converting HTML to PDF
We use the ‘pdfkit.from_file’ function to convert an HTML file to PDF. Converting HTML to PDF gives a document that looks the same on any platform and is easy to share.
Example Code:
<!-- index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Demo</title>
</head>
<body>
<h1>Python Programming Language</h1>
<p>Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured, object-oriented and functional programming.</p>
</body>
</html>
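# Python script that converts index.html to converted.pdf
import pdfkit
# Path to the wkhtmltopdf executable (Windows example; adjust it for your system)
config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')
pdfkit.from_file('index.html', 'converted.pdf', configuration=config)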
Working:
- The code imports the pdfkit library for PDF conversion in Python.
- It points pdfkit to the wkhtmltopdf executable by passing its path to ‘pdfkit.configuration’. The backslashes are doubled because a single backslash starts an escape sequence in a Python string.
- It converts the content of ‘index.html’ into a PDF named ‘converted.pdf’ using the configured settings.
Output:
Running the program creates a new PDF file, converted.pdf, in the current working directory.
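Page layout can be adjusted through the options argument of pdfkit, which is passed on to wkhtmltopdf as command-line flags. The sketch below is illustrative only: the helper name html_file_to_pdf and the option values are our own assumptions, and pdfkit is imported inside the function so that the file check still works on a system where pdfkit is not installed yet.

```python
from pathlib import Path

# Example wkhtmltopdf options; the values are illustrative, not recommendations.
PDF_OPTIONS = {
    "page-size": "A4",
    "margin-top": "20mm",
    "margin-bottom": "20mm",
    "encoding": "UTF-8",
}


def html_file_to_pdf(html_path, pdf_path, wkhtmltopdf_path=None):
    """Convert an HTML file to PDF (hypothetical helper, not part of pdfkit)."""
    if not Path(html_path).is_file():
        raise FileNotFoundError(f"HTML file not found: {html_path}")

    # Imported lazily so the check above works even before pdfkit is installed.
    import pdfkit

    # With no explicit path, pdfkit searches PATH for wkhtmltopdf.
    config = None
    if wkhtmltopdf_path:
        config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)

    return pdfkit.from_file(
        str(html_path), str(pdf_path), options=PDF_OPTIONS, configuration=config
    )
```

For example, html_file_to_pdf('index.html', 'converted.pdf') should render the same page on A4 paper with 20 mm top and bottom margins, assuming wkhtmltopdf can be found on PATH.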
2. Converting URL to PDF
We use the ‘pdfkit.from_url’ function to convert a web page at a given URL to PDF. Turning a URL into a PDF saves the web content for offline use, keeps the formatting consistent, and makes the information easy to share.
Example Code:
import pdfkit
config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')
pdfkit.from_url('https://www.visitgreece.gr/islands/cyclades/santorini/', 'converted2.pdf', configuration=config)
Working:
- The code uses the pdfkit library for working with PDFs in Python.
- It sets the path to the ‘wkhtmltopdf.exe’ executable in the pdfkit configuration.
- It converts the content from the specified URL (‘https://www.visitgreece.gr/islands/cyclades/santorini/’) into a PDF named ‘converted2.pdf’ using the specified configuration.
Output:
The program creates a new PDF file in the current directory when it runs.
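When several pages are saved this way, it helps to derive each output file name from its URL. The following is a minimal sketch; pdf_name_for_url and save_url_as_pdf are hypothetical helper names, and the naming scheme is just one possible choice.

```python
import re
from urllib.parse import urlparse


def pdf_name_for_url(url):
    """Build a file-system-friendly PDF name from a URL (illustrative scheme)."""
    parts = urlparse(url)
    raw = parts.netloc + parts.path
    # Replace every run of characters other than letters, digits, dots, and hyphens.
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return (slug or "page") + ".pdf"


def save_url_as_pdf(url, configuration=None):
    """Convert the page at `url` to a PDF named after the URL; return the file name."""
    import pdfkit  # imported lazily so the naming helper works without pdfkit

    output = pdf_name_for_url(url)
    pdfkit.from_url(url, output, configuration=configuration)
    return output


print(pdf_name_for_url("https://www.visitgreece.gr/islands/cyclades/santorini/"))
# prints: www.visitgreece.gr_islands_cyclades_santorini.pdf
```

Calling save_url_as_pdf with the same URL and the config object from the example above would then write the PDF under that generated name.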
3. Converting Text to PDF
We use the ‘pdfkit.from_string’ function to convert a string to PDF. The string is interpreted as HTML, so it can be plain text or HTML markup. Turning text into a PDF makes it easy to share and ensures it looks the same for every reader.
Example Code:
import pdfkit
config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')
pdfkit.from_string('Hello and welcome to CodeForGeek', 'converted3.pdf', configuration=config)
Working:
- The code uses the pdfkit library to perform PDF operations in Python.
- It sets up the path for the ‘wkhtmltopdf.exe’ executable in the pdfkit configuration.
- It passes the string “Hello and welcome to CodeForGeek” to ‘from_string’, which converts it into a PDF named ‘converted3.pdf’ using the specified configuration.
Output:
When the program runs, a new PDF file, converted3.pdf, is generated in the current working directory.
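Because from_string interprets its input as HTML, plain text that contains characters such as < or & should be escaped first, and line breaks need explicit markup to survive. The sketch below shows one way to do this; text_to_html and text_to_pdf are hypothetical helpers, not pdfkit functions.

```python
import html


def text_to_html(text):
    """Wrap plain text in a minimal HTML page, escaping special characters."""
    paragraphs = []
    for block in text.split("\n\n"):
        block = block.strip()
        if block:
            # Escape <, >, and &, and keep single line breaks inside a paragraph.
            escaped = html.escape(block).replace("\n", "<br>")
            paragraphs.append(f"<p>{escaped}</p>")
    body = "\n".join(paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="UTF-8"></head>\n'
        f"<body>\n{body}\n</body></html>"
    )


def text_to_pdf(text, pdf_path, configuration=None):
    """Render plain text to a PDF file via pdfkit.from_string."""
    import pdfkit  # imported lazily so text_to_html can be used without pdfkit

    return pdfkit.from_string(text_to_html(text), pdf_path, configuration=configuration)
```

Blank lines in the input become paragraph breaks, so a multi-paragraph text file can be read into a string and passed to text_to_pdf directly.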
Conclusion
So, that’s it for this article. I hope you are now clear about the three types of PDF conversion available in the Python pdfkit module. It converts HTML files to PDFs while preserving their layout and content. It saves the web page at a given URL as a PDF for archiving, reference, and offline viewing. Finally, it turns strings of text or HTML into PDFs for easy sharing and printing.
Reference
https://stackoverflow.com/questions/62438296/creating-pdf-file-with-python-using-pdfkit