Coming Up for Air

Converting Many Images to One PDF

Wednesday, September 05, 2012

I recently had the need to convert several scanned images into one multi-page PDF. While there are probably tools to help do this manually, I knew that there was a good chance I’d have to do something like this again, quite possibly with a large number of images, so I did what any good geek would do: I scripted it. In this entry, I’ll show how I went about that.

For starters, let’s take a look at the very small, simple Python script:

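#!/usr/bin/python

import os, sys, PythonMagick
from pyPdf import PdfFileReader, PdfFileWriter

# Require an output name ending in .pdf plus at least one image file.
if not ((len(sys.argv) > 2) and sys.argv[1].endswith('.pdf')):
    print("usage: images_to_pdf.py <finalname.pdf> <image1> ... <imageN>")
else:
    final_name = sys.argv[1]
    merged = PdfFileWriter()

    for image_name in sys.argv[2:]:
        print("Processing %s..." % image_name)
        # Writing to a .pdf file name converts the image to a PDF.
        img = PythonMagick.Image()
        img.read(image_name)
        img.write('temp.pdf')
        # Open in binary mode. The handle is deliberately left open, since
        # pyPdf may still read page data from it when merged.write() runs.
        pdf = PdfFileReader(open('temp.pdf', 'rb'))
        for page in pdf.pages:
            merged.addPage(page)
        # Unlinking a file that is still open works on Unix-like systems.
        os.remove('temp.pdf')

    merged_file = open(final_name, mode='wb')
    merged.write(merged_file)
    merged_file.close()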

There’s not a lot to it, thanks in large part to PythonMagick and pyPdf. This script takes at least two parameters: the final name of the PDF, and at least one image file. The bulk of the workflow is this:

  • Create a PdfFileWriter object. This handles the heavy lifting in actually writing the PDF.

  • Iterate over the image file names given

    • Create an Image object and read the image source into it

    • Write the image to a temporary PDF file. This implicitly converts the image to a PDF.

    • Read the temporary PDF into memory via PdfFileReader

    • For each page in the temporary PDF (which should be exactly 1), add it to the real, final PDF

    • Delete the temporary PDF

  • Write the newly constructed PDF to disk and exit
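
The per-image steps above rely on a fixed temporary file name. As a minimal sketch (not the script’s actual code), the same create, use, delete cycle can be written with the standard tempfile module, so that each image gets its own scratch file and cleanup happens even if a step fails. The convert and collect callables here are hypothetical stand-ins for the PythonMagick write and the pyPdf page merge; they are assumptions made for illustration, not library calls.

```python
import os
import tempfile


def process_images(image_names, convert, collect):
    """Run the convert -> collect -> delete cycle once per image.

    convert(image_name, pdf_path) and collect(pdf_path) are hypothetical
    stand-ins for the PythonMagick and pyPdf steps, not real library calls.
    """
    for image_name in image_names:
        # mkstemp returns an open descriptor plus a unique path.
        fd, pdf_path = tempfile.mkstemp(suffix='.pdf')
        os.close(fd)  # the stand-ins reopen the file by path
        try:
            convert(image_name, pdf_path)
            collect(pdf_path)
        finally:
            # Remove the scratch file whether or not the steps succeeded.
            os.remove(pdf_path)


# Stand-in steps that only record what passed through the scratch file.
seen = []


def fake_convert(image_name, pdf_path):
    with open(pdf_path, 'wb') as out:
        out.write(image_name.encode('ascii'))


def fake_collect(pdf_path):
    with open(pdf_path, 'rb') as src:
        seen.append(src.read().decode('ascii'))


process_images(['a.png', 'b.png'], fake_convert, fake_collect)
```

One caveat with this shape: collect has to pull everything it needs out of the scratch file before it returns, because the file is gone afterwards.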

It’s very simple and pretty dumb (I added only enough error checking to make it work for me), and it may be a suboptimal use of the APIs, but it does the job. Hopefully, it will help someone else out.
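
For anyone who wants a little more error checking, here is one possible sketch. The check_args helper is hypothetical (it is not part of the script above); it applies the same argument rule and additionally assumes that every input should be an existing file.

```python
import os


def check_args(argv):
    """Return (final_name, image_names), or raise ValueError with a reason.

    Hypothetical helper: mirrors the script's argument rule and adds a
    check that each input path exists.
    """
    if len(argv) < 3 or not argv[1].endswith('.pdf'):
        raise ValueError(
            'usage: images_to_pdf.py <finalname.pdf> <image1> ... <imageN>')
    missing = [name for name in argv[2:] if not os.path.isfile(name)]
    if missing:
        raise ValueError('missing input file(s): ' + ', '.join(missing))
    return argv[1], argv[2:]
```

A script could call check_args(sys.argv) first and print the exception message before exiting, which reports a mistyped file name before any conversion work starts.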


    About

    My name is Jason Lee. I am a software developer living in the middle of Oklahoma. I’ve been a professional developer since 1997, using a variety of languages, including Java, Javascript, PHP, Python, Delphi, and even a bit of C#. I currently work for Red Hat on the WildFly/EAP team, where, among other things, I maintain integrations for some MicroProfile specs, OpenTelemetry, Micrometer, Jakarta Faces, and Bean Validation.

    I am the president of the Oklahoma City JUG, and an occasional speaker at the JUG and a variety of technical conferences.

    On the personal side, I’m active in my church, and enjoy bass guitar, running, fishing, and a variety of martial arts. I’m also married to a beautiful woman, and have two boys, who, thankfully, look like their mother.
