Question

What is a good tool for PDF report generation in Python? I've checked out ReportLab, but it seems to be awfully low-level for what I want to do. My current hunch is to call TeX on the command-line and let it produce the PDF, but if there is something that is easier to work with, I'd very much like a prod in the right direction.

Answer 1

I'm a big fan of pod ^[1]. Design your report templates in OpenOffice Writer (or Microsoft Word with Sun's ODF plugin) and then merge them with your data in a simple and flexible way. It is a very high-level approach and takes just a few lines of code.

You can generate ODT documents this way with no external dependencies; for PDF generation you need to have OpenOffice running in server mode.

[1] http://appyframework.org/pod.html

Answer 2

Take a look at Sphinx ^[1]. A lot of Python projects are starting to use Sphinx, including Python itself. You type your documentation in reStructuredText, and get good-looking HTML and PDF output. Now that Matplotlib ^[2] is using Sphinx, it even has a TeX-like equation formatting engine; see this pdf file ^[3] for some more information.

[1] http://sphinx.pocoo.org/
[2] http://matplotlib.sourceforge.net
[3] http://new.scipy.org/proceedings/SciPy2008/paper_6/full_text.pdf

Answer 3

When you looked at ReportLab, did you check out the Platypus section? It's really very easy to use (Platypus is high-level, whereas pdfgen is fairly low-level). There's a good "Hello World" example in the developer's FAQ ^[1].

[1] http://www.reportlab.com/software/opensource/rl-toolkit/faq/#2.3.2

Answer 4

I use pisa ^[1] to generate PDFs from HTML files (which are in turn generated using XSLT). It is very easy to use, but the official site was offline the last time I checked.

[1] http://pypi.python.org/pypi/pisa/3.0.27

Answer 5

I would second mmaibaum's suggestion of generating HTML. Together with CSS, it allows much better positioning and layout. You can then use an HTML->PDF engine, such as PrinceXML ^[1] (not free, but the output is amazing; there is a free version, but it puts a PrinceXML logo on at least one of the pages), or an XML/XHTML->XSL-FO->PDF engine, such as CSSToXSLFO ^[2]. This second option offers a bit more flexibility, but you'll still need to choose an XSL-FO processor to turn the intermediate output into a PDF. FOP ^[3] from the Apache project is a free one, but I can't vouch for how good the output is.

[1] http://princexml.com/
[2] http://www.re.be/css2xslfo/
[3] http://xmlgraphics.apache.org/fop/

Answer 6

Is it feasible to generate your report content as reStructuredText ^[1]? If so, check out the rst2pdf ^[2] project.

Disclaimer: I have not used rst2pdf myself.
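Generating the reStructuredText itself needs nothing beyond string handling. The sketch below builds a title and a `list-table` directive from rows of data; the function name `rows_to_rst` and the sample data are invented for this example, and I am assuming rst2pdf renders `list-table` the way docutils does.

```python
def rows_to_rst(title, header, rows):
    """Return a reStructuredText document with a title and a list-table."""
    lines = [
        title,
        "=" * len(title),  # the underline must be at least as long as the title
        "",
        ".. list-table::",
        "   :header-rows: 1",
        "",
    ]
    for row in [header] + rows:
        for index, cell in enumerate(row):
            # The first cell of a row opens a new list item, the others continue it.
            prefix = "   * - " if index == 0 else "     - "
            lines.append(prefix + str(cell))
    return "\n".join(lines) + "\n"


# Hypothetical sample data.
rst_text = rows_to_rst("Stock report", ["Item", "Quantity"], [["Apples", 3], ["Pears", 5]])
```

Saving `rst_text` as report.rst and running `rst2pdf report.rst -o report.pdf` should then produce the PDF.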

[1] http://docutils.sourceforge.net/rst.html
[2] http://code.google.com/p/rst2pdf/

Answer 7

The only reason not to use LaTeX in this layer is that the installation is large and unwieldy, particularly on Windows. You are not going to get a reporting engine without either having a formatting system or working in low level graphics primitives.

If you want a higher level formatting toolkit that's a bit more lightweight than LaTeX you might look at Lout ^[1].
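If you do go the LaTeX route, the Python side can stay small: build the source as a string, escape the data, and shell out. This is only a sketch; it assumes `pdflatex` is installed and on the PATH, and the helper names (`tex_escape`, `build_tex`, `compile_pdf`) and the sample data are made up.

```python
import subprocess
from pathlib import Path

# Characters with a special meaning in LaTeX and their escaped forms.
_TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape one value character by character, so nothing is escaped twice."""
    return "".join(_TEX_ESCAPES.get(ch, ch) for ch in str(text))


def build_tex(title, rows):
    """Return LaTeX source for a minimal report with one two-column table."""
    body = "\n".join(
        "{} & {} \\\\".format(tex_escape(a), tex_escape(b)) for a, b in rows
    )
    return "\n".join([
        r"\documentclass{article}",
        r"\begin{document}",
        r"\section*{" + tex_escape(title) + "}",
        r"\begin{tabular}{lr}",
        body,
        r"\end{tabular}",
        r"\end{document}",
        "",
    ])


def compile_pdf(tex_source, workdir, jobname="report"):
    """Run pdflatex (assumed to be on PATH) in workdir and return the PDF path."""
    workdir = Path(workdir)
    (workdir / (jobname + ".tex")).write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", jobname + ".tex"],
        cwd=workdir, check=True, stdout=subprocess.DEVNULL,
    )
    return workdir / (jobname + ".pdf")


# Hypothetical sample data; only the source is built here, nothing is compiled.
tex = build_tex("Sales & returns", [("Apples", "3"), ("Pears", "50%")])
```

Calling `compile_pdf(tex, some_directory)` then leaves report.pdf next to the usual .aux and .log files, so a temporary directory is a sensible choice for `some_directory`.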

[1] http://en.wikipedia.org/wiki/Lout_%28software%29

Answer 8

You could let Python create your report in HTML, and then use wkhtmltopdf ^[1] to render it into a PDF file.
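A rough sketch of that pipeline, using only the standard library on the Python side. It assumes the `wkhtmltopdf` binary is installed and on the PATH; the function names and the sample data are made up for illustration.

```python
import html
import subprocess


def render_html(title, rows):
    """Return a small standalone HTML page with one table."""
    table_rows = "\n".join(
        "<tr>"
        + "".join("<td>{}</td>".format(html.escape(str(cell))) for cell in row)
        + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\"><title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<table>\n{1}\n</table></body></html>\n"
    ).format(html.escape(title), table_rows)


def html_file_to_pdf(html_path, pdf_path):
    """Convert an HTML file with wkhtmltopdf (assumed to be on PATH)."""
    subprocess.run(["wkhtmltopdf", html_path, pdf_path], check=True)


# Hypothetical sample data; the page is only rendered here, not converted.
page = render_html("Stock <report>", [["Apples", 3], ["Pears", 5]])
```

Writing `page` to report.html and calling `html_file_to_pdf("report.html", "report.pdf")` completes the conversion.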

Update: I'm now using the Python xhtml2pdf ^[2] module to convert from HTML to PDF, so maybe that's also a solution (pip install xhtml2pdf). It is probably not as performant as wkhtmltopdf, though.

Here's a script that inserts the JPEGs in a folder into a generated PDF file when that folder is dragged and dropped onto the script.

# Drag a folder of JPEGs onto this script to get a PDF with one image per page.

import glob
import os
import sys
import traceback
import urllib.parse

import jinja2
import xhtml2pdf.pisa
from PIL import Image

xhtml2pdf.pisa.showLogging()

# Run from the script's directory so that template.html can be found.
os.chdir(os.path.dirname(os.path.abspath(__file__)))
print(os.getcwd())

quality = 85
size = 720, 5000

directory = sys.argv[1]
globbed = glob.glob(os.path.join(directory, '*.jpg'))


def scaled_path(path):
  # 'photo.jpg' -> 'photo.scaled.85.jpg'
  return path[:-3] + 'scaled.' + str(quality) + '.jpg'


try:
  images = []
  for path in globbed:
    # Save a downscaled copy of each image and reference it by file URL.
    newpath = scaled_path(path)
    im = Image.open(path)
    im.thumbnail(size, Image.LANCZOS)
    im.save(newpath, 'JPEG', quality=quality)
    quoted = urllib.parse.quote(newpath.replace('\\', '/'), safe=':/')
    images.append('file:///' + quoted)
  template = jinja2.Environment(loader=jinja2.FileSystemLoader('.')).get_template('template.html')
  content = template.render(images=images, directory=directory)

  pdf_filename = directory + ' - ' + str(size[0]) + '.' + str(quality) + '.pdf'
  with open(pdf_filename, 'wb') as pdf_file:
    pdf = xhtml2pdf.pisa.CreatePDF(content, pdf_file, encoding='utf-8')

  if not pdf.err:
    xhtml2pdf.pisa.startViewer(pdf_filename)

except Exception:
  traceback.print_exc()

# Remove the temporary scaled copies.
for path in globbed:
  try:
    os.remove(scaled_path(path))
  except OSError:
    pass

The template.html looks like this:

<style>
  body, img, p { }
  h1 {
    -pdf-outline: true;
    -pdf-level: 0;
    -pdf-open: true;
  }
  p {
    -pdf-outline: true;
  }
  div {
    -pdf-outline: false;
    -pdf-level: 0;
    -pdf-open: true;
  }
  img {
    page-break-after:always;
    -pdf-outline: false;
  }
  @page {
    @frame {
      margin:0cm;
      margin-top:1cm;
    }
  } 
</style>
<center>
  <p>
    <!--h1 style="font-size:150%;">{{directory}}</h1-->
    <div>
      {% for image in images %}<img src="{{image}}" /><div style="page-break-after:always;-pdf-page-break:always;"></div>
      {% endfor %}
    </div>
  </p>
</center>

[1] http://code.google.com/p/wkhtmltopdf/
[2] https://pypi.python.org/pypi/xhtml2pdf/

Answer 9

If you don't like ReportLab, I would suggest generating HTML (there are dozens of ways to do this) and converting it to PDF for the final output (with html2pdf, for example).

Answer 10

You can create PDFs easily (in my opinion) with just a Cairo binding.

Sure, it is low level: you don't have a GUI form editor and need to calculate coordinates by hand. But it also is extremely lightweight and direct; you have absolute control. Doing it via HTML does not give you that.

The output looks great, and the file size is very small.

Answer 11

I worked on a system years ago that used ReportLab. It really wasn't too bad. All of our reports were pretty much the same style, so I created a base class that handled most of the formatting. The only thing subclasses had to do was set some properties and hand the data over to the base class. It worked out pretty well. After I did the groundwork, another programmer was able to come in behind me and bang out a couple dozen reports in a couple of weeks with no prior Python experience. So if most of your reports fit one or two formats, using ReportLab should just require some up-front work, and then the rest is drudgery.

Answer 12

If you like ReportLab but think it's too low-level, try trml2pdf ^[1] (or here ^[2]). You write your report in RML (Report Markup Language) as described here ^[3], with something like ElementTree, and then use trml2pdf to convert it. You can use it from the command line or in a Python script with something like the following:

import trml2pdf
open('test.pdf', 'wb').write(trml2pdf.parseString(xmlstring))  # PDF data is binary, hence 'wb'

I have found this to be the best way to go even though not all of the RML tags are supported by this open-source library.
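As a rough illustration of the ElementTree step, the sketch below builds a small RML document as a string. The function name `build_rml`, the frame geometry and the sample text are assumptions of mine, and whether trml2pdf accepts every tag used here is something to verify against your version.

```python
import xml.etree.ElementTree as ET


def build_rml(title, paragraphs, filename="report.pdf"):
    """Return an RML document (as a string) with a heading and some paragraphs."""
    document = ET.Element("document", filename=filename)

    # Page template: a single frame with one-inch margins on an A4 page (points).
    template = ET.SubElement(document, "template")
    page = ET.SubElement(template, "pageTemplate", id="main")
    ET.SubElement(page, "frame", id="body", x1="72", y1="72", width="451", height="698")

    ET.SubElement(document, "stylesheet")

    # The story holds the flowing content; ElementTree escapes the text for us.
    story = ET.SubElement(document, "story")
    ET.SubElement(story, "h1").text = title
    for text in paragraphs:
        ET.SubElement(story, "para").text = text

    return ET.tostring(document, encoding="unicode")


# Hypothetical sample content.
rml = build_rml("Stock report", ["Apples & pears are in stock."])
```

The resulting string plays the role of `xmlstring` in the one-liner above.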

If you are using this for Django you can simply mark up an .rml file with the template system and then render it.

[1] http://packages.debian.org/squeeze/python-trml2pdf
[2] http://ftp.debian.org/debian/pool/main/p/python-trml2pdf/
[3] http://www.reportlab.com/docs/rml2pdf-userguide.pdf

Answer 13

I recommend you take another look at ReportLab. It has high-level constructs for page layout and for "flowing" text, images, frames, etc. It also has a nice styles system and, most importantly, a nice Python API. Sure, it also provides low-level access to the PDF, but you don't have to touch that unless you need it.

Most of the other suggestions involve using some other format as an intermediary. This doesn't really help you. It just shifts the problem to finding a good library for writing the intermediate format.

I've also tried the Eclipse "BIRT" report generator as another OSS reporting tool but it was hard to learn, a resource hog and a pain to deploy. Python + ReportLab was way easier.

Answer 14

http://matplotlib.sourceforge.net/api/backend_pdf_api.html#matplotlib.backends.backend_pdf.PdfPages

There is an easy way to create a multi-page PDF using matplotlib. If you need lots of plots, this is quite useful.

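import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

x = np.arange(0, 10, 0.2)
y = np.sin(x)

# Each savefig() call appends the current figure as a new page;
# leaving the with block closes the file.
with PdfPages('twopages.pdf') as pp:
    plt.figure()
    plt.plot(x, y)
    pp.savefig()

    plt.figure()
    plt.plot(x, -y)
    pp.savefig()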

Answer 15

There is also PyReport ^[1], but that uses LaTeX for PDF generation.

[1] http://gael-varoquaux.info/computers/pyreport/

Answer 16

My company's built an HTML to PDF API called DocRaptor that uses PrinceXML for PDF generation. We've got a couple of Python examples in our documentation, as well.

DocRaptor ^[1]

DocRaptor Python examples ^[2]

[1] https://docraptor.com
[2] https://docraptor.com/documentation#python_example

Answer 17

You can easily generate PDF files using PhantomJS with rasterize.js. The advantage here is that it waits until the AJAX calls have loaded. The remaining options are a bit difficult to implement.