Stack Overflow
Good PDF report generator tool for Python
[+159] [17] I GIVE CRAP ANSWERS
[2008-10-07 09:50:39]
[ python pdf latex reporting pdf-generation ]
[ http://stackoverflow.com/questions/177799/good-pdf-report-generator-tool-for-python ] [DELETED]

What is a good tool for PDF report generation in Python? I've checked out ReportLab, but it seems to be awfully low-level for what I want to do. My current hunch is to call TeX on the command-line and let it produce the PDF, but if there is something that is easier to work with, I'd very much like a prod in the right direction.
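
For reference, the "call TeX on the command line" idea can be sketched roughly as below. This is a minimal sketch, not a recommendation: the helper names (latex_escape, build_latex, compile_pdf) are made up for illustration, and it assumes a pdflatex binary is available on the PATH.

```python
import subprocess
import tempfile
from pathlib import Path

# Characters that LaTeX treats specially, mapped to their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX special characters in a single pass (no double escaping)."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in str(text))


def build_latex(title, rows):
    """Return a complete LaTeX document containing one two-column table."""
    body = "\n".join(
        f"{latex_escape(name)} & {latex_escape(value)} \\\\" for name, value in rows
    )
    return "\n".join([
        r"\documentclass{article}",
        r"\begin{document}",
        rf"\section*{{{latex_escape(title)}}}",
        r"\begin{tabular}{lr}",
        body,
        r"\end{tabular}",
        r"\end{document}",
        "",
    ])


def compile_pdf(tex_source, output_path, engine="pdflatex"):
    """Run the TeX engine in a scratch directory and copy the PDF out."""
    with tempfile.TemporaryDirectory() as tmp:
        tex_file = Path(tmp) / "report.tex"
        tex_file.write_text(tex_source, encoding="utf-8")
        subprocess.run(
            [engine, "-interaction=nonstopmode", "-halt-on-error", tex_file.name],
            cwd=tmp, check=True, capture_output=True,
        )
        Path(output_path).write_bytes((Path(tmp) / "report.pdf").read_bytes())


source = build_latex("Sales & totals", [("Widgets", 12), ("Gadgets", 7)])
# compile_pdf(source, "report.pdf")  # uncomment when a TeX installation is present
```

Escaping the data before it reaches TeX is the part that tends to bite first; the rest is just a subprocess call.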

[+47] [2008-10-07 11:04:13] Toni Ruža [ACCEPTED]

I'm a big fan of pod [1]. Design your report templates in OpenOffice Writer (or Microsoft Word + Sun's ODF plugin) and then combine it with your data in a simple and flexible way. Very abstract and of course just a few lines of code.

You can generate ODT documents this way with no external dependencies. For PDF generation, you need to have OpenOffice running in server mode.

[1] http://appyframework.org/pod.html

(1) pod sample: stackoverflow.com/questions/16943597/… - danihp
Unfortunately it's GPL. - Anton Chikin
[+20] [2008-10-28 20:01:41] Jouni K. Seppänen

Take a look at Sphinx [1]. A lot of Python projects are starting to use Sphinx, including Python itself. You type your documentation in reStructuredText, and get good-looking HTML and PDF output. Now that Matplotlib [2] is using Sphinx, it even has a TeX-like equation formatting engine; see this pdf file [3] for some more information.

[1] http://sphinx.pocoo.org/
[2] http://matplotlib.sourceforge.net
[3] http://new.scipy.org/proceedings/SciPy2008/paper_6/full_text.pdf

Sphinx seems to use LaTeX, and he said that he prefers not to use LaTeX - Davide
(5) My impression is that his problem with TeX is ease of use - and while Sphinx does use LaTeX as the intermediate format for producing PDF, it generates all of it from the reStructuredText sources, so the user doesn't have to deal with LaTeX directly. But, yeah, it would be nice to have a ReportLab backend for Sphinx so there would be no intermediate format where things can go wrong. In principle it should not be too hard, since Sphinx can already generate nice HTML, with mathematical formulas thanks to the Matplotlib mathtext engine. - Jouni K. Seppänen
(1) FYI, the rst2pdf package now provides such a ReportLab builder for Sphinx - Kevin Horn
(3) The 'this pdf file' link seems to be broken. - Component 10
[+17] [2008-10-08 10:14:31] Tony Meyer

When you looked at ReportLab, did you check out the Platypus section? It's really very easy to use (Platypus is high-level, whereas pdfgen is fairly low-level). There's a good "Hello World" example in the developer's FAQ [1].

[1] http://www.reportlab.com/software/opensource/rl-toolkit/faq/#2.3.2

and for those looking - reportlab appears to work on dotcloud too. Django apps with pdf outputs? Looks good. - Danny Staple
[+14] [2008-10-07 09:58:32] TheCowSaysMoo

I use pisa [1] to generate PDFs from HTML files (which in turn get generated using XSLT). It is very easy to use, but the official site was offline the last time I checked.

[1] http://pypi.python.org/pypi/pisa/3.0.27

pisa is based on Reportlab. Demo site at htmltopdf.org/demo - gimel
This is really nice and convenient. For Hardy users, you cannot install the latest version, due to missing dependencies (if you install everything else from Hardy repositories). Just use 3.0.24 which you can download and install with: sudo easy_install pypi.python.org/packages/2.5/p/pisa/… - Davide
Pisa is now named xhtml2pdf. The website is now xhtml2pdf.com (and pypi.python.org/pypi/xhtml2pdf on PyPI) - Pierre H.
gimel: the link seems to be broken - Odin
[+6] [2008-10-07 14:45:49] technomalogical

I would second mmaibaum's suggestion of generating HTML. It, along with CSS, will allow for much better positioning and layout. You can then use an HTML->PDF engine, such as PrinceXML [1] (not free, but the output is amazing... actually there is a free version but it will put a PrinceXML logo on at least one of the pages) or an XML/XHTML->XSL-FO->PDF engine, such as CSSToXSLFO [2]. This second option offers a bit more flexibility, but you'll still need to choose an XSL-FO processor to turn this intermediate output into a PDF. FOP [3] from the Apache project is a free one, but I can't vouch for how good the output is.

[1] http://princexml.com/
[2] http://www.re.be/css2xslfo/
[3] http://xmlgraphics.apache.org/fop/

[+4] [2008-10-07 11:08:41] codeape

Is it feasible to generate your report content as reStructuredText [1]? If so, check out the rst2pdf [2] project.

Disclaimer: I have not used rst2pdf myself.
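
To illustrate the idea, here is a rough sketch of generating the reStructuredText from Python data. The function names (rst_table, rst_document) are hypothetical, and the final conversion step assumes the rst2pdf command-line tool is installed; it is shown only as a comment.

```python
def rst_table(header, rows):
    """Format rows as a reStructuredText "simple table".

    Note: simple tables do not allow an empty cell in the first column.
    """
    cells = [[str(c) for c in row] for row in [header, *rows]]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    border = "  ".join("=" * w for w in widths)

    def fmt(row):
        # Pad each cell to its column width; drop trailing spaces.
        return "  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip()

    lines = [border, fmt(cells[0]), border]
    lines += [fmt(row) for row in cells[1:]]
    lines.append(border)
    return "\n".join(lines)


def rst_document(title, header, rows):
    """Return a small reST document: an underlined title plus one table."""
    return "\n".join([title, "=" * len(title), "", rst_table(header, rows), ""])


document = rst_document("Stock report", ("Item", "Count"),
                        [("Widgets", 12), ("Gadgets", 7)])
print(document)

# Then, on the command line (assuming rst2pdf is installed):
#   rst2pdf report.rst -o report.pdf
```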

[1] http://docutils.sourceforge.net/rst.html
[2] http://code.google.com/p/rst2pdf/

I also like reST, but the general work flow is reST -> LaTeX -> PDF. Though it looks like we're trying to avoid a LaTeX intermediary. - Aaron Maenpaa
I have used rst2pdf, and it does cut out the LaTeX step (although the output won't look quite as good as output going through LaTeX.) Probably would be ok for basic reports, but tables and layout aren't really the strong suit of reST. - technomalogical
[+4] [2008-10-07 14:59:27] ConcernedOfTunbridgeWells

The only reason not to use LaTeX in this layer is that the installation is large and unwieldy, particularly on Windows. You are not going to get a reporting engine without either having a formatting system or working in low level graphics primitives.

If you want a higher level formatting toolkit that's a bit more lightweight than LaTeX you might look at Lout [1].

[1] http://en.wikipedia.org/wiki/Lout_%28software%29

(4) LaTeX installations do not have to be large. The full TeX Live is large, but I used LaTeX on a PC-AT with a 2 MB RAM disk and ran it from there. 20 years later it has grown, of course, but you can eliminate most of what you don't need - Stephan Eggermont
[+4] [2012-08-16 10:51:46] Daniel F

You could let Python create your report in HTML, and then use wkhtmltopdf [1] to render it into a PDF file.
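
A minimal sketch of that workflow, assuming the wkhtmltopdf binary is installed and on the PATH. The helper names (render_html, html_to_pdf) are just for illustration, and a real report would probably use a template engine instead of string formatting.

```python
import html
import subprocess


def render_html(title, rows):
    """Build a small standalone HTML report; every value is HTML-escaped."""
    body_rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(name)), html.escape(str(value))
        )
        for name, value in rows
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n"
        f"<table>\n{body_rows}\n</table></body></html>\n"
    )


def html_to_pdf(html_path, pdf_path):
    """Call wkhtmltopdf as: wkhtmltopdf <input.html> <output.pdf>."""
    subprocess.run(["wkhtmltopdf", html_path, pdf_path], check=True)


page = render_html("Q3 report", [("Widgets", 12), ("Gadgets", 7)])
# with open("report.html", "w", encoding="utf-8") as f:
#     f.write(page)
# html_to_pdf("report.html", "report.pdf")  # needs wkhtmltopdf installed
```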

Update: I'm now using the Python xhtml2pdf [2] module to convert from HTML to PDF, so maybe that's also a solution (pip install xhtml2pdf). It is probably not as performant as wkhtmltopdf, though.

Here's a script which inserts jpgs which reside in a folder into a generated pdf file, when that folder is dragged and dropped onto that script.

# Python 3 port of the original Python 2 script.
import glob
import os
import sys
import traceback
import urllib.parse

import jinja2
import xhtml2pdf.pisa
from PIL import Image

print('-----------------------------------------------------')
xhtml2pdf.pisa.showLogging()

# Run from the script's own folder so that template.html is found.
os.chdir(os.path.dirname(os.path.abspath(__file__)))
print(os.getcwd())

quality = 85
size = 720, 5000

# The folder dropped onto the script arrives as the first argument.
directory = sys.argv[1]
globbed = glob.glob(os.path.join(directory, '*.jpg'))


def scaled_name(path):
    # photo.jpg -> photo.scaled.85.jpg
    return path[:-3] + 'scaled.' + str(quality) + '.jpg'


try:
    images = []
    for path in globbed:
        # Save a downscaled copy next to the original image.
        newpath = scaled_name(path)
        im = Image.open(path)
        im.thumbnail(size, Image.LANCZOS)  # replaces the removed Image.ANTIALIAS
        im.save(newpath, 'JPEG', quality=quality)
        # Reference the scaled copy through a file:// URL in the template.
        quoted = urllib.parse.quote(newpath.replace('\\', '/'), ':/')
        images.append('file:///' + quoted)
    template = jinja2.Environment(loader=jinja2.FileSystemLoader('.')).get_template('template.html')
    content = template.render(images=images, directory=directory)

    pdf_filename = directory + ' - ' + str(size[0]) + '.' + str(quality) + '.pdf'
    with open(pdf_filename, 'wb') as out:
        pdf = xhtml2pdf.pisa.CreatePDF(content, out, encoding='latin-1')

    if not pdf.err:
        xhtml2pdf.pisa.startViewer(pdf_filename)

except Exception:
    traceback.print_exc()

# Remove the temporary scaled copies.
for path in globbed:
    try:
        os.remove(scaled_name(path))
    except OSError:
        pass

And the template.html looks like this

<style>
  body, img, p { }
  h1 {
    -pdf-outline: true;
    -pdf-level: 0;
    -pdf-open: true;
  }
  p {
    -pdf-outline: true;
  }
  div {
    -pdf-outline: false;
    -pdf-level: 0;
    -pdf-open: true;
  }
  img {
    page-break-after:always;
    -pdf-outline: false;
  }
  @page {
    @frame {
      margin:0cm;
      margin-top:1cm;
    }
  } 
</style>
<center>
  <p>
    <!--h1 style="font-size:150%;">{{directory}}</h1-->
    <div>
      {% for image in images %}<img src="{{image}}" /><div style="page-break-after:always;-pdf-page-break:always;"></div>
      {% endfor %}
    </div>
  </p>
</center>
[1] http://code.google.com/p/wkhtmltopdf/
[2] https://pypi.python.org/pypi/xhtml2pdf/

this works great with django-wkhtmltopdf. thanks for the tip - Hassek
[+3] [2008-10-07 10:00:10] mmaibaum

If you don't like ReportLab, I would suggest generating HTML (there are dozens of ways to do this) and then converting it to PDF for final output (with html2pdf, for example).


[+3] [2008-10-07 10:04:39] akauppi

You can create PDFs easily (in my opinion) with just a Cairo binding.

Sure, it is low level: you don't have a GUI form editor and need to calculate coordinates by hand. But it also is extremely lightweight and direct; you have absolute control. Doing it via HTML does not give you that.

The output looks great, and the resulting files are very small.


I'm also interested in this question. Any chance of further details or a link regarding Cairo binding? I believe this is related to Lua or are you referring to something else. Thanks! - Jarod Elliott
Sorry my initial search results pointed at Lua but Cairo seems to be a graphics library. Assuming i've found the right thing (cairographics.org) it looks promising. +1 - Jarod Elliott
I've also had good results with cairo, I've used it plus Cairo Plot to do plotting. launchpad.net/cairoplot - Aaron Maenpaa
I have nothing against cairo but this doesn't really answer the question for higher level report generation. - Toni Ruža
To Jarod: I'm indeed using Lua but thought not to mention, since it's a Python thread. :) The binding I use is Lua oocairo (daizucms.org/lua/library/oocairo) , but there are -unfortunately- others. For Python, I think the situation is less fragmented cairographics.org/pycairo - akauppi
To Toni: I know, and I did mention in my comment that the solution is low level. Personally, I'd like to have a programmable PDF form editor where field values can be thrown in. But then again for my needs Cairo+Lua was enough. - akauppi
[+3] [2008-10-07 14:55:29] Sam Corder

I worked on a system years ago that used ReportLab. It really wasn't too bad. All of our reports were pretty much of the same style, so I created a base class that handled most of the formatting. The only thing subclasses had to do was set some properties and hand the data over to the base class. It worked out pretty well. After I did the groundwork, another programmer was able to come behind me and bang out a couple dozen reports in a couple of weeks with no prior Python experience. So if most of your reports fit one or two formats, using ReportLab should just require some up-front work, and then the rest is drudgery.
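
Roughly, that base-class pattern might look like the sketch below. It is reconstructed for illustration and is not the original system: the class and attribute names (BaseReport, StockReport, title, columns) are invented, and only standard ReportLab Platypus classes are used.

```python
class BaseReport:
    """Shared layout; subclasses only set properties and supply the data."""

    title = "Untitled report"
    columns = ()

    def rows(self):
        """Return an iterable of row tuples; implemented by subclasses."""
        raise NotImplementedError

    def table_data(self):
        """Header row followed by the data rows, all converted to strings."""
        return [list(self.columns)] + [[str(c) for c in row] for row in self.rows()]

    def build(self, path):
        """Render the report to a PDF file using ReportLab Platypus."""
        # ReportLab is imported here so the classes can be defined and
        # unit-tested on a machine without it installed.
        from reportlab.lib import colors
        from reportlab.lib.pagesizes import A4
        from reportlab.lib.styles import getSampleStyleSheet
        from reportlab.platypus import (
            Paragraph, SimpleDocTemplate, Spacer, Table, TableStyle,
        )

        styles = getSampleStyleSheet()
        table = Table(self.table_data(), repeatRows=1)
        table.setStyle(TableStyle([
            ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
            ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
        ]))
        story = [Paragraph(self.title, styles["Title"]), Spacer(1, 12), table]
        SimpleDocTemplate(path, pagesize=A4).build(story)


class StockReport(BaseReport):
    """A concrete report: just a title, column names, and the data."""

    title = "Stock levels"
    columns = ("Item", "Count")

    def __init__(self, stock):
        self.stock = stock

    def rows(self):
        return sorted(self.stock.items())


report = StockReport({"Widgets": 12, "Gadgets": 7})
# report.build("stock.pdf")  # requires ReportLab (pip install reportlab)
```

Each new report is then a handful of lines, which is presumably why someone new to Python could produce them quickly.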


[+3] [2011-06-10 07:00:52] freb

If you like ReportLab but think it's too low-level, try trml2pdf [1] (or here [2]). You write your report in RML (Report Markup Language) as described here [3] with something like ElementTree and then use trml2pdf to convert it. You can use it from the command line or in a python script with something like the following:

open('test.pdf', 'wb').write(trml2pdf.parseString(xmlstring))

I have found this to be the best way to go even though not all of the RML tags are supported by this open-source library.

If you are using this for Django you can simply mark up an .rml file with the template system and then render it.
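
As a sketch of the "build the RML with ElementTree" step: the tag and attribute names below follow my reading of the RML user guide [3], and whether trml2pdf supports every one of them is an assumption you should verify. build_rml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_rml(title, paragraphs):
    """Return an RML document as a string, built with ElementTree."""
    doc = ET.Element("document", filename="report.pdf")

    # One page template with a single frame; sizes are in points (A4 page).
    template = ET.SubElement(doc, "template", pageSize="(595, 842)")
    page = ET.SubElement(template, "pageTemplate", id="main")
    ET.SubElement(page, "frame", id="body",
                  x1="72", y1="72", width="451", height="698")

    # A named paragraph style used for the heading.
    stylesheet = ET.SubElement(doc, "stylesheet")
    ET.SubElement(stylesheet, "paraStyle", name="title",
                  fontName="Helvetica-Bold", fontSize="18", leading="22")

    # The story holds the flowing content; ElementTree escapes the text.
    story = ET.SubElement(doc, "story")
    ET.SubElement(story, "para", style="title").text = title
    for text in paragraphs:
        ET.SubElement(story, "para").text = text

    return ET.tostring(doc, encoding="unicode")


xmlstring = build_rml("Monthly report", ["Sales were up.", "Costs were flat."])
# With trml2pdf installed, the string can then be converted as shown above:
#   open('test.pdf', 'wb').write(trml2pdf.parseString(xmlstring))
```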

[1] http://packages.debian.org/squeeze/python-trml2pdf
[2] http://ftp.debian.org/debian/pool/main/p/python-trml2pdf/
[3] http://www.reportlab.com/docs/rml2pdf-userguide.pdf

[+2] [2010-04-29 12:39:25] bc.

I recommend you take another look at ReportLab. It has high-level constructs for page layout and "flowing" text, images, frames etc. It also has a nice styles system and, most importantly, a nice Python API. Sure, it also provides access to low-level PDF operations, but you can ignore those until you actually need them.

Most of the other suggestions involve using some other format as an intermediary. This doesn't really help you. It just pushes the problem to finding a good library for writing the intermediate format.

I've also tried the Eclipse "BIRT" report generator as another OSS reporting tool but it was hard to learn, a resource hog and a pain to deploy. Python + ReportLab was way easier.


[+2] [2011-05-11 13:49:36] otterb

http://matplotlib.sourceforge.net/api/backend_pdf_api.html#matplotlib.backends.backend_pdf.PdfPages

There is an easy way to create multiple page PDF using matplotlib. If you need lots of plots, this would be quite useful.

from pylab import *
from matplotlib.backends.backend_pdf import PdfPages

pp = PdfPages('twopages.pdf')

x = arange(0, 10, 0.2)
y = sin(x)
plot(x, y)
pp.savefig()

figure()
plot(x, -y)
pp.savefig()

pp.close()

[0] [2010-10-06 14:00:56] Björn Pollex

There is also PyReport [1], but that uses LaTeX for PDF generation.

[1] http://gael-varoquaux.info/computers/pyreport/

[0] [2013-09-12 18:47:47] illbzo1

My company's built an HTML to PDF API called DocRaptor that uses PrinceXML for PDF generation. We've got a couple of Python examples in our documentation, as well.

DocRaptor [1]

DocRaptor Python examples [2]

[1] https://docraptor.com
[2] https://docraptor.com/documentation#python_example

[0] [2014-08-28 06:26:07] user2387567

Using PhantomJS with its rasterize.js example script, you can easily generate PDF files. The advantage here is that it waits until AJAX calls have loaded. The other options are a bit more difficult to implement.

