Converting HTML To RML
HTML often forms part of the input for systems, but it can cause problems when you try to generate a PDF directly from content that contains it. ReportLab have developed tools within our package to deal with two related issues:
1. Cleaning input before the data is saved: removing tags and other content that might cause problems
2. Writing out HTML content to a PDF
Cleaning input before the data is saved
This section covers some content also included on this site under XML helper utilities.
html_cleaner does not handle the full range of HTML tags and attributes; instead it focuses on a smaller subset of them. The functionality lives in rlextra/radxml/html_cleaner.py. Some basic examples follow; for more comprehensive examples, look directly at the test function in html_cleaner.py. These examples assume the following imports:
>>> from rlextra.radxml.html_cleaner import cleanPlain, cleanBlocks, cleanInline
cleanBlocks accepts markup as one or more blocks. Example:
>>> data = "<p>This is <unknown>raw data</unknown> with HTML</em> <b>paragraph</b></p>"
>>> cleanBlocks(data)
'<p>This is raw data with HTML <b>paragraph</b></p>'
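To illustrate the whitelisting idea behind cleanBlocks, here is a minimal sketch using only Python's standard html.parser module. It is not rlextra's implementation: the ALLOWED_TAGS set and the clean_blocks_sketch name are hypothetical, and attributes are simply dropped. A closing tag is kept only if it matches a tag that was actually opened, which is how the stray </em> in the example above disappears.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; rlextra's real set is not reproduced here.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li"}


class BlockCleaner(HTMLParser):
    """Keep whitelisted tags, drop unknown or unbalanced tags, keep all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # whitelisted tags currently open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.stack.append(tag)
            self.out.append(f"<{tag}>")  # attributes are dropped in this sketch

    def handle_endtag(self, tag):
        # Ignore close tags that do not match anything we opened.
        if tag in self.stack:
            # Close any tags opened after it so the output stays well-formed.
            while self.stack:
                open_tag = self.stack.pop()
                self.out.append(f"</{open_tag}>")
                if open_tag == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered trailing text
        while self.stack:  # close anything still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def clean_blocks_sketch(markup):
    cleaner = BlockCleaner()
    cleaner.feed(markup)
    return cleaner.result()
```

Tags left open at the end of the input are closed automatically, so "<b>open" comes back as "<b>open</b>".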
cleanInline accepts and normalizes markup for use inline. Example:
>>> data = "<img width='100' unknown='x' src='photo.png'/>"
>>> cleanInline(data)
'<img width="100" src="photo.png" alt=""/>'
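Attribute normalization of the kind shown for cleanInline can be sketched the same way. In this standard-library sketch (not rlextra's code), the per-tag ALLOWED_INLINE whitelist, the clean_inline_sketch name and the rule of adding an empty alt to images are assumptions chosen to reproduce the example above; kept attributes are re-emitted with double quotes and img is written in self-closing form.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical per-tag attribute whitelist, for illustration only.
ALLOWED_INLINE = {"img": ["src", "width", "height", "alt"], "b": [], "i": []}


class InlineCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _render(self, tag, attrs, self_closing):
        allowed = ALLOWED_INLINE[tag]
        # Keep only whitelisted attributes, in their original order.
        kept = [(k, v) for k, v in attrs if k in allowed and v is not None]
        if tag == "img" and not any(k == "alt" for k, _ in kept):
            kept.append(("alt", ""))  # assumed rule: images always get an alt
        text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
        return f"<{tag}{text}{'/' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_INLINE:
            self.out.append(self._render(tag, attrs, tag == "img"))

    def handle_startendtag(self, tag, attrs):
        # Called for <tag .../> syntax.
        if tag in ALLOWED_INLINE:
            self.out.append(self._render(tag, attrs, True))

    def handle_endtag(self, tag):
        if tag in ALLOWED_INLINE and tag != "img":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean_inline_sketch(markup):
    cleaner = InlineCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)
```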
cleanPlain removes all tags to output plain text. Example:
>>> from rlextra.radxml.html_cleaner import cleanPlain
>>> data = "<p>This is raw data with <em>HTML</em> <b>paragraph</b></p>"
>>> cleanPlain(data)
'This is raw data with HTML paragraph'
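Stripping every tag is the simplest of the three operations. A standard-library sketch (not rlextra's cleanPlain; clean_plain_sketch is a hypothetical name) only needs to collect character data and let the parser decode entities.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only character data, discarding every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def clean_plain_sketch(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)
```

Note that this sketch inserts no separator at block boundaries, so "<p>a</p><p>b</p>" becomes "ab", and it keeps the text inside script or style elements.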
Writing out HTML content to a PDF
Here we detail how to render HTML in a PDF, also making use of the html_cleaner module described above.
Several approaches are possible, depending on your input.
These snippets use the following imports:
from preppy import SafeString
from rlextra.radxml.xhtml2rml import xhtml2rml
from rlextra.radxml.html_cleaner import cleanPlain
The input examples are:
data = "<p>This is raw data with <em>HTML</em> <b>paragraph</b></p>"
data2 = "This is raw data with <em>HTML</em> <b>paragraph</b>"
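The difference between quoted and unquoted substitution can be previewed without preppy. In this sketch, quote() approximates preppy's quoting with the standard library's html.escape, and SafeMarkup is a hypothetical stand-in for preppy's SafeString; preppy's actual escaping may differ in details such as how quote characters are handled.

```python
from html import escape


class SafeMarkup(str):
    """Hypothetical stand-in for preppy's SafeString: marks text as already safe."""


def quote(value):
    # Approximates template quoting: plain strings are XML-escaped,
    # strings wrapped in SafeMarkup pass through unchanged.
    if isinstance(value, SafeMarkup):
        return str(value)
    return escape(str(value), quote=False)


data = "<p>This is raw data with <em>HTML</em> <b>paragraph</b></p>"
data2 = "This is raw data with <em>HTML</em> <b>paragraph</b>"

# Escaped: the tags become literal text in the output.
print(quote(data))
# Marked safe: the inline tags survive inside an RML para.
print("<para>" + quote(SafeMarkup(data2)) + "</para>")
```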
1: Raw XHTML data example: preppy quoting escapes the tags, so they appear as literal text.
2: cleanPlain example: the XHTML tags are stripped.
3: XHTML data without para tags but with inline tags: ensure the data is enclosed in an RML para tag.
SafeString tells preppy not to XML-escape the contents.
xhtml2rml converts the XHTML to RML.
4: XHTML to RML data example without a specified paraStyle: ensure there are no RML para tags around the data. When no paraStyles are specified with the content, ensure that bulletStyle='bullet' exists in your style sheets.
5: XHTML to RML data example with a specified paraStyle: ensure there are no RML para tags around the data.
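The advice to keep RML para tags away from converted content can be illustrated with a toy converter. The sketch below is not rlextra's xhtml2rml: ToyXhtmlToRml, toy_xhtml2rml, its para_style parameter and the em to i mapping are all hypothetical. It is modelled on the assumption that the converter emits its own para element for each XHTML p, so wrapping its output in another para would nest paragraphs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical inline tag mapping; a real converter handles far more tags.
INLINE_MAP = {"b": "b", "strong": "b", "i": "i", "em": "i"}


class ToyXhtmlToRml(HTMLParser):
    def __init__(self, para_style=None):
        super().__init__(convert_charrefs=True)
        self.para_style = para_style
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Each XHTML paragraph becomes an RML para, styled if a style was given.
            style = f' style="{escape(self.para_style)}"' if self.para_style else ""
            self.out.append(f"<para{style}>")
        elif tag in INLINE_MAP:
            self.out.append(f"<{INLINE_MAP[tag]}>")

    def handle_endtag(self, tag):
        if tag == "p":
            self.out.append("</para>")
        elif tag in INLINE_MAP:
            self.out.append(f"</{INLINE_MAP[tag]}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def toy_xhtml2rml(xhtml, para_style=None):
    converter = ToyXhtmlToRml(para_style)
    converter.feed(xhtml)
    converter.close()
    return "".join(converter.out)
```

With the sample data, toy_xhtml2rml(data, para_style="normal") gives '<para style="normal">This is raw data with <i>HTML</i> <b>paragraph</b></para>'; the style name "normal" is only an example and would need to exist in your style sheet.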