Converting HTML To RML
HTML often forms part of the input to a system, but it can cause problems when you try to generate a PDF directly from content containing HTML. ReportLab have developed tools within our rlextra package to deal with two related issues:
1. Cleaning input before the data is saved: removing tags and other content that might cause problems
2. Writing out HTML content to a PDF
Cleaning input before the data is saved
This section covers some content also included on this site under XML helper utilities.
Note: the html_cleaner module in rlextra does not handle the full range of HTML tags and attributes; instead it focuses on a smaller subset. Its functionality lives in rlextra/radxml/html_cleaner.py. Some basic examples follow; for more comprehensive examples, look directly at the test function in html_cleaner.py. These examples assume you have the necessary imports:
>>> from rlextra.radxml.html_cleaner import cleanPlain, cleanBlocks, cleanInline
cleanBlocks
Accepts markup as one or more blocks, removing unknown tags and unmatched closing tags. Example:
>>> data = "<p>This is <unknown>raw data</unknown> with HTML</em> <b>paragraph</b></p>"
>>> cleanBlocks(data)
'<p>This is raw data with HTML <b>paragraph</b></p>'
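To make the idea of block cleaning concrete, here is a minimal sketch of a whitelist cleaner built on Python's standard html.parser module. It is not rlextra's implementation: ALLOWED_TAGS, WhitelistCleaner and clean_blocks_sketch are hypothetical names, and this version simply discards attributes. It keeps allowed tags, drops unknown tags, and ignores closing tags that were never opened, which is enough to reproduce the cleanBlocks example above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration only; rlextra's real tag list differs.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li"}


class WhitelistCleaner(HTMLParser):
    """Keep allowed tags, drop unknown tags and unmatched closing tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of allowed tags currently open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are discarded in this sketch
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only emit a close tag if it matches something we opened.
        if tag in self.open_tags:
            # Close any tags left open inside it, innermost first.
            while self.open_tags:
                t = self.open_tags.pop()
                self.out.append(f"</{t}>")
                if t == tag:
                    break

    def handle_data(self, data):
        # Text was unescaped by the parser, so re-escape it for XML output.
        self.out.append(escape(data, quote=False))

    def result(self):
        # Close anything still open at the end of the input.
        self.out.extend(f"</{t}>" for t in reversed(self.open_tags))
        self.open_tags.clear()
        return "".join(self.out)


def clean_blocks_sketch(markup):
    parser = WhitelistCleaner()
    parser.feed(markup)
    parser.close()
    return parser.result()
```

Run on the example data, clean_blocks_sketch returns '<p>This is raw data with HTML <b>paragraph</b></p>'. The stack of open tags is what lets it discard the stray </em> while still closing anything left open at the end of the input.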
cleanInline
Accepts and normalizes markup for inline use: unknown attributes are dropped and a missing alt attribute is added. Example:
>>> data = "<img width='100' unknown='x' src='photo.png'/>"
>>> cleanInline(data)
'<img width="100" src="photo.png" alt=""/>'
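Inline cleaning is largely about attributes. The sketch below, using only the standard html.parser module, keeps a per-tag whitelist of attributes and adds an empty alt when one is missing. ALLOWED_ATTRS, InlineImgCleaner and clean_inline_sketch are hypothetical names; rlextra's own attribute rules and tag coverage are not reproduced here, and this version only handles img.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical per-tag attribute whitelist, for illustration only.
ALLOWED_ATTRS = {"img": ["width", "height", "src", "alt"]}


class InlineImgCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Treat <img ...> and <img .../> the same way.
        self.handle_startendtag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag not in ALLOWED_ATTRS:
            return  # drop tags this sketch does not know how to emit inline
        allowed = ALLOWED_ATTRS[tag]
        # Keep whitelisted attributes in their original order.
        kept = [(k, v) for k, v in attrs if k in allowed]
        if tag == "img" and not any(k == "alt" for k, _ in kept):
            kept.append(("alt", ""))  # normalize: always supply alt text
        attr_text = "".join(f' {k}="{escape(v or "", quote=True)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}/>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean_inline_sketch(markup):
    parser = InlineImgCleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For the example input this produces '<img width="100" src="photo.png" alt=""/>', and it also rewrites an unclosed <img src="a.png" alt="A"> into self-closing form.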
cleanPlain
Removes all tags, leaving plain text. Example:
>>> from rlextra.radxml.html_cleaner import cleanPlain
>>> data = "<p>This is raw data with <em>HTML</em> <b>paragraph</b></p>"
>>> cleanPlain(data)
'This is raw data with HTML paragraph'
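Stripping tags down to plain text can be sketched with the same standard-library parser: collect character data and ignore every tag. TextExtractor and clean_plain_sketch are hypothetical names, and the entity handling here (converting &amp; and similar references to characters) may differ from what cleanPlain actually does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only character data, discarding every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. become characters
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def clean_plain_sketch(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)
```

On the example data this returns 'This is raw data with HTML paragraph'. Because the result is unescaped text, it should still be XML-escaped before being placed inside an RML tag.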
Writing out HTML content to a PDF
Here we detail rendering HTML in a PDF, and also revisit the cleanPlain function described above.
Several approaches are possible, depending on your input.
In these snippets we use the following imports:
from preppy import SafeString
from rlextra.radxml.xhtml2rml import xhtml2rml
from rlextra.radxml.html_cleaner import cleanPlain
The input examples are:
data = "<p>This is raw data with <em>HTML</em> <b>paragraph</b></p>"
data2 = "This is raw data with <em>HTML</em> <b>paragraph</b>"
1: Raw XHTML data: preppy's quoting escapes the tags, so they are not interpreted as RML markup
<para>{{data}}</para>
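To see why example 1 does not render the HTML as formatting, the sketch below applies standard XML escaping to data before placing it inside a para tag. It assumes preppy's {{...}} quoting behaves like xml.sax.saxutils.escape for this input; the exact quoting preppy applies is not shown here.

```python
from xml.sax.saxutils import escape

data = "<p>This is raw data with <em>HTML</em> <b>paragraph</b></p>"

# Assumption: preppy's {{...}} quoting is equivalent to XML escaping here.
rml = f"<para>{escape(data)}</para>"
print(rml)
```

Every < and > becomes &lt; or &gt;, so the RML parser sees ordinary text rather than tags.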
2: cleanPlain strips the XHTML tags
<para style="normal">{{cleanPlain(data)}}</para>
3: XHTML data without para tags but with inline tags: ensure the data is enclosed in an RML para tag. SafeString tells preppy not to XML-escape the contents, and xhtml2rml converts the XHTML to RML.
<para style="normal">{{SafeString(xhtml2rml(data2))}}</para>
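The role of SafeString can be illustrated with a small stand-in. SafeMarkup and quote below are hypothetical and are not preppy's code; they only show the general pattern of a str subclass that a quoting function recognizes and passes through unescaped. The fragment stands in for converted markup; the exact output of xhtml2rml is not reproduced here.

```python
from xml.sax.saxutils import escape


class SafeMarkup(str):
    """Hypothetical stand-in for preppy's SafeString: a str marked as already safe."""


def quote(value):
    # Hypothetical quote function: escape ordinary strings, pass marked ones through.
    if isinstance(value, SafeMarkup):
        return str(value)
    return escape(str(value))


fragment = "This is <b>bold</b> text"  # stands in for already-converted RML
escaped = f'<para style="normal">{quote(fragment)}</para>'
trusted = f'<para style="normal">{quote(SafeMarkup(fragment))}</para>'
```

Only the wrapped version keeps its <b> tags as markup, which is why the converted output must be marked safe before it reaches the template.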
4: XHTML to RML without a specified paraStyle: ensure there are no RML para tags around the data.
When no paraStyle is specified with the content, xhtml2rml assumes that paraStyle='normal', tableStyle='noPaddingStyle' and bulletStyle='bullet' exist in your stylesheet.
{{SafeString(xhtml2rml(data))}}
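Because xhtml2rml relies on those default style names, a quick pre-flight check of your stylesheet can catch a missing style before rendering. This sketch uses xml.etree.ElementTree and assumes paragraph styles are declared as <paraStyle name="..."/> and table styles as <blockTableStyle id="..."/>; treating bulletStyle='bullet' as a paraStyle is also an assumption. missing_default_styles is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Default style names xhtml2rml assumes when none are given.
# Assumption: 'bullet' is declared as a paraStyle.
REQUIRED_PARA_STYLES = {"normal", "bullet"}
REQUIRED_TABLE_STYLES = {"noPaddingStyle"}


def missing_default_styles(stylesheet_xml):
    """Return the default style names not declared in an RML <stylesheet> fragment."""
    root = ET.fromstring(stylesheet_xml)
    para = {el.get("name") for el in root.iter("paraStyle")}
    table = {el.get("id") for el in root.iter("blockTableStyle")}
    return (REQUIRED_PARA_STYLES - para) | (REQUIRED_TABLE_STYLES - table)


# Sample stylesheet fragment (table padding commands omitted for brevity).
sheet = """<stylesheet>
  <paraStyle name="normal"/>
  <blockTableStyle id="noPaddingStyle"/>
</stylesheet>"""

print(missing_default_styles(sheet))
```

For this sample stylesheet the check reports {'bullet'}, flagging the one default style that has not been declared.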
5: XHTML to RML with a specified paraStyle: again, ensure there are no RML para tags around the data.
{{SafeString(xhtml2rml(data, paraStyle="normal"))}}