Chapter 2: Pages and page structures
2.1. XML syntax and RML
As with every XML dialect, RML requires correct XML syntax. If you are familiar with HTML, you should pay special attention to the differences between XML syntax and some of the more forgiving constructs allowed in HTML.
- Attribute values must be enclosed in quotation marks (e.g. you must write <document filename="outfile.pdf">; you cannot get away with <document filename=outfile.pdf>).
- A non-empty element must have both an opening and a closing tag (e.g. a <document> tag must be matched by a </document> tag). "Empty" elements are those that don't have any content, and are closed with a "/>" at the end of the same tag rather than having a separate closing tag (e.g. <getName id="Header.Title"/>).
- Tags must be nested correctly (i.e. "<b><i>text</b></i>" isn't valid, but "<b><i>text</i></b>" is).
- On the whole, whitespace is ignored in RML. Except inside strings, you can format and indent your RML documents in whatever way you consider most readable. (Inside text strings, whitespace is seen as equivalent to a single space and line breaks are added automatically as needed during formatting. Other than that, what you type is what is displayed on the page.)
- RML is case-sensitive. "Upper Case" is different from "upper case", "UPPER CASE" and "UpPeR CaSe". The capitalization in the tag names is important.
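The rules above are ordinary XML well-formedness rules, so any XML parser can check them before a file ever reaches rml2pdf. The sketch below uses Python's standard xml.etree.ElementTree module to do that. The helper name is_well_formed is an illustrative assumption and is not part of RML or rml2pdf, and the check says nothing about whether the tags are valid RML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet):
    """Return True if the snippet parses as XML.

    Hypothetical helper for illustration; not part of rml2pdf.
    """
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# Quoted attribute value: accepted. Unquoted value: rejected.
quoted = is_well_formed('<document filename="outfile.pdf"></document>')
unquoted = is_well_formed('<document filename=outfile.pdf></document>')

# An empty element closes itself with "/>".
empty = is_well_formed('<getName id="Header.Title"/>')

# Tags must be closed in the reverse of the order they were opened.
nested_ok = is_well_formed('<para><b><i>text</i></b></para>')
nested_bad = is_well_formed('<para><b><i>text</b></i></para>')

# Tag names are case-sensitive, so <Document> is not closed by </document>.
case_mismatch = is_well_formed('<Document></document>')

print(quoted, unquoted, empty, nested_ok, nested_bad, case_mismatch)
```

Only the quoted attribute, the empty element and the correctly nested tags parse; the other three snippets are rejected by the parser.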
2.2. The prolog
Every RML document must start with a number of lines. These are called the prolog; you can think of it as the document 'header'. Its parts are:
<?xml ... standalone="no" ?>
- This line is the XML declaration. This is optional, but recommended.
version="1.0"
- This attribute tells the parser which version of XML it should use - in this case 1.0.
standalone="no"
- This tells the parser that it needs an external Document Type Definition (more on DTDs below).
encoding="iso-8859-1"
- The "encoding" attribute sets the encoding you want the PDF file to use. The ISO-8859-1 encoding covers the character set known as "US-ASCII", plus things like the accented characters used in most Western European languages and some control characters and graphical characters. ISO-8859-1 is also known as "Latin-1" (or "Latin Alphabet No 1"). Other common encodings are utf-8 (same as US-ASCII for "normal" characters like A-Z and 0-9, but also covers the whole Unicode character set) and cp1252 (a Microsoft Windows variant of ISO-8859-1). You may use any encoding you wish with RML, as long as the encoding attribute here matches the encoding you actually used to write the RML file!
<!DOCTYPE... "rml.dtd">
- This line tells the parser where the Document Type Definition is located. The DTD formally specifies the syntax of RML.
- For documents written in RML, the DTD should always be the current version of rml.dtd. (The RML DTD should always be called rml.dtd.)
- Unlike other dialects of XML, RML does not allow you to provide relative paths to the DTD, nor a full URL. It must always be the name of the DTD, which must live in the same directory as the rml2pdf executable or Python program.
- This makes it easy to predict where the RML DTD will be and prevents you using an old DTD that happens to be sitting around your disk somewhere. It also allows us to make sure that when you create a file with RML, the PDF document will be created in the same directory as the RML file, and to allow relative pathnames in the document tag.
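The requirement that the encoding attribute match the encoding actually used to write the file can be seen at the XML parser level. The following is a minimal sketch using Python's standard xml.etree.ElementTree module rather than rml2pdf itself; the sample text and variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

text = "<para>caf\u00e9</para>"  # contains one accented character (e-acute)

# Declared encoding matches the bytes actually written: parses cleanly.
matching = b'<?xml version="1.0" encoding="iso-8859-1"?>' + text.encode("iso-8859-1")
parsed_text = ET.fromstring(matching).text

# Declaration claims utf-8, but the bytes are ISO-8859-1: the parser rejects
# the file, because the lone 0xE9 byte is not a valid UTF-8 sequence.
mismatched = b'<?xml version="1.0" encoding="utf-8"?>' + text.encode("iso-8859-1")
try:
    ET.fromstring(mismatched)
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(repr(parsed_text), mismatch_rejected)
```

A mismatch is not always reported this loudly: some wrong declarations parse without error and silently produce the wrong characters, which is why it is worth checking the declaration against your editor's save settings.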
The prolog section is common to all XML documents. In addition to this, RML requires another line following the prolog:
<document filename="outfile.pdf">
- This line gives the name that you want the output PDF file created with. This line also starts the document proper, and must be matched by a </document> tag as the last line in the document, in the same way that an HTML file is bracketed by <HTML> and </HTML>.
- The filename you give can be a simple filename, a relative path (e.g. ..\..\myDoc.pdf will create it in the directory two levels up from the one your RML document is in), or a full pathname (e.g. C:\output_files\pdf\myProject\myDocument.pdf or /tmp/user1/myScratchFile.pdf). If you just supply a filename, the output file will be created in the same directory as your RML file. (The same principle applies anywhere else you may need to give a filename: filenames are relative to where the document lives on your disk, not to where rml2pdf is.)
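The resolution rule described above (filenames are taken relative to the directory of the RML file, and full pathnames are used as given) can be sketched in a few lines of Python. The helper resolve_output_path and the example RML location are hypothetical and only mirror the rule; this is not rml2pdf's own code.

```python
import posixpath


def resolve_output_path(rml_path, filename):
    """Resolve a filename attribute relative to the RML file's directory.

    Hypothetical helper that mirrors the rule described in the text.
    posixpath keeps the result identical on every platform; Windows-style
    paths would need ntpath instead.
    """
    base_dir = posixpath.dirname(rml_path)
    # join() discards base_dir when filename is already a full pathname.
    return posixpath.normpath(posixpath.join(base_dir, filename))


rml_file = "/home/user1/projects/reports/invoice.rml"  # assumed location

simple = resolve_output_path(rml_file, "outfile.pdf")
relative = resolve_output_path(rml_file, "../../myDoc.pdf")
absolute = resolve_output_path(rml_file, "/tmp/user1/myScratchFile.pdf")

print(simple)
print(relative)
print(absolute)
```

With the assumed location, the simple filename lands next to the RML file, the relative path lands two directories up in /home/user1, and the full pathname is left unchanged.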
The <document> tag has three other attributes:
- compression specifies whether the produced PDF should be compressed if at all possible. It can take the values 0 | 1 | default for off, on or use the site-wide default (as specified in reportlab_rl_config).
- invariant determines whether the produced PDF should be invariant with respect to the date and the exact contents. It can take the values 0 | 1 | default for off, on or use the site-wide default (as specified in reportlab_rl_config).
- debug determines whether debugging/logging mode should be used during document production. It can take the values 0 | 1 for off or on.
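As an illustration of the permitted values listed above, here is a small sketch that reads a <document> tag and reports any attribute whose value falls outside them. The ALLOWED table and the function check_document_attributes are assumptions written for this example; rml2pdf performs its own validation against the DTD.

```python
import xml.etree.ElementTree as ET

# Permitted values as listed in the text; the table itself is illustrative.
ALLOWED = {
    "compression": {"0", "1", "default"},
    "invariant": {"0", "1", "default"},
    "debug": {"0", "1"},
}


def check_document_attributes(rml_source):
    """Return (attribute, value) pairs whose value is not permitted.

    Hypothetical helper; attributes that are absent are simply skipped.
    """
    root = ET.fromstring(rml_source)
    problems = []
    for name, allowed in ALLOWED.items():
        value = root.get(name)
        if value is not None and value not in allowed:
            problems.append((name, value))
    return problems


good = ('<document filename="outfile.pdf" compression="1" '
        'invariant="default" debug="0"></document>')
bad = ('<document filename="outfile.pdf" compression="yes" '
       'debug="default"></document>')

good_problems = check_document_attributes(good)
bad_problems = check_document_attributes(bad)

print(good_problems)
print(bad_problems)
```

Note that debug accepts only 0 or 1, so debug="default" is reported even though "default" is fine for the other two attributes.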
2.3. Document forms: stylesheet/pageDrawing vs template/stylesheet/story
There are two possible valid structures for your document to have, depending on how simple you want it to be.
For very simple documents, you need the prolog, followed by a <stylesheet> and any number of <pageDrawing> tags.
A pageDrawing is a graphical element on the page, or simple text string (i.e. it is just placed onto the page in the location you specify, and no attempt is made to check if it flows off the page).
EXAMPLE 1
<!DOCTYPE document SYSTEM "rml.dtd">
<document filename="example_1.pdf">
    <stylesheet>
    </stylesheet>
    <pageDrawing>
        <drawCentredString x="4.1in" y="5.8in">
            Hello World.
        </drawCentredString>
    </pageDrawing>
</document>
Figure 1: Output from EXAMPLE 1
(All the examples given in this document can be found online at http://www.reportlab.com/documentation/rml-samples/ or in the mercurial repository at https://hg.reportlab.com/hg-public/rlextra-examples which is a copy of the tests in our main repository.)
This is the most basic RML document you can get. It is the traditional "Hello World". All it does is place the string of text "Hello World" into the middle of your A4 page. Not very useful in the real world, but enough to show you how simple RML can be.
Notice how it does have a stylesheet, but it is empty. Stylesheets are mandatory, but they don't need to actually contain anything. Also notice how in the drawCentredString tag, the co-ordinates are enclosed in quotation marks: they are attributes, and so need to live inside quotes. And if you look at the drawCentredString tag, these attributes are inside the tag (actually inside the angle brackets), then the content of the string comes after it, then the tag is closed by its matching </drawCentredString> tag. All tags with content need their matching closing tag; the <document> and <stylesheet> tags are also parts of matching pairs.
One last thing to notice is the DOCTYPE line - for all these examples, we are assuming that the DTD is in the same directory as the example file itself. This may not always be the case.
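To make the distinction between attributes and content in EXAMPLE 1 concrete, the sketch below parses the example with Python's standard xml.etree.ElementTree module and pulls the two apart. This only inspects the XML; it does not run rml2pdf or produce a PDF, and the variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

EXAMPLE_1 = """<!DOCTYPE document SYSTEM "rml.dtd">
<document filename="example_1.pdf">
<stylesheet>
</stylesheet>
<pageDrawing>
<drawCentredString x="4.1in" y="5.8in">
Hello World.
</drawCentredString>
</pageDrawing>
</document>"""

root = ET.fromstring(EXAMPLE_1)
string_tag = root.find("pageDrawing/drawCentredString")

# Attributes: the quoted values that sit inside the angle brackets.
attributes = string_tag.attrib

# Content: the text between the opening tag and its matching closing tag.
content = string_tag.text.strip()

# The stylesheet is present, as required, but has no children.
stylesheet_children = list(root.find("stylesheet"))

print(attributes, repr(content), stylesheet_children)
```

The co-ordinates come back as the attribute strings "4.1in" and "5.8in", the content is the text "Hello World.", and the stylesheet element exists but is empty.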
For a more complex RML document, you can use the more powerful template/stylesheet/story form of document. In this, a file contains the following three sections:
- a template
- a stylesheet
- a story
The template tells rml2pdf what should be on the page: headers, footers, any graphic elements you use as a background.
The stylesheet is where the styles for a document are set. This tells the parser what fonts to use for paragraphs and paragraph headers, how to format tables and other things of that nature.
The story is where the "meat" of the document is. Just like in a newspaper, the story is the bit you want people to read, as opposed to design elements or page markup. As such, this is where headers, paragraphs and the actual text are contained.
EXAMPLE 2
<!DOCTYPE document SYSTEM "rml.dtd">
<document filename="example_2.pdf">
    <template>
        <pageTemplate id="main">
            <frame id="first" x1="72" y1="72" width="451" height="698"/>
        </pageTemplate>
    </template>
    <stylesheet>
    </stylesheet>
    <!-- The story starts below this comment -->
    <story>
        <para>
            This is the "story". This is the part of the RML document where
            your text is placed.
        </para>
        <para>
            It should be enclosed in "para" and "/para" tags to turn it into
            paragraphs.
        </para>
    </story>
</document>
Figure 2: Output from EXAMPLE 2
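The three-section structure of EXAMPLE 2, and the geometry of its single frame, can be checked with a short script. This is a sketch under two assumptions that are not stated in the example itself: that bare numbers in the frame attributes are points (1/72 inch), and that the page is A4 (210 mm by 297 mm). The paragraph text is abridged here.

```python
import xml.etree.ElementTree as ET

EXAMPLE_2 = """<!DOCTYPE document SYSTEM "rml.dtd">
<document filename="example_2.pdf">
<template>
<pageTemplate id="main">
<frame id="first" x1="72" y1="72" width="451" height="698"/>
</pageTemplate>
</template>
<stylesheet>
</stylesheet>
<!-- The story starts below this comment -->
<story>
<para>This is the "story".</para>
<para>It should be enclosed in "para" and "/para" tags.</para>
</story>
</document>"""

root = ET.fromstring(EXAMPLE_2)

# The top-level sections, in document order (the comment is not a section).
sections = [child.tag for child in root]
paragraph_count = len(root.findall("story/para"))

frame = root.find("template/pageTemplate/frame")
x1, y1, width, height = (float(frame.get(k)) for k in ("x1", "y1", "width", "height"))

# Assumed page size: A4, converted from millimetres to points.
A4_WIDTH = 210 / 25.4 * 72
A4_HEIGHT = 297 / 25.4 * 72

# Space left between the frame and the right and top edges of the page.
right_margin = A4_WIDTH - (x1 + width)
top_margin = A4_HEIGHT - (y1 + height)

print(sections, paragraph_count)
print(round(right_margin, 2), round(top_margin, 2))
```

Under those assumptions the frame starts 72 points (one inch) in from the left and bottom edges and leaves roughly the same margin at the right and top, which is why the text in Figure 2 sits inside an even border.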
The <pageTemplate>, <pageGraphics>, <frame> and <paraStyle> tags will all be covered in more detail later on in this guide.
Paragraphs start with a <para> tag and are closed with a </para> tag. Their appearance can be controlled with the <paraStyle> tag.
RML allows you to use comments in the RML code. These are not displayed in the output PDF file. Just like in HTML, they start with a "<!--" and are terminated with a "-->". Unlike tags, comments cannot be nested. In fact, you can't even have the characters "--" inside a comment.
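The restrictions on comments come from XML itself, so they can be demonstrated with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name parses and the sample snippets are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def parses(snippet):
    """Return True if the snippet is accepted by the XML parser (illustrative helper)."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# An ordinary comment is accepted.
plain_comment = parses("<story><!-- a note to myself --><para>text</para></story>")

# "--" inside the comment body is rejected.
double_hyphen = parses("<story><!-- first -- second --><para>text</para></story>")

# So is an attempt to nest one comment inside another.
nested = parses("<story><!-- outer <!-- inner --> outer --><para>text</para></story>")

# Comments are dropped when the document is read: only <para> remains.
root = ET.fromstring("<story><!-- a note to myself --><para>text</para></story>")
children = [child.tag for child in root]

print(plain_comment, double_hyphen, nested, children)
```

If you need to disable a block of RML that already contains comments, remove or rewrite the inner comments first rather than wrapping the whole block in a new comment.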