========================
eZ Components - Document
========================

.. contents:: Table of Contents
   :depth: 3

Introduction
============

The document component offers transformations between different semantic markup
languages, like:

- `ReStructured text`__
- `XHTML`__
- `Docbook`__
- `eZ Publish XML markup`__
- Wiki markup languages, like: Creole__, Dokuwiki__ and Confluence__
- `Open Document Text`__ as used by `OpenOffice.org`__ and other office suites

As shown in figure 1, each format supports conversions from and to docbook as
the central intermediate format and may implement additional shortcuts for
conversions from and to other formats. Not every format can express the same
semantics, so some information may be lost, which is `documented in a
dedicated document`__.

.. figure:: img/document-architecture.png
   :alt: Conversion architecture in document component

   Figure 1: Conversion architecture in document component

There are central handler classes for each markup language, which follow the
common conversion interface ezcDocument and all implement the methods
getAsDocbook() and createFromDocbook().

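Because every handler implements these two methods, a conversion between any
two formats can be written generically. The following is a minimal sketch, not
part of the component: the helper name convertViaDocbook() is hypothetical, and
it assumes nothing but the getAsDocbook() and createFromDocbook() methods named
above.

```php
<?php
/**
 * Converts a document from one markup format into another through DocBook.
 *
 * Sketch only: it relies on nothing but getAsDocbook() and
 * createFromDocbook(), which every handler class implements. The function
 * name convertViaDocbook() is hypothetical.
 *
 * @param object $source Handler holding the loaded document, e.g. ezcDocumentRst
 * @param object $target Handler for the output format, e.g. ezcDocumentXhtml
 * @return object The target handler, now containing the converted document
 */
function convertViaDocbook( $source, $target )
{
    // Step 1: source markup -> DocBook, the central intermediate format
    $docbook = $source->getAsDocbook();

    // Step 2: DocBook -> target markup
    $target->createFromDocbook( $docbook );

    return $target;
}

// Usage with the component loaded (not executed here):
// $rst = new ezcDocumentRst();
// $rst->loadFile( 'document.txt' );
// echo convertViaDocbook( $rst, new ezcDocumentXhtml() )->save();
```

Formats which implement a direct shortcut can skip the intermediate step, which
is faster and avoids information loss.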
Additionally the document component can render documents in the following
output formats. These formats can only be generated, not read:

- PDF

__ http://docutils.sourceforge.net/rst.html
__ http://www.w3.org/TR/xhtml1/
__ http://www.docbook.org/
__ http://doc.ez.no/eZ-Publish/Technical-manual/4.x/Reference/XML-tags
__ http://www.wikicreole.org/
__ http://www.dokuwiki.org/dokuwiki
__ http://confluence.atlassian.com/renderer/notationhelp.action?section=all
__ http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
__ http://www.openoffice.org/
__ Document_conversion.html

Markup languages
================

The following markup languages are currently handled by the document
component.

ReStructured text
-----------------

`ReStructured Text`__ (RST) is a simple text based markup language, intended
to be easy to read and write by humans. Examples can be found in the
`documentation of RST`__.

The transformation of a simple RST document to docbook can be done like
this:

.. include:: tutorial/00_00_convert_rst.php
   :literal:

In line 3 the document is loaded and parsed into an internal abstract syntax
tree. In line 5 the internal structure is then transformed into a docbook
document. In the last line the resulting document is returned as a string, so
that you can echo or store it.

__ http://docutils.sourceforge.net/rst.html
__ http://docutils.sourceforge.net/docs/user/rst/quickstart.html

Error handling
^^^^^^^^^^^^^^

By default each parsing or compiling error is transformed into an exception,
so that you are notified about those errors. The error reporting settings can
be modified like for all other document handlers::

    <?php
    $document = new ezcDocumentRst();
    $document->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;
    $document->loadFile( '../tutorial.txt' );

    $docbook = $document->getAsDocbook();
    echo $docbook->save();
    ?>

The setting in line 3 causes only warnings, errors and fatal errors to be
transformed into exceptions, while notices are only collected but otherwise
ignored. This setting affects both the parsing of the source document and the
compiling into the destination language.

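The bit mask semantics can be illustrated without the component. This
standalone sketch only mirrors the behaviour described above: a level contained
in the mask is escalated to an exception, every other level is collected.
isEscalated() is a hypothetical helper, not the component's internal code.

```php
<?php
/**
 * Returns true if an error of the given level would be turned into an
 * exception for the given errorReporting mask (illustration only).
 */
function isEscalated( int $level, int $mask ): bool
{
    // A level is escalated if its bit is set in the mask
    return ( $level & $mask ) !== 0;
}

// The mask from the example above: notices are not part of it
$mask = E_PARSE | E_ERROR | E_WARNING;

$levels = array(
    'E_ERROR'   => E_ERROR,
    'E_PARSE'   => E_PARSE,
    'E_WARNING' => E_WARNING,
    'E_NOTICE'  => E_NOTICE,
);

foreach ( $levels as $name => $level )
{
    printf( "%-9s %s\n", $name, isEscalated( $level, $mask ) ? 'exception' : 'collected' );
}
```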
Directives
^^^^^^^^^^

`RST directives`__ are elements in RST documents with parameters, optional
named options and optional content. The document component implements a well
known subset of the `directives implemented in the docutils RST parser`__. You
may register custom directive handlers, or override existing directive
handlers with your own implementation. A directive in RST markup with
parameters, options and content could look like::

    My document
    ===========

    The custom directive:

    .. my_directive:: parameters
       :option: value

       Some indented text...

For such a directive you should register a handler on the RST document, like::

    <?php
    $document = new ezcDocumentRst();
    $document->registerDirective( 'my_directive', 'myCustomDirective' );
    $document->loadFile( $from );

    $docbook = $document->getAsDocbook();
    $xml = $docbook->save();
    ?>

The class myCustomDirective must extend the class ezcDocumentRstDirective and
implement the method toDocbook(). For rendering you get access to the full AST,
the contents of the current directive and the base path where the document
resides in the file system, which is necessary for accessing external files.

__ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#directives
__ http://docutils.sourceforge.net/docs/ref/rst/directives.html

Directive example
`````````````````

A full example of a custom directive, where we want to embed real world
addresses into our RST document and maintain the semantics in the resulting
docbook, could look like::

    Address example
    ===============

    .. address:: John Doe
       :street: Some Lane 42

We could add more information, like the ZIP code, city and state, but skip
this to keep the code short. The implemented directive then just needs to take
this information and transform it into valid docbook XML using the DOM
extension.

.. include:: tutorial/00_01_address_directive.php
   :literal:

The AST node which should be rendered is passed to the constructor of the
custom directive visitor and is available in the class property $node. The
complete DOMDocument and the current DOMNode are passed to the method. In this
case we just create an `address node`__ with the optional child nodes street
and personname, depending on the existence of the respective values.

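The DOM part of such a directive can be tried out on its own. This standalone
sketch builds the same address structure with PHP's DOM extension;
buildAddressNode() and the $values array layout are hypothetical and do not
extend ezcDocumentRstDirective. Unlike the example output of the tutorial, it
trims the values, so no leading spaces end up in the elements.

```php
<?php
// Standalone sketch of the DOM calls an address directive could make.
// buildAddressNode() and the $values layout are hypothetical.
const DOCBOOK_NS = 'http://docbook.org/ns/docbook';

function buildAddressNode( DOMDocument $document, DOMNode $root, array $values ): DOMElement
{
    $address = $document->createElementNS( DOCBOOK_NS, 'address' );
    $root->appendChild( $address );

    // Only create child elements for values which are actually present
    foreach ( array( 'personname', 'street' ) as $name )
    {
        if ( isset( $values[$name] ) && trim( $values[$name] ) !== '' )
        {
            $child = $document->createElementNS( DOCBOOK_NS, $name );
            $child->appendChild( $document->createTextNode( trim( $values[$name] ) ) );
            $address->appendChild( $child );
        }
    }

    return $address;
}

// The directive parameter becomes the person name, the :street: option
// the street
$document = new DOMDocument( '1.0' );
$section  = $document->appendChild( $document->createElementNS( DOCBOOK_NS, 'section' ) );
buildAddressNode( $document, $section, array( 'personname' => 'John Doe', 'street' => 'Some Lane 42' ) );
echo $document->saveXML();
```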
You can now render the RST document after you have registered your custom
directive handler as shown above:

.. include:: tutorial/00_02_custom_directive.php
   :literal:

The output will then look like::

    <?xml version="1.0"?>
    <article xmlns="http://docbook.org/ns/docbook">
      <section id="address_example">
        <sectioninfo/>
        <title>Address example</title>
        <address>
          <personname> John Doe</personname>
          <street> Some Lane 42</street>
        </address>
      </section>
    </article>

__ http://docbook.org/tdg/en/html/address.html

XHTML rendering
^^^^^^^^^^^^^^^

For RST a conversion shortcut has been implemented, so that you don't need to
convert the RST to docbook and the docbook to XHTML. This saves conversion time
and prevents information loss during multiple conversions::

    <?php
    $document = new ezcDocumentRst();
    $document->loadFile( $from );

    $xhtml = $document->getAsXhtml();
    $xml = $xhtml->save();
    ?>

The default XHTML compiler generates complete XHTML documents, including a
header with meta data. If you want to inline the result, you may specify
another XHTML compiler, which only creates an XHTML block level element that
can be embedded into your own markup::

    <?php
    $document = new ezcDocumentRst();
    $document->options->xhtmlVisitor = 'ezcDocumentRstXhtmlBodyVisitor';
    $document->loadFile( $from );

    $xhtml = $document->getAsXhtml();
    $xml = $xhtml->save();
    ?>

You can of course also use the predefined and custom directives for XHTML
rendering. The directives used during XHTML generation also need to implement
the interface ezcDocumentRstXhtmlDirective.

Modification of XHTML rendering
```````````````````````````````

You can modify the generated output of the XHTML visitor by creating a custom
visitor for the RST AST. The easiest way probably is to extend one of the
existing XHTML visitors and reuse it. For example, you may want to set the type
attribute on bullet lists, as known from HTML, even though it is not valid
XHTML::

    class myDocumentRstXhtmlVisitor extends ezcDocumentRstXhtmlVisitor
    {
        protected function visitBulletList( DOMNode $root, ezcDocumentRstNode $node )
        {
            $list = $this->document->createElement( 'ul' );
            $root->appendChild( $list );

            $listTypes = array(
                '*'            => 'circle',
                '+'            => 'disc',
                '-'            => 'square',
                "\xe2\x80\xa2" => 'disc',
                "\xe2\x80\xa3" => 'circle',
                "\xe2\x81\x83" => 'square',
            );
            // Not allowed in XHTML strict
            $list->setAttribute( 'type', $listTypes[$node->token->content] );

            // Decorate the list contents
            foreach ( $node->nodes as $child )
            {
                $this->visitNode( $list, $child );
            }
        }
    }

The docbook and XHTML visitors follow a structure which is not enforced for
visitors in general: they call a dedicated method for each node type in the AST
to decorate the AST recursively. The method visitBulletList() above is called
for all bullet list nodes in the AST, which contain the actual list items. Its
first parameter is the current position in the XHTML DOM tree.

To create the XHTML we can now just create a new list node (<ul>) in the
current DOMNode, set the new attribute, and recursively decorate all
descendants by calling the general visitor dispatching method visitNode() for
all children in the AST. So that the AST children are also rendered as
children in the XML tree, we pass the newly created DOMNode (<ul>) as the new
root node to the visitNode() method.

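To make the dispatch pattern more concrete, here is a standalone sketch which
uses plain arrays instead of ezcDocumentRstNode objects. The class
MiniXhtmlVisitor and its node array layout are hypothetical; they only mimic
the visitNode() and visit<Type>() structure described above.

```php
<?php
// Hypothetical miniature visitor: nodes are arrays with a 'type' key, and
// visitNode() dispatches to visit<Type>() methods, passing the current
// position in the output DOM tree as the first parameter.
class MiniXhtmlVisitor
{
    protected $document;

    public function __construct()
    {
        $this->document = new DOMDocument( '1.0' );
    }

    public function visit( array $ast ): string
    {
        $body = $this->document->appendChild( $this->document->createElement( 'body' ) );
        $this->visitNode( $body, $ast );
        return $this->document->saveXML( $body );
    }

    // General dispatcher, e.g. 'bulletList' is handled by visitBulletList()
    protected function visitNode( DOMNode $root, array $node )
    {
        $method = 'visit' . ucfirst( $node['type'] );
        $this->$method( $root, $node );
    }

    protected function visitDocument( DOMNode $root, array $node )
    {
        foreach ( $node['nodes'] as $child )
        {
            $this->visitNode( $root, $child );
        }
    }

    protected function visitBulletList( DOMNode $root, array $node )
    {
        $list = $this->document->createElement( 'ul' );
        $root->appendChild( $list );

        // The new <ul> is the root for all children, so the list items end
        // up inside it in the generated tree
        foreach ( $node['nodes'] as $child )
        {
            $this->visitNode( $list, $child );
        }
    }

    protected function visitListItem( DOMNode $root, array $node )
    {
        $item = $this->document->createElement( 'li' );
        $item->appendChild( $this->document->createTextNode( $node['text'] ) );
        $root->appendChild( $item );
    }
}

$ast = array( 'type' => 'document', 'nodes' => array(
    array( 'type' => 'bulletList', 'nodes' => array(
        array( 'type' => 'listItem', 'text' => 'First' ),
        array( 'type' => 'listItem', 'text' => 'Second' ),
    ) ),
) );

$visitor = new MiniXhtmlVisitor();
echo $visitor->visit( $ast ), "\n";
```

Overriding a single visit<Type>() method in a subclass, as done with
visitBulletList() above, changes the output for that node type only.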
After defining such a class, you can use the custom visitor as shown above::

    <?php
    $document = new ezcDocumentRst();
    $document->options->xhtmlVisitor = 'myDocumentRstXhtmlVisitor';
    $document->loadFile( $from );

    $xhtml = $document->getAsXhtml();
    $xml = $xhtml->save();
    ?>

Now the lists in the generated XHTML will also have the type attribute set.

Writing RST
^^^^^^^^^^^

Writing an RST document from an existing docbook document, or from an
ezcDocumentDocbook object generated from some other source, is trivial:

.. include:: tutorial/00_03_write_rst.php
   :literal:

Internally the ezcDocumentDocbookToRstConverter class is used for the
conversion, which can also be called directly, like::

    $converter = new ezcDocumentDocbookToRstConverter();
    $rst = $converter->convert( $docbook );

Using the converter directly, you can configure it to your needs, or extend it
to handle docbook elements which are not handled yet. As usual, the converter
is configured using its options property, and the options are defined in the
ezcDocumentDocbookToRstConverterOptions class. There you may configure the
header underlines, the bullet types and the line wrapping.

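To illustrate what the header underline and line wrapping options influence,
the sketch below renders a title with an optional overline and an underline,
and wraps a paragraph. renderRstTitle() and renderRstParagraph() are
hypothetical helpers, not methods of ezcDocumentDocbookToRstConverter. The
underline is exactly as long as the title in characters, since RST requires it
to be at least that long.

```php
<?php
// Count characters instead of bytes, so UTF-8 titles get a matching
// underline (returns false for invalid UTF-8 input)
function rstTextLength( string $text ): int
{
    return preg_match_all( '/./us', $text );
}

// Renders a section title, optionally with an overline like "=" titles
function renderRstTitle( string $title, string $char, bool $overline = false ): string
{
    $line = str_repeat( $char, rstTextLength( $title ) );
    return ( $overline ? $line . "\n" : '' ) . $title . "\n" . $line . "\n";
}

// Wraps a paragraph at word boundaries; wordwrap() counts bytes, so this
// assumes mostly ASCII text
function renderRstParagraph( string $text, int $width = 79 ): string
{
    return wordwrap( $text, $width, "\n", false ) . "\n";
}

echo renderRstTitle( 'Address example', '=', true ), "\n";
echo renderRstParagraph( 'Some longer paragraph text, which is wrapped at the configured line length.', 40 );
```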
Extending RST writing
`````````````````````

As mentioned before, the converter might not handle all existing docbook
elements yet. However, its handler based mechanism makes it easy to extend or
override the existing behaviour.

Similar to the example above, we can convert the <address> docbook element back
to the address RST directive.

.. include:: tutorial/00_04_address_element.php
   :literal:

The handler classes are assigned to XML elements in some namespace, "docbook"
in this case. The handler is registered in line 18 for the element "address".
The class itself has to extend the ezcDocumentElementVisitorHandler class; here
it extends ezcDocumentDocbookToRstBaseHandler, which already extends
ezcDocumentElementVisitorHandler and provides some convenience methods for RST
creation, like renderDirective() used in this example.

The handler is called whenever the element it has been registered for occurs
in the docbook XML tree. It has to append the RST generated for this element to
the RST document, and may call the general conversion handler again for its
child elements. This example converts the docbook XML shown above back to::

    .. _address_example:

    ===============
    Address example
    ===============

    .. address::

       John Doe
       Some Lane 42

For simplicity, this example ignores any special address sub elements. For
more examples of element handlers, check the existing implementations.

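The RST produced for a directive follows a simple layout: the directive line,
indented options, a blank line and the indented content. The standalone sketch
below assembles it; its name, parameter order and the three space indentation
are assumptions, and the actual signature of renderDirective() in
ezcDocumentDocbookToRstBaseHandler may differ.

```php
<?php
// Hypothetical helper assembling an RST directive from its parts
function renderRstDirective( string $name, string $parameter, array $options, array $contentLines, int $indent = 3 ): string
{
    $pad = str_repeat( ' ', $indent );
    $rst = '.. ' . $name . '::' . ( $parameter !== '' ? ' ' . $parameter : '' ) . "\n";

    // Options directly follow the directive line, indented
    foreach ( $options as $option => $value )
    {
        $rst .= $pad . ':' . $option . ': ' . $value . "\n";
    }

    // The content block is separated by a blank line and indented as well
    if ( count( $contentLines ) > 0 )
    {
        $rst .= "\n";
        foreach ( $contentLines as $line )
        {
            $rst .= $pad . $line . "\n";
        }
    }

    return $rst;
}

// Reproduces the address directive from the output above
echo renderRstDirective( 'address', '', array(), array( 'John Doe', 'Some Lane 42' ) );
```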
XHTML
-----

Converting XHTML or HTML to a document markup language is a non-trivial task,
because XHTML elements are often used for layout, ignoring the actual semantics
of the element. Therefore the document component allows you to stack a set of
filters, each of which performs a specific conversion task. The default filter
stack may work fine, but you may also want to implement custom filters,
depending on the contents of the filtered website, or to cover additional
sources of meta data, like RDF, Microformats or similar.

The available filters are:

- ezcDocumentXhtmlElementFilter

  This filter just maintains the common semantics of XHTML elements by
  converting them to their docbook equivalents. It ignores common class names.
  This filter is the most basic one and you probably always want to add it to
  the filter stack.

- ezcDocumentXhtmlXpathFilter

  The XPath filter takes an XPath expression to locate the root of the document
  contents. It makes no sense to use it together with the content locator
  filter. This is a more static, but also more precise way to tell the
  converter where to find the actual contents.

- ezcDocumentXhtmlMetadataFilter

  This filter extracts common meta data from the XHTML head and converts it
  into docbook section info elements.

- ezcDocumentXhtmlTablesFilter

  HTML tables are especially often used for layout markup. This filter takes a
  threshold, and if the table text factor drops below this threshold, the table
  is ignored. The same is true for stacked tables.

- ezcDocumentXhtmlContentLocatorFilter

  The content locator filter tries to find the actual article in the markup of
  a website, ignoring the surrounding layout markup. This seems to work well,
  for example, for common news sites.

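The filter stack idea, and the XPath filter in particular, can be sketched with
PHP's DOM extension alone. The interface HtmlFilter, the classes and the
"return the content root" contract below are hypothetical simplifications, not
the API of ezcDocumentXhtmlXpathFilter or the other filters listed above.

```php
<?php
// Hypothetical filter contract: each filter receives the document and the
// current content root, and returns the (possibly narrowed) content root
interface HtmlFilter
{
    public function filter( DOMDocument $document, DOMNode $content ): DOMNode;
}

class XpathContentFilter implements HtmlFilter
{
    private $expression;

    public function __construct( string $expression )
    {
        $this->expression = $expression;
    }

    public function filter( DOMDocument $document, DOMNode $content ): DOMNode
    {
        $xpath = new DOMXPath( $document );
        $nodes = $xpath->query( $this->expression, $content );

        // Keep the previous content root if the expression matches nothing
        return ( $nodes !== false && $nodes->length > 0 ) ? $nodes->item( 0 ) : $content;
    }
}

function applyFilters( DOMDocument $document, array $filters ): DOMNode
{
    $content = $document->documentElement;
    foreach ( $filters as $filter )
    {
        // Each filter works on the result of the previous one
        $content = $filter->filter( $document, $content );
    }
    return $content;
}

$html = '<html><body><ul id="nav"><li>Home</li></ul>'
      . '<div id="content"><h1>Introduction</h1><p>Article text</p></div></body></html>';

$document = new DOMDocument();
$document->loadHTML( $html );
$content = applyFilters( $document, array( new XpathContentFilter( '//div[@id="content"]' ) ) );
echo $content->textContent, "\n";
```

The navigation list is no longer part of the selected content, which is the
effect the XPath filter has in the example below.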
By default only the element and meta data filters are used. So the conversion
of a common website, like the `introduction article`__ from ezcomponents.org,
results in a docbook document which also contains all the lists used for the
navigation and similar layout elements:

.. include:: tutorial/01_00_read_html.php
   :literal:

So let's additionally use the XPath filter to pass the location of the actual
content to the conversion:

.. include:: tutorial/01_01_read_html_filtered.php
   :literal:

With this additional filter, the contents are found and converted correctly.

__ http://ezcomponents.org/introduction

Writing XHTML
^^^^^^^^^^^^^

Writing XHTML from docbook is very similar to writing RST: it uses the same
handler based mechanism, so you may want to check that chapter to learn how to
extend it for unhandled docbook elements.

.. include:: tutorial/01_02_write_html.php
   :literal:

As you can see, this works the same way as any other conversion from Docbook
to another format.

HTML styles
^^^^^^^^^^^

By default inline CSS is embedded in all generated HTML, to create a more
appealing default experience. This may of course be deactivated, and you may
also reference custom style sheets to be included in the generated HTML.

.. include:: tutorial/01_03_write_html_styled.php
   :literal:

For this we again use the converter directly, so that we can configure it as
we like.

eZ XML
------

eZ XML describes the markup format used internally by `eZ Publish`__ for
storing markup in content objects. The format is roughly specified in the `eZ
Publish documentation`__.

Modules often register custom elements, which are not specified anywhere, so
there might be several elements which are not handled by default.

__ http://ez.no/ezpublish
__ http://ez.no/doc/ez_publish/technical_manual/4_0/reference/xml_tags

Reading eZ XML
^^^^^^^^^^^^^^

Reading eZ XML basically works the same way as for all other formats:

.. include:: tutorial/02_00_read_ezxml.php
   :literal:

As always, the document object is constructed either from an input string or
from a file. To convert it into docbook you may just use the method
getAsDocbook().

Link handling
`````````````

Inside eZ XML documents link URIs are replaced with IDs, which reference the
links inside the eZ Publish database, to ensure that a changed link is updated
globally. The replacing of such links is handled by a class extending
ezcDocumentEzXmlLinkProvider. By default dummy URLs are added to the documents.

URLs are either referenced directly by their ID, by a node ID, or by an object
ID. Those parameters are passed to the link provider, which should then return
a URL for them.

.. include:: tutorial/02_01_link_provider.php
   :literal:

The link provider is only implemented as a trivial stub here, but you could
establish a database connection there and actually fetch the required data. In
this case the generated docbook document looks like::

    <?xml version="1.0"?>
    <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
    <article xmlns="http://docbook.org/ns/docbook">
      <section>
        <title>Paragraph</title>
        <para>Some content, with a <ulink url="http://host/path/1">link</ulink>.</para>
      </section>
    </article>

The link provider is again set as an option of the converter. As shown for the
docbook conversions of the other handlers, you can register element handlers
for yet unhandled eZ XML elements on the converter, too.

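The dispatch on the three reference kinds can be sketched as a plain function.
The attribute names url_id, node_id and object_id and the URL patterns are
assumptions for illustration; a real provider extends
ezcDocumentEzXmlLinkProvider and would look up the actual URLs in the eZ
Publish database.

```php
<?php
// Hypothetical stub resolver for the three kinds of link references.
// Attribute names and URL patterns are made up for this sketch.
function resolveEzXmlLink( array $attributes, string $baseUrl = 'http://host' ): ?string
{
    // Maps the reference kind to a URL path pattern
    $patterns = array(
        'url_id'    => '/path/%d',
        'node_id'   => '/node/%d',
        'object_id' => '/object/%d',
    );

    foreach ( $patterns as $attribute => $pattern )
    {
        if ( isset( $attributes[$attribute] ) )
        {
            return $baseUrl . sprintf( $pattern, (int) $attributes[$attribute] );
        }
    }

    // No known reference attribute: the caller decides how to handle this
    return null;
}

echo resolveEzXmlLink( array( 'url_id' => '1' ) ), "\n";
```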
Writing eZ XML
^^^^^^^^^^^^^^

Writing eZ XML works nearly the same way as reading. It again uses XML based
element handlers, as shown in more detail for the Docbook to RST conversion.
For the link conversion an object extending ezcDocumentEzXmlLinkConverter is
used, which returns an array with the attributes of the link in the eZ XML
document.

Wiki markup
-----------

Wiki markup has no central standard, but is used as a term to describe some
common subset with lots of different extensions. Most wiki markup languages
only support quite trivial markup with severe limitations on the nesting of
markup blocks. For example, no markup really supports tables containing lists,
and especially not tables containing other tables.

The document component implements a generic parser to support multiple wiki
markup languages. For each different markup syntax a tokenizer has to be
implemented, which converts the markup into a unified token stream, which can
then be handled by the generic parser.

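The following sketch illustrates the tokenizer idea on inline markup only: two
syntaxes, Creole and Confluence, are mapped onto one unified token stream. The
function tokenizeWiki() and the token array layout are hypothetical; the
component's tokenizers produce token objects and handle far more markup.

```php
<?php
// Hypothetical tokenizer: $markers maps syntax specific markup strings to
// unified token types. Real tokenizers also have to handle escaping, URLs
// containing "//" and block level markup.
function tokenizeWiki( string $text, array $markers ): array
{
    // Try longer markers first, so "**" is not read as two "*" markers
    $keys = array_keys( $markers );
    usort( $keys, function ( $a, $b ) { return strlen( $b ) - strlen( $a ); } );
    $quoted  = array_map( function ( $marker ) { return preg_quote( $marker, '/' ); }, $keys );
    $pattern = '/(' . implode( '|', $quoted ) . ')/';

    $tokens = array();
    foreach ( preg_split( $pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY ) as $part )
    {
        $tokens[] = isset( $markers[$part] )
            ? array( 'type' => $markers[$part] )
            : array( 'type' => 'text', 'content' => $part );
    }
    return $tokens;
}

// Syntax specific marker tables, both mapping to the same token types
$creole     = array( '**' => 'strong', '//' => 'emphasis' );
$confluence = array( '*' => 'strong', '_' => 'emphasis' );

print_r( tokenizeWiki( 'Some **bold** and //italic// text', $creole ) );
```

Both syntaxes yield the same nine tokens, so one parser can build the document
structure for either of them.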
The document component currently supports reading three wiki markup languages,
but new ones can be added easily by implementing another tokenizer. Supported
are:

- Creole__, developed by an initiative with the intention to create a unified
  wiki markup standard. This is the default wiki language, and currently the
  only one which can be written.

  Creole currently only supports a very limited set of markup__; all further
  markup additions are still under discussion.

- Dokuwiki__ is a popular wiki system, used for example on `wiki.php.net`__,
  with a quite different syntax and the most complete markup support, even
  including something like footnotes.

- Confluence__ is a common Java based wiki with an entirely different and very
  uncommon syntax, which has mainly been implemented to prove the generic
  nature of the parser.

All markup languages are tested against all examples from the respective
markup language documentation, but there might still be cases where the
parsers of the default implementations behave slightly differently from the
implementation in the document component.

__ http://www.wikicreole.org/
__ http://www.wikicreole.org/wiki/Elements
__ http://www.dokuwiki.org/dokuwiki
__ http://wiki.php.net/
__ http://confluence.atlassian.com/renderer/notationhelp.action?section=all

Reading wiki markup
^^^^^^^^^^^^^^^^^^^

Reading wiki texts basically works like for any other markup language:

.. include:: tutorial/03_00_read_wiki.php
   :literal:

As mentioned, the Creole tokenizer is used by default. The same result can be
produced with Confluence markup by switching the tokenizer:

.. include:: tutorial/03_01_read_wiki_confluence.php
   :literal:

Writing wiki markup
^^^^^^^^^^^^^^^^^^^

For now only writing Creole wiki markup is supported. Since Creole does not
support a lot of the markup available in docbook, not all documents can be
converted properly. Because it does not even support explicit internal
references, we cannot even simulate footnotes like in HTML.

If you want to add support for such conversions, it works exactly like the
docbook to RST conversion and can be extended the same way.

.. include:: tutorial/03_02_write_wiki.php
   :literal:

PDF
---

PDF (Portable Document Format) was developed to provide a document format
which can be presented independently of software and operating system. Because
of this it is often used as a pre-print document exchange format.

The document component can generate PDF documents from all other input formats
and offers a language very similar to CSS to apply custom styling to the
generated output. Additionally it supports adding custom parts, like headers
and footers, to the PDF document.

Reading PDF
^^^^^^^^^^^

For now the document component does not support reading PDF documents.

Writing PDF
^^^^^^^^^^^

Writing PDF basically works like writing any other format supported by the
document component, as the basic example shows:

.. include:: tutorial/04_01_create_pdf.php
   :literal:

First we read an RST file to create a Docbook document from it, because, as
described before, Docbook is the central conversion format.

Afterwards the Docbook document is loaded by the PDF class and saved. When
converting the document to a string, the PDF is rendered using the default
options and the default driver. The result of this rendering call can be
viewed here: `04_01_create_pdf.pdf`__.

__ 04_01_create_pdf.pdf

Output writers
``````````````

Since there are numerous different PDF renderers in the PHP world and the
available ones might depend on the current environment, the document component
supports different PDF drivers, which wrap different existing libraries.

For now two implementations exist, for pecl/haru and for TCPDF, but it is
fairly easy to write another one for a different PDF library.

Haru
""""

libharu__ is an open source PDF generation library written in C, wrapped by
the haru PHP extension, which is available from PECL__. If PEAR is set up
correctly on your machine, it should install as easily as::

    pear install pecl/haru

The Haru driver is pretty fast, but currently has issues with some special
characters. It is the default driver, but can be explicitly used by setting
the driver option on the PDF class, like::

    $pdf = new ezcDocumentPdf();
    $pdf->options->driver = new ezcDocumentPdfHaruDriver();

__ http://libharu.org
__ http://pecl.php.net/package/haru

TCPDF
"""""

TCPDF is a PDF generation library written in pure PHP, available from
`tcpdf.org`__. To use the TCPDF driver you need to download and include its
main class before rendering the PDF. It supports all aspects of PDF rendering
required by the document component, but follows some bad coding practices,
like:

- Throws lots of warnings and notices, which you might want to silence by
  temporarily changing the error reporting level
- Reads and writes several global variables, which might or might not
  interfere with your application code
- Uses eval() in several places, which results in non-cacheable opcodes

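One way to silence the notices and warnings only while TCPDF renders is to
lower the error reporting level temporarily and restore it in a finally block,
so it is restored even if rendering throws an exception. withQuietErrors() is
a hypothetical helper for this sketch.

```php
<?php
// Runs $callback with notices, warnings and deprecation messages removed
// from the error reporting level, then restores the previous level
function withQuietErrors( callable $callback )
{
    // error_reporting() returns the old level when setting a new one
    $previous = error_reporting( error_reporting() & ~( E_NOTICE | E_WARNING | E_DEPRECATED ) );
    try
    {
        return $callback();
    }
    finally
    {
        error_reporting( $previous );
    }
}

// With the component and TCPDF loaded you could wrap the rendering, e.g.:
// $output = withQuietErrors( function () use ( $pdf ) { return (string) $pdf; } );
```

Note that an error handler registered with set_error_handler() is called
regardless of this setting and has to check error_reporting() itself.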
The TCPDF driver can be used after including the TCPDF source code, using::

    $pdf = new ezcDocumentPdf();
    $pdf->options->driver = new ezcDocumentPdfTcpdfDriver();

__ http://tcpdf.org

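If your code runs in environments with different PDF libraries, the driver can
be chosen at runtime. This sketch only uses the driver class names documented
above; the selection order (Haru first, as it is the default driver) and the
function name choosePdfDriverClass() are assumptions.

```php
<?php
// Returns the name of a usable driver class, or null if neither library
// is available in the current environment
function choosePdfDriverClass(): ?string
{
    // The haru extension provides the default, native driver
    if ( extension_loaded( 'haru' ) )
    {
        return 'ezcDocumentPdfHaruDriver';
    }

    // TCPDF has to be included manually before it can be used, so do not
    // trigger autoloading here
    if ( class_exists( 'TCPDF', false ) )
    {
        return 'ezcDocumentPdfTcpdfDriver';
    }

    return null;
}

$driverClass = choosePdfDriverClass();
if ( $driverClass === null )
{
    echo "No supported PDF library available\n";
}
// With the component loaded:
// $pdf->options->driver = new $driverClass();
```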
Styling the PDF
```````````````

The PDF output can be styled using a CSS like language, which assigns styles
based on the Docbook XML structure. The default styling rules are defined in
the `default.css`__.

__ https://svn.apache.org/repos/asf/incubator/zetacomponents/trunk/Document/src/pcss/style/default.css

The most relevant part are the general layout options, which can be defined
for the common article root node in the Docbook XML file. You can set global
font options there, like::

    article {
        // Basic font style definitions
        font-size: "12pt";
        font-family: "serif";
        font-weight: "normal";
        font-style: "normal";
        line-height: "1.4";
        text-align: "left";

        // Basic page layout definitions
        text-columns: "1";
        text-column-spacing: "10mm";

        // General text layout options
        orphans: "3";
        widows: "3";
    }

The meaning of the first set of options should be obvious from CSS. We require
each value to be wrapped in quotes for easier parsing, though.

The second set of options defines options for multi-column layouts, which are
uncommon on the web, but quite common in generated PDF documents. You can
specify the number of text columns, as well as the distance between the text
columns here.

The third set in this example defines lesser known text layout options like
the handling of `orphans and widows`__, which specify the minimum number of
lines of a paragraph left alone at the bottom or the top of a page when the
paragraph is split across pages.

You can, of course, apply those styles to any element in your document, using
the common CSS addressing rules, like::

    // Emphasis node anywhere in the document
    emphasis { ... }

    // Title element directly below a section element
    section > title { ... }

    // Title element anywhere below a section element
    section title { ... }

    // Title element with the ID "first_title"
    title#first_title { ... }

    // Title element with the class "foo"
    title.foo { ... }

    // Emphasis node directly below a title with class "foo", anywhere in a
    // section with the ID "first"
    section#first title.foo > emphasis { ... }

The values and `measures`__ for the properties are very similar to the
properties in CSS. For example, the margin and padding properties accept
one- to four-tuples of values, with the same meaning as in CSS.

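As a worked example of the CSS tuple rules: one value applies to all four
sides, two values are vertical and horizontal, three values are top,
horizontal and bottom, and four values go clockwise from the top. The
hypothetical helper below expands a margin or padding value accordingly and
leaves the measures themselves untouched.

```php
<?php
// Expands a CSS style one- to four-tuple into its four sides
function expandBoxValues( string $value ): array
{
    $parts = preg_split( '/\s+/', trim( $value ) );

    switch ( count( $parts ) )
    {
        case 1: // all four sides
            return array( 'top' => $parts[0], 'right' => $parts[0], 'bottom' => $parts[0], 'left' => $parts[0] );
        case 2: // vertical, horizontal
            return array( 'top' => $parts[0], 'right' => $parts[1], 'bottom' => $parts[0], 'left' => $parts[1] );
        case 3: // top, horizontal, bottom
            return array( 'top' => $parts[0], 'right' => $parts[1], 'bottom' => $parts[2], 'left' => $parts[1] );
        case 4: // top, right, bottom, left (clockwise)
            return array( 'top' => $parts[0], 'right' => $parts[1], 'bottom' => $parts[2], 'left' => $parts[3] );
    }

    throw new InvalidArgumentException( "Expected one to four values, got '$value'." );
}

// The page padding used in the next example
print_r( expandBoxValues( '22mm 16mm' ) );
```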
Another central formatting element, which is special to the PDF generation, is
the virtual element "page"::

    page {
        page-size: "A4";
        page-orientation: "portrait";
        padding: "22mm 16mm";
    }

The page-size property accepts several known page size identifiers, and
page-orientation defines the orientation of a page. You can also address any
page directly by its ID, which is "page_1" for the first page, or by its class,
which is either "right" or "left", depending on the current page number.

A detailed description of all available `PDF style options`__ is available
here__.

__ http://en.wikipedia.org/wiki/Widows_and_orphans
__ measures
__ Document_styles.html
__ Document_styles.html

Measures
""""""""

The properties in the PDF component accept different measures, which are:

- "mm", Millimeters, the default measure, if none is specified
- "pt", Points, 72 points per inch
- "px", Pixels, depends on the set resolution, by default also 72 pixels per
  inch
- "in", Inches

The unit "Points" is most common for font sizes, while millimeters or inches
will probably be more useful for page paddings. You are free to choose any of
them and can even combine different units in one tuple, like::

    para {
        // Top margin: 12 mm; Right margin: .1 inch; Bottom margin: 10 points;
        // Left margin: 1 pixel
        margin: "12 .1in 10pt 1px";
    }

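To compare the units, it helps to convert everything to points: an inch has 72
points and 25.4 millimeters, and at the default resolution one pixel equals
one point. The following sketch applies this to the margin tuple above;
toPoints() is a hypothetical helper and not the parser used by the component.

```php
<?php
// Converts a single measure to points; millimeters are assumed if no unit
// is given, matching the default described above
function toPoints( string $value, float $pixelsPerInch = 72.0 ): float
{
    if ( !preg_match( '/^\s*(\d*\.?\d+)\s*(mm|pt|px|in)?\s*$/', $value, $match ) )
    {
        throw new InvalidArgumentException( "Unknown measure '$value'." );
    }

    $number = (float) $match[1];
    $unit   = ( isset( $match[2] ) && $match[2] !== '' ) ? $match[2] : 'mm';

    switch ( $unit )
    {
        case 'in': return $number * 72.0;
        case 'pt': return $number;
        case 'px': return $number * 72.0 / $pixelsPerInch;
        default:   return $number * 72.0 / 25.4; // mm
    }
}

// The margin tuple from the example above
foreach ( preg_split( '/\s+/', '12 .1in 10pt 1px' ) as $part )
{
    printf( "%-5s = %.2f pt\n", $part, toPoints( $part ) );
}
```

The 12 mm top margin thus corresponds to roughly 34 points, while the 1 pixel
left margin is a single point at the default resolution.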
PDF parts
`````````

PDF parts are additional parts in a rendered document, like headers and
footers. You can implement and register them yourself, and they are activated
by different triggers, like:

- on document creation
- on page creation
- when a document has been finished

The default implementation for headers and footers is triggered on page
creation and renders the title of the document, its author and a page number
in the header or the footer. To develop a custom PDF part you should extend the
ezcDocumentPdfPart class.

For the following document we are using a set of custom styles, as well as a
header and a footer, to customize the rendered PDF document. The additional
custom CSS changes the default font and the page border:

.. include:: tutorial/custom.css
   :literal:

The code using the custom CSS and headers and footers then looks like:

.. include:: tutorial/04_02_create_pdf_styled.php
   :literal:

The first part, the creation of a Docbook document from an RST document, is
the same as in the first example.

Afterwards we load the above mentioned custom.css as an additional style. You
can load as many styles as you want. If multiple styles are loaded, the later
ones always override (parts of) the earlier ones.

After that two custom PDF parts are registered, using their respective option
classes to configure their appearance. The footer should only show the page
number, while the header should display all parts (title and author) except
the page number.

At the end of the example the document is created as usual, and looks like
this: `04_02_create_pdf_styled.pdf`__. Since the source document does not
include any author information, this information is also not rendered in the
header.

__ 04_02_create_pdf_styled.pdf

Hyphenating
```````````

Proper hyphenation is crucial for nice text rendering, especially for justified
paragraph formatting. Since hyphenation is highly language dependent, you can
create and use your own custom hyphenator. The default one does not hyphenate
at all, but keeps every word as it is.

Custom hyphenators can be implemented by extending the abstract class
ezcDocumentPdfHyphenator. You only need to implement one method,
``splitWord()``, which should return the possible splitting points of the given
word, as documented in the ezcDocumentPdfHyphenator class.

The custom hyphenator can be configured in the ezcDocumentPdfOptions class,
like this::

    $pdf = new ezcDocumentPdf();
    $pdf->options->hyphenator = new myHyphenator();

The hyphenator will then be used by all text renderers during the rendering
process.

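A simple strategy for splitWord() is a lookup in a dictionary of pre-hyphenated
words. The class below is standalone and hypothetical, and its return format,
a list of pairs of the head including the hyphen and the remaining tail, is an
assumption; check the ezcDocumentPdfHyphenator documentation for the format the
text renderers actually expect.

```php
<?php
// Hypothetical dictionary based splitting strategy
class DictionaryHyphenator
{
    // Maps lowercase words to their hyphenated form, e.g. "hy-phen-ation"
    private $dictionary;

    public function __construct( array $dictionary )
    {
        $this->dictionary = $dictionary;
    }

    public function splitWord( string $word ): array
    {
        $key = strtolower( $word );
        if ( !isset( $this->dictionary[$key] ) )
        {
            // Unknown words are kept as they are, like the default hyphenator
            return array();
        }

        // Translate the hyphen positions of the dictionary entry into byte
        // offsets in the original word (ASCII words assumed)
        $syllables = explode( '-', $this->dictionary[$key] );
        array_pop( $syllables ); // no split after the last syllable

        $splits = array();
        $offset = 0;
        foreach ( $syllables as $syllable )
        {
            $offset  += strlen( $syllable );
            $splits[] = array( substr( $word, 0, $offset ) . '-', substr( $word, $offset ) );
        }

        // Longest head first, so a renderer can try the best fit first
        return array_reverse( $splits );
    }
}

$hyphenator = new DictionaryHyphenator( array( 'hyphenation' => 'hy-phen-ation' ) );
print_r( $hyphenator->splitWord( 'Hyphenation' ) );
```

A real hyphenator would extend ezcDocumentPdfHyphenator and could use a pattern
based algorithm instead of a fixed word list.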
Open Document Text
------------------

The Open Document Text (ODT) format is the native format of the
`OpenOffice.org`__ office application suite and is supported by other common
word processing tools. The Document component supports importing, exporting
and styling of ODT files.

__ http://www.openoffice.org/

.. note:: For now only import and export of flat ODT (.fodt) files is possible.
   These can be processed by OpenOffice.org natively. To store FODT, simply
   choose the file type in the save dialog.

Reading ODT
^^^^^^^^^^^

The ODT document class reads FODT files and converts them into the internal
Docbook representation of the Document component:

.. include:: tutorial/05_00_read_fodt.php
   :literal:

You can generate any of the supported document formats from the Docbook
representation.

FODT files may contain embedded media files, usually images, which will be
extracted during the import process. You can specify the directory where these
images will be stored through the ``imageDir`` option::

    <?php
    $odt->options->imageDir = '/path/to/your/images';
    ?>

The default is your system's temporary directory.

Since Open Document contains little semantic information compared to Docbook,
the import mechanism performs a heuristic detection of information like
emphasized text. This mechanism is still quite rudimentary and will be made
available as a public API once it has matured.

Writing ODT
^^^^^^^^^^^

FODT files can be written in the same way as any of the other formats
supported by the Document component:

.. include:: tutorial/05_01_write_fodt.php
   :literal:

Styling ODT
^^^^^^^^^^^

FODT output can be styled using a CSS like language similar to `Styling the
PDF`_. Using simplified CSS you assign style rules to Docbook XML elements,
which are converted into automatic styles in the resulting Open Document. The
default styling rules (`default.css`__) are the same as for PDF.

__ https://svn.apache.org/repos/asf/incubator/zetacomponents/trunk/Document/src/pcss/style/default.css

Applying custom styles can be done as follows:

.. include:: tutorial/05_02_write_fodt_styled.php
   :literal:

A detailed description of the available `style options`__ is available
`here`__.

__ Document_styles.html
__ Document_styles.html

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79