gnu-social/vendor/zetacomponents/document/design/pdf_design.txt
2021-07-16 19:44:40 +01:00

==================================
Design document for PDF generation
==================================
:Author: kn

This is the design document for PDF generation in the eZ Components document
component. PDF documents should be created from Docbook documents, which can
be generated from each markup format available in the document component.

The requirements this design should satisfy are specified in the
pdf_requirements.txt document.

Layout directives
=================
The PDF document will be created from the Docbook document object. It will
already contain basic formatting rules for a meaningful default document
layout. Additional rules may be passed to the document in various ways.

Generally, a CSS-like approach will be used to encode layout information. This
allows both easily readable addressing of nodes in an XML tree, as known from
CSS, and human-readable formatting options.

A limited subset of CSS will be used for now for addressing elements inside the
Docbook XML tree. The grammar for those rules will be::

    Address     ::= Element ( Rule )*
    Rule        ::= '>'? Element
    Element     ::= ElementName ( '.' ClassName | '#' ElementId )?

    ClassName   ::= [A-Za-z_-]+
    ElementName ::= XMLName* | '*'
    ElementId   ::= XMLName

    * XMLName refers to http://www.w3.org/TR/REC-xml/#NT-Name

The semantics of this simple subset of addressing directives are the same as in
CSS. A second-level title, for example, could then be addressed by::

    section title

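The addressing rules above could be parsed and matched roughly as follows. This is a minimal sketch for PHP 8, not the component's implementation: the function names and the representation of a node as the list of its ancestors are assumptions, and element names containing ``.`` are not handled.

```php
<?php
// Hypothetical helpers for illustration; not part of the eZ Components API.

// Parse "article > section title" into steps. 'direct' is true when the
// step was preceded by '>' and must match a direct child.
function parseAddress(string $address): array
{
    $tokens = preg_split('/\s+/', trim(str_replace('>', ' > ', $address)));
    $steps = [];
    $direct = false;
    foreach ($tokens as $token) {
        if ($token === '>') {
            $direct = true;
            continue;
        }
        if (!preg_match('/^([^.#\s]+)(?:\.([A-Za-z_-]+)|#(.+))?$/', $token, $m)) {
            throw new InvalidArgumentException("Invalid address part: '$token'");
        }
        $steps[] = [
            'name'   => $m[1],
            'class'  => $m[2] ?? '', // empty string means "not restricted"
            'id'     => $m[3] ?? '',
            'direct' => $direct,
        ];
        $direct = false;
    }
    return $steps;
}

function stepMatches(array $step, array $element): bool
{
    return ($step['name'] === '*' || $step['name'] === $element['name'])
        && ($step['class'] === '' || $step['class'] === ($element['class'] ?? ''))
        && ($step['id'] === '' || $step['id'] === ($element['id'] ?? ''));
}

// Match step $s against path element $p, then the remaining steps against
// the ancestors. Backtracks over ancestors for descendant steps.
function matchFrom(array $steps, int $s, array $path, int $p): bool
{
    if ($p < 0 || !stepMatches($steps[$s], $path[$p])) {
        return false;
    }
    if ($s === 0) {
        return true;
    }
    if ($steps[$s]['direct']) {
        return matchFrom($steps, $s - 1, $path, $p - 1);
    }
    for ($ancestor = $p - 1; $ancestor >= 0; --$ancestor) {
        if (matchFrom($steps, $s - 1, $path, $ancestor)) {
            return true;
        }
    }
    return false;
}

// $path lists the elements from the document root down to the node itself.
function addressMatches(string $address, array $path): bool
{
    $steps = parseAddress($address);
    return matchFrom($steps, count($steps) - 1, $path, count($path) - 1);
}
```

With this sketch, ``section title`` matches a title anywhere below a section, while ``article > title`` only matches a title whose direct parent is the article.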
The formatting options are also mostly the same as in CSS, but again use only
a subset of the definitions available in CSS, plus some additional formatting
options that are especially relevant for PDF rendering. The supported
formatting options depend on the renderer; unknown formatting options may
trigger errors or warnings.

The PDF document wrapper class will implement Iterator and ArrayAccess to
access the layout directives, as the following example shows::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );
    $pdf->styles['article > section title']['font-size'] = '1.6em';

Directives which are specified later will always overwrite earlier directives,
for each formatting option specified in the later directive. Unlike in CSS,
the overwriting of formatting options will NOT depend on the complexity of
the node addressing.

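The overwrite rule can be illustrated with a small container class. This is a sketch for PHP 8 under stated assumptions: the class name ``myPdfStyles`` and the ``resolve()`` method are hypothetical, and IteratorAggregate is used instead of Iterator to keep the example short.

```php
<?php
// Hypothetical sketch; not the proposed ezcDocumentPdfStyles class itself.
class myPdfStyles implements ArrayAccess, IteratorAggregate
{
    /** @var ArrayObject[] directives keyed by address, in definition order */
    private array $directives = [];

    public function offsetExists(mixed $address): bool
    {
        return isset($this->directives[$address]);
    }

    public function offsetGet(mixed $address): mixed
    {
        // Returning an object (created on first access) makes nested writes
        // like $styles['section title']['font-size'] = '1.6em' work.
        return $this->directives[$address] ??= new ArrayObject();
    }

    public function offsetSet(mixed $address, mixed $options): void
    {
        if ($address === null) {
            throw new InvalidArgumentException('An address is required.');
        }
        $this->directives[$address] = new ArrayObject((array) $options);
    }

    public function offsetUnset(mixed $address): void
    {
        unset($this->directives[$address]);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->directives);
    }

    /**
     * Collect the formatting for one node. $matches decides whether an
     * address applies to that node.
     */
    public function resolve(callable $matches): array
    {
        $formatting = [];
        foreach ($this->directives as $address => $options) {
            if ($matches($address)) {
                // Later directives win per option, no matter how specific
                // their address is.
                $formatting = array_replace($formatting, $options->getArrayCopy());
            }
        }
        return $formatting;
    }
}
```

Here a later, less specific directive such as ``title`` still overrides the ``font-size`` of an earlier ``article > section title`` directive, while options it does not mention are kept.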
Importing and exporting layout directives
-----------------------------------------
The layout directives can be exported and imported to and from files, so that
users of the component may store a custom PDF layout. The storage format will
again look very much like a simplified variant of CSS::

    File       ::= Directive+
    Directive  ::= Address '{' Formatting* '}'
    Formatting ::= Name ':' '"' Value '"' ';'
    Name       ::= [A-Za-z-]+
    Value      ::= [^"]+

C-style comments are allowed anywhere in the definition file, like
``/* ... */`` and ``// ...``.

Importing and exporting styles may be accomplished by::

    $pdf->styles->load( 'styles.pcss' );

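Reading the storage format described above could look roughly like this. It is a regex-based sketch, not the component's parser: the function name ``parseStyleFile`` is an assumption, and malformed input is silently skipped rather than reported.

```php
<?php
// Hypothetical function for illustration; not part of the eZ Components API.
function parseStyleFile(string $source): array
{
    // Remove /* ... */ and // ... comments, but leave quoted values alone.
    $source = preg_replace_callback(
        '~"[^"]*"|/\*.*?\*/|//[^\n]*~s',
        fn (array $m): string => $m[0][0] === '"' ? $m[0] : ' ',
        $source
    );

    // Directive ::= Address '{' Formatting* '}'
    preg_match_all(
        '~([^{}]+)\{((?:[^{}"]|"[^"]*")*)\}~',
        $source,
        $blocks,
        PREG_SET_ORDER
    );

    $directives = [];
    foreach ($blocks as [, $address, $body]) {
        $address = trim(preg_replace('/\s+/', ' ', $address));
        $directives[$address] ??= [];

        // Formatting ::= Name ':' '"' Value '"' ';'
        preg_match_all(
            '~([A-Za-z-]+)\s*:\s*"([^"]+)"\s*;~',
            $body,
            $options,
            PREG_SET_ORDER
        );
        foreach ($options as [, $name, $value]) {
            // A later option with the same name overwrites the earlier one.
            $directives[$address][$name] = $value;
        }
    }
    return $directives;
}
```

The result maps each address to its formatting options, for example ``array( 'article' => array( 'page-size' => 'A4' ) )``.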
List of formatting options
--------------------------
Some formatting options will be processed exactly as they are defined in CSS;
others are custom options. The options reused from CSS are:
- background-color
- background-image
- background-position
- background-repeat
- border-color
- border-width
- border-bottom-color
- border-bottom-width
- border-left-color
- border-left-width
- border-right-color
- border-right-width
- border-top-color
- border-top-width
- color
- direction
- font-family
- font-size
- font-style
- font-variant
- font-weight
- line-height
- list-style
- list-style-position
- list-style-type
- margin
- margin-bottom
- margin-left
- margin-right
- margin-top
- orphans
- padding
- padding-bottom
- padding-left
- padding-right
- padding-top
- page-break-after
- page-break-before
- text-align
- text-decoration
- text-indent
- white-space
- widows
- word-spacing
Custom properties are:

text-columns
    Number of text columns in one section.
text-column-spacing
    The margin between multiple text columns on one page.
page-size
    Size of pages.
page-orientation
    Orientation of pages.

Not all options can be applied to each element. The renderer might complain
about invalid options, depending on the configured error level.

Special layout elements
=======================
Footers & Headers
-----------------
Footers and headers are special layout elements, which can be rendered
manually by the user of the component. They can be considered small
sub-documents, but their renderer receives additional information about the
page they are currently rendered on.

They can be set like::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );
    $pdf->footer = new myDocumentPdfPart();

Each of those parts can render itself and calculate its appropriate bounding
box. There might be extensions of the basic ezcDocumentPdfPart class, which
render small Docbook sub-documents into a header or footer, or just take a
string and replace placeholders with page-dependent contents.

Possible implementations would be:

ezcDocumentPdfDocbookPart
    Receives a Docbook document and renders it, using a defined style, in the
    header or footer of the current page. Placeholders in the text,
    represented by entities, for example, might be replaced.
ezcDocumentPdfStringPart
    Receives a simple string, in which simple placeholders are replaced.

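The placeholder replacement of a string part might work as sketched below (PHP 8). The class ``myPdfStringPart`` is a hypothetical stand-in for the proposed ezcDocumentPdfStringPart; the ``render()`` signature and the ``%name`` placeholder syntax are assumptions.

```php
<?php
// Hypothetical stand-in; constructor and render() signature are assumptions.
class myPdfStringPart
{
    public function __construct(private string $template)
    {
    }

    /**
     * @param array $context page-dependent values, keyed by placeholder
     *                       name without the leading '%'
     */
    public function render(array $context): string
    {
        $replacements = [];
        foreach ($context as $name => $value) {
            $replacements['%' . $name] = (string) $value;
        }
        // strtr() tries the longest key first, so a longer placeholder is
        // replaced as a whole even if a shorter one is its prefix.
        return strtr($this->template, $replacements);
    }
}
```

A real part would additionally position the resulting text on the page and report its bounding box; the sketch only covers the text substitution.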
Other elements
--------------
There are various possible full-page elements, which might be rendered before
or after the actual contents. Those are, for example:

- Cover page
- Bibliography
- Back page

To add those to one PDF document, you can create a PDF set, which is then
rendered into one file::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );

    $set = new ezcDocumentPdfSet();
    $set->parts = array(
        new ezcDocumentPdfPdfPart( 'title.pdf' ),
        $customTableOfContents,
        $pdf,
        $bibliography,
    );
    $set->render( 'my.pdf' );

Some of the documents aggregated in one set can of course again be documents
created from Docbook documents. Each element in the set may contain custom
layout directives.

For the inclusion of other document parts into a PdfSet, you are expected to
extend the PDF base class and implement your custom functionality there.
This could mean generating indexes or a bibliography from the content.

Drivers
=======
The actual PDF renderer calls methods on the driver, which abstract the quirks
of the respective implementations. There will be drivers for at least:
- pecl/libharu
- TCPDF
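
The separation between renderer and driver could be sketched as follows. Everything here is hypothetical: the interface, its method names and the recording backend are assumptions for illustration and do not reflect the final driver API or any call of pecl/libharu or TCPDF.

```php
<?php
// Hypothetical interface; method names are assumptions for illustration.
interface myPdfDriver
{
    public function createPage(float $width, float $height): void;
    public function drawWord(float $x, float $y, string $word): void;
}

// Stand-in backend that only records calls. A real driver would forward
// each call to its PDF library instead.
class myRecordingDriver implements myPdfDriver
{
    public array $calls = [];

    public function createPage(float $width, float $height): void
    {
        $this->calls[] = ['page', $width, $height];
    }

    public function drawWord(float $x, float $y, string $word): void
    {
        $this->calls[] = ['word', $x, $y, $word];
    }
}

// The rendering code depends on the interface only, so backends are
// interchangeable.
function renderWords(myPdfDriver $driver, array $words, float $wordSpacing): void
{
    $driver->createPage(210, 297);
    $x = 0.0;
    foreach ($words as $word) {
        $driver->drawWord($x, 10, $word);
        // Assumed fixed advance of 2 units per character; real code would
        // ask the driver for the rendered width of the word.
        $x += strlen($word) * 2 + $wordSpacing;
    }
}
```

A recording backend like this is also convenient for testing renderers without producing actual PDF files.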
Renderer
========
The renderer will be responsible for the actual typesetting. It will receive a
Docbook document, apply the given layout directives and calculate the
appropriate calls to the driver from those.
The renderer optionally receives a set of helper objects, which perform
relevant parts of the typesetting, like:

Hyphenator
    Class implementing hyphenation for a specific language. We might provide a
    default implementation, which reads standard hyphenation files.

The renderer state will be shared using an object representing the page
currently being processed, which contains information about the already
covered areas and therefore the space still available.

Using such a state object, the state can easily be shared between different
renderers for different aspects of the rendering process. This should allow us
to write simpler rendering classes, which should be more maintainable than
one big renderer class whose methods would take care of all aspects.

Because the already covered space is encoded in it, this page state object,
which knows about the free space on the current page, makes it possible, for
example, to float text around images spanning multiple paragraphs. All
renderers for the different aspects can reuse this knowledge and base their
rendering on it. The space already covered on a page will most probably be
represented by a list of bounding boxes.

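A page state object based on a list of bounding boxes might look like this sketch (PHP 8). The class names ``myPdfBox`` and ``myPdfPage``, the method names and the top-left coordinate origin are assumptions, not the design of ezcDocumentPdfPage.

```php
<?php
// Hypothetical sketch of a page state object; names and units are assumptions.
class myPdfBox
{
    public function __construct(
        public float $x,
        public float $y,
        public float $width,
        public float $height
    ) {
    }

    public function intersects(myPdfBox $other): bool
    {
        return $this->x < $other->x + $other->width
            && $other->x < $this->x + $this->width
            && $this->y < $other->y + $other->height
            && $other->y < $this->y + $this->height;
    }
}

class myPdfPage
{
    /** @var myPdfBox[] areas already covered by rendered content */
    private array $covered = [];

    public function __construct(public float $width, public float $height)
    {
    }

    public function cover(myPdfBox $box): void
    {
        $this->covered[] = $box;
    }

    // True if the box lies on the page and touches no covered area.
    public function isFree(myPdfBox $box): bool
    {
        if ($box->x < 0 || $box->y < 0
            || $box->x + $box->width > $this->width
            || $box->y + $box->height > $this->height) {
            return false;
        }
        foreach ($this->covered as $coveredBox) {
            if ($coveredBox->intersects($box)) {
                return false;
            }
        }
        return true;
    }

    // Left-most x at which a line of the given height can start at $y,
    // skipping covered boxes that begin at or before the current position.
    public function lineStart(float $y, float $lineHeight): float
    {
        $x = 0.0;
        do {
            $moved = false;
            foreach ($this->covered as $box) {
                $inBand = $box->y < $y + $lineHeight && $y < $box->y + $box->height;
                if ($inBand && $box->x <= $x && $x < $box->x + $box->width) {
                    $x = $box->x + $box->width;
                    $moved = true;
                }
            }
        } while ($moved);
        return $x;
    }
}
```

A paragraph renderer could call ``lineStart()`` for each line, so that lines next to a left-aligned image start at the right edge of the image and return to the margin below it.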
Which renderer classes can be separated will become clear during
implementation, but they could be, for example:

ezcDocumentPdfParagraphRenderer
    Takes care of rendering the Docbook inline markup inside one paragraph.
    Respects orphans and widows and might be required to split paragraphs.
ezcDocumentPdfTableRenderer
    Renders tables. It might be useful to split this up even further into
    table row and cell renderers.

Additional renderer features
----------------------------
If the driver class in use implements the respective interfaces, the renderer
will also offer to sign PDF documents or to add write protection (or similar)
to the PDF document.
Example
=======
A full example for the creation of a PDF document from an HTML page could look
like::

    <?php

    $html = new ezcDocumentXhtml();
    $html->loadFile( 'http://ezcomponents.org/introduction' );

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $html->getAsDocbook() );

    // Load some custom layout directives
    $pdf->styles->load( 'my_styles.pcss' );
    $pdf->styles['article']['text-columns'] = 3;

    // Set a custom header
    $pdf->header = new ezcDocumentPdfStringPart(
        '%title by %author - %pageNum / %pageCount'
    );

    // Set a custom paragraph renderer
    $pdf->renderer->paragraph = new myPdfParagraphRenderer();

    // Use the hyphenator with a German dictionary
    $pdf->renderer->hyphenator = new myDictionaryHyphenator(
        '/path/to/german.dict'
    );

    // Store the generated PDF
    file_put_contents( 'my.pdf', $pdf );
    ?>

A file containing the layout directives could look like::

    article {
        page-size: "A4";
    }

    paragraph {
        font-family: "Bitstream Vera Sans";
        font-size: "1em";
    }

    article > title {
        font-weight: "bold";
    }

    section title {
        font-weight: "normal";
    }

Classes
=======
The classes implemented for the PDF generation are:

ezcDocumentPdf
    Base class, representing the PDF generation. Aggregates the style
    information, the Docbook source document, the renderer and page parts
    like footer and header.
ezcDocumentPdfSet
    Class aggregating multiple ezcDocumentPdf objects, to create one single
    PDF document from multiple parts, like a cover page, the actual content, a
    bibliography, etc.
ezcDocumentPdfStyles
    Class containing the PDF layout directives. Also implements loading and
    storing of those layout directives.
ezcDocumentPdfPart
    Abstract base class for page parts, like headers and footers. Renders the
    respective part and will be extended by multiple concrete
    implementations, which offer convenient rendering methods.
ezcDocumentPdfRenderer
    Basic renderer class, which aggregates renderers for distinct page
    elements, like paragraphs and tables, and dispatches the rendering to
    them. Also maintains the ezcDocumentPdfPage state object, which contains
    information about the already covered parts of the pages.
ezcDocumentPdfParagraphRenderer
    Example of a concrete, aspect-specific renderer class. Such classes only
    implement the rendering of small parts of a document, like single
    paragraphs, tables, or table cell contents.
ezcDocumentPdfPage
    State object describing the current state of a single page in the PDF
    document, like the space still available.
ezcDocumentPdfHyphenator
    Abstract base class for hyphenation implementations for more accurate word
    wrapping.