===============================
Requirements for PDF generation
===============================

Generation of PDF documents should be added to the document component. Those
PDF documents should be created from Docbook documents, which can be generated
from each markup available in the document component.

This document summarizes the requirements for PDF generation.

Central requirements
====================

The key requirements for the component are described in the following
sections.

Layout
------

The central requirement is to generate user-styled PDF documents from Docbook
markup. The customized styling should include, but is not limited to:

- Text

  - Fonts and text sizes
  - Line heights
  - Colors
  - Alignments

- Pages

  - Footers and headers
  - Background images
  - Multiple text columns per page
  - Page sizes
  - Margins and paddings

- Block level elements (tables, graphics, literal blocks, ...)

  - Borders, backgrounds, fonts, colors

It must be possible to assign different styles depending on the parent
elements in the Docbook markup, so that the titles in the following Docbook
document can be formatted differently::

    <?xml version="1.0"?>
    <article>
        <title>Article title</title>
        <section>
            <title>First heading</title>
            <section>
                <title>Second heading</title>
            </section>
        </section>
    </article>
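
One possible way to resolve parent-dependent styles is to match each element's
ancestor path against a set of rules and let the most specific rule win. The
following is only a sketch under that assumption: the ``RULES`` table, the
tuple-based selector format and the function names are hypothetical and not
part of the document component.

```python
import xml.etree.ElementTree as ET

DOCBOOK = """<?xml version="1.0"?>
<article>
  <title>Article title</title>
  <section>
    <title>First heading</title>
    <section>
      <title>Second heading</title>
    </section>
  </section>
</article>"""

# Hypothetical style rules: a tuple of element names ending in the styled
# element itself, mapped to formatting properties.
RULES = {
    ("title",): {"font-size": "12pt"},
    ("article", "title"): {"font-size": "24pt"},
    ("section", "title"): {"font-size": "18pt"},
    ("section", "section", "title"): {"font-size": "14pt"},
}


def style_for(path):
    """Return the properties of the longest rule that is a suffix of path."""
    best, best_length = {}, 0
    for selector, properties in RULES.items():
        length = len(selector)
        if length > best_length and tuple(path[-length:]) == selector:
            best, best_length = properties, length
    return best


def collect_title_styles(element, path=()):
    """Walk the tree and yield (title text, resolved style) pairs."""
    path = path + (element.tag,)
    if element.tag == "title":
        yield element.text, style_for(path)
    for child in element:
        yield from collect_title_styles(child, path)


styles = dict(collect_title_styles(ET.fromstring(DOCBOOK)))
```

With these rules each of the three titles resolves to a different font size,
because the longest matching path suffix is preferred over shorter ones.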

It would be nice if the styles could be imported from and exported to an
easily readable and writable format.

Text formatting
---------------

Proper formatting of text is most probably the biggest problem in the
implementation. The requirements include:

Hyphenation
^^^^^^^^^^^

Justified text in narrow text columns especially requires hyphenation of
words, otherwise the blanks between characters and words might increase too
much. A pluggable hyphenation mechanism is required, which can be adapted to
different languages, based on externally available dictionaries.
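
A pluggable mechanism could look roughly like the following sketch. The
``Hyphenator`` interface, the ``DictionaryHyphenator`` class and the dictionary
format (a word mapped to its hyphenated spelling) are assumptions made for
illustration only. Widths are counted in characters here, where a real
renderer would use measured glyph widths.

```python
from abc import ABC, abstractmethod


class Hyphenator(ABC):
    """Hypothetical plug-in interface: one implementation per language."""

    @abstractmethod
    def split_points(self, word):
        """Return the character offsets at which the word may be broken."""


class DictionaryHyphenator(Hyphenator):
    """Looks words up in an externally supplied dictionary.

    The dictionary maps a lower-case word to its hyphenated form, for
    example {"hyphenation": "hy-phen-ation"}. This format is an assumption.
    """

    def __init__(self, dictionary):
        self.dictionary = dictionary

    def split_points(self, word):
        pattern = self.dictionary.get(word.lower())
        if pattern is None:
            return []  # unknown words are never broken
        points, offset = [], 0
        for syllable in pattern.split("-")[:-1]:
            offset += len(syllable)
            points.append(offset)
        return points


def break_word(word, max_chars, hyphenator):
    """Split word so that the head plus a hyphen fits into max_chars.

    Returns (head, tail); head is empty if no break point fits.
    """
    fitting = [p for p in hyphenator.split_points(word) if p + 1 <= max_chars]
    if not fitting:
        return "", word
    point = max(fitting)
    return word[:point] + "-", word[point:]


english = DictionaryHyphenator({"hyphenation": "hy-phen-ation"})
```

A hyphenator for another language would only have to implement
``split_points()``, so the text wrapping code itself stays unchanged.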

Widows and orphans
^^^^^^^^^^^^^^^^^^

See: http://en.wikipedia.org/wiki/Widows_and_orphans

There should be ways to configure the thresholds under which the lines of a
paragraph separated by a page break are considered widows or orphans, which
should be avoided.
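
How such thresholds could drive the page break decision is sketched below.
The function name and both parameters are hypothetical. Following common
usage, orphans are taken to be the first lines of a paragraph left at the
bottom of a page and widows the last lines carried over to the next page.

```python
def lines_on_current_page(total_lines, lines_available, orphans=2, widows=2):
    """Return how many lines of a paragraph to place on the current page.

    orphans: minimum number of lines that must stay at the bottom of the page.
    widows:  minimum number of lines that must move on to the next page.
    A result of 0 means the whole paragraph starts on the next page.
    """
    if total_lines <= lines_available:
        return total_lines                 # fits completely, no break needed
    # Keep at least `widows` lines for the next page.
    fitting = min(lines_available, total_lines - widows)
    if fitting < orphans:
        return 0                           # too few lines would stay behind
    return fitting
```

For a paragraph of 10 lines with 9 lines left on the page, this sketch places
8 lines on the current page, so that 2 lines instead of a single widow move to
the next page.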

Inline formatting
^^^^^^^^^^^^^^^^^

Depending on the font and styles used, inline formatting might have a serious
effect on the text width. This MUST be respected during text rendering.
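
The effect can be illustrated by measuring a line as a sequence of styled
runs instead of as one string. The sketch below assumes a hypothetical table
of average glyph widths per style, given in 1/1000 of the font size. Real
fonts provide per-glyph metrics, so the numbers are purely illustrative.

```python
# Hypothetical average glyph widths per inline style, in 1/1000 of the
# font size. Real fonts provide one width per glyph.
GLYPH_WIDTH = {"normal": 500, "bold": 560, "monospace": 600}


def line_units(runs):
    """Sum the width of (text, style) runs in 1/1000 of the font size."""
    return sum(len(text) * GLYPH_WIDTH[style] for text, style in runs)


def line_width(runs, font_size):
    """Return the width of the runs in the unit of font_size."""
    return line_units(runs) * font_size / 1000


plain = [("A bold word", "normal")]
styled = [("A ", "normal"), ("bold", "bold"), (" word", "normal")]
```

Both lines contain the same characters, but the styled variant is wider, so a
wrapping algorithm which measures only the plain text would overflow the
column.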

LTR and RTL languages
^^^^^^^^^^^^^^^^^^^^^

The text wrapping must be able to work with left-to-right and right-to-left
languages.

Floating media objects
^^^^^^^^^^^^^^^^^^^^^^

For media objects which do not span the whole column width, it should be
possible to float text around the media objects. Detection of the actual image
borders is not required - the rectangular frame around the image should be
sufficient for text floating.
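
Floating around a rectangular frame can be reduced to computing the free
horizontal span for each text line. The following sketch assumes that every
floated object is attached to the left or right edge of the column. The
``Box`` type and the ``line_span()`` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Rectangular frame of a floated media object, in column coordinates."""
    x: float
    y: float
    width: float
    height: float


def line_span(line_top, line_height, column_width, floats):
    """Return (left, right) of the space left for text on one line.

    Assumes every float is attached to the left or right column edge.
    """
    left, right = 0.0, column_width
    for box in floats:
        overlaps = (box.y < line_top + line_height
                    and line_top < box.y + box.height)
        if not overlaps:
            continue
        if box.x <= 0:                     # float sits at the left edge
            left = max(left, box.x + box.width)
        else:                              # float sits at the right edge
            right = min(right, box.x)
    return left, right


image = Box(x=0, y=20, width=80, height=60)
```

Lines above or below the image keep the full column width, while every line
which vertically overlaps the frame starts to the right of it.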

Embedding of media
------------------

There are a lot of different media types which might be embedded into a PDF.
The most common formats seem to be JPEG and EPS. JPEG is not suitable for
several types of graphics [1]_, and EPS can only be used properly for some
types of vector based images. Conversion options and supported formats must be
evaluated.

Which formats are supported might depend on the driver used.

PDI allows embedding of other PDFs inside the created PDF - this can be useful
when merging different generated documents.

.. [1] http://kore-nordmann.de/blog/image_formats.html

Metadata
--------

The document component already preserves metadata associated with documents.
PDF supports embedding additional document metadata. The preserved metadata
should definitely be embedded, but it might also be useful to offer an easily
accessible API for embedding of additional metadata. XMP is especially
designed to embed metadata using RDF.
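
To give an impression of what XMP metadata looks like, the following sketch
builds a minimal RDF description with a Dublin Core title and author list.
The ``build_xmp()`` function is a hypothetical helper, and the ``xpacket``
wrapper which normally surrounds an embedded XMP packet is left out.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the qualified ElementTree name for prefix:name."""
    return "{%s}%s" % (NS[prefix], name)


def build_xmp(title, authors):
    """Serialize title and authors as an XMP metadata tree (no xpacket)."""
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    description = ET.SubElement(
        rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative with an "x-default" entry.
    alt = ET.SubElement(
        ET.SubElement(description, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list of names.
    seq = ET.SubElement(
        ET.SubElement(description, q("dc", "creator")), q("rdf", "Seq"))
    for author in authors:
        ET.SubElement(seq, q("rdf", "li")).text = author

    return ET.tostring(root, encoding="unicode")


xmp = build_xmp("Requirements for PDF generation", ["Jane Doe"])
```

An easily accessible API could accept such plain values and hide the RDF
structure from the user completely.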

Autogenerated contents
----------------------

Headers and footers often contain some fixed texts, but might also contain
autogenerated contents, like:

- Current page / number of pages / page orientation (left, right)
- Current section title
- Author, read from document metadata

It must be possible to define callbacks which generate those contents for the
page they are currently rendered on. The best possible markup used for
generation of those contents needs to be evaluated.
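
Such callbacks might receive a small context object describing the page which
is currently rendered. In the following sketch the ``PageContext`` fields, the
callback signature and the rule that odd pages are right-hand pages are all
assumptions. The callbacks return plain strings, since the markup question is
still open.

```python
from dataclasses import dataclass


@dataclass
class PageContext:
    """Hypothetical data handed to a callback for the page being rendered."""
    number: int          # 1-based page number
    total: int           # number of pages in the document
    section_title: str   # title of the current section
    author: str          # read from the document metadata

    @property
    def side(self):
        # Assumption: odd pages are right-hand pages.
        return "right" if self.number % 2 else "left"


def footer(page):
    return "Page %d of %d" % (page.number, page.total)


def header(page):
    # Section title on right-hand pages, author on left-hand pages.
    return page.section_title if page.side == "right" else page.author


def render_page_frame(page, callbacks):
    """Call every registered callback for the page currently rendered."""
    return {area: callback(page) for area, callback in callbacks.items()}


callbacks = {"header": header, "footer": footer}
frame = render_page_frame(PageContext(3, 10, "Layout", "Jane Doe"), callbacks)
```

Because the callbacks are called once per page, the total number of pages
must already be known at that point, which suggests rendering headers and
footers after the page breaks have been calculated.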

Several elements can require automatic generation. Those are at least:

- Header / Footer
- Cover page
- Table of contents
- Back page

For most of those elements a predefined generator can be implemented which
creates meaningful default contents, and then can be extended by the user.
Especially for cover and back pages it might be useful to include them
directly from other PDF documents.

Driver infrastructure
---------------------

There are multiple ways to generate PDF documents, like:

- pecl/libharu
- FPDF
- TCPDF
- pdflib
- Zend_PDF

It might depend on the environment which one of those libraries is available
and performs the best. A driver infrastructure should offer the user the
choice of selecting the best output driver for writing the actual PDF. Not all
of those drivers support proper text wrapping themselves, so this task cannot
be handed over to the drivers.
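
The split of responsibilities could look like the following sketch: the driver
only measures and places text, while the renderer performs the wrapping. The
``Driver`` interface and the ``RecordingDriver`` stand-in are hypothetical and
do not reflect the API of any of the libraries listed above.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical minimal contract an output driver would have to fulfil."""

    @abstractmethod
    def string_width(self, text):
        """Return the rendered width of text in the current font."""

    @abstractmethod
    def draw_text(self, x, y, text):
        """Place one already wrapped line of text at the given position."""


class RecordingDriver(Driver):
    """Stand-in backend for illustration: fixed glyph width, records calls."""

    def __init__(self, glyph_width=6.0):
        self.glyph_width = glyph_width
        self.calls = []

    def string_width(self, text):
        return len(text) * self.glyph_width

    def draw_text(self, x, y, text):
        self.calls.append((x, y, text))


def render_paragraph(driver, text, width, line_height=12.0):
    """Greedy word wrapping done by the renderer, not by the driver."""
    lines, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if current and driver.string_width(candidate) > width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    for index, line in enumerate(lines):
        driver.draw_text(0.0, index * line_height, line)
    return lines


driver = RecordingDriver()
lines = render_paragraph(driver, "one two three four", width=60)
```

With this split a new backend only needs to implement the two driver methods,
and all backends produce the same line breaks.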

Optional requirements
=====================

Once PDF rendering is implemented correctly, including correct rendering and
wrapping of texts, it might be useful in similar cases, for example:

- SVG to PDF conversion

  The conversion of SVG to PDF is used for distribution of documents with
  heavily customized designs. With a proper rendering infrastructure the API
  should be kept flexible enough to support such conversions later.

- HTML to PDF conversion

  It might be useful to directly convert styled HTML to PDF - if the API stays
  flexible enough this should be possible to add later.

  One major problem might be the markup used for formatting of inline text
  elements.

Import of PDF pages
-------------------

For cover pages (or similar) of documents it might be useful to extract whole
pages from other PDFs and embed them in the generated PDF document.

This requires reading of existing PDF documents, though - which is not planned
to be implemented yet.

Signing PDFs / write protection
-------------------------------

It is common to write-protect or sign PDF documents. If the respective PDF
creation library can handle that, it should be exposed in the API of the PDF
creation.