==============================================
Design document for ODT parsing and generation
==============================================
:Author: ts
The scope of this document is to define the design for a first implementation of
ODT (Open Document Text) support in the eZ Document component. The parts of the
Document component designed in this document do not affect other Open Document
formats like spreadsheets or graphics. The goal is to define the infrastructure
for reading and writing ODT documents, i.e. to convert existing ODT documents
into the internal representation of the Document component (DocBook XML) and to
generate new ODT documents from the internal representation.
------------
Requirements
------------
The following sections describe the requirements for the ODT handling in the
Document component. The first section defines requirements for reading ODT, the
second for writing ODT and the third section defines requirements for later
enhancements to be kept in mind during the initial implementation.
Import
======
The Document component should be able to parse existing ODT documents and to
convert them to the internal format used by the Document component (DocBook
XML). Requirements for the import process are:

- Read plain XML ODT files
- Parse all necessary structural ODT elements
- Convert ODT elements properly into equivalent or similar DocBook
  representations
- Maintain the content semantics provided by the ODT as well as possible
- Maintain meta information provided by the ODT as well as possible
- Develop a first heuristic approach for using ODT styling information to
  determine the semantics of an element

Export
======
The Document component should be able to generate new ODT documents from an
existing internal representation (DocBook XML). Requirements for this process
are:

- Write plain XML ODT files
- Convert DocBook representation elements to their corresponding ODT
  representations
- Maintain the document structure
- Maintain content and metadata semantics as well as possible
- Apply styling to ODT elements

Later enhancements
==================
In the first step of ODT integration only rudimentary features for import and
export should be realized. The following ideas must be kept in mind during the
design and implementation, to ensure future extensibility.

- Reading / writing of ODT package files (ZIP)

  - ODF can be presented either as a single XML file or as a ZIP package
    containing multiple XML files and other related files (e.g. images).
  - Reading and writing this format is not necessary from the start, but
    since it is the default way for users to store ODT, it should be
    supported later on.
  - The handling of ZIP files requires a tie-in with the Archive component or
    similar.

------
Design
------
In the first development cycle, only the structural conversion between ODT and
DocBook XML will be considered. In addition, rudimentary styling information
will be taken into account. The reading and writing of ODF packages is not
considered in this design.
Import
======
Three steps are necessary to import an ODT document and convert it into
DocBook XML:

1. Read the XML data
2. Preprocess the ODT representation
3. Actual conversion to DocBook XML representation

Step 1 will be performed through the DOM extension in PHP; the internal
representation of an ODT document will be a DOM tree. The second step performs
pre-processing on this DOM tree. Pre-processing is needed, for example, to
assign additional semantics to the ODT elements to achieve a better rendering.
Finally, the pre-processed DOM tree will be visited to create the DocBook XML
representation.

Pre-processing
--------------
The step of pre-processing the ODT representation is necessary to assign
DocBook semantics to the ODT elements. ODT and DocBook XML have some
similarities, but also differ widely in some parts. The pre-processing step
performs manipulations on the ODT representation and potentially adds
information which is utilized by the later conversion step to create a
semantically correct representation.

This process works similarly to filters in the XHTML document import. The
class level design of this feature is inspired by the XHTML handling: Filters
can be registered which pre-process the incoming ODT in the given order.

A filter may perform the following operations on a DOMElement:

- Add type information to an XML element to determine into which DocBook XML
  element the element will be converted
- Add attribute information to determine the attributes in the DocBook XML
  representation
- Add additional elements or element hierarchies

The resulting DOM tree need not be valid ODT anymore, so that it can reflect
the later DocBook structure more closely.

The first implemented filter will only perform rudimentary operations on the
DOM to assign basic semantic information to the elements. A second
implementation will be an additional filter which takes some styling
information into account to enhance this information. Further filters can be
implemented by third parties to extend or replace these mechanisms.

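
The filter chain described above can be sketched as follows. This is only an
illustration in Python (the component itself is written in PHP), not the
component's API: the attribute name ``x-docbook-type``, the class names
``ElementFilter`` and ``StyleFilter``, the element mapping and the style-name
heuristic are all assumptions made for the example.

```python
from xml.dom import minidom

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Hypothetical annotation attribute; the design does not prescribe a name.
TYPE_ATTR = "x-docbook-type"


def walk(node):
    """Yield the given node (if it is an element) and all elements below it."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        yield from walk(child)


class ElementFilter:
    """Rudimentary filter: annotates ODT elements with a DocBook element name."""

    # Illustrative mapping only, not a complete or authoritative one.
    MAPPING = {"p": "para", "h": "title", "list": "itemizedlist",
               "list-item": "listitem"}

    def filter(self, document):
        for element in walk(document.documentElement):
            if (element.namespaceURI == TEXT_NS
                    and element.localName in self.MAPPING):
                element.setAttribute(TYPE_ATTR, self.MAPPING[element.localName])


class StyleFilter:
    """Style-aware filter: refines annotations with a style-name heuristic."""

    def filter(self, document):
        for element in walk(document.documentElement):
            style = element.getAttributeNS(TEXT_NS, "style-name")
            # Assumed heuristic: paragraph styles named "Quot..." mark quotes.
            if (element.getAttribute(TYPE_ATTR) == "para"
                    and style.lower().startswith("quot")):
                element.setAttribute(TYPE_ATTR, "blockquote")


def preprocess(document, filters):
    """Run the registered filters over the document in the given order."""
    for registered in filters:
        registered.filter(document)
    return document
```

Because the filters run in registration order, the style filter in this sketch
can rely on the annotations the rudimentary filter has already attached.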
Conversion
----------
The conversion process itself will mostly visit the DOM tree and utilize the
information attached to the elements in the pre-processing step to generate a
DocBook XML document with the corresponding content. The filter pre-processing
step is responsible for annotating all significant elements properly so that
the conversion can use them.

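
A minimal sketch of such a visiting conversion, again in Python for
illustration only. It assumes the pre-processing step stored the target
DocBook element name in a hypothetical ``x-docbook-type`` attribute and that
the result is wrapped in an ``article`` root; neither is prescribed by this
design.

```python
from xml.dom import minidom

# Hypothetical annotation attribute assumed to be set during pre-processing.
TYPE_ATTR = "x-docbook-type"


def convert(source, target_doc, target_parent):
    """Visit the children of source and append DocBook nodes to target_parent."""
    for child in source.childNodes:
        if child.nodeType == child.TEXT_NODE:
            target_parent.appendChild(target_doc.createTextNode(child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            docbook_type = child.getAttribute(TYPE_ATTR)
            if docbook_type:
                new_parent = target_doc.createElement(docbook_type)
                target_parent.appendChild(new_parent)
            else:
                # Unannotated elements produce no DocBook element themselves,
                # but their children are still visited.
                new_parent = target_parent
            convert(child, target_doc, new_parent)


def to_docbook(annotated_doc):
    """Create a new document and fill it from the annotated ODT DOM tree."""
    implementation = minidom.getDOMImplementation()
    target = implementation.createDocument(None, "article", None)
    convert(annotated_doc.documentElement, target, target.documentElement)
    return target
```

The converter in this sketch knows nothing about ODT itself; it only reads the
annotations, which is why the filters must annotate every significant element.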
Flat ODT documents (consisting of only one XML file), the only kind handled in
the first version of ODT support, may contain embedded image content. To
extract it, the user may specify a target directory; otherwise the system temp
dir is used as the default. The content is then referenced in DocBook from
this location.

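
The extraction could look roughly like the following Python sketch. It assumes
that embedded images appear as base64 text inside ``office:binary-data``
elements, as flat OpenDocument files store them; the function name and the
``odt_image_N.bin`` file naming are hypothetical.

```python
import base64
import tempfile
from pathlib import Path
from xml.dom import minidom

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def extract_images(document, target_dir=None):
    """Write each embedded image to target_dir and return the written paths.

    Falls back to the system temp dir when no target directory is given.
    """
    directory = Path(target_dir) if target_dir else Path(tempfile.gettempdir())
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    nodes = document.getElementsByTagNameNS(OFFICE_NS, "binary-data")
    for index, node in enumerate(nodes):
        encoded = "".join(
            child.data for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        # Hypothetical naming scheme; the design does not define one.
        path = directory / f"odt_image_{index}.bin"
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths
```

The returned paths are what the generated DocBook would reference.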
.. note:: We should check if it is possible to define and handle data URLs in
   DocBook. May be problematic with other formats though. (kn)

Export
======
.. note:: First sentence a bit unclear ;) (kn)

The export process for ODT works similarly to PDF rendering, except that it is
a little less strict. The internal DocBook representation is converted to the
desired ODT representation according to its semantics.

Based on the DocBook XML elements, the user can define styles using a
simplified CSS syntax (see PDF). Each of the style definitions is converted to
an automatic style in the resulting ODT document. ODT elements affected by a
certain style get this style applied.
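
As a rough illustration of the last step, the following Python sketch turns
one parsed style definition into an ODT style element. It assumes the CSS-like
properties have already been parsed into a dictionary and that each one maps
one-to-one to an ``fo:`` attribute on ``style:text-properties``, which only
holds for some properties (e.g. ``font-weight``); the function name is
hypothetical.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
ET.register_namespace("style", STYLE_NS)
ET.register_namespace("fo", FO_NS)


def automatic_style(name, family, properties):
    """Build a style:style element carrying the given text properties."""
    style = ET.Element(
        f"{{{STYLE_NS}}}style",
        {f"{{{STYLE_NS}}}name": name, f"{{{STYLE_NS}}}family": family},
    )
    text_properties = ET.SubElement(style, f"{{{STYLE_NS}}}text-properties")
    for key, value in properties.items():
        # Assumption: the property name is also a valid fo: attribute name.
        text_properties.set(f"{{{FO_NS}}}{key}", value)
    return style
```

An element affected by the style would then reference it by the generated
name.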
Styles
======
A style is defined for each piece of styling information. Layout elements are
never assigned styling information directly; there is always a style in
between.

The <style/> element has the following properties:

name
    The internal name of the style. Must be unique over all styles, in
    concatenation with the style:family.
display-name
    Name of the style to display in GUIs. If left out, the name is used.
family
    Family collection of the style. One of (in context of text documents):

    text
        Style that might be applied to any piece of text.
    paragraph
        Style for complete paragraphs and headings.
    section
        Style to be applied to sections of text in text documents (@TODO: Not
        handled yet!).
    ruby
        Not handled, yet.

    - table
    - table-column
    - table-row
    - table-cell
    - table-page
    - chart
    - default
    - graphic
parent-style-name
    Identifies a parent style. Style properties of the parent are inherited
    and may be overridden. If no parent style is specified, the default style
    for the style's family will be the base for inheritance.
next-style-name
    Next paragraph style. If a new paragraph is started after the element this
    style is applied to, this paragraph will have the style named in this
    element. Only sensible for editing in a GUI.
list-style-name
    Style used in headings and paragraphs of lists contained in the styled
    element, only if the lists have no list-style applied themselves.
master-page-name
    A style with a master page applied forces a page break before the element
    and then loads the styles from the master page.
data-style-name
    Styling of table cells (e.g. formulas, currencies, ...).
class
    Information for GUIs, to sort styles into categories.
default-outline-level
    "Transforms" a paragraph into some kind of heading, without making it a
    heading itself. Senseless.

Style mappings (replacing a style conditionally with another style) will not be
taken into account, yet.
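
The inheritance rule for parent-style-name can be sketched in Python as
follows. The data layout (dictionaries keyed by family and name) and the
function name are assumptions for the example, and the sketch assumes a parent
style belongs to the same family as its child.

```python
def resolve_style(name, family, styles, defaults):
    """Return the effective properties of the named style.

    styles maps (family, name) to {"parent": name or None, "properties": {...}};
    defaults maps a family to the properties of its default style.
    """
    chain = []
    seen = set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic parent-style-name chain at {current!r}")
        seen.add(current)
        style = styles[(family, current)]
        chain.append(style["properties"])
        current = style.get("parent")

    # The family's default style is the base of inheritance; ancestors are
    # applied from the most distant one down to the style itself.
    resolved = dict(defaults.get(family, {}))
    for properties in reversed(chain):
        resolved.update(properties)
    return resolved
```

Keying the lookup by family and name mirrors the rule that a style name only
has to be unique in combination with its family.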
Types of styles
---------------
default-style
    Default styles must be defined for each used style family. The default
    style is always the base of inheritance for the style family.
page-layout
    Definition of the global page properties, such as the page format.
header-style / footer-style
    Styling of the header and footer area.
master-page
    Definition of a master page. Defines header / footer, forms, styles for the
    page and more.

Table templates
---------------
Not yet handled.
Font face declaration
---------------------
Font face declarations correspond to the @font-face declaration of CSS2.
Data styles
-----------
Not yet handled.
List styles
-----------
Define properties of a list (not its content!). A style for each list level. If
no style exists for a specific level, the next lower level style is used. If
none is defined, a default style is used. name and display-name properties as
ususal. Can have the consecutive-numbering attribute defined, to specify if
different list levels restart numbering or not
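
The fallback rule for list levels can be sketched in a few lines of Python.
The function name and the dictionary keyed by level number are assumptions
made for the example.

```python
def level_style(level, level_styles, default_style):
    """Return the style for a list level, falling back to lower levels.

    level_styles maps level numbers (starting at 1) to list-level styles.
    """
    for candidate in range(level, 0, -1):
        if candidate in level_styles:
            return level_styles[candidate]
    # No style is defined for this level or any lower level.
    return default_style
```

For example, with styles defined only for levels 1 and 3, a level 2 list would
use the level 1 style under this rule.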
List-level styles
^^^^^^^^^^^^^^^^^
A list-level style commonly has a level attribute, defining which list level
the style applies to. All other attributes depend on the type of list. A list
may contain different kinds of lists, depending on the depth of the level.

Number level styles
~~~~~~~~~~~~~~~~~~~
Defines an enumerated list level using a list-level-style-number element. Has
the following attributes:

style-name
    Defines the text style for list item numbers.
num-format
    Defines the formatting of the list item numbers.
display-levels
    Defines how many level numberings to display (e.g. 1.2.3 or just 1.2).
start-value
    Defines the first number to be used by the very first element of the
    defined level.

Bullet level style
~~~~~~~~~~~~~~~~~~
Defines a list level as a bulleted (itemized) list. Has the following
attributes:

text-style
    Style for the bullet character.
bullet-character
    A Unicode character to be used as the bullet.
num-format-prefix / num-format-suffix
    Prefix and suffix to be placed before / after a bullet.
bullet-relative-size
    Relative size (percentage, integer) of the bullet with respect to the item
    content.

Image level style
~~~~~~~~~~~~~~~~~
Creates items preceded by images. The image to be used is either referenced or
stored as base64-encoded binary data.

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79