==============================================
Design document for ODT parsing and generation
==============================================

:Author: ts

The scope of this document is to define the design for a first implementation
of ODT (Open Document Text) support in the eZ Document component. The parts of
the Document component designed in this document do not affect other Open
Document formats like spreadsheets or graphics. The goal is to define the
infrastructure for reading and writing ODT documents, i.e. to convert existing
ODT documents into the internal representation of the Document component
(DocBook XML) and to generate new ODT documents from the internal
representation.

------------
Requirements
------------

The following sections describe the requirements for ODT handling in the
Document component. The first section defines requirements for reading ODT, the
second for writing ODT, and the third for later enhancements that must be kept
in mind during the initial implementation.

Import
======

The Document component should be able to parse existing ODT documents and to
convert them to the internal format used by the Document component (DocBook
XML). Requirements for the import process are:

- Read plain XML ODT files
- Parse all necessary structural ODT elements
- Convert ODT elements properly into equivalent or similar DocBook
  representations
- Maintain the content semantics provided by the ODT as well as possible
- Maintain meta information provided by the ODT as well as possible
- Develop a first heuristic approach to how ODT styling information can be
  used to determine the semantics of an element

Export
======

The Document component should be able to generate new ODT documents from an
existing internal representation (DocBook XML). Requirements for this process
are:

- Write plain XML ODT files
- Convert DocBook elements to their corresponding ODT representations
- Maintain the document structure
- Maintain content and metadata semantics as well as possible
- Apply styling to the generated ODT elements

Later enhancements
==================

In the first step of ODT integration, only rudimentary import and export
features will be realized. The following ideas must be kept in mind during
design and implementation to ensure future extensibility:

- Reading / writing of ODT package files (ZIP)

  - ODF can be stored either as a single XML file or as a ZIP package
    containing multiple XML files and other related files (e.g. images).
  - Reading and writing this format is not necessary from the start, but since
    it is the default way for users to store ODT, it should be supported later
    on.
  - The handling of ZIP files requires a tie-in with the Archive component or
    similar.

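Package support is out of scope for now, but a rough idea of what it involves
may help keep the design open for it. The following is a minimal sketch in
Python, not part of the Document component: it uses Python's ``zipfile``
module as a stand-in for the Archive component, and the function name
``read_odt_package`` is hypothetical. It relies only on two properties of the
ODF package format: a ``mimetype`` entry naming the document type, and a
``content.xml`` entry holding the document body.

```python
import zipfile
from typing import BinaryIO, Union
from xml.dom import minidom

# MIME type stored in the "mimetype" entry of ODF text packages.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def read_odt_package(source: Union[str, BinaryIO]) -> minidom.Document:
    """Parse content.xml of an ODT package (hypothetical helper)."""
    with zipfile.ZipFile(source) as package:
        mimetype = package.read("mimetype").decode("ascii").strip()
        if mimetype != ODT_MIMETYPE:
            raise ValueError(f"not an ODT package: {mimetype!r}")
        # content.xml contains the office:body that a flat ODT file
        # carries inline; styles and metadata live in separate entries.
        return minidom.parseString(package.read("content.xml"))
```

Styles (``styles.xml``) and metadata (``meta.xml``) would still have to be
merged in before the flat-file import path could be reused unchanged; that is
left out of this sketch.
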
------
Design
------

In the first development cycle, only the structural conversion between ODT and
DocBook XML will be considered. In addition, rudimentary styling information
will be taken into account. Reading and writing ODF packages is not considered
in this design.

Import
======

Three steps are necessary to import an ODT document and convert it into
DocBook XML:

1. Read the XML data
2. Pre-process the ODT representation
3. Convert the pre-processed representation to DocBook XML

Step 1 will be performed using the DOM extension in PHP; the internal
representation of an ODT document will be a DOM tree. The second step performs
pre-processing on this DOM tree. Pre-processing is needed, e.g., to assign
additional semantics to the ODT elements to achieve a better rendering.
Finally, the pre-processed DOM tree will be visited to create the actual
DocBook XML representation.

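The three steps can be sketched as follows. The component itself is meant to
use PHP's DOM extension; this sketch uses Python's ``xml.dom.minidom``, which
exposes a comparable W3C DOM interface, and the names ``import_odt``,
``OdtFilter`` and ``Converter`` are hypothetical. Filters mutate the same tree
in place, in registration order, before a single converter visits it.

```python
from typing import Callable, Iterable
from xml.dom import minidom

# Hypothetical interfaces: a filter annotates/mutates the ODT DOM in place,
# a converter visits the pre-processed DOM and returns a DocBook DOM.
OdtFilter = Callable[[minidom.Document], None]
Converter = Callable[[minidom.Document], minidom.Document]


def import_odt(xml_data: str, filters: Iterable[OdtFilter],
               converter: Converter) -> minidom.Document:
    # Step 1: read the flat ODT XML into a DOM tree.
    odt = minidom.parseString(xml_data)
    # Step 2: pre-process the tree with every registered filter, in order.
    for odt_filter in filters:
        odt_filter(odt)
    # Step 3: visit the pre-processed tree and build the DocBook document.
    return converter(odt)
```

Because the filters run in sequence on one tree, a later filter (for example
a styling-aware one) can refine the annotations that an earlier, rudimentary
filter attached.
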
Pre-processing
--------------

Pre-processing the ODT representation is necessary to assign DocBook semantics
to the ODT elements. ODT and DocBook XML have some similarities, but also
differ widely in some parts. The pre-processing step manipulates the ODT
representation and potentially adds information that is used by the
subsequent conversion step to create a correct semantic representation.

This process works similarly to the filters in the XHTML document import, and
the class-level design of this feature is inspired by the XHTML handling:
filters can be registered, which pre-process the incoming ODT in the given
order.

A filter may perform the following operations on a DOMElement:

- Add type information to an XML element to determine which DocBook XML
  element it will be converted into
- Add attribute information to determine the attributes in the DocBook XML
  representation
- Add additional elements or element hierarchies

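A minimal sketch of a rudimentary filter, again in Python with
``xml.dom.minidom`` standing in for PHP's DOM. The annotation namespace
``urn:example:odt-docbook-annotation``, its ``type`` and ``level``
attributes, and the element mapping are hypothetical choices for illustration;
only the ODF ``text`` namespace, the ``text:h``, ``text:p``, ``text:list`` and
``text:list-item`` elements and the ``text:outline-level`` attribute come from
ODF. It covers the first two operations listed above; adding element
hierarchies is left out.

```python
from xml.dom import minidom

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Hypothetical namespace for annotations that only the converter reads.
ANNOTATION_NS = "urn:example:odt-docbook-annotation"

# Hypothetical mapping from ODT text elements to DocBook element names.
# text:span is deliberately absent: whether a span means emphasis can only
# be decided from its styling, i.e. by a later, style-aware filter.
ODT_TO_DOCBOOK = {
    "h": "title",
    "p": "para",
    "list": "itemizedlist",
    "list-item": "listitem",
}


def basic_element_filter(odt: minidom.Document) -> None:
    """Attach DocBook type (and heading level) information to ODT elements."""
    for local_name, docbook_type in ODT_TO_DOCBOOK.items():
        for element in odt.getElementsByTagNameNS(TEXT_NS, local_name):
            # Type information: which DocBook element to create.
            element.setAttributeNS(ANNOTATION_NS, "dbk:type", docbook_type)
            if local_name == "h":
                # Attribute information: keep the heading depth so the
                # converter can nest sections; assume 1 if it is missing.
                level = element.getAttributeNS(TEXT_NS, "outline-level") or "1"
                element.setAttributeNS(ANNOTATION_NS, "dbk:level", level)
```
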
The resulting DOM tree does not necessarily need to be valid ODT anymore, so
that it can reflect the later DocBook structure more closely.

The first implemented filter will only perform rudimentary operations on the
DOM to assign basic semantic information to the elements. A second
implementation will be an additional filter that takes some styling
information into account to enhance this information. Further filters can be
implemented by third parties to extend or replace these mechanisms.

Conversion
----------

The conversion process itself will mostly visit the DOM tree and use the
information attached to the elements in the pre-processing step to generate a
DocBook XML document with the corresponding content. The filter pre-processing
step is responsible for annotating all significant elements properly so that
the conversion can use them.

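The visiting step can be sketched as a recursive walk that creates one DocBook
element per annotated ODT element and passes the content of unannotated
elements through. This is a minimal Python sketch that assumes a hypothetical
annotation format (a ``type`` attribute in the
``urn:example:odt-docbook-annotation`` namespace); section nesting by heading
level, attributes and metadata are left out.

```python
from xml.dom import Node, minidom

# Hypothetical annotation namespace written by the pre-processing filters.
ANNOTATION_NS = "urn:example:odt-docbook-annotation"


def convert(odt: minidom.Document) -> minidom.Document:
    """Visit a pre-processed ODT DOM and build a DocBook <article>."""
    impl = minidom.getDOMImplementation()
    docbook = impl.createDocument(None, "article", None)
    _visit_children(odt.documentElement, docbook.documentElement, docbook)
    return docbook


def _visit_children(source: Node, target: Node,
                    docbook: minidom.Document) -> None:
    for child in source.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            # Copy character data into the current DocBook element.
            target.appendChild(docbook.createTextNode(child.data))
        elif child.nodeType == Node.ELEMENT_NODE:
            docbook_type = child.getAttributeNS(ANNOTATION_NS, "type")
            if docbook_type:
                # Annotated element: create its DocBook counterpart, descend.
                element = docbook.createElement(docbook_type)
                target.appendChild(element)
                _visit_children(child, element, docbook)
            else:
                # Unannotated element: drop the wrapper, keep its content.
                _visit_children(child, target, docbook)
```

Unannotated wrappers such as a plain ``text:span`` simply disappear while
their text is kept, which is why the filters have to annotate every element
that should survive as a DocBook element.
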
Flat ODT documents (consisting of a single XML file), which are the only kind
of ODT handled in the first version, may contain embedded image content. To
extract it, the user may specify a target directory; otherwise, the system's
temporary directory will be used. The extracted content will then be
referenced in DocBook from this location.

.. note:: We should check if it is possible to define and handle data URLs in
   DocBook. This may be problematic with other formats, though. (kn)

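A minimal sketch of the image extraction, assuming embedded images appear in
the flat ODT as base64 text inside ``office:binary-data`` children of
``draw:image`` elements. Each image is written to the target directory,
falling back to the system temporary directory, and the embedded data is
replaced by an ``xlink:href`` reference that the converter could turn into a
DocBook image reference. The function names, the SHA-1 based file naming and
the small magic-number check are hypothetical choices for this sketch.

```python
import base64
import hashlib
import os
import tempfile
from typing import List, Optional
from xml.dom import minidom

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def _guess_extension(data: bytes) -> str:
    # Minimal magic-number sniffing; unknown formats get a generic suffix.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"\xff\xd8"):
        return ".jpg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return ".gif"
    return ".bin"


def extract_images(odt: minidom.Document,
                   target_dir: Optional[str] = None) -> List[str]:
    """Write embedded images to disk and link them instead of embedding."""
    target_dir = target_dir or tempfile.gettempdir()
    written = []
    for image in odt.getElementsByTagNameNS(DRAW_NS, "image"):
        for binary in image.getElementsByTagNameNS(OFFICE_NS, "binary-data"):
            encoded = "".join(node.data for node in binary.childNodes
                              if node.nodeType == node.TEXT_NODE)
            # b64decode skips the line breaks found in embedded data.
            data = base64.b64decode(encoded)
            # Content-addressed name: identical images map to the same file.
            name = hashlib.sha1(data).hexdigest() + _guess_extension(data)
            path = os.path.join(target_dir, name)
            with open(path, "wb") as handle:
                handle.write(data)
            binary.parentNode.removeChild(binary)
            image.setAttributeNS(XLINK_NS, "xlink:href", path)
            written.append(path)
    return written
```
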
Export
======

.. note:: First sentence a bit unclear ;) (kn)

The export process for ODT works similarly to PDF rendering, except that it is
somewhat less strict. The internal DocBook representation is converted to the
desired ODT representation according to its semantics.

Based on the DocBook XML elements, the user can define styles using a
simplified CSS syntax (see PDF). Each style definition is converted to an
automatic style in the resulting ODT document, and every ODT element matched
by a style definition gets the corresponding automatic style applied.

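A minimal sketch of that conversion, in Python with
``xml.etree.ElementTree``. It assumes rules of the form
``element { property: value; ... }`` with plain DocBook element names as
selectors, maps font-related properties and ``color`` onto the ``fo:``
attributes of ``style:text-properties``, and treats a hypothetical set of
inline elements as ``text`` family styles and everything else as
``paragraph`` styles. The ``auto_<element>`` naming scheme is an assumption
as well.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical: DocBook elements rendered inline get "text" family styles.
INLINE_ELEMENTS = {"emphasis", "literal", "link"}

RULE = re.compile(r"([\w-]+)\s*\{([^}]*)\}")


def parse_rules(css: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse 'element { prop: value; ... }' rules (element selectors only)."""
    rules = []
    for selector, body in RULE.findall(css):
        props = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                key, value = declaration.split(":", 1)
                props[key.strip()] = value.strip()
        rules.append((selector, props))
    return rules


def automatic_styles(css: str) -> ET.Element:
    """Turn each rule into one <style:style> in <office:automatic-styles>."""
    container = ET.Element(f"{{{OFFICE}}}automatic-styles")
    for selector, props in parse_rules(css):
        family = "text" if selector in INLINE_ELEMENTS else "paragraph"
        style = ET.SubElement(container, f"{{{STYLE}}}style", {
            f"{{{STYLE}}}name": f"auto_{selector}",  # hypothetical naming
            f"{{{STYLE}}}family": family,
        })
        # Font properties and color become fo:* attributes of
        # <style:text-properties>; everything else is ignored here.
        text_props = {f"{{{FO}}}{key}": value for key, value in props.items()
                      if key.startswith("font-") or key == "color"}
        if text_props:
            ET.SubElement(style, f"{{{STYLE}}}text-properties", text_props)
    return container
```
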
Styles
======

Every piece of styling information is defined through a style. Layout elements
are never assigned styling information directly; a style is always used in
between. The <style/> element has the following properties:

name
    The internal name of the style. Must be unique among all styles in
    combination with the style:family, i.e. two styles may only share a name
    if they belong to different families.
displayname
    Name of the style to display in GUIs. If left out, the name is used.
family
    The family of the style. In the context of text documents, one of:

    text
        Style that may be applied to any piece of text.
    paragraph
        Style for complete paragraphs and headings.
    section
        Style to be applied to sections of text in text documents (@TODO: Not
        handled yet!).
    ruby
        Not handled yet.

    Further families: table, table-column, table-row, table-cell, table-page,
    chart, default, graphic.

parent-style-name
    Identifies a parent style. Style properties of the parent are inherited
    and may be overridden. If no parent style is specified, the default style
    of the style's family is the base for inheritance.
next-style-name
    Next paragraph style. If a new paragraph is started after the element this
    style is applied to, the new paragraph gets the style named by this
    attribute. Only relevant when editing in a GUI.
list-style-name
    Style used for headings and paragraphs in lists contained in the styled
    element, but only if those lists have no list style applied themselves.
master-page-name
    A style with a master page applied forces a page break before the element
    and then loads the styles from that master page.
data-style-name
    Styling of table cell data (e.g. formulas, currencies, ...).
class
    Information for GUIs to sort styles into categories.
default-outline-level
    "Transforms" a paragraph into some kind of heading without making it a
    heading itself. Not sensible for our purposes.

Style mappings (replacing a style conditionally with another style) are not
taken into account yet.

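The naming and inheritance rules above can be illustrated with a small
in-memory model. This is a minimal Python sketch, not the component's API:
``Style`` and ``StyleRegistry`` are hypothetical names, properties are plain
strings, and a parent is assumed to belong to the same family as its child.
Styles are keyed by (family, name), and effective properties are resolved
from the family's default style, through the parent chain, to the style
itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Style:
    name: str
    family: str
    properties: Dict[str, str] = field(default_factory=dict)
    parent_style_name: Optional[str] = None


class StyleRegistry:
    """Styles are unique per (family, name); defaults are keyed by family."""

    def __init__(self) -> None:
        self._styles: Dict[Tuple[str, str], Style] = {}
        self._defaults: Dict[str, Dict[str, str]] = {}

    def set_default(self, family: str, properties: Dict[str, str]) -> None:
        self._defaults[family] = dict(properties)

    def add(self, style: Style) -> None:
        key = (style.family, style.name)
        if key in self._styles:
            raise ValueError(
                f"duplicate style {style.name!r} in family {style.family!r}")
        self._styles[key] = style

    def resolve(self, family: str, name: str) -> Dict[str, str]:
        """Effective properties: default style, ancestors, then the style."""
        chain: List[Style] = []
        seen: Set[str] = set()
        current: Optional[str] = name
        while current is not None:
            if current in seen:
                raise ValueError(f"inheritance cycle at {current!r}")
            seen.add(current)
            style = self._styles[(family, current)]
            chain.append(style)
            current = style.parent_style_name
        effective = dict(self._defaults.get(family, {}))
        for style in reversed(chain):  # root ancestor first, child wins
            effective.update(style.properties)
        return effective
```
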
Types of styles
---------------

default-style
    Default styles must be defined for each used style family. The default
    style is always the base of inheritance for its style family.
page-layout
    Definition of the global page properties, such as the page format.
header-style / footer-style
    Styling of the header and footer area.
master-page
    Definition of a master page. Defines header / footer, forms, styles for
    the page and more.

Table templates
---------------

Not yet handled.

Font face declaration
---------------------

Font face declarations correspond to the @font-face declaration of CSS2.

Data styles
-----------

Not yet handled.

List styles
-----------

List styles define the properties of a list (not of its content). A list style
contains one style for each list level. If no style exists for a specific
level, the style of the next lower level is used. If none is defined, a default
style is used. The name and display-name properties work as usual. The
consecutive-numbering attribute can be defined to specify whether different
list levels restart numbering or not.

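The level fallback can be sketched as a lookup that walks from the requested
level towards level 1. This minimal Python sketch assumes that "next lower
level" means the nearest defined level with a smaller number (e.g. level 3
falls back to level 2); ``level_style`` is a hypothetical helper, and level
styles are represented by plain names.

```python
from typing import Dict


def level_style(levels: Dict[int, str], level: int, default: str) -> str:
    """Pick the list-level style for a 1-based `level`.

    Walks downwards from `level` to 1 and returns the first defined style;
    if no level at or below the requested one is defined, `default` is used.
    """
    for candidate in range(level, 0, -1):
        if candidate in levels:
            return levels[candidate]
    return default
```
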
List-level styles
^^^^^^^^^^^^^^^^^

Every list-level style has a level attribute, which defines the list level the
style is applied to. All other attributes depend on the type of list. A list
may combine different kinds of list levels, depending on the nesting depth.

Number level styles
~~~~~~~~~~~~~~~~~~~

A list-level-style-number element defines an enumerated list level. It has
the following attributes:

style-name
    Defines the text style for list item numbers.
num-format
    Defines the formatting of the list item numbers.
display-levels
    Defines how many levels of numbering to display (e.g. 1.2.3 or just 1.2).
start-value
    Defines the number used for the very first item of the defined level.

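To make display-levels and start-value concrete, here is a minimal Python
sketch that computes item labels for a sequence of list items. ``numbering``,
``item_label`` and ``format_number`` are hypothetical helpers; only the
num-format values ``1``, ``i`` and ``I`` are supported, levels are assumed to
increase by at most one at a time, and deeper levels are assumed to restart
whenever a shallower item follows (i.e. no consecutive numbering).

```python
from typing import Dict, List

_ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
          (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
          (5, "v"), (4, "iv"), (1, "i")]


def format_number(number: int, num_format: str) -> str:
    """Render one counter according to a num-format value."""
    if num_format == "1":
        return str(number)
    if num_format in ("i", "I"):
        roman = ""
        for value, letters in _ROMAN:
            count, number = divmod(number, value)
            roman += letters * count
        return roman.upper() if num_format == "I" else roman
    raise ValueError(f"num-format {num_format!r} not supported in this sketch")


def numbering(depths: List[int],
              start_values: Dict[int, int]) -> List[List[int]]:
    """Counter values (outermost level first) for items at the given depths."""
    counters: List[int] = []
    result = []
    for depth in depths:
        del counters[depth:]  # a shallower item restarts all deeper levels
        while len(counters) < depth:
            # Entering a level: start one below its start-value (default 1).
            counters.append(start_values.get(len(counters) + 1, 1) - 1)
        counters[depth - 1] += 1
        result.append(list(counters))
    return result


def item_label(counters: List[int], formats: List[str],
               display_levels: int) -> str:
    """Join the innermost `display_levels` counters, e.g. "1.2" for 2."""
    first = max(0, len(counters) - display_levels)
    return ".".join(format_number(counters[i], formats[i])
                    for i in range(first, len(counters)))
```

For items at depths 1, 2, 2, 1, 2 with display-levels 2 this yields the
labels 1, 1.1, 1.2, 2 and 2.1; with display-levels 1 the same items only show
their own counter (1, 1, 2, 2, 1).
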
Bullet level style
~~~~~~~~~~~~~~~~~~

A list-level-style-bullet element defines a list level as an itemized
(bulleted) list. It has the following attributes:

text-style
    Style for the bullet character.
bullet-character
    A Unicode character to be used as the bullet.
num-format-prefix / num-format-suffix
    Prefix and suffix to be placed before / after a bullet.
bullet-relative-size
    Size of the bullet relative to the item content (an integer percentage).

Image level style
~~~~~~~~~~~~~~~~~

Creates items preceded by images. The image to be used is either referenced or
embedded as base64-encoded binary data.

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79