eZ Component: Document, Design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Design description
==================

ezcDocument interface
---------------------

Interface that defines an abstract document class. Classes that implement
this interface are named after the format, for example 'ezcDocumentHtml'.

These classes can generate document objects from a certain type of data
(text, DOM, XML, file) with static functions like 'createFromText()'. Once the
object is created, its content can be retrieved in any type of data that
is supported by this format using functions like 'getText()' or 'getXML()'.
If a requested type of data is not supported by this format, the class throws
an error.

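As an illustration of the interface described above, here is a minimal PHP
sketch. The names SketchDocument and SketchDocumentPlainText are hypothetical,
the method signatures are assumptions based only on the function names
mentioned in this section, and RuntimeException stands in for whatever error
the real classes raise.

```php
<?php
// Illustrative names only; the real ezcDocument interface may differ.
interface SketchDocument
{
    // Builds a document object from raw text.
    public static function createFromText(string $text): SketchDocument;

    // Returns the content as text.
    public function getText(): string;

    // Returns the content as an XML string.
    public function getXML(): string;
}

// A format that supports text but has no XML representation.
final class SketchDocumentPlainText implements SketchDocument
{
    private $text;

    private function __construct(string $text)
    {
        $this->text = $text;
    }

    public static function createFromText(string $text): SketchDocument
    {
        return new self($text);
    }

    public function getText(): string
    {
        return $this->text;
    }

    public function getXML(): string
    {
        // Unsupported type of data for this format: report an error.
        throw new RuntimeException('XML is not supported by the plain text format');
    }
}

$doc = SketchDocumentPlainText::createFromText('Hello');
echo $doc->getText(), "\n";

try {
    $doc->getXML();
} catch (RuntimeException $e) {
    echo 'Error: ', $e->getMessage(), "\n";
}
```

The private constructor forces callers through the static factory, which
mirrors the 'createFromText()' style named in the text.
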
ezcDocConverter interface
-------------------------

Interface that defines an abstract conversion class. Concrete conversion classes
implement conversion of a given document from one format to another.
The names of these classes follow the pattern ezcDocFormat1ToFormat2, where
Format1 and Format2 are format names like 'Html' or 'Docbook'. For example,
'ezcDocHtmlToDocbook' implements conversion from HTML to DocBook.

The main function of this interface, 'convert()', takes a document in one format
and returns it in another. Both the argument and the return value are objects of
'ezcDocument' implementation classes.

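The naming pattern and the 'convert()' contract could look roughly like the
sketch below. All class names (SketchHtmlDocument, SketchTextDocument,
SketchConverter, SketchHtmlToText) are hypothetical, and the conversion itself
is a deliberately crude stand-in that only strips tags.

```php
<?php
// Illustrative names only; none of these classes exist in the component.
class SketchHtmlDocument
{
    public $html;

    public function __construct(string $html)
    {
        $this->html = $html;
    }
}

class SketchTextDocument
{
    public $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }
}

interface SketchConverter
{
    // Takes a document object in one format and returns one in another.
    public function convert($document);
}

// Named after the Format1ToFormat2 pattern: HTML in, plain text out.
class SketchHtmlToText implements SketchConverter
{
    public function convert($document)
    {
        if (!$document instanceof SketchHtmlDocument) {
            throw new InvalidArgumentException('Expected an HTML document');
        }
        // Crude conversion for demonstration: drop all markup.
        return new SketchTextDocument(strip_tags($document->html));
    }
}

$converter = new SketchHtmlToText();
$result = $converter->convert(new SketchHtmlDocument('<p>Hello <b>world</b></p>'));
echo $result->text, "\n"; // prints "Hello world"
```
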
ezcDocParser class
------------------

Contains methods for parsing text documents using a formal grammar. The exact
formal grammars and format-specific callback functions (if needed) are set in
the format handling classes.

ezcDocXSLTTransformer class
---------------------------

A class based on ezcDocConverterBase for transforming DOM documents using special
rules and element callback handlers. It contains only methods for document
transformation. The exact rules and element handlers are set in derived classes.

ezcDocOutput class
------------------

Outputs the given document tree as text using a simple internal templating
system. It also takes care of text indentation to show the structure of the
document. The exact templates for element output and helper formatting
functions are set in derived classes.

ezcDocOutputTemplate class
--------------------------

Implemented in the DocumentTemplateTieIn component. It extends the ezcDocOutput
class to use the Template component for element output.

ezcDocValidator
---------------

Validates a document or a single element against its schema. This class takes
a schema in RelaxNG format as input.

Algorithms
==========

Transforming XML
----------------

XML documents are transformed using XSLT stylesheets and the XSL extension for PHP.
Transformations are done with the ezcDocXSLTTransformer class.

Parsing text/XML
----------------

The ezcDocParser class parses the input text and presents it as a DOM tree.

This is not an implementation of a real context-free parser.
It assumes that the input language is XML-like, i.e. consists
of elements that have opening and ending parts and some
content between them (which may contain other elements).

Sometimes it is hard or impossible to formalize the input in these terms,
so special algorithms or custom element handlers are used
in those cases.

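To make the "XML-like" assumption concrete, the following sketch parses a
small bracket markup ("[tag]...[/tag]") into a DOM tree. The markup, the
function name parseXmlLike and the root element name 'document' are
assumptions made for this illustration; the sketch does not show how
ezcDocParser is actually implemented.

```php
<?php
// Hypothetical helper: turns "[tag]...[/tag]" markup into a DOM tree.
function parseXmlLike(string $text): DOMDocument
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $root = $dom->appendChild($dom->createElement('document'));
    $current = $root;

    // Split the input into tag tokens and text runs, keeping the tags.
    $tokens = preg_split(
        '#(\[/?[a-z]+\])#',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    foreach ($tokens as $token) {
        if (preg_match('#^\[([a-z]+)\]$#', $token, $m)) {
            // Opening part: descend into a new element.
            $current = $current->appendChild($dom->createElement($m[1]));
        } elseif (preg_match('#^\[/([a-z]+)\]$#', $token, $m)) {
            // Ending part: it must close the element that is currently open.
            if ($current->isSameNode($root) || $current->nodeName !== $m[1]) {
                throw new RuntimeException("Unexpected closing tag [/{$m[1]}]");
            }
            $current = $current->parentNode;
        } else {
            // Content between the opening and ending parts.
            $current->appendChild($dom->createTextNode($token));
        }
    }

    if (!$current->isSameNode($root)) {
        throw new RuntimeException("Unclosed element [{$current->nodeName}]");
    }
    return $dom;
}

$dom = parseXmlLike('Hello [b]bold [i]both[/i][/b] world');
echo $dom->saveXML($dom->documentElement), "\n";
// prints: <document>Hello <b>bold <i>both</i></b> world</document>
```

Input that does not nest cleanly is rejected here with an exception; a real
format handler would need the special algorithms or custom element handlers
mentioned above.
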
Document output
---------------

The ezcDocOutput class outputs the given document tree as text using a simple
internal templating system. It also takes care of text indentation to show
the structure of the document.

The exact templates for element output and helper formatting functions are set
in derived classes. Templates are simple strings in which some character
or sequence is replaced with another string using str_replace.

The ezcDocOutputTemplate class is implemented in the DocumentTemplateTieIn
component. It extends ezcDocOutput to use the Template component for element
output.

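The str_replace based templating and indentation described above can be
sketched as follows. The placeholders '{indent}' and '{content}', the array
shape of the tree and the function name renderNode are assumptions chosen for
this example, not the templates used by ezcDocOutput.

```php
<?php
// Hypothetical templates: one string per element name.
$templates = [
    'section' => "{indent}<section>\n{content}{indent}</section>\n",
    'para'    => "{indent}<para>\n{content}{indent}</para>\n",
];

// Renders a node (an array with 'name' and 'children', or a plain string
// for text) and indents each nesting level by two spaces.
function renderNode($node, array $templates, int $depth = 0): string
{
    $indent = str_repeat('  ', $depth);

    if (is_string($node)) {
        // Text node: emitted on its own indented line.
        return $indent . $node . "\n";
    }

    // Render the children first, one level deeper.
    $content = '';
    foreach ($node['children'] as $child) {
        $content .= renderNode($child, $templates, $depth + 1);
    }

    // Fill the element's template by plain string replacement.
    return str_replace(
        ['{indent}', '{content}'],
        [$indent, $content],
        $templates[$node['name']]
    );
}

$tree = [
    'name' => 'section',
    'children' => [
        ['name' => 'para', 'children' => ['First paragraph.']],
        ['name' => 'para', 'children' => ['Second paragraph.']],
    ],
];

$output = renderNode($tree, $templates);
echo $output;
```

A derived class would only need to supply a different template array to emit
another text format from the same tree.
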
Validating documents
--------------------

ezcDocValidator is used to validate a document or a single element
against its schema.

This class takes a RelaxNG schema as input and transforms it
into an internal format for fast processing. The processed schema is stored
in a cached .php file for faster access in the future.

Fast validation relies on regular expressions and strings.
Here is an example::

    <element name="elem1">
        <zeroOrMore>
            <element name="elem2">
                ...
            </element>
        </zeroOrMore>
        <element name="elem3">
            ...
        </element>
    </element>

This RelaxNG schema for the element's content can be represented by a regexp
(anchored so that it has to match the whole string)::

    '#^(elem2)*elem3$#'

The children of a validated document element can also be represented as a string,
for instance 'elem2elem2elem3', which is then matched against this regexp.

A similar process is used for attributes.

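The regexp based check can be tried out with the sketch below. The helper
names childSignature and isValidContent are hypothetical, and the trailing
space after each element name is an added assumption so that a name such as
'elem2' cannot be confused with 'elem22'. The pattern is written by hand here;
deriving it from a RelaxNG schema is not shown.

```php
<?php
// Hypothetical helper: joins the names of the child elements into one
// string, terminating each name with a space.
function childSignature(DOMElement $element): string
{
    $signature = '';
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $signature .= $child->nodeName . ' ';
        }
    }
    return $signature;
}

// Content model of elem1: zero or more elem2, followed by one elem3.
$contentModel = '#^(elem2 )*elem3 $#';

// Returns true when the children of the root element fit the content model.
function isValidContent(string $xml, string $contentModel): bool
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    return preg_match($contentModel, childSignature($dom->documentElement)) === 1;
}

var_dump(isValidContent('<elem1><elem2/><elem2/><elem3/></elem1>', $contentModel)); // bool(true)
var_dump(isValidContent('<elem1><elem3/></elem1>', $contentModel));                 // bool(true)
var_dump(isValidContent('<elem1><elem3/><elem2/></elem1>', $contentModel));         // bool(false)
```
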
Examples
========

Converting Format1 to Format2
-----------------------------

::

    $docFormat1 = new ezcDocumentText( $text, 'format1' );
    $converter1 = new ezcDocFormat1ToInternal( $parameters1 );
    $docInternal = $converter1->convert( $docFormat1 );
    $converter2 = new ezcDocInternalToFormat2( $parameters2 );
    $docFormat2 = $converter2->convert( $docInternal );
    $result = $docFormat2->getText();

    // The same with static functions:

    $docFormat1 = new ezcDocumentText( $text, 'format1' );
    $docInternal = ezcDocFormat1ToInternal::convert( $docFormat1, $parameters1 );
    $docFormat2 = ezcDocInternalToFormat2::convert( $docInternal, $parameters2 );
    $result = $docFormat2->getText();