

YAP Syntax

@ingroup YAPProgrammming

We describe the syntax of YAP at two levels: first the syntax of Prolog terms, and then the tokens from which Prolog terms are built.

@defgroup Formal_Syntax Syntax of Terms

@ingroup YAPSyntax

Below, we describe how YAP terms are built from the different classes of tokens. The formalism used is BNF, extended where necessary with attributes denoting integer precedence or operator type.

 term       ---->     subterm(1200)   end_of_term_marker

 subterm(N) ---->     term(M)         [M <= N]

 term(N)    ---->     op(N, fx) subterm(N-1)
             |        op(N, fy) subterm(N)
             |        subterm(N-1) op(N, xfx) subterm(N-1)
             |        subterm(N-1) op(N, xfy) subterm(N)
             |        subterm(N) op(N, yfx) subterm(N-1)
             |        subterm(N-1) op(N, xf)
             |        subterm(N) op(N, yf)

 term(0)   ---->      atom '(' arguments ')'
             |        '(' subterm(1200)  ')'
             |        '{' subterm(1200)  '}'
             |        list
             |        string
             |        number
             |        atom
             |        variable

 arguments ---->      subterm(999)
             |        subterm(999) ',' arguments

 list      ---->      '[]'
             |        '[' list_expr ']'

 list_expr ---->      subterm(999)
             |        subterm(999) list_tail

 list_tail ---->      ',' list_expr
             |        ',..' subterm(999)
             |        '|' subterm(999)

Notes:

\a op(N,T) denotes an atom which has been previously declared with type \a T and base precedence \a N.
Since ',' is itself a pre-declared operator with type \a xfy and precedence 1000, if \a subterm starts with a '(', \a op must be followed by a space to avoid ambiguity with the case of a functor followed by arguments, e.g.:

+ (a,b)        [the same as '+'(','(a,b)) of arity one]

  versus

+(a,b)         [the same as '+'(a,b) of arity two]

In the first rule for term(0) no blank space should exist between \a atom and '('.

Each term to be read by the YAP parser must end with a single dot, followed by a blank (in the sense mentioned in the previous paragraph). When a name consisting of a single dot could be taken for the end of term marker, the ambiguity should be avoided by surrounding the dot with single quotes.
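
The precedence constraints in the term(N) rules above can be read off mechanically: an `x` position admits a subterm of precedence at most N-1, and a `y` position admits one of precedence at most N. The following Python sketch tabulates this. The function name `argument_limits` is hypothetical and is not part of YAP; it only illustrates the grammar.

```python
# Maximum precedence allowed for each argument of an operator term,
# read directly off the term(N) rules: an 'x' position admits
# subterm(N-1), a 'y' position admits subterm(N).
def argument_limits(op_type, n):
    """Return (left_max, right_max); None marks an absent argument."""
    def limit(position):
        return n - 1 if position == "x" else n

    if op_type in ("fx", "fy"):            # prefix: only a right argument
        return (None, limit(op_type[1]))
    if op_type in ("xf", "yf"):            # postfix: only a left argument
        return (limit(op_type[0]), None)
    if op_type in ("xfx", "xfy", "yfx"):   # infix: both arguments
        return (limit(op_type[0]), limit(op_type[2]))
    raise ValueError(f"unknown operator type: {op_type!r}")


# ',' is declared xfy with precedence 1000
print(argument_limits("xfy", 1000))   # (999, 1000)
```

This also shows why arguments and list elements are written as subterm(999): a bare ',' term has precedence 1000, so it cannot appear in those positions unless it is wrapped in parentheses.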

@defgroup Tokens Prolog Tokens

@ingroup YAPSyntax

Prolog tokens are grouped into the following categories:

@defgroup Numbers Numbers

@ingroup Tokens

Numbers can be further subdivided into integer and floating-point numbers.

@defgroup Integers Integers

@ingroup Numbers

Integer numbers are described by the following regular expression:


<integer> := {<digit>+<single-quote>|0{xXo}}<alpha_numeric_char>+

where {...} stands for optionality, \a + denotes repetition (one or more times), \a \<digit\> denotes one of the characters 0 ... 9, \a | denotes or, and \a \<single-quote\> denotes the character "'". The digits before the \a \<single-quote\> character, when present, form the number base, which can range from 0 to 36. Letters from A to Z are used as digits when the base is larger than 10.

Note that if no base is specified then base 10 is assumed. Note also that the last digit of an integer token cannot be immediately followed by one of the characters 'e', 'E', or '.'.

Following the ISO standard, YAP also accepts the prefix 0x for numbers in hexadecimal base and the prefix 0o for numbers in octal base. For convenience, YAP also accepts the prefix 0X for numbers in hexadecimal base.

Example: the following tokens all denote the same integer

10  2'1010  3'101  8'12  16'a  36'a  0xa  0o12

Numbers of the form 0'a are used to represent character constants. So, the following tokens denote the same integer:

0'd  100
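
As a rough illustration of how these token forms map to values, here is a minimal Python sketch. It is not YAP's scanner: `parse_integer_token` is a hypothetical helper, and it assumes that base 0 is reserved for the character-code form and that only bases 2 to 36 are meaningful for digit sequences.

```python
import string

# Digit values: 0-9, then letters a-z (case-insensitive) for 10-35.
DIGITS = string.digits + string.ascii_lowercase


def parse_integer_token(token):
    """Return the integer denoted by a token such as 2'1010, 0xa, or 0'd."""
    if "'" in token:
        base_text, _, digits = token.partition("'")
        base = int(base_text)
        if base == 0:
            # 0'c denotes the character code of c
            if len(digits) != 1:
                raise ValueError(f"expected one character after 0': {token!r}")
            return ord(digits)
    elif token[:2] in ("0x", "0X"):
        base, digits = 16, token[2:]
    elif token[:2] == "0o":
        base, digits = 8, token[2:]
    else:
        base, digits = 10, token
    # Assumption of this sketch: digit sequences only make sense in bases 2..36.
    if not 2 <= base <= 36 or not digits:
        raise ValueError(f"malformed integer token: {token!r}")
    value = 0
    for ch in digits.lower():
        d = DIGITS.find(ch)
        if d < 0 or d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d
    return value


tokens = ["10", "2'1010", "3'101", "8'12", "16'a", "36'a", "0xa", "0o12"]
print([parse_integer_token(t) for t in tokens])
```

Every token in the list evaluates to 10, and `parse_integer_token("0'd")` gives 100, matching the two examples above.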

YAP (version 6.3.4) supports integers that fit the word size of the machine. This is 32 bits in most current machines, but 64 in some others, such as the Alpha running Linux or Digital Unix. The scanner will misread integers that fall outside this range.

@defgroup Floats Floats

@ingroup Numbers

Floating-point numbers are described by:

   <float> := <digit>+{<dot><digit>+}
               <exponent-marker>{<sign>}<digit>+
            |<digit>+<dot><digit>+
               {<exponent-marker>{<sign>}<digit>+}

where \a \<dot\> denotes the decimal-point character '.', \a \<exponent-marker\> denotes one of 'e' or 'E', and \a \<sign\> denotes one of '+' or '-'.

Examples:

10.0   10e3   10e-3   3.1415e+3

Floating-point numbers are represented as a double in the target machine. This is usually a 64-bit number.
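
The two alternatives of the \<float\> rule translate directly into a regular expression. The Python sketch below is only an illustration of the grammar as written here (the names `FLOAT_TOKEN` and `is_float_token` are hypothetical), not the scanner YAP actually uses.

```python
import re

FLOAT_TOKEN = re.compile(
    r"""
    [0-9]+ (?: \. [0-9]+ )? [eE] [+-]? [0-9]+      # exponent required, fraction optional
    |
    [0-9]+ \. [0-9]+ (?: [eE] [+-]? [0-9]+ )?      # fraction required, exponent optional
    """,
    re.VERBOSE,
)


def is_float_token(text):
    """True if the whole of text matches the <float> rule."""
    return FLOAT_TOKEN.fullmatch(text) is not None


for token in ["10.0", "10e3", "10e-3", "3.1415e+3", "10", "10.", ".5"]:
    print(token, is_float_token(token))
```

The first four tokens are accepted. The last three are rejected: `10` has neither a fraction nor an exponent, and the rule requires at least one digit on both sides of the decimal point.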

@defgroup Strings Character Strings

Strings are described by the following rules:

  string --> " string_quoted_characters "
  string --> ` string_quoted_characters `

  string_quoted_characters --> '"' '"' string_quoted_characters
  string_quoted_characters --> '\'
                          escape_sequence string_quoted_characters
  string_quoted_characters -->
                          string_character string_quoted_characters

  escape_sequence --> 'a' | 'b' | 'r' | 'f' | 't' | 'n' | 'v'
  escape_sequence --> '\' | '"' | ''' | '`'
  escape_sequence --> at_most_3_octal_digit_seq_char '\'
  escape_sequence --> 'x' at_most_2_hexa_digit_seq_char '\'

where string_character is any character other than the enclosing quote (double quote or back quote, respectively) and the escape character '\'.
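
To make the escape rules concrete, the following Python sketch decodes the text between the quotes of a string token according to the grammar above. It is an illustration only: `decode_string_body` is a hypothetical helper, and treating a doubled back quote inside a back-quoted string like a doubled double quote is an assumption, since the rules only show the double-quote case.

```python
# Single-character escapes listed in the escape_sequence rules.
SIMPLE = {"a": "\a", "b": "\b", "r": "\r", "f": "\f", "t": "\t",
          "n": "\n", "v": "\v", "\\": "\\", '"': '"', "'": "'", "`": "`"}


def decode_string_body(body, quote='"'):
    """Decode the characters between the quotes of a string token."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == quote:
            # a quote inside the body must be doubled
            if body[i + 1:i + 2] != quote:
                raise ValueError("unescaped quote inside string")
            out.append(quote)
            i += 2
        elif ch == "\\":
            i += 1
            nxt = body[i:i + 1]
            if nxt in SIMPLE:
                out.append(SIMPLE[nxt])
                i += 1
                continue
            if nxt == "x":
                digits, base, limit = "0123456789abcdefABCDEF", 16, 2
                i += 1
            else:
                digits, base, limit = "01234567", 8, 3
            start = i
            while i < len(body) and i - start < limit and body[i] in digits:
                i += 1
            # numeric escapes need at least one digit and a closing backslash
            if i == start or body[i:i + 1] != "\\":
                raise ValueError("malformed escape sequence")
            out.append(chr(int(body[start:i], base)))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(decode_string_body("\\x41\\ and \\101\\"))   # A and A
```

Note that the numeric forms are terminated by a closing backslash, so `\x41\` and `\101\` both denote the character with code 65.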

YAP supports four different textual elements:

Atoms, mentioned above, are textual representations of symbols, that are interned in the data-base. They are stored either in ISO-LATIN-1 (first 256 code points), or as UTF-32.
Strings are atomic representations of text. The back-quote character is used to identify these objects in the program. Strings exist as stack objects, in the same way as other Prolog terms. As Prolog unification cannot be used to manipulate strings, YAP includes built-ins such as string_arg/3, sub_string/5, or string_concat to manipulate them efficiently. Strings are stored as opaque objects containing a
Lists of codes represent text as a list of numbers, where each number is a character code. A string of N bytes requires N pairs, that is 2N cells, leading to a total of 16 bytes per character on 64-bit machines. Thus, they are a very expensive, but very flexible representation, as one can use unification to construct and access string elements.
Lists of atoms represent text as a list of atoms, where each atom holds a single character. A string of N bytes also requires N pairs, that is 2N cells. They have similar properties to lists of codes.
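
The cost of the two list representations follows from simple arithmetic, sketched below in Python. The sketch assumes that a cell is one machine word (8 bytes on a 64-bit machine), which is consistent with the 16 bytes per character quoted above; the function names are hypothetical, and the Python lists only mimic the shape of the Prolog lists.

```python
CELL_BYTES_64 = 8  # assumption: one cell is one machine word on a 64-bit machine


def list_text_cost(n_chars, cell_bytes=CELL_BYTES_64):
    """Return (cells, bytes) used by a list of codes or chars of n_chars characters."""
    cells = 2 * n_chars            # each list pair has a head cell and a tail cell
    return cells, cells * cell_bytes


def as_codes(text):
    """Text as a list of character codes."""
    return [ord(c) for c in text]


def as_chars(text):
    """Text as a list of single-character elements."""
    return list(text)


print(as_codes("abc"), as_chars("abc"), list_text_cost(3))
```

For the three-character text `abc` this gives 6 cells and 48 bytes for either list form.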

The flags double_quotes and backquoted_string change the interpretation of text strings. Each can take the values atom, string, codes, and chars.