Added section 3.

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1813 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
kostis 2007-03-08 12:07:35 +00:00
parent 3b4bfa28f0
commit 9f4dc198ba

@ -115,16 +115,18 @@ fully populated only when it is certain that execution will enter the
clause body. While shallow backtracking avoids some of the performance clause body. While shallow backtracking avoids some of the performance
problems of unnecessary choice point creation, it does not offer the problems of unnecessary choice point creation, it does not offer the
full benefits that indexing can provide. Other systems like full benefits that indexing can provide. Other systems like
BIM-Prolog~\cite{IndexingProlog@NACLP-89}, ilProlog, BIM-Prolog~\cite{IndexingProlog@NACLP-89}, SWI-Prolog~\cite{SWI} and
SWI-Prolog~\cite{SWI}, and XSB~\cite{XSB} allow for user-controlled XSB~\cite{XSB} allow for user-controlled multi-argument indexing (via
multi-argument indexing (via an \code{:-~index} directive). an \code{:-~index} directive). Notably, ilProlog~\cite{ilProlog} uses
Unfortunately, this support comes with various implementation compile-time heuristics and generates code for multi-argument indexing
restrictions. For example, in SWI-Prolog at most four arguments can be automatically. In all these systems, this support comes with various
indexed; in XSB the compiler does not offer multi-argument indexing implementation restrictions. For example, in SWI-Prolog at most four
and the predicates need to be asserted instead; we know of no system arguments can be indexed; in XSB the compiler does not offer
where multi-argument indexing looks inside compound terms. More multi-argument indexing and the predicates need to be asserted
importantly, requiring users to specify arguments to index on is instead; we know of no system where multi-argument indexing looks
neither user-friendly nor guarantees good performance results. inside compound terms. More importantly, requiring users to specify
arguments to index on is neither user-friendly nor guarantees good
performance results.
% Trees, tries and unification factoring: % Trees, tries and unification factoring:
Recognizing the need for better indexing, researchers have proposed Recognizing the need for better indexing, researchers have proposed
@ -153,8 +155,8 @@ predicates are known we run the risk of doing indexing on output
arguments, whose only effect is an unnecessary increase in compilation arguments, whose only effect is an unnecessary increase in compilation
times and, more importantly, in code size. In a programming language times and, more importantly, in code size. In a programming language
like Mercury~\cite{Mercury@JLP-96} where modes are known the compiler like Mercury~\cite{Mercury@JLP-96} where modes are known the compiler
can of course avoid this risk; in Mercury modes are in fact used to can of course avoid this risk; indeed in Mercury modes (and types) are
guide the compiler in generating indexing tables. However, the used to guide the compiler in generating good indexing tables. However, the
situation is different for a language like Prolog. Getting accurate situation is different for a language like Prolog. Getting accurate
information about the set of all possible modes of predicates requires information about the set of all possible modes of predicates requires
a global static analyzer in the compiler --- and most Prolog systems a global static analyzer in the compiler --- and most Prolog systems
@ -177,8 +179,8 @@ analysis since queries are often ad hoc and generated only during
runtime as new hypotheses are formed or refined. runtime as new hypotheses are formed or refined.
% %
Our thesis is that the Prolog abstract machine should be able to adapt Our thesis is that the Prolog abstract machine should be able to adapt
automatically to the runtime requirements of such, or even better of automatically to the runtime requirements of such or, even better, of
all, applications by employing increasingly aggressive forms of dynamic all applications by employing increasingly aggressive forms of dynamic
compilation. As a concrete example of what this means in practice, in compilation. As a concrete example of what this means in practice, in
this paper we will attack the problem of providing effective indexing this paper we will attack the problem of providing effective indexing
during runtime. Naturally, we will base our technique on the existing during runtime. Naturally, we will base our technique on the existing
@ -192,40 +194,57 @@ sections.
%\end{itemize} %\end{itemize}
\section{Preliminaries} \label{sec:prelims} \section{Indexing in the WAM} \label{sec:prelims}
%========================================== %================================================
To make the paper relatively self-contained we briefly review the
indexing instructions of the WAM and their use. In the WAM, the first
level of dispatching involves a test on the type of the argument. The
\switchONterm instruction checks the tag of the dereferenced value in
the first argument register and implements a four-way branch where one
branch is for the dereferenced register being an unbound variable, one
for being atomic, one for (non-empty) list, and one for structure. In
any case, control goes to a (possibly empty) bucket of clauses. In the
buckets for constants and structures the second level of dispatching
involves the value of the register. The \switchONconstant and
\switchONstructure instructions implement this dispatching, typically
with a \fail instruction when the bucket is empty, with a \jump
instruction for only one clause, with a sequential scan when the
number of clauses is small, and with a hash lookup when the number of
clauses exceeds a threshold. For this reason the \switchONconstant and
\switchONstructure instructions take as arguments a hash table
\instr{T} and the number of clauses \instr{N} the table contains (or
equivalently, \instr{N} is the size of the hash table). In each bucket
of this hash table and also in the bucket for the variable case of
\switchONterm the code performs a sequential backtracking search of
the clauses using a \TryRetryTrust chain of instructions. The \try
instruction sets up a choice point, the \retry instructions (if any)
update certain fields of this choice point, and the \trust instruction
removes it.
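The two-level dispatch described above can be illustrated with a minimal Python sketch. It is not WAM code and not the implementation of any particular system: the term classes (`Var`, `Cons`, `Struct`) and the helpers `build_index` and `candidates` are hypothetical names introduced only for this illustration. The sketch assumes every clause head has a non-variable first argument, and it uses a dictionary for every bucket size instead of choosing between \fail, \jump, a sequential scan and a hash lookup. The list that `candidates` returns plays the role of the \TryRetryTrust chain.

```python
from dataclasses import dataclass


# Hypothetical term representation; a real WAM inspects tag bits instead.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Cons:          # a non-empty list cell
    head: object
    tail: object


@dataclass(frozen=True)
class Struct:
    functor: str
    args: tuple


def build_index(first_args):
    """Group clause numbers by the first head argument.

    Assumes no clause has a variable as its first head argument.
    """
    index = {"var": [], "const": {}, "list": [], "struct": {}}
    for clause_no, arg in enumerate(first_args):
        index["var"].append(clause_no)           # unbound call: try every clause
        if isinstance(arg, Struct):
            key = (arg.functor, len(arg.args))   # functor name and arity
            index["struct"].setdefault(key, []).append(clause_no)
        elif isinstance(arg, Cons):
            index["list"].append(clause_no)
        else:                                    # atoms and numbers
            index["const"].setdefault(arg, []).append(clause_no)
    return index


def candidates(arg, index):
    """First level: branch on the type of the (dereferenced) argument.
    Second level: look up the value for constants and structures."""
    if isinstance(arg, Var):
        return index["var"]
    if isinstance(arg, Cons):
        return index["list"]
    if isinstance(arg, Struct):
        return index["struct"].get((arg.functor, len(arg.args)), [])
    return index["const"].get(arg, [])           # empty bucket means failure
```

Calling `candidates` with a `Var` returns every clause number, while calling it with a constant that heads no clause returns the empty bucket, which corresponds to immediate failure.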
The WAM has additional indexing instructions (\instr{try\_me\_else}
and friends) that allow indexing to be interspersed with the code of
clauses. For simplicity we will not consider them here. This is not a
problem since the above scheme handles all cases. Also, we will feel
free to make minor modifications and optimizations when this
simplifies things.
\section{Demand-Driven Indexing of Static Predicates} \label{sec:static} We present an example. Consider the Prolog code shown in
%=======================================================================
For static predicates the compiler has complete information about all
clauses and shapes of their head arguments. It is both desirable and
possible to take advantage of this information at compile time and so
we treat the case of static predicates separately.
%
We will do so with schemes of increasing effectiveness and
implementation complexity.
\subsection{A simple WAM extension for any argument indexing}
%------------------------------------------------------------
Let us initially consider the case where the predicates to index
consist only of Datalog facts. This is commonly the case for all
extensional database predicates where indexing is most effective and
called for. One such code example is shown in
Fig.~\ref{fig:carc:facts}. It is a fragment of the well-known machine Fig.~\ref{fig:carc:facts}. It is a fragment of the well-known machine
learning dataset \textit{Carcinogenesis}~\cite{Carcinogenesis@ILP-97}. learning dataset \textit{Carcinogenesis}~\cite{Carcinogenesis@ILP-97}.
The five clauses get compiled to the WAM code shown in The five clauses get compiled to the WAM code shown in
Fig.~\ref{fig:carc:clauses}. Assuming first argument indexing as Fig.~\ref{fig:carc:clauses}. With only first argument indexing, the
default, the indexing code that a Prolog compiler generates is shown indexing code that a Prolog compiler generates is shown in
in Fig.~\ref{fig:carc:index}. This code is typically placed before the Fig.~\ref{fig:carc:index}. This code is typically placed before the
code for the clauses and the \switchONconstant instruction is the code for the clauses and the \switchONconstant instruction is the
entry point of the predicate. Note that compared with vanilla WAM this entry point of the predicate. Note that compared with vanilla WAM this
instruction has an extra argument: the register on the value of which instruction has an extra argument: the register on the value of which
we will index ($r_1$). Another difference from the WAM is that if this we will index ($r_1$). The extra argument will allow us to go beyond
argument register contains an unbound variable instead of a constant first argument indexing. Another departure from the WAM is that if
then execution will continue with the next instruction. The reason for this argument register contains an unbound variable instead of a
the extra argument and this small change in the behavior of constant then execution will continue with the next instruction; in
\switchONconstant will become apparent soon. effect we have merged part of the functionality of \switchONterm into
the \switchONconstant instruction. This small change in the behavior
of \switchONconstant will allow us to get \JITI. Let's see how.
%------------------------------------------------------------------------------ %------------------------------------------------------------------------------
\begin{figure}[t] \begin{figure}[t]
@ -325,6 +344,26 @@ the extra argument and this small change in the behavior of
\end{figure} \end{figure}
%------------------------------------------------------------------------------ %------------------------------------------------------------------------------
\section{Demand-Driven Indexing of Static Predicates} \label{sec:static}
%=======================================================================
For static predicates the compiler has complete information about all
clauses and shapes of their head arguments. It is both desirable and
possible to take advantage of this information at compile time and so
we treat the case of static predicates separately.
%
We will do so with schemes of increasing effectiveness and
implementation complexity.
\subsection{A simple WAM extension for any argument indexing}
%------------------------------------------------------------
Let us initially consider the case where the predicates to index
consist only of Datalog facts. This is commonly the case for all
extensional database predicates where indexing is most effective and
called for.
Refer to the example in Fig.~\ref{fig:carc}.
%
The indexing code of Fig.~\ref{fig:carc:index} incurs a small cost for The indexing code of Fig.~\ref{fig:carc:index} incurs a small cost for
a call where the first argument is a variable (namely, executing the a call where the first argument is a variable (namely, executing the
\switchONconstant instruction) but the instruction pays off for calls \switchONconstant instruction) but the instruction pays off for calls
@ -538,9 +577,9 @@ symbols can be obtained by looking at the second argument of the
\getcon instruction whose argument register is $r_2$. In the loaded \getcon instruction whose argument register is $r_2$. In the loaded
bytecode, assuming the argument register is represented in one byte, bytecode, assuming the argument register is represented in one byte,
these symbols are found $sizeof(\getcon) + sizeof(opcode) + 1$ bytes these symbols are found $sizeof(\getcon) + sizeof(opcode) + 1$ bytes
away from the clause label. Thus, multi-argument \JITI is easy to get away from the clause label; see Fig.~\ref{fig:carc:clauses}. Thus,
and the creation of index tables can be extremely fast when indexing multi-argument \JITI is easy to get and the creation of index tables
Datalog facts. can be extremely fast when indexing Datalog facts.
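The idea of building index tables on demand for Datalog facts can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the abstract machine's algorithm: facts are plain tuples of constants rather than compiled clauses, the class name `DemandIndexedFacts` and its methods are hypothetical, and the sketch indexes on the first bound argument it finds, reading the constants straight from the stored tuples where the text reads them from the \getcon instructions.

```python
class DemandIndexedFacts:
    """Facts of one predicate, indexed lazily per argument position."""

    def __init__(self, facts):
        self.facts = list(facts)
        # argument position -> {constant: [clause numbers]}; empty until needed
        self.tables = {}

    def _build(self, pos):
        """Create the hash table for one argument position in a single pass."""
        table = {}
        for clause_no, fact in enumerate(self.facts):
            table.setdefault(fact[pos], []).append(clause_no)
        self.tables[pos] = table

    def candidates(self, call):
        """Return the clause numbers that may match `call`.

        `call` is a tuple with None standing for an unbound argument
        (an assumption of this sketch).
        """
        for pos, value in enumerate(call):
            if value is None:
                continue                 # unbound: fall through to next argument
            if pos not in self.tables:
                self._build(pos)         # index created on first demand
            return self.tables[pos].get(value, [])
        return list(range(len(self.facts)))  # nothing bound: try every clause
```

A table exists only for argument positions that some call has actually bound, so a predicate that is always called with its second argument bound never pays for an index on the first.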
\subsection{Beyond Datalog and other implementation issues} \subsection{Beyond Datalog and other implementation issues}
%---------------------------------------------------------- %----------------------------------------------------------
@ -644,7 +683,7 @@ visited and continues with a \TryRetryTrust chain pointing to the
clauses. When the index construction is done, the instruction mutates clauses. When the index construction is done, the instruction mutates
to a \switchSTAR WAM instruction. to a \switchSTAR WAM instruction.
%------------------------------------------------------------------------- %-------------------------------------------------------------------------
\begin{Algorithm} \begin{Algorithm}[t]
\caption{Actions of the abstract machine with \JITI} \caption{Actions of the abstract machine with \JITI}
\label{alg:construction} \label{alg:construction}
\begin{enumerate} \begin{enumerate}
@ -733,12 +772,12 @@ $O(1)$ where $n$ is the number of clauses.
The observant reader has no doubt noticed that The observant reader has no doubt noticed that
Algorithm~\ref{alg:construction} provides multi-argument indexing but Algorithm~\ref{alg:construction} provides multi-argument indexing but
only for the main functor symbol of arguments. For clauses with only for the main functor symbol of arguments. For clauses with
structured terms that require indexing in their subterms we can either compound terms that require indexing in their subterms we can either
employ a program transformation like \emph{unification employ a program transformation like \emph{unification
factoring}~\cite{UnifFact@POPL-95} at compile time or modify the factoring}~\cite{UnifFact@POPL-95} at compile time or modify the
algorithm to consider index positions inside structure symbols. This algorithm to consider index positions inside compound terms. This is
is relatively easy to do but requires support from the register relatively easy to do but requires support from the register allocator
allocator (passing the subterms of structures in appropriate argument (passing the subterms of compound terms in appropriate argument
registers) and/or a new set of instructions. Due to space limitations registers) and/or a new set of instructions. Due to space limitations
we omit further details. we omit further details.