Added introduction.

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1814 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
kostis 2007-03-08 15:19:16 +00:00
parent 9f4dc198ba
commit 52c4cfb18f


@ -95,12 +95,55 @@
\section{Introduction}
%=====================
The WAM~\cite{Warren83} has been both a blessing and a curse for
Prolog systems. Its ingenious design has allowed implementors to get
byte code compilers with decent performance --- it is not a fluke that
most Prolog systems are still based on the WAM. On the other hand,
\emph{because} the WAM gives good performance in many cases,
implementors have felt reluctant to explore alternatives that
drastically depart from its basic philosophy.
%
For example, first argument indexing makes sense for many Prolog
applications. For applications accessing large databases, though, it is
clearly sub-optimal; for a long time now, the database community has
recognized that good indexing mechanisms are the basis for fast query
processing.
%% The slogan ``first argument indexing is all you need'' makes sense for
%% many Prolog applications. For applications accessing large databases
%% though is clearly false; for long time now, the database community has
%% realized that indexing mechanisms are essential for fast query processing.
As logic programming applications grow in size, Prolog systems need to
efficiently access larger and larger data sets, and the need for any-
and multi-argument indexing becomes more and more pressing. Static
generation of multi-argument indexing is one alternative. However,
this alternative is often unattractive because it may drastically
increase the size of the generated byte code unnecessarily. Static
analysis techniques can partly address this concern, but in
applications that rely on features which are inherently dynamic (e.g.,
generating hypotheses for inductive logic programming data sets during
runtime) they are inapplicable or grossly inaccurate. Another
alternative, which has not been investigated so far, is to do flexible
indexing on demand during program execution.
This is precisely what we advocate in this paper. More specifically,
we present a minimal extension to the WAM that allows for flexible
indexing of Prolog clauses during runtime based on actual demand. For
static predicates, the scheme we propose is partly guided by the
compiler; for dynamic code, besides being demand-driven by queries,
the method needs to cater for code updates during runtime. In our
experience these schemes pay off. We have implemented \JITI in two
different Prolog systems (Yap and XXX) and have obtained non-trivial
speedups, ranging from a few percent to orders of magnitude, across a
wide range of applications. Given these results, we see very little
reason for Prolog systems not to incorporate some form of indexing
based on actual demand from queries. In fact, we see \JITI as only the
first step towards effective runtime optimization of Prolog programs.
This paper is structured as follows. After commenting on the state of
the art and related work concerning indexing in Prolog systems
(Sect.~\ref{sec:related}) we briefly review indexing in the WAM
(Sect.~\ref{sec:prelims}). We then present \JITI schemes for static
(Sect.~\ref{sec:static}) and dynamic (Sect.~\ref{sec:dynamic})
predicates, and discuss their implementation in two Prolog systems and
the performance benefits they bring (Sect.~\ref{sec:perf}). The paper
ends with some concluding remarks.
\section{State of the Art and Related Work} \label{sec:related}
@ -180,7 +223,7 @@ runtime as new hypotheses are formed or refined.
%
Our thesis is that the Prolog abstract machine should be able to adapt
automatically to the runtime requirements of such or, even better, of
all applications by employing increasingly aggressive forms of dynamic
compilation. As a concrete example of what this means in practice, in
this paper we will attack the problem of providing effective indexing
during runtime. Naturally, we will base our technique on the existing
@ -206,12 +249,12 @@ for being atomic, one for (non-empty) list, and one for structure. In
any case, control goes to a (possibly empty) bucket of clauses. In the
buckets for constants and structures the second level of dispatching
involves the value of the register. The \switchONconstant and
\switchONstructure instructions implement this dispatching: typically
with a \fail instruction when the bucket is empty, with a \jump
instruction for only one clause, with a sequential scan when the
number of clauses is small, and with a hash lookup when the number of
clauses exceeds a threshold. For this reason the \switchONconstant and
\switchONstructure instructions take as arguments the hash table
\instr{T} and the number of clauses \instr{N} the table contains (or
equivalently, \instr{N} is the size of the hash table). In each bucket
of this hash table and also in the bucket for the variable case of
@ -222,7 +265,7 @@ update certain fields of this choice point, and the \trust instruction
removes it.
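
As a rough illustration of the second-level dispatch described above,
the following Python sketch models how a \switchONconstant-like
instruction could pick one of the four strategies depending on the
number of clauses. It is not WAM code: the names
\texttt{build\_switch}, \texttt{dispatch} and the cut-off
\texttt{HASH\_THRESHOLD} are hypothetical, and clause labels are plain
strings.

```python
# Hypothetical model of second-level dispatch on a constant; not WAM code.
HASH_THRESHOLD = 4  # assumed cut-off between sequential scan and hashing


def build_switch(entries):
    """Choose a dispatch strategy for a bucket.

    entries: list of (constant, label) pairs, one per clause in the bucket.
    """
    n = len(entries)
    if n == 0:
        return ("fail",)                # empty bucket: fail immediately
    if n == 1:
        return ("jump", entries[0][1])  # one clause: jump straight to it
    if n <= HASH_THRESHOLD:
        return ("scan", list(entries))  # few clauses: sequential scan
    table = {}
    for key, label in entries:          # many clauses: hash on the constant
        table.setdefault(key, []).append(label)
    return ("hash", table)


def dispatch(switch, value):
    """Return the candidate clause labels for value (empty list = failure)."""
    kind = switch[0]
    if kind == "fail":
        return []
    if kind == "jump":
        # No key test here: the clause's own head unification may still fail.
        return [switch[1]]
    if kind == "scan":
        return [label for key, label in switch[1] if key == value]
    return switch[1].get(value, [])
```

In this model a bucket of three clauses is scanned sequentially, while
a bucket of ten clauses is accessed through the hash table; in a real
implementation the candidate list would correspond to a
\instr{try}-\instr{retry}-\instr{trust} chain.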
The WAM has additional indexing instructions (\instr{try\_me\_else}
and friends) that allow indexing to be interspersed with the code of
clauses. For simplicity we will not consider them here. This is not a
problem since the above scheme handles all cases. Also, we will feel
free to make some minor modifications and optimizations when this
@ -232,8 +275,8 @@ We present an example. Consider the Prolog code shown in
Fig.~\ref{fig:carc:facts}. It is a fragment of the well-known machine
learning dataset \textit{Carcinogenesis}~\cite{Carcinogenesis@ILP-97}.
The five clauses get compiled to the WAM code shown in
Fig.~\ref{fig:carc:clauses}. The first argument indexing code
that a Prolog compiler generates is shown in
Fig.~\ref{fig:carc:index}. This code is typically placed before the
code for the clauses and the \switchONconstant instruction is the
entry point of the predicate. Note that compared with vanilla WAM this
@ -772,12 +815,12 @@ $O(1)$ where $n$ is the number of clauses.
The observant reader has no doubt noticed that
Algorithm~\ref{alg:construction} provides multi-argument indexing but
only for the main functor symbol of arguments. For clauses with
compound terms that require indexing in their sub-terms we can either
employ a program transformation like \emph{unification
factoring}~\cite{UnifFact@POPL-95} at compile time or modify the
algorithm to consider index positions inside compound terms. This is
relatively easy to do but requires support from the register allocator
(passing the sub-terms of compound terms in appropriate argument
registers) and/or a new set of instructions. Due to space limitations
we omit further details.
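
To make the limitation concrete, the following Python sketch shows what
indexing on the main functor symbol alone can and cannot distinguish.
It is an illustration only and not the construction algorithm itself:
the term encoding (atoms as strings, compound terms as tuples) and the
names \texttt{main\_functor} and \texttt{build\_index} are assumptions
made for this example, and clause heads are assumed to have no
variables in the indexed position.

```python
from collections import defaultdict


def main_functor(term):
    """Return (name, arity) of a term.

    Assumed encoding: atoms are strings, compound terms are tuples
    of the form (name, arg1, ..., argN).
    """
    if isinstance(term, tuple):
        return (term[0], len(term) - 1)
    return (term, 0)


def build_index(clauses, pos):
    """Group clause numbers by the main functor of argument pos."""
    index = defaultdict(list)
    for i, args in enumerate(clauses):
        index[main_functor(args[pos])].append(i)
    return dict(index)


# Three hypothetical clause heads, p(f(a), x), p(f(b), y) and p(c, z).
clauses = [
    (("f", "a"), "x"),
    (("f", "b"), "y"),
    ("c", "z"),
]
index = build_index(clauses, 0)
```

Here the first two clauses fall into the same bucket, keyed by
\texttt{f/1}, even though their sub-terms \texttt{a} and \texttt{b}
differ; telling them apart would require indexing inside the compound
term.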