Added introduction.
git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1814 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
parent
9f4dc198ba
commit
52c4cfb18f
@ -95,12 +95,55 @@

\section{Introduction}
%=====================
The WAM~\cite{Warren83}
The WAM~\cite{Warren83} has been both a blessing and a curse for
Prolog systems. Its ingenious design has allowed implementors to get
byte code compilers with decent performance; it is not a fluke that
most Prolog systems are still based on the WAM. On the other hand,
\emph{because} the WAM gives good performance in many cases,
implementors have felt reluctant to explore alternatives that
drastically depart from its basic philosophy.
%
For example, first argument indexing makes sense for many Prolog
applications. For applications accessing large databases, though, it is
clearly sub-optimal; for a long time now, the database community has
recognized that good indexing mechanisms are the basis for fast query
processing.

%% The slogan ``first argument indexing is all you need'' makes sense for
%% many Prolog applications. For applications accessing large databases
%% though is clearly false; for long time now, the database community has
%% realized that indexing mechanisms are essential for fast query processing.
As logic programming applications grow in size, Prolog systems need to
efficiently access larger and larger data sets and the need for any-
and multi-argument indexing becomes more and more pressing. Static
generation of multi-argument indexing is one alternative. However,
this alternative is often unattractive because it may drastically and
unnecessarily increase the size of the generated byte code. Static
analysis techniques can partly address this concern, but in
applications that rely on features which are inherently dynamic (e.g.,
generating hypotheses for inductive logic programming data sets during
runtime) they are inapplicable or grossly inaccurate. Another
alternative, which has not been investigated so far, is to do flexible
indexing on demand during program execution.

This is precisely what we advocate in this paper. More specifically,
we present a minimal extension to the WAM that allows for flexible
indexing of Prolog clauses during runtime based on actual demand. For
static predicates, the scheme we propose is partly guided by the
compiler; for dynamic code, besides being demand-driven by queries,
the method needs to cater for code updates during runtime. In our
experience these schemes pay off. We have implemented \JITI in two
different Prolog systems (Yap and XXX) and have obtained non-trivial
speedups, ranging from a few percent to orders of magnitude, across a
wide range of applications. Given these results, we see very little
reason for Prolog systems not to incorporate some form of indexing
based on actual demand from queries. In fact, we see \JITI as only the
first step towards effective runtime optimization of Prolog programs.
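
To make the idea of indexing on demand concrete, the following Python sketch builds a hash index for an argument position only when a call first arrives with that argument bound, and keeps the indexes built so far in step with later updates. It is a toy model over ground facts and is not the WAM extension presented in this paper; the names \texttt{Predicate}, \texttt{call}, \texttt{assertz} and \texttt{builds} are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Predicate:
    """Ground facts of one predicate, with per-argument hash indexes built lazily.

    Illustrative model only: all names here are assumptions, not the paper's design.
    """

    def __init__(self, facts):
        self.facts = list(facts)   # each fact is a tuple of ground arguments
        self.indexes = {}          # argument position -> {value: [facts]}
        self.builds = 0            # how many indexes have been built so far

    def _index_on(self, pos):
        # Build the index for this argument position on first demand only.
        if pos not in self.indexes:
            table = defaultdict(list)
            for fact in self.facts:
                table[fact[pos]].append(fact)
            self.indexes[pos] = table
            self.builds += 1
        return self.indexes[pos]

    def call(self, *args):
        """Return the facts matching args; None stands for an unbound argument."""
        bound = [i for i, a in enumerate(args) if a is not None]
        if not bound:
            candidates = self.facts          # nothing to index on: scan everything
        else:
            first = bound[0]                 # index on the first bound argument
            candidates = self._index_on(first).get(args[first], [])
        return [f for f in candidates if all(f[i] == args[i] for i in bound)]

    def assertz(self, fact):
        """Add a fact at the end and update every index that already exists."""
        self.facts.append(fact)
        for pos, table in self.indexes.items():
            table[fact[pos]].append(fact)


edge = Predicate([("a", "b"), ("a", "c"), ("b", "c")])
second_arg_matches = edge.call(None, "c")   # triggers an index on argument 2 only
```

In this sketch a predicate that is only ever called with its second argument bound never pays for an index on its first argument, which is the space saving that motivates building indexes from actual demand rather than statically.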

This paper is structured as follows. After commenting on the state of
the art and related work concerning indexing in Prolog systems
(Sect.~\ref{sec:related}), we briefly review indexing in the WAM
(Sect.~\ref{sec:prelims}). We then present \JITI schemes for static
(Sect.~\ref{sec:static}) and dynamic (Sect.~\ref{sec:dynamic})
predicates, and discuss their implementation in two Prolog systems and
the performance benefits they bring (Sect.~\ref{sec:perf}). The paper
ends with some concluding remarks.


\section{State of the Art and Related Work} \label{sec:related}
@ -180,7 +223,7 @@ runtime as new hypotheses are formed or refined.
%
Our thesis is that the Prolog abstract machine should be able to adapt
automatically to the runtime requirements of such or, even better, of
all applications by employing increasingly agressive forms of dynamic
all applications by employing increasingly aggressive forms of dynamic
compilation. As a concrete example of what this means in practice, in
this paper we will attack the problem of providing effective indexing
during runtime. Naturally, we will base our technique on the existing
@ -206,12 +249,12 @@ for being atomic, one for (non-empty) list, and one for structure. In
any case, control goes to a (possibly empty) bucket of clauses. In the
buckets for constants and structures the second level of dispatching
involves the value of the register. The \switchONconstant and
\switchONstructure instructions implement this dispatching, typically
\switchONstructure instructions implement this dispatching: typically
with a \fail instruction when the bucket is empty, with a \jump
instruction for only one clause, with a sequential scan when the
number of clauses is small, and with a hash lookup when the number of
clauses exceeds a threshold. For this reason the \switchONconstant and
\switchONstructure instructions take as arguments a hash table
\switchONstructure instructions take as arguments the hash table
\instr{T} and the number of clauses \instr{N} the table contains (or
equivalently, \instr{N} is the size of the hash table). In each bucket
of this hash table and also in the bucket for the variable case of
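
The four dispatching strategies described in this hunk (fail, jump, sequential scan, hash lookup) can be sketched in Python as follows. This is an illustration under stated assumptions, not the instruction encoding of any particular system: the threshold value, the tuple representation of instructions, and the names \texttt{compile\_switch} and \texttt{dispatch} are hypothetical, and the single-clause case is modelled as an unconditional jump that leaves the test of the argument to the head unification of the clause.

```python
FAIL = "fail"
SCAN_THRESHOLD = 8  # assumed cut-off; real systems choose their own value


def compile_switch(table):
    """Choose a dispatching strategy for a list of (constant, label) pairs.

    The constants are assumed to be distinct; a label stands for the bucket
    of clauses to which control is transferred.
    """
    n = len(table)
    if n == 0:
        return ("fail",)                    # empty bucket
    if n == 1:
        return ("jump", table[0][1])        # only one clause: jump to it
    if n <= SCAN_THRESHOLD:
        return ("scan", list(table))        # few clauses: sequential scan
    return ("hash", dict(table), n)         # many clauses: hash table T and size N


def dispatch(instr, value):
    """Return the label selected for the dereferenced register value."""
    kind = instr[0]
    if kind == "fail":
        return FAIL
    if kind == "jump":
        return instr[1]                     # unconditional by assumption
    if kind == "scan":
        for constant, label in instr[1]:
            if constant == value:
                return label
        return FAIL
    return instr[1].get(value, FAIL)        # hash lookup


small = compile_switch([("a", "L1"), ("b", "L2"), ("c", "L3")])
large = compile_switch([(i, "L%d" % i) for i in range(20)])
```

Here \texttt{small} is compiled to a sequential scan and \texttt{large} to a hash lookup; both return \texttt{FAIL} for a constant that appears in no clause.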
@ -222,7 +265,7 @@ update certain fields of this choice point, and the \trust instruction
removes it.

The WAM has additional indexing instructions (\instr{try\_me\_else}
and friends) that allow indexing to be intersperced with the code of
and friends) that allow indexing to be interspersed with the code of
clauses. For simplicity we will not consider them here. This is not a
problem since the above scheme handles all cases. Also, we will feel
free to make some minor modifications and optimizations when this
@ -232,8 +275,8 @@ We present an example. Consider the Prolog code shown in
Fig.~\ref{fig:carc:facts}. It is a fragment of the well-known machine
learning dataset \textit{Carcinogenesis}~\cite{Carcinogenesis@ILP-97}.
The five clauses get compiled to the WAM code shown in
Fig.~\ref{fig:carc:clauses}. With only first argument indexing, the
indexing code that a Prolog compiler generates is shown in
Fig.~\ref{fig:carc:clauses}. The first argument indexing code
that a Prolog compiler generates is shown in
Fig.~\ref{fig:carc:index}. This code is typically placed before the
code for the clauses and the \switchONconstant instruction is the
entry point of the predicate. Note that compared with vanilla WAM this
@ -772,12 +815,12 @@ $O(1)$ where $n$ is the number of clauses.
The observant reader has no doubt noticed that
Algorithm~\ref{alg:construction} provides multi-argument indexing but
only for the main functor symbol of arguments. For clauses with
compound terms that require indexing in their subterms we can either
compound terms that require indexing in their sub-terms we can either
employ a program transformation like \emph{unification
factoring}~\cite{UnifFact@POPL-95} at compile time or modify the
algorithm to consider index positions inside compound terms. This is
relatively easy to do but requires support from the register allocator
(passing the subterms of compound terms in appropriate argument
(passing the sub-terms of compound terms in appropriate argument
registers) and/or a new set of instructions. Due to space limitations
we omit further details.
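
The limitation noted above can be seen in a small Python sketch of indexing on the main functor symbol only. It is not Algorithm~\ref{alg:construction}; the term representation (atoms as strings, compound terms as tuples whose first element is the functor name) and the names \texttt{main\_functor} and \texttt{build\_index} are assumptions made for illustration.

```python
def main_functor(term):
    """Return the index key of a term: name/arity for compound terms, else the term.

    Assumed representation: atoms and numbers are plain Python values,
    compound terms are tuples of the form (name, arg1, ..., argN).
    """
    if isinstance(term, tuple):
        return (term[0], len(term) - 1)
    return term


def build_index(clause_heads, pos):
    """Map each main functor at argument position pos to the clause numbers using it."""
    index = {}
    for number, args in enumerate(clause_heads):
        index.setdefault(main_functor(args[pos]), []).append(number)
    return index


# Three clause heads of a hypothetical unary predicate.
heads = [(("point", 1, 2),), (("point", 3, 4),), ("origin",)]
index = build_index(heads, 0)
```

Both \texttt{point/2} clauses fall into the same bucket even though their sub-terms differ, so a call with \texttt{point(3,4)} still has to try both; telling them apart requires index positions inside the compound term, as discussed above.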