Added introduction.

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1814 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
2007-03-08 15:19:16 +00:00
parent 9f4dc198ba
commit 52c4cfb18f
1 changed files with 56 additions and 13 deletions
--- a/docs/index/iclp07.tex
+++ b/docs/index/iclp07.tex
@@ -95,12 +95,55 @@

 \section{Introduction}
 %=====================
-The WAM~\cite{Warren83}
+The WAM~\cite{Warren83} has been both a blessing and a curse for
+Prolog systems. Its ingenious design has allowed implementors to get
+byte code compilers with decent performance --- it is not a fluke that
+most Prolog systems are still based on the WAM. On the other hand,
+\emph{because} the WAM gives good performance in many cases,
+implementors have felt reluctant to explore alternatives that
+drastically depart from its basic philosophy.
+%
+For example, first argument indexing makes sense for many Prolog
+applications. For applications accessing large databases, though, it is
+clearly sub-optimal; for a long time now, the database community has
+recognized that good indexing mechanisms are the basis for fast query
+processing.

-%% The slogan ``first argument indexing is all you need'' makes sense for
-%% many Prolog applications. For applications accessing large databases
-%% though is clearly false; for long time now, the database community has
-%% realized that indexing mechanisms are essential for fast query processing.
+As logic programming applications grow in size, Prolog systems need to
+efficiently access larger and larger data sets, and the need for any-
+and multi-argument indexing becomes more and more pressing. Static
+generation of multi-argument indexing is one alternative. However,
+this alternative is often unattractive because it may drastically
+increase the size of the generated byte code unnecessarily. Static
+analysis techniques can partly address this concern, but in
+applications that rely on features which are inherently dynamic (e.g.,
+generating hypotheses for inductive logic programming data sets during
+runtime) they are inapplicable or grossly inaccurate. Another
+alternative, which has not been investigated so far, is to do flexible
+indexing on demand during program execution.
+
+This is precisely what we advocate in this paper. More specifically,
+we present a minimal extension to the WAM that allows for flexible
+indexing of Prolog clauses during runtime based on actual demand. For
+static predicates, the scheme we propose is partly guided by the
+compiler; for dynamic code, besides being demand-driven by queries,
+the method needs to cater for code updates during runtime. In our
+experience these schemes pay off. We have implemented \JITI in two
+different Prolog systems (Yap and XXX) and have obtained non-trivial
+speedups, ranging from a few percent to orders of magnitude, across a
+wide range of applications. Given these results, we see very little
+reason for Prolog systems not to incorporate some form of indexing
+based on actual demand from queries. In fact, we see \JITI as only the
+first step towards effective runtime optimization of Prolog programs.
+
+This paper is structured as follows. After commenting on the state of
+the art and related work concerning indexing in Prolog systems
+(Sect.~\ref{sec:related}), we briefly review indexing in the WAM
+(Sect.~\ref{sec:prelims}). We then present \JITI schemes for static
+(Sect.~\ref{sec:static}) and dynamic (Sect.~\ref{sec:dynamic})
+predicates, and discuss their implementation in two Prolog systems and
+the performance benefits they bring (Sect.~\ref{sec:perf}). The paper
+ends with some concluding remarks.


 \section{State of the Art and Related Work} \label{sec:related}
@@ -180,7 +223,7 @@ runtime as new hypotheses are formed or refined.
 %
 Our thesis is that the Prolog abstract machine should be able to adapt
 automatically to the runtime requirements of such or, even better, of
-all applications by employing increasingly agressive forms of dynamic
+all applications by employing increasingly aggressive forms of dynamic
 compilation. As a concrete example of what this means in practice, in
 this paper we will attack the problem of providing effective indexing
 during runtime. Naturally, we will base our technique on the existing
@@ -206,12 +249,12 @@ for being atomic, one for (non-empty) list, and one for structure. In
 any case, control goes to a (possibly empty) bucket of clauses. In the
 buckets for constants and structures the second level of dispatching
 involves the value of the register. The \switchONconstant and
-\switchONstructure instructions implement this dispatching, typically
+\switchONstructure instructions implement this dispatching: typically
 with a \fail instruction when the bucket is empty, with a \jump
 instruction for only one clause, with a sequential scan when the
 number of clauses is small, and with a hash lookup when the number of
 clauses exceeds a threshold. For this reason the \switchONconstant and
-\switchONstructure instructions take as arguments a hash table
+\switchONstructure instructions take as arguments the hash table
 \instr{T} and the number of clauses \instr{N} the table contains (or
 equivalently, \instr{N} is the size of the hash table). In each bucket
 of this hash table and also in the bucket for the variable case of
@@ -222,7 +265,7 @@ update certain fields of this choice point, and the \trust instruction
 removes it.

 The WAM has additional indexing instructions (\instr{try\_me\_else}
-and friends) that allow indexing to be intersperced with the code of
+and friends) that allow indexing to be interspersed with the code of
 clauses. For simplicity we will not consider them here. This is not a
 problem since the above scheme handles all cases. Also, we will feel
 free to do some minor modifications and optimizations when this
@@ -232,8 +275,8 @@ We present an example. Consider the Prolog code shown in
 Fig.~\ref{fig:carc:facts}. It is a fragment of the well-known machine
 learning dataset \textit{Carcinogenesis}~\cite{Carcinogenesis@ILP-97}.
 The five clauses get compiled to the WAM code shown in
-Fig.~\ref{fig:carc:clauses}. With only first argument indexing, the
-indexing code that a Prolog compiler generates is shown in
+Fig.~\ref{fig:carc:clauses}. The first argument indexing code
+that a Prolog compiler generates is shown in
 Fig.~\ref{fig:carc:index}. This code is typically placed before the
 code for the clauses and the \switchONconstant instruction is the
 entry point of predicate. Note that compared with vanilla WAM this
@@ -772,12 +815,12 @@ $O(1)$ where $n$ is the number of clauses.
 The observant reader has no doubt noticed that
 Algorithm~\ref{alg:construction} provides multi-argument indexing but
 only for the main functor symbol of arguments. For clauses with
-compound terms that require indexing in their subterms we can either
+compound terms that require indexing in their sub-terms we can either
 employ a program transformation like \emph{unification
 factoring}~\cite{UnifFact@POPL-95} at compile time or modify the
 algorithm to consider index positions inside compound terms. This is
 relatively easy to do but requires support from the register allocator
-(passing the subterms of compound terms in appropriate argument
+(passing the sub-terms of compound terms in appropriate argument
 registers) and/or a new set of instructions. Due to space limitations
 we omit further details.