Added section 3.

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1813 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
2007-03-08 12:07:35 +00:00
parent 3b4bfa28f0
commit 9f4dc198ba
1 changed files with 87 additions and 48 deletions
--- a/docs/index/iclp07.tex
+++ b/docs/index/iclp07.tex
@@ -115,16 +115,18 @@ fully populated only when it is certain that execution will enter the
 clause body. While shallow backtracking avoids some of the performance
 problems of unnecessary choice point creation, it does not offer the
 full benefits that indexing can provide. Other systems like
-BIM-Prolog~\cite{IndexingProlog@NACLP-89}, ilProlog,
+BIM-Prolog~\cite{IndexingProlog@NACLP-89}, SWI-Prolog~\cite{SWI} and
-SWI-Prolog~\cite{SWI}, and XSB~\cite{XSB} allow for user-controlled
+XSB~\cite{XSB} allow for user-controlled multi-argument indexing (via
-multi-argument indexing (via an \code{:-~index} directive).
+an \code{:-~index} directive). Notably, ilProlog~\cite{ilProlog} uses
-Unfortunatelly, this support comes with various implementation
+compile-time heuristics and generates code for multi-argument indexing
-restrictions. For example, in SWI-Prolog at most four arguments can be
+automatically. In all these systems, this support comes with various
-indexed; in XSB the compiler does not offer multi-argument indexing
+implementation restrictions. For example, in SWI-Prolog at most four
-and the predicates need to be asserted instead; we know of no system
+arguments can be indexed; in XSB the compiler does not offer
-where multi-argument indexing looks inside compound terms. More
+multi-argument indexing and the predicates need to be asserted
-importantly, requiring users to specify arguments to index on is
+instead; we know of no system where multi-argument indexing looks
-neither user-friendly nor guarantees good performance results.
+inside compound terms. More importantly, requiring users to specify
 arguments to index on is neither user-friendly nor guarantees good
 performance results.
 % Trees, tries and unification factoring:
 Recognizing the need for better indexing, researchers have proposed
@@ -153,8 +155,8 @@ predicates are known we run the risk of doing indexing on output
 arguments, whose only effect is an unnecessary increase in compilation
 times and, more importantly, in code size. In a programming language
 like Mercury~\cite{Mercury@JLP-96} where modes are known the compiler
-can of course avoid this risk; in Mercury modes are in fact used to
+can of course avoid this risk; indeed in Mercury modes (and types) are
-guide the compiler in generating indexing tables. However, the
+used to guide the compiler in generating good indexing tables. However, the
 situation is different for a language like Prolog. Getting accurate
 information about the set of all possible modes of predicates requires
 a global static analyzer in the compiler --- and most Prolog systems
@@ -177,8 +179,8 @@ analysis since queries are often ad hoc and generated only during
 runtime as new hypotheses are formed or refined.
 %
 Our thesis is that the Prolog abstract machine should be able to adapt
-automatically to the runtime requirements of such, or even better of
+automatically to the runtime requirements of such or, even better, of
-all, applications by employing increasingly agressive forms of dynamic
+all applications by employing increasingly aggressive forms of dynamic
 compilation. As a concrete example of what this means in practice, in
 this paper we will attack the problem of providing effective indexing
 during runtime. Naturally, we will base our technique on the existing
@@ -192,40 +194,57 @@ sections.
 %\end{itemize}
-\section{Preliminaries} \label{sec:prelims}
+\section{Indexing in the WAM} \label{sec:prelims}
-%==========================================
+%================================================
 To make the paper relatively self-contained we briefly review the
 indexing instructions of the WAM and their use. In the WAM, the first
 level of dispatching involves a test on the type of the argument. The
 \switchONterm instruction checks the tag of the dereferenced value in
 the first argument register and implements a four-way branch where one
 branch is for the dereferenced register being an unbound variable, one
 for being atomic, one for (non-empty) list, and one for structure. In
 any case, control goes to a (possibly empty) bucket of clauses. In the
 buckets for constants and structures the second level of dispatching
 involves the value of the register. The \switchONconstant and
 \switchONstructure instructions implement this dispatching, typically
 with a \fail instruction when the bucket is empty, with a \jump
 instruction for only one clause, with a sequential scan when the
 number of clauses is small, and with a hash lookup when the number of
 clauses exceeds a threshold. For this reason the \switchONconstant and
 \switchONstructure instructions take as arguments a hash table
 \instr{T} and the number of clauses \instr{N} the table contains (or
 equivalently, \instr{N} is the size of the hash table). In each bucket
 of this hash table and also in the bucket for the variable case of
 \switchONterm the code performs a sequential backtracking search of
 the clauses using a \TryRetryTrust chain of instructions. The \try
 instruction sets up a choice point, the \retry instructions (if any)
 update certain fields of this choice point, and the \trust instruction
 removes it.
 The WAM has additional indexing instructions (\instr{try\_me\_else}
 and friends) that allow indexing to be interspersed with the code of
 clauses. For simplicity we will not consider them here. This is not a
 problem since the above scheme handles all cases. Also, we will feel
 free to make some minor modifications and optimizations when this
 simplifies things.
-\section{Demand-Driven Indexing of Static Predicates} \label{sec:static}
+We present an example. Consider the Prolog code shown in
 %=======================================================================
 For static predicates the compiler has complete information about all
 clauses and shapes of their head arguments. It is both desirable and
 possible to take advantage of this information at compile time and so
 we treat the case of static predicates separately.
 %
 We will do so with schemes of increasing effectiveness and
 implementation complexity.
 \subsection{A simple WAM extension for any argument indexing}
 %------------------------------------------------------------
 Let us initially consider the case where the predicates to index
 consist only of Datalog facts. This is commonly the case for all
 extensional database predicates where indexing is most effective and
 called for. One such code example is shown in
 Fig.~\ref{fig:carc:facts}. It is a fragment of the well-known machine
 learning dataset \textit{Carcinogenesis}~\cite{Carcinogenesis@ILP-97}.
 The five clauses get compiled to the WAM code shown in
-Fig.~\ref{fig:carc:clauses}. Assuming first argument indexing as
+Fig.~\ref{fig:carc:clauses}. With only first argument indexing, the
-default, the indexing code that a Prolog compiler generates is shown
+indexing code that a Prolog compiler generates is shown in
-in Fig.~\ref{fig:carc:index}. This code is typically placed before the
+Fig.~\ref{fig:carc:index}. This code is typically placed before the
 code for the clauses and the \switchONconstant instruction is the
 entry point of the predicate. Note that compared with vanilla WAM this
 instruction has an extra argument: the register on the value of which
-we will index ($r_1$). Another difference from the WAM is that if this
+we will index ($r_1$). The extra argument will allow us to go beyond
-argument register contains an unbound variable instead of a constant
+first argument indexing. Another departure from the WAM is that if
-then execution will continue with the next instruction. The reason for
+this argument register contains an unbound variable instead of a
-the extra argument and this small change in the behavior of
+constant then execution will continue with the next instruction; in
-\switchONconstant will become apparent soon.
+effect we have merged part of the functionality of \switchONterm into
 the \switchONconstant instruction. This small change in the behavior
 of \switchONconstant will allow us to get \JITI. Let's see how.
 %------------------------------------------------------------------------------
 \begin{figure}[t]
@@ -325,6 +344,26 @@ the extra argument and this small change in the behavior of
 \end{figure}
 %------------------------------------------------------------------------------
 \section{Demand-Driven Indexing of Static Predicates} \label{sec:static}
 %=======================================================================
 For static predicates the compiler has complete information about all
 clauses and shapes of their head arguments. It is both desirable and
 possible to take advantage of this information at compile time and so
 we treat the case of static predicates separately.
 %
 We will do so with schemes of increasing effectiveness and
 implementation complexity.
 \subsection{A simple WAM extension for any argument indexing}
 %------------------------------------------------------------
 Let us initially consider the case where the predicates to index
 consist only of Datalog facts. This is commonly the case for all
 extensional database predicates where indexing is most effective and
 called for.
 Refer to the example in Fig.~\ref{fig:carc}.
 %
 The indexing code of Fig.~\ref{fig:carc:index} incurs a small cost for
 a call where the first argument is a variable (namely, executing the
 \switchONconstant instruction) but the instruction pays off for calls
@@ -538,9 +577,9 @@ symbols can be obtained by looking at the second argument of the
 \getcon instruction whose argument register is $r_2$. In the loaded
 bytecode, assuming the argument register is represented in one byte,
 these symbols are found $sizeof(\getcon) + sizeof(opcode) + 1$ bytes
-away from the clause label. Thus, multi-argument \JITI is easy to get
+away from the clause label; see Fig.~\ref{fig:carc:clauses}. Thus,
-and the creation of index tables can be extremely fast when indexing
+multi-argument \JITI is easy to get and the creation of index tables
-Datalog facts.
+can be extremely fast when indexing Datalog facts.
 \subsection{Beyond Datalog and other implementation issues}
 %----------------------------------------------------------
@@ -644,7 +683,7 @@ visited and continues with a \TryRetryTrust chain pointing to the
 clauses. When the index construction is done, the instruction mutates
 to a \switchSTAR WAM instruction.
 %-------------------------------------------------------------------------
-\begin{Algorithm}
+\begin{Algorithm}[t]
  \caption{Actions of the abstract machine with \JITI}
  \label{alg:construction}
  \begin{enumerate}
@@ -733,12 +772,12 @@ $O(1)$ where $n$ is the number of clauses.
 The observant reader has no doubt noticed that
 Algorithm~\ref{alg:construction} provides multi-argument indexing but
 only for the main functor symbol of arguments. For clauses with
-structured terms that require indexing in their subterms we can either
+compound terms that require indexing in their subterms we can either
 employ a program transformation like \emph{unification
 factoring}~\cite{UnifFact@POPL-95} at compile time or modify the
-algorithm to consider index positions inside structure symbols. This
+algorithm to consider index positions inside compound terms. This is
-is relatively easy to do but requires support from the register
+relatively easy to do but requires support from the register allocator
-allocator (passing the subterms of structures in appropriate argument
+(passing the subterms of compound terms in appropriate argument
 registers) and/or a new set of instructions. Due to space limitations
 we omit further details.
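
As a reading aid for the demand-driven indexing idea that the changed text describes (index tables for Datalog facts are created only when a call arrives with a bound argument), here is a minimal sketch in Python. It is not the abstract-machine implementation from the paper: the class `FactTable`, its methods, and the sample facts are hypothetical names chosen for illustration, `None` stands in for an unbound variable, and a dictionary stands in for the hash table of a switch instruction.

```python
from collections import defaultdict


class FactTable:
    """Hypothetical store for Datalog facts; indices are built on first use."""

    def __init__(self, facts):
        self.facts = list(facts)   # clauses kept in source order
        self.indices = {}          # argument position -> {constant: [clause numbers]}

    def _build_index(self, pos):
        # One pass over the clauses, grouping clause numbers by the
        # constant found at argument position `pos`.
        table = defaultdict(list)
        for clause_no, fact in enumerate(self.facts):
            table[fact[pos]].append(clause_no)
        self.indices[pos] = dict(table)

    def call(self, *args):
        """Return the matching facts in clause order; None means unbound."""
        bound = [pos for pos, arg in enumerate(args) if arg is not None]
        if bound:
            pos = bound[0]                  # assumption: use the first bound argument
            if pos not in self.indices:     # index created on demand, then reused
                self._build_index(pos)
            candidates = self.indices[pos].get(args[pos], [])
        else:
            candidates = range(len(self.facts))   # all arguments unbound: try every clause
        return [
            self.facts[i]
            for i in candidates
            if all(a is None or a == f for a, f in zip(args, self.facts[i]))
        ]


# Made-up facts, for illustration only.
edges = FactTable([("a", "b"), ("a", "c"), ("b", "c")])
first = edges.call("a", None)    # builds the index on argument position 0
second = edges.call(None, "c")   # builds the index on argument position 1
```

In this sketch a position that is never bound in any call never gets an index, which mirrors the motivation given in the text for avoiding indexing on output arguments. Picking the first bound argument is a simplification made here, not a claim about the policy used by the actual system.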