From 52c4cfb18fc10d2a61aa34b5b74f7b839d9ddcfe Mon Sep 17 00:00:00 2001
From: kostis
Date: Thu, 8 Mar 2007 15:19:16 +0000
Subject: [PATCH] Added introduction.

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1814 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
---
 docs/index/iclp07.tex | 69 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 13 deletions(-)

diff --git a/docs/index/iclp07.tex b/docs/index/iclp07.tex
index 388f2addd..6dddc2041 100644
--- a/docs/index/iclp07.tex
+++ b/docs/index/iclp07.tex
@@ -95,12 +95,55 @@
 
 \section{Introduction}
 %=====================
-The WAM~\cite{Warren83}
+The WAM~\cite{Warren83} has been both a blessing and a curse for
+Prolog systems. Its ingenious design has allowed implementors to get
+byte code compilers with decent performance --- it is not a fluke that
+most Prolog systems are still based on the WAM. On the other hand,
+\emph{because} the WAM gives good performance in many cases,
+implementors have felt reluctant to explore alternatives that
+drastically depart from its basic philosophy.
+%
+For example, first argument indexing makes sense for many Prolog
+applications. For applications accessing large databases, though, it is
+clearly sub-optimal; for a long time now, the database community has
+recognized that good indexing mechanisms are the basis for fast query
+processing.
 
-%% The slogan ``first argument indexing is all you need'' makes sense for
-%% many Prolog applications. For applications accessing large databases
-%% though is clearly false; for long time now, the database community has
-%% realized that indexing mechanisms are essential for fast query processing.
+As logic programming applications grow in size, Prolog systems need to
+efficiently access larger and larger data sets and the need for any-
+and multi-argument indexing becomes more and more profound. Static
+generation of multi-argument indexing is one alternative. However,
+this alternative is often unattractive because it may drastically
+increase the size of the generated byte code unnecessarily. Static
+analysis techniques can partly address this concern, but in
+applications that rely on features which are inherently dynamic (e.g.,
+generating hypotheses for inductive logic programming data sets during
+runtime) they are inapplicable or grossly inaccurate. Another
+alternative, which has not been investigated so far, is to do flexible
+indexing on demand during program execution.
+
+This is precisely what we advocate in this paper. More specifically,
+we present a minimal extension to the WAM that allows for flexible
+indexing of Prolog clauses during runtime based on actual demand. For
+static predicates, the scheme we propose is partly guided by the
+compiler; for dynamic code, besides being demand-driven by queries,
+the method needs to cater for code updates during runtime. In our
+experience these schemes pay off. We have implemented \JITI in two
+different Prolog systems (Yap and XXX) and have obtained non-trivial
+speedups, ranging from a few percent to orders of magnitude, across a
+wide range of applications. Given these results, we see very little
+reason for Prolog systems not to incorporate some form of indexing
+based on actual demand from queries. In fact, we see \JITI as only the
+first step towards effective runtime optimization of Prolog programs.
+
+This paper is structured as follows. After commenting on the state of
+the art and related work concerning indexing in Prolog systems
+(Sect.~\ref{sec:related}) we briefly review indexing in the WAM
+(Sect.~\ref{sec:prelims}). We then present \JITI schemes for static
+(Sect.~\ref{sec:static}) and dynamic (Sect.~\ref{sec:dynamic})
+predicates, and discuss their implementation in two Prolog systems and
+the performance benefits they bring (Sect.~\ref{sec:perf}). The paper
+ends with some concluding remarks.
 
 \section{State of the Art and Related Work}
 \label{sec:related}
@@ -180,7 +223,7 @@ runtime as new hypotheses are formed or refined.
 %
 Our thesis is that the Prolog abstract machine should be able to adapt
 automatically to the runtime requirements of such or, even better, of
-all applications by employing increasingly agressive forms of dynamic
+all applications by employing increasingly aggressive forms of dynamic
 compilation. As a concrete example of what this means in practice, in
 this paper we will attack the problem of providing effective indexing
 during runtime. Naturally, we will base our technique on the existing
@@ -206,12 +249,12 @@ for being atomic, one for (non-empty) list, and one for structure. In
 any case, control goes to a (possibly empty) bucket of clauses. In the
 buckets for constants and structures the second level of dispatching
 involves the value of the register. The \switchONconstant and
-\switchONstructure instructions implement this dispatching, typically
+\switchONstructure instructions implement this dispatching: typically
 with a \fail instruction when the bucket is empty, with a \jump
 instruction for only one clause, with a sequential scan when the number
 of clauses is small, and with a hash lookup when the number of clauses
 exceeds a threshold. For this reason the \switchONconstant and
-\switchONstructure instructions take as arguments a hash table
+\switchONstructure instructions take as arguments the hash table
 \instr{T} and the number of clauses \instr{N} the table contains (or
 equivalently, \instr{N} is the size of the hash table). In each bucket
 of this hash table and also in the bucket for the variable case of
@@ -222,7 +265,7 @@ update certain fields of this choice point, and the \trust instruction
 removes it.
 
 The WAM has additional indexing instructions (\instr{try\_me\_else}
-and friends) that allow indexing to be intersperced with the code of
+and friends) that allow indexing to be interspersed with the code of
 clauses. For simplicity we will not consider them here. This is not a
 problem since the above scheme handles all cases. Also, we will feel
 free to do some minor modifications and optimizations when this
@@ -232,8 +275,8 @@ We present an example. Consider the Prolog code shown in
 Fig.~\ref{fig:carc:facts}. It is a fragment of the well-known machine
 learning dataset \textit{Carcinogenesis}~\cite{Carcinogenesis@ILP-97}.
 The five clauses get compiled to the WAM code shown in
-Fig.~\ref{fig:carc:clauses}. With only first argument indexing, the
-indexing code that a Prolog compiler generates is shown in
+Fig.~\ref{fig:carc:clauses}. The first argument indexing code
+that a Prolog compiler generates is shown in
 Fig.~\ref{fig:carc:index}. This code is typically placed before the
 code for the clauses and the \switchONconstant instruction is the entry
 point of predicate. Note that compared with vanilla WAM this
@@ -772,12 +815,12 @@ $O(1)$ where $n$ is the number of clauses.
 The observant reader has no doubt noticed that
 Algorithm~\ref{alg:construction} provides multi-argument indexing but
 only for the main functor symbol of arguments. For clauses with
-compound terms that require indexing in their subterms we can either
+compound terms that require indexing in their sub-terms we can either
 employ a program transformation like \emph{unification
 factoring}~\cite{UnifFact@POPL-95} at compile time or modify the
 algorithm to consider index positions inside compound terms. This is
 relatively easy to do but requires support from the register allocator
-(passing the subterms of compound terms in appropriate argument
+(passing the sub-terms of compound terms in appropriate argument
 registers) and/or a new set of instructions. Due to space limitations
 we omit further details.