Wrote concluding remarks.

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1839 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
kostis 2007-03-12 11:10:24 +00:00
parent 352267fc59
commit a75f5db073


@@ -3,6 +3,7 @@
%------------------------------------------------------------------------------
\usepackage{a4wide}
\usepackage{float}
\usepackage{alltt}
\usepackage{xspace}
\usepackage{epsfig}
\usepackage{wrapfig}
@@ -977,10 +978,9 @@ YAP uses the term JITI (Just-In-Time Indexing) to refer to \JITI. In
the next section we will take the liberty to use this term as a
convenient abbreviation.
\section{Performance Evaluation} \label{sec:perf}
%================================================
We evaluate \JITI on a set of benchmarks and logic programming applications.
We evaluate \JITI on a set of benchmarks and LP applications.
Throughout, we compare performance of JITI with first argument
indexing. For the benchmarks of Sect.~\ref{sec:perf:ineffective}
and~\ref{sec:perf:effective} which involve both systems, we used a
@@ -1002,15 +1002,15 @@ construction. We therefore wanted to measure this overhead.
As both systems support tabling, we decided to use tabling benchmarks
because they are small and easy to understand, and because they are a
worst case for JITI in the following sense: tabling avoids generating
repetitive queries and the benchmarks operate over EDB predicates of
size approximately equal to the size of the program. We used \compress, a
tabled program that solves a puzzle from an ICLP Prolog programming
competition. The other benchmarks are different variants of tabled
left, right and doubly recursive transitive closure over an EDB
predicate forming a chain of size shown in Table~\ref{tab:ineffective}
in parentheses. For each variant of transitive closure, we issue two
queries: one with mode \code{(in,out)} and one with mode
\code{(out,out)}.
repetitive queries and the benchmarks operate over extensional
database (EDB) predicates of size approximately equal to the size of the
program. We used \compress, a tabled program that solves a puzzle from
an ICLP Prolog programming competition. The other benchmarks are
different variants of tabled left, right and doubly recursive
transitive closure over an EDB predicate forming a chain of size shown
in Table~\ref{tab:ineffective} in parentheses. For each variant of
transitive closure, we issue two queries: one with mode
\code{(in,out)} and one with mode \code{(out,out)}.
%
For YAP, indices on the first argument and \TryRetryTrust chains are
built on all benchmarks under \JITI.
@@ -1023,13 +1023,43 @@ ineffective, incurs a runtime overhead that is at the level of noise
and goes mostly unnoticed.
%
We also note that our aim here is \emph{not} to compare the two
systems, so the reader should read the \textbf{YAP} and \textbf{XXX}
columns separately.
systems, so the \textbf{YAP} and \textbf{XXX} columns should be read
separately.
\vspace*{-0.5em}
\subsection{Performance of \JITI when effective} \label{sec:perf:effective}
%--------------------------------------------------------------------------
On the other hand, when \JITI is effective, it can significantly
improve runtime performance. We use the following programs and
applications:
%% \TODO{For the journal version we should also add FSA benchmarks
%% (\bench{k963}, \bench{dg5} and \bench{tl3})}
%------------------------------------------------------------------------------
\begin{small}
\begin{description}
\item[\sgCyl] The same generation DB benchmark on a $24 \times 24
\times 2$ cylinder. We issue the open query.
\item[\muta] A computationally intensive application where most
  predicates are defined intensionally.
\item[\pta] A tabled logic program implementing Andersen's points-to
analysis~\cite{anderson-phd}. A medium-sized imperative program is
encoded as a set of facts (about 16,000) and properties of interest
are encoded using rules. Program properties can then be determined
by checking the closure of these rules.
\item[\tea] Another analyzer using tabling to implement Andersen's
points-to analysis. The analyzed program, the \texttt{javac} SPEC
benchmark, is encoded in a file of 411,696 facts (62,759,581 bytes
in total). As its compilation exceeds the limits of the XXX compiler
(w/o JITI), we run this benchmark only in YAP.
\end{description}
\end{small}
%------------------------------------------------------------------------------
%------------------------------------------------------------------------------
\begin{table}[t]
\centering
\setlength{\tabcolsep}{3pt}
\caption{Performance of some benchmarks with 1st vs. \JITI (times in msecs)}
\setlength{\tabcolsep}{3pt}
\subfigure[When JITI is ineffective]{
\label{tab:ineffective}
\begin{tabular}[b]{|l||r|r||r|r|} \hline
@@ -1064,30 +1094,6 @@ columns separately.
\end{table}
%------------------------------------------------------------------------------
\subsection{Performance of \JITI when effective} \label{sec:perf:effective}
%--------------------------------------------------------------------------
On the other hand, when \JITI is effective, it can significantly
improve time performance. We use the following programs and
applications:
%% \TODO{For the journal version we should also add FSA benchmarks
%% (\bench{k963}, \bench{dg5} and \bench{tl3})}
\begin{description}
\item[\sgCyl] The same generation DB benchmark on a $24 \times 24
\times 2$ cylinder. We issue the open query.
\item[\muta] A computationally intensive application where most
  predicates are defined intensionally.
\item[\pta] A tabled logic program implementing Andersen's points-to
analysis~\cite{anderson-phd}. A medium-sized imperative program is
encoded as a set of facts (about 16,000) and properties of interest
are encoded using rules. Program properties can then be determined
by checking the closure of these rules.
\item[\tea] Another analyzer using tabling to implement Andersen's
points-to analysis. The analyzed program, the \texttt{javac} SPEC
benchmark, is encoded in a file of 411,696 facts (62,759,581 bytes
in total). As its compilation exceeds the limits of the XXX compiler
(w/o JITI), we run this benchmark only in YAP.
\end{description}
As can be seen in Table~\ref{tab:effective}, \JITI significantly
improves the performance of these applications. In \muta, which spends
most of its time in recursive predicates, the speed up is only $79\%$
@@ -1097,7 +1103,7 @@ times (from~$16$ up to~$119$) faster. It is important to realize that
programmer intervention or by using any compiler directives, in all
these applications.
We analyze the \sgCyl program which has the biggest speedup in both
We analyze the \sgCyl program that has the biggest speedup in both
systems and is the only one whose code is small enough to be shown.
With the open call to \texttt{same\_generation/2}, most work in this
benchmark consists of calling \texttt{cyl/2} facts in three different
@@ -1106,13 +1112,10 @@ with only the second argument bound. Demand-driven indexing improves
performance in the last case only, but this improvement makes a big
difference in this benchmark.
\begin{small}
\begin{verbatim}
\begin{alltt}\small
same_generation(X,X) :- cyl(X,_).
same_generation(X,X) :- cyl(_,X).
same_generation(X,Y) :- cyl(X,Z), same_generation(Z,W), cyl(Y,W).
\end{verbatim}
\end{small}
same_generation(X,Y) :- cyl(X,Z), same_generation(Z,W), cyl(Y,W).\end{alltt}
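
To make the call modes of \texttt{cyl/2} more concrete, the following
sketch uses a small, hypothetical set of facts (not the benchmark's
actual data). We assume here, reading the body of the third clause,
that the modes in question are calls with both arguments unbound, with
the first argument bound, and with only the second argument bound; the
clause numbers in the comments refer to the order of the toy facts.

\begin{alltt}\small
% Hypothetical toy facts, for illustration only.
cyl(a,b).
cyl(a,c).
cyl(b,d).
cyl(c,d).

% ?- cyl(X,Z).  both unbound: every clause is a candidate
% ?- cyl(a,Z).  first bound: first argument indexing
%               narrows the candidates to clauses 1-2
% ?- cyl(Y,d).  only second bound: without an index on the
%               second argument all four clauses are tried;
%               an index built on demand can narrow the
%               candidates to clauses 3-4\end{alltt}

With only four facts the difference is negligible, but the same
pattern over a large set of facts is where an index on the second
argument would be expected to pay off.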
%% Our experience with the indexing algorithm described here shows a
%% significant performance improvement over the previous indexing code in
@@ -1122,40 +1125,56 @@ difference in this benchmark.
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
%-------------------------------------------------------------------------
The need for \JITI was originally noticed in inductive logic
programming applications, which tend to issue ad hoc queries during
runtime and their indexing requirements cannot be determined at
compile time. On the other hand, these applications operate on lots of
data, so memory consumption is a reasonable concern. We evaluate
programming applications. These applications tend to issue ad hoc
queries during execution and thus their indexing requirements cannot
be determined at compile time. On the other hand, they operate on lots
of data, so memory consumption is a reasonable concern. We evaluate
JITI's time and space performance on some learning tasks using the
ALEPH system~\cite{ALEPH}. We use the following datasets:
%
%% \Krki which tries to learn rules from a small database of chess end-games;
\GeneExpr which learns rules for
yeast gene activity given a database of genes, their interactions, and
micro-array gene expression data; \BreastCancer processes real-life
patient reports towards predicting whether an abnormality may be
malignant; \IEProtein processes information extraction from paper
abstracts to search proteins; \Susi learns from shopping patterns; and
\Mesh learns rules for finite-methods mesh design. The datasets
\Carcino, \Choline, \Pyrimidines, and
\Thermolysin try to predict chemical properties of compounds. The
first three datasets store properties of interest as tables, but
\Thermolysin learns from the 3D-structure of a molecule's
conformations. Several of these datasets are standard across the
Machine Learning literature. \GeneExpr~\cite{ilp-regulatory06}
and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
developed by an author of this paper. Most datasets perform simple
queries in an extensional database.
Aleph system~\cite{ALEPH} and the datasets of
Fig.~\ref{fig:ilp:datasets} which issue simple queries in an
extensional database. Several of these datasets are standard in the
Machine Learning literature.
\paragraph*{Time performance.}
We compare times for 10 runs of the saturation/refinement cycle of the
ILP system; see Table~\ref{tab:ilp:time}.
%% The \Krki datasets have small search spaces and small databases, so
%% they achieve the same performance under both versions: there is no
%% slowdown.
The \Mesh and \Pyrimidines applications are the only ones that do not
benefit much from indexing in the database; they do benefit, though,
from indexing in the dynamic representation of the search space, as
their running times improve somewhat with \JITI.
The \BreastCancer and \GeneExpr applications use data in 1NF (i.e.,
unstructured data). The speedup here is mostly from multiple argument
indexing. \BreastCancer is particularly interesting. It consists of 40
binary relations with 65k elements each, where the first argument is
the key. We know that most calls have the first argument bound, hence
indexing was not expected to matter much. Instead, the results show
\JITI to improve running time by more than an order of magnitude. As in
\sgCyl, this suggests that even a small percentage of badly indexed
calls can end up dominating runtime.
\IEProtein and \Thermolysin are example applications that manipulate
structured data. \IEProtein is the largest dataset we consider, and
indexing is absolutely critical. The speedup is not just impressive;
it is simply not possible to run the application in reasonable time
with only first argument indexing. \Thermolysin is smaller and
performs some computation per query, but even so, \JITI improves its
performance by an order of magnitude. The remaining benchmarks improve
from one to more than two orders of magnitude.
%------------------------------------------------------------------------------
\begin{table}[t]
\centering
\caption{Time and space performance on Machine Learning (ILP) Datasets}
\caption{Time and space performance of JITI
on Inductive Logic Programming datasets}
\label{tab:ilp}
\setlength{\tabcolsep}{3pt}
\subfigure[Time (in seconds)]{\label{tab:ilp:time}
\begin{tabular}{|l||r|r|r||} \hline
& \multicolumn{3}{|c||}{Time (in secs)} \\
& \multicolumn{3}{|c||}{Time} \\
\cline{2-4}
Benchmark & 1st & JITI &{\bf ratio} \\
\hline
@@ -1198,59 +1217,82 @@ queries in an extensional database.
\end{table}
%------------------------------------------------------------------------------
We compare times for 10 runs of the saturation/refinement cycle of the
ILP system. Table~\ref{tab:ilp:time} shows time results.
%% The \Krki datasets have small search spaces and small databases, so
%% they achieve the same performance under both versions: there is no
%% slowdown.
The \Mesh and \Pyrimidines applications do not benefit much from
indexing in the database, but they do benefit from indexing in the
dynamic representation of the search space, as their running times
halve.
%------------------------------------------------------------------------------
\begin{figure}
\hrule \ \\[-2em]
\begin{description}
%% \item[\Krki] tries to learn rules from a small database of chess end-games;
\item[\GeneExpr] learns rules for yeast gene activity given a
database of genes, their interactions, and micro-array gene
expression data~\cite{Regulatory@ILP-06};
\item[\BreastCancer] processes real-life patient reports towards
predicting whether an abnormality may be
malignant~\cite{DavisBDPRCS@IJCAI-05};
  \item[\IEProtein] performs information extraction from paper
    abstracts to search for proteins;
\item[\Susi] learns from shopping patterns;
\item[\Mesh] learns rules for finite-methods mesh design;
\item[\Carcino, \Choline, \Pyrimidines] try to predict chemical
properties of compounds and store them as tables;
\item[\Thermolysin] also manipulates chemical compounds but learns
from the 3D-structure of a molecule's conformations.
\end{description}
\hrule
\caption{Description of the ILP datasets used in the performance
comparison of Table~\ref{tab:ilp}}
\label{fig:ilp:datasets}
\end{figure}
%------------------------------------------------------------------------------
The \BreastCancer and \GeneExpr applications use data in
1NF (that is, unstructured data). The benefit here is mostly from
multiple-argument indexing. \BreastCancer is particularly
interesting. It consists of 40 binary relations with 65k elements
each, where the first argument is the key, like in \sgCyl. We know
that most calls have the first argument bound, hence indexing was not
expected to matter very much. Instead, the results show \JITI running
time to improve by an order of magnitude. Like \sgCyl, this
suggests that even a small percentage of badly indexed calls can end
up dominating runtime.
\IEProtein and \Thermolysin are example applications that manipulate
structured data. \IEProtein is the largest dataset we consider, and
indexing is absolutely critical: it is simply not possible to run the
application in reasonable time with first argument indexing.
\Thermolysin is smaller and performs some computation per query, but
even so, indexing improves performance by an order of magnitude.
Table~\ref{tab:ilp:memory} also shows memory usage with \JITI. The
table presents data obtained at a point near the end of execution; we
chose a point where memory usage should be at a maximum. The second
and third columns show data usage on \emph{static} predicates. The
cost varies widely, from 10\% to the worst case, \Carcino, where the
index tree takes more room than the original program. Hash-tables
\paragraph*{Space performance.}
Table~\ref{tab:ilp:memory} shows memory usage when using \JITI. The
table presents data obtained at a point near the end of execution, a
point where memory usage should be at or close to the maximum. These
applications use a mixture of static and dynamic predicates and we
show their memory usage separately. On static predicates, memory usage
varies widely, from only 10\% to the worst case, \Carcino, where the
index tree takes more space than the original program. Hash tables
dominate usage in \IEProtein and \Susi, whereas \TryRetryTrust chains
dominate in \BreastCancer. In most other cases no single component
dominates memory usage. Memory usage for dynamic data is shown in the
last two columns; note that dynamic data is mostly used to store the
search space. One can observe that there is a much lower overhead in
this case. A more detailed analysis shows that most space is spent on
hash tables and on internal nodes of tree, and that relatively little
space is spent on \TryRetryTrust chains, suggesting that \JITI is
working well.
this case. A more detailed analysis shows that most space is occupied
by the hash tables and by internal nodes of the tree, and that
relatively little space is occupied by \TryRetryTrust chains,
suggesting that \JITI is behaving well in practice.
\section{Concluding Remarks}
%===========================
\begin{itemize}
\item Mention the non-trivial speedups in actual applications; also
that it is important to realize that certain applications have ad
hoc query patterns (e.g., ILP) are not amenable to static analyses
\end{itemize}
Motivated by the needs of LP applications in the areas of inductive
logic programming, program analysis, deductive databases, etc.\ to
access large datasets efficiently, we have described a novel but also
simple idea: \emph{indexing Prolog clauses on demand during program
execution}.
%
Given the impressive speedups this idea can provide for many
applications, we are a bit surprised that similar techniques have not been
explored before. In general, Prolog systems have been reluctant to
perform code optimizations during runtime and our feeling is that LP
implementation has been left a bit behind the times. We hold that this
should change.
%
Indeed, we see \JITI as only the first, albeit a very important, step
towards effective runtime optimization of logic programs.
As presented, \JITI is a hybrid technique: index generation occurs
during runtime but is partly guided by the compiler, because we want
to preserve compile-time WAM-style indexing. More flexible schemes are
possible. For example, index generation can be fully dynamic (as in
YAP), combined with user declarations, or use static analysis to be
even more selective or go beyond fixed-order indexing.
%
Finally, note that \JITI fully respects Prolog semantics. Better
performance can be achieved in the context of one-solution
computations, or in the context of tabling where the order of clauses and
solutions does not matter and repeated solutions are discarded.
%==============================================================================
\bibliographystyle{splncs}