Revised up to Section 7.
git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1831 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
parent 7afc0fdd07
commit 9ed8306415
@@ -48,6 +48,19 @@
|
|||||||
\newcommand{\pta}{\bench{pta}\xspace}
|
\newcommand{\pta}{\bench{pta}\xspace}
|
||||||
\newcommand{\tea}{\bench{tea}\xspace}
|
\newcommand{\tea}{\bench{tea}\xspace}
|
||||||
%------------------------------------------------------------------------------
|
%------------------------------------------------------------------------------
|
||||||
|
\newcommand{\BreastCancer}{\bench{BreastCancer}\xspace}
|
||||||
|
\newcommand{\Carcinogenesis}{\bench{Carcinogenesis}\xspace}
|
||||||
|
\newcommand{\Choline}{\bench{Choline}\xspace}
|
||||||
|
\newcommand{\GeneExpression}{\bench{GeneExpression}\xspace}
|
||||||
|
\newcommand{\IEProtein}{\bench{IE-Protein\_Extraction}\xspace}
|
||||||
|
\newcommand{\Krki}{\bench{Krki}\xspace}
|
||||||
|
\newcommand{\KrkiII}{\bench{Krki~II}\xspace}
|
||||||
|
\newcommand{\Mesh}{\bench{Mesh}\xspace}
|
||||||
|
\newcommand{\Mutagenesis}{\bench{Mutagenesis}\xspace}
|
||||||
|
\newcommand{\Pyrimidines}{\bench{Pyrimidines}\xspace}
|
||||||
|
\newcommand{\Susi}{\bench{Susi}\xspace}
|
||||||
|
\newcommand{\Thermolysin}{\bench{Thermolysin}\xspace}
|
||||||
|
%------------------------------------------------------------------------------
|
||||||
\newenvironment{SmallProg}{\begin{tt}\begin{small}\begin{tabular}[b]{l}}{\end{tabular}\end{small}\end{tt}}
|
\newenvironment{SmallProg}{\begin{tt}\begin{small}\begin{tabular}[b]{l}}{\end{tabular}\end{small}\end{tt}}
|
||||||
\newenvironment{ScriptProg}{\begin{tt}\begin{scriptsize}\begin{tabular}[b]{l}}{\end{tabular}\end{scriptsize}\end{tt}}
|
\newenvironment{ScriptProg}{\begin{tt}\begin{scriptsize}\begin{tabular}[b]{l}}{\end{tabular}\end{scriptsize}\end{tt}}
|
||||||
\newenvironment{FootProg}{\begin{tt}\begin{footnotesize}\begin{tabular}[c]{l}}{\end{tabular}\end{footnotesize}\end{tt}}
|
\newenvironment{FootProg}{\begin{tt}\begin{footnotesize}\begin{tabular}[c]{l}}{\end{tabular}\end{footnotesize}\end{tt}}
|
||||||
@@ -120,7 +133,7 @@ For example, first argument indexing is sufficient for many Prolog
|
|||||||
applications. However, it is clearly sub-optimal for applications
|
applications. However, it is clearly sub-optimal for applications
|
||||||
accessing large databases; for a long time now, the database community
|
accessing large databases; for a long time now, the database community
|
||||||
has recognized that good indexing is the basis for fast query
|
has recognized that good indexing is the basis for fast query
|
||||||
processing~\cite{}.
|
processing.
|
||||||
|
|
||||||
As logic programming applications grow in size, Prolog systems need to
|
As logic programming applications grow in size, Prolog systems need to
|
||||||
efficiently access larger and larger data sets and the need for any-
|
efficiently access larger and larger data sets and the need for any-
|
||||||
@@ -144,7 +157,7 @@ the method needs to cater for code updates during runtime. Where our
|
|||||||
schemes radically depart from current practice is that they generate
|
schemes radically depart from current practice is that they generate
|
||||||
new byte code during runtime, in effect doing a form of just-in-time
|
new byte code during runtime, in effect doing a form of just-in-time
|
||||||
compilation. In our experience these schemes pay off. We have
|
compilation. In our experience these schemes pay off. We have
|
||||||
implemented \JITI in two different Prolog systems (Yap and XXX) and
|
implemented \JITI in two different Prolog systems (YAP and XXX) and
|
||||||
have obtained non-trivial speedups, ranging from a few percent to
|
have obtained non-trivial speedups, ranging from a few percent to
|
||||||
orders of magnitude, across a wide range of applications. Given these
|
orders of magnitude, across a wide range of applications. Given these
|
||||||
results, we see very little reason for Prolog systems not to
|
results, we see very little reason for Prolog systems not to
|
||||||
@@ -226,14 +239,14 @@ systems currently do not provide the type of indexing that
|
|||||||
applications require. Even in systems like Ciao~\cite{Ciao@SCP-05},
|
applications require. Even in systems like Ciao~\cite{Ciao@SCP-05},
|
||||||
which do come with built-in static analysis and more or less force
|
which do come with built-in static analysis and more or less force
|
||||||
such a discipline on the programmer, mode information is not used for
|
such a discipline on the programmer, mode information is not used for
|
||||||
multi-argument indexing!
|
multi-argument indexing.
|
||||||
|
|
||||||
% The grand finale:
|
% The grand finale:
|
||||||
The situation is actually worse for certain types of Prolog
|
The situation is actually worse for certain types of Prolog
|
||||||
applications. For example, consider applications in the area of
|
applications. For example, consider applications in the area of
|
||||||
inductive logic programming. These applications on the one hand have
|
inductive logic programming. These applications on the one hand have
|
||||||
big demands for effective indexing since they need to efficiently
|
high demands for effective indexing since they need to efficiently
|
||||||
access big datasets and on the other they are very unfit for static
|
access big datasets and on the other they are unfit for static
|
||||||
analysis since queries are often ad hoc and generated only during
|
analysis since queries are often ad hoc and generated only during
|
||||||
runtime as new hypotheses are formed or refined.
|
runtime as new hypotheses are formed or refined.
|
||||||
%
|
%
|
||||||
@@ -241,11 +254,11 @@ Our thesis is that the Prolog abstract machine should be able to adapt
|
|||||||
automatically to the runtime requirements of such or, even better, of
|
automatically to the runtime requirements of such or, even better, of
|
||||||
all applications by employing increasingly aggressive forms of dynamic
|
all applications by employing increasingly aggressive forms of dynamic
|
||||||
compilation. As a concrete example of what this means in practice, in
|
compilation. As a concrete example of what this means in practice, in
|
||||||
this paper we will attack the problem of providing effective indexing
|
this paper we will attack the problem of satisfying the indexing needs
|
||||||
during runtime. Naturally, we will base our technique on the existing
|
of applications during runtime. Naturally, we will base our technique
|
||||||
support for indexing that the WAM provides, but we will extend this
|
on the existing support for indexing that the WAM provides, but we
|
||||||
support with the technique of \JITI that we describe in the next
|
will extend this support with the technique of \JITI that we describe
|
||||||
sections.
|
in the next sections.
|
||||||
|
|
||||||
|
|
||||||
\section{Indexing in the WAM} \label{sec:prelims}
|
\section{Indexing in the WAM} \label{sec:prelims}
|
||||||
@@ -271,7 +284,7 @@ equivalently, \instr{N} is the size of the hash table). In each bucket
|
|||||||
of this hash table and also in the bucket for the variable case of
|
of this hash table and also in the bucket for the variable case of
|
||||||
\switchONterm the code performs a sequential backtracking search of
|
\switchONterm the code performs a sequential backtracking search of
|
||||||
the clauses using a \TryRetryTrust chain of instructions. The \try
|
the clauses using a \TryRetryTrust chain of instructions. The \try
|
||||||
instruction sets up a choice point, the \retry instructions (if any)
|
instruction sets up a choice point, the \retry instructions (if~any)
|
||||||
update certain fields of this choice point, and the \trust instruction
|
update certain fields of this choice point, and the \trust instruction
|
||||||
removes it.
|
removes it.
|
||||||
|
|
||||||
@@ -529,13 +542,14 @@ heuristically decide that some arguments are more likely than others
|
|||||||
to be used in the \code{in} mode. Then we can simply place the
|
to be used in the \code{in} mode. Then we can simply place the
|
||||||
\jitiONconstant instructions for these arguments \emph{before} the
|
\jitiONconstant instructions for these arguments \emph{before} the
|
||||||
instructions for other arguments. This is possible since all indexing
|
instructions for other arguments. This is possible since all indexing
|
||||||
instructions take the argument register number as an argument.
|
instructions take the argument register number as an argument; their
|
||||||
|
order does not matter.
|
||||||
|
|
||||||
\subsection{From any argument indexing to multi-argument indexing}
|
\subsection{From any argument indexing to multi-argument indexing}
|
||||||
%-----------------------------------------------------------------
|
%-----------------------------------------------------------------
|
||||||
The scheme of the previous section gives us only single argument
|
The scheme of the previous section gives us only single argument
|
||||||
indexing. However, all the infrastructure we need is already in place.
|
indexing. However, all the infrastructure we need is already in place.
|
||||||
We can use it to obtain (fixed-order) multi-argument \JITI in a
|
We can use it to obtain any fixed-order multi-argument \JITI in a
|
||||||
straightforward way.
|
straightforward way.
|
||||||
|
|
||||||
Note that the compiler knows exactly the set of clauses that need to
|
Note that the compiler knows exactly the set of clauses that need to
|
||||||
@@ -650,7 +664,7 @@ requires the following extensions:
|
|||||||
indexing will be based. Writing such a code walking procedure is not
|
indexing will be based. Writing such a code walking procedure is not
|
||||||
hard.\footnote{In many Prolog systems, a procedure with similar
|
hard.\footnote{In many Prolog systems, a procedure with similar
|
||||||
functionality often exists for the disassembler, the debugger, etc.}
|
functionality often exists for the disassembler, the debugger, etc.}
|
||||||
\item Indexing on an argument that contains unconstrained variables
|
\item Indexing on a position that contains unconstrained variables
|
||||||
for some clauses is tricky. The WAM needs to group clauses in this
|
for some clauses is tricky. The WAM needs to group clauses in this
|
||||||
case and without special treatment creates two choice points for
|
case and without special treatment creates two choice points for
|
||||||
this argument (one for the variables and one for each group of
|
this argument (one for the variables and one for each group of
|
||||||
@@ -658,7 +672,7 @@ requires the following extensions:
|
|||||||
by now. Possible solutions to it are described in a 1987 paper by
|
by now. Possible solutions to it are described in a 1987 paper by
|
||||||
Carlsson~\cite{FreezeIndexing@ICLP-87} and can be readily adapted to
|
Carlsson~\cite{FreezeIndexing@ICLP-87} and can be readily adapted to
|
||||||
\JITI. Alternatively, in a simple implementation, we can skip \JITI
|
\JITI. Alternatively, in a simple implementation, we can skip \JITI
|
||||||
for arguments with variables in some clauses.
|
for positions with variables in some clauses.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
Before describing \JITI more formally, we remark on the following
|
Before describing \JITI more formally, we remark on the following
|
||||||
design decisions whose rationale may not be immediately obvious:
|
design decisions whose rationale may not be immediately obvious:
|
||||||
@@ -800,26 +814,25 @@ to a \switchSTAR WAM instruction.
|
|||||||
%-------------------------------------------------------------------------
|
%-------------------------------------------------------------------------
|
||||||
|
|
||||||
\paragraph*{Complexity properties.}
|
\paragraph*{Complexity properties.}
|
||||||
Complexity-wise, dynamic index construction does not add any overhead
|
Index construction during runtime does not change the complexity of
|
||||||
to program execution. First, note that each demanded index table will
|
query execution. First, note that each demanded index table will be
|
||||||
be constructed at most once. Also, a \jitiSTAR instruction will be
|
constructed at most once. Also, a \jitiSTAR instruction will be
|
||||||
encountered only in cases where execution would examine all clauses in
|
encountered only in cases where execution would examine all clauses in
|
||||||
the \TryRetryTrust chain.\footnote{This statement is possibly not
|
the \TryRetryTrust chain.\footnote{This statement is possibly not
|
||||||
valid in the presence of Prolog cuts.} The construction visits these
|
valid in the presence of Prolog cuts.} The construction visits these
|
||||||
clauses \emph{once} and then creates the index table in time linear in
|
clauses \emph{once} and then creates the index table in time linear in
|
||||||
the number of clauses. One pass over the list of $\langle c, L
|
the number of clauses as one pass over the list of $\langle c, L
|
||||||
\rangle$ pairs suffices. After index construction, execution will
|
\rangle$ pairs suffices. After index construction, execution will
|
||||||
visit only a subset of these clauses as the index table will be
|
visit a subset of these clauses as the index table will be consulted.
|
||||||
consulted.
|
|
||||||
%% Finally, note that the maximum number of \jitiSTAR instructions
|
%% Finally, note that the maximum number of \jitiSTAR instructions
|
||||||
%% that will be visited for each query is bounded by the maximum
|
%% that will be visited for each query is bounded by the maximum
|
||||||
%% number of index positions (symbols) in the clause heads of the
|
%% number of index positions (symbols) in the clause heads of the
|
||||||
%% predicate.
|
%% predicate.
|
||||||
Thus, in cases where \JITI is not effective, the execution time of a query will
|
Thus, in cases where \JITI is not effective, the execution time of a query will
|
||||||
at most double due to dynamic index construction. In fact, this worst
|
at most double due to dynamic index construction. In fact, this worst
|
||||||
case is extremely unlikely in practice. On the other hand, \JITI can
|
case is pessimistic and extremely unlikely in practice. On the other
|
||||||
change the complexity of evaluating a predicate call from $O(n)$ to
|
hand, \JITI can change the complexity of query evaluation from $O(n)$
|
||||||
$O(1)$ where $n$ is the number of clauses.
|
to $O(1)$ where $n$ is the number of clauses.
|
||||||
|
|
||||||
\subsection{More implementation choices}
|
\subsection{More implementation choices}
|
||||||
%---------------------------------------
|
%---------------------------------------
|
||||||
@@ -857,9 +870,9 @@ instructions can either become inactive when this limit is reached, or
|
|||||||
better yet we can recover the space of some tables. To do so, we can
|
better yet we can recover the space of some tables. To do so, we can
|
||||||
employ any standard recycling algorithm (e.g., least recently used)
|
employ any standard recycling algorithm (e.g., least recently used)
|
||||||
and reclaim the space of index tables that are no longer in use. This is
|
and reclaim the space of index tables that are no longer in use. This is
|
||||||
easy to do by reverting the corresponding \jitiSTAR instructions back
|
easy to do by reverting the corresponding \switchSTAR instructions
|
||||||
to \switchSTAR instructions. If the indices are needed again, they can
|
back to \jitiSTAR instructions. If the indices are demanded again at a
|
||||||
simply be regenerated.
|
time when memory is available, they can simply be regenerated.
|
||||||
|
|
||||||
|
|
||||||
\section{Demand-Driven Indexing of Dynamic Predicates} \label{sec:dynamic}
|
\section{Demand-Driven Indexing of Dynamic Predicates} \label{sec:dynamic}
|
||||||
@@ -893,9 +906,9 @@ arguments. As optimizations, we can avoid indexing for predicates with
|
|||||||
only one clause (these are often used to simulate global variables)
|
only one clause (these are often used to simulate global variables)
|
||||||
and we can exclude arguments where some clause has a variable.
|
and we can exclude arguments where some clause has a variable.
|
||||||
|
|
||||||
Under logical update semantics calls to a dynamic goal execute in a
|
Under logical update semantics calls to dynamic predicates execute in a
|
||||||
``snapshot'' of the corresponding predicate. In other words, each call
|
``snapshot'' of the corresponding predicate. In other words, each call
|
||||||
sees the clauses that existed at the time the call was made, even if
|
sees the clauses that existed at the time when the call was made, even if
|
||||||
some of the clauses were later deleted or new clauses were asserted.
|
some of the clauses were later deleted or new clauses were asserted.
|
||||||
If several calls are alive in the stack, several snapshots will be
|
If several calls are alive in the stack, several snapshots will be
|
||||||
alive at the same time. The standard solution to this problem is to
|
alive at the same time. The standard solution to this problem is to
|
||||||
@@ -903,8 +916,8 @@ use time stamps to tell which clauses are \emph{live} for which calls.
|
|||||||
%
|
%
|
||||||
This solution complicates freeing index tables because (1) an index
|
This solution complicates freeing index tables because (1) an index
|
||||||
table holds references to clauses, and (2) the table may be in use,
|
table holds references to clauses, and (2) the table may be in use,
|
||||||
that is, it may be accesible from the execution stacks. A table thus
|
that is, it may be accessible from the execution stacks. An index
|
||||||
is killed in several steps:
|
table thus is killed in several steps:
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item Detach the index table from the indexing tree.
|
\item Detach the index table from the indexing tree.
|
||||||
\item Recursively \emph{kill} every child of the current table:
|
\item Recursively \emph{kill} every child of the current table:
|
||||||
@@ -920,6 +933,7 @@ is killed in several steps:
|
|||||||
%% the \emph{itemset-node}, so the emulator reads all the instruction's
|
%% the \emph{itemset-node}, so the emulator reads all the instruction's
|
||||||
%% arguments before executing the instruction.
|
%% arguments before executing the instruction.
|
||||||
|
|
||||||
|
|
||||||
\section{Implementation in XXX and in YAP} \label{sec:impl}
|
\section{Implementation in XXX and in YAP} \label{sec:impl}
|
||||||
%==========================================================
|
%==========================================================
|
||||||
The implementation of \JITI in XXX follows a variant of the scheme
|
The implementation of \JITI in XXX follows a variant of the scheme
|
||||||
@@ -927,7 +941,7 @@ presented in Sect.~\ref{sec:static}. The compiler uses heuristics to
|
|||||||
determine the best argument to index on (i.e., this argument is not
|
determine the best argument to index on (i.e., this argument is not
|
||||||
necessarily the first) and employs \switchSTAR instructions for this
|
necessarily the first) and employs \switchSTAR instructions for this
|
||||||
task. It also statically generates \jitiONconstant instructions for
|
task. It also statically generates \jitiONconstant instructions for
|
||||||
other argument positions that are good candidates for \JITI.
|
other arguments that are good candidates for \JITI.
|
||||||
Currently, an argument is considered a good candidate if it has only
|
Currently, an argument is considered a good candidate if it has only
|
||||||
constants or only structure symbols in all clauses. Thus, XXX uses
|
constants or only structure symbols in all clauses. Thus, XXX uses
|
||||||
only \jitiONconstant and \jitiONstructure instructions, never a
|
only \jitiONconstant and \jitiONstructure instructions, never a
|
||||||
@@ -935,11 +949,11 @@ only \jitiONconstant and \jitiONstructure instructions, never a
|
|||||||
symbols.\footnote{Instead, it prompts its user to request unification
|
symbols.\footnote{Instead, it prompts its user to request unification
|
||||||
factoring for predicates that look likely to benefit from indexing
|
factoring for predicates that look likely to benefit from indexing
|
||||||
inside compound terms. The user can then use the appropriate compiler
|
inside compound terms. The user can then use the appropriate compiler
|
||||||
directive for these predicates.} For dynamic predicates \JITI is
|
directive for these predicates.} For dynamic predicates, \JITI is
|
||||||
employed only if they consist of Datalog facts; if a clause which is
|
employed only if they consist of Datalog facts; if a clause which is
|
||||||
not a Datalog fact is asserted, all dynamically created index tables
|
not a Datalog fact is asserted, all dynamically created index tables
|
||||||
for the predicate are simply dropped and the \jitiONconstant
|
for the predicate are simply killed and the \jitiONconstant
|
||||||
instruction becomes a \instr{noop}. All these are done automatically,
|
instruction becomes a \instr{noop}. All this is done automatically,
|
||||||
but the user can disable \JITI in compiled code using an appropriate
|
but the user can disable \JITI in compiled code using an appropriate
|
||||||
compiler option.
|
compiler option.
|
||||||
|
|
||||||
@@ -957,7 +971,8 @@ very much the same algorithm as static indexing: the key idea is that
|
|||||||
most nodes in the index tree must be allocated separately so that they
|
most nodes in the index tree must be allocated separately so that they
|
||||||
can grow or contract independently. YAP can index arguments where some
|
can grow or contract independently. YAP can index arguments where some
|
||||||
clauses have unconstrained variables, but only for static predicates,
|
clauses have unconstrained variables, but only for static predicates,
|
||||||
as it would complicate updates.
|
as in dynamic code this would complicate support for logical update
|
||||||
|
semantics.
|
||||||
|
|
||||||
YAP uses the term JITI (Just-In-Time Indexing) to refer to \JITI. In
|
YAP uses the term JITI (Just-In-Time Indexing) to refer to \JITI. In
|
||||||
the next section we will take the liberty to use this term as a
|
the next section we will take the liberty to use this term as a
|
||||||
@@ -1099,63 +1114,62 @@ this benchmark.
|
|||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
\end{small}
|
\end{small}
|
||||||
|
|
||||||
% Our experience with the indexing algorithm described here shows a
|
%% Our experience with the indexing algorithm described here shows a
|
||||||
% significant performance improvement over the previous indexing code in
|
%% significant performance improvement over the previous indexing code in
|
||||||
% our system. Quite often, this has allowed us to tackle applications
|
%% our system. Quite often, this has allowed us to tackle applications
|
||||||
% which previously would not have been feasible. We next present some
|
%% which previously would not have been feasible.
|
||||||
% results that show how useful the algorithms can be.
|
|
||||||
|
|
||||||
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
|
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
|
||||||
%-------------------------------------------------------------------------
|
%-------------------------------------------------------------------------
|
||||||
The need for \JITI was originally motivated by ILP applications.
|
The need for \JITI was originally motivated by ILP applications.
|
||||||
Table~\ref{tab:ilp:time} shows JITI performance on some learning tasks
|
Table~\ref{tab:ilp:time} shows JITI performance on some learning tasks
|
||||||
using the ALEPH system~\cite{ALEPH}. The dataset \bench{Krki} tries to
|
using the ALEPH system~\cite{ALEPH}. The dataset \Krki tries to
|
||||||
learn rules from a small database of chess end-games;
|
learn rules from a small database of chess end-games;
|
||||||
\bench{GeneExpression} learns rules for yeast gene activity given a
|
\GeneExpression learns rules for yeast gene activity given a
|
||||||
database of genes, their interactions, and micro-array gene expression
|
database of genes, their interactions, and micro-array gene expression
|
||||||
data; \bench{BreastCancer} processes real-life patient reports towards
|
data; \BreastCancer processes real-life patient reports towards
|
||||||
predicting whether an abnormality may be malignant;
|
predicting whether an abnormality may be malignant;
|
||||||
\bench{IE-Protein\_Extraction} processes information extraction from
|
\IEProtein processes information extraction from
|
||||||
paper abstracts to search for proteins; \bench{Susi} learns from shopping
|
paper abstracts to search for proteins; \Susi learns from shopping
|
||||||
patterns; and \bench{Mesh} learns rules for finite-element mesh
|
patterns; and \Mesh learns rules for finite-element mesh
|
||||||
design. The datasets \bench{Carcinogenesis}, \bench{Choline},
|
design. The datasets \Carcinogenesis, \Choline,
|
||||||
\bench{Mutagenesis}, \bench{Pyrimidines}, and \bench{Thermolysin} are
|
\Mutagenesis, \Pyrimidines, and \Thermolysin try to
|
||||||
about predicting chemical properties of compounds. The first three
|
predict chemical properties of compounds. The first three
|
||||||
datasets store properties of interest as tables, but
|
datasets store properties of interest as tables, but
|
||||||
\bench{Thermolysin} learns from the 3D-structure of a molecule's
|
\Thermolysin learns from the 3D-structure of a molecule's
|
||||||
conformations. Several of these datasets are standard across Machine
|
conformations. Several of these datasets are standard across the Machine
|
||||||
Learning literature. \bench{GeneExpression}~\cite{} and
|
Learning literature. \GeneExpression~\cite{} and
|
||||||
\bench{BreastCancer}~\cite{} were partly developed by some of the
|
\BreastCancer~\cite{} were partly developed by some of the
|
||||||
paper's authors. Most datasets perform simple queries in an
|
paper's authors. Most datasets perform simple queries in an
|
||||||
extensional database. The exception is \bench{Mutagenesis} where
|
extensional database. The exception is \Mutagenesis where
|
||||||
several predicates are defined intensionally, requiring extensive
|
several predicates are defined intensionally, requiring extensive
|
||||||
computation.
|
computation.
|
||||||
|
|
||||||
%------------------------------------------------------------------------------
|
%------------------------------------------------------------------------------
|
||||||
\begin{table}[ht]
|
\begin{table}[t]
|
||||||
\centering
|
\centering
|
||||||
\caption{Machine Learning (ILP) datasets. Times are given in seconds;
|
\caption{Machine Learning (ILP) datasets. Times are given in seconds;
|
||||||
we compare standard indexing, with no indexing on dynamic
|
we compare standard indexing, with no indexing on dynamic
|
||||||
predicates, against the \JITI implementation.}
|
predicates, against the \JITI implementation.}
|
||||||
\label{tab:ilp:time}
|
\label{tab:ilp:time}
|
||||||
\setlength{\tabcolsep}{3pt}
|
\setlength{\tabcolsep}{3pt}
|
||||||
\begin {tabular}{|l||r|r|r|} \hline %\cline{1-3}
|
\begin{tabular}{|l||r|r|r|} \hline %\cline{1-3}
|
||||||
& \multicolumn{3}{|c|}{Time (in secs)} \\
|
& \multicolumn{3}{|c|}{Time (in secs)} \\
|
||||||
\cline{2-4}
|
\cline{2-4}
|
||||||
Benchmark & 1st & JITI &{\bf ratio} \\
|
Benchmark & 1st & JITI &{\bf ratio} \\
|
||||||
\hline
|
\hline
|
||||||
\bench{BreastCancer} & 1450 & 88 & 16 \\
|
\BreastCancer & 1450 & 88 & 16 \\
|
||||||
\bench{Carcinogenesis} & 17,705 & 192 & 92 \\
|
\Carcinogenesis & 17,705 & 192 & 92 \\
|
||||||
\bench{Choline} & 14,766 & 1,397 & 11 \\
|
\Choline & 14,766 & 1,397 & 11 \\
|
||||||
\bench{GeneExpression} & 193,283 & 7,483 & 26 \\
|
\GeneExpression & 193,283 & 7,483 & 26 \\
|
||||||
\bench{IE-Protein\_Extraction} & 1,677,146 & 2,909 & 577 \\
|
\IEProtein & 1,677,146 & 2,909 & 577 \\
|
||||||
\bench{Krki} & 0.3 & 0.3 & 1 \\
|
\bench{Krki} & 0.3 & 0.3 & 1 \\
|
||||||
\bench{Krki II} & 1.3 & 1.3 & 1 \\
|
\bench{Krki II} & 1.3 & 1.3 & 1 \\
|
||||||
\bench{Mesh} & 4 & 3 & 1.3 \\
|
\Mesh & 4 & 3 & 1.3 \\
|
||||||
\bench{Mutagenesis} & 51,775 & 27,746 & 1.9 \\
|
\bench{Mutagenesis} & 51,775 & 27,746 & 1.9 \\
|
||||||
\bench{Pyrimidines} & 487,545 & 253,235 & 1.9 \\
|
\Pyrimidines & 487,545 & 253,235 & 1.9 \\
|
||||||
\bench{Susi} & 105,091 & 307 & 342 \\
|
\Susi & 105,091 & 307 & 342 \\
|
||||||
\bench{Thermolysin} & 50,279 & 5,213 & 10 \\
|
\Thermolysin & 50,279 & 5,213 & 10 \\
|
||||||
\hline
|
\hline
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table}
|
\end{table}
|
||||||
@@ -1163,30 +1177,30 @@ computation.
|
|||||||
|
|
||||||
We compare times for 10 runs of the saturation/refinement cycle of the
|
We compare times for 10 runs of the saturation/refinement cycle of the
|
||||||
ILP system. Table~\ref{tab:ilp:time} shows time results. The
|
ILP system. Table~\ref{tab:ilp:time} shows time results. The
|
||||||
\bench{Krki} datasets have small search spaces and small databases, so
|
\Krki datasets have small search spaces and small databases, so
|
||||||
they achieve the same performance under both versions:
|
they achieve the same performance under both versions:
|
||||||
there is no slowdown. The \bench{Mesh}, \bench{Mutagenesis}, and
|
there is no slowdown. The \Mesh, \Mutagenesis, and
|
||||||
\bench{Pyrimides} applications do not benefit much from indexing in
|
\Pyrimidines applications do not benefit much from indexing in
|
||||||
the database, but they do benefit from indexing in the dynamic
|
the database, but they do benefit from indexing in the dynamic
|
||||||
representation of the search space, as their running times halve.
|
representation of the search space, as their running times halve.
|
||||||
|
|
||||||
The \bench{BreastCancer} and \bench{GeneExpression} applications use
|
The \BreastCancer and \GeneExpression applications use data in
|
||||||
1NF data (that is, unstructured data). The benefit here is mostly from
|
1NF (that is, unstructured data). The benefit here is mostly from
|
||||||
multiple-argument indexing. \bench{BreastCancer} is particularly
|
multiple-argument indexing. \BreastCancer is particularly
|
||||||
interesting. It consists of 40 binary relations with 65k elements
|
interesting. It consists of 40 binary relations with 65k elements
|
||||||
each, where the first argument is the key, like in
|
each, where the first argument is the key, like in \sgCyl. We know
|
||||||
\bench{sg\_cyl}. We know that most calls have the first argument
|
that most calls have the first argument bound, hence indexing was not
|
||||||
bound, hence indexing was not expected to matter very much. Instead,
|
expected to matter very much. Instead, the results show \JITI running
|
||||||
the results show \JITI running time to improve by an order of
|
time to improve by an order of magnitude. Like \sgCyl, this
|
||||||
magnitude. Like in \bench{sg\_cyl}, this suggests that even a small
|
suggests that even a small percentage of badly indexed calls can end
|
||||||
percentage of badly indexed calls can come to dominate running time.
|
up dominating runtime.
|
||||||
|
|
||||||
\bench{IE-Protein\_Extraction} and \bench{Thermolysin} are example
|
\IEProtein and \Thermolysin are example
|
||||||
applications that manipulate structured data.
|
applications that manipulate structured data.
|
||||||
\bench{IE-Protein\_Extraction} is the largest dataset we consider,
|
\IEProtein is the largest dataset we consider,
|
||||||
and indexing is simply critical: it is not possible to run the
|
and indexing is absolutely critical: it is not possible to run the
|
||||||
application in reasonable time with one argument
|
application in reasonable time with first argument
|
||||||
indexing. \bench{Thermolysin} is smaller and performs some
|
indexing. \Thermolysin is smaller and performs some
|
||||||
computation per query: even so, indexing improves performance by an
|
computation per query: even so, indexing improves performance by an
|
||||||
order of magnitude.
|
order of magnitude.
|
||||||
|
|
||||||
@ -1201,79 +1215,81 @@ order of magnitude.
|
|||||||
Benchmark & \textbf{Clause} & {\bf Index} & \textbf{Clause} & {\bf Index} \\
|
Benchmark & \textbf{Clause} & {\bf Index} & \textbf{Clause} & {\bf Index} \\
|
||||||
% \textbf{Benchmarks} & & Total & T & W & S & & Total & T & C & W & S \\
|
% \textbf{Benchmarks} & & Total & T & W & S & & Total & T & C & W & S \\
|
||||||
\hline
|
\hline
|
||||||
\bench{BreastCancer}
|
\BreastCancer
|
||||||
& 60940 & 46887
|
& 60,940 & 46,887
|
||||||
% & 46242 & 3126 & 125
|
% & 46242 & 3126 & 125
|
||||||
& 630 & 14
|
& 630 & 14
|
||||||
% &42 & 18& 57 &6
|
% &42 & 18& 57 &6
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Carcinogenesis}
|
\Carcinogenesis
|
||||||
& 1801 & 2678
|
& 1801 & 2678
|
||||||
% &1225 & 587 & 865
|
% &1225 & 587 & 865
|
||||||
& 13512 & 942
|
& 13,512 & 942
|
||||||
%& 291 & 91 & 457 & 102
|
%& 291 & 91 & 457 & 102
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Choline} & 666 & 174
|
\Choline & 666 & 174
|
||||||
% &67 & 48 & 58
|
% &67 & 48 & 58
|
||||||
& 3172 & 174
|
& 3172 & 174
|
||||||
% & 76 & 4 & 48 & 45
|
% & 76 & 4 & 48 & 45
|
||||||
\\
|
\\
|
||||||
\bench{GeneExpression} & 46726 & 22629
|
|
||||||
% &6780 & 6473 & 9375
|
|
||||||
& 116463 & 9015
|
|
||||||
%& 2703 & 932 & 3910 & 1469
|
|
||||||
\\
|
|
||||||
|
|
||||||
\bench{IE-Protein\_Extraction} &146033 & 129333
|
\GeneExpression
|
||||||
|
& 46,726 & 22,629
|
||||||
|
% &6780 & 6473 & 9375
|
||||||
|
& 116,463 & 9015
|
||||||
|
%& 2703 & 932 & 3910 & 1469
|
||||||
|
\\
|
||||||
|
|
||||||
|
\bench{IE-Protein\_Extraction}
|
||||||
|
& 146,033 & 129,333
|
||||||
%&39279 & 24322 & 65732
|
%&39279 & 24322 & 65732
|
||||||
& 53423 & 1531
|
& 53,423 & 1531
|
||||||
%& 467 & 108 & 868 & 86
|
%& 467 & 108 & 868 & 86
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Krki} & 678 & 117
|
\bench{Krki} & 678 & 117
|
||||||
%&52 & 24 & 40
|
%&52 & 24 & 40
|
||||||
& 2047 & 24
|
& 2047 & 24
|
||||||
%& 10 & 2 & 10 & 1
|
%& 10 & 2 & 10 & 1
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Krki II} & 1866 & 715
|
\bench{Krki II} & 1866 & 715
|
||||||
%&180 & 233 & 301
|
%&180 & 233 & 301
|
||||||
& 2055 & 26
|
& 2055 & 26
|
||||||
%& 11 & 2 & 11 & 1
|
%& 11 & 2 & 11 & 1
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Mesh} & 802 & 161
|
\bench{Mesh} & 802 & 161
|
||||||
%&49 & 18 & 93
|
%&49 & 18 & 93
|
||||||
& 2149 & 109
|
& 2149 & 109
|
||||||
%& 46 & 4 & 35 & 22
|
%& 46 & 4 & 35 & 22
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Mutagenesis} & 1412 & 1848
|
\bench{Mutagenesis} & 1412 & 1848
|
||||||
%&1045 & 291 & 510
|
%&1045 & 291 & 510
|
||||||
& 4302 & 595
|
& 4302 & 595
|
||||||
%& 156 & 114 & 264 & 61
|
%& 156 & 114 & 264 & 61
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Pyrimidines} & 774 & 218
|
\bench{Pyrimidines} & 774 & 218
|
||||||
%&76 & 63 & 77
|
%&76 & 63 & 77
|
||||||
& 25840 & 12291
|
& 25,840 & 12,291
|
||||||
%& 4847 & 43 & 3510 & 3888
|
%& 4847 & 43 & 3510 & 3888
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Susi} & 5007 & 2509
|
\bench{Susi} & 5007 & 2509
|
||||||
%&855 & 578 & 1076
|
%&855 & 578 & 1076
|
||||||
& 4497 & 759
|
& 4497 & 759
|
||||||
%& 324 & 58 & 256 & 120
|
%& 324 & 58 & 256 & 120
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\bench{Thermolysin} & 2317 & 929
|
\bench{Thermolysin} & 2317 & 929
|
||||||
%&429 & 184 & 315
|
%&429 & 184 & 315
|
||||||
& 116129 & 7064
|
& 116,129 & 7064
|
||||||
%& 3295 & 1438 & 2160 & 170
|
%& 3295 & 1438 & 2160 & 170
|
||||||
\\
|
\\
|
||||||
|
|
||||||
\hline
|
\hline
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table*}
|
\end{table*}
|
||||||
@ -1287,12 +1303,12 @@ usage on \emph{static} predicates. Static data-base sizes range from
|
|||||||
146MB (\bench{IE-Protein\_Extraction}) to less than a MB
|
146MB (\bench{IE-Protein\_Extraction}) to less than a MB
|
||||||
(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can be
|
(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can be
|
||||||
more than the original code, as in \bench{Mutagenesis}, or almost as
|
more than the original code, as in \bench{Mutagenesis}, or almost as
|
||||||
much, eg, \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI
|
much, e.g., \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI
|
||||||
adds at least a third and often a half to the original data-base. A
|
adds at least a third and often a half to the original data-base. A
|
||||||
more detailed analysis shows the source of overhead to be very
|
more detailed analysis shows the source of overhead to be very
|
||||||
different from dataset to dataset. In \bench{IE-Protein\_Extraction}
|
different from dataset to dataset. In \bench{IE-Protein\_Extraction}
|
||||||
the problem is that hash tables are very large. Hash tables are also
|
the problem is that hash tables are very large. Hash tables are also
|
||||||
where most space is spent in \bench{Susi}. In \bench{BreastCancer}
|
where most space is spent in \bench{Susi}. In \BreastCancer
|
||||||
hash tables are actually small, so most space is spent in
|
hash tables are actually small, so most space is spent in
|
||||||
\TryRetryTrust chains. \bench{Mutagenesis} is similar: even though YAP
|
\TryRetryTrust chains. \bench{Mutagenesis} is similar: even though YAP
|
||||||
spends a large effort in indexing it still generates long
|
spends a large effort in indexing it still generates long
|
||||||
|