Revised up to Section 7.
git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1831 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
parent
7afc0fdd07
commit
9ed8306415
@ -48,6 +48,19 @@
|
||||
\newcommand{\pta}{\bench{pta}\xspace}
|
||||
\newcommand{\tea}{\bench{tea}\xspace}
|
||||
%------------------------------------------------------------------------------
|
||||
\newcommand{\BreastCancer}{\bench{BreastCancer}\xspace}
|
||||
\newcommand{\Carcinogenesis}{\bench{Carcinogenesis}\xspace}
|
||||
\newcommand{\Choline}{\bench{Choline}\xspace}
|
||||
\newcommand{\GeneExpression}{\bench{GeneExpression}\xspace}
|
||||
\newcommand{\IEProtein}{\bench{IE-Protein\_Extraction}\xspace}
|
||||
\newcommand{\Krki}{\bench{Krki}\xspace}
|
||||
\newcommand{\KrkiII}{\bench{Krki~II}\xspace}
|
||||
\newcommand{\Mesh}{\bench{Mesh}\xspace}
|
||||
\newcommand{\Mutagenesis}{\bench{Mutagenesis}\xspace}
|
||||
\newcommand{\Pyrimidines}{\bench{Pyrimidines}\xspace}
|
||||
\newcommand{\Susi}{\bench{Susi}\xspace}
|
||||
\newcommand{\Thermolysin}{\bench{Thermolysin}\xspace}
|
||||
%------------------------------------------------------------------------------
|
||||
\newenvironment{SmallProg}{\begin{tt}\begin{small}\begin{tabular}[b]{l}}{\end{tabular}\end{small}\end{tt}}
|
||||
\newenvironment{ScriptProg}{\begin{tt}\begin{scriptsize}\begin{tabular}[b]{l}}{\end{tabular}\end{scriptsize}\end{tt}}
|
||||
\newenvironment{FootProg}{\begin{tt}\begin{footnotesize}\begin{tabular}[c]{l}}{\end{tabular}\end{footnotesize}\end{tt}}
|
||||
@ -120,7 +133,7 @@ For example, first argument indexing is sufficient for many Prolog
|
||||
applications. However, it is clearly sub-optimal for applications
|
||||
accessing large databases; for a long time now, the database community
|
||||
has recognized that good indexing is the basis for fast query
|
||||
processing~\cite{}.
|
||||
processing.
|
||||
|
||||
As logic programming applications grow in size, Prolog systems need to
|
||||
efficiently access larger and larger data sets and the need for any-
|
||||
@ -144,7 +157,7 @@ the method needs to cater for code updates during runtime. Where our
|
||||
schemes radically depart from current practice is that they generate
|
||||
new byte code during runtime, in effect doing a form of just-in-time
|
||||
compilation. In our experience these schemes pay off. We have
|
||||
implemented \JITI in two different Prolog systems (Yap and XXX) and
|
||||
implemented \JITI in two different Prolog systems (YAP and XXX) and
|
||||
have obtained non-trivial speedups, ranging from a few percent to
|
||||
orders of magnitude, across a wide range of applications. Given these
|
||||
results, we see very little reason for Prolog systems not to
|
||||
@ -226,14 +239,14 @@ systems currently do not provide the type of indexing that
|
||||
applications require. Even in systems like Ciao~\cite{Ciao@SCP-05},
|
||||
which do come with built-in static analysis and more or less force
|
||||
such a discipline on the programmer, mode information is not used for
|
||||
multi-argument indexing!
|
||||
multi-argument indexing.
|
||||
|
||||
% The grand finale:
|
||||
The situation is actually worse for certain types of Prolog
|
||||
applications. For example, consider applications in the area of
|
||||
inductive logic programming. These applications on the one hand have
|
||||
big demands for effective indexing since they need to efficiently
|
||||
access big datasets and on the other they are very unfit for static
|
||||
high demands for effective indexing since they need to efficiently
|
||||
access big datasets and on the other they are unfit for static
|
||||
analysis since queries are often ad hoc and generated only during
|
||||
runtime as new hypotheses are formed or refined.
|
||||
%
|
||||
@ -241,11 +254,11 @@ Our thesis is that the Prolog abstract machine should be able to adapt
|
||||
automatically to the runtime requirements of such or, even better, of
|
||||
all applications by employing increasingly aggressive forms of dynamic
|
||||
compilation. As a concrete example of what this means in practice, in
|
||||
this paper we will attack the problem of providing effective indexing
|
||||
during runtime. Naturally, we will base our technique on the existing
|
||||
support for indexing that the WAM provides, but we will extend this
|
||||
support with the technique of \JITI that we describe in the next
|
||||
sections.
|
||||
this paper we will attack the problem of satisfying the indexing needs
|
||||
of applications during runtime. Naturally, we will base our technique
|
||||
on the existing support for indexing that the WAM provides, but we
|
||||
will extend this support with the technique of \JITI that we describe
|
||||
in the next sections.
|
||||
|
||||
|
||||
\section{Indexing in the WAM} \label{sec:prelims}
|
||||
@ -271,7 +284,7 @@ equivalently, \instr{N} is the size of the hash table). In each bucket
|
||||
of this hash table and also in the bucket for the variable case of
|
||||
\switchONterm the code performs a sequential backtracking search of
|
||||
the clauses using a \TryRetryTrust chain of instructions. The \try
|
||||
instruction sets up a choice point, the \retry instructions (if any)
|
||||
instruction sets up a choice point, the \retry instructions (if~any)
|
||||
update certain fields of this choice point, and the \trust instruction
|
||||
removes it.
|
||||
|
||||
@ -529,13 +542,14 @@ heuristically decide that some arguments are more likely than others
|
||||
to be used in the \code{in} mode. Then we can simply place the
|
||||
\jitiONconstant instructions for these arguments \emph{before} the
|
||||
instructions for other arguments. This is possible since all indexing
|
||||
instructions take the argument register number as an argument.
|
||||
instructions take the argument register number as an argument; their
|
||||
order does not matter.
|
||||
|
||||
\subsection{From any argument indexing to multi-argument indexing}
|
||||
%-----------------------------------------------------------------
|
||||
The scheme of the previous section gives us only single argument
|
||||
indexing. However, all the infrastructure we need is already in place.
|
||||
We can use it to obtain (fixed-order) multi-argument \JITI in a
|
||||
We can use it to obtain any fixed-order multi-argument \JITI in a
|
||||
straightforward way.
|
||||
|
||||
Note that the compiler knows exactly the set of clauses that need to
|
||||
@ -650,7 +664,7 @@ requires the following extensions:
|
||||
indexing will be based. Writing such a code walking procedure is not
|
||||
hard.\footnote{In many Prolog systems, a procedure with similar
|
||||
functionality often exists for the disassembler, the debugger, etc.}
|
||||
\item Indexing on an argument that contains unconstrained variables
|
||||
\item Indexing on a position that contains unconstrained variables
|
||||
for some clauses is tricky. The WAM needs to group clauses in this
|
||||
case and without special treatment creates two choice points for
|
||||
this argument (one for the variables and one per each group of
|
||||
@ -658,7 +672,7 @@ requires the following extensions:
|
||||
by now. Possible solutions to it are described in a 1987 paper by
|
||||
Carlsson~\cite{FreezeIndexing@ICLP-87} and can be readily adapted to
|
||||
\JITI. Alternatively, in a simple implementation, we can skip \JITI
|
||||
for arguments with variables in some clauses.
|
||||
for positions with variables in some clauses.
|
||||
\end{enumerate}
|
||||
Before describing \JITI more formally, we remark on the following
|
||||
design decisions whose rationale may not be immediately obvious:
|
||||
@ -800,26 +814,25 @@ to a \switchSTAR WAM instruction.
|
||||
%-------------------------------------------------------------------------
|
||||
|
||||
\paragraph*{Complexity properties.}
|
||||
Complexity-wise, dynamic index construction does not add any overhead
|
||||
to program execution. First, note that each demanded index table will
|
||||
be constructed at most once. Also, a \jitiSTAR instruction will be
|
||||
Index construction during runtime does not change the complexity of
|
||||
query execution. First, note that each demanded index table will be
|
||||
constructed at most once. Also, a \jitiSTAR instruction will be
|
||||
encountered only in cases where execution would examine all clauses in
|
||||
the \TryRetryTrust chain.\footnote{This statement is possibly not
|
||||
valid in the presence of Prolog cuts.} The construction visits these
|
||||
clauses \emph{once} and then creates the index table in time linear in
|
||||
the number of clauses. One pass over the list of $\langle c, L
|
||||
the number of clauses as one pass over the list of $\langle c, L
|
||||
\rangle$ pairs suffices. After index construction, execution will
|
||||
visit only a subset of these clauses as the index table will be
|
||||
consulted.
|
||||
visit a subset of these clauses as the index table will be consulted.
|
||||
%% Finally, note that the maximum number of \jitiSTAR instructions
|
||||
%% that will be visited for each query is bounded by the maximum
|
||||
%% number of index positions (symbols) in the clause heads of the
|
||||
%% predicate.
|
||||
Thus, in cases where \JITI is not effective, execution of a query will
|
||||
at most double due to dynamic index construction. In fact, this worst
|
||||
case is extremely unlikely in practice. On the other hand, \JITI can
|
||||
change the complexity of evaluating a predicate call from $O(n)$ to
|
||||
$O(1)$ where $n$ is the number of clauses.
|
||||
case is pessimistic and extremely unlikely in practice. On the other
|
||||
hand, \JITI can change the complexity of query evaluation from $O(n)$
|
||||
to $O(1)$ where $n$ is the number of clauses.
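
%% Illustrative sketch only; the cost symbols below are our own notation
%% and the bound on T_build is an assumption, not a measured result.
As a rough illustration of the factor of two mentioned above, write
$T_{\mathit{scan}}(n)$ for the cost of one traversal of the $n$ clauses
in the \TryRetryTrust chain and $T_{\mathit{build}}(n)$ for the cost of
building the index table. Assuming, as a simplification, that
construction is a single linear pass with
$T_{\mathit{build}}(n) \le T_{\mathit{scan}}(n)$, a call that triggers
construction costs at most
\[
  T_{\mathit{scan}}(n) + T_{\mathit{build}}(n) \;\le\; 2\,T_{\mathit{scan}}(n),
\]
and since each index table is built at most once, this extra cost is
paid at most once per demanded table. Later calls that consult the
table avoid the full traversal, which is where the improvement from
$O(n)$ to $O(1)$ can arise.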
|
||||
|
||||
\subsection{More implementation choices}
|
||||
%---------------------------------------
|
||||
@ -857,9 +870,9 @@ instructions can either become inactive when this limit is reached, or
|
||||
better yet we can recover the space of some tables. To do so, we can
|
||||
employ any standard recycling algorithm (e.g., least recently used)
|
||||
and reclaim the space of index tables that are no longer in use. This is
|
||||
easy to do by reverting the corresponding \jitiSTAR instructions back
|
||||
to \switchSTAR instructions. If the indices are needed again, they can
|
||||
simply be regenerated.
|
||||
easy to do by reverting the corresponding \switchSTAR instructions
|
||||
back to \jitiSTAR instructions. If the indices are demanded again at a
|
||||
time when memory is available, they can simply be regenerated.
|
||||
|
||||
|
||||
\section{Demand-Driven Indexing of Dynamic Predicates} \label{sec:dynamic}
|
||||
@ -893,9 +906,9 @@ arguments. As optimizations, we can avoid indexing for predicates with
|
||||
only one clause (these are often used to simulate global variables)
|
||||
and we can exclude arguments where some clause has a variable.
|
||||
|
||||
Under logical update semantics calls to a dynamic goal execute in a
|
||||
Under logical update semantics calls to dynamic predicates execute in a
|
||||
``snapshot'' of the corresponding predicate. In other words, each call
|
||||
sees the clauses that existed at the time the call was made, even if
|
||||
sees the clauses that existed at the time when the call was made, even if
|
||||
some of the clauses were later deleted or new clauses were asserted.
|
||||
If several calls are alive in the stack, several snapshots will be
|
||||
alive at the same time. The standard solution to this problem is to
|
||||
@ -903,8 +916,8 @@ use time stamps to tell which clauses are \emph{live} for which calls.
|
||||
%
|
||||
This solution complicates freeing index tables because (1) an index
|
||||
table holds references to clauses, and (2) the table may be in use,
|
||||
that is, it may be accesible from the execution stacks. A table thus
|
||||
is killed in several steps:
|
||||
that is, it may be accessible from the execution stacks. An index
|
||||
table thus is killed in several steps:
|
||||
\begin{enumerate}
|
||||
\item Detach the index table from the indexing tree.
|
||||
\item Recursively \emph{kill} every child of the current table:
|
||||
@ -920,6 +933,7 @@ is killed in several steps:
|
||||
%% the \emph{itemset-node}, so the emulator reads all the instruction's
|
||||
%% arguments before executing the instruction.
|
||||
|
||||
|
||||
\section{Implementation in XXX and in YAP} \label{sec:impl}
|
||||
%==========================================================
|
||||
The implementation of \JITI in XXX follows a variant of the scheme
|
||||
@ -927,7 +941,7 @@ presented in Sect.~\ref{sec:static}. The compiler uses heuristics to
|
||||
determine the best argument to index on (i.e., this argument is not
|
||||
necessarily the first) and employs \switchSTAR instructions for this
|
||||
task. It also statically generates \jitiONconstant instructions for
|
||||
other argument positions that are good candidates for \JITI.
|
||||
other arguments that are good candidates for \JITI.
|
||||
Currently, an argument is considered a good candidate if it has only
|
||||
constants or only structure symbols in all clauses. Thus, XXX uses
|
||||
only \jitiONconstant and \jitiONstructure instructions, never a
|
||||
@ -935,11 +949,11 @@ only \jitiONconstant and \jitiONstructure instructions, never a
|
||||
symbols.\footnote{Instead, it prompts its user to request unification
|
||||
factoring for predicates that look likely to benefit from indexing
|
||||
inside compound terms. The user can then use the appropriate compiler
|
||||
directive for these predicates.} For dynamic predicates \JITI is
|
||||
directive for these predicates.} For dynamic predicates, \JITI is
|
||||
employed only if they consist of Datalog facts; if a clause which is
|
||||
not a Datalog fact is asserted, all dynamically created index tables
|
||||
for the predicate are simply dropped and the \jitiONconstant
|
||||
instruction becomes a \instr{noop}. All these are done automatically,
|
||||
for the predicate are simply killed and the \jitiONconstant
|
||||
instruction becomes a \instr{noop}. All this is done automatically,
|
||||
but the user can disable \JITI in compiled code using an appropriate
|
||||
compiler option.
|
||||
|
||||
@ -957,7 +971,8 @@ very much the same algorithm as static indexing: the key idea is that
|
||||
most nodes in the index tree must be allocated separately so that they
|
||||
can grow or contract independently. YAP can index arguments where some
|
||||
clauses have unconstrained variables, but only for static predicates,
|
||||
as it would complicate updates.
|
||||
as in dynamic code this would complicate support for logical update
|
||||
semantics.
|
||||
|
||||
YAP uses the term JITI (Just-In-Time Indexing) to refer to \JITI. In
|
||||
the next section we will take the liberty to use this term as a
|
||||
@ -1099,63 +1114,62 @@ this benchmark.
|
||||
\end{verbatim}
|
||||
\end{small}
|
||||
|
||||
% Our experience with the indexing algorithm described here shows a
|
||||
% significant performance improvement over the previous indexing code in
|
||||
% our system. Quite often, this has allowed us to tackle applications
|
||||
% which previously would not have been feasible. We next present some
|
||||
% results that show how useful the algorithms can be.
|
||||
%% Our experience with the indexing algorithm described here shows a
|
||||
%% significant performance improvement over the previous indexing code in
|
||||
%% our system. Quite often, this has allowed us to tackle applications
|
||||
%% which previously would not have been feasible.
|
||||
|
||||
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
|
||||
%-------------------------------------------------------------------------
|
||||
The need for \JITI was originally motivated by ILP applications.
|
||||
Table~\ref{tab:ilp:time} shows JITI performance on some learning tasks
|
||||
using the ALEPH system~\cite{ALEPH}. The dataset \bench{Krki} tries to
|
||||
using the ALEPH system~\cite{ALEPH}. The dataset \Krki tries to
|
||||
learn rules from a small database of chess end-games;
|
||||
\bench{GeneExpression} learns rules for yeast gene activity given a
|
||||
\GeneExpression learns rules for yeast gene activity given a
|
||||
database of genes, their interactions, and micro-array gene expression
|
||||
data; \bench{BreastCancer} processes real-life patient reports towards
|
||||
data; \BreastCancer processes real-life patient reports towards
|
||||
predicting whether an abnormality may be malignant;
|
||||
\bench{IE-Protein\_Extraction} processes information extraction from
|
||||
paper abstracts to search proteins; \bench{Susi} learns from shopping
|
||||
patterns; and \bench{Mesh} learns rules for finite-methods mesh
|
||||
design. The datasets \bench{Carcinogenesis}, \bench{Choline},
|
||||
\bench{Mutagenesis}, \bench{Pyrimidines}, and \bench{Thermolysin} are
|
||||
about predicting chemical properties of compounds. The first three
|
||||
\IEProtein processes information extraction from
|
||||
paper abstracts to search proteins; \Susi learns from shopping
|
||||
patterns; and \Mesh learns rules for finite-methods mesh
|
||||
design. The datasets \Carcinogenesis, \Choline,
|
||||
\Mutagenesis, \Pyrimidines, and \Thermolysin try to
|
||||
predict chemical properties of compounds. The first three
|
||||
datasets store properties of interest as tables, but
|
||||
\bench{Thermolysin} learns from the 3D-structure of a molecule's
|
||||
conformations. Several of these datasets are standard across Machine
|
||||
Learning literature. \bench{GeneExpression}~\cite{} and
|
||||
\bench{BreastCancer}~\cite{} were partly developed by some of the
|
||||
\Thermolysin learns from the 3D-structure of a molecule's
|
||||
conformations. Several of these datasets are standard across the Machine
|
||||
Learning literature. \GeneExpression~\cite{} and
|
||||
\BreastCancer~\cite{} were partly developed by some of the
|
||||
paper's authors. Most datasets perform simple queries in an
|
||||
extensional database. The exception is \bench{Mutagenesis} where
|
||||
extensional database. The exception is \Mutagenesis where
|
||||
several predicates are defined intensionally, requiring extensive
|
||||
computation.
|
||||
|
||||
%------------------------------------------------------------------------------
|
||||
\begin{table}[ht]
|
||||
\begin{table}[t]
|
||||
\centering
|
||||
\caption{Machine Learning (ILP) datasets. Times are in seconds; we
|
||||
compare standard indexing with no indexing on dynamic
|
||||
predicates against the \JITI implementation}
|
||||
\label{tab:ilp:time}
|
||||
\setlength{\tabcolsep}{3pt}
|
||||
\begin {tabular}{|l||r|r|r|} \hline %\cline{1-3}
|
||||
\begin{tabular}{|l||r|r|r|} \hline %\cline{1-3}
|
||||
& \multicolumn{3}{|c|}{Time (in secs)} \\
|
||||
\cline{2-4}
|
||||
Benchmark & 1st & JITI &{\bf ratio} \\
|
||||
\hline
|
||||
\bench{BreastCancer} & 1450 & 88 & 16 \\
|
||||
\bench{Carcinogenesis} & 17,705 & 192 & 92 \\
|
||||
\bench{Choline} & 14,766 & 1,397 & 11 \\
|
||||
\bench{GeneExpression} & 193,283 & 7,483 & 26 \\
|
||||
\bench{IE-Protein\_Extraction} & 1,677,146 & 2,909 & 577 \\
|
||||
\BreastCancer & 1450 & 88 & 16 \\
|
||||
\Carcinogenesis & 17,705 & 192 & 92 \\
|
||||
\Choline & 14,766 & 1,397 & 11 \\
|
||||
\GeneExpression & 193,283 & 7,483 & 26 \\
|
||||
\IEProtein & 1,677,146 & 2,909 & 577 \\
|
||||
\bench{Krki} & 0.3 & 0.3 & 1 \\
|
||||
\bench{Krki II} & 1.3 & 1.3 & 1 \\
|
||||
\bench{Mesh} & 4 & 3 & 1.3 \\
|
||||
\Mesh & 4 & 3 & 1.3 \\
|
||||
\bench{Mutagenesis} & 51,775 & 27,746 & 1.9 \\
|
||||
\bench{Pyrimidines} & 487,545 & 253,235 & 1.9 \\
|
||||
\bench{Susi} & 105,091 & 307 & 342 \\
|
||||
\bench{Thermolysin} & 50,279 & 5,213 & 10 \\
|
||||
\Pyrimidines & 487,545 & 253,235 & 1.9 \\
|
||||
\Susi & 105,091 & 307 & 342 \\
|
||||
\Thermolysin & 50,279 & 5,213 & 10 \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{table}
|
||||
@ -1163,30 +1177,30 @@ computation.
|
||||
|
||||
We compare times for 10 runs of the saturation/refinement cycle of the
|
||||
ILP system. Table~\ref{tab:ilp:time} shows time results. The
|
||||
\bench{Krki} datasets have small search spaces and small databases, so
|
||||
\Krki datasets have small search spaces and small databases, so
|
||||
they achieve the same performance under both versions:
|
||||
there is no slowdown. The \bench{Mesh}, \bench{Mutagenesis}, and
|
||||
\bench{Pyrimides} applications do not benefit much from indexing in
|
||||
there is no slowdown. The \Mesh, \Mutagenesis, and
|
||||
\Pyrimidines applications do not benefit much from indexing in
|
||||
the database, but they do benefit from indexing in the dynamic
|
||||
representation of the search space, as their running times halve.
|
||||
|
||||
The \bench{BreastCancer} and \bench{GeneExpression} applications use
|
||||
1NF data (that is, unstructured data). The benefit here is mostly from
|
||||
multiple-argument indexing. \bench{BreastCancer} is particularly
|
||||
The \BreastCancer and \GeneExpression applications use data in
|
||||
1NF (that is, unstructured data). The benefit here is mostly from
|
||||
multiple-argument indexing. \BreastCancer is particularly
|
||||
interesting. It consists of 40 binary relations with 65k elements
|
||||
each, where the first argument is the key, like in
|
||||
\bench{sg\_cyl}. We know that most calls have the first argument
|
||||
bound, hence indexing was not expected to matter very much. Instead,
|
||||
the results show \JITI running time to improve by an order of
|
||||
magnitude. Like in \bench{sg\_cyl}, this suggests that even a small
|
||||
percentage of badly indexed calls can come to dominate running time.
|
||||
each, where the first argument is the key, like in \sgCyl. We know
|
||||
that most calls have the first argument bound, hence indexing was not
|
||||
expected to matter very much. Instead, the results show \JITI running
|
||||
time to improve by an order of magnitude. Like \sgCyl, this
|
||||
suggests that even a small percentage of badly indexed calls can end
|
||||
up dominating runtime.
|
||||
|
||||
\bench{IE-Protein\_Extraction} and \bench{Thermolysin} are example
|
||||
\IEProtein and \Thermolysin are example
|
||||
applications that manipulate structured data.
|
||||
\bench{IE-Protein\_Extraction} is the largest dataset we consider,
|
||||
and indexing is simply critical: it is not possible to run the
|
||||
application in reasonable time with one argument
|
||||
indexing. \bench{Thermolysin} is smaller and performs some
|
||||
\IEProtein is the largest dataset we consider,
|
||||
and indexing is absolutely critical: it is not possible to run the
|
||||
application in reasonable time with first argument
|
||||
indexing. \Thermolysin is smaller and performs some
|
||||
computation per query: even so, indexing improves performance by an
|
||||
order of magnitude.
|
||||
|
||||
@ -1201,34 +1215,37 @@ order of magnitude.
|
||||
Benchmark & \textbf{Clause} & {\bf Index} & \textbf{Clause} & {\bf Index} \\
|
||||
% \textbf{Benchmarks} & & Total & T & W & S & & Total & T & C & W & S \\
|
||||
\hline
|
||||
\bench{BreastCancer}
|
||||
& 60940 & 46887
|
||||
\BreastCancer
|
||||
& 60,940 & 46,887
|
||||
% & 46242 & 3126 & 125
|
||||
& 630 & 14
|
||||
% &42 & 18& 57 &6
|
||||
\\
|
||||
|
||||
\bench{Carcinogenesis}
|
||||
\Carcinogenesis
|
||||
& 1801 & 2678
|
||||
% &1225 & 587 & 865
|
||||
& 13512 & 942
|
||||
& 13,512 & 942
|
||||
%& 291 & 91 & 457 & 102
|
||||
\\
|
||||
|
||||
\bench{Choline} & 666 & 174
|
||||
\Choline & 666 & 174
|
||||
% &67 & 48 & 58
|
||||
& 3172 & 174
|
||||
% & 76 & 4 & 48 & 45
|
||||
\\
|
||||
\bench{GeneExpression} & 46726 & 22629
|
||||
|
||||
\GeneExpression
|
||||
& 46,726 & 22,629
|
||||
% &6780 & 6473 & 9375
|
||||
& 116463 & 9015
|
||||
& 116,463 & 9015
|
||||
%& 2703 & 932 & 3910 & 1469
|
||||
\\
|
||||
|
||||
\bench{IE-Protein\_Extraction} &146033 & 129333
|
||||
\bench{IE-Protein\_Extraction}
|
||||
& 146,033 & 129,333
|
||||
%&39279 & 24322 & 65732
|
||||
& 53423 & 1531
|
||||
& 53,423 & 1531
|
||||
%& 467 & 108 & 868 & 86
|
||||
\\
|
||||
|
||||
@ -1258,7 +1275,7 @@ order of magnitude.
|
||||
|
||||
\bench{Pyrimidines} & 774 & 218
|
||||
%&76 & 63 & 77
|
||||
& 25840 & 12291
|
||||
& 25,840 & 12,291
|
||||
%& 4847 & 43 & 3510 & 3888
|
||||
\\
|
||||
|
||||
@ -1270,10 +1287,9 @@ order of magnitude.
|
||||
|
||||
\bench{Thermolysin} & 2317 & 929
|
||||
%&429 & 184 & 315
|
||||
& 116129 & 7064
|
||||
& 116,129 & 7064
|
||||
%& 3295 & 1438 & 2160 & 170
|
||||
\\
|
||||
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{table*}
|
||||
@ -1287,12 +1303,12 @@ usage on \emph{static} predicates. Static data-base sizes range from
|
||||
146MB (\bench{IE-Protein\_Extraction}) to less than a MB
|
||||
(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can be
|
||||
more than the original code, as in \bench{Mutagenesis}, or almost as
|
||||
much, eg, \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI
|
||||
much, e.g., \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI
|
||||
adds at least a third and often a half to the original data-base. A
|
||||
more detailed analysis shows the source of overhead to be very
|
||||
different from dataset to dataset. In \bench{IE-Protein\_Extraction}
|
||||
the problem is that hash tables are very large. Hash tables are also
|
||||
where most space is spent in \bench{Susi}. In \bench{BreastCancer}
|
||||
where most space is spent in \bench{Susi}. In \BreastCancer
|
||||
hash tables are actually small, so most space is spent in
|
||||
\TryRetryTrust chains. \bench{Mutagenesis} is similar: even though YAP
|
||||
spends a large effort in indexing it still generates long
|
||||
|