remove muta from ILP benchmarks

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1834 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
vsc 2007-03-11 21:41:44 +00:00
parent f71e9d87c3
commit a478f7cb04

@@ -1122,7 +1122,7 @@ difference in this benchmark.
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
%-------------------------------------------------------------------------
The need for \JITI was originally noticed in inductive logic
-programming applications. Table~\ref{tab:ilp:time} shows JITI
+programming applications. Table~\ref{tab:ilp:time} shows \JITI
performance on some learning tasks using the ALEPH
system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a
small database of chess end-games; \GeneExpression learns rules for
@@ -1132,7 +1132,7 @@ patient reports towards predicting whether an abnormality may be
malignant; \IEProtein extracts information about proteins from paper
abstracts; \Susi learns from shopping patterns; and
\Mesh learns rules for finite-element mesh design. The datasets
-\Carcinogenesis, \Choline, \Mutagenesis, \Pyrimidines, and
+\Carcinogenesis, \Choline, \Pyrimidines, and
\Thermolysin try to predict chemical properties of compounds. The
first three datasets store properties of interest as tables, but
\Thermolysin learns from the 3D-structure of a molecule's
@@ -1140,9 +1140,7 @@ conformations. Several of these datasets are standard across the
Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06}
and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
developed by some of the paper's authors. Most datasets perform simple
-queries in an extensional database. The exception is \Mutagenesis
-where several predicates are defined intensionally, requiring
-extensive computation.
+queries in an extensional database.
%------------------------------------------------------------------------------
\begin{table}[t]
@@ -1165,7 +1163,6 @@ extensive computation.
\bench{Krki} & 0.3 & 0.3 & 1 \\
\bench{Krki II} & 1.3 & 1.3 & 1 \\
\Mesh & 4 & 3 & 1.3 \\
-\bench{Mutagenesis} & 51,775 & 27,746 & 1.9 \\
\Pyrimidines & 487,545 & 253,235 & 1.9 \\
\Susi & 105,091 & 307 & 342 \\
\Thermolysin & 50,279 & 5,213 & 10 \\
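The last column of the time-table rows in this hunk is a ratio of the two run times. As a quick sanity check, the Python sketch below recomputes that ratio from the rows shown here (with Mutagenesis omitted, as in the new version) and compares it with the reported value. It assumes, as the surrounding text suggests (Susi drops from 105,091 to 307), that the first two numbers are the times without and with JITI; the column headers are not visible in this hunk, and the names `ROWS`, `speedup`, and `check_rows` are hypothetical.

```python
# Rows copied from the ILP time table in this hunk (Mutagenesis omitted,
# as in the new version). ASSUMPTION: the first two numbers are the run
# times without and with JITI; the column headers are not shown here.
ROWS = {
    "Krki": (0.3, 0.3, 1),
    "Krki II": (1.3, 1.3, 1),
    "Mesh": (4, 3, 1.3),
    "Pyrimidines": (487_545, 253_235, 1.9),
    "Susi": (105_091, 307, 342),
    "Thermolysin": (50_279, 5_213, 10),
}


def speedup(t_without: float, t_with: float) -> float:
    """Ratio of the two run times: how many times faster the JITI run is."""
    return t_without / t_with


def check_rows(rows, rel_tol=0.05):
    """Return {name: (computed ratio, reported ratio, within tolerance?)}.

    The tolerance is relative to the reported ratio, since the table
    rounds its last column.
    """
    report = {}
    for name, (t_without, t_with, reported) in rows.items():
        ratio = speedup(t_without, t_with)
        ok = abs(ratio - reported) <= rel_tol * reported
        report[name] = (ratio, reported, ok)
    return report


if __name__ == "__main__":
    for name, (ratio, reported, ok) in check_rows(ROWS).items():
        print(f"{name:12s} computed {ratio:8.2f}  reported {reported:>5}  ok={ok}")
```

With a 5% relative tolerance every row passes; the largest deviation is Thermolysin, whose computed ratio of about 9.6 appears in the table rounded to 10, while Mesh (4/3, about 1.33) appears as 1.3.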
@@ -1175,12 +1172,11 @@ extensive computation.
%------------------------------------------------------------------------------
We compare times for 10 runs of the saturation/refinement cycle of the
-ILP system. Table~\ref{tab:ilp:time} shows time results. The
-\Krki datasets have small search spaces and small databases, so
-they achieve the same performance under both versions:
-there is no slowdown. The \Mesh, \Mutagenesis, and
-\Pyrimidines applications do not benefit much from indexing in
-the database, but they do benefit from indexing in the dynamic
+ILP system. Table~\ref{tab:ilp:time} shows time results. The \Krki
+datasets have small search spaces and small databases, so they achieve
+the same performance under both versions: there is no slowdown. The
+\Mesh and \Pyrimidines applications do not benefit much from indexing
+in the database, but they do benefit from indexing in the dynamic
representation of the search space: \Pyrimidines runs almost twice as fast, and \Mesh about 1.3 times as fast.
The \BreastCancer and \GeneExpression applications use data in
@@ -1266,12 +1262,6 @@ order of magnitude.
%& 46 & 4 & 35 & 22
\\
-\bench{Mutagenesis} & 1412 & 1848
-%&1045 & 291 & 510
-& 4302 & 595
-%& 156 & 114 & 264 & 61
-\\
\bench{Pyrimidines} & 774 & 218
%&76 & 63 & 77
& 25,840 & 12,291
@@ -1300,20 +1290,18 @@ Because dynamic memory expands and contracts, we chose a point where
memory usage should be at a maximum. The first two numbers show data
usage on \emph{static} predicates. Static data-base sizes range from
146MB (\bench{IE-Protein\_Extraction}) to less than 1MB
-(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can be
-more than the original code, as in \bench{Mutagenesis}, or almost as
-much, e.g., \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI
-adds at least a third and often a half to the original data-base. A
-more detailed analysis shows the source of overhead to be very
-different from dataset to dataset. In \bench{IE-Protein\_Extraction}
-the problem is that hash tables are very large. Hash tables are also
-where most space is spent in \bench{Susi}. In \BreastCancer
-hash tables are actually small, so most space is spent in
-\TryRetryTrust chains. \bench{Mutagenesis} is similar: even though YAP
-spends a large effort in indexing it still generates long
-\TryRetryTrust chains. Storing sets of matching clauses at \jitiSTAR
-nodes takes usually over 10\% of total memory usage, but is never
-dominant.
+(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can grow
+to be as large as the original code, as in \Carcinogenesis, or
+almost as large, e.g., \bench{IE-Protein\_Extraction}. In most cases
+the YAP \JITI adds at least a third and often a half to the original
+data-base. A more detailed analysis shows that the source of overhead
+varies considerably from dataset to dataset. In
+\bench{IE-Protein\_Extraction} the problem is that hash tables are
+very large. Hash tables are also where most space is spent in
+\bench{Susi}. In \BreastCancer hash tables are actually small, so most
+space is spent in \TryRetryTrust chains. Storing sets of matching
+clauses at \jitiSTAR nodes usually takes over 10\% of total memory
+usage, but is never dominant.
This version of ALEPH uses the internal data-base to store the IDB.
The size of reflects the search space, and is to some extent