remove muta from ILP benchmarks

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1834 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
vsc 2007-03-11 21:41:44 +00:00
parent f71e9d87c3
commit a478f7cb04

@ -1122,7 +1122,7 @@ difference in this benchmark.
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP} \subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
%------------------------------------------------------------------------- %-------------------------------------------------------------------------
The need for \JITI was originally noticed in inductive logic The need for \JITI was originally noticed in inductive logic
programming applications. Table~\ref{tab:ilp:time} shows JITI programming applications. Table~\ref{tab:ilp:time} shows \JITI
performance on some learning tasks using the ALEPH performance on some learning tasks using the ALEPH
system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a
small database of chess end-games; \GeneExpression learns rules for small database of chess end-games; \GeneExpression learns rules for
@ -1132,7 +1132,7 @@ patient reports towards predicting whether an abnormality may be
malignant; \IEProtein processes information extraction from paper malignant; \IEProtein processes information extraction from paper
abstracts to search proteins; \Susi learns from shopping patterns; and abstracts to search proteins; \Susi learns from shopping patterns; and
\Mesh learns rules for finite-methods mesh design. The datasets \Mesh learns rules for finite-methods mesh design. The datasets
\Carcinogenesis, \Choline, \Mutagenesis, \Pyrimidines, and \Carcinogenesis, \Choline, \Pyrimidines, and
\Thermolysin try to predict chemical properties of compounds. The \Thermolysin try to predict chemical properties of compounds. The
first three datasets store properties of interest as tables, but first three datasets store properties of interest as tables, but
\Thermolysin learns from the 3D-structure of a molecule's \Thermolysin learns from the 3D-structure of a molecule's
@ -1140,9 +1140,7 @@ conformations. Several of these datasets are standard across the
Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06} Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06}
and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
developed by some of the paper's authors. Most datasets perform simple developed by some of the paper's authors. Most datasets perform simple
queries in an extensional database. The exception is \Mutagenesis queries in an extensional database.
where several predicates are defined intensionally, requiring
extensive computation.
%------------------------------------------------------------------------------ %------------------------------------------------------------------------------
\begin{table}[t] \begin{table}[t]
@ -1165,7 +1163,6 @@ extensive computation.
\bench{Krki} & 0.3 & 0.3 & 1 \\ \bench{Krki} & 0.3 & 0.3 & 1 \\
\bench{Krki II} & 1.3 & 1.3 & 1 \\ \bench{Krki II} & 1.3 & 1.3 & 1 \\
\Mesh & 4 & 3 & 1.3 \\ \Mesh & 4 & 3 & 1.3 \\
\bench{Mutagenesis} & 51,775 & 27,746 & 1.9 \\
\Pyrimidines & 487,545 & 253,235 & 1.9 \\ \Pyrimidines & 487,545 & 253,235 & 1.9 \\
\Susi & 105,091 & 307 & 342 \\ \Susi & 105,091 & 307 & 342 \\
\Thermolysin & 50,279 & 5,213 & 10 \\ \Thermolysin & 50,279 & 5,213 & 10 \\
@ -1175,12 +1172,11 @@ extensive computation.
%------------------------------------------------------------------------------ %------------------------------------------------------------------------------
We compare times for 10 runs of the saturation/refinement cycle of the We compare times for 10 runs of the saturation/refinement cycle of the
ILP system. Table~\ref{tab:ilp:time} shows time results. The ILP system. Table~\ref{tab:ilp:time} shows time results. The \Krki
\Krki datasets have small search spaces and small databases, so datasets have small search spaces and small databases, so they achieve
they achieve the same performance under both versions: the same performance under both versions: there is no slowdown. The
there is no slowdown. The \Mesh, \Mutagenesis, and \Mesh and \Pyrimidines applications do not benefit much from indexing
\Pyrimidines applications do not benefit much from indexing in in the database, but they do benefit from indexing in the dynamic
the database, but they do benefit from indexing in the dynamic
representation of the search space, as their running times halve. representation of the search space, as their running times halve.
The \BreastCancer and \GeneExpression applications use data in The \BreastCancer and \GeneExpression applications use data in
@ -1266,12 +1262,6 @@ order of magnitude.
%& 46 & 4 & 35 & 22 %& 46 & 4 & 35 & 22
\\ \\
\bench{Mutagenesis} & 1412 & 1848
%&1045 & 291 & 510
& 4302 & 595
%& 156 & 114 & 264 & 61
\\
\bench{Pyrimidines} & 774 & 218 \bench{Pyrimidines} & 774 & 218
%&76 & 63 & 77 %&76 & 63 & 77
& 25,840 & 12,291 & 25,840 & 12,291
@ -1300,20 +1290,18 @@ Because dynamic memory expands and contracts, we chose a point where
memory usage should be at a maximum. The first two numbers show data memory usage should be at a maximum. The first two numbers show data
usage on \emph{static} predicates. Static data-base sizes range from usage on \emph{static} predicates. Static data-base sizes range from
146MB (\bench{IE-Protein\_Extraction}) to less than a MB 146MB (\bench{IE-Protein\_Extraction}) to less than a MB
(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can be (\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can grow
more than the original code, as in \bench{Mutagenesis}, or almost as to be as large as the original code, as in \Carcinogenesis, or
much, e.g., \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI almost as much, e.g., \bench{IE-Protein\_Extraction}. In most cases
adds at least a third and often a half to the original data-base. A the YAP \JITI adds at least a third and often a half to the original
more detailed analysis shows the source of overhead to be very data-base. A more detailed analysis shows the source of overhead to be
different from dataset to dataset. In \bench{IE-Protein\_Extraction} very different from dataset to dataset. In
the problem is that hash tables are very large. Hash tables are also \bench{IE-Protein\_Extraction} the problem is that hash tables are
where most space is spent in \bench{Susi}. In \BreastCancer very large. Hash tables are also where most space is spent in
hash tables are actually small, so most space is spent in \bench{Susi}. In \BreastCancer hash tables are actually small, so most
\TryRetryTrust chains. \bench{Mutagenesis} is similar: even though YAP space is spent in \TryRetryTrust chains. Storing sets of matching
spends a large effort in indexing it still generates long clauses at \jitiSTAR nodes takes usually over 10\% of total memory
\TryRetryTrust chains. Storing sets of matching clauses at \jitiSTAR usage, but is never dominant.
nodes takes usually over 10\% of total memory usage, but is never
dominant.
This version of ALEPH uses the internal data-base to store the IDB. This version of ALEPH uses the internal data-base to store the IDB.
The size of reflects the search space, and is to some extent The size of reflects the search space, and is to some extent