remove muta from ILP benchmarks

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1834 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
Date: 2007-03-11 21:41:44 +00:00
commit a478f7cb04
parent f71e9d87c3
1 changed file with 20 additions and 32 deletions
--- a/docs/index/iclp07.tex
+++ b/docs/index/iclp07.tex
@@ -1122,7 +1122,7 @@ difference in this benchmark.
 \subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
 %-------------------------------------------------------------------------
 The need for \JITI was originally noticed in inductive logic
-programming applications.  Table~\ref{tab:ilp:time} shows JITI
+programming applications.  Table~\ref{tab:ilp:time} shows \JITI
 performance on some learning tasks using the ALEPH
 system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a
 small database of chess end-games; \GeneExpression learns rules for
@@ -1132,7 +1132,7 @@ patient reports towards predicting whether an abnormality may be
 malignant; \IEProtein processes information extraction from paper
 abstracts to search proteins; \Susi learns from shopping patterns; and
 \Mesh learns rules for finite-methods mesh design. The datasets
-\Carcinogenesis, \Choline, \Mutagenesis, \Pyrimidines, and
+\Carcinogenesis, \Choline, \Pyrimidines, and
 \Thermolysin try to predict chemical properties of compounds. The
 first three datasets store properties of interest as tables, but
 \Thermolysin learns from the 3D-structure of a molecule's
@@ -1140,9 +1140,7 @@ conformations. Several of these datasets are standard across the
 Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06}
 and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
 developed by some of the paper's authors. Most datasets perform simple
-queries in an extensional database. The exception is \Mutagenesis
-where several predicates are defined intensionally, requiring
-extensive computation.
+queries in an extensional database.
 %------------------------------------------------------------------------------
 \begin{table}[t]
@@ -1165,7 +1163,6 @@ extensive computation.
    \bench{Krki}                   &       0.3 &     0.3 &   1    \\
    \bench{Krki II}                &       1.3 &     1.3 &   1    \\
    \Mesh           &         4 &       3 &   1.3  \\
-   \bench{Mutagenesis}            &    51,775 &  27,746 &   1.9  \\
    \Pyrimidines    &   487,545 & 253,235 &   1.9  \\
    \Susi           &   105,091 &     307 & 342    \\
    \Thermolysin    &    50,279 &   5,213 &  10    \\
@@ -1175,12 +1172,11 @@ extensive computation.
 %------------------------------------------------------------------------------
 We compare times for 10 runs of the saturation/refinement cycle of the
-ILP system. Table~\ref{tab:ilp:time} shows time results. The
-\Krki datasets have small search spaces and small databases, so
-they achieve the same performance under both versions:
-there is no slowdown. The \Mesh, \Mutagenesis, and
-\Pyrimidines applications do not benefit much from indexing in
-the database, but they do benefit from indexing in the dynamic
+ILP system. Table~\ref{tab:ilp:time} shows time results. The \Krki
+datasets have small search spaces and small databases, so they achieve
+the same performance under both versions: there is no slowdown. The
+\Mesh and \Pyrimidines applications do not benefit much from indexing
+in the database, but they do benefit from indexing in the dynamic
 representation of the search space, as their running times halve.
 The \BreastCancer and \GeneExpression applications use data in 
@@ -1266,12 +1262,6 @@ order of magnitude.
    %& 46 & 4 & 35 & 22
    \\
    \bench{Mutagenesis}       & 1412 & 1848
    %&1045 & 291 & 510
    & 4302 & 595
    %& 156 & 114 & 264 & 61
    \\
    \bench{Pyrimidines}       & 774 & 218
    %&76 & 63 & 77
    & 25,840 & 12,291
@@ -1300,20 +1290,18 @@ Because dynamic memory expands and contracts, we chose a point where
 memory usage should be at a maximum. The first two numbers show data
 usage on \emph{static} predicates. Static data-base sizes range from
 146MB (\bench{IE-Protein\_Extraction} to less than a MB
-(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can be
-more than the original code, as in \bench{Mutagenesis}, or almost as
-much, e.g., \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI
-adds at least a third and often a half to the original data-base. A
-more detailed analysis shows the source of overhead to be very
-different from dataset to dataset. In \bench{IE-Protein\_Extraction}
-the problem is that hash tables are very large. Hash tables are also
-where most space is spent in \bench{Susi}. In \BreastCancer
-hash tables are actually small, so most space is spent in
-\TryRetryTrust chains. \bench{Mutagenesis} is similar: even though YAP
-spends a large effort in indexing it still generates long
-\TryRetryTrust chains. Storing sets of matching clauses at \jitiSTAR
-nodes takes usually over 10\% of total memory usage, but is never
-dominant.
+(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can grow
+to be as large as the original code, as in \Carcinogenesis, or
+almost as much, e.g., \bench{IE-Protein\_Extraction}. In most cases
+the YAP \JITI adds at least a third and often a half to the original
+data-base. A more detailed analysis shows the source of overhead to be
+very different from dataset to dataset. In
+\bench{IE-Protein\_Extraction} the problem is that hash tables are
+very large. Hash tables are also where most space is spent in
+\bench{Susi}. In \BreastCancer hash tables are actually small, so most
+space is spent in \TryRetryTrust chains. Storing sets of matching
+clauses at \jitiSTAR nodes usually takes over 10\% of total memory
+usage, but is never dominant.
 This version of ALEPH uses the internal data-base to store the IDB.
 The size of reflects the search space, and is to some extent