remove muta from ILP benchmarks

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1834 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
2007-03-11 21:41:44 +00:00 · a478f7cb04
commit a478f7cb04
parent f71e9d87c3
1 changed files with 20 additions and 32 deletions
--- a/docs/index/iclp07.tex
+++ b/docs/index/iclp07.tex
@@ -1122,7 +1122,7 @@ difference in this benchmark.
 \subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
 %-------------------------------------------------------------------------
 The need for \JITI was originally noticed in inductive logic
-programming applications.  Table~\ref{tab:ilp:time} shows JITI
+programming applications.  Table~\ref{tab:ilp:time} shows \JITI
 performance on some learning tasks using the ALEPH
 system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a
 small database of chess end-games; \GeneExpression learns rules for
@@ -1132,7 +1132,7 @@ patient reports towards predicting whether an abnormality may be
 malignant; \IEProtein processes information extraction from paper
 abstracts to search proteins; \Susi learns from shopping patterns; and
 \Mesh learns rules for finite-methods mesh design. The datasets
-\Carcinogenesis, \Choline, \Mutagenesis, \Pyrimidines, and
+\Carcinogenesis, \Choline, \Pyrimidines, and
 \Thermolysin try to predict chemical properties of compounds. The
 first three datasets store properties of interest as tables, but
 \Thermolysin learns from the 3D-structure of a molecule's
@@ -1140,9 +1140,7 @@ conformations. Several of these datasets are standard across the
 Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06}
 and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
 developed by some of the paper's authors. Most datasets perform simple
-queries in an extensional database. The exception is \Mutagenesis
-where several predicates are defined intensionally, requiring
-extensive computation.
+queries in an extensional database.

 %------------------------------------------------------------------------------
 \begin{table}[t]
@@ -1165,7 +1163,6 @@ extensive computation.
    \bench{Krki}                   &       0.3 &     0.3 &   1    \\
    \bench{Krki II}                &       1.3 &     1.3 &   1    \\
    \Mesh           &         4 &       3 &   1.3  \\
-    \bench{Mutagenesis}            &    51,775 &  27,746 &   1.9  \\
    \Pyrimidines    &   487,545 & 253,235 &   1.9  \\
    \Susi           &   105,091 &     307 & 342    \\
    \Thermolysin    &    50,279 &   5,213 &  10    \\
@@ -1175,12 +1172,11 @@ extensive computation.
 %------------------------------------------------------------------------------

 We compare times for 10 runs of the saturation/refinement cycle of the
-ILP system. Table~\ref{tab:ilp:time} shows time results. The
-\Krki datasets have small search spaces and small databases, so
-they achieve the same performance under both versions:
-there is no slowdown. The \Mesh, \Mutagenesis, and
-\Pyrimidines applications do not benefit much from indexing in
-the database, but they do benefit from indexing in the dynamic
+ILP system. Table~\ref{tab:ilp:time} shows time results. The \Krki
+datasets have small search spaces and small databases, so they achieve
+the same performance under both versions: there is no slowdown. The
+\Mesh and \Pyrimidines applications do not benefit much from indexing
+in the database, but they do benefit from indexing in the dynamic
 representation of the search space, as their running times halve.

 The \BreastCancer and \GeneExpression applications use data in 
@@ -1266,12 +1262,6 @@ order of magnitude.
    %& 46 & 4 & 35 & 22
    \\
    
-    \bench{Mutagenesis}       & 1412 & 1848
-    %&1045 & 291 & 510
-    & 4302 & 595
-    %& 156 & 114 & 264 & 61
-    \\
-    
    \bench{Pyrimidines}       & 774 & 218
    %&76 & 63 & 77
    & 25,840 & 12,291
@@ -1300,20 +1290,18 @@ Because dynamic memory expands and contracts, we chose a point where
 memory usage should be at a maximum. The first two numbers show data
 usage on \emph{static} predicates. Static data-base sizes range from
 146MB (\bench{IE-Protein\_Extraction}) to less than a MB
-(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can be
-more than the original code, as in \bench{Mutagenesis}, or almost as
-much, e.g., \bench{IE-Protein\_Extraction}. In most cases the YAP \JITI
-adds at least a third and often a half to the original data-base. A
-more detailed analysis shows the source of overhead to be very
-different from dataset to dataset. In \bench{IE-Protein\_Extraction}
-the problem is that hash tables are very large. Hash tables are also
-where most space is spent in \bench{Susi}. In \BreastCancer
-hash tables are actually small, so most space is spent in
-\TryRetryTrust chains. \bench{Mutagenesis} is similar: even though YAP
-spends a large effort in indexing it still generates long
-\TryRetryTrust chains. Storing sets of matching clauses at \jitiSTAR
-nodes takes usually over 10\% of total memory usage, but is never
-dominant.
+(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can grow
+to be as large as the original code, as in \Carcinogenesis, or
+almost as much, e.g., \bench{IE-Protein\_Extraction}. In most cases
+the YAP \JITI adds at least a third and often a half to the original
+data-base. A more detailed analysis shows the source of overhead to be
+very different from dataset to dataset. In
+\bench{IE-Protein\_Extraction} the problem is that hash tables are
+very large. Hash tables are also where most space is spent in
+\bench{Susi}. In \BreastCancer hash tables are actually small, so most
+space is spent in \TryRetryTrust chains. Storing sets of matching
+clauses at \jitiSTAR nodes takes usually over 10\% of total memory
+usage, but is never dominant.

 This version of ALEPH uses the internal data-base to store the IDB.
 The size of reflects the search space, and is to some extent