Merged the two tables of 7.3

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1836 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
2007-03-11 23:19:47 +00:00
commit 9ec9b7fb70
parent 075c9a5bf3
1 changed files with 67 additions and 124 deletions
--- a/docs/index/iclp07.tex
+++ b/docs/index/iclp07.tex
@@ -49,14 +49,13 @@
 \newcommand{\tea}{\bench{tea}\xspace}
 %------------------------------------------------------------------------------
 \newcommand{\BreastCancer}{\bench{BreastCancer}\xspace}
-\newcommand{\Carcinogenesis}{\bench{Carcinogenesis}\xspace}
+\newcommand{\Carcino}{\bench{Carcinogenesis}\xspace}
 \newcommand{\Choline}{\bench{Choline}\xspace}
-\newcommand{\GeneExpression}{\bench{GeneExpression}\xspace}
+\newcommand{\GeneExpr}{\bench{GeneExpression}\xspace}
 \newcommand{\IEProtein}{\bench{IE-Protein\_Extraction}\xspace}
 \newcommand{\Krki}{\bench{Krki}\xspace}
 \newcommand{\KrkiII}{\bench{Krki~II}\xspace}
 \newcommand{\Mesh}{\bench{Mesh}\xspace}
-\newcommand{\Mutagenesis}{\bench{Mutagenesis}\xspace}
 \newcommand{\Pyrimidines}{\bench{Pyrimidines}\xspace}
 \newcommand{\Susi}{\bench{Susi}\xspace}
 \newcommand{\Thermolysin}{\bench{Thermolysin}\xspace}
@@ -1013,8 +1012,8 @@ in parentheses. For each variant of transitive closure, we issue two
 queries: one with mode \code{(in,out)} and one with mode
 \code{(out,out)}.
 %
-For YAP, indices on the first argument and \TryRetryTrust are built on
-all benchmarks under \JITI.
+For YAP, indices on the first argument and \TryRetryTrust chains are
+built on all benchmarks under \JITI.
 %
 For XXX, \JITI triggers on no benchmark but the \jitiONconstant
 instructions are executed for the three \bench{tc\_?\_oo} benchmarks.
@@ -1069,8 +1068,9 @@ columns separately.
 %--------------------------------------------------------------------------
 On the other hand, when \JITI is effective, it can significantly
 improve time performance. We use the following programs and
-applications:\TODO{If time permits, we should also add FSA benchmarks
-(\bench{k963}, \bench{dg5} and \bench{tl3})}
+applications:
+%% \TODO{For the journal version we should also add FSA benchmarks
+%%       (\bench{k963}, \bench{dg5} and \bench{tl3})}
 \begin{description}
 \item[\sgCyl] The same generation DB benchmark on a $24 \times 24
  \times 2$ cylinder. We issue the open query.
@@ -1122,52 +1122,80 @@ difference in this benchmark.
 \subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
 %-------------------------------------------------------------------------
 The need for \JITI was originally noticed in inductive logic
-programming applications.  Table~\ref{tab:ilp:time} shows \JITI
-performance on some learning tasks using the ALEPH
-system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a
-small database of chess end-games; \GeneExpression learns rules for
+programming applications, which tend to issue ad hoc queries at
+runtime, so their indexing requirements cannot be determined at
+compile time. On the other hand, these applications operate on large
+amounts of data, so memory consumption is a legitimate concern. We
+evaluate JITI's time and space performance on some learning tasks
+using the ALEPH system~\cite{ALEPH}. We use the following datasets:
+%
+% Table~\ref{tab:ilp:time} shows JITI performance.
+The dataset \Krki tries to learn rules from a
+small database of chess end-games; \GeneExpr learns rules for
 yeast gene activity given a database of genes, their interactions, and
 micro-array gene expression data; \BreastCancer processes real-life
 patient reports towards predicting whether an abnormality may be
 malignant; \IEProtein processes information extraction from paper
 abstracts to search proteins; \Susi learns from shopping patterns; and
 \Mesh learns rules for finite-methods mesh design. The datasets
-\Carcinogenesis, \Choline, \Pyrimidines, and
+\Carcino, \Choline, \Pyrimidines, and
 \Thermolysin try to predict chemical properties of compounds. The
 first three datasets store properties of interest as tables, but
 \Thermolysin learns from the 3D-structure of a molecule's
 conformations. Several of these datasets are standard across the
-Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06}
+Machine Learning literature. \GeneExpr~\cite{ilp-regulatory06}
 and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
-developed by some of the paper's authors. Most datasets perform simple
+developed by an author of this paper. Most datasets perform simple
 queries in an extensional database.

 %------------------------------------------------------------------------------
 \begin{table}[t]
  \centering
-  \caption{Machine Learning (ILP) Datasets: Times are given in Seconds,
-    we give time for standard indexing with no indexing on dynamic
-    predicates versus the \JITI implementation}
-  \label{tab:ilp:time}
+  \caption{Time and space performance on Machine Learning (ILP) Datasets}
+  \label{tab:ilp}
  \setlength{\tabcolsep}{3pt}
-  \begin{tabular}{|l||r|r|r|} \hline %\cline{1-3}
-                    & \multicolumn{3}{|c|}{Time (in secs)} \\
+  \subfigure[Time (in seconds)]{\label{tab:ilp:time}
+    \begin{tabular}{|l||r|r|r||} \hline
+                  & \multicolumn{3}{|c||}{Time (in secs)} \\
    \cline{2-4}
-    Benchmark       &    1st    &   JITI  &{\bf ratio} \\
+    Benchmark     &    1st    &   JITI  &{\bf ratio} \\
    \hline
-    \BreastCancer   &      1450 &      88 &  16    \\
-    \Carcinogenesis &    17,705 &     192 &  92    \\
-    \Choline        &    14,766 &   1,397 &  11    \\
-    \GeneExpression &   193,283 &   7,483 &  26    \\
-    \IEProtein      & 1,677,146 &   2,909 & 577    \\
-    \bench{Krki}                   &       0.3 &     0.3 &   1    \\
-    \bench{Krki II}                &       1.3 &     1.3 &   1    \\
-    \Mesh           &         4 &       3 &   1.3  \\
-    \Pyrimidines    &   487,545 & 253,235 &   1.9  \\
-    \Susi           &   105,091 &     307 & 342    \\
-    \Thermolysin    &    50,279 &   5,213 &  10    \\
+    \BreastCancer &     1,450 &      88 &  16 \\
+    \Carcino      &    17,705 &     192 &  92 \\
+    \Choline      &    14,766 &   1,397 &  11 \\
+    \GeneExpr     &   193,283 &   7,483 &  26 \\
+    \IEProtein    & 1,677,146 &   2,909 & 577 \\
+    \Krki         &       0.3 &     0.3 &   1 \\
+    \KrkiII       &       1.3 &     1.3 &   1 \\
+    \Mesh         &         4 &       3 & 1.3 \\
+    \Pyrimidines  &   487,545 & 253,235 & 1.9 \\
+    \Susi         &   105,091 &     307 & 342 \\
+    \Thermolysin  &    50,279 &   5,213 &  10 \\
    \hline
-\end{tabular}
+    \end{tabular}
+  }
+  \subfigure[Memory usage (in KB)]{\label{tab:ilp:memory}
+    \begin{tabular}{||r|r|r|r||} \hline
+                \multicolumn{2}{||c|}{Static code}
+              & \multicolumn{2}{|c||}{Dynamic code} \\
+    \hline
+                \multicolumn{1}{||c|}{Clauses} & \multicolumn{1}{c}{Index}
+              & \multicolumn{1}{|c|}{Clauses} & \multicolumn{1}{c||}{Index}\\
+    \hline
+	        60,940 &  46,887 &     630 &     14 \\
+	         1,801 &   2,678 &  13,512 &    942 \\
+	           666 &     174 &   3,172 &    174 \\
+	        46,726 &  22,629 & 116,463 &  9,015 \\
+	       146,033 & 129,333 &  53,423 &  1,531 \\
+	           678 &     117 &   2,047 &     24 \\
+	         1,866 &     715 &   2,055 &     26 \\
+	           802 &     161 &   2,149 &    109 \\
+	           774 &     218 &  25,840 & 12,291 \\
+ 	         5,007 &   2,509 &   4,497 &    759 \\
+	         2,317 &     929 & 116,129 &  7,064 \\
+    \hline
+    \end{tabular}
+  }
 \end{table}
 %------------------------------------------------------------------------------

@@ -1179,7 +1207,7 @@ the same performance under both versions: there is no slowdown. The
 in the database, but they do benefit from indexing in the dynamic
 representation of the search space, as their running times halve.

-The \BreastCancer and \GeneExpression applications use data in 
+The \BreastCancer and \GeneExpr applications use data in 
 1NF (that is, unstructured data). The benefit here is mostly from
 multiple-argument indexing. \BreastCancer is particularly
 interesting. It consists of 40 binary relations with 65k elements
@@ -1199,90 +1227,6 @@ indexing. \Thermolysin is smaller and performs some
 computation per query: even so, indexing improves performance by an
 order of magnitude.

-\begin{table*}[ht]
-  \centering
-  \caption{Memory Performance on Machine Learning (ILP) Datasets: memory
-    usage is given in KB}
-  \label{tab:ilp:memory}
-  \setlength{\tabcolsep}{3pt}
-  \begin {tabular}{|l|r|r||r|r|} \hline %\cline{1-3}
-    &  \multicolumn{2}{|c||}{\bf Static Code}  & \multicolumn{2}{|c|}{\bf Dynamic Code} \\
-    Benchmark   &  \textbf{Clause} & {\bf Index}  & \textbf{Clause} & {\bf Index} \\
-%    \textbf{Benchmarks} &   & Total & T & W & S &  & Total & T & C & W & S  \\
-    \hline
-    \BreastCancer
-    & 60,940 & 46,887 
-    % & 46242 & 3126  & 125
-    & 630  & 14
-    % &42 & 18& 57 &6
-    \\
-
-    \Carcinogenesis
-    & 1801 & 2678
-    % &1225 & 587 & 865
-    & 13,512 & 942
-    %& 291 & 91 & 457 & 102
-    \\
-
-    \Choline  & 666 & 174
-    % &67 & 48 & 58
-    & 3172 & 174
-    % & 76 & 4 & 48 & 45
-    \\
-
-    \GeneExpression
-    &  46,726 & 22,629
-    % &6780 & 6473 & 9375
-    & 116,463 & 9015
-    %& 2703 & 932 & 3910 & 1469
-    \\
-
-    \bench{IE-Protein\_Extraction}
-    & 146,033 & 129,333
-    %&39279 & 24322 & 65732
-    & 53,423 & 1531
-    %& 467 & 108 & 868 & 86
-    \\
-
-    \bench{Krki}              & 678 & 117
-    %&52 & 24 & 40
-    & 2047 & 24
-    %& 10 & 2 & 10 & 1
-    \\
-
-    \bench{Krki II}           & 1866 & 715
-    %&180 & 233    & 301
-    & 2055 & 26
-    %& 11 & 2 & 11 & 1
-    \\
-
-    \bench{Mesh}              & 802 & 161
-    %&49 & 18 & 93
-    & 2149 & 109
-    %& 46 & 4 & 35 & 22
-    \\
-    
-    \bench{Pyrimidines}       & 774 & 218
-    %&76 & 63 & 77
-    & 25,840 & 12,291
-    %& 4847 & 43 & 3510 & 3888
-    \\
-
-    \bench{Susi}              & 5007 & 2509
-    %&855 & 578 & 1076
-    & 4497 & 759
-    %& 324 & 58 & 256 & 120
-    \\
-
-    \bench{Thermolysin}       & 2317 & 929
-    %&429 & 184 & 315
-    & 116,129 & 7064
-    %& 3295 & 1438 & 2160 & 170
-    \\
-    \hline
-\end{tabular}
-\end{table*}
-

 Table~\ref{tab:ilp:memory} shows the memory cost paid for \JITI. The
 table presents data obtained at a point near the end of execution.
@@ -1291,7 +1235,7 @@ memory usage should be at a maximum. The first two numbers show data
 usage on \emph{static} predicates. Static data-base sizes range from
 146MB (\bench{IE-Protein\_Extraction}) to less than a MB
 (\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can grow
-to be as large as than the original code, as in \Carcinogenesis, or
+to be larger than the original code, as in \Carcino, or
 almost as much, e.g., \bench{IE-Protein\_Extraction}. In most cases
 the YAP \JITI adds at least a third and often a half to the original
 data-base. A more detailed analysis shows the source of overhead to be
@@ -1306,16 +1250,15 @@ usage, but is never dominant.
 This version of ALEPH uses the internal data-base to store the IDB.
 Its size reflects the search space, and is to some extent
 independent of the program's static data, although small applications
-such as \bench{Krki} do tend to have a small search space. ALEPH's
+such as \bench{Krki} tend to have a small search space. ALEPH's
 author very carefully designed the system to work around overheads in
-accessing the data-base, so indexing should not be as critical. The
-low overheads suggest that the \JITI is working well, as confirmed in
-a more detailed analysis: most space is spent on hashes tables and on
+accessing the database, so indexing should not be as critical. The
+low overheads suggest that \JITI is working well, as confirmed in
+a more detailed analysis: most space is spent on hash tables and on
 internal nodes of the tree, and relatively little space is spent on
 \TryRetryTrust chains.


-
 \section{Concluding Remarks}
 %===========================
 \begin{itemize}