Merged the two tables of 7.3

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1836 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
kostis 2007-03-11 23:19:47 +00:00
parent 075c9a5bf3
commit 9ec9b7fb70

@ -49,14 +49,13 @@
\newcommand{\tea}{\bench{tea}\xspace}
%------------------------------------------------------------------------------
\newcommand{\BreastCancer}{\bench{BreastCancer}\xspace}
\newcommand{\Carcinogenesis}{\bench{Carcinogenesis}\xspace}
\newcommand{\Carcino}{\bench{Carcinogenesis}\xspace}
\newcommand{\Choline}{\bench{Choline}\xspace}
\newcommand{\GeneExpression}{\bench{GeneExpression}\xspace}
\newcommand{\GeneExpr}{\bench{GeneExpression}\xspace}
\newcommand{\IEProtein}{\bench{IE-Protein\_Extraction}\xspace}
\newcommand{\Krki}{\bench{Krki}\xspace}
\newcommand{\KrkiII}{\bench{Krki~II}\xspace}
\newcommand{\Mesh}{\bench{Mesh}\xspace}
\newcommand{\Mutagenesis}{\bench{Mutagenesis}\xspace}
\newcommand{\Pyrimidines}{\bench{Pyrimidines}\xspace}
\newcommand{\Susi}{\bench{Susi}\xspace}
\newcommand{\Thermolysin}{\bench{Thermolysin}\xspace}
@ -1013,8 +1012,8 @@ in parentheses. For each variant of transitive closure, we issue two
queries: one with mode \code{(in,out)} and one with mode
\code{(out,out)}.
%
For YAP, indices on the first argument and \TryRetryTrust are built on
all benchmarks under \JITI.
For YAP, indices on the first argument and \TryRetryTrust chains are
built on all benchmarks under \JITI.
%
For XXX, \JITI triggers on no benchmark but the \jitiONconstant
instructions are executed for the three \bench{tc\_?\_oo} benchmarks.
@ -1069,8 +1068,9 @@ columns separately.
%--------------------------------------------------------------------------
On the other hand, when \JITI is effective, it can significantly
improve time performance. We use the following programs and
applications:\TODO{If time permits, we should also add FSA benchmarks
(\bench{k963}, \bench{dg5} and \bench{tl3})}
applications:
%% \TODO{For the journal version we should also add FSA benchmarks
%% (\bench{k963}, \bench{dg5} and \bench{tl3})}
\begin{description}
\item[\sgCyl] The same generation DB benchmark on a $24 \times 24
\times 2$ cylinder. We issue the open query.
@ -1122,52 +1122,80 @@ difference in this benchmark.
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
%-------------------------------------------------------------------------
The need for \JITI was originally noticed in inductive logic
programming applications. Table~\ref{tab:ilp:time} shows \JITI
performance on some learning tasks using the ALEPH
system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a
small database of chess end-games; \GeneExpression learns rules for
programming applications, which tend to issue ad hoc queries at
runtime and whose indexing requirements cannot be determined at
compile time. On the other hand, these applications operate on large
amounts of data, so memory consumption is a reasonable concern. We evaluate
JITI's time and space performance on some learning tasks using the
ALEPH system~\cite{ALEPH}. We use the following datasets:
%
% Table~\ref{tab:ilp:time} shows JITI performance.
The dataset \Krki tries to learn rules from a
small database of chess end-games; \GeneExpr learns rules for
yeast gene activity given a database of genes, their interactions, and
micro-array gene expression data; \BreastCancer processes real-life
patient reports towards predicting whether an abnormality may be
malignant; \IEProtein performs information extraction from paper
abstracts to search for proteins; \Susi learns from shopping patterns; and
\Mesh learns rules for finite-methods mesh design. The datasets
\Carcinogenesis, \Choline, \Pyrimidines, and
\Carcino, \Choline, \Pyrimidines, and
\Thermolysin try to predict chemical properties of compounds. The
first three datasets store properties of interest as tables, but
\Thermolysin learns from the 3D-structure of a molecule's
conformations. Several of these datasets are standard across the
Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06}
Machine Learning literature. \GeneExpr~\cite{ilp-regulatory06}
and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
developed by some of the paper's authors. Most datasets perform simple
developed by an author of this paper. Most datasets perform simple
queries in an extensional database.
%------------------------------------------------------------------------------
\begin{table}[t]
\centering
\caption{Machine Learning (ILP) Datasets: Times are given in Seconds,
we give time for standard indexing with no indexing on dynamic
predicates versus the \JITI implementation}
\label{tab:ilp:time}
\caption{Time and space performance on Machine Learning (ILP) Datasets}
\label{tab:ilp}
\setlength{\tabcolsep}{3pt}
\begin{tabular}{|l||r|r|r|} \hline %\cline{1-3}
& \multicolumn{3}{|c|}{Time (in secs)} \\
\subfigure[Time (in seconds)]{\label{tab:ilp:time}
\begin{tabular}{|l||r|r|r||} \hline
& \multicolumn{3}{|c||}{Time (in secs)} \\
\cline{2-4}
Benchmark & 1st & JITI &{\bf ratio} \\
Benchmark & 1st & JITI &{\bf ratio} \\
\hline
\BreastCancer & 1450 & 88 & 16 \\
\Carcinogenesis & 17,705 & 192 & 92 \\
\Choline & 14,766 & 1,397 & 11 \\
\GeneExpression & 193,283 & 7,483 & 26 \\
\IEProtein & 1,677,146 & 2,909 & 577 \\
\bench{Krki} & 0.3 & 0.3 & 1 \\
\bench{Krki II} & 1.3 & 1.3 & 1 \\
\Mesh & 4 & 3 & 1.3 \\
\Pyrimidines & 487,545 & 253,235 & 1.9 \\
\Susi & 105,091 & 307 & 342 \\
\Thermolysin & 50,279 & 5,213 & 10 \\
\BreastCancer & 1,450 & 88 & 16 \\
\Carcino & 17,705 & 192 & 92 \\
\Choline & 14,766 & 1,397 & 11 \\
\GeneExpr & 193,283 & 7,483 & 26 \\
\IEProtein & 1,677,146 & 2,909 & 577 \\
\Krki & 0.3 & 0.3 & 1 \\
\KrkiII & 1.3 & 1.3 & 1 \\
\Mesh & 4 & 3 & 1.3 \\
\Pyrimidines & 487,545 & 253,235 & 1.9 \\
\Susi & 105,091 & 307 & 342 \\
\Thermolysin & 50,279 & 5,213 & 10 \\
\hline
\end{tabular}
\end{tabular}
}
\subfigure[Memory usage (in KB)]{\label{tab:ilp:memory}
\begin{tabular}{||r|r|r|r||} \hline
\multicolumn{2}{||c|}{Static code}
& \multicolumn{2}{|c||}{Dynamic code} \\
\hline
\multicolumn{1}{||c|}{Clauses} & \multicolumn{1}{c}{Index}
& \multicolumn{1}{|c|}{Clauses} & \multicolumn{1}{c||}{Index}\\
\hline
60,940 & 46,887 & 630 & 14 \\
1,801 & 2,678 & 13,512 & 942 \\
666 & 174 & 3,172 & 174 \\
46,726 & 22,629 & 116,463 & 9,015 \\
146,033 & 129,333 & 53,423 & 1,531 \\
678 & 117 & 2,047 & 24 \\
1,866 & 715 & 2,055 & 26 \\
802 & 161 & 2,149 & 109 \\
774 & 218 & 25,840 & 12,291 \\
5,007 & 2,509 & 4,497 & 759 \\
2,317 & 929 & 116,129 & 7,064 \\
\hline
\end{tabular}
}
\end{table}
%------------------------------------------------------------------------------
@ -1179,7 +1207,7 @@ the same performance under both versions: there is no slowdown. The
in the database, but they do benefit from indexing in the dynamic
representation of the search space, as their running times halve.
The \BreastCancer and \GeneExpression applications use data in
The \BreastCancer and \GeneExpr applications use data in
1NF (that is, unstructured data). The benefit here is mostly from
multiple-argument indexing. \BreastCancer is particularly
interesting. It consists of 40 binary relations with 65k elements
@ -1199,90 +1227,6 @@ indexing. \Thermolysin is smaller and performs some
computation per query: even so, indexing improves performance by an
order of magnitude.
\begin{table*}[ht]
\centering
\caption{Memory Performance on Machine Learning (ILP) Datasets: memory
usage is given in KB}
\label{tab:ilp:memory}
\setlength{\tabcolsep}{3pt}
\begin {tabular}{|l|r|r||r|r|} \hline %\cline{1-3}
& \multicolumn{2}{|c||}{\bf Static Code} & \multicolumn{2}{|c|}{\bf Dynamic Code} \\
Benchmark & \textbf{Clause} & {\bf Index} & \textbf{Clause} & {\bf Index} \\
% \textbf{Benchmarks} & & Total & T & W & S & & Total & T & C & W & S \\
\hline
\BreastCancer
& 60,940 & 46,887
% & 46242 & 3126 & 125
& 630 & 14
% &42 & 18& 57 &6
\\
\Carcinogenesis
& 1801 & 2678
% &1225 & 587 & 865
& 13,512 & 942
%& 291 & 91 & 457 & 102
\\
\Choline & 666 & 174
% &67 & 48 & 58
& 3172 & 174
% & 76 & 4 & 48 & 45
\\
\GeneExpression
& 46,726 & 22,629
% &6780 & 6473 & 9375
& 116,463 & 9015
%& 2703 & 932 & 3910 & 1469
\\
\bench{IE-Protein\_Extraction}
& 146,033 & 129,333
%&39279 & 24322 & 65732
& 53,423 & 1531
%& 467 & 108 & 868 & 86
\\
\bench{Krki} & 678 & 117
%&52 & 24 & 40
& 2047 & 24
%& 10 & 2 & 10 & 1
\\
\bench{Krki II} & 1866 & 715
%&180 & 233 & 301
& 2055 & 26
%& 11 & 2 & 11 & 1
\\
\bench{Mesh} & 802 & 161
%&49 & 18 & 93
& 2149 & 109
%& 46 & 4 & 35 & 22
\\
\bench{Pyrimidines} & 774 & 218
%&76 & 63 & 77
& 25,840 & 12,291
%& 4847 & 43 & 3510 & 3888
\\
\bench{Susi} & 5007 & 2509
%&855 & 578 & 1076
& 4497 & 759
%& 324 & 58 & 256 & 120
\\
\bench{Thermolysin} & 2317 & 929
%&429 & 184 & 315
& 116,129 & 7064
%& 3295 & 1438 & 2160 & 170
\\
\hline
\end{tabular}
\end{table*}
Table~\ref{tab:ilp:memory} shows the memory cost paid for \JITI. The
table presents data obtained at a point near the end of execution.
@ -1291,7 +1235,7 @@ memory usage should be at a maximum. The first two numbers show data
usage on \emph{static} predicates. Static data-base sizes range from
146MB (\bench{IE-Protein\_Extraction}) to less than a MB
(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can grow
to be as large as than the original code, as in \Carcinogenesis, or
to be as large as the original code, as in \Carcino, or
almost as much, e.g., \bench{IE-Protein\_Extraction}. In most cases
the YAP \JITI adds at least a third and often a half to the original
data-base. A more detailed analysis shows the source of overhead to be
@ -1306,16 +1250,15 @@ usage, but is never dominant.
This version of ALEPH uses the internal data-base to store the IDB.
The size of the IDB reflects the search space, and is to some extent
independent of the program's static data, although small applications
such as \bench{Krki} do tend to have a small search space. ALEPH's
such as \bench{Krki} tend to have a small search space. ALEPH's
author very carefully designed the system to work around overheads in
accessing the data-base, so indexing should not be as critical. The
low overheads suggest that the \JITI is working well, as confirmed in
a more detailed analysis: most space is spent on hashes tables and on
accessing the database, so indexing should not be as critical. The
low overheads suggest that \JITI is working well, as confirmed in
a more detailed analysis: most space is spent on hash tables and on
internal nodes of the tree, and relatively little space is spent on
\TryRetryTrust chains.
\section{Concluding Remarks}
%===========================
\begin{itemize}