Merged the two tables of 7.3
git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1836 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
parent 075c9a5bf3
commit 9ec9b7fb70
@@ -49,14 +49,13 @@
|
||||
\newcommand{\tea}{\bench{tea}\xspace}
|
||||
%------------------------------------------------------------------------------
|
||||
\newcommand{\BreastCancer}{\bench{BreastCancer}\xspace}
|
||||
\newcommand{\Carcinogenesis}{\bench{Carcinogenesis}\xspace}
|
||||
\newcommand{\Carcino}{\bench{Carcinogenesis}\xspace}
|
||||
\newcommand{\Choline}{\bench{Choline}\xspace}
|
||||
\newcommand{\GeneExpression}{\bench{GeneExpression}\xspace}
|
||||
\newcommand{\GeneExpr}{\bench{GeneExpression}\xspace}
|
||||
\newcommand{\IEProtein}{\bench{IE-Protein\_Extraction}\xspace}
|
||||
\newcommand{\Krki}{\bench{Krki}\xspace}
|
||||
\newcommand{\KrkiII}{\bench{Krki~II}\xspace}
|
||||
\newcommand{\Mesh}{\bench{Mesh}\xspace}
|
||||
\newcommand{\Mutagenesis}{\bench{Mutagenesis}\xspace}
|
||||
\newcommand{\Pyrimidines}{\bench{Pyrimidines}\xspace}
|
||||
\newcommand{\Susi}{\bench{Susi}\xspace}
|
||||
\newcommand{\Thermolysin}{\bench{Thermolysin}\xspace}
|
||||
@@ -1013,8 +1012,8 @@ in parentheses. For each variant of transitive closure, we issue two
|
||||
queries: one with mode \code{(in,out)} and one with mode
|
||||
\code{(out,out)}.
|
||||
%
|
||||
For YAP, indices on the first argument and \TryRetryTrust are built on
|
||||
all benchmarks under \JITI.
|
||||
For YAP, indices on the first argument and \TryRetryTrust chains are
|
||||
built on all benchmarks under \JITI.
|
||||
%
|
||||
For XXX, \JITI triggers on no benchmark but the \jitiONconstant
|
||||
instructions are executed for the three \bench{tc\_?\_oo} benchmarks.
|
||||
@@ -1069,8 +1068,9 @@ columns separately.
|
||||
%--------------------------------------------------------------------------
|
||||
On the other hand, when \JITI is effective, it can significantly
|
||||
improve time performance. We use the following programs and
|
||||
applications:\TODO{If time permits, we should also add FSA benchmarks
|
||||
(\bench{k963}, \bench{dg5} and \bench{tl3})}
|
||||
applications:
|
||||
%% \TODO{For the journal version we should also add FSA benchmarks
|
||||
%% (\bench{k963}, \bench{dg5} and \bench{tl3})}
|
||||
\begin{description}
|
||||
\item[\sgCyl] The same generation DB benchmark on a $24 \times 24
|
||||
\times 2$ cylinder. We issue the open query.
|
||||
@@ -1122,52 +1122,80 @@ difference in this benchmark.
|
||||
\subsection{Performance of \JITI on ILP applications} \label{sec:perf:ILP}
|
||||
%-------------------------------------------------------------------------
|
||||
The need for \JITI was originally noticed in inductive logic
|
||||
programming applications. Table~\ref{tab:ilp:time} shows \JITI
|
||||
performance on some learning tasks using the ALEPH
|
||||
system~\cite{ALEPH}. The dataset \Krki tries to learn rules from a
|
||||
small database of chess end-games; \GeneExpression learns rules for
|
||||
programming applications, which tend to issue ad hoc queries during
|
||||
runtime and whose indexing requirements cannot be determined at
|
||||
compile time. On the other hand, these applications operate on lots of
|
||||
data, so memory consumption is a reasonable concern. We evaluate
|
||||
JITI's time and space performance on some learning tasks using the
|
||||
ALEPH system~\cite{ALEPH}. We use the following datasets:
|
||||
%
|
||||
% Table~\ref{tab:ilp:time} shows JITI performance.
|
||||
The dataset \Krki tries to learn rules from a
|
||||
small database of chess end-games; \GeneExpr learns rules for
|
||||
yeast gene activity given a database of genes, their interactions, and
|
||||
micro-array gene expression data; \BreastCancer processes real-life
|
||||
patient reports towards predicting whether an abnormality may be
|
||||
malignant; \IEProtein performs information extraction from paper
|
||||
abstracts to search for proteins; \Susi learns from shopping patterns; and
|
||||
\Mesh learns rules for finite-element mesh design. The datasets
|
||||
\Carcinogenesis, \Choline, \Pyrimidines, and
|
||||
\Carcino, \Choline, \Pyrimidines, and
|
||||
\Thermolysin try to predict chemical properties of compounds. The
|
||||
first three datasets store properties of interest as tables, but
|
||||
\Thermolysin learns from the 3D-structure of a molecule's
|
||||
conformations. Several of these datasets are standard across the
|
||||
Machine Learning literature. \GeneExpression~\cite{ilp-regulatory06}
|
||||
Machine Learning literature. \GeneExpr~\cite{ilp-regulatory06}
|
||||
and \BreastCancer~\cite{DBLP:conf/ijcai/DavisBDPRCS05} were partly
|
||||
developed by some of the paper's authors. Most datasets perform simple
|
||||
developed by an author of this paper. Most datasets perform simple
|
||||
queries in an extensional database.
|
||||
|
||||
%------------------------------------------------------------------------------
|
||||
\begin{table}[t]
|
||||
\centering
|
||||
\caption{Machine Learning (ILP) Datasets: Times are given in Seconds,
|
||||
we give time for standard indexing with no indexing on dynamic
|
||||
predicates versus the \JITI implementation}
|
||||
\label{tab:ilp:time}
|
||||
\caption{Time and space performance on Machine Learning (ILP) Datasets}
|
||||
\label{tab:ilp}
|
||||
\setlength{\tabcolsep}{3pt}
|
||||
\begin{tabular}{|l||r|r|r|} \hline %\cline{1-3}
|
||||
& \multicolumn{3}{|c|}{Time (in secs)} \\
|
||||
\subfigure[Time (in seconds)]{\label{tab:ilp:time}
|
||||
\begin{tabular}{|l||r|r|r||} \hline
|
||||
& \multicolumn{3}{|c||}{Time (in secs)} \\
|
||||
\cline{2-4}
|
||||
Benchmark & 1st & JITI &{\bf ratio} \\
|
||||
Benchmark & 1st & JITI &{\bf ratio} \\
|
||||
\hline
|
||||
\BreastCancer & 1450 & 88 & 16 \\
|
||||
\Carcinogenesis & 17,705 & 192 & 92 \\
|
||||
\Choline & 14,766 & 1,397 & 11 \\
|
||||
\GeneExpression & 193,283 & 7,483 & 26 \\
|
||||
\IEProtein & 1,677,146 & 2,909 & 577 \\
|
||||
\bench{Krki} & 0.3 & 0.3 & 1 \\
|
||||
\bench{Krki II} & 1.3 & 1.3 & 1 \\
|
||||
\Mesh & 4 & 3 & 1.3 \\
|
||||
\Pyrimidines & 487,545 & 253,235 & 1.9 \\
|
||||
\Susi & 105,091 & 307 & 342 \\
|
||||
\Thermolysin & 50,279 & 5,213 & 10 \\
|
||||
\BreastCancer & 1,450 & 88 & 16 \\
|
||||
\Carcino & 17,705 & 192 & 92 \\
|
||||
\Choline & 14,766 & 1,397 & 11 \\
|
||||
\GeneExpr & 193,283 & 7,483 & 26 \\
|
||||
\IEProtein & 1,677,146 & 2,909 & 577 \\
|
||||
\Krki & 0.3 & 0.3 & 1 \\
|
||||
\KrkiII & 1.3 & 1.3 & 1 \\
|
||||
\Mesh & 4 & 3 & 1.3 \\
|
||||
\Pyrimidines & 487,545 & 253,235 & 1.9 \\
|
||||
\Susi & 105,091 & 307 & 342 \\
|
||||
\Thermolysin & 50,279 & 5,213 & 10 \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{tabular}
|
||||
}
|
||||
\subfigure[Memory usage (in KB)]{\label{tab:ilp:memory}
|
||||
\begin{tabular}{||r|r|r|r||} \hline
|
||||
\multicolumn{2}{||c|}{Static code}
|
||||
& \multicolumn{2}{|c||}{Dynamic code} \\
|
||||
\hline
|
||||
\multicolumn{1}{||c|}{Clauses} & \multicolumn{1}{c}{Index}
|
||||
& \multicolumn{1}{|c|}{Clauses} & \multicolumn{1}{c||}{Index}\\
|
||||
\hline
|
||||
60,940 & 46,887 & 630 & 14 \\
|
||||
1,801 & 2,678 & 13,512 & 942 \\
|
||||
666 & 174 & 3,172 & 174 \\
|
||||
46,726 & 22,629 & 116,463 & 9,015 \\
|
||||
146,033 & 129,333 & 53,423 & 1,531 \\
|
||||
678 & 117 & 2,047 & 24 \\
|
||||
1,866 & 715 & 2,055 & 26 \\
|
||||
802 & 161 & 2,149 & 109 \\
|
||||
774 & 218 & 25,840 & 12,291 \\
|
||||
5,007 & 2,509 & 4,497 & 759 \\
|
||||
2,317 & 929 & 116,129 & 7,064 \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
}
|
||||
\end{table}
|
||||
%------------------------------------------------------------------------------
|
||||
|
||||
@@ -1179,7 +1207,7 @@ the same performance under both versions: there is no slowdown. The
|
||||
in the database, but they do benefit from indexing in the dynamic
|
||||
representation of the search space, as their running times halve.
|
||||
|
||||
The \BreastCancer and \GeneExpression applications use data in
|
||||
The \BreastCancer and \GeneExpr applications use data in
|
||||
1NF (that is, unstructured data). The benefit here is mostly from
|
||||
multiple-argument indexing. \BreastCancer is particularly
|
||||
interesting. It consists of 40 binary relations with 65k elements
|
||||
@@ -1199,90 +1227,6 @@ indexing. \Thermolysin is smaller and performs some
|
||||
computation per query: even so, indexing improves performance by an
|
||||
order of magnitude.
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Memory Performance on Machine Learning (ILP) Datasets: memory
|
||||
usage is given in KB}
|
||||
\label{tab:ilp:memory}
|
||||
\setlength{\tabcolsep}{3pt}
|
||||
\begin {tabular}{|l|r|r||r|r|} \hline %\cline{1-3}
|
||||
& \multicolumn{2}{|c||}{\bf Static Code} & \multicolumn{2}{|c|}{\bf Dynamic Code} \\
|
||||
Benchmark & \textbf{Clause} & {\bf Index} & \textbf{Clause} & {\bf Index} \\
|
||||
% \textbf{Benchmarks} & & Total & T & W & S & & Total & T & C & W & S \\
|
||||
\hline
|
||||
\BreastCancer
|
||||
& 60,940 & 46,887
|
||||
% & 46242 & 3126 & 125
|
||||
& 630 & 14
|
||||
% &42 & 18& 57 &6
|
||||
\\
|
||||
|
||||
\Carcinogenesis
|
||||
& 1801 & 2678
|
||||
% &1225 & 587 & 865
|
||||
& 13,512 & 942
|
||||
%& 291 & 91 & 457 & 102
|
||||
\\
|
||||
|
||||
\Choline & 666 & 174
|
||||
% &67 & 48 & 58
|
||||
& 3172 & 174
|
||||
% & 76 & 4 & 48 & 45
|
||||
\\
|
||||
|
||||
\GeneExpression
|
||||
& 46,726 & 22,629
|
||||
% &6780 & 6473 & 9375
|
||||
& 116,463 & 9015
|
||||
%& 2703 & 932 & 3910 & 1469
|
||||
\\
|
||||
|
||||
\bench{IE-Protein\_Extraction}
|
||||
& 146,033 & 129,333
|
||||
%&39279 & 24322 & 65732
|
||||
& 53,423 & 1531
|
||||
%& 467 & 108 & 868 & 86
|
||||
\\
|
||||
|
||||
\bench{Krki} & 678 & 117
|
||||
%&52 & 24 & 40
|
||||
& 2047 & 24
|
||||
%& 10 & 2 & 10 & 1
|
||||
\\
|
||||
|
||||
\bench{Krki II} & 1866 & 715
|
||||
%&180 & 233 & 301
|
||||
& 2055 & 26
|
||||
%& 11 & 2 & 11 & 1
|
||||
\\
|
||||
|
||||
\bench{Mesh} & 802 & 161
|
||||
%&49 & 18 & 93
|
||||
& 2149 & 109
|
||||
%& 46 & 4 & 35 & 22
|
||||
\\
|
||||
|
||||
\bench{Pyrimidines} & 774 & 218
|
||||
%&76 & 63 & 77
|
||||
& 25,840 & 12,291
|
||||
%& 4847 & 43 & 3510 & 3888
|
||||
\\
|
||||
|
||||
\bench{Susi} & 5007 & 2509
|
||||
%&855 & 578 & 1076
|
||||
& 4497 & 759
|
||||
%& 324 & 58 & 256 & 120
|
||||
\\
|
||||
|
||||
\bench{Thermolysin} & 2317 & 929
|
||||
%&429 & 184 & 315
|
||||
& 116,129 & 7064
|
||||
%& 3295 & 1438 & 2160 & 170
|
||||
\\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{table*}
|
||||
|
||||
|
||||
Table~\ref{tab:ilp:memory} shows the memory cost paid for \JITI. The
|
||||
table presents data obtained at a point near the end of execution.
|
||||
@@ -1291,7 +1235,7 @@ memory usage should be at a maximum. The first two numbers show data
|
||||
usage on \emph{static} predicates. Static data-base sizes range from
|
||||
146MB (\bench{IE-Protein\_Extraction}) to less than a MB
|
||||
(\bench{Choline}, \bench{Krki}, \bench{Mesh}). Indexing code can grow
|
||||
to be as large as the original code, as in \Carcinogenesis, or
|
||||
to be as large as the original code, as in \Carcino, or
|
||||
almost as much, e.g., \bench{IE-Protein\_Extraction}. In most cases
|
||||
the YAP \JITI adds at least a third and often a half to the original
|
||||
data-base. A more detailed analysis shows the source of overhead to be
|
||||
@@ -1306,16 +1250,15 @@ usage, but is never dominant.
|
||||
This version of ALEPH uses the internal data-base to store the IDB.
|
||||
The size of the IDB reflects the search space, and is to some extent
|
||||
independent of the program's static data, although small applications
|
||||
such as \bench{Krki} do tend to have a small search space. ALEPH's
|
||||
such as \bench{Krki} tend to have a small search space. ALEPH's
|
||||
author very carefully designed the system to work around overheads in
|
||||
accessing the data-base, so indexing should not be as critical. The
|
||||
low overheads suggest that the \JITI is working well, as confirmed in
|
||||
a more detailed analysis: most space is spent on hashes tables and on
|
||||
accessing the database, so indexing should not be as critical. The
|
||||
low overheads suggest that \JITI is working well, as confirmed in
|
||||
a more detailed analysis: most space is spent on hash tables and on
|
||||
internal nodes of the tree, and relatively little space is spent on
|
||||
\TryRetryTrust chains.
|
||||
|
||||
|
||||
|
||||
\section{Concluding Remarks}
|
||||
%===========================
|
||||
\begin{itemize}
|
||||
|