From 63a4ae736d90a4ba256a6399991af2bbc1cf733e Mon Sep 17 00:00:00 2001
From: vsc
Date: Sat, 10 Mar 2007 19:05:26 +0000
Subject: [PATCH] *** empty log message ***

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1824 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
---
 docs/index/iclp07.tex | 419 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 416 insertions(+), 3 deletions(-)

diff --git a/docs/index/iclp07.tex b/docs/index/iclp07.tex
index 91157caef..25f5496e8 100644
--- a/docs/index/iclp07.tex
+++ b/docs/index/iclp07.tex
@@ -1,7 +1,7 @@
 %==============================================================================
 \documentclass{llncs}
 %------------------------------------------------------------------------------
-\usepackage{a4wide}
+%\usepackage{a4wide}
 \usepackage{float}
 \usepackage{xspace}
 \usepackage{epsfig}
@@ -921,9 +921,10 @@ applications.
 For the benchmarks of Sect.~\ref{sec:perf:overhead}
 and~\ref{sec:perf:speedups}, which involve both systems, we used a 2.4
 GHz P4-based laptop with 512~MB of memory running Linux and report
 times in milliseconds. For the benchmarks of Sect.~\ref{sec:perf:ILP},
-which involve YAP only, we used a
+which involve YAP only, we used an 8-node cluster, where each node is a
+dual-core AMD 2600+ machine with 2GB of memory,
 %
-VITOR PLEASE ADD
+%VITOR PLEASE ADD
 %
 and report times in seconds.
@@ -1147,6 +1148,418 @@ static code, suggest a factor of two from indexing on the IDB in this
 case.
 
+
+% Our experience with the indexing algorithm described here shows a
+% significant performance improvement over the previous indexing code in
+% our system. Quite often, this has allowed us to tackle applications
+% which previously would not have been feasible. We next present some
+% results that show how useful the algorithms can be.
+
+Next, we present performance results for demand-driven indexing on a
+number of benchmarks and real-life applications. Throughout, we
+compare performance with single-argument indexing.
+We use YAP-5.1.2
+and XXX in our comparisons.
+
+As a base reference, our first dataset is a set of well-known small
+tabling benchmarks from the XSB Prolog benchmark collection. We chose
+these datasets first because they are relatively small and easy to
+understand. The benchmarks are: \texttt{cylinder}, which computes which
+nodes in a cylinder are ..., the well-known \texttt{fibonacci}
+function, \texttt{first}, which computes the first $k$ terminal symbols
+in a grammar, a version of the \texttt{knap-sack} problem, and path
+reachability benchmarks on two graphs: a \texttt{cycle} graph and a
+\texttt{tree} graph. The \texttt{path} benchmarks use a
+right-recursive, base-clause-first (\texttt{LRB}) definition of
+\texttt{path/3}. The YAP results were obtained on an AMD-64 4600+
+machine running Ubuntu 6.10.
+
+\begin{table}[ht]
+%\vspace{-\intextsep}
+%\begin{table}[htbp]
+%\centering
+  \centering
+  \begin {tabular}{|l|r|r||r|r|} \hline %\cline{1-3}
+    & \multicolumn{2}{|c||}{\bf YAP} & \multicolumn{2}{c|}{\bf XXX} \\
+    {\bf Benchs.} & \bf JITI & \bf WAM & \bf JITI & \bf WAM \\
+    \hline
+    \texttt{cyl}        &  3 & 48 & & \\
+    \texttt{9queens}    & 67 & 74 & & \\
+    \texttt{cubes}      & 24 & 24 & & \\
+    \texttt{fib\_atoms} &  8 &  8 & & \\
+    \texttt{fib\_list}  & 13 & 12 & & \\
+    \texttt{first}      &  5 &  6 & & \\
+    \texttt{ks2}        & 49 & 44 & & \\
+    \hline
+    \texttt{cycle}      & 26 & 28 & & \\
+    \texttt{tree}       & 25 & 31 & & \\
+    \hline
+\end{tabular}
+\caption{Tabling Benchmarks: times are in msec in all cases.}
+\label{tab:tabling}
+\end{table}
+
+Notice that these are very fast benchmarks: we ran each benchmark 10
+times and present the average. We then used a standard unpaired t-test
+to verify whether the results are significantly different. Our results
+do not show significant variations between JITI and WAM indexing on the
+\texttt{fibonacci}, \texttt{first} and \texttt{ks2} benchmarks.
Both
+\texttt{fibonacci} benchmarks are small, tightly recursive programs:
+most effort is spent in constructing lists or manipulating atoms. The
+\texttt{first} and \texttt{ks2} benchmarks manipulate small amounts of
+data that are well indexed through the first argument.
+
+The JITI brings a significant benefit in the \texttt{cyl} dataset. Most
+work in the dataset consists of calling \texttt{cyl/2} facts.
+Inspecting the program shows three different call modes for
+\texttt{cyl/2}: both arguments are unbound; the first argument is
+bound; or \emph{only the second argument is bound}. The JITI
+improves performance in the last case only, but this does make a
+large difference, as the WAM code has to visit all thousand clauses
+whenever the first argument is unbound.
+
+The graph reachability datasets are interesting because they both use
+the same program, but on different databases. The t-test does not show
+a significant difference stemming from the database
+itself. The JITI brings little benefit on the linear graphs if we
+call the \texttt{path/3} predicates with left or right recursion. On
+the other hand, it always improves performance when using the doubly
+recursive version, and it always improves performance on the tree
+graph.
+
+To understand why, we first consider the simplest execution pattern,
+given by the right-recursive, base-clause-first procedure. The code
+for \texttt{LRB} is:
+
+\begin{verbatim}
+path1(X,Y,[X,Y]) :- arc(X,Y).
+path1(X,Y,[X|P]) :- arc(X,Z),
+                    path1(Z,Y,P),
+                    not_member(X,P).
+\end{verbatim}
+\noindent
+Careful inspection of the program shows that \texttt{arc/2} can be
+accessed with different modes. First, given the top-level goal
+\texttt{path1(X,Y,\_)} the two clauses for \texttt{path1/3} call
+\texttt{arc/2} with both arguments free. Second, the recursive call
+to \texttt{path1/3} can call \texttt{arc/2} in the base clause with
+\emph{both arguments bound}. If the graph is linear, the second
+argument is functionally dependent on the first, and indexing on the
+first argument is sufficient.
But, if the graph has a branching factor
+$> 1$, WAM-style first-argument indexing will lead to backtracking,
+whereas the JITI can perform a direct lookup through the hash tables.
+This explains the performance improvement for the \texttt{tree}
+graph.
+
+Do such improvements hold for real applications? An interesting
+application of tabled Prolog is in program analysis, often based on
+Andersen's points-to analysis~\cite{anderson-phd}. In this framework,
+imperative programs are encoded as a set of facts, and properties of
+interest are encoded as rules. Program properties can be verified by
+computing the closure of the rules. Such programs therefore have
+properties similar to the \texttt{path} benchmarks, and should
+show similar performance. Table~\ref{tab:pa} shows two such
+applications. The first analyses a smallish program; the second
+analyses the \texttt{javac} benchmark.
+
+\begin{table}[ht]
+  \centering
+  \begin {tabular}{|l|r|r||r|r|r||} \hline %\cline{1-3}
+    & \multicolumn{2}{|c||}{\bf Time in sec.} &
+    \multicolumn{3}{c||}{\bf Static Space in KB} \\
+    {\bf Benchs.} & \bf $A_1$ & \bf JITI & \bf Clause & \multicolumn{2}{c||}{\bf Indices} \\
+    & & & & \bf $A_1$ & \bf JITI \\
+    \hline
+    \texttt{pta} & 14  & 1.7  & 845   & 318  & 351 \\
+    \texttt{tea} & 800 & 36.9 & 36781 & 1793 & 2848 \\
+    \hline
+\end{tabular}
+\caption{Program Analysis}
+\label{tab:pa}
+\end{table}
+
+Table~\ref{tab:pa} shows total running times in seconds, and the size
+of the static data-base in KB for a YAP run. The first space column
+shows the space taken by the clauses themselves; the other two show
+the size of the indices when using single-argument indexing and the
+JITI.
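+The flavour of such an analysis can be sketched in a few
+Datalog-style rules; this encoding is only illustrative, and the
+predicate names (\texttt{addr/2}, \texttt{assign/2}, \texttt{pts/2})
+are our own assumptions rather than the ones used in the benchmarks:
+
+\begin{verbatim}
+% v = &o: variable v may point to object o
+pts(V,O) :- addr(V,O).
+% v = w: v may point to whatever w points to
+pts(V,O) :- assign(V,W), pts(W,O).
+\end{verbatim}
+\noindent
+As in \texttt{path/3}, the second rule queries \texttt{pts/2} and the
+fact tables in several modes, so demand-driven multi-argument indexing
+avoids scanning the (potentially large) fact tables when only the
+second argument is bound.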
+
+
+\begin{table}[ht]
+%\vspace{-\intextsep}
+%\begin{table}[htbp]
+%\centering
+  \centering
+  \begin {tabular}{|l|r|r|r|} \hline %\cline{1-3}
+    & \multicolumn{2}{|c|}{\bf Time in sec.} & \bf JITI \\
+    {\bf Benchs.} & \bf $A_1$ & \bf JITI & \bf Ratio \\
+    \hline
+    \texttt{BreastCancer} & 1450 & 88 & 16 \\
+    \texttt{Carcinogenesis} & 17,705 & 192 & 92 \\
+    \texttt{Choline} & 14,766 & 1,397 & 11 \\
+    \texttt{GeneExpression} & 193,283 & 7,483 & 26 \\
+    \texttt{IE-Protein\_Extraction} & 1,677,146 & 2,909 & 577 \\
+    \texttt{Krki} & 0.3 & 0.3 & 1 \\
+    \texttt{Krki II} & 1.3 & 1.3 & 1 \\
+    \texttt{Mesh} & 4 & 3 & 1.3 \\
+    \texttt{Mutagenesis} & 51,775 & 27,746 & 1.9 \\
+    \texttt{Pyrimidines} & 487,545 & 253,235 & 1.9 \\
+    \texttt{Susi} & 105,091 & 307 & 342 \\
+    \texttt{Thermolysin} & 50,279 & 5,213 & 10 \\
+    \hline
+\end{tabular}
+\caption{Machine Learning (ILP) Datasets}
+\label{tab:aleph}
+\end{table}
+
+The JITI was originally motivated by applications in the area of
+Machine Learning that try to learn rules from databases (our compiler
+is used by a number of such systems). Table~\ref{tab:aleph} shows
+performance for one of the most popular such systems in some detail.
+The datasets \texttt{Carcinogenesis}, \texttt{Choline},
+\texttt{Mutagenesis}, \texttt{Pyrimidines}, and \texttt{Thermolysin}
+are about predicting chemical properties of compounds. Most of them
+perform very simple queries on an extensional database;
+\texttt{Mutagenesis} includes several predicates defined as rules; and
+\texttt{Thermolysin} performs simple 3D distance computations. The
+\texttt{Krki} datasets are chess end-games. \texttt{GeneExpression}
+processes micro-array data, \texttt{BreastCancer} real-life patient
+reports; \texttt{IE-Protein\_Extraction} performs information
+extraction from paper abstracts that mention proteins, \texttt{Susi}
+mines shopping patterns, and \texttt{Mesh} addresses finite-element
+mesh design. Several of these datasets are standard in the Machine
+Learning literature.
+\texttt{GeneExpression} and \texttt{BreastCancer} were partly
+developed by the authors.
+
+We compare times for 10 runs of the saturation/refinement cycle of the
+ILP system. Table~\ref{tab:aleph} shows very clearly the advantages
+of the JITI: speedups range up to two orders of magnitude.
+Applications such as \texttt{BreastCancer} and \texttt{GeneExpression}
+manipulate 1NF data (that is, unstructured data). The first benefit
+comes from multiple-argument indexing. Multi-argument indexing is
+available in other Prolog
+systems~\cite{BIM,xsb-manual,ZhTaUs-small,SWI}, but using
+it would require extra user information that would be hard to obtain
+from most ILP users: the JITI provides it for free. Multi-argument
+indexing alone does not explain everything, however. The
+\texttt{BreastCancer} results were of
+particular interest to us because the dataset was to a large extent
+developed by the authors. It consists of 40 binary relations which are
+most often used with the first argument as a key (it is almost
+propositional learning).
We did not expect a large speedup, but the
+results proved us wrong: calls with both arguments bound, or with
+only the second argument bound, may not be very frequent, but they are
+frequent enough to justify indexing. This would be difficult to
+predict beforehand, even for experienced Prolog programmers.
+
+\texttt{IE-Protein\_Extraction} and \texttt{Thermolysin} are example
+applications that manipulate structured data.
+\texttt{IE-Protein\_Extraction} is a large dataset, so indexing
+is simply critical: we could not run the application in reasonable
+time without the JITI. \texttt{Thermolysin} is smaller and performs
+significant computation per query: even so, indexing is very
+important.
+
+Indexing is no magic bullet, though. On the flip side,
+\texttt{Mutagenesis} is an example where indexing does help, but not
+by much. The problem is that most time is spent in recursive
+predicates that were written to index well on the first argument.
+\texttt{Mutagenesis} also illustrates a concern with the JITI: we
+generate large indices from which we do not benefit much.
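+The \texttt{BreastCancer} situation can be illustrated with a small
+sketch; the relation name and values below are our own assumptions,
+not taken from the actual dataset:
+
+\begin{verbatim}
+% one of many binary relations, keyed on argument 1
+age(p1, 45).
+age(p2, 62).
+age(p3, 62).
+
+% mode (+,-): first-argument indexing suffices
+?- age(p2, A).
+% mode (-,+) or (+,+):
+?- age(P, 62).
+% WAM indexing must scan every clause here; the JITI
+% builds a hash on argument 2 at the first such call.
+\end{verbatim}
+\noindent
+Even when the second mode is rare, each such call would otherwise
+visit the whole relation, which is why the speedup is larger than one
+might expect.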
+
+\begin{table*}[ht]
+  \centering
+  \begin {tabular}{|l|r|r|r|r|r||r|r|r|r|r|r|} \hline %\cline{1-3}
+    & \multicolumn{5}{|c||}{\bf Static Code} & \multicolumn{6}{c|}{\bf Dynamic Code \& IDB} \\
+    & \textbf{Clause} & \multicolumn{4}{|c||}{\bf Indexing Code} & \textbf{Clause} & \multicolumn{5}{c|}{\bf Indexing Code} \\
+    \textbf{Benchmarks} & & Total & T & W & S & & Total & T & C & W & S \\
+    \hline
+    \texttt{BreastCancer} & 60940 & 46887 & 46242 & 3126 & 125 & 630 & 14 & 42 & 18 & 57 & 6 \\
+    \texttt{Carcinogenesis} & 1801 & 2678 & 1225 & 587 & 865 & 13512 & 942 & 291 & 91 & 457 & 102 \\
+    \texttt{Choline} & 666 & 174 & 67 & 48 & 58 & 3172 & 174 & 76 & 4 & 48 & 45 \\
+    \texttt{GeneExpression} & 46726 & 22629 & 6780 & 6473 & 9375 & 116463 & 9015 & 2703 & 932 & 3910 & 1469 \\
+    \texttt{IE-Protein\_Extraction} & 146033 & 129333 & 39279 & 24322 & 65732 & 53423 & 1531 & 467 & 108 & 868 & 86 \\
+    \texttt{Krki} & 678 & 117 & 52 & 24 & 40 & 2047 & 24 & 10 & 2 & 10 & 1 \\
+    \texttt{Krki II} & 1866 & 715 & 180 & 233 & 301 & 2055 & 26 & 11 & 2 & 11 & 1 \\
+    \texttt{Mesh} & 802 & 161 & 49 & 18 & 93 & 2149 & 109 & 46 & 4 & 35 & 22 \\
+    \texttt{Mutagenesis} & 1412 & 1848 & 1045 & 291 & 510 & 4302 & 595 & 156 & 114 & 264 & 61 \\
+    \texttt{Pyrimidines} & 774 & 218 & 76 & 63 & 77 & 25840 & 12291 & 4847 & 43 & 3510 & 3888 \\
+    \texttt{Susi} & 5007 & 2509 & 855 & 578 & 1076 & 4497 & 759 & 324 & 58 & 256 & 120 \\
+    \texttt{Thermolysin} & 2317 & 929 & 429 & 184 & 315 & 116129 & 7064 & 3295 & 1438 & 2160 & 170 \\
+    \hline
+\end{tabular}
+\caption{Memory Performance on Machine Learning (ILP) Datasets (sizes in KB)}
+\label{tab:ilpmem}
+\end{table*}
+
+
+One may wonder whether the benefits in time come at a cost in
+space. Table~\ref{tab:ilpmem} shows memory performance at
+a point near the end of execution. Numbers are given in KB.
Because
+dynamic memory expands and contracts, we chose a point where dynamic
+memory should be at maximum usage. The first five columns show
+usage for static predicates. The leftmost sub-column gives the space
+used for clause code; the next sub-columns give the space used in
+indices for static predicates: the first gives total usage, which
+consists of space used in the main tree (T), in the expanded
+wait-nodes (W), and in hash-tables (S).
+
+Static data-base sizes range from 666KB to 146MB, the former mostly in
+system libraries. The impact of indexing code varies widely: it is
+more than the original code for \texttt{Mutagenesis}, almost as much
+for \texttt{IE-Protein\_Extraction}, and in most cases it adds at
+least a third and often a half to the original data-base. It is
+interesting to check the source of the space overhead: if the source
+is hash-tables, we can expect highly complex indices. If the overhead
+is in \emph{wait-nodes}, this again suggests a
+sophisticated indexing structure. Overhead in the main tree may be
+caused by a large number of nodes, or by \texttt{try}
+nodes.
+
+One first conclusion is that \emph{wait-nodes} are costly space-wise,
+even if they are needed to achieve sensible compilation times. On the
+other hand, whether the space goes to the tree or to the
+hashes varies widely. \texttt{IE-Protein\_Extraction} is an example
+where the indices seem very useful: most space is spent in the
+hash-tables, although we still pay a significant price for
+\emph{wait-nodes}. \texttt{BreastCancer} has very small hash-tables,
+because its attributes range over small domains, but indexing is
+still useful (we believe this is because we are only interested in
+the first solution in this case).
+
+This version of the ILP system stores most dynamic data in the IDB.
+The size of the IDB reflects the search space, and is largely
+independent of the program's static data (notice that small
+applications such as \texttt{Krki} do tend to have a small search
+space). Aleph's author very carefully designed the system to work
+around overheads in accessing the data-base, so indexing should not
+be as important. In fact, indexing has a much lower space overhead in
+this case, suggesting it is not so critical. On the other hand,
+looking at the actual results shows that indexing is working well:
+most space is spent on hashes and the tree, and little space is spent
+on \texttt{try} instructions. It is hard to separate the
+contributions of the JITI on static and dynamic data, but the results
+for \texttt{Mesh} and \texttt{Mutagenesis}, where the JITI probably
+has little impact on static code, suggest a factor of two from
+indexing on the IDB in this case.
+
+Last, we discuss a natural language application, Van Noord's FSA
+toolbox. This is an implementation of a set of finite-state automata
+tools for natural language tasks. The system includes a test suite
+with 150 tasks. We selected the 10 longest-running tasks under the
+single-argument version.
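+Dynamic predicates are where indexing matters in this setting. As a
+hedged sketch (the predicate name and arity below are our assumption,
+not FSA's actual representation), consider a transition table that is
+asserted at run-time:
+
+\begin{verbatim}
+:- dynamic trans/3.
+% trans(State, Symbol, NextState), asserted while the
+% automaton is being built.
+
+% Determinisation repeatedly calls, e.g.,
+?- trans(S, a, N).
+% with the first two arguments bound: the JITI builds a
+% multi-argument index over the dynamic clauses, replacing
+% long try chains with a single hash lookup.
+\end{verbatim}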
+
+\begin{table}[ht]
+  \centering
+  \begin {tabular}{|l|r|r||r|r|r||} \hline %\cline{1-3}
+    & \multicolumn{2}{|c||}{\bf Time in msec.} &
+    \multicolumn{3}{c||}{\bf Dynamic Space in KB} \\
+    {\bf Benchs.} & \bf $A_1$ & \bf JITI & \bf Clause & \multicolumn{2}{c||}{\bf Indices} \\
+    & & & & \bf $A_1$ & \bf JITI \\
+    \hline
+    \texttt{k963} & 1944 & 684 & 1348 & 26 & 40 \\
+    \texttt{k961} & 1972 & 652 & 1348 & 26 & 40 \\
+    \texttt{k962} & 1996 & 668 & 1350 & 26 & 40 \\
+    \texttt{drg3} & 3532 & 3641 & 649 & 19 & 35 \\
+    \texttt{d2ph} & 3612 & 3667 & 649 & 19 & 35 \\
+    \texttt{d2m}  & 3952 & 3668 & 649 & 19 & 35 \\
+    \texttt{ld1}  & 4084 & 4016 & 649 & 19 & 35 \\
+    \texttt{dg5}  & 6084 & 1352 & 3305 & 39 & 61 \\
+    \texttt{g2p}  & 25212 & 14120 & 10373 & 47 & 67 \\
+    \texttt{tl3}  & 74476 & 14925 & 14306 & 70 & 49 \\
+    \hline
+\end{tabular}
+\caption{Performance on a Natural Language Application}
+\label{tab:fsa}
+\end{table}
+
+FSA is very different from the two previous examples: it implements
+relatively complex algorithms, and there is relatively little
+``data''. Even so, Table~\ref{tab:fsa} shows significant speedups
+from using the JITI. Note that Table~\ref{tab:fsa} only shows memory
+performance on dynamic data: static data does not show very
+significant differences. The results expose two different types of
+tasks. In cases such as \texttt{tl3} or \texttt{dg5} the JITI gives a
+significant speedup; in tasks such as \texttt{drg3} the difference
+does not seem to be significant, and in some cases the JITI is
+slower. Analysis shows that the tasks that do well are the tasks that
+use dynamic predicates. In those cases, indexing is beneficial:
+although there is an increase in total code, the indices are good, as
+there is a reduction in the code for \texttt{try} instructions and an
+increase in code for hash-tables, which indicates that dynamic
+predicates are indexing well.
In tasks such as \texttt{drg3} and related tasks, the JITI
+does not bring much benefit, while spending extra time compiling
+and taking extra space.
+
 
 \section{Concluding Remarks}
 %===========================
 \begin{itemize}