compact 7.3

git-svn-id: https://yap.svn.sf.net/svnroot/yap/trunk@1838 b08c6af1-5177-4d33-ba66-4b1c6b8b522a
vsc 2007-03-11 23:54:55 +00:00
parent 974d481661
commit 352267fc59

@@ -1219,45 +1219,29 @@ time to improve by an order of magnitude. Like \sgCyl, this
 suggests that even a small percentage of badly indexed calls can end
 up dominating runtime.
-\IEProtein and \Thermolysin are example
-applications that manipulate structured data.
-\IEProtein is the largest dataset we consider,
-and indexing is absolutely critical: it is not possible to run the
-application in reasonable time with first argument
-indexing. \Thermolysin is smaller and performs some
-computation per query: even so, indexing improves performance by an
-order of magnitude.
+\IEProtein and \Thermolysin are example applications that manipulate
+structured data. \IEProtein is the largest dataset we consider, and
+indexing is absolutely critical: it is simply not possible to run the
+application in reasonable time with first argument indexing.
+\Thermolysin is smaller and performs some computation per query, but
+even so, indexing improves performance by an order of magnitude.
-Table~\ref{tab:ilp:memory} shows the memory cost paid for \JITI. The
-table presents data obtained at a point near the end of execution.
-Because dynamic memory expands and contracts, we chose a point where
-memory usage should be at a maximum. The first two numbers show data
-usage on \emph{static} predicates. Static data-base sizes range from
-146MB (\bench{IE-Protein\_Extraction} to less than a MB
-(\bench{Choline} and \bench{Mesh}). Indexing code can grow
-to be as large as than the original code, as in \Carcino, or
-almost as much, e.g., \bench{IE-Protein\_Extraction}. In most cases
-the YAP \JITI adds at least a third and often a half to the original
-data-base. A more detailed analysis shows the source of overhead to be
-very different from dataset to dataset. In
-\bench{IE-Protein\_Extraction} the problem is that hash tables are
-very large. Hash tables are also where most space is spent in
-\bench{Susi}. In \BreastCancer hash tables are actually small, so most
-space is spent in \TryRetryTrust chains. Storing sets of matching
-clauses at \jitiSTAR nodes takes usually over 10\% of total memory
-usage, but is never dominant.
-This version of ALEPH uses the internal data-base to store the IDB.
-The size of reflects the search space, and is to some extent
-independent of the program's static data, although small applications
-such as \Mesh tend to have a small search space. ALEPH's
-author very carefully designed the system to work around overheads in
-accessing the database, so indexing should not be as critical. The
-low overheads suggest that \JITI is working well, as confirmed in
-a more detailed analysis: most space is spent on hash tables and on
-internal nodes of tree, and relatively little space is spent on
-\TryRetryTrust chains.
+Table~\ref{tab:ilp:memory} also shows memory usage with \JITI. The
+table presents data obtained at a point near the end of execution; we
+chose a point where memory usage should be at a maximum. The second
+and third columns show data usage on \emph{static} predicates. The
+cost varies widely, from 10\% to the worst case, \Carcino, where the
+index tree takes more room than the original program. Hash-tables
+dominate usage in \IEProtein and \Susi, whereas \TryRetryTrust chains
+dominate in \BreastCancer. In most other cases no single component
+dominates memory usage. Memory usage for dynamic data is shown in the
+last two columns; note that dynamic data is mostly used to store the
+search space. One can observe that there is a much lower overhead in
+this case. A more detailed analysis shows that most space is spent on
+hash tables and on internal nodes of tree, and that relatively little
+space is spent on \TryRetryTrust chains, suggesting that \JITI is
+working well.
 \section{Concluding Remarks}