\documentclass[11pt]{article}
|
||
|
\usepackage{times}
|
||
|
\usepackage{pl}
|
||
|
\usepackage{plpage}
|
||
|
\usepackage{alltt}
|
||
|
\usepackage{html}
|
||
|
\usepackage{verbatim}
|
||
|
\sloppy
|
||
|
\makeindex
|
||
|
|
||
|
\onefile
|
||
|
\htmloutput{.} % Output directory
|
||
|
\htmlmainfile{semweb} % Main document file
|
||
|
\bodycolor{white} % Page colour
|
||
|
|
||
|
\renewcommand{\runningtitle}{SWI-Prolog Semantic Web Library}
|
||
|
|
||
|
\newcommand{\elem}[1]{{\tt\string<#1\string>}}
|
||
|
|
||
|
\begin{document}
|
||
|
|
||
|
\title{SWI-Prolog Semantic Web Library}
|
||
|
\author{Jan Wielemaker \\
|
||
|
University of Amsterdam/VU University Amsterdam \\
|
||
|
The Netherlands \\
|
||
|
E-mail: \email{J.Wielemaker@cs.vu.nl}}
|
||
|
|
||
|
\maketitle
|
||
|
|
||
|
\begin{abstract}
|
||
|
This document describes a library for dealing with the
\url[W3C]{http://www.w3c.org/} standards for the \emph{Semantic Web}.
|
||
|
Like the standards themselves (RDF, RDFS and OWL) this infrastructure
|
||
|
is modular. It consists of Prolog packages for reading, querying and
|
||
|
storing semantic web documents as well as XPCE libraries that provide
|
||
|
visualisation and editing. The Prolog libraries can be used without
|
||
|
the XPCE GUI modules. The library has been actively used with up to 10
|
||
|
million triples, using approximately 1GB of memory. Its scalability
|
||
|
is limited by memory only. The library can be used both on 32-bit
|
||
|
and 64-bit platforms.
|
||
|
\end{abstract}
|
||
|
|
||
|
\vfill
|
||
|
\pagebreak
|
||
|
\tableofcontents
|
||
|
|
||
|
\newpage
|
||
|
|
||
|
|
||
|
\section{Introduction}
|
||
|
|
||
|
SWI-Prolog started its support for web documents with the development of
|
||
|
a small and fast SGML/XML parser, followed by an RDF parser (early
|
||
|
2000). With the \file{semweb} library we provide higher-level support
|
||
|
for manipulating semantic web documents. The semantic web is the likely
|
||
|
point of orientation for knowledge representation in the future, making
|
||
|
a library designed in its spirit promising.
|
||
|
|
||
|
|
||
|
\section{Provided libraries}
|
||
|
|
||
|
Central to this library is the module \pllib{semweb/rdf_db.pl},
|
||
|
providing storage and basic querying for RDF triples. This triple store
|
||
|
is filled using the RDF parser realised by \pllib{rdf.pl}. The storage
|
||
|
module can quickly save and load (partial) databases. The modules
|
||
|
\pllib{semweb/rdfs.pl} and \pllib{semweb/owl.pl} add querying in terms
|
||
|
of the more powerful RDFS and OWL languages. Module
|
||
|
\pllib{semweb/rdf_edit.pl} adds editing, undo, journaling and
|
||
|
change-forwarding. Finally, a variety of XPCE modules visualise and edit
|
||
|
the database. Figure \figref{modules} summarises the modular design.
|
||
|
|
||
|
\postscriptfig[width=0.8\linewidth]{modules}
|
||
|
{Modules for the Semantic Web library}
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% RDF_DB %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
|
||
|
\section{Library semweb/rdf_db}
|
||
|
|
||
|
The central module is called \file{rdf_db}. It provides storage and
|
||
|
indexed querying of RDF triples. Each triple is stored as a quintuple.
|
||
|
The first three elements denote the RDF triple. \arg{File} and
|
||
|
\arg{Line} provide information about the origin of the triple.
|
||
|
|
||
|
\begin{quote}
|
||
|
\{\arg{Subject} \arg{Predicate} \arg{Object} \arg{File} \arg{Line}\}
|
||
|
\end{quote}
|
||
|
|
||
|
The actual storage is provided by the \jargon{foreign language (C)}
|
||
|
module \file{rdf_db.c}. Using a dedicated C-based implementation we
|
||
|
can reduce memory usage and improve indexing capabilities.%
	\footnote{The original implementation was in Prolog. That
		  version was implemented in 3 hours, whereas the C-based
		  implementation cost a full week. The C-based
		  implementation requires about half the memory and
		  provides about twice the performance.}
|
||
|
Currently the following indexing is provided.
|
||
|
|
||
|
\begin{itemize}
|
||
|
\item Any of the 3 fields of the triple
|
||
|
\item \arg{Subject} + \arg{Predicate} and \arg{Predicate} + \arg{Object}
|
||
|
    \item \arg{Predicates} are indexed on the \jargon{highest property}. In
	  other words, if predicates are related through
	  \const{subPropertyOf}, indexing happens on the most
	  abstract predicate. This makes calls to rdf_has/4 very
	  efficient.
|
||
|
    \item String literal \arg{Objects} are indexed case-insensitively so
	  that case-insensitive queries are fully indexed. See rdf/3.
|
||
|
\end{itemize}
|
||
|
|
||
|
\subsection{Query the RDF database}
|
||
|
\label{sec:rdfquery}
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf}{3}{?Subject, ?Predicate, ?Object}
|
||
|
Elementary query for triples. \arg{Subject} and \arg{Predicate} are
|
||
|
atoms representing the fully qualified URL of the resource. \arg{Object}
|
||
|
is either an atom representing a resource or \term{literal}{Value} if
|
||
|
the object is a literal value. If a value of the form
|
||
|
\infixterm{:}{NameSpaceID}{LocalName} is provided it is expanded to a
|
||
|
ground atom using expand_goal/2. This implies you can use this construct
|
||
|
in compiled code without paying a performance penalty. See also
|
||
|
\secref{rdfns}. Literal values take one of the following forms:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{Atom}{}
|
||
|
If the value is a simple atom it is the textual representation of
|
||
|
a string literal without explicit type or language (\const{xml:lang})
|
||
|
qualifier.
|
||
|
|
||
|
\termitem{lang}{LangID, Atom}
|
||
|
\arg{Atom} represents the text of a string literal qualified with
|
||
|
the given language.
|
||
|
|
||
|
\termitem{type}{TypeID, Value}
|
||
|
Used for attributes qualified using the \const{rdf:datatype}
|
||
|
\arg{TypeID}. The \arg{Value} is either the textual representation or a
|
||
|
natural Prolog representation. See the option
|
||
|
\term{convert_typed_literal}{:Convertor} of the parser. The storage
|
||
|
layer provides efficient handling of atoms, integers (64-bit) and floats
|
||
|
(native C-doubles). All other data is represented as a Prolog
|
||
|
record.
|
||
|
\end{description}
|
||
|
|
||
|
For string querying purposes, \arg{Object} can be of the form
|
||
|
\term{literal}{+Query, -Value}, where \arg{Query} is one of the
|
||
|
terms below. Details of literal matching and indexing are described
|
||
|
in \secref{litindex}.
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{plain}{+Text}
|
||
|
Perform exact match \textbf{and} demand the language or
|
||
|
type qualifiers to match. This query is fully indexed.%
|
||
|
	\footnote{This should have been the default when using
		  literal with one argument because it is logically
		  consistent, i.e., (rdf(S,P,literal(X)), X == hello)
		  would have been the same as
		  rdf(S,P,literal(hello)). In addition, this is
		  consistent with the SPARQL literal identity
		  definition.}
|
||
|
\termitem{exact}{+Text}
|
||
|
Perform exact, but case-insensitive match. This query is
|
||
|
fully indexed.
|
||
|
\termitem{substring}{+Text}
|
||
|
Match any literal that contains \arg{Text} as a case-insensitive
|
||
|
substring. The query is not indexed on \arg{Object}.
|
||
|
\termitem{word}{+Text}
|
||
|
Match any literal that contains \arg{Text} delimited by
|
||
|
a non alpha-numeric character, the start or end of the
|
||
|
string. The query is not indexed on \arg{Object}.
|
||
|
\termitem{prefix}{+Text}
|
||
|
Match any literal that starts with \arg{Text}. This call
|
||
|
is intended for \jargon{completion}. The query is indexed
|
||
|
using the binary tree of literals. See \secref{litindex}
|
||
|
for details.
|
||
|
\termitem{like}{+Pattern}
|
||
|
Match any literal that matches \arg{Pattern} case
|
||
|
insensitively, where the `*' character in \arg{Pattern}
|
||
|
matches zero or more characters.
|
||
|
\end{description}
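
As a minimal sketch of how these query forms might be used, the clauses
below wrap two of them in helper predicates. The names
label_completion/3 and label_contains/3 are hypothetical and not part of
the library, and \const{rdfs:label} is chosen only for illustration.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% Hypothetical helper: enumerate resources whose rdfs:label starts
% with Prefix.  This uses the indexed prefix search.
label_completion(Prefix, Resource, Label) :-
        rdf(Resource, rdfs:label, literal(prefix(Prefix), Label)).

% Hypothetical helper: case-insensitive substring search.  This
% query is not indexed on the object.
label_contains(Text, Resource, Label) :-
        rdf(Resource, rdfs:label, literal(substring(Text), Label)).
\end{code}

In both helpers \arg{Label} is unified with the matching literal value,
which may carry a language or type qualifier.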
|
||
|
|
||
|
Backtracking never returns duplicate triples. Duplicates can be
|
||
|
retrieved using rdf/4. The predicate rdf/3 raises a type-error if called
|
||
|
with improper arguments. If rdf/3 is called with a term
\term{literal}{_} as \arg{Subject} or \arg{Predicate} it fails
silently. This allows for graph matching goals like
|
||
|
\verb$rdf(S,P,O),rdf(O,P2,O2)$ to proceed without errors.%
|
||
|
	\footnote{Discussion in the SPARQL community favours allowing
		  literal values as subject. Although we have no
		  objections in principle, we fear such an extension will
		  promote poor modelling practice.}
|
||
|
|
||
|
\predicate{rdf}{4}{?Subject, ?Predicate, ?Object, ?Source}
|
||
|
As rdf/3 but in addition return the source-location of the triple. The
|
||
|
source is either a plain atom or a term of the format
|
||
|
\infixterm{:}{Atom}{Integer} where \arg{Atom} is intended to be used as
|
||
|
filename or URL and \arg{Integer} for representing the line-number.
|
||
|
Unlike rdf/3, this predicate does not remove duplicates from the result
|
||
|
set.
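
The sketch below shows one way the source argument might be used,
assuming the triple was stored with a source of the form
\arg{Atom}:\arg{Integer}. The predicate triple_origin/5 is a
hypothetical helper, not part of the library.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% Hypothetical helper: report the file and line a triple came from.
% Fails for triples whose source is a plain atom.
triple_origin(Subject, Predicate, Object, File, Line) :-
        rdf(Subject, Predicate, Object, File:Line).
\end{code}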
|
||
|
|
||
|
\predicate{rdf_has}{4}{?Subject, ?Predicate, ?Object, -TriplePred}
|
||
|
This query exploits the RDFS \const{subPropertyOf} relation. It
|
||
|
returns any triple whose stored predicate equals \arg{Predicate} or
can reach it by following the recursive \const{subPropertyOf} relation.
|
||
|
The actual stored predicate is returned in \arg{TriplePred}. The example
|
||
|
below gets all subclasses of an RDFS (or OWL) class, even if the
|
||
|
relation used is not \const{rdfs:subClassOf}, but a user-defined
|
||
|
sub-property thereof.%
|
||
|
\footnote{This predicate realises semantics defined in
|
||
|
RDF-Schema rather than RDF. It is part of the
|
||
|
\pllib{rdf_db} module because the indexing of
|
||
|
		  this module incorporates the \const{rdfs:subPropertyOf}
|
||
|
predicate.}
|
||
|
|
||
|
\begin{code}
subclasses(Class, SubClasses) :-
	findall(S, rdf_has(S, rdfs:subClassOf, Class), SubClasses).
\end{code}
|
||
|
|
||
|
Note that rdf_has/4 and rdf_has/3 can return duplicate answers if
the answers are found through different \arg{TriplePred} values.
|
||
|
|
||
|
\predicate{rdf_has}{3}{?Subject, ?Predicate, ?Object}
|
||
|
Same as \term{rdf_has}{Subject, Predicate, Object, _}.
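
The role of \arg{TriplePred} can be illustrated with a small sketch. The
namespace \const{ex}, the example data and the predicates family_data/0
and parent/3 are assumptions made for this illustration only.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- rdf_register_ns(ex, 'http://example.org/').

% Assumed example data: ex:hasMother is a sub-property of ex:hasParent.
family_data :-
        rdf_assert(ex:hasMother, rdfs:subPropertyOf, ex:hasParent),
        rdf_assert(ex:bob, ex:hasMother, ex:alice).

% Enumerate parents together with the predicate stored in the triple.
parent(Child, Parent, TriplePred) :-
        rdf_has(Child, ex:hasParent, Parent, TriplePred).
\end{code}

After calling family_data/0, the goal \term{parent}{Child, Parent, P} is
expected to find the \const{ex:hasMother} triple, with \arg{P} bound to
the stored predicate rather than to \const{ex:hasParent}.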
|
||
|
|
||
|
\predicate{rdf_reachable}{3}{?Subject, +Predicate, ?Object}
|
||
|
Is true if \arg{Object} can be reached from \arg{Subject} following
|
||
|
the transitive predicate \arg{Predicate} or a sub-property thereof.
|
||
|
When used with either \arg{Subject} or \arg{Object} unbound, it first
returns the origin, followed by the reachable nodes in breadth-first
search-order. It never generates the same node twice and is robust
against cycles in the transitive relation. With all arguments
instantiated, it succeeds deterministically if a
path can be found from \arg{Subject} to \arg{Object}. Searching
|
||
|
starts at \arg{Subject}, assuming the branching factor is normally
|
||
|
lower. A call with both \arg{Subject} and \arg{Object} unbound
|
||
|
raises an instantiation error. The following example generates
|
||
|
all subclasses of \const{rdfs:Resource}:
|
||
|
|
||
|
\begin{code}
?- rdf_reachable(X, rdfs:subClassOf, rdfs:'Resource').

X = 'http://www.w3.org/2000/01/rdf-schema#Resource' ;

X = 'http://www.w3.org/2000/01/rdf-schema#Class' ;

X = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property' ;

...
\end{code}
|
||
|
|
||
|
\predicate{rdf_reachable}{5}{?Subject, +Predicate, ?Object, +MaxD, -D}
|
||
|
Same as rdf_reachable/3, but in addition, \arg{MaxD} limits the number
|
||
|
of relations expanded and \arg{D} is unified with the `distance' between
|
||
|
\arg{Subject} and \arg{Object}. Distance 0 means \arg{Subject} and
|
||
|
\arg{Object} are the same resource. \arg{MaxD} can be the constant
|
||
|
\const{infinite} to impose no distance-limit.
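
A minimal sketch of a distance-limited search is given below. The helper
near_superclass/3 and the limit of two steps are assumptions made for
illustration.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% Hypothetical helper: superclasses of Class that are at most two
% rdfs:subClassOf steps away, together with their distance D.
near_superclass(Class, Super, D) :-
        rdf_reachable(Class, rdfs:subClassOf, Super, 2, D).
\end{code}

Because distance 0 denotes the resource itself, \arg{Class} is also
returned as one of its own answers.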
|
||
|
|
||
|
\predicate{rdf_subject}{1}{?Subject}
|
||
|
Enumerate resources appearing as a subject in a triple. The main reason
for this predicate is to generate the known subjects \emph{without
duplicates}; enumerating them using \term{rdf}{Subject, _, _} yields
one answer per triple.
|
||
|
|
||
|
\predicate{rdf_current_literal}{1}{-Literal}
|
||
|
Enumerate all known literals. Like rdf_subject/1, the motivation is
|
||
|
to provide access to literals without generating duplicates. Otherwise
|
||
|
the call is the same as \term{rdf}{_,_,literal(Literal)}.
|
||
|
\end{description}
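
As a small sketch of why duplicate-free enumeration is convenient, the
hypothetical helper subject_count/1 below counts the distinct subjects
in the database. It relies on aggregate_all/3 from the standard
\pllib{aggregate} library.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(aggregate)).

% Hypothetical helper: count the distinct subjects in the database.
subject_count(Count) :-
        aggregate_all(count, rdf_subject(_), Count).
\end{code}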
|
||
|
|
||
|
|
||
|
\subsubsection{Literal matching and indexing} \label{sec:litindex}
|
||
|
|
||
|
Starting with version 2.5.0 of this library, literal values are ordered
|
||
|
and indexed using a balanced binary tree (AVL tree). The aim of this
|
||
|
index is threefold.
|
||
|
|
||
|
\begin{itemize}
|
||
|
\item Unlike hash-tables, binary trees allow for efficient
|
||
|
\jargon{prefix} matching. Prefix matching is very useful in
|
||
|
interactive applications to provide feedback while typing such
|
||
|
as auto-completion.
|
||
|
|
||
|
\item Having a table of unique literals we generate creation and
|
||
|
destruction events (see rdf_monitor/2). These events can
|
||
|
be used to maintain additional indexing on literals, such
|
||
|
as `by word'.
|
||
|
|
||
|
    \item A binary tree allows for fast interval matching on typed
|
||
|
numeric literals.\footnote{Not yet implemented}
|
||
|
\end{itemize}
|
||
|
|
||
|
As string literal matching is most frequently used for searching
|
||
|
purposes, the match is executed case-insensitive and after removal of
|
||
|
diacritics. Case matching and diacritics removal is based on Unicode
|
||
|
character properties and independent from the current locale. Case
|
||
|
conversion is based on the `simple uppercase mapping' defined by Unicode
|
||
|
and diacritic removal on the `decomposition type'. The approach is
|
||
|
lightweight, but somewhat simpleminded for some languages. The
|
||
|
tables are generated for Unicode characters up to 0x7fff. For more
|
||
|
information, please check the source-code of the mapping-table generator
|
||
|
\file{unicode_map.pl} available in the sources of this package.
|
||
|
|
||
|
Currently the total order of literals is first based on the type of
literal using the ordering $$numeric < string < term$$ Numeric values
(integer and float) are ordered by value; integers precede floats if
they represent the same value. Strings are sorted alphabetically after
case-mapping and diacritic removal as described above. If they compare
equal, uppercase precedes lowercase and diacritics are ordered on their
Unicode value. If they still compare equal, literals without any
qualifier precede literals with a type qualifier, which precede
literals with a language qualifier. Same qualifiers (both type or
both language) are sorted alphabetically.%
|
||
|
\footnote{The ordering defined above may change in future versions
|
||
|
to deal with new queries for literals.}
|
||
|
|
||
|
The ordered tree is used for indexed execution of
|
||
|
\term{literal}{\term{prefix}{Prefix}, Literal} as well as
|
||
|
\term{literal}{\term{like}{Like}, Literal} if \arg{Like} does not start
|
||
|
with a `*'. Note that results of queries that use the tree index
|
||
|
are returned in alphabetical order.
|
||
|
|
||
|
|
||
|
\subsection{Predicate properties} \label{sec:predproperty}
|
||
|
|
||
|
The predicates below form an experimental interface to provide more
reasoning inside the kernel of the rdf_db engine. Note that
\const{symmetric}, \const{inverse_of} and \const{transitive} are not yet
supported by the rest of the engine.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_current_predicate}{1}{?Predicate}
|
||
|
Enumerate all predicates that are used in at least one triple. Behaves
|
||
|
as the code below, but is much more efficient.
|
||
|
|
||
|
\begin{code}
rdf_current_predicate(Predicate) :-
	findall(P, rdf(_,P,_), Ps),
	sort(Ps, S),
	member(Predicate, S).
\end{code}
|
||
|
|
||
|
Note that there is no relation to defined RDF properties. Properties
|
||
|
that have no triples are not reported by this predicate, while
|
||
|
predicates that are involved in triples do not need to be defined
|
||
|
as an instance of rdf:Property.
|
||
|
|
||
|
\predicate{rdf_set_predicate}{2}{+Predicate, +Property}
|
||
|
Define a property of the predicate. This predicate currently supports
|
||
|
the properties \const{symmetric}, \const{inverse_of} and
|
||
|
\const{transitive} as defined with rdf_predicate_property/2. Adding
|
||
|
an $A$ inverse_of $B$ also adds $B$ inverse_of $A$. An inverse relation
|
||
|
is deleted using \term{inverse_of}{[]}.
|
||
|
|
||
|
\predicate{rdf_predicate_property}{2}{?Predicate, -Property}
|
||
|
Query properties of a defined predicate. Currently defined properties
|
||
|
are given below.
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{symmetric}{Bool}
|
||
|
True if the predicate is defined to be symmetric. I.e.\
|
||
|
\mbox{\{A\} P \{B\}} implies \mbox{\{B\} P \{A\}}.
|
||
|
|
||
|
\termitem{inverse_of}{Inverse}
|
||
|
True if this predicate is the inverse of \arg{Inverse}.
|
||
|
|
||
|
\termitem{transitive}{Bool}
|
||
|
True if this predicate is transitive.
|
||
|
|
||
|
\termitem{triples}{Triples}
|
||
|
Unify \arg{Triples} with the number of existing triples using
|
||
|
this predicate as second argument. Reporting the number of
|
||
|
triples is intended to support query optimization.
|
||
|
|
||
|
\termitem{rdf_subject_branch_factor}{-Float}
|
||
|
Unify \arg{Float} with the average number of triples associated with
|
||
|
each unique value for the subject-side of this relation. If there
|
||
|
are no triples the value 0.0 is returned. This value is cached with
|
||
|
the predicate and recomputed only after substantial changes to the
|
||
|
triple set associated to this relation. This property is intended
for path optimisation when solving conjunctions of rdf/3 goals.
|
||
|
|
||
|
\termitem{rdf_object_branch_factor}{-Float}
|
||
|
Unify \arg{Float} with the average number of triples associated with
|
||
|
each unique value for the object-side of this relation. In addition
|
||
|
to the comments with the rdf_subject_branch_factor property, uniqueness
|
||
|
of the object value is computed from the hash key rather than the
|
||
|
actual values.
|
||
|
|
||
|
\termitem{rdfs_subject_branch_factor}{-Float}
|
||
|
Same as \functor{rdf_subject_branch_factor}{1}, but also considering
|
||
|
triples of `subPropertyOf' this relation. See also rdf_has/3.
|
||
|
|
||
|
\termitem{rdfs_object_branch_factor}{-Float}
|
||
|
Same as \functor{rdf_object_branch_factor}{1}, but also considering
|
||
|
triples of `subPropertyOf' this relation. See also rdf_has/3.
|
||
|
\end{description}
|
||
|
\end{description}
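
The statistics above could be gathered as in the following sketch, for
example as input for ordering rdf/3 goals. The helper predicate_stats/4
is hypothetical and only combines the properties documented above.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% Hypothetical helper: collect the triple count and the branching
% factors of each predicate that is used in at least one triple.
predicate_stats(Predicate, Count, SubjectBF, ObjectBF) :-
        rdf_current_predicate(Predicate),
        rdf_predicate_property(Predicate, triples(Count)),
        rdf_predicate_property(Predicate,
                               rdf_subject_branch_factor(SubjectBF)),
        rdf_predicate_property(Predicate,
                               rdf_object_branch_factor(ObjectBF)).
\end{code}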
|
||
|
|
||
|
|
||
|
\subsection{Modifying the database} \label{sec:rdfmodify}
|
||
|
|
||
|
As depicted in \figref{modules}, there are two levels of modification.
|
||
|
The \file{rdf_db} module modifies the database directly, whereas the \file{rdf_edit}
|
||
|
library provides transactions and undo on top of this. Applications
|
||
|
that wish to use the \file{rdf_edit} layer must \emph{never} use the
|
||
|
predicates from this section directly.
|
||
|
|
||
|
\subsubsection{Modifying predicates} \label{sec:modpreds}
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_assert}{3}{+Subject, +Predicate, +Object}
|
||
|
Assert a new triple into the database. This is equivalent to
|
||
|
rdf_assert/4 using \arg{SourceRef} \const{user}. \arg{Subject} and
|
||
|
\arg{Predicate} are resources. \arg{Object} is either a resource or a
|
||
|
term \term{literal}{Value}. See rdf/3 for an explanation of \arg{Value}
|
||
|
for typed and language qualified literals. All arguments are subject to
|
||
|
name-space expansion (see \secref{rdfns}).
|
||
|
|
||
|
\predicate{rdf_assert}{4}{+Subject, +Predicate, +Object, +SourceRef}
|
||
|
As rdf_assert/3, adding \arg{SourceRef} to specify the origin of the
|
||
|
triple. \arg{SourceRef} is either an atom or a term of the format
|
||
|
\arg{Atom}:\arg{Int} where \arg{Atom} normally refers to a filename
|
||
|
and \arg{Int} to the line-number where the description starts.
|
||
|
|
||
|
\predicate{rdf_retractall}{3}{?Subject, ?Predicate, ?Object}
|
||
|
Removes all matching triples from the database. Previous Prolog
|
||
|
implementations also provided a backtracking \predref{rdf_retract}{3},
|
||
|
but this proved to be rarely used and could always be replaced with
|
||
|
rdf_retractall/3. Equivalent to rdf_retractall/4 using an unbound \arg{SourceRef}.
|
||
|
|
||
|
\predicate{rdf_retractall}{4}{?Subject, ?Predicate, ?Object, ?SourceRef}
|
||
|
As rdf_retractall/3, also matching on the \arg{SourceRef}. This is
particularly useful to update all triples coming from a loaded file.
|
||
|
|
||
|
\predicate{rdf_update}{4}{+Subject, +Predicate, +Object, +Action}
|
||
|
Replaces one of the three fields on the matching triples depending
|
||
|
on \arg{Action}:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{subject}{Resource}
|
||
|
Changes the first field of the triple.
|
||
|
\termitem{predicate}{Resource}
|
||
|
Changes the second field of the triple.
|
||
|
\termitem{object}{Object}
|
||
|
Changes the last field of the triple to the given resource or
|
||
|
\term{literal}{Value}.
|
||
|
\termitem{source}{Source}
|
||
|
Changes the source location (\jargon{payload}). Note that updating the
|
||
|
source has no consequences for the semantics and therefore the
|
||
|
\jargon{generation} (see rdf_generation/1) is \emph{not} updated.
|
||
|
\end{description}
|
||
|
|
||
|
\predicate{rdf_update}{5}{+Subject, +Predicate, +Object,
|
||
|
+Source,+Action}
|
||
|
As rdf_update/4 but allows for specifying the source.
|
||
|
|
||
|
\end{description}
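
The sketch below combines the modifying predicates described above. All
three helper predicates are hypothetical, and \const{rdfs:label} with
the language \const{en} is used for illustration only.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% Hypothetical helper: add an English label to Resource.
add_label(Resource, Text) :-
        rdf_assert(Resource, rdfs:label, literal(lang(en, Text))).

% Hypothetical helper: replace an English label by updating the
% object field of the matching triple.
rename_label(Resource, Old, New) :-
        rdf_update(Resource, rdfs:label, literal(lang(en, Old)),
                   object(literal(lang(en, New)))).

% Hypothetical helper: remove all triples whose source is Source.
forget_source(Source) :-
        rdf_retractall(_, _, _, Source).
\end{code}

Applications that use the \file{rdf_edit} layer should use the
corresponding predicates of that library instead.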
|
||
|
|
||
|
|
||
|
\subsubsection{Transactions} \label{transactions}
|
||
|
|
||
|
\index{transaction}%
|
||
|
The predicates from \secref{modpreds} perform immediate and atomic
|
||
|
modifications to the database. There are two cases where this is not
|
||
|
desirable:
|
||
|
|
||
|
\begin{enumerate}
|
||
|
\item
|
||
|
If the database is modified using information based on reading the same
|
||
|
database. A typical case is a forward reasoner examining the database
|
||
|
and asserting new triples that can be deduced from the already existing
|
||
|
ones. For example, \emph{if $length(X) > 2$ then size(X) is large}:
|
||
|
|
||
|
\begin{code}
	(   rdf(X, length, literal(L)),
	    atom_number(L, IL),
	    IL > 2,
	    rdf_assert(X, size, large),
	    fail
	;   true
	).
\end{code}
|
||
|
|
||
|
Running this code without precautions causes an error because
rdf_assert/3 tries to get a write lock on the database, which has
a read operation (rdf/3 has choicepoints) in progress.
|
||
|
|
||
|
\item
|
||
|
Multi-threaded access making multiple changes to the database that
|
||
|
must be handled as a unit.
|
||
|
\end{enumerate}
|
||
|
|
||
|
Where the second case is probably obvious, the first case is less so.
|
||
|
The storage layer may require reindexing after adding or deleting
|
||
|
triples. Such reindexing operations, however, are not possible while
|
||
|
there are active read operations in other threads or from choicepoints
|
||
|
that can be in the same thread. For this reason we added
|
||
|
rdf_transaction/2. Note that, like the predicates from
|
||
|
\secref{modpreds}, rdf_transaction/2 raises a permission error exception
|
||
|
if the calling thread has active choicepoints on the database. The
|
||
|
problem is illustrated below. The rdf/3 call leaves a choicepoint and,
as the read lock originates from the calling thread itself, the system
would deadlock if it did not raise an exception.
|
||
|
|
||
|
\begin{code}
1 ?- rdf_assert(a,b,c).

Yes
2 ?- rdf_assert(a,b,d).

Yes
3 ?- rdf(a,b,X), rdf_transaction(rdf_assert(a,b,e)).
ERROR: No permission to write rdf_db `default' (Operation would deadlock)
^  Exception: (8) rdf_db:rdf_transaction(rdf_assert(a, b, e)) ? no debug
4 ?-
\end{code}
|
||
|
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_transaction}{1}{:Goal}
|
||
|
Same as \term{rdf_transaction}{Goal, \const{user}}.
|
||
|
|
||
|
\predicate{rdf_transaction}{2}{:Goal, +Id}
|
||
|
After starting a transaction, all predicates from \secref{modpreds}
|
||
|
append their operation to the \emph{transaction} instead of modifying
|
||
|
the database. If \arg{Goal} succeeds, rdf_transaction/2 cuts all
choicepoints in \arg{Goal} and executes all recorded operations. If
\arg{Goal} fails or throws an exception, all recorded operations are
discarded and rdf_transaction/2 fails or re-throws the exception.
|
||
|
|
||
|
On entry, rdf_transaction/1 gains exclusive access to the database, but
|
||
|
does allow readers to come in from all threads. After the successful
|
||
|
completion of \arg{Goal} rdf_transaction/1 gains completely exclusive
|
||
|
access while performing the database updates.
|
||
|
|
||
|
Transactions may be nested. Committing a nested transaction merges
|
||
|
its change records into the outer transaction, while discarding a
|
||
|
nested transaction simply destroys the change records belonging to
|
||
|
the nested transaction.
|
||
|
|
||
|
The \arg{Id} argument may be used to identify the transaction. It is
|
||
|
passed to the begin/end events posted to hooks registered with
|
||
|
rdf_monitor/2. The \arg{Id} \term{log}{Term} can be used to enrich the
|
||
|
journal files with additional history context. See \secref{enrich}.
|
||
|
|
||
|
\predicate{rdf_active_transaction}{1}{?Id}
|
||
|
True if \arg{Id} is the identifier of a currently active transaction
|
||
|
(i.e.\ rdf_active_transaction/1 is called from rdf_transaction/2 with
|
||
|
matching \arg{Id}). Note that the transaction identifier is not copied and
therefore need not be ground and can be further instantiated during the
transaction. \arg{Id} is first unified with the innermost transaction
and, on backtracking, with the identifiers of other active transactions. Fails
|
||
|
if there is no matching transaction active, which includes the case
|
||
|
where there is no transaction in progress.
|
||
|
\end{description}
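
As a minimal sketch, the forward-reasoning rule from the first case
above can be wrapped in a transaction. The predicate name mark_large/0
and the transaction identifier \const{mark_large} are arbitrary choices
made for this illustration.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% The rdf_assert/3 calls are recorded in the transaction while
% rdf/3 still has open choicepoints.  They are executed only
% after the forall/2 goal has completed.
mark_large :-
        rdf_transaction(forall(( rdf(X, length, literal(L)),
                                 atom_number(L, IL),
                                 IL > 2
                               ),
                               rdf_assert(X, size, large)),
                        mark_large).
\end{code}

As noted above, mark_large/0 itself must be called from a context that
has no active choicepoints on the database.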
|
||
|
|
||
|
|
||
|
\subsection{Loading and saving to file} \label{sec:rdffile}
|
||
|
|
||
|
The \file{rdf_db} module can read and write RDF-XML for import and
|
||
|
export as well as a binary format built for quick load and save
|
||
|
described in \secref{rdffastfile}. Here are the predicates
|
||
|
for portable RDF load and save.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_load}{1}{+InOrList}
|
||
|
Load triples from \arg{InOrList}, which is either a stream opened for reading,
|
||
|
an atom specifying a filename, a URL or a list of valid inputs. This
|
||
|
predicate calls process_rdf/3 to read the source one description at a
|
||
|
time, avoiding limits to the size of the input. By default, this
|
||
|
predicate provides for caching the results for quick-load using
|
||
|
rdf_load_db/1 described below. Caching strategy and options are
|
||
|
described in \secref{rdfcache}.
|
||
|
|
||
|
\predicate{rdf_load}{2}{+FileOrList, +Options}
|
||
|
As rdf_load/1, providing additional options. The options are handed
|
||
|
to the RDF parser and implemented by process_rdf/3. In addition, the
|
||
|
following options are provided:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{cache}{+Bool}
|
||
|
If \const{true} (default), try to use cached data or create a cache
|
||
|
file. Otherwise load the source.
|
||
|
|
||
|
\termitem{db}{+Graph}
|
||
|
Deprecated. New code should use the \term{graph}{+Graph} option.
|
||
|
|
||
|
\termitem{format}{+Format}
|
||
|
Specify the source format explicitly. Normally this is deduced from
|
||
|
the filename extension or the mime-type. The core library understands
|
||
|
the formats \const{xml} (RDF/XML) and \const{triples} (internal quick
|
||
|
load and cache format).
|
||
|
|
||
|
\termitem{graph}{+Graph}
|
||
|
Load the data in the given named graph. The default is the URL of the
|
||
|
source.
|
||
|
|
||
|
\termitem{if}{+Condition}
|
||
|
Condition under which to load the source. \arg{Condition} is the same as
|
||
|
for the Prolog load_files/2 predicate: \const{changed} (default) load
|
||
|
the source if it was not loaded before or has changed; \const{true}
|
||
|
(re-)loads the source unconditionally and \const{not_loaded} loads the
|
||
|
source if it was not loaded, but does not check for modifications.
|
||
|
|
||
|
\termitem{silent}{+Bool}
|
||
|
If \arg{Bool} is \const{true}, the message reporting completion is
|
||
|
printed using level \const{silent}. Otherwise the level is
|
||
|
\const{informational}. See also print_message/2.
|
||
|
|
||
|
\termitem{register_namespaces}{+Bool}
|
||
|
If \const{true} (default \const{false}), register \verb$xmlns:ns=url$
|
||
|
namespace declarations as rdf_db:ns(ns,url) namespaces if there is no
|
||
|
conflict.
|
||
|
\end{description}
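
A call combining several of these options might look as follows. The
helper load_into_graph/2 is hypothetical; the option values are merely
one plausible combination.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% Hypothetical helper: load the RDF/XML file File into the named
% graph Graph, unless it is already loaded and unchanged.
load_into_graph(File, Graph) :-
        rdf_load(File,
                 [ graph(Graph),
                   format(xml),
                   if(changed),
                   silent(true),
                   register_namespaces(true)
                 ]).
\end{code}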
|
||
|
|
||
|
\predicate{rdf_unload}{1}{+Spec}
|
||
|
Remove all triples loaded from \arg{Spec}. \arg{Spec} is either a graph
|
||
|
name or a source specification. If \arg{Spec} does not refer to a loaded
|
||
|
database the predicate succeeds silently.
|
||
|
|
||
|
\predicate{rdf_save}{1}{+File}
|
||
|
Save all known triples to the given \arg{File}. Same as
|
||
|
\term{rdf_save}{File, []}.
|
||
|
|
||
|
\predicate{rdf_save}{2}{+File, +Options}
|
||
|
Save with options. Provided options are:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{graph}{+URI}
|
||
|
Save all triples that belong to the named-graph \arg{URI}. Saving
|
||
|
arbitrary selections is possible using predicates from
|
||
|
\secref{partsave}.
|
||
|
|
||
|
\termitem{db}{+FileRef}
|
||
|
Deprecated synonym for \term{graph}{URI}.
|
||
|
|
||
|
\termitem{anon}{+Bool}
|
||
|
If \term{anon}{false} is provided, anonymous resources are only saved
|
||
|
if the resource appears in the object field of another triple that is
|
||
|
saved.
|
||
|
|
||
|
\termitem{base_uri}{+BaseURI}
|
||
|
If provided, emit \const{xml:base}="\arg{BaseURI}" in the header and
|
||
|
emit all URIs that are relative to the base-uri. The \const{xml:base}
|
||
|
declaration can be suppressed using the option
|
||
|
\term{write_xml_base}{false}.
|
||
|
|
||
|
\termitem{write_xml_base}{+Bool}
|
||
|
If \const{false} (default \const{true}), do \emph{not} emit the
|
||
|
\const{xml:base} declaration from the given \const{base_uri} option.
|
||
|
The idea behind this option is to be able to create documents with
|
||
|
URIs relative to the document itself:
|
||
|
|
||
|
\begin{code}
	...,
	rdf_save(File,
		 [ base_uri(BaseURI),
		   write_xml_base(false)
		 ]),
	...
\end{code}
|
||
|
|
||
|
\termitem{convert_typed_literal}{:Converter}
|
||
|
If present, raw literal values are first passed to \arg{Converter} to
|
||
|
apply the reverse of the \const{convert_typed_literal} option of the
|
||
|
RDF parser. The \arg{Converter} is called with the same arguments
|
||
|
as in the RDF parser, but now with the last argument instantiated
|
||
|
and the first two unbound. A proper converter that can be used for
|
||
|
both loading and saving must be a logical predicate.
|
||
|
|
||
|
\termitem{encoding}{+Encoding}
|
||
|
Define the XML encoding used for the file. Defined values are
|
||
|
\const{utf8} (default), \const{iso_latin_1} and \const{ascii}.
|
||
|
Using \const{iso_latin_1} or \const{ascii}, characters not covered by
|
||
|
the encoding are emitted as XML character entities (\verb$&#...;$).
|
||
|
|
||
|
\termitem{document_language}{+XMLLang}
|
||
|
The value \arg{XMLLang} is used for the \const{xml:lang} attribute
|
||
|
in the outermost \const{rdf:RDF} element. This language acts as
|
||
|
a default, which implies that the \const{xml:lang} tag is only used
|
||
|
for literals with a \emph{different} language identifier. Please note
|
||
|
that this option will cause all literals without language tag to be
|
||
|
interpreted using \arg{XMLLang}.
|
||
|
|
||
|
\termitem{namespaces}{+List}
|
||
|
Explicitly specify saved namespace declarations. See rdf_save_header/2
|
||
|
option \const{namespaces} for details.
|
||
|
\end{description}
|
||
|
|
||
|
\predicate{rdf_graph}{1}{?DB}
|
||
|
True if \arg{DB} is the name of a graph with at least one triple.
|
||
|
|
||
|
\predicate{rdf_source}{1}{?DB}
|
||
|
Deprecated. Use rdf_graph/1 or rdf_source/2 in new code.
|
||
|
|
||
|
\predicate{rdf_source}{2}{?DB, ?SourceURL}
|
||
|
True if the named graph \arg{DB} was loaded from the source
|
||
|
\arg{SourceURL}. A named graph is associated with a \arg{SourceURL} by
|
||
|
rdf_load/2. The association is stored in the internal binary format,
|
||
|
which ensures proper maintenance of the original source through caching
|
||
|
and the persistency layer.
|
||
|
|
||
|
\predicate{rdf_make}{0}{}
|
||
|
Re-load all RDF source files (see rdf_source/1) that have changed since
|
||
|
they were loaded the last time. This implies all triples that originate
|
||
|
from the file are removed and the file is re-loaded. If the file is
|
||
|
cached a new cache-file is written. Please note that the new triples
|
||
|
are added at the end of the database, possibly changing the order of
|
||
|
(conflicting) triples.
|
||
|
\end{description}
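
The save options and the graph enumeration predicates above can be
combined as in the minimal sketch below. The helper names save_graph/2
and list_sources/0 are hypothetical and not part of the library. The
sketch assumes that the named graph exists and that \arg{File} is
writable.

\begin{code}
:- use_module(library(semweb/rdf_db)).

%  save_graph(+Graph, +File)
%  Hypothetical helper: save one named graph as UTF-8 RDF/XML.

save_graph(Graph, File) :-
    rdf_save(File,
             [ graph(Graph),
               encoding(utf8)
             ]).

%  list_sources
%  Hypothetical helper: print each graph together with the source
%  it was loaded from, as recorded by rdf_load/2.

list_sources :-
    forall(rdf_source(Graph, SourceURL),
           format('~w loaded from ~w~n', [Graph, SourceURL])).
\end{code}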
|
||
|
|
||
|
\subsubsection{Caching triples}
|
||
|
\label{sec:rdfcache}
|
||
|
|
||
|
The library \pllib{semweb/rdf_cache} defines the caching strategy for
|
||
|
triple sources. When using large RDF sources, caching triples greatly
|
||
|
speeds up loading RDF documents. The cache library implements two caching
|
||
|
strategies that are controlled by rdf_set_cache_options/1.
|
||
|
|
||
|
\paragraph{Local caching} This approach applies to files only. Triples
|
||
|
are cached in a sub-directory of the directory holding the source. This
|
||
|
directory is called \file{.cache} (\file{_cache} on Windows). If the
|
||
|
cache option \const{create_local_directory} is \const{true}, a cache
|
||
|
directory is created if possible.
|
||
|
|
||
|
\paragraph{Global caching} This approach applies to all sources, except
|
||
|
for unnamed streams. Triples are cached in the directory defined by the
|
||
|
cache option \const{global_directory}.
|
||
|
|
||
|
When loading an RDF file, the system scans the configured cache files
|
||
|
unless \term{cache}{false} is specified as option to rdf_load/2 or
|
||
|
caching is disabled. If caching is enabled but no cache exists, the
|
||
|
system will try to create a cache file. First it will try to do this
|
||
|
locally. On failure it will try the configured global cache.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_set_cache_options}{1}{+Options}
|
||
|
Set cache options. Defined options are:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{enabled}{Bool}
|
||
|
If \const{true} (default), caching is enabled.
|
||
|
|
||
|
\termitem{local_directory}{Atom}
|
||
|
Local directory to use for caching. Default \const{.cache}
|
||
|
(Windows: \const{_cache}).
|
||
|
|
||
|
\termitem{create_local_directory}{Bool}
|
||
|
If \const{true} (default \const{false}), create a local cache
|
||
|
directory if none exists and the directory can be created.
|
||
|
|
||
|
\termitem{global_directory}{Atom}
|
||
|
Global directory to use for caching. The directory is created if the
|
||
|
option \const{create_global_directory} is also given and set to
|
||
|
\const{true}. Sub-directories are created to speed up indexing on
|
||
|
filesystems that perform poorly on directories with large numbers of
|
||
|
files. Initially not defined.
|
||
|
|
||
|
\termitem{create_global_directory}{Bool}
|
||
|
If \const{true} (default \const{false}), create a global cache
|
||
|
directory if none exists.
|
||
|
|
||
|
\end{description}
|
||
|
\end{description}
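
As an illustration of the options above, the sketch below configures a
global cache and shows how a single load can bypass caching. The
directory name is a made-up example and load_uncached/1 is a
hypothetical helper. The sketch assumes that rdf_set_cache_options/1 is
exported from \pllib{semweb/rdf_cache}, the library named at the start
of this section.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_cache)).

% Use a global cache directory and create it when missing.
% The directory name is an example only.
:- rdf_set_cache_options([ global_directory('rdf-cache'),
                           create_global_directory(true)
                         ]).

%  load_uncached(+File)
%  Hypothetical helper: load File without consulting or creating
%  a cache file.

load_uncached(File) :-
    rdf_load(File, [cache(false)]).
\end{code}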
|
||
|
|
||
|
|
||
|
\subsubsection{Partial save} \label{sec:partsave}
|
||
|
|
||
|
Sometimes it is necessary to make more arbitrary selections of material
|
||
|
to be saved or exchange RDF descriptions over an open network link. The
|
||
|
predicates in this section provide for this. Character encoding issues
|
||
|
are derived from the encoding of the \arg{Stream}, providing support for
|
||
|
\const{utf8}, \const{iso_latin_1} and \const{ascii}.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_save_header}{2}{+Stream, +Options}
|
||
|
Save an RDF header, with the XML header, \const{DOCTYPE},
|
||
|
\const{ENTITY} and opening the \const{rdf:RDF} element with appropriate
|
||
|
namespace declarations. It uses the primitives from \secref{rdfns} to
|
||
|
generate the required namespaces and desired short-name. \arg{Options}
|
||
|
is a list of the following options:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{graph}{+URI}
|
||
|
Only search for namespaces used in triples that belong to the
|
||
|
given named graph.
|
||
|
|
||
|
\termitem{db}{+FileRef}
|
||
|
Deprecated synonym for \term{graph}{FileRef}.
|
||
|
|
||
|
\termitem{namespaces}{+List}
|
||
|
Where \arg{List} is a list of namespace abbreviations (see
|
||
|
\secref{rdfns}). With this option, the expensive search for
|
||
|
all namespaces that may be used by your data is omitted. The
|
||
|
namespaces \const{rdf} and \const{rdfs} are added to the provided
|
||
|
\arg{List}. If a namespace is not declared, the resource is
|
||
|
emitted in non-abbreviated form.
|
||
|
\end{description}
|
||
|
|
||
|
\predicate{rdf_save_footer}{1}{+Stream}
|
||
|
Close the work opened with rdf_save_header/2.
|
||
|
|
||
|
\predicate{rdf_save_subject}{3}{+Stream, +Subject, +FileRef}
|
||
|
Save everything known about \arg{Subject} that matches \arg{FileRef}.
|
||
|
Using a variable for \arg{FileRef} saves all triples with
|
||
|
\arg{Subject}.
|
||
|
|
||
|
\predicate{rdf_quote_uri}{2}{+URI, -Quoted}
|
||
|
Quote a UNICODE \arg{URI}. First the Unicode is represented as UTF-8
|
||
|
and then the unsafe characters are mapped to \%XX. The quoted URI can always
|
||
|
be represented as US-ASCII.
|
||
|
\end{description}
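
The three save primitives are meant to be called in sequence on the
same stream. The sketch below is one possible way to do so; the helper
save_subjects/2 is hypothetical. It passes an empty option list to
rdf_save_header/2 and leaves \arg{FileRef} unbound, so all triples
about each subject are written regardless of the graph they belong to.

\begin{code}
:- use_module(library(semweb/rdf_db)).

%  save_subjects(+File, +Subjects)
%  Hypothetical helper: write everything known about each subject
%  in Subjects to File as a single RDF/XML document.

save_subjects(File, Subjects) :-
    setup_call_cleanup(
        open(File, write, Out, [encoding(utf8)]),
        ( rdf_save_header(Out, []),
          forall(member(S, Subjects),
                 rdf_save_subject(Out, S, _)),
          rdf_save_footer(Out)
        ),
        close(Out)).
\end{code}

The stream is opened with UTF-8 encoding because the character encoding
of the output is derived from the stream.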
|
||
|
|
||
|
|
||
|
\subsubsection{Fast loading and saving} \label{sec:rdffastfile}
|
||
|
|
||
|
Loading and saving RDF format is relatively slow. For this reason we
|
||
|
designed a binary format that is more compact, avoids the complications
|
||
|
of the RDF parser and avoids repetitive lookup of (URL) identifiers.
|
||
|
The speed improvement of about 25 times is especially worthwhile when
|
||
|
loading large databases. These predicates are used for caching by
|
||
|
rdf_load/[1,2] under certain conditions.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_save_db}{1}{+File}
|
||
|
Save all known triples into \arg{File}. The saved version includes the
|
||
|
\arg{SourceRef} information.
|
||
|
|
||
|
\predicate{rdf_save_db}{2}{+File, +FileRef}
|
||
|
Save all triples with \arg{SourceRef} \arg{FileRef}, regardless of the
|
||
|
line-number. For example, using \const{user} all information added
|
||
|
using rdf_assert/3 is saved to \arg{File}.
|
||
|
|
||
|
\predicate{rdf_load_db}{1}{+File}
|
||
|
Load triples from \arg{File}.
|
||
|
\end{description}
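
A typical use of the binary format is to take a snapshot of the store
and restore it later. The sketch below shows this under the assumption
that the whole database should be replaced on restore. The helper names
snapshot/1 and restore/1 are hypothetical.

\begin{code}
:- use_module(library(semweb/rdf_db)).

%  snapshot(+File)
%  Hypothetical helper: save all known triples in binary format.

snapshot(File) :-
    rdf_save_db(File).

%  restore(+File)
%  Hypothetical helper: discard the current triples and load the
%  snapshot written by snapshot/1.

restore(File) :-
    rdf_reset_db,
    rdf_load_db(File).
\end{code}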
|
||
|
|
||
|
|
||
|
\subsubsection{MD5 digests}
|
||
|
|
||
|
The \file{rdf_db} library provides for \jargon{MD5 digests}. An MD5
|
||
|
digest is a 128 bit long hash key computed from the triples based on the
|
||
|
RFC-1321 standard. MD5 keys are computed for each individual triple
|
||
|
and added together to compute the final key, resulting in a key that
|
||
|
describes the triple-set but is independent of the order in which
|
||
|
the triples appear. It is claimed that it is practically impossible
|
||
|
for two different datasets to generate the same MD5 key. The
|
||
|
Triple20 editor uses the MD5 key for detecting whether the triples
|
||
|
associated to a file have changed as well as to maintain a directory
|
||
|
with snapshots of versioned ontology files.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_md5}{2}{+Source, -MD5}
|
||
|
Return the MD5 digest for all triples in the database associated to
|
||
|
\arg{Source}. The \arg{MD5} digest itself is represented as an atom
|
||
|
holding a 32-character hexadecimal string. The library maintains the
|
||
|
digest incrementally on rdf_load/[1,2], rdf_load_db/1, rdf_assert/[3,4]
|
||
|
and rdf_retractall/[3,4]. Checking whether the digest has changed since
|
||
|
the last rdf_load/[1,2] call provides a practical means for checking
|
||
|
whether the file needs to be saved.
|
||
|
|
||
|
\predicate{rdf_atom_md5}{3}{+Text, +Times, -MD5}
|
||
|
Computes the MD5 hash from \arg{Text}, which is an atom, string or
|
||
|
list of character codes. \arg{Times} is an integer $\geq 1$. When
|
||
|
$> 1$, the MD5 algorithm is repeated \arg{Times} times on the
|
||
|
generated hash. This can be used for password encryption algorithms
|
||
|
to make generate-and-test loops slow.
|
||
|
|
||
|
This predicate bears little relation to RDF handling. It is provided
|
||
|
because the RDF library already contains the MD5 algorithm and semantic
|
||
|
web services may involve security and consistency checking. This
|
||
|
predicate provides a platform-independent alternative to the
|
||
|
\pllib{crypt} library provided with the \texttt{clib} package.
|
||
|
\end{description}
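
The change detection described for rdf_md5/2 can be sketched as
follows. The dynamic predicate saved_md5/2 and the helpers mark_clean/1
and modified/1 are hypothetical names introduced for this illustration;
an application would call mark_clean/1 after loading or saving a
source.

\begin{code}
:- use_module(library(semweb/rdf_db)).

:- dynamic
    saved_md5/2.                % hypothetical: Source, MD5

%  mark_clean(+Source)
%  Remember the current digest of the triples of Source.

mark_clean(Source) :-
    rdf_md5(Source, MD5),
    retractall(saved_md5(Source, _)),
    assertz(saved_md5(Source, MD5)).

%  modified(+Source)
%  True if the digest of Source differs from the remembered one.

modified(Source) :-
    saved_md5(Source, Old),
    rdf_md5(Source, New),
    Old \== New.
\end{code}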
|
||
|
|
||
|
|
||
|
\subsection{Namespace Handling} \label{sec:rdfns}
|
||
|
|
||
|
Prolog code often contains references to constant resources in a known
|
||
|
XML namespace. For example,
|
||
|
\const{http://www.w3.org/2000/01/rdf-schema\#Class} refers to the most
|
||
|
general notion of a class. Readability and maintainability concerns call
|
||
|
for abstraction here. The dynamic and multifile predicate rdf_db:ns/2
|
||
|
maintains a mapping between short meaningful names and namespace
|
||
|
locations very much like the XML \const{xmlns} construct. The initial
|
||
|
mapping contains the namespaces required for the semantic web languages
|
||
|
themselves:
|
||
|
|
||
|
\begin{code}
|
||
|
ns(rdf, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#').
|
||
|
ns(rdfs, 'http://www.w3.org/2000/01/rdf-schema#').
|
||
|
ns(owl, 'http://www.w3.org/2002/07/owl#').
|
||
|
ns(xsd, 'http://www.w3.org/2000/10/XMLSchema#').
|
||
|
ns(dc, 'http://purl.org/dc/elements/1.1/').
|
||
|
ns(dcterms, 'http://purl.org/dc/terms/').
|
||
|
ns(skos, 'http://www.w3.org/2004/02/skos/core#').
|
||
|
ns(eor, 'http://dublincore.org/2000/03/13/eor#').
|
||
|
|
||
|
\end{code}
|
||
|
|
||
|
All predicates for the semweb libraries use goal_expansion/2 rules to
|
||
|
make the SWI-Prolog compiler rewrite terms of the form
|
||
|
\infixterm{:}{Id}{Local} into the fully qualified URL. In addition,
|
||
|
the following predicates are supplied:
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_equal}{2}{Resource1, Resource2}
|
||
|
Defined as \infixterm{=}{Resource1}{Resource2}. As this predicate is
|
||
|
subject to goal-expansion it can be used to obtain or test global URL
|
||
|
values using readable terms. The following goal unifies \arg{X} with
|
||
|
\const{http://www.w3.org/2000/01/rdf-schema\#Class} without more
|
||
|
runtime overhead than normal Prolog unification.
|
||
|
|
||
|
\begin{code}
|
||
|
rdf_equal(rdfs:'Class', X)
|
||
|
\end{code}
|
||
|
|
||
|
\predicate[nondet]{rdf_current_ns}{2}{?Alias, ?URI}
|
||
|
Query defined namespace aliases (prefixes).\footnote{Older versions
|
||
|
of this library did not export the table rdf_db:ns/2. Please use
|
||
|
this new public interface.}
|
||
|
|
||
|
\predicate{rdf_register_ns}{2}{+Alias, +URL}
|
||
|
Same as \term{rdf_register_ns}{Alias, URL, []}.
|
||
|
|
||
|
\predicate{rdf_register_ns}{3}{+Alias, +URL, +Options}
|
||
|
Register \arg{Alias} as a shorthand for \arg{URL}. Note that the
|
||
|
registration must be done before loading any files using them as
|
||
|
namespace aliases are handled at compile time through goal_expansion/2.
|
||
|
If \arg{Alias} already exists the default is to raise a permission
|
||
|
error. If the option \term{force}{true} is provided, the alias is
|
||
|
silently modified. Rebinding an alias must be done \emph{before} any
|
||
|
code is compiled that relies on the alias. If the option
|
||
|
\term{keep}{true} is provided the new registration is silently ignored.
|
||
|
|
||
|
\predicate{rdf_global_id}{2}{?Alias:Local, ?Global}
|
||
|
Runtime translation between \arg{Alias} and \arg{Local} and a
|
||
|
\arg{Global} URL. Expansion is normally done at compile time. This
|
||
|
predicate is often used to turn a global URL into a more readable
|
||
|
term.
|
||
|
|
||
|
\predicate{rdf_global_object}{2}{?Object, ?NameExpandedObject}
|
||
|
As rdf_global_id/2, but also expands the type field if the object
|
||
|
is of the form \term{literal}{\term{type}{Type, Value}}. This predicate
|
||
|
is used for goal expansion of the object fields in rdf/3 and similar
|
||
|
goals.
|
||
|
|
||
|
\predicate{rdf_global_term}{2}{+Term0, -Term}
|
||
|
Expands all \arg{Alias}:\arg{Local} in \arg{Term0} and return the
|
||
|
result in \arg{Term}. Use infrequently for runtime expansion of
|
||
|
namespace identifiers.
|
||
|
\end{description}
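
The predicates above can be used together as in the following sketch.
The alias \const{ex}, its namespace URL and the helpers readable/2 and
list_aliases/0 are hypothetical and serve only as illustration. The
registration uses \term{keep}{true} so that an existing definition of
the alias is left untouched.

\begin{code}
:- use_module(library(semweb/rdf_db)).

% Hypothetical alias and namespace; register before loading code
% that uses ex:Local terms.
:- rdf_register_ns(ex, 'http://example.org/ns#', [keep(true)]).

%  readable(+URL, -Term)
%  Hypothetical helper: Term is Alias:Local if URL is an atom in
%  a registered namespace and URL itself otherwise.

readable(URL, Alias:Local) :-
    rdf_global_id(Alias:Local, URL), !.
readable(URL, URL).

%  list_aliases
%  Hypothetical helper: print all registered aliases.

list_aliases :-
    forall(rdf_current_ns(Alias, URI),
           format('~w = ~w~n', [Alias, URI])).
\end{code}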
|
||
|
|
||
|
|
||
|
\subsubsection{Namespace handling for custom predicates}
|
||
|
|
||
|
If we implement a new predicate based on one of the predicates of
|
||
|
the semweb libraries that expands namespaces, namespace expansion
|
||
|
is not automatically available to it. Consider the following code
|
||
|
computing the number of distinct objects for a certain property
|
||
|
on a certain subject.
|
||
|
|
||
|
\begin{code}
|
||
|
cardinality(S, P, C) :-
|
||
|
( setof(O, rdf_has(S, P, O), Os)
|
||
|
-> length(Os, C)
|
||
|
; C = 0
|
||
|
).
|
||
|
\end{code}
|
||
|
|
||
|
Now assume we want to write labels/2 that returns the number of
|
||
|
distinct labels of a resource:
|
||
|
|
||
|
\begin{code}
|
||
|
labels(S, C) :-
|
||
|
cardinality(S, rdfs:label, C).
|
||
|
\end{code}
|
||
|
|
||
|
This code will \emph{not work} as \verb$rdfs:label$ is not expanded
|
||
|
at compile time. To make this work, we need to add an rdf_meta/1
|
||
|
declaration.
|
||
|
|
||
|
\begin{code}
|
||
|
:- rdf_meta
|
||
|
cardinality(r,r,-).
|
||
|
\end{code}
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_meta}{1}{:Heads}
|
||
|
This predicate defines the argument types of the named predicates,
|
||
|
which will force compile time namespace expansion for these predicates.
|
||
|
\arg{Heads} is a comma-separated list of callable terms. Defined
|
||
|
argument properties are:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{:}{}
|
||
|
Argument is a goal. The goal is processed using expand_goal/2,
|
||
|
recursively applying goal transformation on the argument.
|
||
|
\termitem{+}{}
|
||
|
The argument is instantiated at entry. Nothing is changed.
|
||
|
\termitem{-}{}
|
||
|
The argument is not instantiated at entry. Nothing is changed.
|
||
|
\termitem{?}{}
|
||
|
The argument is unbound or instantiated at entry. Nothing is changed.
|
||
|
\termitem{@}{}
|
||
|
The argument is not changed.
|
||
|
\termitem{r}{}
|
||
|
The argument must be a resource. If it is a term <namespace>:<local>
|
||
|
it is translated.
|
||
|
\termitem{o}{}
|
||
|
The argument is an object or resource.
|
||
|
\termitem{t}{}
|
||
|
The argument is a term that must be translated. Expansion will translate
|
||
|
all occurrences of <namespace>:<local> appearing anywhere in the term.
|
||
|
\end{description}
|
||
|
|
||
|
As it is subject to term_expansion/2, the rdf_meta/1 declaration can
|
||
|
only be used as a \emph{directive}. The directive must be processed
|
||
|
before the definition of the predicates as well as before compiling code
|
||
|
that uses the rdf meta-predicates. The atom \verb$rdf_meta$ is declared
|
||
|
as an operator exported from library \file{rdf_db.pl}. Files using
|
||
|
rdf_meta/1 \emph{must} explicitly load \file{rdf_db.pl}. The example
|
||
|
below defines the rule concept/1.
|
||
|
|
||
|
\begin{code}
|
||
|
:- use_module(library(semweb/rdf_db)). % for rdf_meta
|
||
|
:- use_module(library(semweb/rdfs)). % for rdfs_individual_of
|
||
|
|
||
|
:- rdf_meta
|
||
|
concept(r).
|
||
|
|
||
|
%% concept(?C) is nondet.
|
||
|
%
|
||
|
% True if C is a concept.
|
||
|
|
||
|
concept(C) :-
|
||
|
rdfs_individual_of(C, skos:'Concept').
|
||
|
\end{code}
|
||
|
\end{description}
|
||
|
|
||
|
In addition to expanding \emph{calls}, rdf_meta/1 also causes expansion
|
||
|
of clause-heads for predicates that match a declaration. This is
|
||
|
typically used to write Prolog statements about resources. The following
|
||
|
example produces three clauses with expanded (single-atom) arguments:
|
||
|
|
||
|
\begin{code}
|
||
|
:- use_module(library(semweb/rdf_db)).
|
||
|
|
||
|
:- rdf_meta
|
||
|
label_predicate(r).
|
||
|
|
||
|
label_predicate(rdfs:label).
|
||
|
label_predicate(skos:prefLabel).
|
||
|
label_predicate(skos:altLabel).
|
||
|
\end{code}
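
The \const{t} argument type is useful when namespace terms are nested
inside a larger term, such as a list. The sketch below illustrates
this; the predicates type_in/2 and is_class/1 are hypothetical. Because
of the declaration, the list in the body of is_class/1 is translated
into a list of plain atoms at compile time.

\begin{code}
:- use_module(library(semweb/rdf_db)).

:- rdf_meta
    type_in(r, t).

%  type_in(+Resource, +Classes)
%  Hypothetical helper: true if Resource has an rdf:type that is
%  a member of the list Classes.

type_in(Resource, Classes) :-
    member(Class, Classes),
    rdf_has(Resource, rdf:type, Class), !.

%  is_class(+Resource)
%  Hypothetical helper using nested namespace terms.

is_class(Resource) :-
    type_in(Resource, [rdfs:'Class', owl:'Class']).
\end{code}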
|
||
|
|
||
|
|
||
|
\subsection{Monitoring the database} \label{sec:rdfmonitor}
|
||
|
|
||
|
Considering performance and modularity, we are working on a replacement
|
||
|
of the \file{rdf_edit} (see \secref{rdfedit}) layered design to deal
|
||
|
with updates, journalling, transactions, etc. Where the rdf_edit
|
||
|
approach creates a single layer on top of rdf_db and code using the
|
||
|
RDF database must select whether to use rdf_db.pl or rdf_edit.pl, the
|
||
|
new approach allows registering \jargon{monitors}. This allows multiple
|
||
|
modules to provide additional services, while these services will be
|
||
|
used regardless of how the database is modified.
|
||
|
|
||
|
Monitors are used by the persistency library (\secref{persistency})
|
||
|
and the literal indexing library (\secref{rdflitindex}).
|
||
|
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_monitor}{2}{:Goal, +Mask}
|
||
|
\arg{Goal} is called for modifications of the database. It is called
|
||
|
with a single argument that describes the modification. Defined
|
||
|
events are:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{assert}{+S, +P, +O, +DB}
|
||
|
A triple has been asserted.
|
||
|
\termitem{retract}{+S, +P, +O, +DB}
|
||
|
A triple has been deleted.
|
||
|
\termitem{update}{+S, +P, +O, +DB, +Action}
|
||
|
A triple has been updated.
|
||
|
\termitem{new_literal}{+Literal}
|
||
|
A new literal has been created. \arg{Literal} is the argument of
|
||
|
\term{literal}{Arg} of the triple's object. This event is introduced
|
||
|
in version 2.5.0 of this library.
|
||
|
\termitem{old_literal}{+Literal}
|
||
|
The literal \arg{Literal} is no longer used by any triple.
|
||
|
\termitem{transaction}{+BeginOrEnd, +Id}
|
||
|
Mark begin or end of the \emph{commit} of a transaction started by
|
||
|
rdf_transaction/2. \arg{BeginOrEnd} is \term{begin}{Nesting} or
|
||
|
\term{end}{Nesting}. \arg{Nesting} expresses the nesting level of
|
||
|
transactions, starting at `0' for a toplevel transaction. \arg{Id} is
|
||
|
the second argument of rdf_transaction/2. The following transaction Ids
|
||
|
are pre-defined by the library:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{parse}{Id}
|
||
|
A file is loaded using rdf_load/2. \arg{Id} is one of \term{file}{Path}
|
||
|
or \term{stream}{Stream}.
|
||
|
\termitem{unload}{DB}
|
||
|
All triples with source \arg{DB} are being unloaded using rdf_unload/1.
|
||
|
\termitem{reset}{}
|
||
|
Issued by rdf_reset_db/0.
|
||
|
\end{description}
|
||
|
|
||
|
\termitem{load}{+BeginOrEnd, +Spec}
|
||
|
Mark begin or end of rdf_load_db/1 or load through rdf_load/2 from
|
||
|
a cached file. \arg{Spec} is currently defined as \term{file}{Path}.
|
||
|
\termitem{rehash}{+BeginOrEnd}
|
||
|
Marks begin/end of a re-hash due to required re-indexing or garbage
|
||
|
collection.
|
||
|
\end{description}
|
||
|
|
||
|
\arg{Mask} is a list of events this monitor is interested in. Default
|
||
|
(empty list) is to report all events. Otherwise each element is of the
|
||
|
form +Event or -Event to include or exclude monitoring for certain
|
||
|
events. The event-names are the functor names of the events described
|
||
|
above. The special name \const{all} refers to all events and
|
||
|
\term{assert}{load} to assert events originating from rdf_load_db/1. As
|
||
|
loading triples using rdf_load_db/1 is very fast, monitoring this at the
|
||
|
triple level may seriously harm performance.
|
||
|
|
||
|
This predicate is intended to maintain derived data, such as a journal,
|
||
|
information for \emph{undo}, additional indexing in literals, etc. There
|
||
|
is no way to remove registered monitors. If this is required one should
|
||
|
register a monitor that maintains a dynamic list of subscribers like the
|
||
|
XPCE broadcast library. A second subscription of the same hook predicate
|
||
|
only re-assigns the mask.
|
||
|
|
||
|
The monitor hooks are called in the order of registration and in the
|
||
|
same thread that issued the database manipulation. To process all
|
||
|
changes in one thread they should be sent to a thread message queue. For
|
||
|
all updating events, the monitor is called while the calling thread has
|
||
|
a write lock on the RDF store. This implies that these events are
|
||
|
processed strictly synchronously, even if modifications originate from
|
||
|
multiple threads. In particular, the \const{transaction} \emph{begin},
|
||
|
\ldots{} \emph{updates} \ldots{} \emph{end} sequence is never
|
||
|
interleaved with other events. Same for \const{load} and \const{parse}.
|
||
|
\end{description}
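
As a minimal sketch of a monitor, the code below records triple-level
changes in a simple journal. The names log_change/1 and journal/1 are
hypothetical. The mask first excludes all events and then includes the
three events of interest, following the +Event and -Event notation
described above.

\begin{code}
:- use_module(library(semweb/rdf_db)).

:- dynamic
    journal/1.                  % hypothetical journal of events

%  log_change(+Event)
%  Hypothetical monitor: Event is one of assert(S,P,O,DB),
%  retract(S,P,O,DB) or update(S,P,O,DB,Action).

log_change(Event) :-
    assertz(journal(Event)).

% Report only triple-level modifications to log_change/1.
:- rdf_monitor(log_change, [-all, +assert, +retract, +update]).
\end{code}

As the monitor runs while the calling thread holds a write lock on the
store, it should do as little work as possible.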
|
||
|
|
||
|
|
||
|
\subsection{Miscellaneous predicates}
|
||
|
|
||
|
This section describes the remaining predicates of the \file{rdf_db}
|
||
|
module.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_node}{1}{-Id}
|
||
|
Generate a unique reference. The returned atom is guaranteed not to
|
||
|
occur in the current database in any field of any triple.
|
||
|
|
||
|
\predicate{rdf_bnode}{1}{-Id}
|
||
|
Generate a unique blank node reference. The returned atom is guaranteed
|
||
|
not to occur in the current database in any field of any triple and
|
||
|
starts with '__bnode'.
|
||
|
|
||
|
\predicate{rdf_is_bnode}{1}{+Id}
|
||
|
Succeeds if \arg{Id} is a blank node identifier (also called
|
||
|
\jargon{anonymous resource}). In the current implementation this
|
||
|
implies it is an atom starting with a double underscore.
|
||
|
|
||
|
\predicate{rdf_is_resource}{1}{+Id}
|
||
|
Succeeds if \arg{Id} is a resource. Note that this resource need
|
||
|
not appear in any triple.
|
||
|
|
||
|
\predicate{rdf_is_literal}{1}{+Id}
|
||
|
Succeeds if \arg{Id} is an RDF literal term. Note that this
|
||
|
literal need not appear in any triple.
|
||
|
|
||
|
\predicate{rdf_source_location}{2}{+Subject, -SourceRef}
|
||
|
Return the source-location as \arg{File}:\arg{Line} of the first triple
|
||
|
that is about \arg{Subject}.
|
||
|
|
||
|
\predicate{rdf_generation}{1}{-Generation}
|
||
|
Returns the \arg{Generation} of the database. Each modification to the
|
||
|
database increments the generation. It can be used to check the validity
|
||
|
of cached results deduced from the database. Modifications changing
|
||
|
multiple triples increment \arg{Generation} with the number of triples
|
||
|
modified, providing a heuristic for `how dirty' cached results may be.
|
||
|
|
||
|
\predicate{rdf_estimate_complexity}{4}{?Subject, ?Predicate, ?Object,
|
||
|
-Complexity}
|
||
|
Return the number of alternatives as indicated by the database
|
||
|
internal hashed indexing. This is a rough measure for the number
|
||
|
of alternatives we can expect for an rdf_has/3 call using the
|
||
|
given three arguments. When called with three variables, the total
|
||
|
number of triples is returned. This estimate is used in query
|
||
|
optimisation. See also rdf_predicate_property/2 and rdf_statistics/1 for
|
||
|
additional information to help optimisers.
|
||
|
|
||
|
\predicate{rdf_statistics}{1}{?Statistics}
|
||
|
Report statistics collected by the \file{rdf_db} module. Defined
|
||
|
values for \arg{Statistics} are:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{lookup}{?Index, -Count}
|
||
|
Number of lookups using a pattern of instantiated fields. \arg{Index}
|
||
|
is a term \term{rdf}{S,P,O}, where \arg{S}, \arg{P} and \arg{O} are
|
||
|
either \const{+} or \const{-}. For example \term{rdf}{+,+,-} returns
|
||
|
the lookups with subject and predicate specified and object unbound.
|
||
|
|
||
|
\termitem{properties}{-Count}
|
||
|
Number of unique values for the second field of the triple set.
|
||
|
|
||
|
\termitem{sources}{-Count}
|
||
|
Number of files loaded through rdf_load/1.
|
||
|
|
||
|
\termitem{subjects}{-Count}
|
||
|
Number of unique values for the first field of the triple set.
|
||
|
|
||
|
\termitem{literals}{-Count}
|
||
|
Total number of unique literal values in the database. See also
|
||
|
\secref{litindex}.
|
||
|
|
||
|
\termitem{triples}{-Count}
|
||
|
Total number of triples in the database.
|
||
|
|
||
|
\termitem{triples_by_file}{?File, -Count}
|
||
|
Enumerate the number of triples associated to each file.
|
||
|
|
||
|
\termitem{searched_nodes}{-Count}
|
||
|
Number of nodes explored in rdf_reachable/3.
|
||
|
|
||
|
\termitem{gc}{-Count, -Time}
|
||
|
Number of garbage collections and time spent in seconds represented as
|
||
|
a float.
|
||
|
|
||
|
\termitem{rehash}{-Count, -Time}
|
||
|
Number of times the hash-tables were enlarged and time spent in seconds
|
||
|
represented as a float.
|
||
|
|
||
|
\termitem{core}{-Bytes}
|
||
|
Core used by the triple store. This includes all memory allocated on
|
||
|
behalf of the library, but \emph{not} the memory allocated in
|
||
|
Prolog atoms referenced (only) by the triple store.
|
||
|
\end{description}
|
||
|
|
||
|
\predicate{rdf_match_label}{3}{+Method, +Search, +Atom}
|
||
|
True if \arg{Search} matches \arg{Atom} as defined by \arg{Method}.
|
||
|
All matching is performed case-insensitively. Defined methods are:
|
||
|
\begin{description}
|
||
|
\termitem{exact}{}
|
||
|
Perform exact, but case-insensitive match.
|
||
|
\termitem{substring}{}
|
||
|
\arg{Search} is a sub-string of \arg{Atom}.
|
||
|
\termitem{word}{}
|
||
|
\arg{Search} appears as a whole-word in \arg{Atom}.
|
||
|
\termitem{prefix}{}
|
||
|
\arg{Atom} starts with \arg{Search}.
|
||
|
\termitem{like}{}
|
||
|
\arg{Atom} matches \arg{Search}, case insensitively, where
|
||
|
the `*' character in \arg{Search} matches zero or more
|
||
|
characters.
|
||
|
\end{description}
|
||
|
|
||
|
\predicate{lang_matches}{2}{+Lang, +Pattern}
|
||
|
True if \arg{Lang} matches \arg{Pattern}. This implements XML language
|
||
|
matching conforming to RFC 4647. Both \arg{Lang} and \arg{Pattern} are
|
||
|
dash-separated strings of identifiers or (for \arg{Pattern}) the
|
||
|
wildcard \texttt{*}. Identifiers are matched case-insensitively and a
|
||
|
\texttt{*} matches any number of identifiers. A short pattern is the
|
||
|
same as \texttt{*}.
|
||
|
|
||
|
\predicate{rdf_reset_db}{0}{}
|
||
|
Erase all triples from the database and reset all counts and statistics
|
||
|
information.
|
||
|
|
||
|
\predicate{rdf_version}{1}{-Version}
|
||
|
Unify \arg{Version} with the library version number. This number is,
|
||
|
like the SWI-Prolog version flag, defined as $10,000 \times
|
||
|
Major + 100 \times Minor + Patch$.
|
||
|
\end{description}
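
The sketch below illustrates three of the predicates above: using
rdf_generation/1 to validate a cached result, searching labels with
rdf_match_label/3 and decoding the number returned by rdf_version/1
according to the formula given. All helper names and the dynamic
predicate cached_count/2 are hypothetical. Caching the triple count is
only a stand-in for a more expensive derived result.

\begin{code}
:- use_module(library(semweb/rdf_db)).

:- dynamic
    cached_count/2.             % hypothetical: Generation, Count

%  triple_count(-Count)
%  Recompute the result only if the database generation changed.

triple_count(Count) :-
    rdf_generation(Generation),
    (   cached_count(Generation, Count0)
    ->  Count = Count0
    ;   rdf_statistics(triples(Count)),
        retractall(cached_count(_, _)),
        assertz(cached_count(Generation, Count))
    ).

%  label_contains(+Search, -Subject)
%  True if Subject has a plain literal label containing Search.

label_contains(Search, Subject) :-
    rdf_has(Subject, rdfs:label, literal(Text)),
    atom(Text),                 % skip typed and language literals
    rdf_match_label(substring, Search, Text).

%  version_parts(-Major, -Minor, -Patch)
%  Invert Version = 10000*Major + 100*Minor + Patch.

version_parts(Major, Minor, Patch) :-
    rdf_version(Version),
    Major is Version // 10000,
    Minor is (Version // 100) mod 100,
    Patch is Version mod 100.
\end{code}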
|
||
|
|
||
|
|
||
|
\subsection{Issues with rdf_db} \label{sec:rdfissues}
|
||
|
|
||
|
This low-level RDF module was created after two years of experimenting
|
||
|
with a plain Prolog based module and a brief evaluation of a second
|
||
|
generation pure Prolog implementation. The aim was to be able to handle
|
||
|
up to about 5 million triples on standard (notebook) hardware and deal
|
||
|
efficiently with \const{subPropertyOf} which was identified as a crucial
|
||
|
feature of RDFS to realise fusion of different data-sets.
|
||
|
|
||
|
The following issues have been identified but not yet solved in a suitable manner.
|
||
|
|
||
|
\begin{description}
|
||
|
\item [\const{subPropertyOf} of \const{subPropertyOf}] is not
|
||
|
supported.
|
||
|
|
||
|
\item [Equivalence]
|
||
|
Similar to \const{subPropertyOf}, it is likely to be profitable to
|
||
|
handle resource identity efficiently. The current system has no support
|
||
|
for it.
|
||
|
\end{description}
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% PLUGIN %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\section{Plugin modules for rdf_db}
|
||
|
\label{sec:plugin}
|
||
|
|
||
|
The \pllib{rdf_db} module provides several hooks for extending its
|
||
|
functionality. Database updates can be monitored and acted upon through
|
||
|
the features described in \secref{rdfmonitor}. The predicate rdf_load/2
|
||
|
can be hooked to deal with different formats such as \jargon{rdfturtle},
|
||
|
different input sources (e.g.\ http) and different strategies for
|
||
|
caching results.
|
||
|
|
||
|
\subsection{Hooks into the RDF library}
|
||
|
\label{sec:hooks}
|
||
|
|
||
|
The hooks below are used to add new RDF file formats and sources from
|
||
|
which to load data to the library. They are used by the modules
|
||
|
described below and distributed with the package. Please examine the
|
||
|
source-code if you want to add new formats or locations.
|
||
|
|
||
|
\begin{description}
|
||
|
\item[\file{rdf_turtle.pl}]
|
||
|
Load files in the Turtle format. See \secref{rdfturtle}.
|
||
|
\item[\file{rdf_zlib_plugin.pl}]
|
||
|
Load \program{gzip} compressed files transparently. See \secref{zlib}.
|
||
|
\item[\file{rdf_http_plugin.pl}]
|
||
|
Load RDF documents from HTTP servers. See \secref{http}.
|
||
|
\end{description}
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_db:rdf_open_hook}{3}{+Input, -Stream, -Format}
|
||
|
Open an input. \arg{Input} is one of \term{file}{+Name},
|
||
|
\term{stream}{+Stream} or \term{url}{Protocol, URL}. If this hook
|
||
|
succeeds, the RDF will be read from Stream using rdf_load_stream/3.
|
||
|
Otherwise the default open functionality for file and stream is
|
||
|
used.
|
||
|
|
||
|
\predicate{rdf_db:rdf_load_stream}{3}{+Format, +Stream, +Options}
|
||
|
Actually load the RDF from \arg{Stream} into the RDF database.
|
||
|
\arg{Format} describes the format and is produced either by
|
||
|
rdf_input_info/3 or rdf_file_type/2.
|
||
|
|
||
|
\predicate{rdf_db:rdf_input_info}{3}{+Input, -Modified, -Format}
|
||
|
Gather information on \arg{Input}. \arg{Modified} is the last
|
||
|
modification time of the source as a POSIX time-stamp (see time_file/2).
|
||
|
\arg{Format} is the RDF format of the file. See rdf_file_type/2 for
|
||
|
details. It is allowed to leave the output variables unbound. Ultimately
|
||
|
the default modified time is `0' and the format is assumed to be
|
||
|
\const{xml}.
|
||
|
|
||
|
\predicate{rdf_db:rdf_file_type}{2}{?Extension, ?Format}
|
||
|
True if \arg{Format} is the default RDF file format for files
|
||
|
with the given extension. \arg{Extension} is lowercase and
|
||
|
without a '.'. E.g.\ \const{owl}. \arg{Format} is either a
|
||
|
built-in format (\const{xml} or \const{triples}) or a format
|
||
|
understood by the rdf_load_stream/3 hook.
|
||
|
|
||
|
\predicate{rdf_db:url_protocol}{1}{?Protocol}
|
||
|
True if \arg{Protocol} is a URL protocol recognised by rdf_load/2.
|
||
|
\end{description}
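
To give an impression of how these hooks fit together, the sketch below
adds a made-up format in which a file holds one Prolog term
\term{t}{S,P,O} per triple. The extension \const{spo} and the format
name \const{prolog_triples} are hypothetical. The sketch ignores
\arg{Options} and asserts into the default graph using rdf_assert/3,
because the options passed to the hook are not described here. Terms
that are not of the form \term{t}{S,P,O} are silently skipped.

\begin{code}
:- use_module(library(semweb/rdf_db)).

:- multifile
    rdf_db:rdf_file_type/2,
    rdf_db:rdf_load_stream/3.

% Hypothetical extension and format name.
rdf_db:rdf_file_type(spo, prolog_triples).

% Read terms until end of file, asserting each t(S,P,O).
% Options is ignored in this sketch.
rdf_db:rdf_load_stream(prolog_triples, Stream, _Options) :-
    repeat,
        read(Stream, Term),
        (   Term == end_of_file
        ->  !
        ;   Term = t(S, P, O),
            rdf_assert(S, P, O),
            fail
        ).
\end{code}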
|
||
|
|
||
|
|
||
|
\subsection{Library semweb/rdf_zlib_plugin}
|
||
|
\label{sec:zlib}
|
||
|
|
||
|
\index{gz, format}\index{gzip}\index{compressed data}%
|
||
|
This module uses the \pllib{zlib} library to load compressed files
|
||
|
on the fly. The extension of the file must be \fileext{gz}. The
|
||
|
file format is deduced from the extension that remains after stripping the \fileext{gz}
|
||
|
extension. E.g.\ \exam{rdf_load('file.rdf.gz')}.
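
The sketch below shows typical use. The file name is hypothetical; the
only requirement is that the plugin is loaded before calling rdf_load/1.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_zlib_plugin)).

% 'vocabulary.owl.gz' is a hypothetical file.  Stripping .gz leaves
% the extension owl, which is used to determine the RDF format.
?- rdf_load('vocabulary.owl.gz').
\end{code}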
|
||
|
|
||
|
|
||
|
\subsection{Library semweb/rdf_http_plugin}
|
||
|
\label{sec:http}
|
||
|
|
||
|
\index{xhtml}%
|
||
|
This module allows for \exam{rdf_load('http://...')}. It exploits the
|
||
|
library \pllib{http/http_open.pl}. The format of the retrieved document is determined
|
||
|
from the mime-type returned by the server if this is one of
|
||
|
\const{text/rdf+xml}, \const{application/x-turtle} or
|
||
|
\const{application/turtle}. As RDF mime-types are not yet widely
|
||
|
supported, the plugin uses the extension of the URL if the claimed
|
||
|
mime-type is not one of the above. In addition, it recognises
|
||
|
\const{text/html} and \const{application/xhtml+xml}, scanning
|
||
|
the XML content for embedded RDF.
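
A minimal usage sketch follows. The URL is a placeholder and not an
existing resource.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_http_plugin)).

% Placeholder URL.  The format is taken from the mime-type sent by the
% server, falling back to the extension (rdf) of the URL.
?- rdf_load('http://example.org/data/people.rdf').
\end{code}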
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% LITINDEX %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\subsection{Library semweb/rdf_litindex: Indexing words in literals}
|
||
|
\label{sec:rdflitindex}
|
||
|
|
||
|
The library \pllib{semweb/rdf_litindex.pl} exploits the primitives
|
||
|
of \secref{rdflitmap} and the NLP package to provide indexing on words
|
||
|
inside literal constants. It also allows for fuzzy matching using
|
||
|
stemming and `sounds-like' based on the \jargon{double metaphone}
|
||
|
algorithm of the NLP package.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_find_literals}{2}{+Spec, -ListOfLiterals}
|
||
|
Find literals (without type or language specification) that satisfy
|
||
|
\arg{Spec}. The required indices are created as needed and kept
|
||
|
up-to-date using hooks registered with rdf_monitor/2. Numerical
|
||
|
indexing is currently limited to integers in the range $\pm 2^{30}$
($\pm 2^{62}$ on 64-bit platforms). \arg{Spec} is defined as:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{and}{Spec1, Spec2}
|
||
|
Intersection of both specifications.
|
||
|
|
||
|
\termitem{or}{Spec1, Spec2}
|
||
|
Union of both specifications.
|
||
|
|
||
|
\termitem{not}{Spec}
|
||
|
Negation of \arg{Spec}. After translation of the full specification to
|
||
|
\jargon{Disjunctive Normal Form} (DNF), negations are only allowed
|
||
|
inside a conjunction with at least one positive literal.
|
||
|
|
||
|
\termitem{case}{Word}
|
||
|
Matches all literals containing the word \arg{Word}, matching
case-insensitively and after removing diacritics.
|
||
|
|
||
|
\termitem{stem}{Like}
|
||
|
Matches all literals containing at least one word that has the same stem
|
||
|
as \arg{Like} using the Porter stemming algorithm. See the NLP package for
|
||
|
details.
|
||
|
|
||
|
\termitem{sounds}{Like}
|
||
|
Matches all literals containing at least one word that `sounds like'
|
||
|
\arg{Like} using the double metaphone algorithm. See the NLP package for
|
||
|
details.
|
||
|
|
||
|
\termitem{prefix}{Prefix}
|
||
|
Matches all literals containing at least one word that starts with
|
||
|
\arg{Prefix}, ignoring diacritics and case.
|
||
|
|
||
|
\termitem{between}{Low, High}
|
||
|
Matches all literals containing an integer token in the range
|
||
|
\arg{Low}..\arg{High}, including the boundaries.
|
||
|
|
||
|
\termitem{ge}{Low}
|
||
|
Matches all literals containing an integer token with value
|
||
|
\arg{Low} or higher.
|
||
|
|
||
|
\termitem{le}{High}
|
||
|
Matches all literals containing an integer token with value
|
||
|
\arg{High} or lower.
|
||
|
|
||
|
\termitem{Token}{}
|
||
|
Matches all literals containing the given token. See tokenize_atom/2
|
||
|
of the NLP package for details.
|
||
|
\end{description}
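
The queries below sketch how the specification terms listed above
combine. The search words are arbitrary and the answers depend on the
loaded data, so no results are shown.

\begin{code}
:- use_module(library(semweb/rdf_litindex)).

% Literals with a word that stems like `paint', but without the
% word `oil'.  The negation is combined with a positive literal.
?- rdf_find_literals(and(stem(paint), not(case(oil))), Literals).

% Literals with a word that starts with `rem' or sounds like
% `rembrandt'.
?- rdf_find_literals(or(prefix(rem), sounds(rembrandt)), Literals).

% Literals that contain an integer token in the range 1600..1700.
?- rdf_find_literals(between(1600, 1700), Literals).
\end{code}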
|
||
|
|
||
|
\predicate{rdf_token_expansions}{2}{+Spec, -Expansions}
|
||
|
Uses the same database as rdf_find_literals/2 to find possible
|
||
|
expansions of \arg{Spec}, i.e.\ which words `sound like', `have prefix',
|
||
|
etc. \arg{Spec} is a compound expression as in rdf_find_literals/2.
|
||
|
\arg{Expansions} is unified to a list of terms \term{sounds}{Like,
|
||
|
Words}, \term{stem}{Like, Words} or \term{prefix}{Prefix, Words}. On
|
||
|
compound expressions, only combinations that provide literals are
|
||
|
returned. Below is an example after loading the ULAN%
|
||
|
\footnote{Unified List of Artist Names from the Getty
|
||
|
Foundation.}
|
||
|
database and showing all words that sound like `rembrandt' and
|
||
|
appear together in a literal with the word `Rijn'. Finding this
|
||
|
result from the 228,710 literals contained in ULAN requires 0.54
|
||
|
milliseconds (AMD 1600+).
|
||
|
|
||
|
\begin{code}
|
||
|
?- rdf_token_expansions(and('Rijn', sounds(rembrandt)), L).
|
||
|
|
||
|
L = [sounds(rembrandt, ['Rambrandt', 'Reimbrant', 'Rembradt',
|
||
|
'Rembrand', 'Rembrandt', 'Rembrandtsz',
|
||
|
'Rembrant', 'Rembrants', 'Rijmbrand'])]
|
||
|
\end{code}
|
||
|
|
||
|
Here is another example, illustrating handling of diacritics:
|
||
|
|
||
|
\begin{quote}\begin{alltt}
|
||
|
?- rdf_token_expansions(case(cafe), L).
|
||
|
|
||
|
L = [case(cafe, [cafe, caf\'e])]
|
||
|
\end{alltt}\end{quote}
|
||
|
|
||
|
\predicate{rdf_tokenize_literal}{2}{+Literal, -Tokens}
|
||
|
Tokenize a literal, returning a list of atoms and integers in the range
|
||
|
$-1073741824 \ldots 1073741823$. As tokenization is in general domain-
and task-dependent, this predicate first calls the hook
|
||
|
\term{rdf_litindex:tokenization}{Literal, -Tokens}. On failure it
|
||
|
calls tokenize_atom/2 from the NLP package and deletes the following:
|
||
|
atoms of length 1, floats, integers that are out of range and the
|
||
|
English words \const{and}, \const{an}, \const{or}, \const{of},
|
||
|
\const{on}, \const{in}, \const{this} and \const{the}. Deletion first
|
||
|
calls the hook \term{rdf_litindex:exclude_from_index}{token, X}. This
|
||
|
hook is called as follows:
|
||
|
|
||
|
\begin{code}
|
||
|
no_index_token(X) :-
|
||
|
exclude_from_index(token, X), !.
|
||
|
no_index_token(X) :-
|
||
|
...
|
||
|
\end{code}
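
An application can extend the set of excluded tokens by adding clauses
to this hook. The sketch below excludes a few Dutch stop words. The
helper \const{dutch_stop_word/1} and its word list are hypothetical, and
the sketch assumes the hook is declared \jargon{multifile}.

\begin{code}
:- use_module(library(semweb/rdf_litindex)).

:- multifile
    rdf_litindex:exclude_from_index/2.

% Hypothetical stop-word list, for illustration only.
dutch_stop_word(de).
dutch_stop_word(het).
dutch_stop_word(een).
dutch_stop_word(van).

% Succeeding for a token keeps it out of the word index.
rdf_litindex:exclude_from_index(token, Word) :-
    atom(Word),
    dutch_stop_word(Word).
\end{code}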
|
||
|
\end{description}
|
||
|
|
||
|
\subsection{Literal maps: Creating additional indices on literals}
|
||
|
\label{sec:rdflitmap}
|
||
|
|
||
|
`Literal maps' provide a relation between literal values, intended to
|
||
|
create additional indexes on literals. The current implementation can
|
||
|
only deal with integers and atoms (string literals). A literal map
|
||
|
maintains an ordered set of \jargon{keys}. The ordering uses the same
|
||
|
rules as described in \secref{litindex}. Each key is associated with an
|
||
|
ordered set of \jargon{values}. Literal map objects can be shared
|
||
|
between threads, using a locking strategy that allows for multiple
|
||
|
concurrent readers.
|
||
|
|
||
|
Typically, this module is used together with rdf_monitor/2 on the
|
||
|
channels \const{new_literal} and \const{old_literal} to maintain an
|
||
|
index of words that appear in a literal. Further abstraction using
|
||
|
Porter stemming or Metaphone can be used to create additional search
|
||
|
indices. These can map either directly to the literal values, or
|
||
|
indirectly to the plain word-map. The SWI-Prolog NLP package provides
|
||
|
complementary building blocks, such as a tokenizer, Porter stem and
|
||
|
Double Metaphone.
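
The following sketch illustrates this use: a word index that follows
the literals in the database. It is not the implementation of
\file{rdf_litindex.pl}. The names \const{word_map/1},
\const{init_word_index/0} and \const{literal_event/1} are hypothetical,
only plain (atom) literals are indexed, and the mask passed to
rdf_monitor/2 assumes the \exam{+Event}/\exam{-Event} notation described
with that predicate.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(porter_stem)).    % tokenize_atom/2 (NLP package)

:- dynamic word_map/1.                  % holds the literal map handle

init_word_index :-
    rdf_new_literal_map(Map),
    assertz(word_map(Map)),
    rdf_monitor(literal_event, [-all, +new_literal, +old_literal]).

% Add or remove Word -> Literal relations as literals come and go.
literal_event(new_literal(Literal)) :-
    atom(Literal), !,
    word_map(Map),
    tokenize_atom(Literal, Tokens),
    forall(( member(Token, Tokens), atom(Token) ),
           rdf_insert_literal_map(Map, Token, Literal)).
literal_event(old_literal(Literal)) :-
    atom(Literal), !,
    word_map(Map),
    tokenize_atom(Literal, Tokens),
    forall(( member(Token, Tokens), atom(Token) ),
           rdf_delete_literal_map(Map, Token, Literal)).
literal_event(_).                       % ignore typed/language literals
\end{code}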
|
||
|
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_new_literal_map}{1}{-Map}
|
||
|
Create a new literal map, returning an opaque handle.
|
||
|
|
||
|
\predicate{rdf_destroy_literal_map}{1}{+Map}
|
||
|
Destroy a literal map. After this call, further use of the \arg{Map}
|
||
|
handle is illegal. Additional synchronisation is needed if maps that
|
||
|
are shared between threads are destroyed to guarantee the handle is
|
||
|
no longer used. In some scenarios rdf_reset_literal_map/1
|
||
|
provides a safe alternative.
|
||
|
|
||
|
\predicate{rdf_reset_literal_map}{1}{+Map}
|
||
|
Delete all content from the literal map.
|
||
|
|
||
|
\predicate{rdf_insert_literal_map}{3}{+Map, +Key, +Value}
|
||
|
Add a relation between \arg{Key} and \arg{Value} to the map. If
|
||
|
this relation already exists no action is performed.
|
||
|
|
||
|
\predicate{rdf_insert_literal_map}{4}{+Map, +Key, +Value, -KeyCount}
|
||
|
As rdf_insert_literal_map/3. In addition, if \arg{Key} is a new key in
|
||
|
\arg{Map}, unify \arg{KeyCount} with the number of keys in \arg{Map}.
|
||
|
This serves two purposes: derived maps, such as the stem and metaphone
maps, need to know about new keys, and it avoids additional foreign calls
for reporting progress in \file{rdf_litindex.pl}.
|
||
|
|
||
|
\predicate{rdf_delete_literal_map}{2}{+Map, +Key}
|
||
|
Delete \arg{Key} and all associated values from the map. Succeeds
|
||
|
always.
|
||
|
|
||
|
\predicate{rdf_delete_literal_map}{3}{+Map, +Key, +Value}
|
||
|
Delete the association between \arg{Key} and \arg{Value} from the map.
|
||
|
Succeeds always.
|
||
|
|
||
|
\predicate[det]{rdf_find_literal_map}{3}{+Map, +KeyList, -ValueList}
|
||
|
Unify \arg{ValueList} with an ordered set of values associated to
|
||
|
all keys from \arg{KeyList}. Each key in \arg{KeyList} is either an
|
||
|
atom, an integer or a term \term{not}{Key}. If not-terms are provided,
|
||
|
there must be at least one positive keyword. The negations are tested
|
||
|
after establishing the positive matches.
|
||
|
|
||
|
\predicate{rdf_keys_in_literal_map}{3}{+Map, +Spec, -Answer}
|
||
|
Realises various queries on the key-set:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{all}{}
|
||
|
Unify \arg{Answer} with an ordered list of all keys.
|
||
|
|
||
|
\termitem{key}{+Key}
|
||
|
Succeeds if \arg{Key} is a key in the map and unifies \arg{Answer}
|
||
|
with the number of values associated with the key. This provides
|
||
|
a fast test of existence without fetching the possibly large associated
|
||
|
value set as with rdf_find_literal_map/3.
|
||
|
|
||
|
\termitem{prefix}{+Prefix}
|
||
|
Unify \arg{Answer} with an ordered set of all keys that have the
|
||
|
given prefix. See \secref{rdfquery} for details on prefix matching.
|
||
|
\arg{Prefix} must be an atom. This call is intended for auto-completion
|
||
|
in user interfaces.
|
||
|
|
||
|
\termitem{ge}{+Min}
|
||
|
Unify \arg{Answer} with all keys that are greater than or equal to the
|
||
|
integer \arg{Min}.
|
||
|
|
||
|
\termitem{le}{+Max}
|
||
|
Unify \arg{Answer} with all keys that are less than or equal to the
|
||
|
integer \arg{Max}.
|
||
|
|
||
|
|
||
|
\termitem{between}{+Min, +Max}
|
||
|
Unify \arg{Answer} with all keys between \arg{Min} and \arg{Max}
(inclusive).
|
||
|
\end{description}
|
||
|
|
||
|
\predicate{rdf_statistics_literal_map}{2}{+Map, +Key(-Arg...)}
|
||
|
Query some statistics of the map. Provided keys are:
|
||
|
\begin{description}
|
||
|
\termitem{size}{-Keys, -Relations}
|
||
|
Unify \arg{Keys} with the total key-count of the index and
|
||
|
\arg{Relations} with the total \arg{Key}-\arg{Value} count.
|
||
|
\end{description}
|
||
|
\end{description}
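
The predicates above can be exercised in isolation, as in the sketch
below. The predicate \const{demo_literal_map/0} and the keys and values
are made up for illustration. The map is private to the call and is
therefore destroyed without further synchronisation.

\begin{code}
:- use_module(library(semweb/rdf_db)).

demo_literal_map :-
    rdf_new_literal_map(Map),
    rdf_insert_literal_map(Map, amsterdam, 'University of Amsterdam'),
    rdf_insert_literal_map(Map, amsterdam, 'Amsterdam, The Netherlands'),
    rdf_insert_literal_map(Map, amstel,    'Amstel river'),
    % Values associated with a key
    rdf_find_literal_map(Map, [amsterdam], Values),
    % Keys for auto-completion on `am'
    rdf_keys_in_literal_map(Map, prefix(am), Keys),
    % Number of values for a key, without fetching them
    rdf_keys_in_literal_map(Map, key(amsterdam), Count),
    rdf_statistics_literal_map(Map, size(KeyCount, Relations)),
    format("values: ~q~nkeys: ~q~ncount: ~q~nsize: ~q keys, ~q relations~n",
           [Values, Keys, Count, KeyCount, Relations]),
    rdf_destroy_literal_map(Map).
\end{code}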
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% PERSISTENCY %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\subsection{Library semweb/rdf_persistency}
|
||
|
\label{sec:persistency}
|
||
|
|
||
|
\index{Persistent store}%
|
||
|
The \pllib{semweb/rdf_persistency} library provides reliable persistent storage
|
||
|
for the RDF data. The store uses a directory with files for each source
|
||
|
(see rdf_source/1) present in the database. Each source is represented
|
||
|
by two files, one in binary format (see rdf_save_db/2) representing the
|
||
|
base state and one represented as Prolog terms representing the changes
|
||
|
made since the base state. The latter is called the \jargon{journal}.
|
||
|
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_attach_db}{2}{+Directory, +Options}
|
||
|
Attach \arg{Directory} as the persistent database. If \arg{Directory}
|
||
|
does not exist it is created. Otherwise all sources defined in the
|
||
|
directory are loaded into the RDF database. Loading a source means
|
||
|
loading the base state (if any) and replaying the journal (if any). The
|
||
|
current implementation does not synchronise triples that are in the
|
||
|
store before attaching a database. They are not removed from the
|
||
|
database, nor added to the persistent store. Different merging options
|
||
|
may be supported through the \arg{Options} argument later. Currently
|
||
|
defined options are:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{concurrency}{+PosInt}
|
||
|
Number of threads used to reload databases and journals from the
|
||
|
files in \arg{Directory}. Default is the number of physical CPUs
|
||
|
determined by the Prolog flag \const{cpu_count} or 1 (one) on
|
||
|
systems where this number is unknown. See also concurrent/3.
|
||
|
|
||
|
\termitem{max_open_journals}{+PosInt}
|
||
|
The library maintains a pool of open journal files. This option
|
||
|
specifies the size of this pool. The default is 10. Raising the
|
||
|
option can make sense if many writes occur on many different named
|
||
|
graphs. The value can be lowered for scenarios where write operations
|
||
|
are very infrequent.
|
||
|
|
||
|
\termitem{silent}{Boolean}
|
||
|
If \const{true}, suppress loading messages from rdf_attach_db/2.
|
||
|
|
||
|
\termitem{log_nested_transactions}{Boolean}
|
||
|
If \const{true}, nested \emph{log} transactions are added to the journal
|
||
|
information. By default (\const{false}), no log-term is added for nested
|
||
|
transactions.
|
||
|
\end{description}
|
||
|
|
||
|
The database is locked against concurrent access using a file
|
||
|
\file{lock} in \arg{Directory}. An attempt to attach to a locked
|
||
|
database raises a \const{permission_error} exception. The error
|
||
|
context contains a term \term{rdf_locked}{Args}, where \arg{Args} is
|
||
|
a list containing \term{time}{Stamp} and \term{pid}{PID}. The
|
||
|
error can be caught by the application. Otherwise it prints:
|
||
|
|
||
|
\begin{code}
|
||
|
ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
|
||
|
ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
|
||
|
\end{code}
|
||
|
|
||
|
\predicate{rdf_detach_db}{0}{}
|
||
|
Detaches the persistent store. No triples are removed from the RDF
|
||
|
triple store.
|
||
|
|
||
|
\predicate{rdf_current_db}{1}{-Directory}
|
||
|
Unify \arg{Directory} with the current database directory. Fails if no
|
||
|
persistent database is attached.
|
||
|
|
||
|
\predicate{rdf_persistency}{2}{+DB, +Bool}
|
||
|
Change persistency of named database (4th argument of rdf/4). By default
all databases are persistent. Using \const{false}, the journal and
|
||
|
snapshot for the database are deleted and further changes to triples
|
||
|
associated with \arg{DB} are not recorded. If \arg{Bool} is \const{true}
|
||
|
a snapshot is created for the current state and further modifications
|
||
|
are monitored. Switching persistency does not affect the triples in the
|
||
|
in-memory RDF database.
|
||
|
|
||
|
\predicate{rdf_flush_journals}{1}{+Options}
|
||
|
Flush dirty journals. With the option \term{min_size}{KB} only journals
|
||
|
larger than \arg{KB} Kbytes are merged with the base state. Flushing a
|
||
|
journal takes the following steps, ensuring a stable state can be
|
||
|
recovered at any moment.
|
||
|
\begin{enumerate}
|
||
|
\item Save the current database in a new file using the
|
||
|
extension \fileext{new}.
|
||
|
\item On success, delete the journal
|
||
|
\item On success, atomically move the \fileext{new} file
|
||
|
over the base state.
|
||
|
\end{enumerate}
|
||
|
|
||
|
Note that journals are \emph{not} merged automatically for two reasons.
|
||
|
First of all, some applications may decide never to merge as the journal
|
||
|
contains a complete \jargon{changelog} of the database. Second, merging
|
||
|
large databases can be slow and the application may wish to schedule
|
||
|
such actions at quiet times or scheduled maintenance periods.
|
||
|
\end{description}
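
The sketch below puts the predicates of this section together. The
predicate names \const{open_store/1}, \const{no_scratch_journal/0} and
\const{compact_store/0}, the graph name \const{scratch} and the option
values are assumptions chosen for illustration.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

% Attach Dir.  Report a lock held by another process and fail
% instead of letting the exception propagate.
open_store(Dir) :-
    catch(rdf_attach_db(Dir, [ concurrency(2), silent(true) ]),
          error(permission_error(_, _, _), _),
          ( format(user_error, "RDF store ~w is locked~n", [Dir]),
            fail
          )).

% Do not record changes to a scratch graph (hypothetical name).
no_scratch_journal :-
    rdf_persistency(scratch, false).

% Merge journals larger than 1024 Kbytes into their base state,
% for example from a scheduled maintenance job.
compact_store :-
    rdf_flush_journals([min_size(1024)]).
\end{code}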
|
||
|
|
||
|
\subsubsection{Enriching the journals}
|
||
|
\label{sec:enrich}
|
||
|
|
||
|
The above predicates suffice for most applications. The predicates in
|
||
|
this section provide access to the journal files and the base state
|
||
|
files and are intended to provide additional services, such as reasoning
|
||
|
about the journals, loaded files, etc.%
|
||
|
\footnote{A library \pllib{rdf_history} is under development
|
||
|
exploiting these features supporting wiki style editing
|
||
|
of RDF.}
|
||
|
|
||
|
Using \term{rdf_transaction}{Goal, log(Message)}, we can add additional
|
||
|
records to enrich the journal of affected databases with \arg{Message} and
|
||
|
some additional bookkeeping information. Such a transaction adds a term
|
||
|
\term{begin}{Id, Nest, Time, Message} before the change operations on
|
||
|
each affected database and \term{end}{Id, Nest, Affected} after the
|
||
|
change operations. Here is an example call and content of the journal
|
||
|
file \file{mydb.jrn}. A full explanation of the terms that appear in
|
||
|
the journal is in the description of rdf_journal_file/2.
|
||
|
|
||
|
\begin{code}
|
||
|
?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).
|
||
|
\end{code}
|
||
|
|
||
|
\begin{code}
|
||
|
start([time(1183540570)]).
|
||
|
begin(1, 0, 1183540570.36, by(jan)).
|
||
|
assert(s, p, o).
|
||
|
end(1, 0, []).
|
||
|
end([time(1183540578)]).
|
||
|
\end{code}
|
||
|
|
||
|
Using \term{rdf_transaction}{Goal, log(Message, DB)}, where \arg{DB} is
|
||
|
an atom denoting a (possibly empty) named graph, the system guarantees
|
||
|
that a non-empty transaction will leave a possibly empty transaction
|
||
|
record in DB. This feature assumes named graphs are named after the user
|
||
|
making the changes. If a user action does not affect the user's graph,
|
||
|
such as deleting a triple from another graph, we still find a record of
|
||
|
all actions performed by some user in the journal of that user.
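
As a sketch, the call below deletes a triple from a graph named
\const{shared} on behalf of user \const{jan}. Both graph names and the
message \const{cleanup} are hypothetical. Following the description
above, the journal of \const{jan} should receive a transaction record
although no triple in that graph changes.

\begin{code}
% `jan' names the user's own graph; `shared' is another graph.
?- rdf_transaction(rdf_retractall(s, p, o, shared),
                   log(cleanup, jan)).
\end{code}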
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdf_journal_file}{2}{?DB, ?JournalFile}
True if \arg{JournalFile} is the absolute file name of the journal of an
existing named graph \arg{DB}. A journal file contains a sequence of Prolog terms of the
|
||
|
following format.%
|
||
|
\footnote{Future versions of this library may use an XML
|
||
|
based language neutral format.}
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{start}{Attributes}
|
||
|
Journal has been opened. Currently \arg{Attributes} contains a
|
||
|
term \term{time}{Stamp}.
|
||
|
|
||
|
\termitem{end}{Attributes}
|
||
|
Journal was closed. Currently \arg{Attributes} contains a
|
||
|
term \term{time}{Stamp}.
|
||
|
|
||
|
\termitem{assert}{Subject, Predicate, Object}
|
||
|
A triple \{Subject, Predicate, Object\} was added to the database.
|
||
|
|
||
|
\termitem{assert}{Subject, Predicate, Object, Line}
|
||
|
A triple \{Subject, Predicate, Object\} was added to the database
|
||
|
with given \arg{Line} context.
|
||
|
|
||
|
\termitem{retract}{Subject, Predicate, Object}
|
||
|
A triple \{Subject, Predicate, Object\} was deleted from the database.
|
||
|
Note that an rdf_retractall/3 call can retract multiple triples. Each
|
||
|
of them has a record in the journal. This allows for `undo'.
|
||
|
|
||
|
\termitem{retract}{Subject, Predicate, Object, Line}
|
||
|
Same as above, for a triple with associated line info.
|
||
|
|
||
|
\termitem{update}{Subject, Predicate, Object, Action}
|
||
|
See rdf_update/4.
|
||
|
|
||
|
\termitem{begin}{Id, Nest, Time, Message}
|
||
|
Added before the changes in each database affected by a transaction
|
||
|
with transaction identifier \term{log}{Message}. \arg{Id} is an
|
||
|
integer counting the logged transactions to this database. Numbers
|
||
|
are increasing and designed for binary search within the journal file.
|
||
|
\arg{Nest} is the nesting level, where `0' is a toplevel transaction.
|
||
|
\arg{Time} is a time-stamp, currently using float notation with two
|
||
|
fractional digits. \arg{Message} is the term provided by the user
|
||
|
as argument of the \term{log}{Message} transaction.
|
||
|
|
||
|
\termitem{end}{Id, Nest, Others}
|
||
|
Added after the changes in each database affected by a transaction with
|
||
|
transaction identifier \term{log}{Message}. \arg{Id} and \arg{Nest}
|
||
|
match the begin-term. \arg{Others} gives a list of other databases
|
||
|
affected by this transaction and the \arg{Id} of these records. The
|
||
|
terms in this list have the format \arg{DB}:\arg{Id}.
|
||
|
\end{description}
|
||
|
|
||
|
\predicate{rdf_db_to_file}{2}{?DB, ?FileBase}
|
||
|
Convert between \arg{DB} (see rdf_source/1) and the file base-name used
|
||
|
for storing information on this database. The full file is located
|
||
|
in the directory described by rdf_current_db/1 and has the extension
|
||
|
\fileext{trp} for the base state and \fileext{jrn} for the journal.
|
||
|
\end{description}
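
Because a journal is a sequence of Prolog terms, it can be processed
with the standard term reader. The sketch below collects the time-stamp
and message of each logged transaction in the journal of a graph. The
predicates \const{journal_log/2} and \const{read_log/2,3} are
hypothetical helpers, and opening the file as UTF-8 is an assumption.

\begin{code}
:- use_module(library(semweb/rdf_persistency)).

% Log is a list of Time-Message pairs, one for each begin/4 term.
journal_log(DB, Log) :-
    rdf_journal_file(DB, File),
    setup_call_cleanup(
        open(File, read, In, [encoding(utf8)]),   % encoding is assumed
        read_log(In, Log),
        close(In)).

read_log(In, Log) :-
    read(In, Term),
    read_log(Term, In, Log).

read_log(end_of_file, _, []) :- !.
read_log(begin(_Id, _Nest, Time, Message), In, [Time-Message|T]) :- !,
    read_log(In, T).
read_log(_, In, Log) :-                 % skip all other journal terms
    read_log(In, Log).
\end{code}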
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% TURTLE %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\input{rdfturtle.tex}
|
||
|
\input{rdfturtlewrite.tex}
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% RDFS %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\section{Library semweb/rdfs}
|
||
|
\label{sec:rdfs}
|
||
|
|
||
|
\index{RDF-Schema}%
|
||
|
The \pllib{semweb/rdfs} library adds interpretation of the triple store
|
||
|
in terms of concepts from RDF-Schema (RDFS). There are two ways to
|
||
|
provide support for more high level languages in RDF. One is to view
|
||
|
such languages as a set of \jargon{entailment rules}. In this model the
|
||
|
rdfs library would provide a predicate \predref{rdfs}{3} providing the
|
||
|
same functionality as rdf/3 on the union of the raw graph and triples that
|
||
|
can be derived by applying the RDFS entailment rules.
|
||
|
|
||
|
Alternatively, RDFS provides a view on the RDF store in terms of
|
||
|
individuals, classes, properties, etc., and we can provide predicates
|
||
|
that query the database with this view in mind. This is the approach
|
||
|
taken in the \pllib{semweb/rdfs.pl} library, providing calls like
|
||
|
\term{rdfs_individual_of}{?Resource, ?Class}.%
|
||
|
\footnote{The SeRQL language is based on querying the deductive
|
||
|
closure of the triple set. The SWI-Prolog SeRQL
|
||
|
library provides \jargon{entailment modules} that
|
||
|
take the approach outlined above.}
|
||
|
|
||
|
|
||
|
\subsection{Hierarchy and class-individual relations}
|
||
|
|
||
|
The predicates in this section explore the \const{rdfs:subPropertyOf},
|
||
|
\const{rdfs:subClassOf} and \const{rdf:type} relations. Note that the
|
||
|
most fundamental of these, \const{rdfs:subPropertyOf}, is also used
|
||
|
by rdf_has/[3,4].
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdfs_subproperty_of}{2}{?SubProperty, ?Property}
|
||
|
True if \arg{SubProperty} is equal to \arg{Property} or \arg{Property}
|
||
|
can be reached from \arg{SubProperty} following the
|
||
|
\const{rdfs:subPropertyOf} relation. It can be used to test as well as
|
||
|
generate sub-properties or super-properties. Note that the commonly used
|
||
|
semantics of this predicate is wired into rdf_has/[3,4].%
|
||
|
\bug{The current implementation cannot deal with
|
||
|
cycles}.%
|
||
|
\bug{The current implementation cannot deal with predicates
|
||
|
that are an \const{rdfs:subPropertyOf} of
|
||
|
\const{rdfs:subPropertyOf}, such as
|
||
|
\const{owl:samePropertyAs}.}
|
||
|
|
||
|
\predicate{rdfs_subclass_of}{2}{?SubClass, ?Class}
|
||
|
True if \arg{SubClass} is equal to \arg{Class} or \arg{Class}
|
||
|
can be reached from \arg{SubClass} following the
|
||
|
\const{rdfs:subClassOf} relation. It can be used to test as
|
||
|
well as generate sub-classes or super-classes.%
|
||
|
\bug{The current implementation cannot deal with
|
||
|
cycles}.
|
||
|
|
||
|
\predicate{rdfs_class_property}{2}{+Class, ?Property}
|
||
|
True if the domain of \arg{Property} includes \arg{Class}. Used to
|
||
|
generate all properties that apply to a class.
|
||
|
|
||
|
\predicate{rdfs_individual_of}{2}{?Resource, ?Class}
|
||
|
True if \arg{Resource} is an individual of \arg{Class}. This implies
|
||
|
\arg{Resource} has an \const{rdf:type} property that refers to
|
||
|
\arg{Class} or a sub-class thereof. Can be used to test, generate
|
||
|
classes \arg{Resource} belongs to or generate individuals described
|
||
|
by \arg{Class}.
|
||
|
\end{description}
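
The sketch below builds a two-level class hierarchy and queries it. The
namespace alias \const{ex}, its URL, the graph name \const{demo} and the
predicates \const{load_demo/0}, \const{animals/1} and
\const{super_classes/1} are made up for illustration.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdfs)).

:- rdf_register_ns(ex, 'http://example.org/zoo#').

load_demo :-
    rdf_assert(ex:'Dog', rdfs:subClassOf, ex:'Animal', demo),
    rdf_assert(ex:rex,   rdf:type,        ex:'Dog',    demo).

% Individuals of ex:Animal, including those typed by a sub-class.
animals(Animals) :-
    findall(X, rdfs_individual_of(X, ex:'Animal'), Animals).

% ex:Dog itself and the classes reachable through rdfs:subClassOf.
super_classes(Classes) :-
    findall(C, rdfs_subclass_of(ex:'Dog', C), Classes).
\end{code}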
|
||
|
|
||
|
\subsection{Collections and Containers}
|
||
|
|
||
|
\index{parseType,Collection}%
|
||
|
\index{Collection,parseType}%
|
||
|
The RDF construct \const{rdf:parseType}=\const{Collection} constructs
|
||
|
a list using the \const{rdf:first} and \const{rdf:rest} relations.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdfs_member}{2}{?Resource, +Set}
|
||
|
Test or generate the members of \arg{Set}. \arg{Set} is either an
|
||
|
individual of \const{rdf:List} or \const{rdfs:Container}.
|
||
|
|
||
|
\predicate{rdfs_list_to_prolog_list}{2}{+Set, -List}
|
||
|
Convert \arg{Set}, which must be an individual of \const{rdf:List} into
|
||
|
a Prolog list of objects.
|
||
|
|
||
|
\predicate{rdfs_assert_list}{2}{+List, -Resource}
|
||
|
Equivalent to rdfs_assert_list/3 using \arg{DB} = \const{user}.
|
||
|
|
||
|
\predicate{rdfs_assert_list}{3}{+List, -Resource, +DB}
|
||
|
If \arg{List} is a list of resources, create an RDF list \arg{Resource}
|
||
|
that reflects these resources. \arg{Resource} and the sublist resources
|
||
|
are generated with rdf_bnode/1. The new triples are associated with the
|
||
|
database \arg{DB}.
|
||
|
\end{description}
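
As a small sketch, the predicate below stores a Prolog list as an RDF
list and converts it back. The predicate \const{demo_list/1}, the
resources and the graph name \const{demo} are placeholders.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdfs)).

demo_list(Members) :-
    rdfs_assert_list([ 'http://example.org/a',
                       'http://example.org/b',
                       'http://example.org/c'
                     ], List, demo),
    % List is a blank node that is an individual of rdf:List
    rdfs_list_to_prolog_list(List, Members).
\end{code}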
|
||
|
|
||
|
\subsection{Labels and textual search}
|
||
|
|
||
|
Textual search is partly handled by the predicates from the
|
||
|
\pllib{rdf_db} module and its underlying C-library. For example,
|
||
|
literal objects are hashed case-insensitive to speed up the commonly
|
||
|
used case-insensitive search.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate[multi]{rdfs_label}{3}{?Resource, ?Language, ?Label}
|
||
|
Extract the label from \arg{Resource} or generate all resources with the
|
||
|
given \arg{Label}. The label is either associated using a sub-property
|
||
|
of \const{rdfs:label} or it is extracted from \arg{Resource} by taking
|
||
|
the part after the last \chr{\#} or \chr{/}. If this too fails,
|
||
|
\arg{Label} is unified with \arg{Resource}. \arg{Language} is unified
|
||
|
to the value of the \const{xml:lang} attribute of the label or a
|
||
|
variable if the label has no language specified.
|
||
|
|
||
|
\predicate{rdfs_label}{2}{?Resource, ?Label}
|
||
|
Defined as \term{rdfs_label}{Resource, _, Label}.
|
||
|
|
||
|
\predicate{rdfs_ns_label}{3}{?Resource, ?Language, ?Label}
|
||
|
Similar to rdfs_label/3, but prefixes the result using the declared
|
||
|
namespace alias (see \secref{rdfns}) to facilitate user-friendly labels
|
||
|
in applications using multiple namespaces that may lead to confusion.
|
||
|
|
||
|
\predicate{rdfs_ns_label}{2}{?Resource, ?Label}
|
||
|
Defined as \term{rdfs_ns_label}{Resource, _, Label}.
|
||
|
|
||
|
\predicate{rdfs_find}{5}{+String, +Description, ?Properties, +Method, -Subject}
|
||
|
\index{search}%
|
||
|
Find (on backtracking) \arg{Subject}s that satisfy a search
|
||
|
specification for textual attributes. \arg{String} is the string
|
||
|
searched for. \arg{Description} is an OWL description (see \secref{owl})
|
||
|
specifying candidate resources. \arg{Properties} is a list of properties
|
||
|
to search for literal objects, \arg{Method} defines the textual
|
||
|
matching algorithm. All textual matching is performed case-insensitively.
|
||
|
The matching-methods are described with rdf_match_label/3. If
|
||
|
\arg{Properties} is unbound, the search is performed in any property and
|
||
|
\arg{Properties} is unified with a list holding the property on which
|
||
|
the match was found.
|
||
|
\end{description}
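
The queries below sketch the different modes of the label predicates.
The \const{example.org} resource is a placeholder and the answers depend
on the loaded data.

\begin{code}
:- use_module(library(semweb/rdfs)).

% Enumerate resources that have an English label.
?- rdfs_label(Resource, en, Label).

% Label of a given resource, regardless of language.
?- rdfs_label('http://example.org/zoo#Dog', Label).

% As above, prefixed with the namespace alias of the resource.
?- rdfs_ns_label('http://www.w3.org/2000/01/rdf-schema#Class', Label).
\end{code}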
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% LIBRARY %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\input{rdflib.tex}
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% PLDOC LIBRARIES %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\input{sparqlclient.tex}
|
||
|
\input{rdfcompare.tex}
|
||
|
\input{rdfportray.tex}
|
||
|
|
||
|
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
% RDF-EDIT %
|
||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
|
||
|
\section{Library semweb/rdf_edit} \label{sec:rdfedit}
|
||
|
|
||
|
\begin{quote}\em
|
||
|
It is anticipated that this library will eventually be superseded by
|
||
|
facilities running on top of the native rdf_transaction/2 and
|
||
|
rdf_monitor/2 facilities. See \secref{rdfmonitor}.
|
||
|
\end{quote}
|
||
|
|
||
|
\index{undo}\index{journal}\index{transactions}
|
||
|
The module \file{rdf_edit.pl} is a layer that encapsulates the
|
||
|
modification predicates from \secref{rdfmodify} for use from
|
||
|
a (graphical) editor of the triple store. It adds the
|
||
|
following features:
|
||
|
|
||
|
\begin{itemlist}
|
||
|
\item [Transaction management]
|
||
|
Modifications are grouped into \emph{transactions} to safeguard
|
||
|
the system from failing operations as well as provide meaningful
|
||
|
chunks for undo and journalling.
|
||
|
|
||
|
\item [Undo]
|
||
|
Undo and redo-transactions using a single mechanism to support
|
||
|
user-friendly editing.
|
||
|
|
||
|
\item [Journalling]
|
||
|
Record all actions to support analysis, versioning, crash-recovery
|
||
|
and an alternative to saving.
|
||
|
\end{itemlist}
|
||
|
|
||
|
\subsection{Transaction management}
|
||
|
|
||
|
Transactions group low-level modification actions together.
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdfe_transaction}{1}{:Goal}
|
||
|
Run \arg{Goal}, recording all modifications to the triple store made
|
||
|
through \secref{rdfeencap}. Execution is performed as in once/1. If
|
||
|
\arg{Goal} succeeds the changes are committed. If \arg{Goal} fails
|
||
|
or throws an exception the changes are reverted.
|
||
|
|
||
|
Transactions may be nested. A failing nested transaction only reverts
|
||
|
the actions performed inside the nested transaction. If the outer
|
||
|
transaction succeeds it is committed normally. Conversely, if the
outer transaction fails, committed nested transactions are reverted
|
||
|
as well. If any of the modifications inside the transaction modifies
|
||
|
a protected file (see rdfe_set_file_property/2) the transaction is
|
||
|
reverted and rdfe_transaction/1 throws a permission error.
|
||
|
|
||
|
A successful outer transaction (`level-0') may be undone using
|
||
|
rdfe_undo/0.
|
||
|
|
||
|
\predicate{rdfe_transaction}{2}{:Goal, +Name}
|
||
|
As rdfe_transaction/1, naming the transaction \arg{Name}. Transaction
|
||
|
naming is intended for the GUI to give the user an idea of the next undo
|
||
|
action. See also rdfe_set_transaction_name/1 and
|
||
|
rdfe_transaction_name/2.
|
||
|
|
||
|
\predicate{rdfe_set_transaction_name}{1}{+Name}
|
||
|
Set the `name' of the current transaction to \arg{Name}.
|
||
|
|
||
|
\predicate{rdfe_transaction_name}{2}{?TID, ?Name}
|
||
|
Query assigned transaction names.
|
||
|
|
||
|
\predicate{rdfe_transaction_member}{2}{+TID, -Action}
|
||
|
Enumerate the actions that took place inside a transaction. This can
|
||
|
be used by a GUI to optimise the MVC (Model-View-Controller) feedback
|
||
|
loop. \arg{Action} is one of:
|
||
|
|
||
|
\begin{description}
|
||
|
\termitem{assert}{Subject, Predicate, Object}
|
||
|
\termitem{retract}{Subject, Predicate, Object}
|
||
|
\termitem{update}{Subject, Predicate, Object, Action}
|
||
|
\termitem{file}{load(Path)}
|
||
|
\termitem{file}{unload(Path)}
|
||
|
\end{description}
|
||
|
\end{description}
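
The sketch below groups two modifications into one named transaction,
so that they are committed, reverted and undone together. The predicate
\const{set_property/3} and the transaction name are hypothetical.

\begin{code}
:- use_module(library(semweb/rdf_edit)).

% Replace all values of Property on Subject by Value in one step.
set_property(Subject, Property, Value) :-
    rdfe_transaction(
        ( rdfe_retractall(Subject, Property, _),
          rdfe_assert(Subject, Property, Value)
        ),
        'Set property').

% After a successful call, the whole step can be undone with
% ?- rdfe_undo.
\end{code}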
|
||
|
|
||
|
\subsection{File management} \label{sec:file}
|
||
|
|
||
|
\begin{description}
|
||
|
\predicate{rdfe_is_modified}{1}{?File}
Enumerate/test whether \arg{File} is modified since it was loaded or
since the last call to rdfe_clear_modified/1. Whether or not a file
is modified is determined by the MD5 checksum of all triples belonging
to the file.

\predicate{rdfe_clear_modified}{1}{+File}
Set the \emph{unmodified-MD5} to the current MD5 checksum. See also
rdfe_is_modified/1.

\predicate{rdfe_set_file_property}{2}{+File, +Property}
Control access rights and the default destination of new triples.
\arg{Property} is one of

\begin{description}
\termitem{access}{+Access}
Where \arg{Access} is one of \const{ro} or \const{rw}. Access \const{ro}
is the default when a file is loaded for which the user has no write
access. If a transaction (see rdfe_transaction/1) modifies a file
with access \const{ro} the transaction is reverted.

\termitem{default}{+Default}
Set this file to be the default destination of triples. If
\arg{Default} is \const{fallback} it is only the default for
triples that have no clear default destination. If it is \const{all}
all new triples are added to this file.
\end{description}

\predicate{rdfe_get_file_property}{2}{?File, ?Property}
Query properties set with rdfe_set_file_property/2.
\end{description}

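For example, an editor might protect a loaded file and keep track of
unsaved changes as follows. This is a sketch only; all helper predicate
names are hypothetical.

\begin{code}
:- use_module(library(semweb/rdf_edit)).

%   protect(+File): transactions that modify File are reverted.
protect(File) :-
    rdfe_set_file_property(File, access(ro)).

%   is_protected(+File): true if File has access ro.
is_protected(File) :-
    rdfe_get_file_property(File, access(ro)).

%   unsaved_files(-Files): all files modified since they were
%   loaded or since the last rdfe_clear_modified/1.
unsaved_files(Files) :-
    findall(File, rdfe_is_modified(File), Files).

%   mark_saved(+File): call after File has been saved.
mark_saved(File) :-
    rdfe_clear_modified(File).
\end{code}
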
\subsection{Encapsulated predicates} \label{sec:rdfeencap}

The following predicates encapsulate predicates from the \file{rdf_db}
module that modify the triple store. These predicates can only be called
when inside a \emph{transaction}. See rdfe_transaction/1.

\begin{description}
\predicate{rdfe_assert}{3}{+Subject, +Predicate, +Object}
Encapsulates rdf_assert/3.
\predicate{rdfe_retractall}{3}{?Subject, ?Predicate, ?Object}
Encapsulates rdf_retractall/3.
\predicate{rdfe_update}{4}{+Subject, +Predicate, +Object, +Action}
Encapsulates rdf_update/4.
\predicate{rdfe_load}{1}{+In}
Encapsulates rdf_load/1.
\predicate{rdfe_unload}{1}{+In}
Encapsulates rdf_unload/1.
\end{description}

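As a minimal sketch of how these predicates are combined, the
hypothetical helper below replaces all labels of a resource inside a
single transaction, so that the change is committed or reverted as a
whole.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_edit)).

%   set_label(+Resource, +Label)
%   Hypothetical helper: replace all rdfs:label values of
%   Resource by Label in one transaction.
set_label(Resource, Label) :-
    rdf_global_id(rdfs:label, LabelProp),
    rdfe_transaction(
        (   rdfe_retractall(Resource, LabelProp, _),
            rdfe_assert(Resource, LabelProp, literal(Label))
        )).
\end{code}
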
\subsection{High-level modification predicates} \label{sec:rdfeedit}

This section describes an (as yet very incomplete) set of more high-level
operations one would like to be able to perform. Eventually this set
may include operations based on RDFS and OWL.

\begin{description}
\predicate{rdfe_delete}{1}{+Resource}
Delete all traces of \arg{Resource}. This implies all triples where
\arg{Resource} appears as \emph{subject}, \emph{predicate} or
\emph{object}. This predicate starts a transaction.
\end{description}

\subsection{Undo}

\index{undo}%
Undo aims at user-level undo operations from a (graphical) editor.

\begin{description}
\predicate{rdfe_undo}{0}{}
Revert the last outermost (`level-0') transaction (see
rdfe_transaction/1). Successive calls go further back in history. Fails
if there is no more undo information.

\predicate{rdfe_redo}{0}{}
Revert the last rdfe_undo/0. Successive calls revert more rdfe_undo/0
operations. Fails if there is no more redo information.

\predicate{rdfe_can_undo}{1}{-TID}
Test if there is another transaction that can be reverted. Used for
activating menus in a graphical environment. \arg{TID} is unified to
the transaction id of the action that will be reverted.

\predicate{rdfe_can_redo}{1}{-TID}
Test if there is another undo that can be reverted. Used for
activating menus in a graphical environment. \arg{TID} is unified to
the transaction id of the action that will be reverted.
\end{description}

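The sketch below shows how an editor might use these predicates. Both
helper predicates are hypothetical. The first computes the text of an
`Undo' menu item from the transaction name, if one was assigned; the
second reverts transactions until no undo information is left.

\begin{code}
:- use_module(library(semweb/rdf_edit)).

%   undo_label(-Label)
%   Hypothetical helper.  Fails if there is nothing to undo, so
%   the menu item can be deactivated.
undo_label(Label) :-
    rdfe_can_undo(TID),
    (   rdfe_transaction_name(TID, Name)
    ->  format(atom(Label), 'Undo ~w', [Name])
    ;   Label = 'Undo'
    ).

%   undo_all
%   Hypothetical helper: call rdfe_undo/0 until it fails.
undo_all :-
    (   rdfe_undo
    ->  undo_all
    ;   true
    ).
\end{code}
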
\subsection{Journalling}

\index{journal}%
Optionally, every action through this module is immediately sent to a
\jargon{journal-file}. The journal provides a full log of all actions
with a time-stamp that may be used for inspection of behaviour, version
management, crash-recovery or as an alternative to regular save
operations.

\begin{description}
\predicate{rdfe_open_journal}{2}{+File, +Mode}
Open an existing or new journal. If \arg{Mode} is \const{append}
and \arg{File} exists, the journal is first replayed. See
rdfe_replay_journal/1. If \arg{Mode} is \const{write} the journal is
truncated if it exists.

\predicate{rdfe_close_journal}{0}{}
Close the currently open journal.

\predicate{rdfe_current_journal}{1}{-Path}
Test whether a journal is open and, if so, unify \arg{Path} with the
file to which the actions are journalled.

\predicate{rdfe_replay_journal}{1}{+File}
Read a journal, replaying all actions in it. To do so, the system
reads the journal a transaction at a time. If the transaction is
closed with a \emph{commit} it executes the actions inside the
transaction. If it is closed with a \emph{rollback} or not closed at
all due to a crash the actions inside the transaction are discarded.
Calling this predicate directly only makes sense for inspecting the
state at the end of a journal without modifying the journal. Normally a
journal is replayed using the \const{append} mode of
rdfe_open_journal/2.
\end{description}

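A typical editing session could be bracketed as sketched below. The
helper predicate names are hypothetical; the journal file name is
supplied by the caller.

\begin{code}
:- use_module(library(semweb/rdf_edit)).

%   start_session(+Journal)
%   Hypothetical helper: replay Journal if it exists and append
%   all subsequent actions to it.
start_session(Journal) :-
    rdfe_open_journal(Journal, append).

%   end_session
%   Hypothetical helper: close the journal if one is open.
end_session :-
    (   rdfe_current_journal(_)
    ->  rdfe_close_journal
    ;   true
    ).
\end{code}
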
\subsection{Broadcasting change events}

\index{event}\index{broadcast}%
To realise a modular graphical interface for editing the triple store,
the system must use some sort of \emph{event} mechanism. This is
implemented by the XPCE library \pllib{broadcast} which is described
in the \url[XPCE User
Guide]{http://hcs.science.uva.nl/projects/xpce/UserGuide/libbroadcast.html}.
In this section we describe the terms broadcast by the library.

\begin{description}
\termitem{rdf_transaction}{+Id}
A `level-0' transaction has been committed. The system passes the
identifier of the transaction in \arg{Id}. In the current implementation
there is no way to find out what happened inside the transaction. This
is likely to change over time.

If a transaction is reverted due to failure or exception \emph{no} event
is broadcast. The initiating GUI element is supposed to handle this
possibility itself and other components are not affected as the triple
store is not changed.

\termitem{rdf_undo}{+Type, +Id}
This event is broadcast after an rdfe_undo/0 or rdfe_redo/0.
\arg{Type} is one of \const{undo} or \const{redo} and \arg{Id}
identifies the transaction as above.
\end{description}

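A component can subscribe to these terms using listen/3 from
\pllib{broadcast}. The sketch below is an illustration only: the
listener name \const{my_view} and both helper predicates are
hypothetical, and a real GUI would refresh its display instead of
printing a line.

\begin{code}
:- use_module(library(broadcast)).

%   install_listeners
%   Hypothetical helper: register the listener my_view for both
%   terms described above.
install_listeners :-
    listen(my_view, rdf_transaction(Id), changed(commit, Id)),
    listen(my_view, rdf_undo(Type, Id2), changed(Type, Id2)).

%   changed(+Type, +Id)
%   Placeholder for refreshing the display.
changed(Type, Id) :-
    format("~w of transaction ~w~n", [Type, Id]).
\end{code}
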
\section{Related packages and issues}

\index{Sesame}\index{SeRQL}%
The SWI-Prolog SemWeb package is designed to provide access to the
Semantic Web languages from Prolog. It consists of the low level
\file{rdf_db.pl} store with layers such as \pllib{semweb/rdfs.pl} to provide
more high level querying of a triple set with relations such as
rdfs_individual_of/2, rdfs_subclass_of/2, etc.
\url[SeRQL]{http://www.openrdf.org} is a semantic web query language
taking another route. Instead of providing alternative relations
SeRQL defines a graph query on the \jargon{deductive closure} of the
triple set. For example, under the assumption of RDFS entailment rules
this makes the query \term{rdf}{S, rdf:type, Class} equivalent to
\term{rdfs_individual_of}{S, Class}.

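The difference between the two views can be seen in plain Prolog. In
the sketch below, which uses hypothetical predicate names, the first
predicate only sees explicitly stored triples, while the second applies
RDFS reasoning. Under RDFS entailment, the rdf/3 query would be expected
to yield the answers of the second predicate.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdfs)).

%   asserted_instance(?S, ?Class)
%   S has an explicit rdf:type triple naming Class.
asserted_instance(S, Class) :-
    rdf(S, rdf:type, Class).

%   entailed_instance(?S, ?Class)
%   S belongs to Class, also taking the subclass hierarchy
%   into account.
entailed_instance(S, Class) :-
    rdfs_individual_of(S, Class).
\end{code}
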
\index{optimising,query}%
We developed a parser for SeRQL which compiles SeRQL path expressions
into Prolog conjunctions of \term{rdf}{Subject, Predicate, Object}
calls. \jargon{Entailment modules} realise a fully logical
implementation of rdf/3 including the entailment reasoning required to
deal with a Semantic Web language or application specific reasoning. The
infrastructure is completed with a query optimiser and an HTTP server
compliant with the \url[Sesame]{http://www.openrdf.org} implementation of
the SeRQL language. The Sesame Java client can be used to access Prolog
servers from Java, while the Prolog client can be used to access the
Sesame SeRQL server. For further details, see the
\url[project
home]{http://gollem.science.uva.nl/twiki/pl/bin/view/Library/SeRQL}.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% OWL %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{OWL} \label{sec:owl}

\index{OWL}%
The SWI-Prolog Semantic Web library provides no direct support for OWL.
OWL(-2) support is provided through Thea, an OWL library for SWI-Prolog.
See \url{http://www.semanticweb.gr/TheaOWLLib/}.


\section*{Acknowledgements}

This research was supported by the following projects: MIA and the
MultimediaN project (www.multimedian.nl) funded through the BSIK
programme of the Dutch Government, and the FP-6 project HOPS of the
European Commission.

The implementation of AVL trees is based on libavl by Brad Appleton.
See the source file \file{avl.c} for details.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FOOTER %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\printindex

\end{document}