\section{Managing RDF input files}
\label{sec:rdflib}
Complex projects require RDF resources from many locations and typically
need to load these in different combinations, for example a small
subset of the data for debugging or a different set of files for
experimentation. The library \pllib{semweb/rdf_library.pl}
manages sets of RDF files spread over different locations, including
file and network locations. RDF files are annotated using a
\jargon{Manifest} file in RDF format.
Currently (September 2007), the E-culture server loads more than 120 RDF
files, containing many different schemas, instance repositories and
ontology mappings. Some resources, such as the W3C version of Wordnet
come in many files. The server is initialised by loading (a subset of)
these files. The subset is defined by predicates called
\predref{load_medium}{0}, \predref{load_tgn}{1}, etc. This has become
unmanageable. There is no way to find out exactly what will be loaded or
whether all RDF files are in place except for actually executing the
load. There is also no easy way to exploit concurrency to speed up the
process.
For this reason we introduce RDF \jargon{Manifest} files that describe
one or more RDF resources and their dependencies. The manifest file can
be distributed along with a set of RDF files, providing a
machine-readable, portable and declarative description of how the RDF files are
intended to be combined. Software allows for listing the content of the
library or loading an entry with all dependencies.
\subsection{The Manifest file}
A manifest file is an RDF file, often in Turtle \cite{turtle} format,
that provides meta-data about RDF resources. Often a manifest will
describe RDF files in the current directory, but it can also describe
RDF resources at arbitrary URL locations. The RDF schema for RDF library
meta-data can be found in \file{rdf_library.ttl}. The namespace for the
RDF library format is defined as
\url{http://www.swi-prolog.org/rdf/library/} and abbreviated as
\const{lib}.
The schema defines three root classes: lib:Namespace, lib:Ontology and
lib:Virtual, which we describe below.
\begin{description}
\resitem{lib:Ontology}
This is a subclass of owl:Ontology. It has two subclasses, lib:Schema
and lib:Instances. These three classes are currently processed equally.
The following properties are recognised on lib:Ontology:
\begin{description}
\resitem {dc:title}
Title of the ontology. Displayed by rdf_list_library/0.
\resitem {owl:versionInfo}
Version of the ontology. Displayed by rdf_list_library/0.
\resitem {owl:imports}
Ontologies imported. If rdf_load_library/2 is used to load this
ontology, the ontologies referenced here are loaded as well. There
are two subProperties: lib:schema and lib:instances with the obvious
meaning.
\resitem {lib:providesNamespace}
Informally, providing a namespace is defined as providing subjects that
reside in the namespace.
\resitem {lib:usesNamespace}
Informally, using a namespace is defined as providing objects that
reside in the namespace.
\resitem {lib:source}
Defines the named graph into which the resource is loaded. If this
ends in a \const{/}, the basename of each loaded file is appended to
the given source. Defaults to the URL the RDF is loaded from.
\resitem {lib:baseURI}
Defines the base for processing the RDF data. If not provided this
defaults to the named graph, which in turn defaults to the URL the
RDF is loaded from.
\resitem {lib:blankNodes}
One of \const{share} or \const{noshare}. A SWI-Prolog RDF library
extension that allows for sharing equivalent blank nodes. Sharing
is the default.
\end{description}
\resitem{lib:Virtual}
Virtual ontologies do not refer to an RDF resource themselves. They
only import other resources. For example the W3C WordNet manifest
defines \const{wn-basic} and \const{wn-full} as virtual resources.
The lib:Virtual resource is used as a second rdf:type:
\begin{code}
<wn-basic>
a lib:Ontology ;
a lib:Virtual ;
...
\end{code}
\resitem{lib:Namespace}
Defines a URL to be a namespace. The definition provides the preferred
mnemonic and can be referenced in the lib:providesNamespace and
lib:usesNamespace properties. The rdf_load_library/2 predicate
registers encountered namespace mnemonics with rdf-db using
rdf_register_ns/2. Typically, namespace definitions refer to @{prefix}
declarations. E.g.\
\begin{code}
@prefix lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:rdfs
a lib:Namespace ;
lib:mnemonic "rdfs" ;
lib:namespace rdfs: .
\end{code}
\end{description}
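The lib:source rule described above can be illustrated with a small
Prolog sketch. The predicate graph_for/3 is hypothetical and not part of
the library. It also assumes that the appended basename keeps the file
extension, which the description above does not specify.
\begin{code}
%!  graph_for(+Source, +FileURL, -Graph)
%
%   Hypothetical helper: derive the named graph for FileURL from the
%   lib:source value Source.

graph_for(Source, FileURL, Graph) :-
    sub_atom(Source, _, 1, 0, '/'),     % Source ends in '/'
    !,
    file_base_name(FileURL, Base),      % last path segment of the URL
    atom_concat(Source, Base, Graph).
graph_for(Source, _FileURL, Source).    % otherwise use Source unchanged
\end{code}
With a source that ends in \const{/}, each file of a multi-file resource
ends up in its own graph below that source. Without the trailing slash
all files share one graph.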
\subsubsection{Finding manifest files}
The initial manifest file(s) are loaded into the system using
rdf_attach_library/1.
\begin{description}
\predicate{rdf_attach_library}{1}{+FileOrDirectory}
Load meta-data on RDF repositories from \arg{FileOrDirectory}. If the
argument is a directory, this directory is processed recursively and
each file named \file{Manifest.ttl} or \file{Manifest.rdf} is loaded.
Declared namespaces are added to the rdf-db namespace list. Encountered
ontologies are added to a private database of
\file{rdf_library.pl}.%
\footnote{We could have used the global RDF store, but
decided against that to avoid polluting the triple
space.}
Each ontology is given an \jargon{identifier}, derived from the
basename of the URL without the extension. Thus, using the
declaration below, the identifier of the declared ontology is
\const{wn-basic}.
\begin{code}
<wn-basic>
a lib:Ontology ;
a lib:Virtual ;
dc:title "Basic WordNet" ;
...
\end{code}
\predicate{rdf_list_library}{0}{}
List the available resources in the library. Currently only lists
resources that have a dc:title property. See \secref{usage} for
an example.
\end{description}
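The identifier rule used by rdf_attach_library/1 (basename of the URL
without the extension) can be approximated as follows. This is a minimal
sketch using standard SWI-Prolog file name predicates; manifest_id/2 is
a hypothetical name and the library may compute the identifier
differently.
\begin{code}
%!  manifest_id(+URL, -Id)
%
%   Hypothetical helper: Id is the basename of URL with any file
%   extension removed.

manifest_id(URL, Id) :-
    file_base_name(URL, Base),
    file_name_extension(Id, _Ext, Base).
\end{code}
For example, the URL \const{file:///home/jan/wn/wn-basic} yields the
identifier \const{wn-basic}.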
It is possible for the initial set of manifests to refer to RDF files
that are not covered by a manifest. If such a reference is encountered
while loading or listing a library, the library manager will look for a
manifest file in the directory holding the referenced RDF file and load
this manifest. If a manifest is found that covers the referenced file,
the directives found in the manifest will be followed. Otherwise the RDF
resource is simply loaded using the current defaults.
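The lookup described above amounts to resolving the manifest file names
against the URL of the referenced RDF file. The sketch below only
computes the candidate locations. The predicate sibling_manifests/2 is
hypothetical, and the actual search order of the library is an
assumption.
\begin{code}
:- use_module(library(uri)).

%!  sibling_manifests(+FileURL, -Candidates)
%
%   Hypothetical helper: Candidates are the URLs of Manifest.ttl and
%   Manifest.rdf in the directory that holds FileURL.

sibling_manifests(FileURL, [Ttl, Rdf]) :-
    uri_resolve('Manifest.ttl', FileURL, Ttl),
    uri_resolve('Manifest.rdf', FileURL, Rdf).
\end{code}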
Further exploration of the library is achieved using rdf_list_library/1
or rdf_list_library/2:
\begin{description}
\predicate{rdf_list_library}{1}{+Id}
Same as \term{rdf_list_library}{Id, []}.
\predicate{rdf_list_library}{2}{+Id, +Options}
Lists the resources that will be loaded if \arg{Id} is handed to
rdf_load_library/2. See rdf_attach_library/1 for how ontology
identifiers are generated. In addition it checks the existence of each
resource to help debugging library dependencies. Before doing its work,
rdf_list_library/2 reloads manifests that have changed since they were
loaded the last time. For HTTP resources it uses the HEAD method to
verify existence and last modification time of resources.
\predicate{rdf_load_library}{2}{+Id, +Options}
Load the given library. First rdf_load_library/2 will establish what
resources need to be loaded and whether all resources exist. Then it
will load the resources.
\end{description}
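Establishing what needs to be loaded is essentially a transitive closure
over the import relation, skipping virtual resources. The sketch below
shows this on a toy import graph. The facts imports/2 and virtual/1 and
the predicate load_set/2 are hypothetical and only loosely mirror the
WordNet manifest. The library keeps its own private database and may
traverse it differently.
\begin{code}
:- use_module(library(apply)).
:- use_module(library(lists)).

% Toy import graph (hypothetical facts).
imports('wn-basic',  'wnbasic.rdfs').
imports('wn-basic',  'wn-common').
imports('wn-basic',  'wordnet-senselabels.rdf').
imports('wn-common', 'wordnet-synset.rdf').
imports('wn-common', 'wordnet-glossary.rdf').

virtual('wn-basic').
virtual('wn-common').

%!  load_set(+Id, -Files)
%
%   Files is the sorted list of non-virtual resources reachable
%   from Id through imports/2.

load_set(Id, Files) :-
    closure([Id], [], Seen),
    exclude(virtual, Seen, Files0),
    sort(Files0, Files).

% closure(+Agenda, +Seen0, -Seen): visit each resource once, so
% cyclic imports terminate.
closure([], Seen, Seen).
closure([H|T], Seen0, Seen) :-
    (   memberchk(H, Seen0)
    ->  closure(T, Seen0, Seen)
    ;   findall(D, imports(H, D), Ds),
        append(Ds, T, Agenda),
        closure(Agenda, [H|Seen0], Seen)
    ).
\end{code}
\begin{code}
?- load_set('wn-basic', Files).
Files = ['wnbasic.rdfs', 'wordnet-glossary.rdf',
         'wordnet-senselabels.rdf', 'wordnet-synset.rdf'].
\end{code}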
\subsection{Usage scenarios}
\label{sec:usage}
Typically, a project will use a single file using the same format as a
manifest file that defines alternative configurations that can be
loaded. This file is loaded at program startup using
rdf_attach_library/1. Users can now list the available libraries
using rdf_list_library/0 and rdf_list_library/1:
\begin{code}
1 ?- rdf_list_library.
ec-core-vocabularies E-Culture core vocabularies
ec-all-vocabularies All E-Culture vocabularies
ec-hacks Specific hacks
ec-mappings E-Culture ontology mappings
ec-core-collections E-Culture core collections
ec-all-collections E-Culture all collections
ec-medium E-Culture medium sized data (artchive+aria)
ec-all E-Culture all data
\end{code}
Now we can list a specific category using rdf_list_library/1. Note this
loads two additional manifests referenced by resources encountered in
\const{ec-mappings}. If a resource does not exist it is flagged using
\const{[NOT FOUND]}.
\begin{code}
2 ?- rdf_list_library('ec-mappings').
% Loaded RDF manifest /home/jan/src/eculture/vocabularies/mappings/Manifest.ttl
% Loaded RDF manifest /home/jan/src/eculture/collections/aul/Manifest.ttl
<file:///home/jan/src/eculture/src/server/ec-mappings>
. <file:///home/jan/src/eculture/vocabularies/mappings/mappings>
. . <file:///home/jan/src/eculture/vocabularies/mappings/interface>
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_class_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_property_mapping.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/properties>
. . . file:///home/jan/src/eculture/vocabularies/mappings/ethnographic_property_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_properties.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_property_semantics.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/situations>
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_situations.ttl
. <file:///home/jan/src/eculture/collections/aul/aul>
. . file:///home/jan/src/eculture/collections/aul/aul.rdfs
. . file:///home/jan/src/eculture/collections/aul/aul.rdf
. . file:///home/jan/src/eculture/collections/aul/aul9styles.rdf
. . file:///home/jan/src/eculture/collections/aul/extractedperiods.rdf
. . file:///home/jan/src/eculture/collections/aul/manual-periods.rdf
\end{code}
\subsubsection{Referencing resources}
Resources and manifests are located either on the local filesystem or on
a network resource. The initial manifest can also be loaded from a file
or a URL. This defines the initial \jargon{base URL} of the document.
The base URL can be overruled using the Turtle @{base} directive. Other
documents can be referenced relative to this base URL by exploiting
Turtle's URI expansion rules. Turtle resources can be specified in three
ways: as absolute URLs (e.g.\
\verb$<http://www.example.com/rdf/ontology.rdf>$), as URLs relative to
the base (e.g.\ \verb$<../rdf/ontology.rdf>$) or following a
\jargon{prefix} (e.g.\ prefix:ontology).
The prefix notation is powerful as we can define multiple of them and
define resources relative to them. Unfortunately, prefixes can only be
defined as absolute URLs or URLs relative to the base URL. Notably, they
cannot be defined relative to other prefixes. In addition, a prefix can
only be followed by a Qname, which excludes \verb$.$ and \verb$/$.
Easily relocatable manifests must define all resources relative to the
base URL. Relocation is automatic if the manifest remains in the same
hierarchy as the resources it references. If the manifest is copied
elsewhere (i.e.\ for creating a local version) it can use @{base} to
refer to the resource hierarchy. We can point to directories holding
manifest files using @{prefix} declarations. There, we can reference
\jargon{Virtual} resources using prefix:name. Here is an example, where
we first give some lines from the initial manifest followed by the
definition of the virtual RDFS resource.
\begin{code}
@base <http://gollem.science.uva.nl/e-culture/rdf/> .
@prefix base: <base_ontologies/> .
<ec-core-vocabularies>
a lib:Ontology ;
a lib:Virtual ;
dc:title "E-Culture core vocabularies" ;
owl:imports
base:rdfs ,
base:owl ,
base:dc ,
base:vra ,
...
\end{code}
\begin{code}
<rdfs>
a lib:Schema ;
a lib:Virtual ;
rdfs:comment "RDF Schema" ;
lib:source rdfs: ;
lib:providesNamespace :rdfs ;
lib:schema <rdfs.rdfs> .
\end{code}
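The expansion of \const{base:rdfs} in the example above can be traced
with uri_resolve/3 from \pllib{uri}: the relative prefix is first
resolved against the base URL and the local name is then appended. The
predicate expand_prefixed/4 is a hypothetical helper for illustration.
The Turtle parser performs this expansion itself.
\begin{code}
:- use_module(library(uri)).

%!  expand_prefixed(+Base, +PrefixRel, +Local, -URL)
%
%   Hypothetical helper: resolve the relative prefix PrefixRel
%   against Base and append the local name Local.

expand_prefixed(Base, PrefixRel, Local, URL) :-
    uri_resolve(PrefixRel, Base, PrefixURL),
    atom_concat(PrefixURL, Local, URL).
\end{code}
\begin{code}
?- expand_prefixed('http://gollem.science.uva.nl/e-culture/rdf/',
                   'base_ontologies/', rdfs, URL).
URL = 'http://gollem.science.uva.nl/e-culture/rdf/base_ontologies/rdfs'.
\end{code}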
\subsection{Putting it all together}
In this section we provide skeleton code for filling the RDF database
from a password protected HTTP repository. The first line loads the
application. Next we include modules that enable us to manage the RDF
library, RDF database caching and HTTP connections. Then we set up the
HTTP authentication, enable caching of processed RDF files and load the
initial manifest. Finally, load_data/0 loads all our RDF data.
\begin{code}
:- use_module(server).

:- use_module(library(http/http_open)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(semweb/rdf_cache)).

:- http_set_authorization('http://www.example.org/rdf',
                          basic(john, secret)).

:- rdf_set_cache_options([ global_directory('RDF-Cache'),
                           create_global_directory(true)
                         ]).

:- rdf_attach_library('http://www.example.org/rdf/Manifest.ttl').

%%      load_data
%
%       Load our RDF data

load_data :-
        rdf_load_library('all').
\end{code}
\subsection{Example: A Manifest for W3C WordNet}
\label{sec:w3cmanifest}
The manifest below allows for loading WordNet in the two predefined
versions using one of
\begin{code}
?- rdf_load_library('wn-basic', []).
?- rdf_load_library('wn-full', []).
\end{code}
\begin{code}
@prefix lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix wn20schema: <http://www.w3.org/2006/03/wn/wn20/schema/> .
@prefix wn20instances: <http://www.w3.org/2006/03/wn/wn20/instances/> .
# Source from http://www.cs.vu.nl/~mark/pub/wntestrdf.zip
:wn20instances
a lib:Namespace ;
lib:mnemonic "wn20instances" ;
lib:namespace wn20instances: .
:wn20schema
a lib:Namespace ;
lib:mnemonic "wn20schema" ;
lib:namespace wn20schema: .
:dc
a lib:Namespace ;
lib:mnemonic "dc" ;
lib:namespace dc: .
:owl
a lib:Namespace ;
lib:mnemonic "owl" ;
lib:namespace owl: .
# WordNet
<wn-common>
a lib:Instances ;
a lib:Virtual ;
rdfs:comment "Common files between full and basic version of WordNet" ;
lib:source wn20instances: ;
lib:instances <wordnet-attribute.rdf> ;
lib:instances <wordnet-causes.rdf> ;
lib:instances <wordnet-classifiedby.rdf> ;
lib:instances <wordnet-entailment.rdf> ;
lib:instances <wordnet-frame.rdf> ;
lib:instances <wordnet-glossary.rdf> ;
lib:instances <wordnet-hyponym.rdf> ;
lib:instances <wordnet-membermeronym.rdf> ;
lib:instances <wordnet-partmeronym.rdf> ;
lib:instances <wordnet-sameverbgroupas.rdf> ;
lib:instances <wordnet-similarity.rdf> ;
lib:instances <wordnet-synset.rdf> ;
lib:instances <wordnet-substancemeronym.rdf> .
<wnbasic.rdfs>
a lib:Schema ;
lib:source wn20schema: ;
lib:usesNamespace :owl .
<wn-basic>
a lib:Ontology ;
a lib:Virtual ;
dc:title "Basic WordNet" ;
owl:versionInfo "2.0" ;
rdfs:comment "Light version of W3C WordNet" ;
lib:schema <wnbasic.rdfs> ;
lib:source wn20instances: ;
lib:instances <wn-common> ;
lib:instances <wordnet-senselabels.rdf> ;
lib:providesNamespace :wn20schema ;
lib:providesNamespace :wn20instances .
<wnfull.rdfs>
a lib:Schema ;
lib:source wn20schema: ;
lib:usesNamespace :owl .
<wn-full>
a lib:Ontology ;
a lib:Virtual ;
dc:title "Full WordNet" ;
owl:versionInfo "2.0" ;
rdfs:comment "Full version of W3C WordNet" ;
lib:schema <full/wnfull.rdfs> ;
lib:source wn20instances: ;
lib:instances <wn-common> ;
lib:instances <wordnet-antonym.rdf> ;
lib:instances <wordnet-derivationallyrelated.rdf> ;
lib:instances <wordnet-participleof.rdf> ;
lib:instances <wordnet-pertainsto.rdf> ;
lib:instances <wordnet-seealso.rdf> ;
lib:instances <wordnet-wordsensesandwords.rdf> ;
lib:providesNamespace :wn20schema ;
lib:providesNamespace :wn20instances .
\end{code}
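Before either of the load queries above can succeed, the manifest must
be attached. The sketch below assumes the manifest is stored as
\file{Manifest.ttl} in a hypothetical directory \file{/opt/rdf/wordnet}
together with the WordNet RDF files. The predicate load_wordnet_basic/0
is likewise only an illustration.
\begin{code}
:- use_module(library(semweb/rdf_library)).

% Hypothetical location of Manifest.ttl and the WordNet RDF files.
:- rdf_attach_library('/opt/rdf/wordnet').

%!  load_wordnet_basic
%
%   List what wn-basic will load, flagging missing files, and then
%   load it.

load_wordnet_basic :-
    rdf_list_library('wn-basic'),
    rdf_load_library('wn-basic', []).
\end{code}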
%%