\section{Managing RDF input files}
\label{sec:rdflib}

Complex projects require RDF resources from many locations and typically
wish to load these in different combinations, for example loading a
small subset of the data for debugging purposes or loading a different
set of files for experimentation. The library \pllib{semweb/rdf_library.pl}
manages sets of RDF files spread over different locations, including
file and network locations. RDF files are annotated using a
\jargon{Manifest} file in RDF format.

Currently (September 2007), the E-culture server loads more than 120 RDF
files, containing many different schemas, instance repositories and
ontology mappings. Some resources, such as the W3C version of WordNet,
come in many files. The server is initialised by loading (a subset of)
these files. The subset is defined by predicates called
\predref{load_medium}{0}, \predref{load_tgn}{1}, etc. This has become
unmanageable. There is no way to find out exactly what will be loaded or
whether all RDF files are in place except for actually executing the
load. There is also no easy way to exploit concurrency to speed up the
process.

For this reason we introduce RDF \jargon{Manifest} files that describe
one or more RDF resources and their dependencies. The manifest file can
be distributed along with a set of RDF files, providing a
machine-readable, portable and declarative description of how the RDF
files are intended to be combined. The library can list the content of
the library or load an entry with all its dependencies.

\subsection{The Manifest file}

A manifest file is an RDF file, often in Turtle \cite{turtle} format,
that provides meta-data about RDF resources. Often a manifest will
describe RDF files in the current directory, but it can also describe
RDF resources at arbitrary URL locations. The RDF schema for RDF library
meta-data can be found in \file{rdf_library.ttl}. The namespace for the
RDF library format is defined as
\url{http://www.swi-prolog.org/rdf/library/} and abbreviated as
\const{lib}.

The schema defines three root classes: lib:Namespace, lib:Ontology and
lib:Virtual, which we describe below.

\begin{description}
    \resitem{lib:Ontology}
This is a subclass of owl:Ontology. It has two subclasses, lib:Schema
and lib:Instances. These three classes are currently processed equally.
The following properties are recognised on lib:Ontology:

    \begin{description}
	\resitem {dc:title}
Title of the ontology. Displayed by rdf_list_library/0.
	\resitem {owl:versionInfo}
Version of the ontology. Displayed by rdf_list_library/0.
	\resitem {owl:imports}
Ontologies imported. If rdf_load_library/2 is used to load this
ontology, the ontologies referenced here are loaded as well. There
are two subProperties: lib:schema and lib:instances with the obvious
meaning.
	\resitem {lib:providesNamespace}
Informally, providing a namespace is defined as providing subjects that
reside in the namespace.
	\resitem {lib:usesNamespace}
Informally, using a namespace is defined as providing objects that
reside in the namespace.
	\resitem {lib:source}
Defines the named graph into which the resource is loaded. If this
ends in a \const{/}, the basename of each loaded file is appended to
the given source. Defaults to the URL the RDF is loaded from.
	\resitem {lib:baseURI}
Defines the base for processing the RDF data. If not provided, this
defaults to the named graph, which in turn defaults to the URL the
RDF is loaded from.
	\resitem {lib:blankNodes}
One of \const{share} or \const{noshare}. A SWI-Prolog RDF library
extension that allows for sharing equivalent blank nodes. Sharing
is the default.
    \end{description}

\resitem{lib:Virtual}
Virtual ontologies do not refer to an RDF resource themselves. They
only import other resources. For example the W3C WordNet manifest
defines \const{wn-basic} and \const{wn-full} as virtual resources.
The lib:Virtual resource is used as a second rdf:type:

\begin{code}
<wn-basic>
    a lib:Ontology ;
    a lib:Virtual ;
    ...
\end{code}

\resitem{lib:Namespace}
Defines a URL to be a namespace. The definition provides the preferred
mnemonic and can be referenced in the lib:providesNamespace and
lib:usesNamespace properties. The rdf_load_library/2 predicate
registers encountered namespace mnemonics with rdf-db using
rdf_register_ns/2. Typically, namespaces are declared using @{prefix}
declarations. E.g.\

\begin{code}
@prefix lib:  <http://www.swi-prolog.org/rdf/library/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:rdfs
    a lib:Namespace ;
    lib:mnemonic "rdfs" ;
    lib:namespace rdfs: .
\end{code}
\end{description}
\subsubsection{Finding manifest files}

The initial manifest file(s) are loaded into the system using
rdf_attach_library/1.

\begin{description}
    \predicate{rdf_attach_library}{1}{+FileOrDirectory}
Load meta-data on RDF repositories from \arg{FileOrDirectory}. If the
argument is a directory, this directory is processed recursively and
each file named \file{Manifest.ttl} or \file{Manifest.rdf} is loaded.

Declared namespaces are added to the rdf-db namespace list. Encountered
ontologies are added to a private database of
\file{rdf_library.pl}.%
	\footnote{We could have used the global RDF store, but
		  decided against that to avoid polluting the triple
		  space.}
Each ontology is given an \jargon{identifier}, derived from the
basename of the URL without the extension. Thus, using the
declaration below, the identifier of the declared ontology is
\const{wn-basic}.

\begin{code}
<wn-basic>
    a lib:Ontology ;
    a lib:Virtual ;
    dc:title "Basic WordNet" ;
    ...
\end{code}

    \predicate{rdf_list_library}{0}{}
List the available resources in the library. Currently only lists
resources that have a dc:title property. See \secref{usage} for
an example.
\end{description}
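
The identifier rule described for rdf_attach_library/1 above can be
sketched in a few lines of Prolog. This only illustrates ``the basename
of the URL without the extension''. The helper ontology_id/2 is
hypothetical and is not part of \pllib{semweb/rdf_library.pl}; the
library may derive identifiers differently in corner cases.

\begin{code}
%!  ontology_id(+URL, -Id) is det.
%
%   Hypothetical helper (an assumption for illustration): Id is
%   the last path segment of URL with its extension, if any,
%   removed.

ontology_id(URL, Id) :-
    file_base_name(URL, Base),            % strip everything up to the last /
    file_name_extension(Id, _Ext, Base).  % strip the extension, if present
\end{code}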

It is possible for the initial set of manifests to refer to RDF files
that are not covered by a manifest. If such a reference is encountered
while loading or listing a library, the library manager will look for a
manifest file in the directory holding the referenced RDF file and load
this manifest. If a manifest is found that covers the referenced file,
the directives found in the manifest will be followed. Otherwise the RDF
resource is simply loaded using the current defaults.

Further exploration of the library is achieved using rdf_list_library/1
or rdf_list_library/2:

\begin{description}
    \predicate{rdf_list_library}{1}{+Id}
Same as \term{rdf_list_library}{Id, []}.

    \predicate{rdf_list_library}{2}{+Id, +Options}
Lists the resources that will be loaded if \arg{Id} is handed to
rdf_load_library/2. See rdf_attach_library/1 for how ontology
identifiers are generated. In addition, it checks the existence of each
resource to help debugging library dependencies. Before doing its work,
rdf_list_library/2 reloads manifests that have changed since they were
last loaded. For HTTP resources it uses the HEAD method to
verify existence and last modification time of resources.

    \predicate{rdf_load_library}{2}{+Id, +Options}
Load the given library. First rdf_load_library/2 will establish what
resources need to be loaded and whether all resources exist. Then it
will load the resources.
\end{description}
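
The predicates above are typically called in a fixed order: attach,
inspect, load. The sketch below combines them using only the predicates
documented in this section. The wrapper attach_and_load/2 is a
hypothetical name introduced for illustration; it could, for instance,
be called as \term{attach_and_load}{rdf, 'ec-mappings'}, where
\file{rdf} is an assumed directory holding manifest files.

\begin{code}
:- use_module(library(semweb/rdf_library)).

%!  attach_and_load(+FileOrDirectory, +Id) is det.
%
%   Hypothetical wrapper (not part of the library): attach the
%   manifest(s), report what loading Id involves and load it.

attach_and_load(FileOrDirectory, Id) :-
    rdf_attach_library(FileOrDirectory),
    rdf_list_library(Id),         % lists resources, flags missing ones
    rdf_load_library(Id, []).     % loads Id with all its dependencies
\end{code}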
\subsection{Usage scenarios}
\label{sec:usage}

Typically, a project uses a single file, in the same format as a
manifest file, that defines the alternative configurations that can be
loaded. This file is loaded at program startup using
rdf_attach_library/1. Users can now list the available libraries
using rdf_list_library/0 and rdf_list_library/1:

\begin{code}
1 ?- rdf_list_library.
ec-core-vocabularies E-Culture core vocabularies
ec-all-vocabularies  All E-Culture vocabularies
ec-hacks             Specific hacks
ec-mappings          E-Culture ontology mappings
ec-core-collections  E-Culture core collections
ec-all-collections   E-Culture all collections
ec-medium            E-Culture medium sized data (artchive+aria)
ec-all               E-Culture all data
\end{code}

Now we can list a specific category using rdf_list_library/1. Note that
this loads two additional manifests referenced by resources encountered
in \const{ec-mappings}. If a resource does not exist, it is flagged
using \const{[NOT FOUND]}.

\begin{code}
2 ?- rdf_list_library('ec-mappings').
% Loaded RDF manifest /home/jan/src/eculture/vocabularies/mappings/Manifest.ttl
% Loaded RDF manifest /home/jan/src/eculture/collections/aul/Manifest.ttl
<file:///home/jan/src/eculture/src/server/ec-mappings>
. <file:///home/jan/src/eculture/vocabularies/mappings/mappings>
. . <file:///home/jan/src/eculture/vocabularies/mappings/interface>
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_class_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_property_mapping.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/properties>
. . . file:///home/jan/src/eculture/vocabularies/mappings/ethnographic_property_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_properties.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_property_semantics.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/situations>
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_situations.ttl
. <file:///home/jan/src/eculture/collections/aul/aul>
. . file:///home/jan/src/eculture/collections/aul/aul.rdfs
. . file:///home/jan/src/eculture/collections/aul/aul.rdf
. . file:///home/jan/src/eculture/collections/aul/aul9styles.rdf
. . file:///home/jan/src/eculture/collections/aul/extractedperiods.rdf
. . file:///home/jan/src/eculture/collections/aul/manual-periods.rdf
\end{code}
\subsubsection{Referencing resources}

Resources and manifests are located either on the local filesystem or on
a network resource. The initial manifest can also be loaded from a file
or a URL. This defines the initial \jargon{base URL} of the document.
The base URL can be overruled using the Turtle @{base} directive. Other
documents can be referenced relative to this base URL by exploiting
Turtle's URI expansion rules. Turtle resources can be specified in three
ways: as absolute URLs (e.g.\
\verb$<http://www.example.com/rdf/ontology.rdf>$), as URLs relative to
the base (e.g.\ \verb$<../rdf/ontology.rdf>$) or following a
\jargon{prefix} (e.g.\ prefix:ontology).

The prefix notation is powerful as we can define multiple prefixes and
define resources relative to them. Unfortunately, prefixes can only be
defined as absolute URLs or URLs relative to the base URL. Notably, they
cannot be defined relative to other prefixes. In addition, a prefix can
only be followed by a Qname, which excludes \verb$.$ and \verb$/$.

Easily relocatable manifests must define all resources relative to the
base URL. Relocation is automatic if the manifest remains in the same
hierarchy as the resources it references. If the manifest is copied
elsewhere (i.e.\ for creating a local version) it can use @{base} to
refer to the resource hierarchy. We can point to directories holding
manifest files using @{prefix} declarations. There, we can reference
\jargon{Virtual} resources using prefix:name. Here is an example, where
we first give some lines from the initial manifest followed by the
definition of the virtual RDFS resource.

\begin{code}
@base <http://gollem.science.uva.nl/e-culture/rdf/> .

@prefix base: <base_ontologies/> .

<ec-core-vocabularies>
    a lib:Ontology ;
    a lib:Virtual ;
    dc:title "E-Culture core vocabularies" ;
    owl:imports
        base:rdfs ,
        base:owl ,
        base:dc ,
        base:vra ,
        ...
\end{code}

\begin{code}
<rdfs>
    a lib:Schema ;
    a lib:Virtual ;
    rdfs:comment "RDF Schema" ;
    lib:source rdfs: ;
    lib:providesNamespace :rdfs ;
    lib:schema <rdfs.rdfs> .
\end{code}
\subsection{Putting it all together}

In this section we provide skeleton code for filling the RDF database
from a password-protected HTTP repository. The first line loads the
application. Next we include modules that enable us to manage the RDF
library, RDF database caching and HTTP connections. Then we set up the
HTTP authentication, enable caching of processed RDF files and load the
initial manifest. Finally, load_data/0 loads all our RDF data.

\begin{code}
:- use_module(server).

:- use_module(library(http/http_open)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(semweb/rdf_cache)).

:- http_set_authorization('http://www.example.org/rdf',
                          basic(john, secret)).

:- rdf_set_cache_options([ global_directory('RDF-Cache'),
                           create_global_directory(true)
                         ]).


:- rdf_attach_library('http://www.example.org/rdf/Manifest.ttl').

%%  load_data
%
%   Load our RDF data

load_data :-
    rdf_load_library('all').
\end{code}
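
The introduction mentions loading a small subset of the data for
debugging. One way to do this on top of the skeleton above is to select
the library identifier by configuration name. The sketch below is only
an illustration: load_data/1 and the configuration names \const{debug}
and \const{full} are hypothetical, and the identifiers
\const{ec-medium} and \const{ec-all} are taken from the listing in
\secref{usage}, so they assume that manifest has been attached.

\begin{code}
:- use_module(library(semweb/rdf_library)).

%!  load_data(+Config) is det.
%
%   Hypothetical variant of load_data/0: Config is one of
%   `debug` (medium sized data) or `full` (all data).

load_data(debug) :-
    rdf_load_library('ec-medium', []).
load_data(full) :-
    rdf_load_library('ec-all', []).
\end{code}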
\subsection{Example: A Manifest for W3C WordNet}
\label{sec:w3cmanifest}

The manifest below allows for loading WordNet in its two predefined
versions using one of the following goals:

\begin{code}
?- rdf_load_library('wn-basic', []).
?- rdf_load_library('wn-full', []).
\end{code}
\begin{code}
@prefix lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix wn20schema: <http://www.w3.org/2006/03/wn/wn20/schema/> .
@prefix wn20instances: <http://www.w3.org/2006/03/wn/wn20/instances/> .

# Source from http://www.cs.vu.nl/~mark/pub/wntestrdf.zip

:wn20instances
    a lib:Namespace ;
    lib:mnemonic "wn20instances" ;
    lib:namespace wn20instances: .

:wn20schema
    a lib:Namespace ;
    lib:mnemonic "wn20schema" ;
    lib:namespace wn20schema: .

:dc
    a lib:Namespace ;
    lib:mnemonic "dc" ;
    lib:namespace dc: .

:owl
    a lib:Namespace ;
    lib:mnemonic "owl" ;
    lib:namespace owl: .

# WordNet

<wn-common>
    a lib:Instances ;
    a lib:Virtual ;
    rdfs:comment "Common files between full and basic version of WordNet" ;
    lib:source wn20instances: ;
    lib:instances <wordnet-attribute.rdf> ;
    lib:instances <wordnet-causes.rdf> ;
    lib:instances <wordnet-classifiedby.rdf> ;
    lib:instances <wordnet-entailment.rdf> ;
    lib:instances <wordnet-frame.rdf> ;
    lib:instances <wordnet-glossary.rdf> ;
    lib:instances <wordnet-hyponym.rdf> ;
    lib:instances <wordnet-membermeronym.rdf> ;
    lib:instances <wordnet-partmeronym.rdf> ;
    lib:instances <wordnet-sameverbgroupas.rdf> ;
    lib:instances <wordnet-similarity.rdf> ;
    lib:instances <wordnet-synset.rdf> ;
    lib:instances <wordnet-substancemeronym.rdf> .

<wnbasic.rdfs>
    a lib:Schema ;
    lib:source wn20schema: ;
    lib:usesNamespace :owl .

<wn-basic>
    a lib:Ontology ;
    a lib:Virtual ;
    dc:title "Basic WordNet" ;
    owl:versionInfo "2.0" ;
    rdfs:comment "Light version of W3C WordNet" ;
    lib:schema <wnbasic.rdfs> ;
    lib:source wn20instances: ;
    lib:instances <wn-common> ;
    lib:instances <wordnet-senselabels.rdf> ;
    lib:providesNamespace :wn20schema ;
    lib:providesNamespace :wn20instances .

<wnfull.rdfs>
    a lib:Schema ;
    lib:source wn20schema: ;
    lib:usesNamespace :owl .

<wn-full>
    a lib:Ontology ;
    a lib:Virtual ;
    dc:title "Full WordNet" ;
    owl:versionInfo "2.0" ;
    rdfs:comment "Full version of W3C WordNet" ;
    lib:schema <full/wnfull.rdfs> ;
    lib:source wn20instances: ;
    lib:instances <wn-common> ;
    lib:instances <wordnet-antonym.rdf> ;
    lib:instances <wordnet-derivationallyrelated.rdf> ;
    lib:instances <wordnet-participleof.rdf> ;
    lib:instances <wordnet-pertainsto.rdf> ;
    lib:instances <wordnet-seealso.rdf> ;
    lib:instances <wordnet-wordsensesandwords.rdf> ;
    lib:providesNamespace :wn20schema ;
    lib:providesNamespace :wn20instances .
\end{code}

%%