|   | \section{Managing RDF input files} | ||
|  | \label{sec:rdflib} | ||
|  | 
 | ||
Complex projects require RDF resources from many locations and typically
need to load these in different combinations, for example loading a
small subset of the data for debugging purposes or loading a different
set of files for experimentation. The library \pllib{semweb/rdf_library.pl}
manages sets of RDF files spread over different locations, including
file and network locations. RDF files are annotated using a
\jargon{Manifest} file in RDF format.
|  | 
 | ||
Currently (September 2007), the E-culture server loads more than 120 RDF
files, containing many different schemas, instance repositories and
ontology mappings. Some resources, such as the W3C version of WordNet,
come in many files. The server is initialised by loading (a subset of)
these files. The subset is defined by predicates called
\predref{load_medium}{0}, \predref{load_tgn}{1}, etc. This has become
unmanageable. There is no way to find out exactly what will be loaded or
whether all RDF files are in place except for actually executing the
load. There is also no easy way to exploit concurrency to speed up the
process.
|  | 
 | ||
For this reason we introduce RDF \jargon{Manifest} files that describe
one or more RDF resources and their dependencies. The manifest file can
be distributed along with a set of RDF files, providing a
machine-readable, portable and declarative description of how the RDF
files are intended to be combined. The software can list the content of
the library or load an entry with all its dependencies.
|  | 
 | ||
|  | 
 | ||
|  | \subsection{The Manifest file} | ||
|  | 
 | ||
|  | A manifest file is an RDF file, often in Turtle \cite{turtle} format, | ||
|  | that provides meta-data about RDF resources. Often a manifest will | ||
|  | describe RDF files in the current directory, but it can also describe | ||
|  | RDF resources at arbitrary URL locations. The RDF schema for RDF library | ||
|  | meta-data can be found in \file{rdf_library.ttl}. The namespace for the | ||
|  | RDF library format is defined as | ||
|  | \url{http://www.swi-prolog.org/rdf/library/} and abbreviated as | ||
|  | \const{lib}. | ||
|  | 
 | ||
|  | The schema defines three root classes: lib:Namespace, lib:Ontology and | ||
|  | lib:Virtual, which we describe below. | ||
|  | 
 | ||
|  | \begin{description} | ||
|  |     \resitem{lib:Ontology} | ||
|  | This is a subclass of owl:Ontology.  It has two subclasses, lib:Schema | ||
|  | and lib:Instances.  These three classes are currently processed equally. | ||
|  | The following properties are recognised on lib:Ontology: | ||
|  | 
 | ||
|  |     \begin{description} | ||
|  | 	\resitem {dc:title} | ||
|  | Title of the ontology.  Displayed by rdf_list_library/0. | ||
|  | 	\resitem {owl:versionInfo} | ||
|  | Version of the ontology.  Displayed by rdf_list_library/0. | ||
|  | 	\resitem {owl:imports} | ||
|  | Ontologies imported.  If rdf_load_library/2 is used to load this | ||
|  | ontology, the ontologies referenced here are loaded as well.  There | ||
|  | are two subProperties: lib:schema and lib:instances with the obvious | ||
|  | meaning. | ||
	\resitem {lib:providesNamespace}
Informally, providing a namespace is defined as providing subjects that
reside in the namespace.
	\resitem {lib:usesNamespace}
Informally, using a namespace is defined as providing objects that
reside in the namespace.
	\resitem {lib:source}
Defines the named graph into which the resource is loaded.  If this
ends in a \const{/}, the basename of each loaded file is appended to
the given source.  Defaults to the URL the RDF is loaded from.
	\resitem {lib:baseURI}
Defines the base for processing the RDF data.  If not provided, this
defaults to the named graph, which in turn defaults to the URL the
RDF is loaded from.
	\resitem {lib:blankNodes}
One of \const{share} or \const{noshare}.  A SWI-Prolog RDF library
extension that allows for sharing equivalent blank nodes.  Sharing
is the default.
|  |     \end{description} | ||
|  | 
 | ||
|  |     \resitem{lib:Virtual} | ||
Virtual ontologies do not refer to an RDF resource themselves. They
only import other resources.  For example, the W3C WordNet manifest
defines \const{wn-basic} and \const{wn-full} as virtual resources.
The lib:Virtual resource is used as a second rdf:type:
|  | 
 | ||
|  | \begin{code} | ||
|  | <wn-basic> | ||
|  | 	a lib:Ontology ; | ||
|  | 	a lib:Virtual ; | ||
|  | 	... | ||
|  | \end{code} | ||
|  | 
 | ||
|  |     \resitem{lib:Namespace} | ||
Defines a URL to be a namespace. The definition provides the preferred
mnemonic and can be referenced in the lib:providesNamespace and
lib:usesNamespace properties. The predicate rdf_load_library/2
registers encountered namespace mnemonics with rdf-db using
rdf_register_ns/2.  Typically, namespace declarations refer to the
namespace through a Turtle @{prefix} declaration.  E.g.\
|  | 
 | ||
|  | \begin{code} | ||
|  | @prefix	    lib: <http://www.swi-prolog.org/rdf/library/> . | ||
|  | @prefix    rdfs: <http://www.w3.org/2000/01/rdf-schema#> . | ||
|  | 
 | ||
|  | :rdfs | ||
|  | 	a lib:Namespace ; | ||
|  | 	lib:mnemonic "rdfs" ; | ||
|  | 	lib:namespace rdfs: . | ||
|  | \end{code} | ||
|  | \end{description} | ||
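
As a minimal sketch of how these classes and properties combine, the
hypothetical manifest entry below declares a virtual ontology that pulls
in one schema file and one instance file. The names \const{myvocab},
\file{myvocab.rdfs} and \file{myvocab-data.rdf} as well as the
\url{http://www.example.org/graphs/} location are assumptions made for
illustration only. Because the lib:source value ends in a \const{/},
the description above suggests that each file ends up in a named graph
formed by appending its basename to that URL.

\begin{code}
@prefix	    lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix     owl: <http://www.w3.org/2002/07/owl#> .
@prefix      dc: <http://purl.org/dc/elements/1.1/> .

# Hypothetical entry; all file names and URLs are illustrative.
<myvocab>
	a lib:Ontology ;
	a lib:Virtual ;
	dc:title "My vocabulary" ;
	owl:versionInfo "1.0" ;
	lib:source <http://www.example.org/graphs/> ;
	lib:schema <myvocab.rdfs> ;
	lib:instances <myvocab-data.rdf> .
\end{code}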
|  | 
 | ||
|  | 
 | ||
|  | \subsubsection{Finding manifest files} | ||
|  | 
 | ||
|  | The initial manifest file(s) are loaded into the system using | ||
|  | rdf_attach_library/1. | ||
|  | 
 | ||
|  | \begin{description} | ||
|  |     \predicate{rdf_attach_library}{1}{+FileOrDirectory} | ||
|  | Load meta-data on RDF repositories from \arg{FileOrDirectory}. If the | ||
|  | argument is a directory, this directory is processed recursively and | ||
|  | each file named \file{Manifest.ttl} or \file{Manifest.rdf} is loaded. | ||
|  | 
 | ||
Declared namespaces are added to the rdf-db namespace list. Encountered
ontologies are added to a private database of
\file{rdf_library.pl}.%
	\footnote{We could have used the global RDF store, but
		  decided against that to avoid polluting the triple
		  space.}
Each ontology is given an \jargon{identifier}, derived from the
basename of the URL without the extension.  Thus, using the
declaration below, the identifier of the declared ontology is
\const{wn-basic}.
|  | 
 | ||
|  | \begin{code} | ||
|  | <wn-basic> | ||
|  | 	a lib:Ontology ; | ||
|  | 	a lib:Virtual ; | ||
|  | 	dc:title "Basic WordNet" ; | ||
|  | 	... | ||
|  | \end{code} | ||
|  | 
 | ||
|  |     \predicate{rdf_list_library}{0}{} | ||
|  | List the available resources in the library.  Currently only lists | ||
|  | resources that have a dc:title property.  See \secref{usage} for | ||
|  | an example. | ||
|  | \end{description} | ||
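
The identifier rule described for rdf_attach_library/1 can be
illustrated with a few lines of Prolog. The predicate manifest_id/2
below is a hypothetical helper written for this illustration and is not
part of \pllib{semweb/rdf_library.pl}. It merely assumes that the
identifier is the last path segment of the URL with any file extension
stripped. The example URL is an assumption as well.

\begin{code}
%%	manifest_id(+URL, -Id)
%
%	Hypothetical helper, not part of rdf_library.pl.  Id is the
%	basename of URL with any file extension removed.

manifest_id(URL, Id) :-
	file_base_name(URL, Base),		% text after the last /
	file_name_extension(Id, _Ext, Base).	% drop .ext, if any
\end{code}

\begin{code}
?- manifest_id('http://www.example.org/rdf/wn-basic', Id).
Id = 'wn-basic'.
\end{code}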
|  | 
 | ||
|  | It is possible for the initial set of manifests to refer to RDF files | ||
|  | that are not covered by a manifest. If such a reference is encountered | ||
|  | while loading or listing a library, the library manager will look for a | ||
|  | manifest file in the directory holding the referenced RDF file and load | ||
|  | this manifest. If a manifest is found that covers the referenced file, | ||
|  | the directives found in the manifest will be followed. Otherwise the RDF | ||
|  | resource is simply loaded using the current defaults. | ||
|  | 
 | ||
|  | Further exploration of the library is achieved using rdf_list_library/1 | ||
|  | or rdf_list_library/2: | ||
|  | 
 | ||
|  | \begin{description} | ||
|  |     \predicate{rdf_list_library}{1}{+Id} | ||
|  | Same as \term{rdf_list_library}{Id, []}. | ||
|  | 
 | ||
|  |     \predicate{rdf_list_library}{2}{+Id, +Options} | ||
Lists the resources that will be loaded if \arg{Id} is handed to
rdf_load_library/2. See rdf_attach_library/1 for how ontology
identifiers are generated. In addition, it checks the existence of each
resource to help debugging library dependencies. Before doing its work,
rdf_list_library/2 reloads manifests that have changed since they were
last loaded. For HTTP resources it uses the HEAD method to
verify existence and last modification time of resources.
|  | 
 | ||
|  |     \predicate{rdf_load_library}{2}{+Id, +Options} | ||
Load the given library. First rdf_load_library/2 will establish what
resources need to be loaded and whether all resources exist.  Then it
will load the resources.
|  | \end{description} | ||
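
The predicates above are typically used in sequence: attach the
manifests, inspect what an identifier expands to, and only then load it.
The sketch below wraps these steps in a single predicate. The name
check_and_load/2 is an assumption made for this illustration, and empty
option lists are passed because no specific options are described here.

\begin{code}
:- use_module(library(semweb/rdf_library)).

%%	check_and_load(+ManifestDir, +Id)
%
%	Illustrative wrapper, not part of the library.  Attach all
%	manifests found below ManifestDir, print the resources that
%	loading Id involves and finally load them.

check_and_load(ManifestDir, Id) :-
	rdf_attach_library(ManifestDir),
	rdf_list_library(Id, []),
	rdf_load_library(Id, []).
\end{code}

Listing before loading is useful because rdf_list_library/2 also checks
whether each referenced resource exists, so missing files show up before
a possibly long load starts.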
|  | 
 | ||
|  | 
 | ||
|  | \subsection{Usage scenarios} | ||
|  | \label{sec:usage} | ||
|  | 
 | ||
Typically, a project uses a single file, in the same format as a
manifest file, that defines the alternative configurations that can be
loaded. This file is loaded at program startup using
rdf_attach_library/1.  Users can then list the available libraries
using rdf_list_library/0 and rdf_list_library/1:
|  | 
 | ||
|  | \begin{code} | ||
|  | 1 ?- rdf_list_library. | ||
|  | ec-core-vocabularies E-Culture core vocabularies | ||
|  | ec-all-vocabularies All E-Culture vocabularies | ||
|  | ec-hacks            Specific hacks | ||
|  | ec-mappings         E-Culture ontology mappings | ||
|  | ec-core-collections E-Culture core collections | ||
|  | ec-all-collections  E-Culture all collections | ||
|  | ec-medium           E-Culture medium sized data (artchive+aria) | ||
|  | ec-all              E-Culture all data | ||
|  | \end{code} | ||
|  | 
 | ||
Now we can list a specific category using rdf_list_library/1. Note that
this loads two additional manifests referenced by resources encountered
in \const{ec-mappings}.  If a resource does not exist, it is flagged
using \const{[NOT FOUND]}.
|  | 
 | ||
|  | \begin{code} | ||
|  | 2 ?- rdf_list_library('ec-mappings'). | ||
|  | % Loaded RDF manifest /home/jan/src/eculture/vocabularies/mappings/Manifest.ttl | ||
|  | % Loaded RDF manifest /home/jan/src/eculture/collections/aul/Manifest.ttl | ||
|  | <file:///home/jan/src/eculture/src/server/ec-mappings> | ||
|  | . <file:///home/jan/src/eculture/vocabularies/mappings/mappings> | ||
|  | . . <file:///home/jan/src/eculture/vocabularies/mappings/interface> | ||
|  | . . . file:///home/jan/src/eculture/vocabularies/mappings/interface_class_mapping.ttl | ||
|  | . . . file:///home/jan/src/eculture/vocabularies/mappings/interface_property_mapping.ttl | ||
|  | . . <file:///home/jan/src/eculture/vocabularies/mappings/properties> | ||
|  | . . . file:///home/jan/src/eculture/vocabularies/mappings/ethnographic_property_mapping.ttl | ||
|  | . . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_properties.ttl | ||
|  | . . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_property_semantics.ttl | ||
|  | . . <file:///home/jan/src/eculture/vocabularies/mappings/situations> | ||
|  | . . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_situations.ttl | ||
|  | . <file:///home/jan/src/eculture/collections/aul/aul> | ||
|  | . . file:///home/jan/src/eculture/collections/aul/aul.rdfs | ||
|  | . . file:///home/jan/src/eculture/collections/aul/aul.rdf | ||
|  | . . file:///home/jan/src/eculture/collections/aul/aul9styles.rdf | ||
|  | . . file:///home/jan/src/eculture/collections/aul/extractedperiods.rdf | ||
|  | . . file:///home/jan/src/eculture/collections/aul/manual-periods.rdf | ||
|  | \end{code} | ||
|  | 
 | ||
|  | 
 | ||
|  | \subsubsection{Referencing resources} | ||
|  | 
 | ||
Resources and manifests are located either on the local filesystem or on
a network resource. The initial manifest can also be loaded from a file
or a URL. This defines the initial \jargon{base URL} of the document.
The base URL can be overruled using the Turtle @{base} directive. Other
documents can be referenced relative to this base URL by exploiting
Turtle's URI expansion rules. Turtle resources can be specified in three
ways: as absolute URLs (e.g.\
\verb$<http://www.example.com/rdf/ontology.rdf>$), as URLs relative to
the base (e.g.\ \verb$<../rdf/ontology.rdf>$) or following a
\jargon{prefix} (e.g.\ prefix:ontology).
|  | 
 | ||
The prefix notation is powerful as we can define several prefixes and
define resources relative to them. Unfortunately, prefixes can only be
defined as absolute URLs or URLs relative to the base URL. Notably, they
cannot be defined relative to other prefixes. In addition, a prefix can
only be followed by a QName, which excludes \verb$.$ and \verb$/$.
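
The fragment below is a small sketch of these three notations and of the
prefix restrictions. The \url{http://www.example.org/rdf/} base, the
\const{voc} prefix and the resource names are assumptions made for
illustration only.

\begin{code}
@base <http://www.example.org/rdf/> .
@prefix	    lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix     owl: <http://www.w3.org/2002/07/owl#> .
@prefix     voc: <vocabularies/> .	# relative to the base URL

# voc:rdfs expands to <http://www.example.org/rdf/vocabularies/rdfs>.
# A name containing a / cannot follow a prefix, so the second import
# is written as a URL relative to the base instead.
<all>
	a lib:Ontology ;
	a lib:Virtual ;
	owl:imports
		voc:rdfs ,
		<vocabularies/core/skos> .
\end{code}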
|  | 
 | ||
Easily relocatable manifests must define all resources relative to the
base URL. Relocation is automatic if the manifest remains in the same
hierarchy as the resources it references. If the manifest is copied
elsewhere (e.g.\ for creating a local version) it can use @{base} to
refer to the resource hierarchy. We can point to directories holding
manifest files using @{prefix} declarations.  There, we can reference
\jargon{Virtual} resources using prefix:name.  Here is an example, where
we first give some lines from the initial manifest followed by the
definition of the virtual RDFS resource.
|  | 
 | ||
|  | \begin{code} | ||
|  | @base <http://gollem.science.uva.nl/e-culture/rdf/> . | ||
|  | 
 | ||
|  | @prefix base:		<base_ontologies/> . | ||
|  | 
 | ||
|  | <ec-core-vocabularies> | ||
|  | 	a lib:Ontology ; | ||
|  | 	a lib:Virtual ; | ||
|  | 	dc:title "E-Culture core vocabularies" ; | ||
|  | 	owl:imports | ||
|  | 		base:rdfs , | ||
|  | 		base:owl , | ||
|  | 		base:dc , | ||
|  | 		base:vra , | ||
|  | 		... | ||
|  | \end{code} | ||
|  | 
 | ||
|  | \begin{code} | ||
|  | <rdfs> | ||
|  | 	a lib:Schema ; | ||
|  | 	a lib:Virtual ; | ||
|  | 	rdfs:comment "RDF Schema" ; | ||
|  | 	lib:source rdfs: ; | ||
|  | 	lib:providesNamespace :rdfs ; | ||
|  | 	lib:schema <rdfs.rdfs> . | ||
|  | \end{code} | ||
|  | 
 | ||
|  | \subsection{Putting it all together} | ||
|  | 
 | ||
In this section we provide skeleton code for filling the RDF database
from a password-protected HTTP repository. The first line loads the
application. Next we include modules that enable us to manage the RDF
library, RDF database caching and HTTP connections. Then we set up the
HTTP authentication, enable caching of processed RDF files and load the
initial manifest. Finally, load_data/0 loads all our RDF data.
|  | 
 | ||
|  | \begin{code} | ||
|  | :- use_module(server). | ||
|  | 
 | ||
|  | :- use_module(library(http/http_open)). | ||
|  | :- use_module(library(semweb/rdf_library)). | ||
|  | :- use_module(library(semweb/rdf_cache)). | ||
|  | 
 | ||
|  | :- http_set_authorization('http://www.example.org/rdf', | ||
|  | 			  basic(john, secret)). | ||
|  | 
 | ||
|  | :- rdf_set_cache_options([ global_directory('RDF-Cache'), | ||
|  | 			   create_global_directory(true) | ||
|  | 			 ]). | ||
|  | 
 | ||
|  | 
 | ||
|  | :- rdf_attach_library('http://www.example.org/rdf/Manifest.ttl'). | ||
|  | 
 | ||
|  | %%	load_data | ||
|  | % | ||
|  | %	Load our RDF data | ||
|  | 
 | ||
|  | load_data :- | ||
|  | 	rdf_load_library('all'). | ||
|  | \end{code} | ||
|  | 
 | ||
|  | \subsection{Example: A Manifest for W3C WordNet} | ||
|  | \label{sec:w3cmanifest} | ||
|  | 
 | ||
The manifest below allows for loading WordNet in the two predefined
versions using one of the following queries:
|  | 
 | ||
|  | \begin{code} | ||
|  | ?- rdf_load_library('wn-basic', []). | ||
|  | ?- rdf_load_library('wn-full', []). | ||
|  | \end{code} | ||
|  | 
 | ||
|  | 
 | ||
|  | 
 | ||
|  | \begin{code} | ||
|  | @prefix	    lib: <http://www.swi-prolog.org/rdf/library/> . | ||
|  | @prefix     owl: <http://www.w3.org/2002/07/owl#> . | ||
|  | @prefix     rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . | ||
|  | @prefix    rdfs: <http://www.w3.org/2000/01/rdf-schema#> . | ||
|  | @prefix     xsd: <http://www.w3.org/2001/XMLSchema#> . | ||
|  | @prefix      dc: <http://purl.org/dc/elements/1.1/> . | ||
|  | @prefix wn20schema: <http://www.w3.org/2006/03/wn/wn20/schema/> . | ||
|  | @prefix wn20instances: <http://www.w3.org/2006/03/wn/wn20/instances/> . | ||
|  | 
 | ||
|  | # Source from http://www.cs.vu.nl/~mark/pub/wntestrdf.zip | ||
|  | 
 | ||
|  | :wn20instances | ||
|  | 	a lib:Namespace ; | ||
|  | 	lib:mnemonic "wn20instances" ; | ||
|  | 	lib:namespace wn20instances: . | ||
|  | 
 | ||
|  | :wn20schema | ||
|  | 	a lib:Namespace ; | ||
|  | 	lib:mnemonic "wn20schema" ; | ||
|  | 	lib:namespace wn20schema: . | ||
|  | 
 | ||
|  | :dc | ||
|  | 	a lib:Namespace ; | ||
|  | 	lib:mnemonic "dc" ; | ||
|  | 	lib:namespace dc: . | ||
|  | 
 | ||
|  | :owl | ||
|  | 	a lib:Namespace ; | ||
|  | 	lib:mnemonic "owl" ; | ||
|  | 	lib:namespace owl: . | ||
|  | 
 | ||
|  | #	WordNet | ||
|  | 
 | ||
|  | <wn-common> | ||
|  | 	a lib:Instances ; | ||
|  | 	a lib:Virtual ; | ||
|  | 	rdfs:comment "Common files between full and basic version of WordNet" ; | ||
|  | 	lib:source wn20instances: ; | ||
|  | 	lib:instances <wordnet-attribute.rdf> ; | ||
|  | 	lib:instances <wordnet-causes.rdf> ; | ||
|  | 	lib:instances <wordnet-classifiedby.rdf> ; | ||
|  | 	lib:instances <wordnet-entailment.rdf> ; | ||
|  | 	lib:instances <wordnet-frame.rdf> ; | ||
|  | 	lib:instances <wordnet-glossary.rdf> ; | ||
|  | 	lib:instances <wordnet-hyponym.rdf> ; | ||
|  | 	lib:instances <wordnet-membermeronym.rdf> ; | ||
|  | 	lib:instances <wordnet-partmeronym.rdf> ; | ||
|  | 	lib:instances <wordnet-sameverbgroupas.rdf> ; | ||
|  | 	lib:instances <wordnet-similarity.rdf> ; | ||
|  | 	lib:instances <wordnet-synset.rdf> ; | ||
|  | 	lib:instances <wordnet-substancemeronym.rdf> . | ||
|  | 
 | ||
|  | <wnbasic.rdfs> | ||
|  | 	a lib:Schema ; | ||
|  | 	lib:source wn20schema: ; | ||
|  | 	lib:usesNamespace :owl . | ||
|  | 
 | ||
|  | <wn-basic> | ||
|  | 	a lib:Ontology ; | ||
|  | 	a lib:Virtual ; | ||
|  | 	dc:title "Basic WordNet" ; | ||
|  | 	owl:versionInfo "2.0" ; | ||
|  | 	rdfs:comment "Light version of W3C WordNet" ; | ||
|  | 	lib:schema <wnbasic.rdfs> ; | ||
|  | 	lib:source wn20instances: ; | ||
|  | 	lib:instances <wn-common> ; | ||
|  | 	lib:instances <wordnet-senselabels.rdf> ; | ||
|  | 	lib:providesNamespace :wn20schema ; | ||
|  | 	lib:providesNamespace :wn20instances . | ||
|  | 
 | ||
|  | <wnfull.rdfs> | ||
|  | 	a lib:Schema ; | ||
|  | 	lib:source wn20schema: ; | ||
|  | 	lib:usesNamespace :owl . | ||
|  | 
 | ||
|  | <wn-full> | ||
|  | 	a lib:Ontology ; | ||
|  | 	a lib:Virtual ; | ||
|  | 	dc:title "Full WordNet" ; | ||
|  | 	owl:versionInfo "2.0" ; | ||
|  | 	rdfs:comment "Full version of W3C WordNet" ; | ||
|  | 	lib:schema <full/wnfull.rdfs> ; | ||
|  | 	lib:source wn20instances: ; | ||
|  | 	lib:instances <wn-common> ; | ||
|  | 	lib:instances <wordnet-antonym.rdf> ; | ||
|  | 	lib:instances <wordnet-derivationallyrelated.rdf> ; | ||
|  | 	lib:instances <wordnet-participleof.rdf> ; | ||
|  | 	lib:instances <wordnet-pertainsto.rdf> ; | ||
|  | 	lib:instances <wordnet-seealso.rdf> ; | ||
|  | 	lib:instances <wordnet-wordsensesandwords.rdf> ; | ||
|  | 	lib:providesNamespace :wn20schema ; | ||
|  | 	lib:providesNamespace :wn20instances . | ||
|  | \end{code} | ||
|  | 
 | ||
|  | %% |