\documentclass[11pt]{article}
\usepackage{times}
\usepackage{pl}
\usepackage{plpage}
\usepackage{html}
\makeindex
\onefile
\htmloutput{html} % Output directory
\htmlmainfile{index} % Main document file
\bodycolor{white} % Page colour
\sloppy
\renewcommand{\runningtitle}{SWI-Prolog HTTP support}
\begin{document}
\title{SWI-Prolog HTTP support}
\author{Jan Wielemaker \\
HCS, \\
University of Amsterdam \\
The Netherlands \\
E-mail: \email{J.Wielemaker@uva.nl}}
\maketitle
\begin{abstract}
This article documents the package HTTP, a series of libraries for
accessing data on HTTP servers as well as providing HTTP server
capabilities from SWI-Prolog. Both server and client are modular
libraries. The server can be operated from the Unix \program{inetd}
super-daemon as well as a stand-alone server that runs on all
platforms supported by SWI-Prolog.
\end{abstract}
\vfill
\pagebreak
\tableofcontents
\vfill
\vfill
\newpage
\section{Introduction}
The HTTP (HyperText Transfer Protocol) is the W3C standard protocol for
transferring information between a web-client (browser) and a
web-server. The protocol is a simple \emph{envelope} protocol where
standard name/value pairs in the header are used to split the stream
into messages and communicate about the connection-status. Many
languages have client and/or server libraries to deal with the HTTP
protocol, making it a suitable candidate for general-purpose
client-server applications.
In this document we describe a modular infra-structure to access
web-servers from SWI-Prolog and turn Prolog into a web-server.
\subsection*{Acknowledgements}
This work has been carried out under the following projects:
\url[GARP]{http://hcs.science.uva.nl/projects/GARP/},
\url[MIA]{http://www.ins.cwi.nl/projects/MIA/},
\url[IBROW]{http://hcs.science.uva.nl/projects/ibrow/home.html},
\url[KITS]{http://kits.edte.utwente.nl/} and
\url[MultiMediaN]{http://e-culture.multimedian.nl/}.
The following people have pioneered parts of this library and
contributed bug-reports and suggestions for improvements: Anjo
Anjewierden, Bert Bredeweg, Wouter Jansweijer, Bob Wielinga, Jacco
van Ossenbruggen, Michiel Hildebrandt, Matt Lilley and Keri Harris.
\section{The HTTP client libraries}
This package provides two libraries for building HTTP clients. The
first, \pllib{http/http_open}, is a lightweight library for opening an
HTTP URL as a Prolog stream. It only supports the HTTP GET
method. The second, \pllib{http/http_client}, is a more advanced
library dealing with \jargon{keep-alive}, \jargon{chunked transfer} and
a plug-in mechanism providing conversions based on the MIME content-type.
\input{httpopen.tex}
\subsection{The \pllib{http/http_client} library} \label{sec:httpclient}
The \pllib{http/http_client} library provides more powerful access to
reading HTTP resources, providing \jargon{keep-alive} connections,
\jargon{chunked} transfer and conversion of the content, such as
breaking down \jargon{multipart} data, parsing HTML, etc. The library
announces itself as providing \const{HTTP/1.1}.
\begin{description}
\predicate{http_get}{3}{+URL, -Reply, +Options}
Performs an HTTP GET request on the given URL and then reads the
reply using http_read_data/3. Defined options are:
\begin{description}
\termitem{connection}{ConnectionType}
If \const{close} (default) a new connection is created for this request
and closed after the request has completed. If \const{'Keep-Alive'} the
library checks for an open connection on the requested host and port
and re-uses this connection. The connection is left open if the other
party confirms the keep-alive and closed otherwise.
\termitem{http_version}{Major-Minor}
Indicate the HTTP protocol version used for the connection. Default is
\const{1.1}.
\termitem{proxy}{+Host, +Port}
Use an HTTP proxy to connect to the outside world.
\termitem{proxy_authorization}{+Authorization}
Send authorization to the proxy. Otherwise the same as the
\const{authorization} option.
\termitem{timeout}{+Timeout}
If provided, set a timeout on the stream using set_stream/2. With this
option the stream raises an exception if no new data arrives within
\arg{Timeout} seconds. Default is to wait forever
(\const{infinite}).
\termitem{user_agent}{+Agent}
Defines the value of the \const{User-Agent} field of the HTTP header.
Default is \const{SWI-Prolog (http://www.swi-prolog.org)}.
\termitem{range}{+Range}
Ask for partial content. \arg{Range} is a term \term{\arg{Unit}}{From,
To}, where \arg{From} is an integer and \arg{To} is either an integer
or the atom \const{end}. HTTP 1.1 only supports \arg{Unit} =
\const{bytes}. E.g., to ask for bytes 1000-1999, use the option
\exam{range(bytes(1000,1999))}.
\termitem{request_header}{Name = Value}
Add a line "\arg{Name}: \arg{Value}" to the HTTP request header. Both
name and value are added uninspected and literally to the request
header. This may be used to specify accept encodings, languages, etc.
Please check RFC 2616 (HTTP) for available fields and their
meaning.
\termitem{reply_header}{Header}
Unify \arg{Header} with a list of \arg{Name}=\arg{Value} pairs
expressing all header fields of the reply. See http_read_request/2
for the result format.
\end{description}
Remaining options are passed to http_read_data/3.
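For example, the fragment below (a minimal sketch; the URL and option
values are merely illustrative) fetches a page as an atom, giving up if
no data arrives for 10 seconds:
\begin{code}
    ...,
    http_get('http://www.swi-prolog.org/', Page,
             [ to(atom),
               timeout(10),
               request_header('Accept-Language' = en)
             ]),
    ...
\end{code}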
\predicate{http_post}{4}{+URL, +In, -Reply, +Options}
Performs an HTTP POST request on the given URL. It is equivalent to
http_get/3, except for providing an \jargon{input document}, which is
posted using http_post_data/3.
\predicate{http_read_data}{3}{+Header, -Data, +Options}
Read data from an HTTP stream. Normally called from http_get/3 or
http_post/4. When dealing with HTTP POST in a server this predicate can
be used to retrieve the posted data. \arg{Header} is the parsed header.
\arg{Options} is a list of \term{\arg{Name}}{Value} pairs to guide the
translation of the data. The following options are supported:
\begin{description}
\termitem{to}{Target}
Do not try to interpret the data according to the MIME-type, but return
it literally according to \arg{Target}, which is one of:
\begin{description}
\termitem{stream}{Output}
Append the data to the given stream, which must be a Prolog stream open
for writing. This can be used to save the data in a (memory-)file, XPCE
object, forward it to a process using a pipe, etc.
\termitem{atom}{}
Return the result as an atom. Though SWI-Prolog has no limit on the
size of atoms and provides atom-garbage collection, this option should
be used with care.%
\footnote{Currently atom-garbage collection is activated after
the creation of 10,000 atoms.}
\termitem{codes}{}
Return the page as a list of character-codes. This is especially useful
for parsing it using grammar rules.
\end{description}
\termitem{content_type}{Type}
Overrule the \const{Content-Type} as provided by the HTTP reply header.
Intended as a work-around for badly configured servers.
\end{description}
If no \term{to}{Target} option is provided the library tries the
registered plug-in conversion filters. If none of these succeed it
tries the built-in content-type handlers or returns the content as an
atom. The builtin content filters are described below. The provided
plug-ins are described in the following sections.
\begin{description}
\termitem{application/x-www-form-urlencoded}{}
This is the default encoding mechanism for POST requests issued by
a web-browser. It is broken down to a list of \arg{Name} = \arg{Value}
terms.
\end{description}
Finally, if all else fails the content is returned as an atom.
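As an illustration of the \term{to}{Target} option, the sketch below
(the predicate name and file argument are placeholders) copies a
document verbatim into a local file instead of interpreting it:
\begin{code}
save_url(URL, File) :-
    setup_call_cleanup(open(File, write, Out),
                       http_get(URL, _, [to(stream(Out))]),
                       close(Out)).
\end{code}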
\predicate{http_post_data}{3}{+Data, +Stream, +ExtraHeader}
Write an HTTP POST request to \arg{Stream} using data from \arg{Data}
and passing the additional extra headers from \arg{ExtraHeader}.
\arg{Data} is one of:
\begin{description}
\termitem{html}{+HTMLTokens}
Send an HTML token string as produced by the library \pllib{html_write}
described in section \secref{htmlwrite}.
\termitem{file}{+File}
Send the contents of \arg{File}. The MIME type is derived from the
filename extension using file_mime_type/2.
\termitem{file}{+Type, +File}
Send the contents of \arg{File} using the provided MIME type,
i.e.\ claiming the \const{Content-type} equals \arg{Type}.
\termitem{codes}{+Codes}
Same as \term{codes}{text/plain, Codes}.
\termitem{codes}{+Type, +Codes}
Send string (list of character codes) using the indicated MIME-type.
\termitem{cgi_stream}{+Stream, +Len}
Read the input from \arg{Stream} which, like CGI data, starts with a
partial HTTP header. The fields of this header are merged with the
provided \arg{ExtraHeader} fields. The first \arg{Len} characters
of \arg{Stream} are used.
\termitem{form}{+ListOfParameter}
Send data of the MIME type \const{application/x-www-form-urlencoded}
as produced by browsers issuing a POST request from an HTML form.
\arg{ListOfParameter} is a list of \arg{Name}=\arg{Value} or
\mbox{\arg{Name}(\arg{Value})}.
\termitem{form_data}{+ListOfData}
Send data of the MIME type \const{multipart/form-data} as produced by
browsers issuing a POST request from an HTML form using \const{enctype}
\const{multipart/form-data}. This is a somewhat simplified MIME
\const{multipart/mixed} encoding used by browser forms including
file input fields. \arg{ListOfData} is the same as for the \arg{List}
alternative described below. Below is an example from the SWI-Prolog
\url[Sesame]{http://www.openrdf.org} interface. \arg{Repository}, etc.\
are atoms providing the value, while the last argument provides a value
from a file.
\begin{code}
    ...,
    http_post([ protocol(http),
                host(Host),
                port(Port),
                path(ActionPath)
              ],
              form_data([ repository = Repository,
                          dataFormat = DataFormat,
                          baseURI    = BaseURI,
                          verifyData = Verify,
                          data       = file(File)
                        ]),
              _Reply,
              []),
    ...,
\end{code}
\termitem{List}{}
If the argument is a plain list, it is sent using the MIME type
\const{multipart/mixed} and packed using mime_pack/3. See
mime_pack/3 for details on the argument format.
\end{description}
\end{description}
\subsubsection{The MIME client plug-in} \label{sec:httpmimeplugin}
This plug-in library \pllib{http/http_mime_plugin} breaks multipart
documents, recognised by the \exam{Content-Type:
multipart/form-data} or \exam{Mime-Version: 1.0} header fields, into a
list of \arg{Name} = \arg{Value} pairs. This library deals with data
from web-forms using the \const{multipart/form-data} encoding as well as
the \url[FIPA]{http://www.fipa.org} agent-protocol messages.
\subsubsection{The SGML client plug-in} \label{sec:httpsgmlplugin}
This plug-in library \pllib{http/http_sgml_plugin} provides a bridge
between the SGML/XML/HTML parser provided by \pllib{sgml} and the http
client library. After loading this hook the following mime-types are
automatically handled by the SGML parser.
\begin{description}
\termitem{text/html}{}
Handed to \pllib{sgml} using W3C HTML 4.0 DTD, suppressing and
ignoring all HTML syntax errors. \arg{Options} is passed to
load_structure/3.
\termitem{text/xml}{}
Handed to \pllib{sgml} using dialect \const{xmlns} (XML + namespaces).
\arg{Options} is passed to load_structure/3. In particular,
\term{dialect}{xml} may be used to suppress namespace handling.
\termitem{text/x-sgml}{}
Handed to \pllib{sgml} using dialect \const{sgml}. \arg{Options}
is passed to load_structure/3.
\end{description}
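For example, after loading the plug-in a \const{text/xml} reply is
returned as a parsed document rather than as an atom. A minimal sketch
(the URL is a placeholder; \arg{DOM} is a list of
\term{element}{Name, Attributes, Content} terms as produced by
load_structure/3):
\begin{code}
:- use_module(library(http/http_client)).
:- use_module(library(http/http_sgml_plugin)).

fetch_dom(URL, DOM) :-
    http_get(URL, DOM, []).
\end{code}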
\section{The HTTP server libraries} \label{sec:httpserver}
The HTTP server library consists of two obligatory parts and one
optional part. The first deals with connection management and has three
different implementations depending on the desired type of server. The
second implements a generic wrapper for decoding the HTTP request,
calling user code to handle the request and encode the answer. The
optional \file{http_dispatch} module can be used to assign HTTP
\jargon{locations} (paths) to predicates. This design is summarised in
\figref{httpserver}.
\postscriptfig[width=0.8\linewidth]{httpserver}{Design of the HTTP
server}
The functional body of the user's code is independent from the selected
server-type, making it easy to switch between the supported server
types.
\subsection{The `Body'} \label{sec:body}
The server-body is the code that handles the request and formulates a
reply. To facilitate all mentioned setups, the body is driven by
http_wrapper/5. The goal is called with the parsed request (see
\secref{request}) as argument and \const{current_output} set to a
temporary buffer. Its task is closely related to the task of a CGI
script; it must write a header holding at least the
\const{Content-type} field and a body. Here is a simple body writing the
request as an HTML table.
\begin{code}
reply(Request) :-
    format('Content-type: text/html~n~n', []),
    format('<html>~n', []),
    format('<table border=1>~n'),
    print_request(Request),
    format('~n</table>~n'),
    format('</html>~n', []).

print_request([]).
print_request([H|T]) :-
    H =.. [Name, Value],
    format('<tr><td>~w<td>~w~n', [Name, Value]),
    print_request(T).
\end{code}
The infrastructure recognises the header
\texttt{Transfer-encoding:~chunked}, causing it to use chunked encoding
if the client allows for it. See also \secref{transfer} and the
\const{chunked} option in http_handler/3. Other header lines are passed
verbatim to the client. Typical examples are \texttt{Set-Cookie} and
authentication headers (see \secref{auth}).
\subsubsection{Returning special status codes} \label{sec:httpspecials}
Besides returning a page by writing it to the current output stream,
the server goal can raise an exception using throw/1 to generate special
pages such as \const{not_found}, \const{moved}, etc. The defined
exceptions are:
\begin{description}
\termitem{http_reply}{+Reply, +HdrExtra}
Return a result page using http_reply/3. See http_reply/3 for details.
\termitem{http_reply}{+Reply}
Equivalent to \term{http_reply}{Reply, []}.
\termitem{http}{not_modified}
Equivalent to \term{http_reply}{not_modified, []}. This exception is
for backward compatibility and can be used by the server to indicate
the referenced resource has not been modified since it was requested
last time.
\end{description}
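For example, the handler sketched below (the safe_file/2 helper is
hypothetical) serves a file if it can be resolved and raises a 404 reply
otherwise:
\begin{code}
serve(Request) :-
    memberchk(path(Path), Request),
    (   safe_file(Path, File)        % hypothetical mapping from Path to a file
    ->  throw(http_reply(file(text/html, File)))
    ;   throw(http_reply(not_found(Path)))
    ).
\end{code}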
\input{httpdispatch.tex}
\input{httpdirindex.tex}
\input{httpsession.tex}
\subsection{HTTP Authentication}
\label{sec:auth}
The module \file{http/http_authenticate} provides the basics to validate
an HTTP \const{Authorization} header. User and password information are
read from a Unix/Apache compatible password file. This information, as
well as the validation process, is cached to achieve optimal performance.
\begin{description}
\predicate{http_authenticate}{3}{+Type, +Request, -User}
True if \arg{Request} contains the information to continue according
to \arg{Type}. \arg{Type} identifies the required authentication technique:
\begin{description}
\termitem{basic}{+PasswordFile}
Use HTTP \const{Basic} authentication and verify the password
from \arg{PasswordFile}. \arg{PasswordFile} is a file holding
usernames and passwords in a format compatible to
Unix and Apache. Each line is a record with \verb$:$-separated
fields. The first field is the username and
the second the password \jargon{hash}. Password hashes are
validated using crypt/2.
\end{description}
Successful authorization is cached for 60 seconds to avoid
overhead of decoding and lookup of the user and password data.
http_authenticate/3 just validates the header. If authorization
is not provided the browser must be challenged, in response to
which it normally opens a user-password dialogue. Example code
realising this is below. The exception causes the HTTP wrapper
code to generate an HTTP 401 reply.
\begin{code}
    ...,
    (   http_authenticate(basic(passwd), Request, User)
    ->  true
    ;   throw(http_reply(authorise(basic, Realm)))
    ).
\end{code}
Alternatively \term{basic}{+PasswordFile} can be passed as an option to
http_handler/3.
\end{description}
\input{httpopenid.tex}
%================================================================
\subsection{Get parameters from HTML forms}
\label{sec:httpparam}
The library \pllib{http/http_parameters} provides two predicates to
fetch HTTP request parameters as a type-checked list easily. The
library transparently handles both GET and POST requests. It builds
on top of the low-level request representation described in
\secref{request}.
\begin{description}
\predicate{http_parameters}{2}{+Request, ?Parameters}
The predicate is passed the \arg{Request} as provided to the handler
goal by http_wrapper/5 as well as a partially instantiated list
describing the requested parameters and their types. Each parameter
specification in \arg{Parameters} is a term of the format
\mbox{\arg{Name}(\arg{-Value}, \arg{+Options})}. \arg{Options} is
a list of option terms describing the type, default, etc. If no options
are specified the parameter must be present and its value is returned in
\arg{Value} as an atom. If a parameter is missing the exception
\term{error}{\term{existence_error}{form_data, Name}, _} is thrown.
Options fall into three categories: those that handle presence of
the parameter, those that guide conversion and restrict types and
those that support automatic generation of documentation. First,
the presence-options:
\begin{description}
\termitem{default}{Default}
If the named parameter is missing, \arg{Value} is unified to
\arg{Default}.
\termitem{optional}{true}
If the named parameter is missing, \arg{Value} is left unbound and
no error is generated.
\termitem{list}{Type}
The parameter may be absent or appear multiple times. If this
option is present, \const{default} and \const{optional} are ignored and
the value is returned as a list. Type checking options are processed on
each value.
\termitem{zero_or_more}{}
Deprecated. Use \term{list}{Type}.
\end{description}
The type and conversion options are given below. The type-language can
be extended by providing clauses for the multifile hook
http:convert_parameter/3; a minimal sketch of such an extension follows
the option list below.
\begin{description}
\termitem{;}{Type1, Type2}
Succeed if either \arg{Type1} or \arg{Type2} applies. It allows
for checks such as \exam{(nonneg;oneof([infinite]))} to specify
an integer or a symbolic value.
\termitem{oneof}{List}
Succeeds if the value is a member of the given list.
\definition{length $> N$}
Succeeds if the value is an atom of more than $N$ characters.
\definition{length $>= N$}
Succeeds if the value is an atom of at least $N$ characters.
\definition{length $< N$}
Succeeds if the value is an atom of less than $N$ characters.
\definition{length $=< N$}
Succeeds if the value is an atom of at most $N$ characters.
\termitem{atom}{}
No-op. Allowed for consistency.
\termitem{between}{+Low, +High}
Convert value to a number and if either \arg{Low} or \arg{High} is a
float, force value to be a float. Then check that the value is in the
given range, which includes the boundaries.
\termitem{boolean}{}
Translate \const{true}, \const{yes}, \const{on} and \const{'1'} into
\const{true}; \const{false}, \const{no}, \const{off} and \const{'0'}
into \const{false}, and raise an error otherwise.
\termitem{float}{}
Convert value to a float. Integers are transformed into float. Throws a
type-error otherwise.
\termitem{integer}{}
Convert value to an integer. Throws a type-error otherwise.
\termitem{nonneg}{}
Convert value to a non-negative integer. Throws a type-error
if the value cannot be converted to an integer and a domain-error
if the resulting integer is negative.
\termitem{number}{}
Convert value to a number. Throws a type-error otherwise.
\end{description}
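The sketch below shows how the type-language might be extended through
this hook; the \const{lowercase} type and its conversion are our own
illustration and not part of the library:
\begin{code}
:- multifile http:convert_parameter/3.

%   Accept any atom and convert it to lower case.
http:convert_parameter(lowercase, Raw, Value) :-
    downcase_atom(Raw, Value).
\end{code}
After this declaration a parameter specification such as
\exam{name(Name, [lowercase])} can be used.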
The last set of options is to support automatic generation of HTTP
API documentation from the sources.\footnote{This facility is under
development in ClioPatria; see \file{http_help.pl}}.
\begin{description}
\termitem{description}{+Atom}
Description of the parameter in plain text.
\termitem{group}{+Parameters, +Options}
Define a logical group of parameters. \arg{Parameters} are processed
as normal. \arg{Options} may include a description of the group. Groups
can be nested.
\end{description}
Below is an example:
\begin{code}
reply(Request) :-
    http_parameters(Request,
                    [ title(Title, [ optional(true) ]),
                      name(Name,   [ length >= 2 ]),
                      age(Age,     [ between(0, 150) ])
                    ]),
    ...
\end{code}
Same as \term{http_parameters}{Request, Parameters, []}.
\predicate{http_parameters}{3}{+Request, ?Parameters, +Options}
In addition to http_parameters/2, the following options are defined.
\begin{description}
\termitem{form_data}{-Data}
Return the entire set of provided \arg{Name}=\arg{Value} pairs from
the GET or POST request. All values are returned as atoms.
\termitem{attribute_declarations}{:Goal}
If a parameter specification lacks the parameter options, call
\term{call}{Goal, +ParamName, -Options} to find the options. Intended
to share declarations over many calls to http_parameters/3. Using
this construct the above can be written as below.
\begin{code}
reply(Request) :-
    http_parameters(Request,
                    [ title(Title),
                      name(Name),
                      age(Age)
                    ],
                    [ attribute_declarations(param)
                    ]),
    ...

param(title, [optional(true)]).
param(name,  [length >= 2]).
param(age,   [integer]).
\end{code}
\end{description}
\end{description}
\subsection{Request format} \label{sec:request}
The body-code (see \secref{body}) is driven by a \arg{Request}. This
request is generated from http_read_request/2 defined in
\pllib{http/http_header}.
\begin{description}
\predicate{http_read_request}{2}{+Stream, -Request}
Reads an HTTP request from \arg{Stream} and unifies \arg{Request} with
the parsed request. \arg{Request} is a list of \term{\arg{Name}}{Value}
elements. It provides a number of predefined elements for the result
of parsing the first line of the request, followed by the additional
request parameters. The predefined fields are:
\begin{description}
\termitem{host}{Host}
If the request contains \verb$Host: $\arg{Host}, \arg{Host} is unified
with the host-name. If \arg{Host} is of the format <host>:<port>,
\arg{Host} only describes <host> and a field \term{port}{Port}, where
\arg{Port} is an integer, is added.
\termitem{input}{Stream}
The \arg{Stream} is passed along, allowing more data or further
requests to be read from the same stream. This field is always present.
\termitem{method}{Method}
\arg{Method} is one of \const{get}, \const{put} or \const{post}. This
field is present if the header has been parsed successfully.
\termitem{path}{Path}
Path associated to the request. This field is always present.
\termitem{peer}{Peer}
\arg{Peer} is a term \term{ip}{A,B,C,D} containing the IP address of
the contacting host.
\termitem{port}{Port}
Port requested. See \const{host} for details.
\termitem{request_uri}{RequestURI}
This is the untranslated string that follows the method in the
request header. It is used to construct the path and search fields
of the \arg{Request}. It is provided because reconstructing this
string from the path and search fields may yield a different value
due to different usage of percent encoding.
\termitem{search}{ListOfNameValue}
Search-specification of URI. This is the part after the \chr{?},
normally used to transfer data from HTML forms that use the
`\const{GET}' protocol. In the URL it consists of a www-form-encoded
list of \arg{Name}=\arg{Value} pairs. This is mapped to a list of
Prolog \arg{Name}=\arg{Value} terms with decoded names and values.
This field is only present if the location contains a
search-specification.
\termitem{http_version}{Major-Minor}
If the first line contains the \const{HTTP/}\arg{Major}.\arg{Minor}
version indicator this element indicates the HTTP version of the
peer. Otherwise this field is not present.
\termitem{cookie}{ListOfNameValue}
If the header contains a \const{Cookie} line, the value of the
cookie is broken down in \arg{Name}=\arg{Value} pairs, where the
\arg{Name} is the lowercase version of the cookie name as used
for the HTTP fields.
\termitem{set_cookie}{set_cookie(Name, Value, Options)}
If the header contains a \const{Set-Cookie} line, the cookie field
is broken down into the \arg{Name} of the cookie, the \arg{Value}
and a list of \arg{Name}=\arg{Value} pairs for additional options
such as \const{expire}, \const{path}, \const{domain} or \const{secure}.
\end{description}
If the first line of the request is tagged with
\const{HTTP/}\arg{Major}.\arg{Minor}, http_read_request/2 reads all
input up to the first blank line. This header consists of
\arg{Name}:\arg{Value} fields. Each such field appears as a term
\term{\arg{Name}}{Value} in the \arg{Request}, where \arg{Name} is
canonised for use with Prolog. Canonisation implies that the
\arg{Name} is converted to lower case and all occurrences of the
\chr{-} are replaced by \chr{_}. The value for the
\const{Content-length} fields is translated into an integer.
\end{description}
Here is an example:
\begin{code}
?- http_read_request(user, X).
|: GET /mydb?class=person HTTP/1.0
|: Host: gollem
|:

X = [ input(user),
      method(get),
      search([ class = person
             ]),
      path('/mydb'),
      http_version(1-0),
      host(gollem)
    ].
\end{code}
\subsubsection{Handling POST requests}
Where the HTTP \const{GET} operation is intended to get a document,
using a \arg{path} and possibly some additional search information,
the \const{POST} operation is intended to hand potentially large
amounts of data to the server for processing.
The \arg{Request} parameter above contains the term \term{method}{post}.
The data posted is left on the input stream that is available through
the term \term{input}{Stream} from the \arg{Request} header. This data
can be read using http_read_data/3 from the HTTP client library. Here is
a demo implementation simply returning the parsed posted data as plain
text (assuming pp/1 pretty-prints the data).
\begin{code}
reply(Request) :-
    member(method(post), Request), !,
    http_read_data(Request, Data, []),
    format('Content-type: text/plain~n~n', []),
    pp(Data).
\end{code}
If the POST is initiated from a browser, content-type is generally
either \const{application/x-www-form-urlencoded} or
\const{multipart/form-data}. The latter is broken down automatically
if the plug-in \pllib{http/http_mime_plugin} is loaded.
\subsection{Running the server}
The functionality of the server should be defined in one Prolog file (of
course this file is allowed to load other files). Depending on the
wanted server setup this `body' is wrapped into a small Prolog file
combining the body with the appropriate server interface. There are
three supported server-setups. For most applications we advise the
multi-threaded server. Examples of this server architecture are the
\url[PlDoc]{http://www.swi-prolog.org/packages/pldoc.html} documentation
system and the \url[SeRQL]{http://www.swi-prolog.org/packages/SeRQL/}
Semantic Web server infrastructure.
All the server setups may be wrapped in a \jargon{reverse proxy} to
make them available from the public web-server as described in
\secref{proxy}.
\begin{itemlist}
\item [Using \pllib{thread_httpd} for a multi-threaded server]
This server exploits the multi-threaded version of SWI-Prolog, running
the user's body code in parallel from a pool of worker threads. As it
avoids the state engine and copying required in the event-driven server
it is generally faster and capable of handling multiple requests
concurrently.
This server is harder to debug due to the involved threading, although
the GUI tracer provides reasonable support for multi-threaded
applications using the tspy/1 command. It can provide fast communication
to multiple clients and can be used for more demanding servers.
\item [Using \pllib{xpce_httpd} for an event-driven server]
This approach provides a single-threaded event-driven application. The
clients talk to XPCE sockets that collect an HTTP request. The server
infra-structure can talk to multiple clients simultaneously, but once
a request is complete the wrapper calls the user's goal and blocks all
further activity until the request is handled. Requests from multiple
clients are thus fully serialised in one Prolog process.
This server setup is very suitable for debugging as well as for embedded
servers in simple applications running in a fairly controlled environment.
\item [Using \pllib{inetd_httpd} for server-per-client]
In this setup the Unix \program{inetd} super-daemon is used to initialise
a server for each connection. This approach is especially suitable for
servers that have a limited startup-time. In this setup a crashing
client does not influence other requests.
This server is very hard to debug as the server is not connected to the
user environment. It provides a robust implementation for servers that
can be started quickly.
\end{itemlist}
\subsubsection{Common server interface options}
All the server interfaces provide \term{http_server}{:Goal, +Options}
to create the server. The lists of options differ, but the servers share
these common options:
\begin{description}
\termitem{port}{?Port}
Specify the port to listen to for stand-alone servers. \arg{Port} is
either an integer or unbound. If unbound, it is unified to the selected
free port.
\end{description}
\subsubsection{Multi-threaded Prolog} \label{sec:mthttpd}
The \pllib{http/thread_httpd} library provides the infrastructure to
manage multiple clients using a pool of \jargon{worker-threads}. This
realises a popular server design, also seen in Java Tomcat and Microsoft
.NET. As a single persistent server process maintains communication to
all clients, startup time is not an important issue and the server can
easily maintain state-information for all clients.
In addition to the functionality provided by the other (XPCE and
inetd) servers, the threaded server can also be used to realise an
HTTPS server exploiting the \pllib{ssl} library. See option
\term{ssl}{+SSLOptions} below.
\begin{description}
\predicate{http_server}{2}{:Goal, +Options}
Create the server. \arg{Options} must provide the \term{port}{?Port}
option to specify the port the server should listen to. If \arg{Port} is
unbound an arbitrary free port is selected and \arg{Port} is unified to
this port-number. The server consists of a small Prolog thread
accepting new connections on \arg{Port} and dispatching these to a pool
of workers. Defined \arg{Options} are:
\begin{description}
\termitem{port}{?Port}
Port the server should listen to. If unbound \arg{Port} is unified with
the selected free port.
\termitem{workers}{+N}
Defines the number of worker threads in the pool. Default is to use
two workers. Choosing the optimal value for best performance is a
difficult task depending on the number of CPUs in your system and how
much resources are required for processing a request. Too high a number
makes your system switch too often between threads or even swap if there
is not enough memory to keep all threads in memory, while too low a
number causes clients to wait unnecessarily for other clients to complete.
See also http_workers/2.
\termitem{timeout}{+SecondsOrInfinite}
Determines the maximum period of inactivity handling a request. If no
data arrives within the specified time since the last data arrived the
connection raises an exception, the worker discards the client and
returns to the pool-queue for a new client. Default is \const{infinite},
making each worker wait forever for a request to complete. Without a
timeout, a worker may wait forever on a client that doesn't complete
its request.
\termitem{keep_alive_timeout}{+SecondsOrInfinite}
Maximum time to wait for new activity on \emph{Keep-Alive} connections.
Choosing the correct value for this parameter is hard. Disabling
Keep-Alive is bad for performance if the clients request multiple
documents for a single page. This may ---for example--- be caused by HTML
frames, HTML pages with images, associated CSS files, etc. Keeping
a connection open in the threaded model however prevents the thread
servicing the client from servicing other clients. The default is 5 seconds.
\termitem{local}{+KBytes}
Size of the local-stack for the workers. Default is taken from the
commandline option.
\termitem{global}{+KBytes}
Size of the global-stack for the workers. Default is taken from the
commandline option.
\termitem{trail}{+KBytes}
Size of the trail-stack for the workers. Default is taken from the
commandline option.
\termitem{ssl}{+SSLOptions}
Use SSL (Secure Socket Layer) rather than plain TCP/IP. A server created
this way is accessed using the \const{https://} protocol. SSL allows for
encrypted communication to prevent others from tapping the wire as well as
improved authentication of client and server. The \arg{SSLOptions}
option list is passed to ssl_init/3. The port option of the main option
list is forwarded to the SSL layer. See the \pllib{ssl} library for
details.
\end{description}
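For example, the sketch below (the port and worker count are arbitrary
choices) starts a threaded server that dispatches requests using
http_dispatch/1:
\begin{code}
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

start_server(Port) :-
    http_server(http_dispatch,
                [ port(Port),
                  workers(16),
                  timeout(20)
                ]).
\end{code}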
\predicate{http_server_property}{2}{?Port, ?Property}
True if \arg{Property} is a property of the HTTP server running at
\arg{Port}. Defined properties are:
\begin{description}
\termitem{goal}{:Goal}
Goal used to start the server. This is often http_dispatch/1.
\termitem{start_time}{?Time}
Time-stamp when the server was created. See format_time/3 for
creating a human-readable representation.
\end{description}
\predicate{http_workers}{2}{:Port, ?Workers}
Query or manipulate the number of workers of the server identified by
\arg{Port}. If \arg{Workers} is unbound it is unified with the number
of running workers. If it is an integer greater than the current size
of the worker pool new workers are created with the same specification
as the running workers. If the number is less than the current size
of the worker pool, this predicate inserts a number of `quit' requests
in the queue, discarding the excess workers as they finish their jobs
(i.e.\ no worker is abandoned while serving a client).
This can be used to tune the number of workers for performance. Another
possible application is to reduce the pool to one worker to facilitate
easier debugging.
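For example, assuming a server runs at port 8080 (an arbitrary choice),
the pool can be shrunk to a single worker while tracing and enlarged
again afterwards:
\begin{code}
?- http_workers(8080, 1).       % single worker simplifies debugging
?- http_workers(8080, 16).      % restore the pool
\end{code}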
\predicate{http_stop_server}{2}{+Port, +Options}
Stop the HTTP server at \arg{Port}. Halting a server is done
\textit{gracefully}, which means that requests being processed are not
abandoned. The \arg{Options} list is for future refinements of this
predicate such as a forced immediate abort of the server, but is
currently ignored.
\predicate{http_current_worker}{2}{?Port, ?ThreadID}
True if \arg{ThreadID} is the identifier of a Prolog thread serving
\arg{Port}. This predicate exists to allow arbitrary interaction
with the worker threads for development and statistics.
\predicate{http_spawn}{2}{:Goal, +Spec}
Continue handling this request in a new thread running \arg{Goal}. After
http_spawn/2, the worker returns to the pool to process new requests. In
its simplest form, \arg{Spec} is the name of a thread pool as defined by
thread_pool_create/3. Alternatively it is an option list, whose options
are passed to thread_create_in_pool/4 if \arg{Spec} contains
\term{pool}{Pool} or to thread_create/3 if the pool option is not
present. If the dispatch module is used (see \secref{httpdispatch}),
spawning is normally specified as an option to the http_handler/3
registration.
We recommend the use of thread pools. They allow registering a set
of threads with common characteristics, specifying how many can be active
and what to do if all threads are active. A typical application may
define a small pool of threads with large stacks for computation
intensive tasks, and a large pool of threads with small stacks to serve
media. The declaration could be the one below, allowing for at most 3
concurrent solvers with a backlog of 5 tasks, and at most 30 tasks
creating image thumbnails with a backlog of 100.
\begin{code}
:- use_module(library(thread_pool)).

:- thread_pool_create(compute, 3,
                      [ local(20000), global(100000), trail(50000),
                        backlog(5)
                      ]).
:- thread_pool_create(media, 30,
                      [ local(100), global(100), trail(100),
                        backlog(100)
                      ]).

:- http_handler('/solve',     solve,     [spawn(compute)]).
:- http_handler('/thumbnail', thumbnail, [spawn(media)]).
\end{code}
\end{description}
\subsubsection{From an interactive Prolog session using XPCE}
The \pllib{http/xpce_httpd} library provides the infrastructure to manage
multiple clients with an event-driven control-structure. This version
can be started from an interactive Prolog session, providing a
comfortable infra-structure to debug the body of your server. It also
allows the combination of an (XPCE-based) GUI with web-technology in one
application.
\begin{description}
\predicate{http_server}{2}{:Goal, +Options}
Create an instance of \class{interactive_httpd}. \arg{Options} must
provide the \term{port}{?Port} option to specify the port the server
should listen to. If \arg{Port} is unbound an arbitrary free port is
selected and \arg{Port} is unified to this port-number. No further
options are currently defined.
\end{description}
The file \file{demo_xpce} gives a typical example of this wrapper,
assuming \file{demo_body} defines the predicate reply/1.
\begin{code}
:- use_module(xpce_httpd).
:- use_module(demo_body).

server(Port) :-
    http_server(reply, Port, []).
\end{code}
The created server opens a server socket at the selected address and
waits for incoming connections. On each accepted connection it collects
input until an HTTP request is complete. Then it opens an input stream
on the collected data and using the output stream directed to the XPCE
\class{socket} it calls http_wrapper/5. This approach is fundamentally
different compared to the other approaches:
\begin{itemlist}
\item [Server can handle multiple connections]
Where \emph{inetd} starts a server for each \emph{client} and CGI
starts a server for each \emph{request}, this approach starts a single
server handling multiple clients.
\item [Requests are serialised]
All calls to \arg{Goal} are fully serialised, processing on behalf of a
new client can only start after all previous requests are answered. This
is easier and quite acceptable if the server is mostly inactive and
requests do not take very long to process.
\item [Lifetime of the server]
The server lives as long as Prolog runs.
\end{itemlist}
\subsubsection{From (Unix) inetd}
All modern Unix systems handle a large number of the services they run
through the super-server \emph{inetd}. This program reads
\file{/etc/inetd.conf} and opens server-sockets on all ports defined in
this file. As a request comes in it accepts it and starts the associated
server such that standard I/O refers to the socket. This approach has
several advantages:
\begin{itemlist}
\item [Simplification of servers]
Servers do not have to know about sockets and socket operations.
\item [Centralised authorisation]
Using \emph{tcpwrappers}, simple and effective firewalling of all
services is realised.
\item [Automatic start and monitor]
The inetd automatically starts the server `just-in-time' and starts
additional servers or restarts a crashed server according to the
specifications.
\end{itemlist}
The very small generic script for handling inetd based connections
is in \file{inetd_httpd}, defining http_server/1:
\begin{description}
\predicate{http_server}{2}{:Goal, +Options}
Initialises and runs http_wrapper/5 in a loop until failure or
end-of-file. This server does not support the \arg{Port} option
as the port is specified with the \program{inetd} configuration.
The only supported option is \arg{After}.
\end{description}
Here is the example from \file{demo_inetd}
\begin{code}
#!/usr/bin/pl -t main -q -f

:- use_module(demo_body).
:- use_module(inetd_httpd).

main :-
    http_server(reply).
\end{code}
With the above file installed in \file{/home/jan/plhttp/demo_inetd},
the following line in \file{/etc/inetd.conf} enables the server at port
4001 guarded by \emph{tcpwrappers}. After modifying inetd, send the
daemon the \const{HUP} signal to make it reload its configuration.
For more information, please check \manref{inetd.conf}{5}.
\begin{code}
4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd
\end{code}
\subsubsection{MS-Windows}
There are rumours that \emph{inetd} has been ported to Windows.
\subsubsection{As CGI script}
To be done.
\subsubsection{Using a reverse proxy}
\label{sec:proxy}
There are three options for public deployment of a service. One is to
run it on a dedicated machine on port 80, the standard HTTP port. The
machine may be a virtual machine running ---for example--- under
\url[VMWARE]{http://www.vmware.com} or
\url[XEN]{http://www.cl.cam.ac.uk/research/srg/netos/xen/}. The
(virtual) machine approach isolates security threats and allows for
using a standard port. The server can also be hosted on a non-standard
port such as 8000, or 8080. Using non-standard ports however may cause
problems with intermediate proxy- and/or firewall policies. Isolation
can be achieved using a Unix \jargon{chroot} environment. Another
option, also recommended for \jargon{Tomcat} servers, is the use of
Apache \jargon{reverse proxies}. This causes the main web-server to
relay requests below a given URL location to our Prolog based server.
This approach has several advantages:
\begin{itemize}
\item We can access the server on port 80, just as for a dedicated
machine. We do not need a machine though and we only need
access to the Apache configuration.
\item As Apache is doing the front-line service, the Prolog server
is normally protected from malformed HTTP requests that could
result in denial of service or otherwise compromise the
server. In addition, Apache can provide encodings such as
compression to the outside world.
\end{itemize}
Note that the proxy technology can be combined with isolation methods
such as dedicated machines, virtual machines and chroot jails. The
proxy can also provide load balancing.
\paragraph{Setting up a reverse proxy}
The Apache reverse proxy setup is really simple. Ensure the modules
\const{proxy} and \const{proxy_http} are loaded. Then add two simple
rules to the server configuration. Below is an example that makes a
PlDoc server on port 4000 available from the main Apache server at port
80.
\begin{code}
ProxyPass /pldoc/ http://localhost:4000/pldoc/
ProxyPassReverse /pldoc/ http://localhost:4000/pldoc/
\end{code}
Apache rewrites the HTTP headers passing by, but using the above rules
it does not examine the content. This implies that URLs embedded in the
(HTML) content must use relative addressing. If the locations on the
public and Prolog server are the same (as in the example above) it is
allowed to use absolute locations. I.e.\ \const{/pldoc/search} is ok,
but \const{http://myhost.com:4000/pldoc/search} is \emph{not}. If
the locations on the server differ, locations must be relative (i.e.\
not start with \chr{/}).
This problem can also be solved using the contributed Apache module
\const{proxy_html} that can be instructed to rewrite URLs embedded in
HTML documents. In our experience, this is not trouble-free as URLs can
appear in many places in generated documents. JavaScript can create
URLs on the fly, which makes rewriting virtually impossible.
\subsection{The wrapper library}
The body is called by the module \pllib{http/http_wrapper.pl}. This
module realises the communication between the I/O streams and the body
described in \secref{body}. The interface is realised by
http_wrapper/5:
\begin{description}
\predicate{http_wrapper}{5}{:Goal, +In, +Out, -Connection, +Options}
Handle an HTTP request where \arg{In} is an input stream from the
client, \arg{Out} is an output stream to the client and \arg{Goal}
defines the goal realising the body. \arg{Connection} is unified to
\const{'Keep-alive'} if both ends of the connection want to continue the
connection or \const{close} if either side wishes to close the
connection.
This predicate reads an HTTP request-header from \arg{In}, redirects
current output to a memory file and then runs \exam{call(Goal,
Request)}, watching for exceptions and failure. If \arg{Goal} executes
successfully it generates a complete reply from the created output.
Otherwise it generates an HTTP server error with additional context
information derived from the exception.
http_wrapper/5 supports the following options:
\begin{description}
\termitem{request}{-Request}
Return the executed request to the caller.
\termitem{peer}{+Peer}
Add peer(Peer) to the request header handed to \arg{Goal}. The format
of \arg{Peer} is defined by tcp_accept/3 from the clib package.
\end{description}
\predicate{http:request_expansion}{2}{+RequestIn, -RequestOut}
This \jargon{multifile} hook predicate is called just before the goal
that produces the body, while the output is already redirected to
collect the reply. If it succeeds it must return a valid modified
request. It is allowed to throw exceptions as defined in
\secref{httpspecials}. It is intended for operations such as mapping
paths, deny access for certain requests or manage cookies. If it writes
output, these must be HTTP header fields that are added \emph{before}
header fields written by the body. The example below, from the
session management library (see \secref{httpsession}), sets a cookie.
\begin{code}
...,
format('Set-Cookie: ~w=~w; path=~w~n', [Cookie, SessionID, Path]),
...,
\end{code}
\predicate{http_current_request}{1}{-Request}
Get access to the currently executing request. \arg{Request} is the
same as handed to \arg{Goal} of http_wrapper/5 \emph{after} applying
rewrite rules as defined by http:request_expansion/2. Raises an
existence error if there is no request in progress.
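For example, the sketch below writes the peer of the request currently
being processed to the error stream (assuming the \const{peer} field has
been passed along by the server):
\begin{code}
log_peer :-
    http_current_request(Request),
    memberchk(peer(Peer), Request),
    format(user_error, 'Serving a request from ~w~n', [Peer]).
\end{code}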
\predicate{http_relative_path}{2}{+AbsPath, -RelPath}
Convert an absolute path (without host, fragment or search) into a path
relative to the current page, defined as the path component from the
current request (see http_current_request/1). This call is intended to
create reusable components returning relative paths for easier support
of reverse proxies.
If ---for whatever reason--- the conversion is not possible it simply
unifies \arg{RelPath} to \arg{AbsPath}.
\end{description}
\input{httphost}
\input{httplog}
\subsection{Debugging Servers} \label{sec:debug}
The library \pllib{http/http_error.pl} defines a hook that decorates
uncaught exceptions with a stack-trace. This will generate a \emph{500
internal server error} document with a stack-trace. To enable this
feature, simply load this library. Please do note that providing
error information to the user simplifies the job of a hacker trying
to compromise your server. It is therefore not recommended to load
this file by default.
The example program \file{calc.pl} has the error handler loaded which
can be triggered by forcing a divide-by-zero in the calculator.
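A minimal sketch for enabling the handler only during development (the
environment variable name is our own choice):
\begin{code}
:- if(getenv('HTTP_DEBUG', _)).
:- use_module(library(http/http_error)).
:- endif.
\end{code}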
\subsection{Handling HTTP headers} \label{sec:httpheader}
The library \pllib{http/http_header} provides primitives for parsing and
composing HTTP headers. Its functionality is normally hidden by the
other parts of the HTTP server and client libraries. We provide a brief
overview of http_reply/3 which can be accessed from the reply body using
an exception as explain in \secref{httpspecials}.
\begin{description}
\predicate{http_reply}{3}{+Type, +Stream, +HdrExtra}
Compose a complete HTTP reply from the term \arg{Type} using additional
headers from \arg{HdrExtra} to the output stream \arg{Stream}.
\arg{HdrExtra} is a list of \term{Field}{Value}. \arg{Type} is
one of:
\begin{description}
\termitem{html}{+HTML}
Produce an HTML page using print_html/1, normally generated using the
\pllib{http/html_write} described in \secref{htmlwrite}.
\termitem{file}{+MimeType, +Path}
Reply the content of the given file, indicating the given MIME type.
\termitem{tmp_file}{+MimeType, +Path}
Similar to \term{file}{+MimeType, +Path}, but do not include a
modification time header.
\termitem{stream}{+Stream, +Len}
Reply using the next \arg{Len} characters from \arg{Stream}. The
user must provide the MIME type and other attributes through the
\arg{HdrExtra} argument.
\termitem{cgi_stream}{+Stream, +Len}
Similar to \term{stream}{+Stream, +Len}, but the data on \arg{Stream}
must contain an HTTP header.
\termitem{moved}{+URL}
Generate a ``301 Moved Permanently'' page with the given target
\arg{URL}.
\termitem{moved_temporary}{+URL}
Generate a ``302 Moved Temporarily'' page with the given target
\arg{URL}.
\termitem{see_other}{+URL}
Generate a ``303 See Other'' page with the given target \arg{URL}.
\termitem{not_found}{+URL}
Generate a ``404 Not Found'' page.
\termitem{forbidden}{+URL}
Generate a ``403 Forbidden'' page, denying access without challenging
the client.
\termitem{authorise}{+Method, +Realm}
Generate a ``401 Authorization Required'', requesting the client to
retry using proper credentials (i.e.\ user and password).
\termitem{not_modified}{}
Generate a ``304 Not Modified'' page, indicating the requested resource
has not changed since the indicated time.
\termitem{server_error}{+Error}
Generate a ``500 Internal server error'' page with a message generated
from a Prolog exception term (see print_message/2).
\end{description}
\end{description}
\subsection{The \pllib{http/html_write} library} \label{sec:htmlwrite}
\newcommand{\elem}[1]{\const{#1}}
Producing output for the web in the form of an HTML document is a
requirement for many Prolog programs. Just using format/2 is not
satisfactory as it leads to poorly readable programs generating poor
HTML. This library is based on using DCG rules.
The \pllib{http/html_write} library structures the generation of HTML from a
program. It is an extensible library, providing a \jargon{DCG} framework
for generating legal HTML under (Prolog) program control. It is
especially useful for the generation of structured pages (e.g.\ tables)
from Prolog data structures.
The normal way to use this library is through the DCG html//1. This
non-terminal provides the central translation from a structured term
with embedded calls to additional translation rules to a list of atoms
that can then be printed using print_html/[1,2].
\begin{description}
\dcg{html}{1}{:Spec}
The DCG non-terminal html//1 is the main predicate of this library. It translates
the specification for an HTML page into a list of atoms that can be
written to a stream using print_html/[1,2]. The expansion rules of this
predicate may be extended by defining the multifile DCG
html_write:expand//1. \arg{Spec} is either a single specification or a
list of single specifications. Using nested lists is not allowed to
avoid ambiguity caused by the atom \const{[]}.
\begin{itemlist}
\item [Atomic data]
Atomic data is quoted using html_quoted//1.
\item [\arg{Fmt} - \arg{Args}]
\arg{Fmt} and \arg{Args} are used as format-specification and argument
list to format/3. The result is quoted and added to the output list.
\item [\bsl\arg{List}]
Escape sequence to add atoms directly to the output list. This can be
used to embed external HTML code or emit script output. \arg{List} is
a list of the following terms:
\begin{itemlist}
\item [\arg{Fmt} - \arg{Args}]
\arg{Fmt} and \arg{Args} are used as format-specification and argument
list to format/3. The result is added to the output list.
\item [\arg{Atomic}]
Atomic values are added directly to the output list.
\end{itemlist}
\item [\bsl\arg{Term}]
Invoke the non-terminal \arg{Term} in the calling module. This is the
common mechanism to realise abstraction and modularisation in generating
HTML.
\item [\arg{Module}:\arg{Term}]
Invoke the non-terminal <Module>:<Term>. This is similar to
\bsl\arg{Term} but allows for invoking grammar rules in external
packages.
\item [\&(Entity)]
Emit {\tt\&<Entity>;} or {\tt\&\#<Entity>;} if \arg{Entity} is an
integer. SWI-Prolog atoms and strings are represented as Unicode.
Explicit use of this construct is rarely needed because code-points that
are not supported by the output encoding are automatically converted
into character-entities.
\item [\term{Tag}{Content}]
Emit HTML element \arg{Tag} using \arg{Content} and no attributes.
\arg{Content} is handed to html//1. See \secref{htmllayout} for details
on the automatically generated layout.
\item [\term{Tag}{Attributes, Content}]
Emit HTML element \arg{Tag} using \arg{Attributes} and \arg{Content}.
\arg{Attributes} is either a single attribute or a list of attributes.
Each attribute is of the format \term{Name}{Value} or
\mbox{\arg{Name}=\arg{Value}}. \arg{Value} is the atomic attribute
value but allows for a limited functional notation:
\begin{itemlist}
\item [$A$ + $B$]
Concatenation of $A$ and $B$
\item [\term{encode}{Atom}]
Use www_form_encode/2 to create a valid URL component.
\item [\term{location_by_id}{ID}]
HTTP location of the HTTP handler with given ID. See http_location_by_id/2.
\item [List]
A list is handled as a URL `search' component. The list members are
terms of the format \mbox{\arg{Name} = \arg{Value}} or
\term{Name}{Value}. Values are encoded as in the encode option
described above.
\end{itemlist}
The example below generates a URL that references the predicate
set_lang/1 in the application with given parameters. The http_handler/3
declaration binds \const{/setlang} to the predicate set_lang/1 for which
we provide a very simple implementation. The code between \const{...}
is part of an HTML page showing the English flag which, when pressed,
calls \term{set_lang}{Request} where \arg{Request} contains the search
parameter \mbox{\const{lang} = \const{en}}. Note that the HTTP location
(path) \const{/setlang} can be moved without affecting this code.
\begin{code}
:- http_handler('/setlang', set_lang, []).

set_lang(Request) :-
    http_parameters(Request,
                    [ lang(Lang, [])
                    ]),
    http_session_retractall(lang(_)),
    http_session_assert(lang(Lang)),
    reply_html_page(title('Switched language'),
                    p(['Switch language to ', Lang])).

    ...
    html(a(href(location_by_id(set_lang) + [lang(en)]),
           img(src('/www/images/flags/en.png')))),
    ...
\end{code}
\end{itemlist}
\dcg{page}{2}{:HeadContent, :BodyContent}
The DCG non-terminal page//2 generates a complete page, including the SGML
\const{DOCTYPE} declaration. \arg{HeadContent} are elements to be placed
in the \elem{head} element and \arg{BodyContent} are elements to be
placed in the \elem{body} element.
To achieve common style (background, page header and footer), it is
possible to define DCG non-terminals head//1 and/or body//1. Non-terminal page//2
checks for the definition of these non-terminals in the module it is called
from as well as in the \const{user} module. If no definition is found, it
creates a head with only the \arg{HeadContent} (note that the
\elem{title} is obligatory) and a \elem{body} with \const{bgcolor} set
to \const{white} and the provided \arg{BodyContent}.
Note that further customisation is easily achieved using html//1 directly
as page//2 is (besides handling the hooks) defined as:
\begin{code}
page(Head, Body) -->
    html([ \['<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4.0//EN">\n'],
           html([ head(Head),
                  body(bgcolor(white), Body)
                ])
         ]).
\end{code}
\dcg{page}{1}{:Contents}
This version of page//[1,2] only gives you the SGML \const{DOCTYPE}
and the \elem{HTML} element. \arg{Contents} is used to generate both the
head and body of the page.
\dcg{html_begin}{1}{+Begin}
Just open the given element. \arg{Begin} is either an atom or a
compound term. In the latter case the arguments are used as arguments
to the begin-tag. Some examples:
\begin{code}
html_begin(table)
html_begin(table(border(2), align(center)))
\end{code}
This predicate provides an alternative to using the
\bsl\arg{Command} syntax in the html//1 specification. The
following two fragments are the same. The preferred solution depends on
your preferences as well as whether the specification is generated or
entered by the programmer.
\begin{code}
table(Rows) -->
    html(table([border(1), align(center), width('80%')],
               [ \table_header,
                 \table_rows(Rows)
               ])).

% or

table(Rows) -->
    html_begin(table(border(1), align(center), width('80%'))),
    table_header,
    table_rows(Rows),
    html_end(table).
\end{code}
\dcg{html_end}{1}{+End}
End an element. See html_begin/1 for details.
\end{description}
\subsubsection{Emitting HTML documents}
The non-terminal html//1 translates a specification into a list of
atoms and layout instructions. Currently the layout instructions are
terms of the format \term{nl}{N}, requesting at least \arg{N}
newlines. Multiple consecutive \term{nl}{1} terms are combined to an
atom containing the maximum of the requested number of newline
characters.
To simplify handing the data to a client or storing it into a file,
the following predicates are available from this library:
\begin{description}
\predicate{reply_html_page}{2}{:Head, :Body}
Same as \term{reply_html_page}{default, Head, Body}.
\predicate{reply_html_page}{3}{+Style, :Head, :Body}
Writes an HTML page preceded by an HTTP header as required
by \pllib{http_wrapper} (CGI-style). Here is a simple typical
example:
\begin{code}
reply(Request) :-
    reply_html_page(title('Welcome'),
                    [ h1('Welcome'),
                      p('Welcome to our ...')
                    ]).
\end{code}
The header and footer of the page can be hooked using the grammar-rules
user:head//2 and user:body//2. The first argument passed to these hooks
is the \arg{Style} argument of reply_html_page/3 and the second is the
2nd (for head//2) or 3rd (for body//2) argument of reply_html_page/3.
These hooks can be used to restyle the page, typically by embedding the
real body content in a \elem{div}. E.g., the following code places a
menu on top of each page that is identified using the style
\textit{myapp}.
\begin{code}
:- multifile
        user:body//2.

user:body(myapp, Body) -->
        html(body([ div(id(top), \application_menu),
                    div(id(content), Body)
                  ])).
\end{code}
Redefining the \elem{head} can be used to pull in scripts, but
typically html_requires//1 provides a more modular approach for
pulling scripts and CSS-files.
\predicate{print_html}{1}{+List}
Print the token list to the Prolog current output stream.
\predicate{print_html}{2}{+Stream, +List}
Print the token list to the specified output stream.
\predicate{html_print_length}{2}{+List, -Length}
When calling print_html/[1,2] on \arg{List}, \arg{Length}
characters will be produced. Knowing the length is needed to
provide the \const{Content-length} field of an HTTP reply-header.
\end{description}
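As an illustration of how these predicates work together, the sketch
below renders a page using page//2, computes the length of the token
list and emits a CGI-style reply with an explicit \const{Content-length}
field. The predicate name reply_sized_page is invented for this example.

\begin{code}
:- use_module(library(http/html_write)).

% Illustrative only: render Content with page//2 and reply with an
% explicit Content-length header (CGI-style output).
reply_sized_page(Title, Content) :-
        phrase(page(title(Title), Content), Tokens),
        html_print_length(Tokens, Len),
        format('Content-type: text/html~n'),
        format('Content-length: ~w~n~n', [Len]),
        print_html(Tokens).
\end{code}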
\input{post.tex}
\subsubsection{Adding rules for html//1}
In some cases it is practical to extend the translations performed by
html//1. When using XPCE, for example, it is convenient to be able to
define a default translation to HTML for objects. We also used this
technique to define translation rules for the output of the SWI-Prolog
\pllib{sgml} package.
The html//1 non-terminal first calls the multifile ruleset html_write:expand//1.
\begin{description}
\dcg{html_write:expand}{1}{+Spec}
Hook to add additional translation rules for html//1.
\dcg{html_quoted}{1}{+Atom}
Emit the text in \arg{Atom}, inserting entity-references for the SGML
special characters \verb$<&>$.
\dcg{html_quoted_attribute}{1}{+Atom}
Emit the text in \arg{Atom} suitable for use as an SGML attribute,
inserting entity-references for the SGML special characters \verb$<&>"$.
\end{description}
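As a sketch of such an extension, the hook below makes html//1 accept
terms of the (invented) form \term{date}{Y,M,D} anywhere in a
specification and renders them as a \elem{span} element:

\begin{code}
:- use_module(library(http/html_write)).
:- multifile html_write:expand//1.

% Illustrative hook: render a date(Y,M,D) term as a <span> element.
html_write:expand(date(Y,M,D)) -->
        { format(atom(Text), '~w-~w-~w', [Y,M,D]) },
        html(span(class(date), Text)).
\end{code}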
\subsubsection{Generating layout} \label{sec:htmllayout}
Though not strictly necessary, the library attempts to generate
reasonable layout in the SGML output. It does so only by inserting
newlines before and after tags, based on the multifile predicate
html_write:layout/3.
\begin{description}
\predicate{html_write:layout}{3}{+Tag, -Open, -Close}
Specify the layout conventions for the element \arg{Tag}, which is a
lowercase atom. \arg{Open} is a term \arg{Pre}-\arg{Post}. It defines
that the element should have at least \arg{Pre} newline characters
before and \arg{Post} after the tag. The \arg{Close} specification is
similar, but in addition allows for the atom \const{-}, requesting the
output generator to omit the close-tag altogether, or \const{empty},
telling the library that the element has declared empty content. In this
case the close-tag is not emitted either, but in addition html//1
interprets \arg{Arg} in \term{Tag}{Arg} as a list of attributes rather
than the content.
A tag that does not appear in this table is emitted without additional
layout. See also print_html/[1,2]. Please consult the
library source for examples.
\end{description}
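The following declarations sketch what such layout rules may look like.
They are purely illustrative; the rules actually shipped with the
library may use different values.

\begin{code}
:- multifile html_write:layout/3.

% Two newlines before <table>, one after its open tag; one newline
% before </table> and two after it.
html_write:layout(table, 2-1, 1-2).

% <br> has declared empty content: no close-tag is emitted and any
% argument is interpreted as an attribute list.
html_write:layout(br, 0-1, empty).
\end{code}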
\subsubsection{Examples}
In the following example we generate a table of Prolog predicates found
in the SWI-Prolog help system based on a keyword. The primary database
is defined by the predicate predicate/5. We make the predicates
hyperlinks pointing to their documentation.
\begin{code}
html_apropos(Kwd) :-
        findall(Pred, apropos_predicate(Kwd, Pred), Matches),
        phrase(apropos_page(Kwd, Matches), Tokens),
        print_html(Tokens).

%       emit page with title, header and table of matches

apropos_page(Kwd, Matches) -->
        page([ title(['Predicates for ', Kwd])
             ],
             [ h2(align(center),
                  ['Predicates for ', Kwd]),
               table([ align(center),
                       border(1),
                       width('80%')
                     ],
                     [ tr([ th('Predicate'),
                            th('Summary')
                          ])
                     | \apropos_rows(Matches)
                     ])
             ]).

%       emit the rows for the body of the table.

apropos_rows([]) -->
        [].
apropos_rows([pred(Name, Arity, Summary)|T]) -->
        html([ tr([ td(\predref(Name/Arity)),
                    td(em(Summary))
                  ])
             ]),
        apropos_rows(T).

%       predref(Name/Arity)
%
%       Emit Name/Arity as a hyperlink to
%
%               /cgi-bin/plman?name=Name&arity=Arity
%
%       we must do form-encoding for the name as it may contain illegal
%       characters.  www_form_encode/2 is defined in library(url).

predref(Name/Arity) -->
        { www_form_encode(Name, Encoded),
          sformat(Href, '/cgi-bin/plman?name=~w&arity=~w',
                  [Encoded, Arity])
        },
        html(a(href(Href), [Name, /, Arity])).

%       Find predicates from a keyword.  '$apropos_match' is an internal
%       undocumented predicate.

apropos_predicate(Pattern, pred(Name, Arity, Summary)) :-
        predicate(Name, Arity, Summary, _, _),
        (   '$apropos_match'(Pattern, Name)
        ->  true
        ;   '$apropos_match'(Pattern, Summary)
        ).
\end{code}
\subsubsection{Remarks on the \pllib{http/html_write} library}
This library is the result of various attempts to arrive at a more
satisfactory and Prolog-minded way to produce HTML text from a program.
We have been using Prolog for the generation of web pages in a number of
projects. Just using format/2 was never a real option, as it generates
error-prone HTML from clumsy syntax. We started with a layer on top of
format/2, keeping track of the current nesting and thus always capable
of properly closing the environment.
DCG-based translation, however, naturally exploits Prolog's
term-rewriting primitives. If generation fails for whatever reason, it
is easy to produce an alternative document (for example, one holding an
error message).
The approach presented in this library has been used in combination with
\pllib{http/httpd} in three projects: viewing RDF in a browser,
selecting fragments from an analysed document and presenting parts of
the XPCE documentation using a browser. It has proven capable of
generating pages quickly and comfortably.

In a future version we will probably define a goal_expansion/2 to do
compile-time optimisation of the library. Quoting known text and
invoking sub-rules through the \bsl\arg{RuleSet} and
\arg{Module}:\arg{RuleSet} constructs are costly operations in the
analysis that can be done at compile-time.
\input{jswrite}
\input{httppath}
\input{htmlhead}
\input{httppwp}
\subsection{Security}
Writing servers is an inherently dangerous job that should be carried
out with care. You have basically started a program on a public
terminal and invited strangers to use it. When using the interactive
server or the inetd-based server, the server runs with your privileges.
When run as a CGI script, it runs with the privileges of your
web-server. Though it should not be possible to fatally compromise a
Unix machine using user privileges, getting unconstrained access to the
system is highly undesirable.

Symbolic languages have an additional handicap in their inherent
ability to modify the running program and to create goals dynamically
(this also applies to the popular Perl and Java scripting languages).
Here are some guidelines.
\begin{itemlist}
\item [Check your input]
Hardly anything can go wrong if you check the validity of
query-arguments before formulating an answer.
\item [Check filenames]
If part of the query consists of filenames or directories, check
them. This also applies to files you only read. Names such as
\file{/etc/passwd}, but also \file{../../../../../etc/passwd}, are
tried by experienced attackers to learn about the system they want
to attack. So, expand provided names using absolute_file_name/[2,3]
and verify they are inside a folder reserved for the server. Avoid
symbolic links from this subtree to the outside world. The example
below checks the validity of filenames. The first call ensures proper
canonicalisation of the paths to avoid a mismatch due to
symbolic links or other filesystem ambiguities.
\begin{code}
check_file(File) :-
        absolute_file_name('/path/to/reserved/area', Reserved),
        absolute_file_name(File, Tried),
        atom_concat(Reserved, _, Tried).
\end{code}
\item [Check scripts]
Should input in any way activate external scripts using shell/1
or \exam{open(pipe(Command), ...)}, verify the argument once more.
\item [Check meta-calling]
\emph{The} construct that is attractive to you, and to your attacker
alike, is shown below:
\begin{code}
reply(Query) :-
        member(search(Args), Query),
        member(action=Action, Args),
        member(arg=Arg, Args),
        call(Action, Arg).              % NEVER EVER DO THIS!
\end{code}
All your attacker has to do is specify \arg{Action} as \const{shell}
and \arg{Arg} as \const{/bin/sh}, and they have an unconstrained shell!
A safer alternative based on an explicit whitelist is sketched below.
\end{itemlist}
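The sketch below shows one way to avoid the trap above by routing
requests through an explicit whitelist. The action names and the
predicate allowed_action/2 are invented for this example.

\begin{code}
reply(Request) :-
        memberchk(search(Args), Request),
        memberchk(action=Name, Args),
        memberchk(arg=Arg, Args),
        allowed_action(Name, Goal),     % fails for unknown actions
        call(Goal, Arg).

% Explicit whitelist mapping external action names to goals.
allowed_action(list,   list_files).
allowed_action(status, show_status).
\end{code}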
\subsection{Tips and tricks}
\begin{itemlist}
\item [URL Locations]
With an application in mind, it is tempting to make all URL
locations short and directly connected to the root (\const{/}). This is
\emph{not} a good idea. It is advised to place all locations of a server
below a directory with an informative name. Consider making the root
location something that can be changed using a global setting; a sketch
of this is given at the end of this section.
\begin{itemize}
\item Page generating code can easily be reused. Using locations
directly below the root however increases the likelihood
of conflicts.
\item Multiple servers can be placed behind the same public
server as explained in \secref{proxy}. Using a common
and fairly unique root, redirection is much easier and
less likely to lead to conflicts.
\end{itemize}
\item [Debugging]
Please check the section \url[``Thread-support
library(threadutil)'']{http://gollem.science.uva.nl/SWI-Prolog/Manual/thutil.html}
of the SWI-Prolog reference manual.
\end{itemlist}
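One way to make the root location configurable is the path-alias
mechanism of \pllib{http/http_path} described earlier. The code below is
only a sketch; the alias \const{myapp} and the handler home_page are
invented for this example.

\begin{code}
:- use_module(library(http/http_path)).
:- use_module(library(http/http_dispatch)).

:- multifile http:location/3.

% All locations of this (hypothetical) application live below /myapp.
% Changing this single clause relocates the entire application.
http:location(myapp, root(myapp), []).

% A handler registered relative to the application root; home_page/1
% is an assumed handler predicate.
:- http_handler(myapp(home), home_page, []).
\end{code}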
\section{Transfer encodings}
\label{sec:transfer}
\index{chunked,encoding}%
\index{deflate,encoding}%
The HTTP protocol provides for \jargon{transfer encodings}. These define
filters applied to the data described by the \const{Content-type}. The
two most popular transfer encodings are \const{chunked} and
\const{deflate}. The \const{chunked} encoding avoids the need for
a \const{Content-length} header, sending the data in chunks, each of
which is preceded by a length. The \const{deflate} encoding provides
compression.
Transfer-encodings are supported by filters defined as foreign libraries
that realise an encoding/decoding stream on top of another stream.
Currently there are two such libraries: \pllib{http/http_chunked.pl}
and \pllib{zlib.pl}.
There is an emerging hook interface dealing with transfer encodings. The
\pllib{http/http_chunked.pl} library provides a hook used by
\pllib{http/http_open.pl} to support chunked encoding in http_open/3.
Note that both \file{http_open.pl} \emph{and} \file{http_chunked.pl}
must be loaded for http_open/3 to support chunked encoding.
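The following sketch illustrates this. It only assumes the two libraries
mentioned above; library(readutil) is merely used to consume the data
and the URL is whatever the application needs.

\begin{code}
:- use_module(library(http/http_open)).
:- use_module(library(http/http_chunked)).
:- use_module(library(readutil)).

% Fetch a URL; if the server replies with a chunked transfer encoding,
% http_open/3 decodes it transparently because http_chunked is loaded.
fetch(URL, Codes) :-
        setup_call_cleanup(http_open(URL, In, []),
                           read_stream_to_codes(In, Codes),
                           close(In)).
\end{code}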
\subsection{The \pllib{http/http_chunked} library}
\begin{description}
\predicate{http_chunked_open}{3}{+RawStream, -DataStream, +Options}
Create a stream to realise HTTP chunked encoding or decoding. The
technique is similar to library(zlib), using a Prolog stream as a filter
on another stream. See online documentation at
\url{http://gollem.science.uva.nl/SWI-Prolog/pldoc/} for details.
\end{description}
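As an illustration, the sketch below (the predicate name send_chunked is
invented) writes some text to a raw stream using chunked encoding:

\begin{code}
:- use_module(library(http/http_chunked)).

% Write Text to RawStream using HTTP chunked transfer encoding.
% Closing DataStream flushes the final zero-length chunk; whether
% RawStream is closed as well depends on the options, and here we
% assume it stays open for the caller.
send_chunked(RawStream, Text) :-
        http_chunked_open(RawStream, DataStream, []),
        format(DataStream, '~w', [Text]),
        close(DataStream).
\end{code}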
\input{json.tex}
\section{Status}
The SWI-Prolog HTTP library is in active use in a large number of
projects. It is considered one of the SWI-Prolog core libraries that is
actively maintained and regularly extended with new features. This is
particularly true for the multi-threaded server. The XPCE- and
inetd-based servers are not widely used.
This library is by no means complete and you are free to extend it.
\printindex
\end{document}