\documentclass[11pt]{article}
|
|
\usepackage{times}
|
|
\usepackage{pl}
|
|
\usepackage{plpage}
|
|
\usepackage{html}
|
|
\makeindex
|
|
|
|
\onefile
|
|
\htmloutput{html} % Output directory
|
|
\htmlmainfile{index} % Main document file
|
|
\bodycolor{white} % Page colour
|
|
\sloppy
|
|
|
|
\renewcommand{\runningtitle}{SWI-Prolog HTTP support}
|
|
|
|
\begin{document}
|
|
|
|
\title{SWI-Prolog HTTP support}
|
|
\author{Jan Wielemaker \\
|
|
HCS, \\
|
|
University of Amsterdam \\
|
|
The Netherlands \\
|
|
E-mail: \email{J.Wielemaker@uva.nl}}
|
|
|
|
\maketitle
|
|
|
|
\begin{abstract}
|
|
This article documents the package HTTP, a series of libraries for
|
|
accessing data on HTTP servers as well as providing HTTP server
|
|
capabilities from SWI-Prolog. Both server and client are modular
|
|
libraries. The server can be operated from the Unix \program{inetd}
|
|
super-daemon as well as a stand-alone server that runs on all
|
|
platforms supported by SWI-Prolog.
|
|
\end{abstract}
|
|
|
|
\vfill
|
|
|
|
\pagebreak
|
|
\tableofcontents
|
|
|
|
\vfill
|
|
\vfill
|
|
|
|
\newpage
|
|
|
|
|
|
\section{Introduction}
|
|
|
|
The HTTP (HyperText Transfer Protocol) is the W3C standard protocol for
|
|
transferring information between a web-client (browser) and a
|
|
web-server. The protocol is a simple \emph{envelope} protocol where
|
|
standard name/value pairs in the header are used to split the stream
|
|
into messages and communicate about the connection-status. Many
|
|
languages have client and/or server libraries to deal with the HTTP
|
|
protocol, making it a suitable candidate for general purpose
|
|
client-server applications.
|
|
|
|
In this document we describe a modular infra-structure to access
|
|
web-servers from SWI-Prolog and turn Prolog into a web-server.
|
|
|
|
|
|
\subsection*{Acknowledgements}
|
|
|
|
This work has been carried out under the following projects:
|
|
\url[GARP]{http://hcs.science.uva.nl/projects/GARP/},
|
|
\url[MIA]{http://www.ins.cwi.nl/projects/MIA/},
|
|
\url[IBROW]{http://hcs.science.uva.nl/projects/ibrow/home.html},
|
|
\url[KITS]{http://kits.edte.utwente.nl/} and
|
|
\url[MultiMediaN]{http://e-culture.multimedian.nl/}.
|
|
The following people have pioneered parts of this library and
|
|
contributed bug reports and suggestions for improvements: Anjo
|
|
Anjewierden, Bert Bredeweg, Wouter Jansweijer, Bob Wielinga, Jacco
|
|
van Ossenbruggen, Michiel Hildebrandt, Matt Lilley and Keri Harris.
|
|
|
|
|
|
\section{The HTTP client libraries}
|
|
|
|
This package provides two libraries for building HTTP clients. The first,
\pllib{http/http_open}, is a lightweight library for opening an HTTP
URL address as a Prolog stream. It can only deal with the HTTP GET
protocol. The second, \pllib{http/http_client}, is a more advanced
|
|
library dealing with \jargon{keep-alive}, \jargon{chunked transfer} and
|
|
a plug-in mechanism providing conversions based on the MIME content-type.
|
|
|
|
\input{httpopen.tex}
|
|
|
|
\subsection{The \pllib{http/http_client} library} \label{sec:httpclient}
|
|
|
|
The \pllib{http/http_client} library provides more powerful access to
|
|
reading HTTP resources, providing \jargon{keep-alive} connections,
|
|
\jargon{chunked} transfer and conversion of the content, such as
|
|
breaking down \jargon{multipart} data, parsing HTML, etc. The library
|
|
announces itself as providing \const{HTTP/1.1}.
|
|
|
|
\begin{description}
|
|
\predicate{http_get}{3}{+URL, -Reply, +Options}
|
|
Performs an HTTP GET request on the given URL and then reads the
|
|
reply using http_read_data/3. Defined options are:
|
|
|
|
\begin{description}
|
|
\termitem{connection}{ConnectionType}
|
|
If \const{close} (default) a new connection is created for this request
|
|
and closed after the request has completed. If \const{'Keep-Alive'} the
|
|
library checks for an open connection on the requested host and port
|
|
and re-uses this connection. The connection is left open if the other
|
|
party confirms the keep-alive and closed otherwise.
|
|
|
|
\termitem{http_version}{Major-Minor}
|
|
Indicate the HTTP protocol version used for the connection. Default is
|
|
\const{1.1}.
|
|
|
|
\termitem{proxy}{+Host, +Port}
|
|
Use an HTTP proxy to connect to the outside world.
|
|
|
|
\termitem{proxy_authorization}{+Authorization}
|
|
Send authorization to the proxy. Otherwise the same as the
|
|
\const{authorization} option.
|
|
|
|
\termitem{timeout}{+Timeout}
|
|
If provided, set a timeout on the stream using set_stream/2. With this
|
|
option if no new data arrives within \arg{Timeout} seconds the
|
|
stream raises an exception. Default is to wait forever
|
|
(\const{infinite}).
|
|
|
|
\termitem{user_agent}{+Agent}
|
|
Defines the value of the \const{User-Agent} field of the HTTP header.
|
|
Default is \const{SWI-Prolog (http://www.swi-prolog.org)}.
|
|
|
|
\termitem{range}{+Range}
|
|
Ask for partial content. \arg{Range} is a term \term{\arg{Unit}}{From,
|
|
To}, where \arg{From} is an integer and \arg{To} is either an integer
|
|
or the atom \const{end}. HTTP 1.1 only supports \arg{Unit} =
|
|
\const{bytes}. E.g., to ask for bytes 1000-1999, use the option
|
|
\exam{range(bytes(1000,1999))}.
|
|
|
|
\termitem{request_header}{Name = Value}
|
|
Add a line "\arg{Name}: \arg{Value}" to the HTTP request header. Both
|
|
name and value are added uninspected and literally to the request
|
|
header. This may be used to specify accept encodings, languages, etc.
|
|
Please check the RFC2616 (HTTP) document for available fields and their
|
|
meaning.
|
|
|
|
\termitem{reply_header}{Header}
|
|
Unify \arg{Header} with a list of \arg{Name}=\arg{Value} pairs
|
|
expressing all header fields of the reply. See http_read_request/2
|
|
for the result format.
|
|
\end{description}
|
|
|
|
Remaining options are passed to http_read_data/3.
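
As an illustration of the options above, the sketch below fetches a page
as a list of character codes and gives up if no data arrives for 30
seconds. The helper name fetch_codes/2 and the user-agent string are
made up for this example; only options documented in this section are
used.

\begin{code}
:- use_module(library(http/http_client)).

%  fetch_codes(+URL, -Codes)
%  Hypothetical helper: GET URL and return the body as codes.
fetch_codes(URL, Codes) :-
    http_get(URL, Codes,
             [ timeout(30),                     % exception after 30s of silence
               user_agent('ExampleClient/0.1'), % illustrative value only
               to(codes)                        % handed on to http_read_data/3
             ]).
\end{code}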
|
|
|
|
\predicate{http_post}{4}{+URL, +In, -Reply, +Options}
|
|
Performs an HTTP POST request on the given URL. It is equivalent to
|
|
http_get/3, except for providing an \jargon{input document}, which is
|
|
posted using http_post_data/3.
|
|
|
|
\predicate{http_read_data}{3}{+Header, -Data, +Options}
|
|
Read data from an HTTP stream. Normally called from http_get/3 or
|
|
http_post/4. When dealing with HTTP POST in a server this predicate can
|
|
be used to retrieve the posted data. \arg{Header} is the parsed header.
|
|
\arg{Options} is a list of \term{\arg{Name}}{Value} pairs to guide the
|
|
translation of the data. The following options are supported:
|
|
|
|
\begin{description}
|
|
\termitem{to}{Target}
|
|
Do not try to interpret the data according to the MIME-type, but return
|
|
it literally according to \arg{Target}, which is one of:
|
|
\begin{description}
|
|
\termitem{stream}{Output}
|
|
Append the data to the given stream, which must be a Prolog stream open
|
|
for writing. This can be used to save the data in a (memory-)file, XPCE
|
|
object, forward it to a process using a pipe, etc.
|
|
|
|
\termitem{atom}{}
|
|
Return the result as an atom. Though SWI-Prolog has no limit on the
|
|
size of atoms and provides atom-garbage collection, this option should
|
|
be used with care.%
|
|
\footnote{Currently atom-garbage collection is activated after
|
|
the creation of 10,000 atoms.}
|
|
|
|
\termitem{codes}{}
|
|
Return the page as a list of character-codes. This is especially useful
|
|
for parsing it using grammar rules.
|
|
\end{description}
|
|
\termitem{content_type}{Type}
|
|
Overrule the \const{Content-Type} as provided by the HTTP reply header.
|
|
Intended as a work-around for badly configured servers.
|
|
\end{description}
|
|
|
|
If no \term{to}{Target} option is provided the library tries the
|
|
registered plug-in conversion filters. If none of these succeed it
|
|
tries the built-in content-type handlers or returns the content as an
|
|
atom. The builtin content filters are described below. The provided
|
|
plug-ins are described in the following sections.
|
|
|
|
\begin{description}
|
|
\termitem{application/x-www-form-urlencoded}{}
|
|
This is the default encoding mechanism for POST requests issued by
|
|
a web-browser. It is broken down to a list of \arg{Name} = \arg{Value}
|
|
terms.
|
|
\end{description}
|
|
|
|
Finally, if all else fails the content is returned as an atom.
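
The \term{to}{stream(Output)} target can, for example, be used to save a
resource to a file without interpreting it. The sketch below is not
part of the library: save_url/2 is a hypothetical helper, and it assumes
setup_call_cleanup/3 is available to close the file even if the transfer
fails. The reply argument is ignored because the data goes to the
stream.

\begin{code}
:- use_module(library(http/http_client)).

%  save_url(+URL, +File)
%  Hypothetical helper: copy the body of URL to File, uninterpreted.
save_url(URL, File) :-
    setup_call_cleanup(
        open(File, write, Out, [type(binary)]),
        http_get(URL, _Reply, [to(stream(Out))]),
        close(Out)).
\end{code}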
|
|
|
|
\predicate{http_post_data}{3}{+Data, +Stream, +ExtraHeader}
|
|
Write an HTTP POST request to \arg{Stream} using data from \arg{Data}
|
|
and passing the additional extra headers from \arg{ExtraHeader}.
|
|
\arg{Data} is one of:
|
|
|
|
\begin{description}
|
|
\termitem{html}{+HTMLTokens}
|
|
Send an HTML token string as produced by the library \pllib{html_write}
|
|
described in section \secref{htmlwrite}.
|
|
|
|
\termitem{file}{+File}
|
|
Send the contents of \arg{File}. The MIME type is derived from the
|
|
filename extension using file_mime_type/2.
|
|
|
|
\termitem{file}{+Type, +File}
|
|
Send the contents of \arg{File} using the provided MIME type,
|
|
i.e.\ claiming the \const{Content-type} equals \arg{Type}.
|
|
|
|
\termitem{codes}{+Codes}
|
|
Same as \term{codes}{text/plain, Codes}.
|
|
|
|
\termitem{codes}{+Type, +Codes}
|
|
Send string (list of character codes) using the indicated MIME-type.
|
|
|
|
\termitem{cgi_stream}{+Stream, +Len}
|
|
Read the input from \arg{Stream} which, like CGI data, starts with a
|
|
partial HTTP header. The fields of this header are merged with the
|
|
provided \arg{ExtraHeader} fields. The first \arg{Len} characters
|
|
of \arg{Stream} are used.
|
|
|
|
\termitem{form}{+ListOfParameter}
|
|
Send data of the MIME type \const{application/x-www-form-urlencoded}
|
|
as produced by browsers issuing a POST request from an HTML form.
|
|
\arg{ListOfParameter} is a list of \arg{Name}=\arg{Value} or
|
|
\mbox{\arg{Name}(\arg{Value})}.
|
|
|
|
\termitem{form_data}{+ListOfData}
|
|
Send data of the MIME type \const{multipart/form-data} as produced by
|
|
browsers issuing a POST request from an HTML form using \const{enctype}
|
|
\const{multipart/form-data}. This is a somewhat simplified MIME
|
|
\const{multipart/mixed} encoding used by browser forms including
|
|
file input fields. \arg{ListOfData} is the same as for the \arg{List}
|
|
alternative described below. Below is an example from the SWI-Prolog
|
|
\url[Sesame]{http://www.openrdf.org} interface. \arg{Repository}, etc.\
|
|
are atoms providing the value, while the last argument provides a value
|
|
from a file.
|
|
|
|
\begin{code}
|
|
...,
|
|
http_post([ protocol(http),
|
|
host(Host),
|
|
port(Port),
|
|
path(ActionPath)
|
|
],
|
|
form_data([ repository = Repository,
|
|
dataFormat = DataFormat,
|
|
baseURI = BaseURI,
|
|
verifyData = Verify,
|
|
data = file(File)
|
|
]),
|
|
_Reply,
|
|
[]),
|
|
...,
|
|
\end{code}
|
|
|
|
\termitem{List}{}
|
|
If the argument is a plain list, it is sent using the MIME type
|
|
\const{multipart/mixed} and packed using mime_pack/3. See
|
|
mime_pack/3 for details on the argument format.
|
|
\end{description}
|
|
\end{description}
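
For simple forms the \term{form}{ListOfParameter} input of http_post/4
is usually sufficient. The following sketch posts two fields the way a
browser would submit an HTML form. The helper post_form/4 and the field
names \const{name} and \const{email} are assumptions made for this
example; how \arg{Reply} is returned depends on the content-type of the
answer and the loaded plug-ins.

\begin{code}
:- use_module(library(http/http_client)).

%  post_form(+URL, +Name, +Email, -Reply)
%  Hypothetical helper: send application/x-www-form-urlencoded data.
post_form(URL, Name, Email, Reply) :-
    http_post(URL,
              form([ name  = Name,
                     email = Email
                   ]),
              Reply,
              []).
\end{code}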
|
|
|
|
|
|
\subsubsection{The MIME client plug-in} \label{sec:httpmimeplugin}
|
|
|
|
This plug-in library \pllib{http/http_mime_plugin} breaks multipart
|
|
documents that are recognised by the \exam{Content-Type:
|
|
multipart/form-data} or \exam{Mime-Version: 1.0} in the header into a
|
|
list of \arg{Name} = \arg{Value} pairs. This library deals with data
|
|
from web-forms using the \const{multipart/form-data} encoding as well as
|
|
the \url[FIPA]{http://www.fipa.org} agent-protocol messages.
|
|
|
|
|
|
\subsubsection{The SGML client plug-in} \label{sec:httpsgmlplugin}
|
|
|
|
This plug-in library \pllib{http/http_sgml_plugin} provides a bridge
|
|
between the SGML/XML/HTML parser provided by \pllib{sgml} and the http
|
|
client library. After loading this hook the following mime-types are
|
|
automatically handled by the SGML parser.
|
|
|
|
\begin{description}
|
|
\termitem{text/html}{}
|
|
Handed to \pllib{sgml} using W3C HTML 4.0 DTD, suppressing and
|
|
ignoring all HTML syntax errors. \arg{Options} is passed to
|
|
load_structure/3.
|
|
|
|
\termitem{text/xml}{}
|
|
Handed to \pllib{sgml} using dialect \const{xmlns} (XML + namespaces).
|
|
\arg{Options} is passed to load_structure/3. In particular,
|
|
\term{dialect}{xml} may be used to suppress namespace handling.
|
|
|
|
\termitem{text/x-sgml}{}
|
|
Handed to \pllib{sgml} using dialect \const{sgml}. \arg{Options}
|
|
is passed to load_structure/3.
|
|
\end{description}
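
A minimal sketch of using this plug-in is given below. The helper name
page_dom/2 is hypothetical. With the plug-in loaded, http_get/3 hands a
\const{text/html} reply to the SGML parser, so \arg{DOM} is expected to
be the term structure produced by load_structure/3 rather than an atom.

\begin{code}
:- use_module(library(http/http_client)).
:- use_module(library(http/http_sgml_plugin)).

%  page_dom(+URL, -DOM)
%  Hypothetical helper: fetch URL and return the parsed document.
page_dom(URL, DOM) :-
    http_get(URL, DOM, []).
\end{code}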
|
|
|
|
|
|
\section{The HTTP server libraries} \label{sec:httpserver}
|
|
|
|
The HTTP server library consists of two obligatory parts and one
optional part. The first deals with connection management and has three
different implementations depending on the desired type of server. The
|
|
second implements a generic wrapper for decoding the HTTP request,
|
|
calling user code to handle the request and encode the answer. The
|
|
optional \file{http_dispatch} module can be used to assign HTTP
|
|
\jargon{locations} (paths) to predicates. This design is summarised in
|
|
\figref{httpserver}.
|
|
|
|
\postscriptfig[width=0.8\linewidth]{httpserver}{Design of the HTTP
|
|
server}
|
|
|
|
The functional body of the user's code is independent from the selected
|
|
server-type, making it easy to switch between the supported server
|
|
types.
|
|
|
|
|
|
\subsection{The `Body'} \label{sec:body}
|
|
|
|
The server-body is the code that handles the request and formulates a
|
|
reply. To facilitate all mentioned setups, the body is driven by
|
|
http_wrapper/5. The goal is called with the parsed request (see
|
|
\secref{request}) as argument and \const{current_output} set to a
|
|
temporary buffer. Its task is closely related to the task of a CGI
|
|
script; it must write a header holding at least the
|
|
\const{Content-type} field and a body. Here is a simple body writing the
|
|
request as an HTML table.
|
|
|
|
\begin{code}
|
|
reply(Request) :-
|
|
format('Content-type: text/html~n~n', []),
|
|
format('<html>~n', []),
|
|
format('<table border=1>~n'),
|
|
print_request(Request),
|
|
format('~n</table>~n'),
|
|
format('</html>~n', []).
|
|
|
|
print_request([]).
|
|
print_request([H|T]) :-
|
|
H =.. [Name, Value],
|
|
format('<tr><td>~w<td>~w~n', [Name, Value]),
|
|
print_request(T).
|
|
\end{code}
|
|
|
|
The infrastructure recognises the header
|
|
\texttt{Transfer-encoding:~chunked}, causing it to use chunked encoding
|
|
if the client allows for it. See also \secref{transfer} and the
|
|
\const{chunked} option in http_handler/3. Other header lines are passed
|
|
verbatim to the client. Typical examples are \texttt{Set-Cookie} and
|
|
authentication headers (see \secref{auth}).
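
As a sketch of the chunked case, the body below requests chunked
transfer by emitting the corresponding header line before the
\const{Content-type} field. Whether chunked encoding is actually used
depends on the client, as described above; the three output lines are
arbitrary example data.

\begin{code}
reply(_Request) :-
    format('Transfer-encoding: chunked~n'),   % ask the wrapper for chunking
    format('Content-type: text/plain~n~n'),   % blank line ends the header
    forall(between(1, 3, I),
           format('Line ~d~n', [I])).
\end{code}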
|
|
|
|
|
|
\subsubsection{Returning special status codes} \label{sec:httpspecials}
|
|
|
|
Besides returning a page by writing it to the current output stream,
|
|
the server goal can raise an exception using throw/1 to generate special
|
|
pages such as \const{not_found}, \const{moved}, etc. The defined
|
|
exceptions are:
|
|
|
|
\begin{description}
|
|
\termitem{http_reply}{+Reply, +HdrExtra}
|
|
Return a result page using http_reply/3. See http_reply/3 for details.
|
|
|
|
\termitem{http_reply}{+Reply}
|
|
Equivalent to \term{http_reply}{Reply, []}.
|
|
|
|
\termitem{http}{not_modified}
|
|
Equivalent to \term{http_reply}{not_modified, []}. This exception is
|
|
for backward compatibility and can be used by the server to indicate
|
|
the referenced resource has not been modified since it was requested
|
|
last time.
|
|
\end{description}
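
The sketch below shows how a handler might use these exceptions. It
assumes that \term{moved}{URL} and \term{not_found}{Path} are among the
reply terms accepted by http_reply/3; consult the description of
http_reply/3 for the authoritative list. The paths \const{/old} and
\const{/new} are made up for this example.

\begin{code}
%  Redirect one legacy location and report all others as missing.
reply(Request) :-
    memberchk(path('/old'), Request), !,
    throw(http_reply(moved('/new'))).
reply(Request) :-
    memberchk(path(Path), Request),
    throw(http_reply(not_found(Path))).
\end{code}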
|
|
|
|
|
|
\input{httpdispatch.tex}
|
|
\input{httpdirindex.tex}
|
|
\input{httpsession.tex}
|
|
|
|
|
|
\subsection{HTTP Authentication}
|
|
\label{sec:auth}
|
|
|
|
The module \file{http/http_authenticate} provides the basics to validate
|
|
an HTTP \const{Authorization} header. User and password information are
|
|
read from a Unix/Apache compatible password file. This information, as
|
|
well as the validation process is cached to achieve optimal performance.
|
|
|
|
\begin{description}
|
|
\predicate{http_authenticate}{3}{+Type, +Request, -User}
|
|
True if Request contains the information to continue according
|
|
to Type. Type identifies the required authentication technique:
|
|
|
|
\begin{description}
|
|
\termitem{basic}{+PasswordFile}
|
|
Use HTTP \const{Basic} authentication and verify the password
|
|
from PasswordFile. PasswordFile is a file holding
|
|
usernames and passwords in a format compatible to
|
|
Unix and Apache. Each line is a record with \verb$:$
separated fields. The first field is the username and
the second the password \emph{hash}. Password hashes are
|
|
validated using crypt/2.
|
|
\end{description}
|
|
|
|
Successful authorization is cached for 60 seconds to avoid
|
|
overhead of decoding and lookup of the user and password data.
|
|
|
|
http_authenticate/3 just validates the header. If authorization
|
|
is not provided the browser must be challenged, in response to
|
|
which it normally opens a user-password dialogue. Example code
|
|
realising this is below. The exception causes the HTTP wrapper
|
|
code to generate an HTTP 401 reply.
|
|
|
|
\begin{code}
|
|
...,
|
|
( http_authenticate(basic(passwd), Request, User)
|
|
-> true
|
|
; throw(http_reply(authorise(basic, Realm)))
|
|
).
|
|
\end{code}
|
|
|
|
Alternatively \term{basic}{+PasswordFile} can be passed as an option to
|
|
http_handler/3.
|
|
\end{description}
|
|
|
|
\input{httpopenid.tex}
|
|
|
|
%================================================================
|
|
\subsection{Get parameters from HTML forms}
|
|
\label{sec:httpparam}
|
|
|
|
The library \pllib{http/http_parameters} provides two predicates to
|
|
fetch HTTP request parameters as a type-checked list easily. The
|
|
library transparently handles both GET and POST requests. It builds
|
|
on top of the low-level request representation described in
|
|
\secref{request}.
|
|
|
|
\begin{description}
|
|
\predicate{http_parameters}{2}{+Request, ?Parameters}
|
|
The predicate is passed the \arg{Request} as provided to the handler
goal by http_wrapper/5 as well as a partially instantiated list
|
|
describing the requested parameters and their types. Each parameter
|
|
specification in \arg{Parameters} is a term of the format
|
|
\mbox{\arg{Name}(\arg{-Value}, \arg{+Options})}. \arg{Options} is
|
|
a list of option terms describing the type, default, etc. If no options
|
|
are specified the parameter must be present and its value is returned in
|
|
\arg{Value} as an atom. If a parameter is missing the exception
|
|
\term{error}{\term{existence_error}{form_data, Name}, _} is thrown.
|
|
Options fall into three categories: those that handle presence of
|
|
the parameter, those that guide conversion and restrict types and
|
|
those that support automatic generation of documentation. First,
|
|
the presence-options:
|
|
|
|
\begin{description}
|
|
\termitem{default}{Default}
|
|
If the named parameter is missing, \arg{Value} is unified to
|
|
\arg{Default}.
|
|
|
|
\termitem{optional}{true}
|
|
If the named parameter is missing, \arg{Value} is left unbound and
|
|
no error is generated.
|
|
|
|
\termitem{list}{Type}
|
|
The parameter may be absent or appear multiple times. If this
|
|
option is present, \const{default} and \const{optional} are ignored and
|
|
the value is returned as a list. Type checking options are processed on
|
|
each value.
|
|
|
|
\termitem{zero_or_more}{}
|
|
Deprecated. Use \term{list}{Type}.
|
|
\end{description}
|
|
|
|
The type and conversion options are given below. The type-language can
|
|
be extended by providing clauses for the multifile hook
|
|
http:convert_parameter/3.
|
|
|
|
\begin{description}
|
|
\termitem{;}{Type1, Type2}
|
|
Succeed if either \arg{Type1} or \arg{Type2} applies. It allows
|
|
for checks such as \exam{(nonneg;oneof([infinite]))} to specify
|
|
an integer or a symbolic value.
|
|
|
|
\termitem{oneof}{List}
|
|
Succeeds if the value is member of the given list.
|
|
|
|
\definition{length $> N$}
|
|
Succeeds if value is an atom of more than $N$ characters.
|
|
|
|
\definition{length $>= N$}
|
|
Succeeds if value is an atom of more than or equal to $N$ characters.
|
|
|
|
\definition{length $< N$}
|
|
Succeeds if value is an atom of less than $N$ characters.
|
|
|
|
\definition{length $=< N$}
|
|
Succeeds if value is an atom of less than or equal to $N$ characters.
|
|
|
|
\termitem{atom}{}
|
|
No-op. Allowed for consistency.
|
|
|
|
\termitem{between}{+Low, +High}
|
|
Convert value to a number and if either \arg{Low} or \arg{High} is a
|
|
float, force value to be a float. Then check that the value is in the
|
|
given range, which includes the boundaries.
|
|
|
|
\termitem{boolean}{}
|
|
Translate \const{true}, \const{yes}, \const{on} and '1' into
\const{true}; \const{false}, \const{no}, \const{off} and '0' into
\const{false}. Raises an error otherwise.
|
|
|
|
\termitem{float}{}
|
|
Convert value to a float. Integers are transformed into float. Throws a
|
|
type-error otherwise.
|
|
|
|
\termitem{integer}{}
|
|
Convert value to an integer. Throws a type-error otherwise.
|
|
|
|
\termitem{nonneg}{}
|
|
Convert value to a non-negative integer. Throws a type-error
|
|
if the value cannot be converted to an integer and a domain-error
|
|
otherwise.
|
|
|
|
\termitem{number}{}
|
|
Convert value to a number. Throws a type-error otherwise.
|
|
\end{description}
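
The presence and type options can be combined in one specification. The
handler below is only a sketch: the handler name search/1 and the
parameter names \const{q}, \const{limit}, \const{tag} and \const{exact}
are invented for this example. It simply echoes the converted values as
plain text.

\begin{code}
:- use_module(library(http/http_parameters)).

search(Request) :-
    http_parameters(Request,
                    [ q(Query,     [ length >= 1 ]),        % required
                      limit(Limit, [ default(10),
                                     (nonneg;oneof([infinite])) ]),
                      tag(Tags,    [ list(atom) ]),         % zero or more
                      exact(Exact, [ default(false), boolean ])
                    ]),
    format('Content-type: text/plain~n~n'),
    format('~q~n', [search(Query, Limit, Tags, Exact)]).
\end{code}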
|
|
|
|
The last set of options is to support automatic generation of HTTP
|
|
API documentation from the sources.\footnote{This facility is under
|
|
development in ClioPatria; see \file{http_help.pl}.}
|
|
|
|
\begin{description}
|
|
\termitem{description}{+Atom}
|
|
Description of the parameter in plain text.
|
|
|
|
\termitem{group}{+Parameters, +Options}
|
|
Define a logical group of parameters. \arg{Parameters} are processed
|
|
as normal. \arg{Options} may include a description of the group. Groups
|
|
can be nested.
|
|
\end{description}
|
|
|
|
Below is an example:
|
|
|
|
\begin{code}
|
|
reply(Request) :-
|
|
http_parameters(Request,
|
|
[ title(Title, [ optional(true) ]),
|
|
name(Name, [ length >= 2 ]),
|
|
age(Age, [ between(0, 150) ])
|
|
]),
|
|
...
|
|
\end{code}
|
|
|
|
http_parameters/2 is the same as \term{http_parameters}{Request, Parameters, []}.
|
|
|
|
\predicate{http_parameters}{3}{+Request, ?Parameters, +Options}
|
|
In addition to http_parameters/2, the following options are defined.
|
|
|
|
\begin{description}
|
|
\termitem{form_data}{-Data}
|
|
Return the entire set of provided \arg{Name}=\arg{Value} pairs from
|
|
the GET or POST request. All values are returned as atoms.
|
|
|
|
\termitem{attribute_declarations}{:Goal}
|
|
If a parameter specification lacks the parameter options, call
|
|
\term{call}{Goal, +ParamName, -Options} to find the options. Intended
|
|
to share declarations over many calls to http_parameters/3. Using
|
|
this construct the above can be written as below.
|
|
|
|
\begin{code}
|
|
reply(Request) :-
|
|
http_parameters(Request,
|
|
[ title(Title),
|
|
name(Name),
|
|
age(Age)
|
|
],
|
|
[ attribute_declarations(param)
|
|
]),
|
|
...
|
|
|
|
param(title, [optional(true)]).
|
|
param(name, [length >= 2 ]).
|
|
param(age, [between(0, 150)]).
|
|
\end{code}
|
|
\end{description}
|
|
\end{description}
|
|
|
|
|
|
\subsection{Request format} \label{sec:request}
|
|
|
|
The body-code (see \secref{body}) is driven by a \arg{Request}. This
|
|
request is generated from http_read_request/2 defined in
|
|
\pllib{http/http_header}.
|
|
|
|
|
|
\begin{description}
|
|
\predicate{http_read_request}{2}{+Stream, -Request}
|
|
Reads an HTTP request from \arg{Stream} and unifies \arg{Request} with
|
|
the parsed request. \arg{Request} is a list of \term{\arg{Name}}{Value}
|
|
elements. It provides a number of predefined elements for the result
|
|
of parsing the first line of the request, followed by the additional
|
|
request parameters. The predefined fields are:
|
|
|
|
\begin{description}
|
|
\termitem{host}{Host}
|
|
If the request contains \verb$Host: $\arg{Host}, \arg{Host} is unified
with the host-name. If \arg{Host} is of the format \verb$<host>:<port>$,
\arg{Host} only describes \verb$<host>$ and a field \term{port}{Port},
where \arg{Port} is an integer, is added.
|
|
|
|
\termitem{input}{Stream}
|
|
The \arg{Stream} is passed along, allowing to read more data or
|
|
requests from the same stream. This field is always present.
|
|
|
|
\termitem{method}{Method}
|
|
\arg{Method} is one of \const{get}, \const{put} or \const{post}. This
|
|
field is present if the header has been parsed successfully.
|
|
|
|
\termitem{path}{Path}
|
|
Path associated to the request. This field is always present.
|
|
|
|
\termitem{peer}{Peer}
|
|
\arg{Peer} is a term \term{ip}{A,B,C,D} containing the IP address of
|
|
the contacting host.
|
|
|
|
\termitem{port}{Port}
|
|
Port requested. See \const{host} for details.
|
|
|
|
\termitem{request_uri}{RequestURI}
|
|
This is the untranslated string that follows the method in the
|
|
request header. It is used to construct the path and search fields
|
|
of the \arg{Request}. It is provided because reconstructing this
|
|
string from the path and search fields may yield a different value
|
|
due to different usage of percent encoding.
|
|
|
|
\termitem{search}{ListOfNameValue}
|
|
Search-specification of URI. This is the part after the \chr{?},
|
|
normally used to transfer data from HTML forms that use the
|
|
`\const{GET}' protocol. In the URL it consists of a www-form-encoded
|
|
list of \arg{Name}=\arg{Value} pairs. This is mapped to a list of
|
|
Prolog \arg{Name}=\arg{Value} terms with decoded names and values.
|
|
This field is only present if the location contains a
|
|
search-specification.
|
|
|
|
\termitem{http_version}{Major-Minor}
|
|
If the first line contains the \const{HTTP/}\arg{Major}.\arg{Minor}
|
|
version indicator this element indicates the HTTP version of the
|
|
peer. Otherwise this field is not present.
|
|
|
|
\termitem{cookie}{ListOfNameValue}
|
|
If the header contains a \const{Cookie} line, the value of the
|
|
cookie is broken down in \arg{Name}=\arg{Value} pairs, where the
|
|
\arg{Name} is the lowercase version of the cookie name as used
|
|
for the HTTP fields.
|
|
|
|
\termitem{set_cookie}{set_cookie(Name, Value, Options)}
|
|
If the header contains a \const{Set-Cookie} line, the cookie field
|
|
is broken down into the \arg{Name} of the cookie, the \arg{Value}
|
|
and a list of \arg{Name}=\arg{Value} pairs for additional options
|
|
such as \const{expire}, \const{path}, \const{domain} or \const{secure}.
|
|
\end{description}
|
|
|
|
If the first line of the request is tagged with
|
|
\const{HTTP/}\arg{Major}.\arg{Minor}, http_read_request/2 reads all
|
|
input up to the first blank line. This header consists of
|
|
\arg{Name}:\arg{Value} fields. Each such field appears as a term
|
|
\term{\arg{Name}}{Value} in the \arg{Request}, where \arg{Name} is
|
|
canonised for use with Prolog. Canonisation implies that the
|
|
\arg{Name} is converted to lower case and all occurrences of the
|
|
\chr{-} are replaced by \chr{_}. The value for the
|
|
\const{Content-length} fields is translated into an integer.
|
|
\end{description}
|
|
|
|
Here is an example:
|
|
|
|
\begin{code}
|
|
?- http_read_request(user, X).
|
|
|: GET /mydb?class=person HTTP/1.0
|
|
|: Host: gollem
|
|
|:
|
|
X = [ input(user),
|
|
method(get),
|
|
search([ class = person
|
|
]),
|
|
path('/mydb'),
|
|
http_version(1-0),
|
|
host(gollem)
|
|
].
|
|
\end{code}
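
Because \arg{Request} is a plain list, its fields can be accessed with
ordinary list predicates. The two helpers below are illustrative
sketches, not part of the library. request_location/3 extracts the path
and supplies an empty list when no \const{search} field is present.
canonical_field_name/2 mimics the canonisation of header names described
above and assumes downcase_atom/2 and atomic_list_concat/3 are
available.

\begin{code}
%  request_location(+Request, -Path, -Search)
%  Search defaults to [] because search/1 is only present if the
%  location contains a search-specification.
request_location(Request, Path, Search) :-
    memberchk(path(Path), Request),
    (   memberchk(search(Search0), Request)
    ->  Search = Search0
    ;   Search = []
    ).

%  canonical_field_name(+Name, -Canonical)
%  Lower-case Name and replace each - by _.
canonical_field_name(Name, Canonical) :-
    downcase_atom(Name, Lower),
    atomic_list_concat(Parts, '-', Lower),      % split on -
    atomic_list_concat(Parts, '_', Canonical).  % join with _
\end{code}

Applied to the request shown above, request_location/3 yields the path
\const{/mydb} and the search list \exam{[class=person]}, while
\exam{canonical_field_name('Content-Length', X)} binds \arg{X} to
\const{content_length}.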
|
|
|
|
|
|
\subsubsection{Handling POST requests}
|
|
|
|
Where the HTTP \const{GET} operation is intended to get a document,
|
|
using a \arg{path} and possibly some additional search information,
|
|
the \const{POST} operation is intended to hand potentially large
|
|
amounts of data to the server for processing.
|
|
|
|
The \arg{Request} parameter above contains the term \term{method}{post}.
|
|
The data posted is left on the input stream that is available through
|
|
the term \term{input}{Stream} from the \arg{Request} header. This data
|
|
can be read using http_read_data/3 from the HTTP client library. Here is
|
|
a demo implementation simply returning the parsed posted data as plain
|
|
text (assuming pp/1 pretty-prints the data).
|
|
|
|
\begin{code}
|
|
reply(Request) :-
|
|
member(method(post), Request), !,
|
|
http_read_data(Request, Data, []),
|
|
format('Content-type: text/plain~n~n', []),
|
|
pp(Data).
|
|
\end{code}
|
|
|
|
If the POST is initiated from a browser, content-type is generally
|
|
either \const{application/x-www-form-urlencoded} or
|
|
\const{multipart/form-data}. The latter is broken down automatically
|
|
if the plug-in \pllib{http/http_mime_plugin} is loaded.
|
|
|
|
|
|
\subsection{Running the server}
|
|
|
|
The functionality of the server should be defined in one Prolog file (of
|
|
course this file is allowed to load other files). Depending on the
|
|
wanted server setup this `body' is wrapped into a small Prolog file
|
|
combining the body with the appropriate server interface. There are
|
|
three supported server-setups. For most applications we advise the
|
|
multi-threaded server. Examples of this server architecture are the
|
|
\url[PlDoc]{http://www.swi-prolog.org/packages/pldoc.html} documentation
|
|
system and the \url[SeRQL]{http://www.swi-prolog.org/packages/SeRQL/}
|
|
Semantic Web server infrastructure.
|
|
|
|
All the server setups may be wrapped in a \jargon{reverse proxy} to
|
|
make them available from the public web-server as described in
|
|
\secref{proxy}.
|
|
|
|
|
|
\begin{itemlist}
|
|
\item [Using \pllib{thread_httpd} for a multi-threaded server]
|
|
This server exploits the multi-threaded version of SWI-Prolog, running
|
|
the user's body code in parallel from a pool of worker threads. As it avoids
the state engine and copying required in the event-driven server it is
generally faster and capable of handling multiple requests concurrently.
|
|
|
|
This server is harder to debug due to the involved threading, although
|
|
the GUI tracer provides reasonable support for multi-threaded
|
|
applications using the tspy/1 command. It can provide fast communication
|
|
to multiple clients and can be used for more demanding servers.
|
|
|
|
\item [Using \pllib{xpce_httpd} for an event-driven server]
|
|
This approach provides a single-threaded event-driven application. The
|
|
clients talk to XPCE sockets that collect an HTTP request. The server
|
|
infra-structure can talk to multiple clients simultaneously, but once
|
|
a request is complete the wrapper calls the user's goal and blocks all
|
|
further activity until the request is handled. Requests from multiple
|
|
clients are thus fully serialised in one Prolog process.
|
|
|
|
This server setup is very suitable for debugging as well as for embedded
servers in simple applications in a fairly controlled environment.
|
|
|
|
\item [Using \pllib{inetd_httpd} for server-per-client]
|
|
In this setup the Unix \program{inetd} super-daemon is used to initialise
|
|
a server for each connection. This approach is especially suitable for
|
|
servers that have a limited startup-time. In this setup a crashing
|
|
client does not influence other requests.
|
|
|
|
This server is very hard to debug as the server is not connected to the
|
|
user environment. It provides a robust implementation for servers that
|
|
can be started quickly.
|
|
\end{itemlist}
|
|
|
|
|
|
\subsubsection{Common server interface options}
|
|
|
|
All the server interfaces provide \term{http_server}{:Goal, +Options}
|
|
to create the server. The list of options differs, but the servers share
|
|
common options:
|
|
|
|
\begin{description}
|
|
\termitem{port}{?Port}
|
|
Specify the port to listen to for stand-alone servers. \arg{Port} is
|
|
either an integer or unbound. If unbound, it is unified to the selected
|
|
free port.
|
|
\end{description}
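
A minimal sketch of starting a stand-alone server with the
multi-threaded interface is shown below. The predicate names server/1
and reply/1 are chosen for this example. Calling \exam{server(Port)}
with \arg{Port} unbound lets the library pick a free port and unify
\arg{Port} with it.

\begin{code}
:- use_module(library(http/thread_httpd)).

%  server(?Port)
%  Start a server on Port that answers every request with reply/1.
server(Port) :-
    http_server(reply, [port(Port)]).

reply(_Request) :-
    format('Content-type: text/plain~n~n'),
    format('Hello World!~n').
\end{code}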
|
|
|
|
|
|
\subsubsection{Multi-threaded Prolog} \label{sec:mthttpd}
|
|
|
|
The \pllib{http/thread_httpd.pl} provides the infrastructure to manage
|
|
multiple clients using a pool of \jargon{worker-threads}. This realises
|
|
a popular server design, also seen in Java Tomcat and Microsoft .NET.
|
|
As a single persistent server process maintains communication to all
clients, startup time is not an important issue and the server can
|
|
easily maintain state-information for all clients.
|
|
|
|
In addition to the functionality provided by the other (XPCE and
|
|
inetd) servers, the threaded server can also be used to realise an
|
|
HTTPS server exploiting the \pllib{ssl} library. See option
|
|
\term{ssl}{+SSLOptions} below.
|
|
|
|
|
|
\begin{description}
|
|
\predicate{http_server}{2}{:Goal, +Options}
|
|
Create the server. \arg{Options} must provide the \term{port}{?Port}
|
|
option to specify the port the server should listen to. If \arg{Port} is
|
|
unbound an arbitrary free port is selected and \arg{Port} is unified to
|
|
this port-number. The server consists of a small Prolog thread
|
|
accepting new connections on \arg{Port} and dispatching these to a pool
|
|
of workers. Defined \arg{Options} are:
|
|
|
|
\begin{description}
|
|
\termitem{port}{?Port}
|
|
Port the server should listen to. If unbound \arg{Port} is unified with
|
|
the selected free port.
|
|
|
|
\termitem{workers}{+N}
|
|
Defines the number of worker threads in the pool. Default is to use
two workers. Choosing the optimal value for best performance is a
difficult task depending on the number of CPUs in your system and how
many resources are required for processing a request. Too high a number
makes your system switch too often between threads or even swap if there
is not enough memory to keep all threads in memory, while too low a
number causes clients to wait unnecessarily for other clients to complete.
|
|
See also http_workers/2.
|
|
|
|
\termitem{timeout}{+SecondsOrInfinite}
|
|
Determines the maximum period of inactivity handling a request. If no
|
|
data arrives within the specified time since the last data arrived the
|
|
connection raises an exception, the worker discards the client and
|
|
returns to the pool-queue for a new client. Default is \const{infinite},
|
|
making each worker wait forever for a request to complete. Without a
|
|
timeout, a worker may wait forever on a client that doesn't complete
|
|
its request.
|
|
|
|
\termitem{keep_alive_timeout}{+SecondsOrInfinite}
|
|
Maximum time to wait for new activity on \emph{Keep-Alive} connections.
|
|
Choosing the correct value for this parameter is hard. Disabling
|
|
Keep-Alive is bad for performance if the clients request multiple
|
|
documents for a single page. This may ---for example--- be caused by HTML
frames, HTML pages with images, associated CSS files, etc. Keeping
a connection open in the threaded model however prevents the thread
servicing the client from servicing other clients. The default is 5 seconds.
|
|
|
|
\termitem{local}{+KBytes}
|
|
Size of the local-stack for the workers. Default is taken from the
|
|
commandline option.
|
|
|
|
\termitem{global}{+KBytes}
|
|
Size of the global-stack for the workers. Default is taken from the
|
|
commandline option.
|
|
|
|
\termitem{trail}{+KBytes}
|
|
Size of the trail-stack for the workers. Default is taken from the
|
|
commandline option.
|
|
|
|
\termitem{ssl}{+SSLOptions}
|
|
Use SSL (Secure Sockets Layer) rather than plain TCP/IP. A server created
this way is accessed using the \const{https://} protocol. SSL allows for
encrypted communication to prevent others from tapping the wire as well as
|
|
improved authentication of client and server. The \arg{SSLOptions}
|
|
option list is passed to ssl_init/3. The port option of the main option
|
|
list is forwarded to the SSL layer. See the \pllib{ssl} library for
|
|
details.
|
|
\end{description}
|
|
|
|
\predicate{http_server_property}{2}{?Port, ?Property}
|
|
True if \arg{Property} is a property of the HTTP server running at
|
|
\arg{Port}. Defined properties are:
|
|
|
|
\begin{description}
|
|
\termitem{goal}{:Goal}
|
|
Goal used to start the server. This is often http_dispatch/1.
|
|
\termitem{start_time}{?Time}
|
|
Time-stamp when the server was created. See format_time/3 for
|
|
creating a human-readable representation.
|
|
\end{description}
|
|
|
|
\predicate{http_workers}{2}{:Port, ?Workers}
|
|
Query or manipulate the number of workers of the server identified by
|
|
\arg{Port}. If \arg{Workers} is unbound it is unified with the number
|
|
of running workers. If it is an integer greater than the current size
|
|
of the worker pool new workers are created with the same specification
|
|
as the running workers. If the number is less than the current size
|
|
of the worker pool, this predicate inserts a number of `quit' requests
|
|
in the queue, discarding the excess workers as they finish their jobs
|
|
(i.e.\ no worker is abandoned while serving a client).
|
|
|
|
This can be used to tune the number of workers for performance. Another
|
|
possible application is to reduce the pool to one worker to facilitate
|
|
easier debugging.
|
|
|
|
\predicate{http_stop_server}{2}{+Port, +Options}
|
|
Stop the HTTP server at \arg{Port}. Halting a server is done
|
|
\textit{gracefully}, which means that requests being processed are not
|
|
abandoned. The \arg{Options} list is for future refinements of this
|
|
predicate such as a forced immediate abort of the server, but is
|
|
currently ignored.
|
|
|
|
\predicate{http_current_worker}{2}{?Port, ?ThreadID}
|
|
True if \arg{ThreadID} is the identifier of a Prolog thread serving
|
|
\arg{Port}. This predicate is intended to allow arbitrary
interaction with the worker threads for development and
statistics.
|
|
|
|
\predicate{http_spawn}{2}{:Goal, +Spec}
|
|
Continue handling this request in a new thread running \arg{Goal}. After
|
|
http_spawn/2, the worker returns to the pool to process new requests. In
|
|
its simplest form, \arg{Spec} is the name of a thread pool as defined by
|
|
thread_pool_create/3. Alternatively it is an option list, whose options
|
|
are passed to thread_create_in_pool/4 if \arg{Spec} contains
|
|
\term{pool}{Pool} or to thread_create/3 if the pool option is not
|
|
present. If the dispatch module is used (see \secref{httpdispatch}),
|
|
spawning is normally specified as an option to the http_handler/3
|
|
registration.
|
|
|
|
We recommend the use of thread pools. They allow registration of a set
|
|
of threads using common characteristics, specify how many can be active
|
|
and what to do if all threads are active. A typical application may
|
|
define a small pool of threads with large stacks for computation
|
|
intensive tasks, and a large pool of threads with small stacks to serve
|
|
media. The declaration could be the one below, allowing for at most 3
concurrent solvers with a maximum backlog of 5, and at most 30 concurrent
tasks creating image thumbnails.
|
|
|
|
\begin{code}
|
|
:- use_module(library(thread_pool)).
:- use_module(library(http/http_dispatch)).
|
|
|
|
:- thread_pool_create(compute, 3,
|
|
[ local(20000), global(100000), trail(50000),
|
|
backlog(5)
|
|
]).
|
|
:- thread_pool_create(media, 30,
|
|
[ local(100), global(100), trail(100),
|
|
backlog(100)
|
|
]).
|
|
|
|
:- http_handler('/solve', solve, [spawn(compute)]).
|
|
:- http_handler('/thumbnail', thumbnail, [spawn(media)]).
|
|
\end{code}
|
|
\end{description}
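
The management predicates above can be combined as in the following
sketch. It is an illustration under stated assumptions rather than a
recommended setup: worker_demo/0 is a hypothetical helper, the pool
size of 4 is arbitrary and http_dispatch/1 is used as the goal.

\begin{code}
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% worker_demo: hypothetical helper that starts a server on a free
% port, inspects and shrinks its worker pool and stops it again.
worker_demo :-
    http_server(http_dispatch, [port(Port), workers(4)]),
    http_workers(Port, Count),
    format('Server at port ~w has ~d workers~n', [Port, Count]),
    http_server_property(Port, start_time(Stamp)),
    format_time(atom(Started), '%F %T', Stamp),
    format('Started at ~w~n', [Started]),
    http_workers(Port, 1),          % a single worker eases debugging
    http_stop_server(Port, []).
\end{code}

Shrinking the pool only queues `quit' requests, so the reduction takes
effect as workers become idle.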
|
|
|
|
|
|
\subsubsection{From an interactive Prolog session using XPCE}
|
|
|
|
The \pllib{http/xpce_httpd.pl} provides the infrastructure to manage
|
|
multiple clients with an event-driven control-structure. This version
|
|
can be started from an interactive Prolog session, providing a
|
|
comfortable infra-structure to debug the body of your server. It also
|
|
allows the combination of an (XPCE-based) GUI with web-technology in one
|
|
application.
|
|
|
|
\begin{description}
|
|
\predicate{http_server}{2}{:Goal, +Options}
|
|
Create an instance of \class{interactive_httpd}. \arg{Options} must
|
|
provide the \term{port}{?Port} option to specify the port the server
|
|
should listen to. If \arg{Port} is unbound an arbitrary free port is
|
|
selected and \arg{Port} is unified to this port-number. Currently
no other options are defined.
|
|
\end{description}
|
|
|
|
The file \file{demo_xpce} gives a typical example of this wrapper,
|
|
assuming \file{demo_body} defines the predicate reply/1.
|
|
|
|
\begin{code}
|
|
:- use_module(xpce_httpd).
|
|
:- use_module(demo_body).
|
|
|
|
server(Port) :-
|
|
    http_server(reply, [port(Port)]).
|
|
\end{code}
|
|
|
|
The created server opens a server socket at the selected address and
|
|
waits for incoming connections. On each accepted connection it collects
|
|
input until an HTTP request is complete. Then it opens an input stream
|
|
on the collected data and using the output stream directed to the XPCE
|
|
\class{socket} it calls http_wrapper/5. This approach is fundamentally
|
|
different compared to the other approaches:
|
|
|
|
\begin{itemlist}
|
|
\item [Server can handle multiple connections]
|
|
Whereas \emph{inetd} starts a server for each \emph{client}, and CGI
|
|
starts a server for each \emph{request}, this approach starts a single
|
|
server handling multiple clients.
|
|
|
|
\item [Requests are serialised]
|
|
All calls to \arg{Goal} are fully serialised, processing on behalf of a
|
|
new client can only start after all previous requests are answered. This
is easier and quite acceptable if the server is mostly inactive and
requests do not take long to process.
|
|
|
|
\item [Lifetime of the server]
|
|
The server lives as long as Prolog runs.
|
|
\end{itemlist}
|
|
|
|
|
|
\subsubsection{From (Unix) inetd}
|
|
|
|
All modern Unix systems handle a large number of the services they run
|
|
through the super-server \emph{inetd}. This program reads
|
|
\file{/etc/inetd.conf} and opens server-sockets on all ports defined in
|
|
this file. As a request comes in it accepts it and starts the associated
|
|
server such that standard I/O refers to the socket. This approach has
|
|
several advantages:
|
|
|
|
\begin{itemlist}
|
|
\item [Simplification of servers]
|
|
Servers don't have to know about sockets and socket operations.
|
|
|
|
\item [Centralised authorisation]
|
|
Using \emph{tcpwrappers} simple and effective firewalling of all
|
|
services is realised.
|
|
|
|
\item [Automatic start and monitor]
|
|
The inetd automatically starts the server `just-in-time' and starts
|
|
additional servers or restarts a crashed server according to the
|
|
specifications.
|
|
\end{itemlist}
|
|
|
|
The very small generic script for handling inetd based connections
|
|
is in \file{inetd_httpd}, defining http_server/2:
|
|
|
|
\begin{description}
|
|
\predicate{http_server}{2}{:Goal, +Options}
|
|
Initialises and runs http_wrapper/5 in a loop until failure or
|
|
end-of-file. This server does not support the \arg{Port} option
|
|
as the port is specified with the \program{inetd} configuration.
|
|
The only supported option is \arg{After}.
|
|
\end{description}
|
|
|
|
Here is the example from \file{demo_inetd}
|
|
|
|
\begin{code}
|
|
#!/usr/bin/pl -t main -q -f
|
|
:- use_module(demo_body).
|
|
:- use_module(inetd_httpd).
|
|
|
|
main :-
|
|
    http_server(reply, []).
|
|
\end{code}
|
|
|
|
With the above file installed in \file{/home/jan/plhttp/demo_inetd},
|
|
the following line in \file{/etc/inetd.conf} enables the server at port
4001 guarded by \emph{tcpwrappers}. After modifying \file{/etc/inetd.conf}, send the
|
|
daemon the \const{HUP} signal to make it reload its configuration.
|
|
For more information, please check \manref{inetd.conf}{5}.
|
|
|
|
\begin{code}
|
|
4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd
|
|
\end{code}
|
|
|
|
|
|
\subsubsection{MS-Windows}
|
|
|
|
There are rumours that \emph{inetd} has been ported to Windows.
|
|
|
|
|
|
\subsubsection{As CGI script}
|
|
|
|
To be done.
|
|
|
|
|
|
\subsubsection{Using a reverse proxy}
|
|
\label{sec:proxy}
|
|
|
|
There are three options for public deployment of a service. One is to
|
|
run it on a dedicated machine on port 80, the standard HTTP port. The
|
|
machine may be a virtual machine running ---for example--- under
|
|
\url[VMWARE]{http://www.vmware.com} or
|
|
\url[XEN]{http://www.cl.cam.ac.uk/research/srg/netos/xen/}. The
|
|
(virtual) machine approach isolates security threats and allows for
|
|
using a standard port. The server can also be hosted on a non-standard
|
|
port such as 8000, or 8080. Using non-standard ports however may cause
|
|
problems with intermediate proxy- and/or firewall policies. Isolation
|
|
can be achieved using a Unix \jargon{chroot} environment. Another
|
|
option, also recommended for \jargon{Tomcat} servers, is the use of
|
|
Apache \jargon{reverse proxies}. This causes the main web-server to
|
|
relay requests below a given URL location to our Prolog based server.
|
|
This approach has several advantages:
|
|
|
|
\begin{itemize}
|
|
\item We can access the server on port 80, just as for a dedicated
|
|
machine. We do not need a dedicated machine though and we only need
|
|
access to the Apache configuration.
|
|
\item As Apache is doing the front-line service, the Prolog server
|
|
is normally protected from malformed HTTP requests that could
|
|
result in denial of service or otherwise compromise the
|
|
server. In addition, Apache can provide encodings such as
|
|
compression to the outside world.
|
|
\end{itemize}
|
|
|
|
Note that the proxy technology can be combined with isolation methods
|
|
such as dedicated machines, virtual machines and chroot jails. The
|
|
proxy can also provide load balancing.
|
|
|
|
\paragraph{Setting up a reverse proxy}
|
|
|
|
The Apache reverse proxy setup is really simple. Ensure the modules
|
|
\const{proxy} and \const{proxy_http} are loaded. Then add two simple
|
|
rules to the server configuration. Below is an example that makes a
|
|
PlDoc server on port 4000 available from the main Apache server at port
|
|
80.
|
|
|
|
\begin{code}
|
|
ProxyPass /pldoc/ http://localhost:4000/pldoc/
|
|
ProxyPassReverse /pldoc/ http://localhost:4000/pldoc/
|
|
\end{code}
|
|
|
|
Apache rewrites the HTTP headers passing by, but using the above rules
|
|
it does not examine the content. This implies that URLs embedded in the
|
|
(HTML) content must use relative addressing. If the locations on the
|
|
public and Prolog server are the same (as in the example above) it is
|
|
allowed to use absolute locations. I.e.\ \const{/pldoc/search} is ok,
|
|
but \const{http://myhost.com:4000/pldoc/search} is \emph{not}. If
|
|
the locations on the server differ, locations must be relative (i.e.\
not start with \chr{/}).
|
|
|
|
This problem can also be solved using the contributed Apache module
|
|
\const{proxy_html} that can be instructed to rewrite URLs embedded in
|
|
HTML documents. In our experience, this is not trouble-free as URLs can
|
|
appear in many places in generated documents. JavaScript can create
|
|
URLs on the fly, which makes rewriting virtually impossible.
|
|
|
|
\subsection{The wrapper library}
|
|
|
|
The body is called by the module \pllib{http/http_wrapper.pl}. This
|
|
module realises the communication between the I/O streams and the body
|
|
described in \secref{body}. The interface is realised by
|
|
http_wrapper/5:
|
|
|
|
\begin{description}
|
|
\predicate{http_wrapper}{5}{:Goal, +In, +Out, -Connection, +Options}
|
|
Handle an HTTP request where \arg{In} is an input stream from the
|
|
client, \arg{Out} is an output stream to the client and \arg{Goal}
|
|
defines the goal realising the body. \arg{Connection} is unified to
|
|
\const{'Keep-alive'} if both ends of the connection want to continue the
|
|
connection or \const{close} if either side wishes to close the
|
|
connection.
|
|
|
|
This predicate reads an HTTP request-header from \arg{In}, redirects
|
|
current output to a memory file and then runs \exam{call(Goal,
|
|
Request)}, watching for exceptions and failure. If \arg{Goal} executes
|
|
successfully it generates a complete reply from the created output.
|
|
Otherwise it generates an HTTP server error with additional context
|
|
information derived from the exception.
|
|
|
|
http_wrapper/5 supports the following options:
|
|
|
|
\begin{description}
|
|
\termitem{request}{-Request}
|
|
Return the executed request to the caller.
|
|
|
|
\termitem{peer}{+Peer}
|
|
Add peer(Peer) to the request header handed to \arg{Goal}. The format
|
|
of \arg{Peer} is defined by tcp_accept/3 from the clib package.
|
|
\end{description}
|
|
|
|
\predicate{http:request_expansion}{2}{+RequestIn, -RequestOut}
|
|
This \jargon{multifile} hook predicate is called just before the goal
|
|
that produces the body, while the output is already redirected to
|
|
collect the reply. If it succeeds it must return a valid modified
|
|
request. It is allowed to throw exceptions as defined in
|
|
\secref{httpspecials}. It is intended for operations such as mapping
paths, denying access for certain requests or managing cookies. If it writes
output, these must be HTTP header fields that are added \emph{before}
header fields written by the body. The example below, from the
session management library (see \secref{httpsession}), sets a cookie.
|
|
|
|
\begin{code}
|
|
...,
|
|
format('Set-Cookie: ~w=~w; path=~w~n', [Cookie, SessionID, Path]),
|
|
...,
|
|
\end{code}
|
|
|
|
\predicate{http_current_request}{1}{-Request}
|
|
Get access to the currently executing request. \arg{Request} is the
|
|
same as handed to \arg{Goal} of http_wrapper/5 \emph{after} applying
|
|
rewrite rules as defined by http:request_expansion/2. Raises an
|
|
existence error if there is no request in progress.
|
|
|
|
\predicate{http_relative_path}{2}{+AbsPath, -RelPath}
|
|
Convert an absolute path (without host, fragment or search) into a path
|
|
relative to the current page, defined as the path component from the
|
|
current request (see http_current_request/1). This call is intended to
|
|
create reusable components returning relative paths for easier support
|
|
of reverse proxies.
|
|
|
|
If ---for whatever reason--- the conversion is not possible it simply
|
|
unifies \arg{RelPath} to \arg{AbsPath}.
|
|
\end{description}
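
The sketch below shows how http_relative_path/2 might be used in a
reusable page component. The non-terminal home_link//0 and the path
\const{/pldoc/index.html} are hypothetical names chosen for the
illustration, and html//1 from \pllib{http/html_write} is assumed for
generating the link.

\begin{code}
:- use_module(library(http/http_wrapper)).
:- use_module(library(http/html_write)).

% home_link//0: hypothetical reusable component.  The absolute path
% is converted into a path relative to the page being generated, so
% the link keeps working behind a reverse proxy.
home_link -->
    { http_relative_path('/pldoc/index.html', Rel) },
    html(a(href(Rel), 'Documentation home')).
\end{code}

The component is meant to be called while a request is being handled,
for example as \verb$\home_link$ inside the body passed to html//1.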
|
|
|
|
\input{httphost}
|
|
|
|
\input{httplog}
|
|
|
|
\subsection{Debugging Servers} \label{sec:debug}
|
|
|
|
The library \pllib{http/http_error.pl} defines a hook that decorates
|
|
uncaught exceptions with a stack-trace. This will generate a \emph{500
|
|
internal server error} document with a stack-trace. To enable this
|
|
feature, simply load this library. Please do note that providing
|
|
error information to the user simplifies the job of a hacker trying
|
|
to compromise your server. It is therefore not recommended to load
|
|
this file by default.
|
|
|
|
The example program \file{calc.pl} has the error handler loaded which
|
|
can be triggered by forcing a divide-by-zero in the calculator.
|
|
|
|
|
|
\subsection{Handling HTTP headers} \label{sec:httpheader}
|
|
|
|
The library \pllib{http/http_header} provides primitives for parsing and
|
|
composing HTTP headers. Its functionality is normally hidden by the
|
|
other parts of the HTTP server and client libraries. We provide a brief
|
|
overview of http_reply/3 which can be accessed from the reply body using
|
|
an exception as explained in \secref{httpspecials}.
|
|
|
|
|
|
\begin{description}
|
|
\predicate{http_reply}{3}{+Type, +Stream, +HdrExtra}
|
|
Compose a complete HTTP reply from the term \arg{Type} using additional
|
|
headers from \arg{HdrExtra} to the output stream \arg{Stream}.
|
|
\arg{HdrExtra} is a list of \term{Field}{Value}. \arg{Type} is
|
|
one of:
|
|
|
|
\begin{description}
|
|
\termitem{html}{+HTML}
|
|
Produce a HTML page using print_html/1, normally generated using the
|
|
\pllib{http/html_write} described in \secref{htmlwrite}.
|
|
|
|
\termitem{file}{+MimeType, +Path}
|
|
Reply the content of the given file, indicating the given MIME type.
|
|
|
|
\termitem{tmp_file}{+MimeType, +Path}
|
|
Similar to \term{file}{+MimeType, +Path}, but do not include a
|
|
modification time header.
|
|
|
|
\termitem{stream}{+Stream, +Len}
|
|
Reply using the next \arg{Len} characters from \arg{Stream}. The
|
|
user must provide the MIME type and other attributes through the
\arg{HdrExtra} argument.
|
|
|
|
\termitem{cgi_stream}{+Stream, +Len}
|
|
Similar to \term{stream}{+Stream, +Len}, but the data on \arg{Stream}
|
|
must contain an HTTP header.
|
|
|
|
\termitem{moved}{+URL}
|
|
Generate a ``301 Moved Permanently'' page with the given target
|
|
\arg{URL}.
|
|
|
|
\termitem{moved_temporary}{+URL}
|
|
Generate a ``302 Moved Temporarily'' page with the given target
|
|
\arg{URL}.
|
|
|
|
\termitem{see_other}{+URL}
|
|
Generate a ``303 See Other'' page with the given target \arg{URL}.
|
|
|
|
\termitem{not_found}{+URL}
|
|
Generate a ``404 Not Found'' page.
|
|
|
|
\termitem{forbidden}{+URL}
|
|
Generate a ``403 Forbidden'' page, denying access without challenging
|
|
the client.
|
|
|
|
\termitem{authorise}{+Method, +Realm}
|
|
Generate a ``401 Authorization Required'', requesting the client to
|
|
retry using proper credentials (i.e.\ user and password).
|
|
|
|
\termitem{not_modified}{}
|
|
Generate a ``304 Not Modified'' page, indicating the requested resource
|
|
has not changed since the indicated time.
|
|
|
|
\termitem{server_error}{+Error}
|
|
Generate a ``500 Internal server error'' page with a message generated
|
|
from a Prolog exception term (see print_message/2).
|
|
\end{description}
|
|
\end{description}
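
From a handler body, these reply types are normally reached by
throwing an \term{http_reply}{Type} exception rather than by calling
http_reply/3 directly. The following minimal sketch redirects an
outdated location; the handler name moved_page/1 and both paths are
hypothetical and serve only as an illustration.

\begin{code}
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

:- http_handler('/old', moved_page, []).

% moved_page(+Request): hypothetical handler.  Instead of writing a
% page it asks the wrapper to emit a "301 Moved Permanently" reply.
moved_page(_Request) :-
    throw(http_reply(moved('/new'))).
\end{code}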
|
|
|
|
|
|
\subsection{The \pllib{http/html_write} library} \label{sec:htmlwrite}
|
|
|
|
\newcommand{\elem}[1]{\const{#1}}
|
|
|
|
Producing output for the web in the form of an HTML document is a
|
|
requirement for many Prolog programs. Just using format/2 is
|
|
unsatisfactory as it leads to poorly readable programs generating poor
|
|
HTML. This library is based on using DCG rules.
|
|
|
|
The \pllib{http/html_write} structures the generation of HTML from a
|
|
program. It is an extensible library, providing a \jargon{DCG} framework
|
|
for generating legal HTML under (Prolog) program control. It is
|
|
especially useful for the generation of structured pages (e.g.\ tables)
|
|
from Prolog data structures.
|
|
|
|
The normal way to use this library is through the DCG html//1. This
|
|
non-terminal provides the central translation from a structured term
|
|
with embedded calls to additional translation rules to a list of atoms
|
|
that can then be printed using print_html/[1,2].
|
|
|
|
\begin{description}
|
|
\dcg{html}{1}{:Spec}
|
|
The DCG non-terminal html//1 is the main predicate of this library. It translates
|
|
the specification for an HTML page into a list of atoms that can be
|
|
written to a stream using print_html/[1,2]. The expansion rules of this
|
|
predicate may be extended by defining the multifile DCG
|
|
html_write:expand//1. \arg{Spec} is either a single specification or a
|
|
list of single specifications. Using nested lists is not allowed to
|
|
avoid ambiguity caused by the atom \const{[]}.
|
|
|
|
\begin{itemlist}
|
|
\item [Atomic data]
|
|
Atomic data is quoted using html_quoted//1.
|
|
|
|
\item [\arg{Fmt} - \arg{Args}]
|
|
\arg{Fmt} and \arg{Args} are used as format-specification and argument
|
|
list to format/3. The result is quoted and added to the output list.
|
|
|
|
\item [\bsl\arg{List}]
|
|
Escape sequence to add atoms directly to the output list. This can be
|
|
used to embed external HTML code or emit script output. \arg{List} is
|
|
a list of the following terms:
|
|
|
|
\begin{itemlist}
|
|
\item [\arg{Fmt} - \arg{Args}]
|
|
\arg{Fmt} and \arg{Args} are used as format-specification and argument
|
|
list to format/3. The result is added to the output list.
|
|
\item [\arg{Atomic}]
|
|
Atomic values are added directly to the output list.
|
|
\end{itemlist}
|
|
|
|
\item [\bsl\arg{Term}]
|
|
Invoke the non-terminal \arg{Term} in the calling module. This is the
|
|
common mechanism to realise abstraction and modularisation in generating
|
|
HTML.
|
|
|
|
\item [\arg{Module}:\arg{Term}]
|
|
Invoke the non-terminal <Module>:<Term>. This is similar to
|
|
\bsl\arg{Term} but allows for invoking grammar rules in external
|
|
packages.
|
|
|
|
\item [\&(Entity)]
|
|
Emit {\tt\&<Entity>;} or {\tt\&\#<Entity>;} if \arg{Entity} is an
|
|
integer. SWI-Prolog atoms and strings are represented as Unicode.
|
|
Explicit use of this construct is rarely needed because code-points that
|
|
are not supported by the output encoding are automatically converted
|
|
into character-entities.
|
|
|
|
\item [\term{Tag}{Content}]
|
|
Emit HTML element \arg{Tag} using \arg{Content} and no attributes.
|
|
\arg{Content} is handed to html//1. See \secref{htmllayout} for details
|
|
on the automatically generated layout.
|
|
|
|
\item [\term{Tag}{Attributes, Content}]
|
|
Emit HTML element \arg{Tag} using \arg{Attributes} and \arg{Content}.
|
|
\arg{Attributes} is either a single attribute or a list of attributes.
Each attribute is of the format \term{Name}{Value} or
|
|
\mbox{\arg{Name}=\arg{Value}}. \arg{Value} is the atomic attribute
|
|
value but allows for a limited functional notation:
|
|
|
|
\begin{itemlist}
|
|
\item [$A$ + $B$]
|
|
Concatenation of $A$ and $B$
|
|
\item [\term{encode}{Atom}]
|
|
Use www_form_encode/2 to create a valid URL component.
|
|
\item [\term{location_by_id}{ID}]
|
|
HTTP location of the HTTP handler with given ID. See http_location_by_id/2.
|
|
\item [List]
|
|
A list is handled as a URL `search' component. The list members are
|
|
terms of the format \mbox{\arg{Name} = \arg{Value}} or
|
|
\term{Name}{Value}. Values are encoded as in the encode option
|
|
described above.
|
|
\end{itemlist}
|
|
|
|
The example below generates a URL that references the predicate
|
|
set_lang/1 in the application with given parameters. The http_handler/3
|
|
declaration binds \const{/setlang} to the predicate set_lang/1 for which
|
|
we provide a very simple implementation. The code between \const{...}
|
|
is part of an HTML page showing the English flag which, when pressed,
|
|
calls \term{set_lang}{Request} where \arg{Request} contains the search
|
|
parameter \mbox{\const{lang} = \const{en}}. Note that the HTTP location
|
|
(path) \const{/setlang} can be moved without affecting this code.
|
|
|
|
\begin{code}
|
|
:- http_handler('/setlang', set_lang, []).
|
|
|
|
set_lang(Request) :-
|
|
http_parameters(Request,
|
|
[ lang(Lang, [])
|
|
]),
|
|
http_session_retractall(lang(_)),
|
|
http_session_assert(lang(Lang)),
|
|
reply_html_page(title('Switched language'),
|
|
p(['Switch language to ', Lang])).
|
|
|
|
|
|
...
|
|
html(a(href(location_by_id(set_lang) + [lang(en)]),
|
|
img(src('/www/images/flags/en.png')))),
|
|
...
|
|
\end{code}
|
|
|
|
|
|
|
|
\end{itemlist}
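
The specification forms listed above can be combined as in the sketch
below, which translates a specification into a token list and prints
it. The names greeting//1, signature//0 and show/1 are hypothetical and
only serve this illustration.

\begin{code}
:- use_module(library(http/html_write)).

% greeting//1: uses Fmt-Args content, a nested element list and
% \Term to call another non-terminal in the same module.
greeting(Name) -->
    html([ h1('Hello ~w'-[Name]),
           p([ 'Special characters such as ', b('A & B'),
               ' are quoted automatically.'
             ]),
           \signature
         ]).

% signature//0: element with an attribute and content.
signature -->
    html(p(class(signature), 'Generated from Prolog')).

% show(+Name): translate to tokens and print to current output.
show(Name) :-
    phrase(greeting(Name), Tokens),
    print_html(Tokens).
\end{code}

Calling \exam{show(bob)} writes the generated HTML fragment to the
current output stream.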
|
|
|
|
\dcg{page}{2}{:HeadContent, :BodyContent}
|
|
The DCG non-terminal page//2 generates a complete page, including the SGML
|
|
\const{DOCTYPE} declaration. \arg{HeadContent} are elements to be placed
|
|
in the \elem{head} element and \arg{BodyContent} are elements to be
|
|
placed in the \elem{body} element.
|
|
|
|
To achieve common style (background, page header and footer), it is
|
|
possible to define DCG non-terminals head//1 and/or body//1. Non-terminal page//2
|
|
checks for the definition of these non-terminals in the module it is called
|
|
from as well as in the \const{user} module. If no definition is found, it
|
|
creates a head with only the \arg{HeadContent} (note that the
|
|
\elem{title} is obligatory) and a \elem{body} with \const{bgcolor} set
|
|
to \const{white} and the provided \arg{BodyContent}.
|
|
|
|
Note that further customisation is easily achieved using html//1 directly
|
|
as page//2 is (besides handling the hooks) defined as:
|
|
|
|
\begin{code}
|
|
page(Head, Body) -->
|
|
html([ \['<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4.0//EN">\n'],
|
|
html([ head(Head),
|
|
body(bgcolor(white), Body)
|
|
])
|
|
]).
|
|
\end{code}
|
|
|
|
\dcg{page}{1}{:Contents}
|
|
This version of page//[1,2] only gives you the SGML \const{DOCTYPE}
|
|
and the \elem{HTML} element. \arg{Contents} is used to generate both the
|
|
head and body of the page.
|
|
|
|
\dcg{html_begin}{1}{+Begin}
|
|
Just open the given element. \arg{Begin} is either an atom or a
|
|
compound term. In the latter case the arguments are used as arguments
|
|
to the begin-tag. Some examples:
|
|
|
|
\begin{code}
|
|
html_begin(table)
|
|
html_begin(table(border(2), align(center)))
|
|
\end{code}
|
|
|
|
This predicate provides an alternative to using the
|
|
\bsl\arg{Command} syntax in the html//1 specification. The
|
|
following two fragments are the same. The preferred solution depends on
|
|
your preferences as well as whether the specification is generated or
|
|
entered by the programmer.
|
|
|
|
\begin{code}
|
|
table(Rows) -->
|
|
html(table([border(1), align(center), width('80%')],
|
|
[ \table_header,
|
|
\table_rows(Rows)
|
|
])).
|
|
|
|
% or
|
|
|
|
table(Rows) -->
|
|
html_begin(table(border(1), align(center), width('80%'))),
|
|
table_header,
|
|
    table_rows(Rows),
|
|
html_end(table).
|
|
\end{code}
|
|
|
|
\dcg{html_end}{1}{+End}
|
|
End an element. See html_begin//1 for details.
|
|
\end{description}
|
|
|
|
|
|
\subsubsection{Emitting HTML documents}
|
|
|
|
The non-terminal html//1 translates a specification into a list of
|
|
atoms and layout instructions. Currently the layout instructions are
|
|
terms of the format \term{nl}{N}, requesting at least \arg{N}
|
|
newlines. Multiple consecutive \term{nl}{N} terms are combined to an
|
|
atom containing the maximum of the requested number of newline
|
|
characters.
|
|
|
|
To simplify handing the data to a client or storing it into a file,
|
|
the following predicates are available from this library:
|
|
|
|
\begin{description}
|
|
\predicate{reply_html_page}{2}{:Head, :Body}
|
|
Same as \term{reply_html_page}{default, Head, Body}.
|
|
|
|
\predicate{reply_html_page}{3}{+Style, :Head, :Body}
|
|
Writes an HTML page preceded by an HTTP header as required
|
|
by \pllib{http_wrapper} (CGI-style). Here is a simple typical
|
|
example:
|
|
|
|
\begin{code}
|
|
reply(Request) :-
|
|
reply_html_page(title('Welcome'),
|
|
[ h1('Welcome'),
|
|
p('Welcome to our ...')
|
|
]).
|
|
\end{code}
|
|
|
|
The header and footer of the page can be hooked using the grammar-rules
|
|
user:head//2 and user:body//2. The first argument passed to these hooks
|
|
is the \arg{Style} argument of reply_html_page/3 and the second is the
|
|
2nd (for head//2) or 3rd (for body//2) argument of reply_html_page/3.
|
|
These hooks can be used to restyle the page, typically by embedding the
|
|
real body content in a \elem{div}. E.g., the following code provides a
|
|
menu on top of each page that is identified using the style
|
|
\textit{myapp}.
|
|
|
|
\begin{code}
|
|
:- multifile
|
|
user:body//2.
|
|
|
|
user:body(myapp, Body) -->
|
|
html(body([ div(id(top), \application_menu),
|
|
div(id(content), Body)
|
|
])).
|
|
\end{code}
|
|
|
|
Redefining the \elem{head} can be used to pull in scripts, but
|
|
typically html_requires//1 provides a more modular approach for
|
|
pulling scripts and CSS-files.
|
|
|
|
\predicate{print_html}{1}{+List}
|
|
Print the token list to the Prolog current output stream.
|
|
|
|
\predicate{print_html}{2}{+Stream, +List}
|
|
Print the token list to the specified output stream.
|
|
|
|
\predicate{html_print_length}{2}{+List, -Length}
|
|
When calling print_html/[1,2] on \arg{List}, \arg{Length}
|
|
characters will be produced. Knowing the length is needed to
|
|
provide the \const{Content-length} field of an HTTP reply-header.
|
|
\end{description}
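
As a sketch of how these predicates fit together, the code below emits
a CGI-style header including \const{Content-length} followed by the
page itself. The helper emit_page/2 is a hypothetical name. Note the
assumption that the output encoding uses one byte per character, as
html_print_length/2 counts characters.

\begin{code}
:- use_module(library(http/html_write)).

% emit_page(+Out, +Spec): hypothetical helper.  Translate Spec using
% html//1, compute the length of the output and write header and page.
emit_page(Out, Spec) :-
    phrase(html(Spec), Tokens),
    html_print_length(Tokens, Length),
    format(Out, 'Content-type: text/html~n', []),
    format(Out, 'Content-length: ~d~n~n', [Length]),
    print_html(Out, Tokens).
\end{code}

In normal handlers reply_html_page/2 is the more convenient choice, as
the wrapper then takes care of the header.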
|
|
|
|
|
|
\input{post.tex}
|
|
|
|
\subsubsection{Adding rules for html//1}
|
|
|
|
In some cases it is practical to extend the translations imposed by
|
|
html//1. When using XPCE for example, it is convenient to be able to
define a default translation to HTML for objects. We also used this
|
|
technique to define translation rules for the output of the SWI-Prolog
|
|
\pllib{sgml} package.
|
|
|
|
The html//1 non-terminal first calls the multifile ruleset html_write:expand//1.
|
|
|
|
\begin{description}
|
|
\dcg{html_write:expand}{1}{+Spec} Hook to add additional
|
|
translation rules for html//1.
|
|
|
|
\dcg{html_quoted}{1}{+Atom} Emit the text
|
|
in \arg{Atom}, inserting entity-references for the SGML special
|
|
characters \verb$<&>$.
|
|
|
|
\dcg{html_quoted_attribute}{1}{+Atom} Emit the
|
|
text in \arg{Atom} suitable for use as an SGML attribute, inserting
|
|
entity-references for the SGML special characters \verb$<&>"$.
|
|
\end{description}
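
The sketch below illustrates the hook with a hypothetical application
term \term{date}{Y,M,D}, which is not part of the library. Assuming the
hook is loaded, such a term can appear in a specification passed to
html//1, for example \exam{html(p(['Released on ', date(2008,1,31)]))},
and is emitted as quoted text.

\begin{code}
:- use_module(library(http/html_write)).

:- multifile
        html_write:expand//1.

%       date(Y,M,D) is a hypothetical term used for illustration only
html_write:expand(date(Y,M,D)) -->
        { format(atom(Text), '~w-~w-~w', [Y,M,D]) },
        html_quoted(Text).
\end{code}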
|
|
|
|
|
|
\subsubsection{Generating layout} \label{sec:htmllayout}
|
|
|
|
Though not strictly necessary, the library attempts to generate
|
|
reasonable layout in SGML output. It does this only by inserting
|
|
newlines before and after tags. It does this on the basis of the
|
|
multifile predicate html_write:layout/3.
|
|
|
|
\begin{description}
|
|
\predicate{html_write:layout}{3}{+Tag, -Open, -Close}
|
|
Specify the layout conventions for the element \arg{Tag}, which is a
|
|
lowercase atom. \arg{Open} is a term \arg{Pre}-\arg{Post}. It defines
|
|
that the element should have at least \arg{Pre} newline characters
|
|
before and \arg{Post} after the tag. The \arg{Close} specification is
|
|
similar, but in addition allows for the atom \const{-}, requesting the
|
|
output generator to omit the close-tag altogether or \const{empty},
|
|
telling the library that the element has declared empty content. In this
|
|
case the close-tag is not emitted either, but in addition html//1
|
|
interprets \arg{Arg} in \term{Tag}{Arg} as a list of attributes rather
|
|
than the content.
|
|
|
|
A tag that does not appear in this table is emitted without additional
|
|
layout. See also print_html/[1,2]. Please consult the
|
|
library source for examples.
|
|
\end{description}
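
As an illustration, the declarations below add layout rules for two
hypothetical elements, \elem{mybox} and \elem{mybreak}, that are assumed
not to be known to the library. The first requests at least two newlines
before the open-tag and after the close-tag; the second is declared to
have empty content, so no close-tag is emitted.

\begin{code}
:- multifile
        html_write:layout/3.

%       mybox and mybreak are hypothetical element names
html_write:layout(mybox,   2-1, 1-2).
html_write:layout(mybreak, 0-1, empty).
\end{code}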
|
|
|
|
|
|
\subsubsection{Examples}
|
|
|
|
In the following example we will generate a table of Prolog predicates
|
|
we find from the SWI-Prolog help system based on a keyword. The primary
|
|
database is defined by the predicate predicate/5. We will make hyperlinks
|
|
for the predicates pointing to their documentation.
|
|
|
|
\begin{code}
html_apropos(Kwd) :-
        findall(Pred, apropos_predicate(Kwd, Pred), Matches),
        phrase(apropos_page(Kwd, Matches), Tokens),
        print_html(Tokens).

%       emit page with title, header and table of matches

apropos_page(Kwd, Matches) -->
        page([ title(['Predicates for ', Kwd])
             ],
             [ h2(align(center),
                  ['Predicates for ', Kwd]),
               table([ align(center),
                       border(1),
                       width('80%')
                     ],
                     [ tr([ th('Predicate'),
                            th('Summary')
                          ])
                     | \apropos_rows(Matches)
                     ])
             ]).

%       emit the rows for the body of the table.

apropos_rows([]) -->
        [].
apropos_rows([pred(Name, Arity, Summary)|T]) -->
        html([ tr([ td(\predref(Name/Arity)),
                    td(em(Summary))
                  ])
             ]),
        apropos_rows(T).

%       predref(Name/Arity)
%
%       Emit Name/Arity as a hyperlink to
%
%               /cgi-bin/plman?name=Name&arity=Arity
%
%       we must do form-encoding for the name as it may contain illegal
%       characters.  www_form_encode/2 is defined in library(url).

predref(Name/Arity) -->
        { www_form_encode(Name, Encoded),
          format(atom(Href), '/cgi-bin/plman?name=~w&arity=~w',
                 [Encoded, Arity])
        },
        html(a(href(Href), [Name, /, Arity])).

%       Find predicates from a keyword. '$apropos_match' is an internal
%       undocumented predicate.

apropos_predicate(Pattern, pred(Name, Arity, Summary)) :-
        predicate(Name, Arity, Summary, _, _),
        (   '$apropos_match'(Pattern, Name)
        ->  true
        ;   '$apropos_match'(Pattern, Summary)
        ).
\end{code}
|
|
|
|
|
|
|
|
\subsubsection{Remarks on the \pllib{http/html_write} library}
|
|
|
|
This library is the result of various attempts to arrive at a more
|
|
satisfactory and Prolog-minded way to produce HTML text from a program.
|
|
We have been using Prolog for the generation of web pages in a number of
|
|
projects. Just using format/2 never was a real
|
|
option, generating error-prone HTML from clumsy syntax. We started
|
|
with a layer on top of format/2, keeping track of the current nesting
|
|
and thus always capable of properly closing the environment.
|
|
|
|
DCG based translation however naturally exploits Prolog's term-rewriting
|
|
primitives. If generation fails for whatever reason it is easy to
|
|
produce an alternative document (for example holding an error message).
|
|
|
|
The approach presented in this library has been used in combination with
|
|
\pllib{http/httpd} in three projects: viewing RDF in a browser,
|
|
selecting fragments from an analysed document and presenting parts of
|
|
the XPCE documentation using a browser. It has proven to be
|
|
able to deal with generating pages quickly and comfortably.
|
|
|
|
In a future version we will probably define a goal_expansion/2 to do
|
|
compile-time optimisation of the library. Quotation of known text and
|
|
invocation of sub-rules using the \bsl\arg{RuleSet} and
\arg{Module}:\arg{RuleSet} operators are costly operations in the analysis
|
|
that can be done at compile-time.
|
|
|
|
\input{jswrite}
|
|
\input{httppath}
|
|
\input{htmlhead}
|
|
\input{httppwp}
|
|
|
|
|
|
\subsection{Security}
|
|
|
|
Writing servers is an inherently dangerous job that should be carried out
with some care. You have basically started a program on a
|
|
public terminal and invited strangers to use it. When using the
|
|
interactive server or inetd based server the server runs under your
|
|
privileges. When used as a CGI script, it runs with the privileges of your
|
|
web-server. Though it should not be possible to fatally compromise a
|
|
Unix machine using user privileges, getting unconstrained access to the
|
|
system is highly undesirable.
|
|
|
|
Symbolic languages have an additional handicap in their inherent
|
|
possibilities to modify the running program and dynamically create goals
|
|
(this also applies to the popular perl and java scripting languages).
|
|
Here are some guidelines.
|
|
|
|
\begin{itemlist}
|
|
\item [Check your input]
|
|
Hardly anything can go wrong if you check the validity of
|
|
query-arguments before formulating an answer.
|
|
|
|
\item [Check filenames]
|
|
If part of the query consists of filenames or directories, check
|
|
them. This also applies to files you only read. Names such as
\file{/etc/passwd}, but also \file{../../../../../etc/passwd}, are
tried by experienced hackers to learn about the system they want
|
|
to attack. So, expand provided names using absolute_file_name/[2,3]
|
|
and verify they are inside a folder reserved for the server. Avoid
|
|
symbolic links from this subtree to the outside world. The example
|
|
below checks validity of filenames. The first call ensures proper
|
|
canonisation of the paths to avoid a mismatch due to
|
|
symbolic links or other filesystem ambiguities.
|
|
|
|
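\begin{code}
check_file(File) :-
        absolute_file_name('/path/to/reserved/area', Reserved),
        absolute_file_name(File, Tried),
        % require a directory separator after Reserved, such that a
        % sibling like /path/to/reserved/area2 is not accepted
        atom_concat(Reserved, '/', Prefix),
        atom_concat(Prefix, _, Tried).
\end{code}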
|
|
|
|
\item [Check scripts]
|
|
Should input in any way activate external scripts using shell/1
|
|
or \exam{open(pipe(Command), ...)}, verify the argument once more.
|
|
|
|
\item [Check meta-calling]
|
|
\emph{The} attractive situation for you and your attacker is below:
|
|
|
|
\begin{code}
reply(Query) :-
        member(search(Args), Query),
        member(action=Action, Query),
        member(arg=Arg, Query),
        call(Action, Arg).      % NEVER EVER DO THIS!
\end{code}
|
|
|
|
All your attacker has to do is specify \arg{Action} as \const{shell}
|
|
and \arg{Arg} as \const{/bin/sh} and he has an uncontrolled shell!
|
|
\end{itemlist}
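
A safer alternative to the meta-call above is sketched below. The name
supplied by the client is looked up in a fixed table and only handlers
listed there are ever called. The table action_handler/2 and the handlers
show_item/1 and count_chars/1 are hypothetical names introduced for this
illustration only.

\begin{code}
:- use_module(library(error)).

%       Hypothetical table: public action name -> handler predicate
action_handler(show,  show_item).
action_handler(count, count_chars).

reply(Query) :-
        memberchk(action=Name, Query),
        memberchk(arg=Arg, Query),
        (   action_handler(Name, Handler)
        ->  call(Handler, Arg)          % Handler comes from our table
        ;   domain_error(action, Name)  % unknown action: refuse
        ).

show_item(Arg) :-
        format('Content-type: text/plain~n~n'),
        format('Item: ~w~n', [Arg]).

count_chars(Arg) :-
        atom_length(Arg, Length),
        format('Content-type: text/plain~n~n'),
        format('~w has ~d characters~n', [Arg, Length]).
\end{code}

With this design a request for \const{action=shell} raises an error
instead of starting a shell, because \const{shell} does not appear in
the table.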
|
|
|
|
|
|
\subsection{Tips and tricks}
|
|
|
|
\begin{itemlist}
|
|
\item [URL Locations]
|
|
With an application in mind, it is tempting to make all URL
|
|
locations short and directly connected to the root (\const{/}). This is
|
|
\emph{not} a good idea. It is advised to have all locations in a server
below a directory with an informative name. Consider making the root
|
|
location something that can be changed using a global setting.
|
|
|
|
\begin{itemize}
|
|
\item Page generating code can easily be reused. Using locations
|
|
directly below the root however increases the likelihood
|
|
of conflicts.
|
|
\item Multiple servers can be placed behind the same public
|
|
server as explained in \secref{proxy}. Using a common
|
|
and fairly unique root, redirection is much easier and
|
|
less likely to lead to conflicts.
|
|
\end{itemize}
|
|
|
|
\item [Debugging]
|
|
Please check the section \url[``Thread-support
|
|
library(threadutil)'']{http://gollem.science.uva.nl/SWI-Prolog/Manual/thutil.html}
|
|
of the SWI-Prolog reference manual.
|
|
\end{itemlist}
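
The advice on URL locations can be followed using the path aliases of
\pllib{http/http_path}. In the sketch below, the alias \const{myapp} and
the handler welcome/1 are hypothetical. All locations of the application
are declared relative to the alias, which in turn is defined below
\const{root}, so the whole application moves when the root location is
changed.

\begin{code}
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_path)).
:- use_module(library(http/html_write)).

:- multifile
        http:location/3.
:- dynamic
        http:location/3.

%       myapp is a hypothetical alias; all handlers live below it
http:location(myapp, root(myapp), []).

:- http_handler(myapp(welcome), welcome, []).

welcome(_Request) :-
        reply_html_page(title('Welcome'),
                        [ h1('Welcome') ]).
\end{code}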
|
|
|
|
|
|
\section{Transfer encodings}
|
|
\label{sec:transfer}
|
|
|
|
\index{chunked,encoding}%
|
|
\index{deflate,encoding}%
|
|
The HTTP protocol provides for \jargon{transfer encodings}. These define
|
|
filters applied to the data described by the \const{Content-type}. The
|
|
two most popular transfer encodings are \const{chunked} and
|
|
\const{deflate}. The \const{chunked} encoding avoids the need for
|
|
a \const{Content-length} header, sending the data in chunks, each of
|
|
which is preceded by a length. The \const{deflate} encoding provides
|
|
compression.
|
|
|
|
Transfer-encodings are supported by filters defined as foreign libraries
|
|
that realise an encoding/decoding stream on top of another stream.
|
|
Currently there are two such libraries: \pllib{http/http_chunked.pl}
|
|
and \pllib{zlib.pl}.
|
|
|
|
There is an emerging hook interface dealing with transfer encodings. The
|
|
\pllib{http/http_chunked.pl} provides a hook used by
|
|
\pllib{http/http_open.pl} to support chunked encoding in http_open/3.
|
|
Note that both \file{http_open.pl} \emph{and} \file{http_chunked.pl}
|
|
must be loaded for http_open/3 to support chunked encoding.
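
A minimal sketch of a client that can read chunked replies is shown
below. Following the text above, it loads both libraries before calling
http_open/3. The predicate fetch/2 is a hypothetical helper, and the
library names are those used in this document.

\begin{code}
:- use_module(library(http/http_open)).
:- use_module(library(http/http_chunked)).      % adds chunked support
:- use_module(library(readutil)).

%       fetch(+URL, -Codes): hypothetical helper that reads the
%       document at URL into a list of character codes.
fetch(URL, Codes) :-
        http_open(URL, In, []),
        call_cleanup(read_stream_to_codes(In, Codes),
                     close(In)).
\end{code}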
|
|
|
|
\subsection{The \pllib{http/http_chunked} library}
|
|
|
|
\begin{description}
|
|
\predicate{http_chunked_open}{3}{+RawStream, -DataStream, +Options}
|
|
Create a stream to realise HTTP chunked encoding or decoding. The
|
|
technique is similar to library(zlib), using a Prolog stream as a filter
|
|
on another stream. See online documentation at
|
|
\url{http://gollem.science.uva.nl/SWI-Prolog/pldoc/} for details.
|
|
\end{description}
|
|
|
|
\input{json.tex}
|
|
|
|
\section{Status}
|
|
|
|
The SWI-Prolog HTTP library is in active use in a large number of
|
|
projects. It is considered one of the SWI-Prolog core libraries that is
|
|
actively maintained and regularly extended with new features. This is
|
|
particularly true for the multi-threaded server. The XPCE and inetd based
|
|
servers are not widely used.
|
|
|
|
This library is by no means complete and you are free to extend it.
|
|
|
|
\printindex
|
|
|
|
\end{document}
|
|
|