\documentclass[11pt]{article}
\usepackage{times}
\usepackage{pl}
\usepackage{plpage}
\usepackage{html}

\makeindex

\onefile
\htmloutput{html}                       % Output directory
\htmlmainfile{index}                    % Main document file
\bodycolor{white}                       % Page colour

\sloppy

\renewcommand{\runningtitle}{SWI-Prolog HTTP support}

\begin{document}

\title{SWI-Prolog HTTP support}
\author{Jan Wielemaker \\
        HCS, \\
        University of Amsterdam \\
        The Netherlands \\
        E-mail: \email{J.Wielemaker@uva.nl}}

\maketitle

\begin{abstract}
This article documents the package HTTP, a series of libraries for accessing data on HTTP servers as well as providing HTTP server capabilities from SWI-Prolog. Both server and client are modular libraries. The server can be operated from the Unix \program{inetd} super-daemon as well as a stand-alone server that runs on all platforms supported by SWI-Prolog.
\end{abstract}

\vfill
\pagebreak
\tableofcontents
\vfill
\vfill
\newpage

\section{Introduction}

The HTTP (HyperText Transfer Protocol) is the W3C standard protocol for transferring information between a web-client (browser) and a web-server. The protocol is a simple \emph{envelope} protocol where standard name/value pairs in the header are used to split the stream into messages and communicate about the connection-status. Many languages have client and/or server libraries to deal with the HTTP protocol, making it a suitable candidate for general purpose client-server applications.

In this document we describe a modular infrastructure to access web-servers from SWI-Prolog and turn Prolog into a web-server.
\subsection*{Acknowledgements}

This work has been carried out under the following projects: \url[GARP]{http://hcs.science.uva.nl/projects/GARP/}, \url[MIA]{http://www.ins.cwi.nl/projects/MIA/}, \url[IBROW]{http://hcs.science.uva.nl/projects/ibrow/home.html}, \url[KITS]{http://kits.edte.utwente.nl/} and \url[MultiMediaN]{http://e-culture.multimedian.nl/}.

The following people have pioneered parts of this library and contributed bug reports and suggestions for improvements: Anjo Anjewierden, Bert Bredeweg, Wouter Jansweijer, Bob Wielinga, Jacco van Ossenbruggen, Michiel Hildebrandt, Matt Lilley and Keri Harris.

\section{The HTTP client libraries}

This package provides two libraries for building HTTP clients. The first, \pllib{http/http_open}, is a lightweight library for opening an HTTP URL as a Prolog stream. It can only deal with the HTTP GET method. The second, \pllib{http/http_client}, is a more advanced library dealing with \jargon{keep-alive}, \jargon{chunked transfer} and a plug-in mechanism providing conversions based on the MIME content-type.

\input{httpopen.tex}

\subsection{The \pllib{http/http_client} library}
\label{sec:httpclient}

The \pllib{http/http_client} library provides more powerful access to reading HTTP resources, providing \jargon{keep-alive} connections, \jargon{chunked} transfer and conversion of the content, such as breaking down \jargon{multipart} data, parsing HTML, etc. The library announces itself as providing \const{HTTP/1.1}.

\begin{description}
    \predicate{http_get}{3}{+URL, -Reply, +Options}
Performs an HTTP GET request on the given URL and then reads the reply using http_read_data/3. Defined options are:

\begin{description}
    \termitem{connection}{ConnectionType}
If \const{close} (default) a new connection is created for this request and closed after the request has completed. If \const{'Keep-Alive'} the library checks for an open connection on the requested host and port and re-uses this connection.
The connection is left open if the other party confirms the keep-alive and closed otherwise. \termitem{http_version}{Major-Minor} Indicate the HTTP protocol version used for the connection. Default is \const{1.1}. \termitem{proxy}{+Host, +Port} Use an HTTP proxy to connect to the outside world. \termitem{proxy_authorization}{+Authorization} Send authorization to the proxy. Otherwise the same as the \const{authorization} option. \termitem{timeout}{+Timeout} If provided, set a timeout on the stream using set_stream/2. With this option if no new data arrives within \arg{Timeout} seconds the stream raises an exception. Default is to wait forever (\const{infinite}). \termitem{user_agent}{+Agent} Defines the value of the \const{User-Agent} field of the HTTP header. Default is \const{SWI-Prolog (http://www.swi-prolog.org)}. \termitem{range}{+Range} Ask for partial content. \arg{Range} is a term \term{\arg{Unit}}{From, To}, where \arg{From} is an integer and \arg{To} is either an integer or the atom \const{end}. HTTP 1.1 only supports \arg{Unit} = \const{bytes}. E.g., to ask for bytes 1000-1999, use the option \exam{range(bytes(1000,1999))}. \termitem{request_header}{Name = Value} Add a line "\arg{Name}: \arg{Value}" to the HTTP request header. Both name and value are added uninspected and literally to the request header. This may be used to specify accept encodings, languages, etc. Please check the RFC2616 (HTTP) document for available fields and their meaning. \termitem{reply_header}{Header} Unify \arg{Header} with a list of \arg{Name}=\arg{Value} pairs expressing all header fields of the reply. See http_read_request/2 for the result format. \end{description} Remaining options are passed to http_read_data/3. \predicate{http_post}{4}{+URL, +In, -Reply, +Options} Performs a HTTP POST request on the given URL. It is equivalent to http_get/3, except for providing an \jargon{input document}, which is posted using http_post_data/3. 
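
The sketch below illustrates how the two predicates above might be called. It relies only on options documented in this section; the predicate names fetch_page/2 and submit_form/3 and the form field \const{name} are hypothetical and serve as illustration only.

\begin{code}
:- use_module(library(http/http_client)).

%  fetch_page(+URL, -Codes)
%  Hypothetical helper: fetch URL, give up if no data arrives
%  for 30 seconds, and return the body uninterpreted as a list
%  of character codes (to(codes) is passed on to http_read_data/3).
fetch_page(URL, Codes) :-
        http_get(URL, Codes,
                 [ timeout(30),
                   to(codes)
                 ]).

%  submit_form(+URL, +Name, -Reply)
%  Hypothetical helper: post one url-encoded form field.
submit_form(URL, Name, Reply) :-
        http_post(URL, form([name = Name]), Reply, []).
\end{code}

Without the \term{to}{codes} option the reply is converted according to its content-type, as described for http_read_data/3.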
    \predicate{http_read_data}{3}{+Header, -Data, +Options}
Read data from an HTTP stream. Normally called from http_get/3 or http_post/4. When dealing with HTTP POST in a server this predicate can be used to retrieve the posted data. \arg{Header} is the parsed header. \arg{Options} is a list of \term{\arg{Name}}{Value} pairs to guide the translation of the data. The following options are supported:

\begin{description}
    \termitem{to}{Target}
Do not try to interpret the data according to the MIME-type, but return it literally according to \arg{Target}, which is one of:

\begin{description}
    \termitem{stream}{Output}
Append the data to the given stream, which must be a Prolog stream open for writing. This can be used to save the data in a (memory-)file or XPCE object, forward it to a process using a pipe, etc.
    \termitem{atom}{}
Return the result as an atom. Though SWI-Prolog has no limit on the size of atoms and provides atom-garbage collection, this option should be used with care.%
\footnote{Currently atom-garbage collection is activated after the creation of 10,000 atoms.}
    \termitem{codes}{}
Return the page as a list of character-codes. This is especially useful for parsing it using grammar rules.
\end{description}

    \termitem{content_type}{Type}
Overrule the \const{Content-Type} as provided by the HTTP reply header. Intended as a work-around for badly configured servers.
\end{description}

If no \term{to}{Target} option is provided the library tries the registered plug-in conversion filters. If none of these succeed it tries the built-in content-type handlers or returns the content as an atom. The built-in content filters are described below. The provided plug-ins are described in the following sections.

\begin{description}
    \termitem{application/x-www-form-urlencoded}{}
This is the default encoding mechanism for POST requests issued by a web-browser. It is broken down to a list of \arg{Name} = \arg{Value} terms.
\end{description}

Finally, if all else fails the content is returned as an atom.

    \predicate{http_post_data}{3}{+Data, +Stream, +ExtraHeader}
Write an HTTP POST request to \arg{Stream} using data from \arg{Data} and passing the additional extra headers from \arg{ExtraHeader}. \arg{Data} is one of:

\begin{description}
    \termitem{html}{+HTMLTokens}
Send an HTML token string as produced by the library \pllib{html_write} described in \secref{htmlwrite}.
    \termitem{file}{+File}
Send the contents of \arg{File}. The MIME type is derived from the filename extension using file_mime_type/2.
    \termitem{file}{+Type, +File}
Send the contents of \arg{File} using the provided MIME type, i.e.\ claiming the \const{Content-type} equals \arg{Type}.
    \termitem{codes}{+Codes}
Same as \term{codes}{text/plain, Codes}.
    \termitem{codes}{+Type, +Codes}
Send a string (list of character codes) using the indicated MIME-type.
    \termitem{cgi_stream}{+Stream, +Len}
Read the input from \arg{Stream} which, like CGI data, starts with a partial HTTP header. The fields of this header are merged with the provided \arg{ExtraHeader} fields. The first \arg{Len} characters of \arg{Stream} are used.
    \termitem{form}{+ListOfParameter}
Send data of the MIME type \const{application/x-www-form-urlencoded} as produced by browsers issuing a POST request from an HTML form. \arg{ListOfParameter} is a list of \arg{Name}=\arg{Value} or \mbox{\arg{Name}(\arg{Value})}.
    \termitem{form_data}{+ListOfData}
Send data of the MIME type \const{multipart/form-data} as produced by browsers issuing a POST request from an HTML form using \const{enctype} \const{multipart/form-data}. This is a somewhat simplified MIME \const{multipart/mixed} encoding used by browser forms including file input fields. \arg{ListOfData} is the same as for the \arg{List} alternative described below. Below is an example from the SWI-Prolog \url[Sesame]{http://www.openrdf.org} interface.
\arg{Repository}, etc.\ are atoms providing the value, while the last argument provides a value from a file.

\begin{code}
        ...,
        http_post([ protocol(http),
                    host(Host),
                    port(Port),
                    path(ActionPath)
                  ],
                  form_data([ repository = Repository,
                              dataFormat = DataFormat,
                              baseURI    = BaseURI,
                              verifyData = Verify,
                              data       = file(File)
                            ]),
                  _Reply,
                  []),
        ...,
\end{code}

    \termitem{List}{}
If the argument is a plain list, it is sent using the MIME type \const{multipart/mixed} and packed using mime_pack/3. See mime_pack/3 for details on the argument format.
\end{description}
\end{description}

\subsubsection{The MIME client plug-in}
\label{sec:httpmimeplugin}

This plug-in library \pllib{http/http_mime_plugin} breaks multipart documents that are recognised by the \exam{Content-Type: multipart/form-data} or \exam{Mime-Version: 1.0} in the header into a list of \arg{Name} = \arg{Value} pairs. This library deals with data from web-forms using the \const{multipart/form-data} encoding as well as the \url[FIPA]{http://www.fipa.org} agent-protocol messages.

\subsubsection{The SGML client plug-in}
\label{sec:httpsgmlplugin}

This plug-in library \pllib{http/http_sgml_plugin} provides a bridge between the SGML/XML/HTML parser provided by \pllib{sgml} and the HTTP client library. After loading this hook the following MIME types are automatically handled by the SGML parser.

\begin{description}
    \termitem{text/html}{}
Handed to \pllib{sgml} using the W3C HTML 4.0 DTD, suppressing and ignoring all HTML syntax errors. \arg{Options} is passed to load_structure/3.
    \termitem{text/xml}{}
Handed to \pllib{sgml} using dialect \const{xmlns} (XML + namespaces). \arg{Options} is passed to load_structure/3. In particular, \term{dialect}{xml} may be used to suppress namespace handling.
    \termitem{text/x-sgml}{}
Handed to \pllib{sgml} using dialect \const{sgml}. \arg{Options} is passed to load_structure/3.
\end{description}

\section{The HTTP server libraries}
\label{sec:httpserver}

The HTTP server library consists of two obligatory parts and one optional part. The first deals with connection management and has three different implementations depending on the desired type of server. The second implements a generic wrapper for decoding the HTTP request, calling user code to handle the request and encoding the answer. The optional \file{http_dispatch} module can be used to assign HTTP \jargon{locations} (paths) to predicates. This design is summarised in \figref{httpserver}.

\postscriptfig[width=0.8\linewidth]{httpserver}{Design of the HTTP server}

The functional body of the user's code is independent from the selected server-type, making it easy to switch between the supported server types.

\subsection{The `Body'}
\label{sec:body}

The server-body is the code that handles the request and formulates a reply. To facilitate all mentioned setups, the body is driven by http_wrapper/5. The goal is called with the parsed request (see \secref{request}) as argument and \const{current_output} set to a temporary buffer. Its task is closely related to the task of a CGI script; it must write a header holding at least the \const{Content-type} field, followed by a body. Here is a simple body writing the request as an HTML table.

\begin{code}
reply(Request) :-
        format('Content-type: text/html~n~n', []),
        format('<html>~n', []),
        format('<table border=1>~n'),
        print_request(Request),
        format('~n</table>~n'),
        format('</html>~n', []).

print_request([]).
print_request([H|T]) :-
        H =.. [Name, Value],
        format('<tr><td>~w<td>~w~n', [Name, Value]),
        print_request(T).
\end{code}

The infrastructure recognises the header \texttt{Transfer-encoding:~chunked}, causing it to use chunked encoding if the client allows for it. See also \secref{transfer} and the \const{chunked} option in http_handler/3. Other header lines are passed verbatim to the client. Typical examples are \texttt{Set-Cookie} and authentication headers (see \secref{auth}).

\subsubsection{Returning special status codes}
\label{sec:httpspecials}

Besides returning a page by writing it to the current output stream, the server goal can raise an exception using throw/1 to generate special pages such as \const{not_found}, \const{moved}, etc. The defined exceptions are:

\begin{description}
    \termitem{http_reply}{+Reply, +HdrExtra}
Return a result page using http_reply/3. See http_reply/3 for details.
    \termitem{http_reply}{+Reply}
Equivalent to \term{http_reply}{Reply, []}.
    \termitem{http}{not_modified}
Equivalent to \term{http_reply}{not_modified, []}. This exception is for backward compatibility and can be used by the server to indicate the referenced resource has not been modified since it was requested last time.
\end{description}

\input{httpdispatch.tex}
\input{httpdirindex.tex}
\input{httpsession.tex}

\subsection{HTTP Authentication}
\label{sec:auth}

The module \file{http/http_authenticate} provides the basics to validate an HTTP \const{Authorization} header. User and password information are read from a Unix/Apache compatible password file. This information, as well as the result of the validation process, is cached to improve performance.

\begin{description}
    \predicate{http_authenticate}{3}{+Type, +Request, -User}
True if \arg{Request} contains the information to continue according to \arg{Type}.
\arg{Type} identifies the required authentication technique:

\begin{description}
    \termitem{basic}{+PasswordFile}
Use HTTP \const{Basic} authentication and verify the password from \arg{PasswordFile}. \arg{PasswordFile} is a file holding usernames and passwords in a format compatible with Unix and Apache. Each line is a record with \verb$:$ separated fields. The first field is the username and the second the password \emph{hash}. Password hashes are validated using crypt/2.
\end{description}

Successful authorization is cached for 60 seconds to avoid the overhead of decoding and looking up the user and password data.

http_authenticate/3 just validates the header. If authorization is not provided the browser must be challenged, in response to which it normally opens a user-password dialogue. Example code realising this is below. The exception causes the HTTP wrapper code to generate an HTTP 401 reply.

\begin{code}
        ...,
        (   http_authenticate(basic(passwd), Request, User)
        ->  true
        ;   throw(http_reply(authorise(basic, Realm)))
        ).
\end{code}

Alternatively \term{basic}{+PasswordFile} can be passed as an option to http_handler/3.
\end{description}

\input{httpopenid.tex}

%================================================================
\subsection{Get parameters from HTML forms}
\label{sec:httpparam}

The library \pllib{http/http_parameters} provides two predicates that make it easy to fetch HTTP request parameters as a type-checked list. The library transparently handles both GET and POST requests. It builds on top of the low-level request representation described in \secref{request}.

\begin{description}
    \predicate{http_parameters}{2}{+Request, ?Parameters}
The predicate is passed the \arg{Request} as provided to the handler goal by http_wrapper/5 as well as a partially instantiated list describing the requested parameters and their types. Each parameter specification in \arg{Parameters} is a term of the format \mbox{\arg{Name}(\arg{-Value}, \arg{+Options})}.
\arg{Options} is a list of option terms describing the type, default, etc. If no options are specified the parameter must be present and its value is returned in \arg{Value} as an atom.

If a parameter is missing the exception \term{error}{\term{existence_error}{form_data, Name}, _} is thrown. Options fall into three categories: those that handle presence of the parameter, those that guide conversion and restrict types and those that support automatic generation of documentation. First, the presence-options:

\begin{description}
    \termitem{default}{Default}
If the named parameter is missing, \arg{Value} is unified to \arg{Default}.
    \termitem{optional}{true}
If the named parameter is missing, \arg{Value} is left unbound and no error is generated.
    \termitem{list}{Type}
The parameter may be absent or appear multiple times. If this option is present, \const{default} and \const{optional} are ignored and the value is returned as a list. Type checking options are processed on each value.
    \termitem{zero_or_more}{}
Deprecated. Use \term{list}{Type}.
\end{description}

The type and conversion options are given below. The type-language can be extended by providing clauses for the multifile hook http:convert_parameter/3.

\begin{description}
    \termitem{;}{Type1, Type2}
Succeed if either \arg{Type1} or \arg{Type2} applies. It allows for checks such as \exam{(nonneg;oneof([infinite]))} to specify an integer or a symbolic value.
    \termitem{oneof}{List}
Succeeds if the value is a member of the given list.
    \definition{length $> N$}
Succeeds if the value is an atom of more than $N$ characters.
    \definition{length $>= N$}
Succeeds if the value is an atom of more than or equal to $N$ characters.
    \definition{length $< N$}
Succeeds if the value is an atom of less than $N$ characters.
    \definition{length $=< N$}
Succeeds if the value is an atom of less than or equal to $N$ characters.
    \termitem{atom}{}
No-op. Allowed for consistency.
    \termitem{between}{+Low, +High}
Convert the value to a number and, if either \arg{Low} or \arg{High} is a float, force the value to be a float. Then check that the value is in the given range, which includes the boundaries.
    \termitem{boolean}{}
Translate \const{true}, \const{yes}, \const{on} and \const{'1'} into \const{true}; \const{false}, \const{no}, \const{off} and \const{'0'} into \const{false}. Raises an error otherwise.
    \termitem{float}{}
Convert the value to a float. Integers are transformed into floats. Throws a type-error otherwise.
    \termitem{integer}{}
Convert the value to an integer. Throws a type-error otherwise.
    \termitem{nonneg}{}
Convert the value to a non-negative integer. Throws a type-error if the value cannot be converted to an integer and a domain-error otherwise.
    \termitem{number}{}
Convert the value to a number. Throws a type-error otherwise.
\end{description}

The last set of options is to support automatic generation of HTTP API documentation from the sources.\footnote{This facility is under development in ClioPatria; see \file{http_help.pl}.}

\begin{description}
    \termitem{description}{+Atom}
Description of the parameter in plain text.
    \termitem{group}{+Parameters, +Options}
Define a logical group of parameters. \arg{Parameters} are processed as normal. \arg{Options} may include a description of the group. Groups can be nested.
\end{description}

Below is an example:

\begin{code}
reply(Request) :-
        http_parameters(Request,
                        [ title(Title, [ optional(true) ]),
                          name(Name,   [ length >= 2 ]),
                          age(Age,     [ between(0, 150) ])
                        ]),
        ...
\end{code}

This predicate is the same as \term{http_parameters}{Request, Parameters, []}.

    \predicate{http_parameters}{3}{+Request, ?Parameters, +Options}
In addition to http_parameters/2, the following options are defined.

\begin{description}
    \termitem{form_data}{-Data}
Return the entire set of provided \arg{Name}=\arg{Value} pairs from the GET or POST request. All values are returned as atoms.
    \termitem{attribute_declarations}{:Goal}
If a parameter specification lacks the parameter options, call \term{call}{Goal, +ParamName, -Options} to find the options. Intended to share declarations over many calls to http_parameters/3. Using this construct the above can be written as below.

\begin{code}
reply(Request) :-
        http_parameters(Request,
                        [ title(Title),
                          name(Name),
                          age(Age)
                        ],
                        [ attribute_declarations(param)
                        ]),
        ...

param(title, [optional(true)]).
param(name,  [length >= 2 ]).
param(age,   [between(0, 150)]).
\end{code}
\end{description}
\end{description}

\subsection{Request format}
\label{sec:request}

The body-code (see \secref{body}) is driven by a \arg{Request}. This request is generated by http_read_request/2 defined in \pllib{http/http_header}.

\begin{description}
    \predicate{http_read_request}{2}{+Stream, -Request}
Reads an HTTP request from \arg{Stream} and unifies \arg{Request} with the parsed request. \arg{Request} is a list of \term{\arg{Name}}{Value} elements. It provides a number of predefined elements for the result of parsing the first line of the request, followed by the additional request parameters. The predefined fields are:

\begin{description}
    \termitem{host}{Host}
If the request contains \verb$Host: $\arg{Host}, \arg{Host} is unified with the host-name. If the field is of the format \arg{Hostname}:\arg{Port}, \arg{Host} only describes \arg{Hostname} and a field \term{port}{Port}, where \arg{Port} is an integer, is added.
    \termitem{input}{Stream}
The \arg{Stream} is passed along, allowing the handler to read more data or requests from the same stream. This field is always present.
    \termitem{method}{Method}
\arg{Method} is one of \const{get}, \const{put} or \const{post}. This field is present if the header has been parsed successfully.
    \termitem{path}{Path}
Path associated to the request. This field is always present.
    \termitem{peer}{Peer}
\arg{Peer} is a term \term{ip}{A,B,C,D} containing the IP address of the contacting host.
    \termitem{port}{Port}
Port requested. See \const{host} for details.
    \termitem{request_uri}{RequestURI}
This is the untranslated string that follows the method in the request header. It is used to construct the path and search fields of the \arg{Request}. It is provided because reconstructing this string from the path and search fields may yield a different value due to different usage of percent encoding.
    \termitem{search}{ListOfNameValue}
Search-specification of the URI. This is the part after the \chr{?}, normally used to transfer data from HTML forms that use the `\const{GET}' protocol. In the URL it consists of a www-form-encoded list of \arg{Name}=\arg{Value} pairs. This is mapped to a list of Prolog \arg{Name}=\arg{Value} terms with decoded names and values. This field is only present if the location contains a search-specification.
    \termitem{http_version}{Major-Minor}
If the first line contains the \const{HTTP/}\arg{Major}.\arg{Minor} version indicator this element indicates the HTTP version of the peer. Otherwise this field is not present.
    \termitem{cookie}{ListOfNameValue}
If the header contains a \const{Cookie} line, the value of the cookie is broken down in \arg{Name}=\arg{Value} pairs, where the \arg{Name} is the lowercase version of the cookie name as used for the HTTP fields.
    \termitem{set_cookie}{set_cookie(Name, Value, Options)}
If the header contains a \const{Set-Cookie} line, the cookie field is broken down into the \arg{Name} of the cookie, the \arg{Value} and a list of \arg{Name}=\arg{Value} pairs for additional options such as \const{expire}, \const{path}, \const{domain} or \const{secure}.
\end{description}

If the first line of the request is tagged with \const{HTTP/}\arg{Major}.\arg{Minor}, http_read_request/2 reads all input up to the first blank line. This header consists of \arg{Name}:\arg{Value} fields. Each such field appears as a term \term{\arg{Name}}{Value} in the \arg{Request}, where \arg{Name} is canonised for use with Prolog.
Canonisation implies that the \arg{Name} is converted to lower case and all occurrences of the \chr{-} are replaced by \chr{_}. The value for the \const{Content-length} field is translated into an integer.
\end{description}

Here is an example:

\begin{code}
?- http_read_request(user, X).
|: GET /mydb?class=person HTTP/1.0
|: Host: gollem
|:

X = [ input(user),
      method(get),
      search([ class = person
             ]),
      path('/mydb'),
      http_version(1-0),
      host(gollem)
    ].
\end{code}

\subsubsection{Handling POST requests}

Where the HTTP \const{GET} operation is intended to get a document, using a \arg{path} and possibly some additional search information, the \const{POST} operation is intended to hand potentially large amounts of data to the server for processing.

For a POST request, the \arg{Request} parameter described above contains the term \term{method}{post}. The data posted is left on the input stream that is available through the term \term{input}{Stream} from the \arg{Request} header. This data can be read using http_read_data/3 from the HTTP client library. Here is a demo implementation simply returning the parsed posted data as plain text (assuming pp/1 pretty-prints the data).

\begin{code}
reply(Request) :-
        member(method(post), Request), !,
        http_read_data(Request, Data, []),
        format('Content-type: text/plain~n~n', []),
        pp(Data).
\end{code}

If the POST is initiated from a browser, the content-type is generally either \const{application/x-www-form-urlencoded} or \const{multipart/form-data}. The latter is broken down automatically if the plug-in \pllib{http/http_mime_plugin} is loaded.

\subsection{Running the server}

The functionality of the server should be defined in one Prolog file (of course this file is allowed to load other files). Depending on the wanted server setup this `body' is wrapped into a small Prolog file combining the body with the appropriate server interface. There are three supported server-setups. For most applications we advise the multi-threaded server.
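
As a minimal sketch of such a wrapper file for the multi-threaded setup, assuming the body predicate is called reply/1 as in the examples above and using the hypothetical entry point server/1:

\begin{code}
:- use_module(library(http/thread_httpd)).

%  server(?Port)
%  Hypothetical entry point: start a threaded server on Port
%  that calls reply/1 for every request.
server(Port) :-
        http_server(reply, [port(Port)]).

%  reply(+Request)
%  Placeholder body: write a CGI-style header and a plain-text page.
reply(_Request) :-
        format('Content-type: text/plain~n~n'),
        format('Hello World!~n').
\end{code}

In a real application reply/1 would normally live in its own file, so that the same body can be combined with any of the server interfaces.
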
Examples of this server architecture are the \url[PlDoc]{http://www.swi-prolog.org/packages/pldoc.html} documentation system and the \url[SeRQL]{http://www.swi-prolog.org/packages/SeRQL/} Semantic Web server infrastructure.

All the server setups may be wrapped in a \jargon{reverse proxy} to make them available from the public web-server as described in \secref{proxy}.

\begin{itemlist}
    \item [Using \pllib{thread_httpd} for a multi-threaded server]
This server exploits the multi-threaded version of SWI-Prolog, running the user's body code in parallel from a pool of worker threads. As it avoids the state engine and copying required in the event-driven server it is generally faster and capable of handling multiple requests concurrently.

This server is harder to debug due to the involved threading, although the GUI tracer provides reasonable support for multi-threaded applications using the tspy/1 command. It can provide fast communication to multiple clients and can be used for more demanding servers.

    \item [Using \pllib{xpce_httpd} for an event-driven server]
This approach provides a single-threaded event-driven application. The clients talk to XPCE sockets that collect an HTTP request. The server infrastructure can talk to multiple clients simultaneously, but once a request is complete the wrappers call the user's goal and block all further activity until the request is handled. Requests from multiple clients are thus fully serialised in one Prolog process.

This server setup is very suitable for debugging as well as for an embedded server in simple applications in a fairly controlled environment.

    \item [Using \pllib{inetd_httpd} for server-per-client]
In this setup the Unix \program{inetd} super-daemon is used to initialise a server for each connection. This approach is especially suitable for servers that have a limited startup-time. In this setup a crashing client does not influence other requests.
This server is very hard to debug as the server is not connected to the user environment. It provides a robust implementation for servers that can be started quickly.
\end{itemlist}

\subsubsection{Common server interface options}

All the server interfaces provide \term{http_server}{:Goal, +Options} to create the server. The list of options differs, but the servers share common options:

\begin{description}
    \termitem{port}{?Port}
Specify the port to listen to for stand-alone servers. \arg{Port} is either an integer or unbound. If unbound, it is unified to the selected free port.
\end{description}

\subsubsection{Multi-threaded Prolog}
\label{sec:mthttpd}

The \pllib{http/thread_httpd.pl} provides the infrastructure to manage multiple clients using a pool of \jargon{worker-threads}. This realises a popular server design, also seen in Java Tomcat and Microsoft .NET. As a single persistent server process maintains communication to all clients, startup time is not an important issue and the server can easily maintain state-information for all clients.

In addition to the functionality provided by the other (XPCE and inetd) servers, the threaded server can also be used to realise an HTTPS server exploiting the \pllib{ssl} library. See option \term{ssl}{+SSLOptions} below.

\begin{description}
    \predicate{http_server}{2}{:Goal, +Options}
Create the server. \arg{Options} must provide the \term{port}{?Port} option to specify the port the server should listen to. If \arg{Port} is unbound an arbitrary free port is selected and \arg{Port} is unified to this port-number. The server consists of a small Prolog thread accepting new connections on \arg{Port} and dispatching these to a pool of workers. Defined \arg{Options} are:

\begin{description}
    \termitem{port}{?Port}
Port the server should listen to. If unbound \arg{Port} is unified with the selected free port.
    \termitem{workers}{+N}
Defines the number of worker threads in the pool. Default is to use two workers.
Choosing the optimal value for best performance is a difficult task depending on the number of CPUs in your system and how many resources are required for processing a request. Too high a number makes your system switch too often between threads or even swap if there is not enough memory to keep all threads in memory, while too low a number causes clients to wait unnecessarily for other clients to complete. See also http_workers/2.
    \termitem{timeout}{+SecondsOrInfinite}
Determines the maximum period of inactivity while handling a request. If no data arrives within the specified time since the last data arrived the connection raises an exception, the worker discards the client and returns to the pool-queue for a new client. Default is \const{infinite}, making each worker wait forever for a request to complete. Without a timeout, a worker may wait forever on a client that doesn't complete its request.
    \termitem{keep_alive_timeout}{+SecondsOrInfinite}
Maximum time to wait for new activity on \emph{Keep-Alive} connections. Choosing the correct value for this parameter is hard. Disabling Keep-Alive is bad for performance if the clients request multiple documents for a single page. This may, for example, be caused by HTML frames, HTML pages with images, associated CSS files, etc. Keeping a connection open in the threaded model however prevents the thread servicing the client from servicing other clients. The default is 5 seconds.
    \termitem{local}{+KBytes}
Size of the local-stack for the workers. Default is taken from the commandline option.
    \termitem{global}{+KBytes}
Size of the global-stack for the workers. Default is taken from the commandline option.
    \termitem{trail}{+KBytes}
Size of the trail-stack for the workers. Default is taken from the commandline option.
    \termitem{ssl}{+SSLOptions}
Use SSL (Secure Socket Layer) rather than plain TCP/IP. A server created this way is accessed using the \const{https://} protocol.
SSL allows for encrypted communication to prevent others from tapping the wire as well as improved authentication of client and server. The \arg{SSLOptions} option list is passed to ssl_init/3. The port option of the main option list is forwarded to the SSL layer. See the \pllib{ssl} library for details.
\end{description}

    \predicate{http_server_property}{2}{?Port, ?Property}
True if \arg{Property} is a property of the HTTP server running at \arg{Port}. Defined properties are:

\begin{description}
    \termitem{goal}{:Goal}
Goal used to start the server. This is often http_dispatch/1.
    \termitem{start_time}{?Time}
Time-stamp when the server was created. See format_time/3 for creating a human-readable representation.
\end{description}

    \predicate{http_workers}{2}{:Port, ?Workers}
Query or manipulate the number of workers of the server identified by \arg{Port}. If \arg{Workers} is unbound it is unified with the number of running workers. If it is an integer greater than the current size of the worker pool new workers are created with the same specification as the running workers. If the number is less than the current size of the worker pool, this predicate inserts a number of `quit' requests in the queue, discarding the excess workers as they finish their jobs (i.e.\ no worker is abandoned while serving a client).

This can be used to tune the number of workers for performance. Another possible application is to reduce the pool to one worker to facilitate easier debugging.

    \predicate{http_stop_server}{2}{+Port, +Options}
Stop the HTTP server at \arg{Port}. Halting a server is done \textit{gracefully}, which means that requests being processed are not abandoned. The \arg{Options} list is for future refinements of this predicate such as a forced immediate abort of the server, but is currently ignored.

    \predicate{http_current_worker}{2}{?Port, ?ThreadID}
True if \arg{ThreadID} is the identifier of a Prolog thread serving \arg{Port}.
This predicate exists to allow arbitrary interaction with the worker threads for development and statistics.

\predicate{http_spawn}{2}{:Goal, +Spec}
Continue handling this request in a new thread running \arg{Goal}. After http_spawn/2, the worker returns to the pool to process new requests. In its simplest form, \arg{Spec} is the name of a thread pool as defined by thread_pool_create/3. Alternatively it is an option list, whose options are passed to thread_create_in_pool/4 if \arg{Spec} contains \term{pool}{Pool} or to thread_create/3 if the pool option is not present. If the dispatch module is used (see \secref{httpdispatch}), spawning is normally specified as an option to the http_handler/3 registration.

We recommend the use of thread pools. They allow registration of a set of threads with common characteristics, and specify how many can be active and what to do if all threads are active. A typical application may define a small pool of threads with large stacks for computation intensive tasks, and a large pool of threads with small stacks to serve media. The declaration could be the one below, allowing for at most 3 concurrent solvers with a maximum backlog of 5, and for 30 tasks creating image thumbnails.

\begin{code}
:- use_module(library(thread_pool)).

:- thread_pool_create(compute, 3,
                      [ local(20000), global(100000), trail(50000),
                        backlog(5)
                      ]).
:- thread_pool_create(media, 30,
                      [ local(100), global(100), trail(100),
                        backlog(100)
                      ]).

:- http_handler('/solve',     solve,     [spawn(compute)]).
:- http_handler('/thumbnail', thumbnail, [spawn(media)]).
\end{code}
\end{description}

\subsubsection{From an interactive Prolog session using XPCE}

The \pllib{http/xpce_httpd.pl} library provides the infrastructure to manage multiple clients with an event-driven control-structure. This version can be started from an interactive Prolog session, providing a comfortable infrastructure to debug the body of your server.
It also allows the combination of an (XPCE-based) GUI with web-technology in one application.

\begin{description}
\predicate{http_server}{2}{:Goal, +Options}
Create an instance of \class{interactive_httpd}. \arg{Options} must provide the \term{port}{?Port} option to specify the port the server should listen to. If \arg{Port} is unbound an arbitrary free port is selected and \arg{Port} is unified to this port-number. Currently no options are defined.
\end{description}

The file \file{demo_xpce} gives a typical example of this wrapper, assuming \file{demo_body} defines the predicate reply/1.

\begin{code}
:- use_module(xpce_httpd).
:- use_module(demo_body).

server(Port) :-
        http_server(reply, Port, []).
\end{code}

The created server opens a server socket at the selected address and waits for incoming connections. On each accepted connection it collects input until an HTTP request is complete. Then it opens an input stream on the collected data and, using the output stream directed to the XPCE \class{socket}, it calls http_wrapper/5. This approach is fundamentally different from the other approaches:

\begin{itemlist}
\item [Server can handle multiple connections]
Where \emph{inetd} starts a server for each \emph{client} and CGI starts a server for each \emph{request}, this approach starts a single server handling multiple clients.

\item [Requests are serialised]
All calls to \arg{Goal} are fully serialised: processing on behalf of a new client can only start after all previous requests are answered. This is easier and quite acceptable if the server is mostly inactive and requests do not take long to process.

\item [Lifetime of the server]
The server lives as long as Prolog runs.
\end{itemlist}

\subsubsection{From (Unix) inetd}

All modern Unix systems handle a large number of the services they run through the super-server \emph{inetd}. This program reads \file{/etc/inetd.conf} and opens server-sockets on all ports defined in this file.
As a request comes in it accepts it and starts the associated server such that standard I/O refers to the socket. This approach has several advantages:

\begin{itemlist}
\item [Simplification of servers]
Servers don't have to know about sockets and socket operations.

\item [Centralised authorisation]
Using \emph{tcpwrappers}, simple and effective firewalling of all services is realised.

\item [Automatic start and monitor]
The inetd automatically starts the server `just-in-time' and starts additional servers or restarts a crashed server according to the specifications.
\end{itemlist}

The very small generic script for handling inetd based connections is in \file{inetd_httpd}, defining http_server/1:

\begin{description}
\predicate{http_server}{2}{:Goal, +Options}
Initialises and runs http_wrapper/5 in a loop until failure or end-of-file. This server does not support the \arg{Port} option as the port is specified with the \program{inetd} configuration. The only supported option is \arg{After}.
\end{description}

Here is the example from \file{demo_inetd}:

\begin{code}
#!/usr/bin/pl -t main -q -f
:- use_module(demo_body).
:- use_module(inetd_httpd).

main :-
        http_server(reply).
\end{code}

With the above file installed in \file{/home/jan/plhttp/demo_inetd}, the following line in \file{/etc/inetd.conf} enables the server at port 4001 guarded by \emph{tcpwrappers}. After modifying this file, send the daemon the \const{HUP} signal to make it reload its configuration. For more information, please check \manref{inetd.conf}{5}.

\begin{code}
4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd
\end{code}

\subsubsection{MS-Windows}

There are rumours that \emph{inetd} has been ported to Windows.

\subsubsection{As CGI script}

To be done.

\subsubsection{Using a reverse proxy}
\label{sec:proxy}

There are three options for public deployment of a service. One is to run it on a dedicated machine on port 80, the standard HTTP port.
The machine may be a virtual machine running, for example, under \url[VMWARE]{http://www.vmware.com} or \url[XEN]{http://www.cl.cam.ac.uk/research/srg/netos/xen/}. The (virtual) machine approach isolates security threats and allows for using a standard port. The server can also be hosted on a non-standard port such as 8000 or 8080. Using non-standard ports, however, may cause problems with intermediate proxy and/or firewall policies. Isolation can be achieved using a Unix \jargon{chroot} environment. Another option, also recommended for \jargon{Tomcat} servers, is the use of Apache \jargon{reverse proxies}. This causes the main web-server to relay requests below a given URL location to our Prolog based server. This approach has several advantages:

\begin{itemize}
\item We can access the server on port 80, just as for a dedicated machine. We do not need a dedicated machine, though; we only need access to the Apache configuration.

\item As Apache is doing the front-line service, the Prolog server is normally protected from malformed HTTP requests that could result in denial of service or otherwise compromise the server. In addition, Apache can provide encodings such as compression to the outside world.
\end{itemize}

Note that the proxy technology can be combined with isolation methods such as dedicated machines, virtual machines and chroot jails. The proxy can also provide load balancing.

\paragraph{Setting up a reverse proxy}

The Apache reverse proxy setup is really simple. Ensure the modules \const{proxy} and \const{proxy_http} are loaded. Then add two simple rules to the server configuration. Below is an example that makes a PlDoc server on port 4000 available from the main Apache server at port 80.

\begin{code}
ProxyPass        /pldoc/ http://localhost:4000/pldoc/
ProxyPassReverse /pldoc/ http://localhost:4000/pldoc/
\end{code}

Apache rewrites the HTTP headers passing by, but using the above rules it does not examine the content.
This implies that URLs embedded in the (HTML) content must use relative addressing. If the locations on the public and Prolog server are the same (as in the example above) it is allowed to use absolute locations. I.e.\ \const{/pldoc/search} is ok, but \const{http://myhost.com:4000/pldoc/search} is \emph{not}. If the locations on the server differ, locations must be relative (i.e.\ not start with \chr{/}).

This problem can also be solved using the contributed Apache module \const{proxy_html} that can be instructed to rewrite URLs embedded in HTML documents. In our experience, this is not trouble-free as URLs can appear in many places in generated documents. JavaScript can create URLs on the fly, which makes rewriting virtually impossible.

\subsection{The wrapper library}

The body is called by the module \pllib{http/http_wrapper.pl}. This module realises the communication between the I/O streams and the body described in \secref{body}. The interface is realised by http_wrapper/5:

\begin{description}
\predicate{http_wrapper}{5}{:Goal, +In, +Out, -Connection, +Options}
Handle an HTTP request where \arg{In} is an input stream from the client, \arg{Out} is an output stream to the client and \arg{Goal} defines the goal realising the body. \arg{Connection} is unified to \const{'Keep-alive'} if both ends of the connection want to continue the connection or \const{close} if either side wishes to close the connection.

This predicate reads an HTTP request-header from \arg{In}, redirects current output to a memory file and then runs \exam{call(Goal, Request)}, watching for exceptions and failure. If \arg{Goal} executes successfully it generates a complete reply from the created output. Otherwise it generates an HTTP server error with additional context information derived from the exception.

http_wrapper/5 supports the following options:

\begin{description}
\termitem{request}{-Request}
Return the executed request to the caller.
\termitem{peer}{+Peer}
Add \term{peer}{Peer} to the request header handed to \arg{Goal}. The format of \arg{Peer} is defined by tcp_accept/3 from the clib package.
\end{description}

\predicate{http:request_expansion}{2}{+RequestIn, -RequestOut}
This \jargon{multifile} hook predicate is called just before the goal that produces the body, while the output is already redirected to collect the reply. If it succeeds it must return a valid modified request. It is allowed to throw exceptions as defined in \secref{httpspecials}. It is intended for operations such as mapping paths, denying access for certain requests or managing cookies. If it writes output, this must consist of HTTP header fields, which are added \emph{before} header fields written by the body. The example below is from the session management library (see \secref{httpsession}) and sets a cookie.

\begin{code}
        ...,
        format('Set-Cookie: ~w=~w; path=~w~n', [Cookie, SessionID, Path]),
        ...,
\end{code}

\predicate{http_current_request}{1}{-Request}
Get access to the currently executing request. \arg{Request} is the same as handed to \arg{Goal} of http_wrapper/5 \emph{after} applying rewrite rules as defined by http:request_expansion/2. Raises an existence error if there is no request in progress.

\predicate{http_relative_path}{2}{+AbsPath, -RelPath}
Convert an absolute path (without host, fragment or search) into a path relative to the current page, defined as the path component from the current request (see http_current_request/1). This call is intended to create reusable components returning relative paths for easier support of reverse proxies.

If, for whatever reason, the conversion is not possible it simply unifies \arg{RelPath} to \arg{AbsPath}.
\end{description}

\input{httphost}
\input{httplog}

\subsection{Debugging Servers}
\label{sec:debug}

The library \pllib{http/http_error.pl} defines a hook that decorates uncaught exceptions with a stack-trace.
This will generate a \emph{500 internal server error} document with a stack-trace. To enable this feature, simply load this library. Please do note that providing error information to the user simplifies the job of a hacker trying to compromise your server. It is therefore not recommended to load this file by default.

The example program \file{calc.pl} has the error handler loaded, which can be triggered by forcing a divide-by-zero in the calculator.

\subsection{Handling HTTP headers}
\label{sec:httpheader}

The library \pllib{http/http_header} provides primitives for parsing and composing HTTP headers. Its functionality is normally hidden by the other parts of the HTTP server and client libraries. We provide a brief overview of http_reply/3, which can be accessed from the reply body using an exception as explained in \secref{httpspecials}.

\begin{description}
\predicate{http_reply}{3}{+Type, +Stream, +HdrExtra}
Compose a complete HTTP reply from the term \arg{Type} using additional headers from \arg{HdrExtra} to the output stream \arg{Stream}. \arg{HdrExtra} is a list of \term{Field}{Value}. \arg{Type} is one of:

\begin{description}
\termitem{html}{+HTML}
Produce an HTML page using print_html/1, normally generated using the \pllib{http/html_write} library described in \secref{htmlwrite}.

\termitem{file}{+MimeType, +Path}
Reply the content of the given file, indicating the given MIME type.

\termitem{tmp_file}{+MimeType, +Path}
Similar to \term{file}{+MimeType, +Path}, but do not include a modification time header.

\termitem{stream}{+Stream, +Len}
Reply using the next \arg{Len} characters from \arg{Stream}. The user must provide the MIME type and other attributes through the \arg{HdrExtra} argument.

\termitem{cgi_stream}{+Stream, +Len}
Similar to \term{stream}{+Stream, +Len}, but the data on \arg{Stream} must contain an HTTP header.

\termitem{moved}{+URL}
Generate a ``301 Moved Permanently'' page with the given target \arg{URL}.
\termitem{moved_temporary}{+URL}
Generate a ``302 Moved Temporarily'' page with the given target \arg{URL}.

\termitem{see_other}{+URL}
Generate a ``303 See Other'' page with the given target \arg{URL}.

\termitem{not_found}{+URL}
Generate a ``404 Not Found'' page.

\termitem{forbidden}{+URL}
Generate a ``403 Forbidden'' page, denying access without challenging the client.

\termitem{authorise}{+Method, +Realm}
Generate a ``401 Authorization Required'' page, requesting the client to retry using proper credentials (i.e.\ user and password).

\termitem{not_modified}{}
Generate a ``304 Not Modified'' page, indicating the requested resource has not changed since the indicated time.

\termitem{server_error}{+Error}
Generate a ``500 Internal server error'' page with a message generated from a Prolog exception term (see print_message/2).
\end{description}
\end{description}

\subsection{The \pllib{http/html_write} library}
\label{sec:htmlwrite}

\newcommand{\elem}[1]{\const{#1}}

Producing output for the web in the form of an HTML document is a requirement for many Prolog programs. Just using format/2 is not satisfactory as it leads to poorly readable programs generating poor HTML. This library is based on using DCG rules.

The \pllib{http/html_write} library structures the generation of HTML from a program. It is an extensible library, providing a \jargon{DCG} framework for generating legal HTML under (Prolog) program control. It is especially useful for the generation of structured pages (e.g.\ tables) from Prolog data structures.

The normal way to use this library is through the DCG html//1. This non-terminal provides the central translation from a structured term with embedded calls to additional translation rules to a list of atoms that can then be printed using print_html/[1,2].

\begin{description}
\dcg{html}{1}{:Spec}
The DCG non-terminal html//1 is the main predicate of this library.
It translates the specification for an HTML page into a list of atoms that can be written to a stream using print_html/[1,2]. The expansion rules of this predicate may be extended by defining the multifile DCG html_write:expand//1. \arg{Spec} is either a single specification or a list of single specifications. Using nested lists is not allowed to avoid ambiguity caused by the atom \const{[]}.

\begin{itemlist}
\item [Atomic data]
Atomic data is quoted using html_quoted//1.

\item [\arg{Fmt} - \arg{Args}]
\arg{Fmt} and \arg{Args} are used as format-specification and argument list to format/3. The result is quoted and added to the output list.

\item [\bsl\arg{List}]
Escape sequence to add atoms directly to the output list. This can be used to embed external HTML code or emit script output. \arg{List} is a list of the following terms:

\begin{itemlist}
\item [\arg{Fmt} - \arg{Args}]
\arg{Fmt} and \arg{Args} are used as format-specification and argument list to format/3. The result is added to the output list.

\item [\arg{Atomic}]
Atomic values are added directly to the output list.
\end{itemlist}

\item [\bsl\arg{Term}]
Invoke the non-terminal \arg{Term} in the calling module. This is the common mechanism to realise abstraction and modularisation in generating HTML.

\item [\arg{Module}:\arg{Term}]
Invoke the non-terminal \arg{Module}:\arg{Term}. This is similar to \bsl\arg{Term} but allows for invoking grammar rules in external packages.

\item [\&(Entity)]
Emit {\tt\&}\arg{Entity}{\tt;} or, if \arg{Entity} is an integer, {\tt\&\#}\arg{Entity}{\tt;}. SWI-Prolog atoms and strings are represented as Unicode. Explicit use of this construct is rarely needed because code-points that are not supported by the output encoding are automatically converted into character-entities.

\item [\term{Tag}{Content}]
Emit HTML element \arg{Tag} using \arg{Content} and no attributes. \arg{Content} is handed to html//1. See \secref{htmllayout} for details on the automatically generated layout.
\item [\term{Tag}{Attributes, Content}]
Emit HTML element \arg{Tag} using \arg{Attributes} and \arg{Content}. \arg{Attributes} is either a single attribute or a list of attributes. Each attribute is of the format \term{Name}{Value} or \mbox{\arg{Name}=\arg{Value}}. \arg{Value} is the atomic attribute value but allows for a limited functional notation:

\begin{itemlist}
\item [$A$ + $B$]
Concatenation of $A$ and $B$.

\item [\term{encode}{Atom}]
Use www_form_encode/2 to create a valid URL component.

\item [\term{location_by_id}{ID}]
HTTP location of the HTTP handler with given ID. See http_location_by_id/2.

\item [List]
A list is handled as a URL `search' component. The list members are terms of the format \mbox{\arg{Name} = \arg{Value}} or \term{Name}{Value}. Values are encoded as in the encode option described above.
\end{itemlist}

The example below generates a URL that references the predicate set_lang/1 in the application with given parameters. The http_handler/3 declaration binds \const{/setlang} to the predicate set_lang/1, for which we provide a very simple implementation. The code between \const{...} is part of an HTML page showing the English flag which, when pressed, calls \term{set_lang}{Request} where \arg{Request} contains the search parameter \mbox{\const{lang} = \const{en}}. Note that the HTTP location (path) \const{/setlang} can be moved without affecting this code.

\begin{code}
:- http_handler('/setlang', set_lang, []).

set_lang(Request) :-
        http_parameters(Request,
                        [ lang(Lang, [])
                        ]),
        http_session_retractall(lang(_)),
        http_session_assert(lang(Lang)),
        reply_html_page(title('Switched language'),
                        p(['Switch language to ', Lang])).

...
        html(a(href(location_by_id(set_lang) + [lang(en)]),
               img(src('/www/images/flags/en.png')))),
...
\end{code}
\end{itemlist}

\dcg{page}{2}{:HeadContent, :BodyContent}
The DCG non-terminal page//2 generates a complete page, including the SGML \const{DOCTYPE} declaration.
\arg{HeadContent} are elements to be placed in the \elem{head} element and \arg{BodyContent} are elements to be placed in the \elem{body} element.

To achieve common style (background, page header and footer), it is possible to define DCG non-terminals head//1 and/or body//1. Non-terminal page//2 checks for the definition of these non-terminals in the module it is called from as well as in the \const{user} module. If no definition is found, it creates a head with only the \arg{HeadContent} (note that the \elem{title} is obligatory) and a \elem{body} with \const{bgcolor} set to \const{white} and the provided \arg{BodyContent}.

Note that further customisation is easily achieved using html//1 directly as page//2 is (besides handling the hooks) defined as:

\begin{code}
page(Head, Body) -->
        html([ \['\n'],
               html([ head(Head),
                      body(bgcolor(white), Body)
                    ])
             ]).
\end{code}

\dcg{page}{1}{:Contents}
This version of page//[1,2] only gives you the SGML \const{DOCTYPE} and the \elem{HTML} element. \arg{Contents} is used to generate both the head and body of the page.

\dcg{html_begin}{1}{+Begin}
Just open the given element. \arg{Begin} is either an atom or a compound term. In the latter case the arguments are used as arguments to the begin-tag. Some examples:

\begin{code}
        html_begin(table)
        html_begin(table(border(2), align(center)))
\end{code}

This predicate provides an alternative to using the \bsl\arg{Command} syntax in the html//1 specification. The following two fragments are the same. The preferred solution depends on your preferences as well as on whether the specification is generated or entered by the programmer.

\begin{code}
table(Rows) -->
        html(table([border(1), align(center), width('80%')],
                   [ \table_header,
                     \table_rows(Rows)
                   ])).

% or

table(Rows) -->
        html_begin(table(border(1), align(center), width('80%'))),
        table_header,
        table_rows(Rows),
        html_end(table).
\end{code}

\dcg{html_end}{1}{+End}
End an element. See html_begin//1 for details.
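
The non-terminals above can be tried from the Prolog toplevel by expanding a rule with phrase/2 and printing the resulting token list. The following is a minimal sketch, not part of the library: the rule greeting//1 and the helper show_greeting/1 are hypothetical names introduced for illustration only.

\begin{code}
:- use_module(library(http/html_write)).

% greeting(+Name)// is a hypothetical rule.  It opens a div element
% explicitly, emits a paragraph through html//1 and closes the div.
greeting(Name) -->
        html_begin(div(class(greeting))),
        html(p(['Hello ', b(Name)])),
        html_end(div).

% show_greeting(+Name) is a hypothetical helper that translates the
% rule into a token list and prints it to the current output.
show_greeting(Name) :-
        phrase(greeting(Name), Tokens),
        print_html(Tokens).
\end{code}

Calling \exam{show_greeting(world)} writes the generated HTML fragment to the current output stream.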
\end{description}

\subsubsection{Emitting HTML documents}

The non-terminal html//1 translates a specification into a list of atoms and layout instructions. Currently the layout instructions are terms of the format \term{nl}{N}, requesting at least \arg{N} newlines. Multiple consecutive \term{nl}{1} terms are combined to an atom containing the maximum of the requested number of newline characters.

To simplify handing the data to a client or storing it into a file, the following predicates are available from this library:

\begin{description}
\predicate{reply_html_page}{2}{:Head, :Body}
Same as \term{reply_html_page}{default, Head, Body}.

\predicate{reply_html_page}{3}{+Style, :Head, :Body}
Writes an HTML page preceded by an HTTP header as required by \pllib{http_wrapper} (CGI-style). Here is a simple typical example:

\begin{code}
reply(_Request) :-
        reply_html_page(title('Welcome'),
                        [ h1('Welcome'),
                          p('Welcome to our ...')
                        ]).
\end{code}

The header and footer of the page can be hooked using the grammar-rules user:head//2 and user:body//2. The first argument passed to these hooks is the \arg{Style} argument of reply_html_page/3 and the second is the 2nd (for head//2) or 3rd (for body//2) argument of reply_html_page/3. These hooks can be used to restyle the page, typically by embedding the real body content in a \elem{div}. E.g., the following code provides a menu on top of each page that is identified using the style \textit{myapp}.

\begin{code}
:- multifile
        user:body//2.

user:body(myapp, Body) -->
        html(body([ div(id(top), \application_menu),
                    div(id(content), Body)
                  ])).
\end{code}

Redefining the \elem{head} can be used to pull in scripts, but typically html_requires//1 provides a more modular approach for pulling in scripts and CSS-files.

\predicate{print_html}{1}{+List}
Print the token list to the Prolog current output stream.
\predicate{print_html}{2}{+Stream, +List}
Print the token list to the specified output stream.

\predicate{html_print_length}{2}{+List, -Length}
When calling print_html/[1,2] on \arg{List}, \arg{Length} characters will be produced. Knowing the length is needed to provide the \const{Content-length} field of an HTTP reply-header.
\end{description}

\input{post.tex}

\subsubsection{Adding rules for html//1}

In some cases it is practical to extend the translations imposed by html//1. When using XPCE, for example, it is convenient to be able to define a default translation to HTML for objects. We also used this technique to define translation rules for the output of the SWI-Prolog \pllib{sgml} package.

The html//1 non-terminal first calls the multifile ruleset html_write:expand//1.

\begin{description}
\dcg{html_write:expand}{1}{+Spec}
Hook to add additional translation rules for html//1.

\dcg{html_quoted}{1}{+Atom}
Emit the text in \arg{Atom}, inserting entity-references for the SGML special characters \verb$<&>$.

\dcg{html_quoted_attribute}{1}{+Atom}
Emit the text in \arg{Atom} suitable for use as an SGML attribute, inserting entity-references for the SGML special characters \verb$<&>"$.
\end{description}

\subsubsection{Generating layout}
\label{sec:htmllayout}

Though not strictly necessary, the library attempts to generate reasonable layout in SGML output. It does this only by inserting newlines before and after tags, on the basis of the multifile predicate html_write:layout/3.

\begin{description}
\predicate{html_write:layout}{3}{+Tag, -Open, -Close}
Specify the layout conventions for the element \arg{Tag}, which is a lowercase atom. \arg{Open} is a term \arg{Pre}-\arg{Post}. It defines that the element should have at least \arg{Pre} newline characters before and \arg{Post} after the tag.
The \arg{Close} specification is similar, but in addition allows for the atom \const{-}, requesting the output generator to omit the close-tag altogether, or \const{empty}, telling the library that the element has declared empty content. In the latter case the close-tag is not emitted either, but in addition html//1 interprets \arg{Arg} in \term{Tag}{Arg} as a list of attributes rather than the content.

A tag that does not appear in this table is emitted without additional layout. See also print_html/[1,2]. Please consult the library source for examples.
\end{description}

\subsubsection{Examples}

In the following example we will generate a table of Prolog predicates we find from the SWI-Prolog help system based on a keyword. The primary database is defined by the predicate predicate/5. We will make hyperlinks for the predicates pointing to their documentation.

\begin{code}
html_apropos(Kwd) :-
        findall(Pred, apropos_predicate(Kwd, Pred), Matches),
        phrase(apropos_page(Kwd, Matches), Tokens),
        print_html(Tokens).

%       emit page with title, header and table of matches

apropos_page(Kwd, Matches) -->
        page([ title(['Predicates for ', Kwd])
             ],
             [ h2(align(center),
                  ['Predicates for ', Kwd]),
               table([ align(center),
                       border(1),
                       width('80%')
                     ],
                     [ tr([ th('Predicate'),
                            th('Summary')
                          ])
                     | \apropos_rows(Matches)
                     ])
             ]).

%       emit the rows for the body of the table.

apropos_rows([]) -->
        [].
apropos_rows([pred(Name, Arity, Summary)|T]) -->
        html([ tr([ td(\predref(Name/Arity)),
                    td(em(Summary))
                  ])
             ]),
        apropos_rows(T).

%       predref(Name/Arity)
%
%       Emit Name/Arity as a hyperlink to
%
%               /cgi-bin/plman?name=Name&arity=Arity
%
%       we must do form-encoding for the name as it may contain illegal
%       characters.  www_form_encode/2 is defined in library(url).

predref(Name/Arity) -->
        { www_form_encode(Name, Encoded),
          sformat(Href, '/cgi-bin/plman?name=~w&arity=~w',
                  [Encoded, Arity])
        },
        html(a(href(Href), [Name, /, Arity])).

%       Find predicates from a keyword. '$apropos_match' is an internal
%       undocumented predicate.
apropos_predicate(Pattern, pred(Name, Arity, Summary)) :-
        predicate(Name, Arity, Summary, _, _),
        (   '$apropos_match'(Pattern, Name)
        ->  true
        ;   '$apropos_match'(Pattern, Summary)
        ).
\end{code}

\subsubsection{Remarks on the \pllib{http/html_write} library}

This library is the result of various attempts to arrive at a more satisfactory and Prolog-minded way to produce HTML text from a program. We have been using Prolog for the generation of web pages in a number of projects. Just using format/2 never was a real option, generating error-prone HTML from clumsy syntax. We started with a layer on top of format/2, keeping track of the current nesting and thus always capable of properly closing the environment.

DCG based translation, however, naturally exploits Prolog's term-rewriting primitives. If generation fails for whatever reason it is easy to produce an alternative document (for example holding an error message).

The approach presented in this library has been used in combination with \pllib{http/httpd} in three projects: viewing RDF in a browser, selecting fragments from an analysed document and presenting parts of the XPCE documentation using a browser. It has proven to be able to deal with generating pages quickly and comfortably.

In a future version we will probably define a goal_expansion/2 to do compile-time optimisation of the library. Quotation of known text and invocation of sub-rules using the \bsl\arg{RuleSet} and \arg{Module}:\arg{RuleSet} operators are costly operations in the analysis that can be done at compile-time.

\input{jswrite}
\input{httppath}
\input{htmlhead}
\input{httppwp}

\subsection{Security}

Writing servers is an inherently dangerous job that should be carried out with some care. You have basically started a program on a public terminal and invited strangers to use it. When using the interactive server or inetd based server the server runs under your privileges. When run as a CGI script it runs with the privileges of your web-server.
Though it should not be possible to fatally compromise a Unix machine using user privileges, getting unconstrained access to the system is highly undesirable.

Symbolic languages have an additional handicap in their inherent possibilities to modify the running program and dynamically create goals (this also applies to the popular Perl and Java scripting languages). Here are some guidelines.

\begin{itemlist}
\item [Check your input]
Hardly anything can go wrong if you check the validity of query-arguments before formulating an answer.

\item [Check filenames]
If part of the query consists of filenames or directories, check them. This also applies to files you only read. Names such as \file{/etc/passwd}, but also \file{../../../../../etc/passwd}, are tried by experienced hackers to learn about the system they want to attack. So, expand provided names using absolute_file_name/[2,3] and verify they are inside a folder reserved for the server. Avoid symbolic links from this subtree to the outside world. The example below checks validity of filenames. The first call ensures proper canonicalisation of the paths to avoid a mismatch due to symbolic links or other filesystem ambiguities.

\begin{code}
check_file(File) :-
        absolute_file_name('/path/to/reserved/area', Reserved),
        absolute_file_name(File, Tried),
        atom_concat(Reserved, /, Prefix),   % match whole path segments only
        atom_concat(Prefix, _, Tried).
\end{code}

\item [Check scripts]
Should input in any way activate external scripts using shell/1 or \exam{open(pipe(Command), ...)}, verify the argument once more.

\item [Check meta-calling]
\emph{The} attractive situation for you and your attacker is below:

\begin{code}
        reply(Query) :-
                member(search(Args), Query),
                member(action=Action, Query),
                member(arg=Arg, Query),
                call(Action, Arg).      % NEVER EVER DO THIS!
\end{code}

All your attacker has to do is specify \arg{Action} as \const{shell} and \arg{Arg} as \const{/bin/sh} to obtain an uncontrolled shell!
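
One way to avoid this trap is to call only actions that appear in an explicit table. The following is a minimal sketch under stated assumptions rather than a complete defence: the names allowed_action/1, show/1, count/1 and safe_reply/1 are hypothetical and introduced for illustration only, and \arg{Query} is assumed to be a list of \mbox{\arg{Name} = \arg{Value}} pairs as in the example above.

\begin{code}
:- use_module(library(lists)).

% allowed_action(?Action): hypothetical table of actions that
% clients are permitted to invoke.
allowed_action(show).
allowed_action(count).

% Hypothetical, harmless actions used for illustration only.
show(Arg) :-
        format('Showing ~w~n', [Arg]).
count(Arg) :-
        atom_length(Arg, Len),
        format('~w has ~d characters~n', [Arg, Len]).

% safe_reply(+Query): call Action(Arg) only if Action is an atom
% listed in allowed_action/1; raise a permission error otherwise.
safe_reply(Query) :-
        memberchk(action=Action, Query),
        memberchk(arg=Arg, Query),
        atom(Action),
        allowed_action(Action),
        !,
        call(Action, Arg).
safe_reply(Query) :-
        throw(error(permission_error(execute, action, Query), _)).
\end{code}

With this table in place, a request naming \const{shell} as its action raises an exception instead of being called.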
\end{itemlist}

\subsection{Tips and tricks}

\begin{itemlist}
\item [URL Locations]
With an application in mind, it is tempting to make all URL locations short and directly connected to the root (\const{/}). This is \emph{not} a good idea. It is advised to have all locations in a server below a directory with an informative name. Consider making the root location something that can be changed using a global setting.

\begin{itemize}
\item Page generating code can easily be reused. Using locations directly below the root, however, increases the likelihood of conflicts.

\item Multiple servers can be placed behind the same public server as explained in \secref{proxy}. Using a common and fairly unique root, redirection is much easier and less likely to lead to conflicts.
\end{itemize}

\item [Debugging]
Please check the section \url[``Thread-support library(threadutil)'']{http://gollem.science.uva.nl/SWI-Prolog/Manual/thutil.html} of the SWI-Prolog reference manual.
\end{itemlist}

\section{Transfer encodings}
\label{sec:transfer}

\index{chunked,encoding}%
\index{deflate,encoding}%
The HTTP protocol provides for \jargon{transfer encodings}. These define filters applied to the data described by the \const{Content-type}. The two most popular transfer encodings are \const{chunked} and \const{deflate}. The \const{chunked} encoding avoids the need for a \const{Content-length} header, sending the data in chunks, each of which is preceded by a length. The \const{deflate} encoding provides compression.

Transfer-encodings are supported by filters defined as foreign libraries that realise an encoding/decoding stream on top of another stream. Currently there are two such libraries: \pllib{http/http_chunked.pl} and \pllib{zlib.pl}.

There is an emerging hook interface dealing with transfer encodings. The \pllib{http/http_chunked.pl} library provides a hook used by \pllib{http/http_open.pl} to support chunked encoding in http_open/3.
Note that both \file{http_open.pl} \emph{and} \file{http_chunked.pl} must be loaded for http_open/3 to support chunked encoding. \subsection{The \pllib{http/http_chunked} library} \begin{description} \predicate{http_chunked_open}{3}{+RawStream, -DataStream, +Options} Create a stream to realise HTTP chunked encoding or decoding. The technique is similar to library(zlib), using a Prolog stream as a filter on another stream. See online documentation at \url{http://gollem.science.uva.nl/SWI-Prolog/pldoc/} for details. \end{description} \input{json.tex} \section{Status} The SWI-Prolog HTTP library is in active use in a large number of projects. It is considered one of the SWI-Prolog core libraries that is actively maintained and regularly extended with new features. This is particularly true for the multi-threaded server. The XPCE and inetd based servers are not widely used. This library is by no means complete and you are free to extend it. \printindex \end{document}