Document Status Distribution: General Release Title: Acorn URL fetcher API specification Drawing Number: 1215,220/FS Issue: 0.24 Author(s): Paul Wain Carl Elkins Stewart Brodie Issue Date: 04/08/98 Change Number: ECO 4082 Last Issue: 0.23 (25/06/98) Contents Issue History Overview Outstanding Issues Client to URL module interface Protocol module to URL module interface URL module to Protocol module interface URL module service calls URL module *-command URL Errors References Glossary Issue History 0.16 19/10/97 First formal version of specification based on uncontrolled textual programmers notes. 0.16a 20/10/97 Incorporated notes from ADH & SB. 0.19 17/11/97 Incorporated details of service calls. 0.20 20/11/97 Incorporated details of URL parsing SWI. 0.21 11/06/98 All other updates incorporated. 0.22 22/06/98 Comments after first review incorporated Added details of proxy enumeration SWI. 0.23 25/06/98 Comments from interested parties incorporated. 0.24 04/08/98 No longer live. Overview ======== The URL (Universal Resource Locator) module is a general purpose module for fetching data from various Internet services. This specification reflects the behaviour of version 0.39 of the URL_Fetcher module. The purpose of the module is to provide a uniform entry point into a set of "fetcher" protocols (eg FTP, HTTP, Gopher, NNTP, etc.), without the need for a client application to understand how that protocol works. This is done using a number of generalised URL SWIs. The fetcher protocols modules (hereafter just "protocol modules") with which the URL module communicates, are called only by the URL module itself. The entry points into the protocol modules have similar names to the entry points into the URL module, but these are NOT the same, despite similarities. The system structure is shown in figure 1 below. 
                 /----------------\
                 |  Applications  |
                 \----------------/
                          |
                          |
                          v
            /---------------------------\
            |        URL module         |
            \---------------------------/
               ^ |               ^ |
               | |               | |
               | v               | v
            /----------\     /----------\
            |   HTTP   |     |   FTP    |   . . . . .
            \----------/     \----------/

          Figure 1: URL Fetching system structure

Each client fetch occurs within the context of a 'session'. Each session is
identified by a different session identifier. Client session identifiers are
issued by the URL module upon request and remain valid until the client tells
the URL module to discard the session. Subsequently, session identifiers may
be re-issued by the URL module for new sessions. Only a single object fetch
can be performed in any one given session. Sessions cannot be re-used by
clients, even if a prior object fetch in that session has completed.

The typical client usage of the system is:

  * Obtain a session identifier (SWI URL_Register)
  * Start fetching an object (SWI URL_GetURL)
  * Repeatedly, whilst multi-tasking if in the desktop environment:
      - Read blocks of data (SWI URL_ReadData)
      - Process that data
  * Discard session (SWI URL_Deregister)

If an application decides it requires a premature termination (eg. the user
asked the application to quit whilst an object was being downloaded), then
the application calls SWI URL_Stop immediately and then discards the session
with SWI URL_Deregister.

Typical clients, such as web browsers, will most likely have several sessions
active concurrently.

The session identifiers that the URL module passes in many of the SWI
interfaces to the protocol modules are its own, not those known to the client
application - the URL module maintains its own private sessions into the
protocol modules.

Service calls are also provided to ease interaction between the URL module
and the fetchers, mainly to inform other modules of the arrival or departure
of a particular module.

Each protocol module accepts data and returns results as per the HTTP protocol.
Thus any extra client data associated with a request (passed in R4 to SWI URL_GetURL) will take the format of a (possibly empty) set of HTTP headers, an empty line and then the data; and each response will start with an HTTP/1.0 or HTTP/1.1 Response-Line of the format: "HTTP/1.0 200 OK" followed by various headers identifying the content-type of the retrieved data, followed by an empty line, followed by the data itself. Outstanding Issues ================== None. Client to URL module interface ============================== A typical client would be an application, such as a Web Browser. The following SWI calls provide the interface for an application to control and transfer data via the URL module. SWI URL_Register (&83E00) On entry: R0: flags (currently reserved, must be zero) On exit: R0: Reserved - currently zero. R1: Session identifier. All other registers preserved. SWI is not re-entrant. Interrupt status undefined. This SWI initialises a client session with the URL module and provides the client with a session identifier that can be used to monitor the status of the URL module within that client's context. The session identifier is unique for each client session that is registered with URL and is also used as an identifier in subsequent interactions with the URL module. Multiple registration by the same client application is permitted. This will provide the client with multiple identifiers to the URL module. Calling this SWI does not result in the calling of any protocol module SWIs. The URL module imposes no limit on the number of concurrently registered sessions, other than having the required memory available in which to store details of the session. SWI URL_GetURL (&83E01) On entry: R0: flags bit 0 => R6 is valid. bit 1 => R5 holds length of data in R4 specified buffer, otherwise a single NUL terminated string in buffer. bits 31-2 => Reserved (0). R1: Session identifier. R2: bits 7-0 => Method (8-bit value, held in bits 7-0) This is protocol dependent. 
              See table below for values.
          bits 15-8 => Method dependent
          bits 31-16   Reserved (must be zero).
    R3: URL
          The document we are after including the protocol.
          eg "http://www.acorn.co.uk/"
    R4: data block
          Data to send in addition to the URL. Validity is protocol and
          method dependent.
    R5: If R0:1 is set, length of data in R4 data block.
        If R0:1 is clear, must be 2.
    R6: User Agent
          Pointer to string to use as User Agent identifier in request
          header; R0:0 must also be set. (NULL pointer or NULL string
          implies use default identifier - see below).

  On exit:
    R0: protocol status (as defined for SWI URL_Status, below)
    All other registers preserved.

  SWI is not re-entrant.
  Interrupt status undefined.

This SWI is used to initiate a transfer of data to or from (mainly from) a
resource server.

When this SWI has been called, the URL module checks the per-session and
global proxy settings, looking for a match (see SWI URL_SetProxy for details
on setting proxies and proxy conflict resolution). If no proxy is to be used,
then URL looks for a protocol module which is capable of handling the URL
specified by R3. If a proxy setting was found, then a pointer to the proxy
URL is placed in R7, R0:31 is forced to value 1, and URL looks for a protocol
module which is capable of handling the specified proxy URL. In both cases,
if a suitable module cannot be located, the URL module generates an error.

If a protocol module capable of handling the URL was found, then all client
registers are passed on to the protocol module via the Protocol_GetData SWI
call, with the exceptions stated above for proxy handling. On exit, R0 will
hold the status code returned by the protocol module.

The extra data pointed to by R4 on entry is method and protocol specific.
For example, in HTTP, the data comprises HTTP headers and, if appropriate, an
entity body. Protocol modules should use this style wherever possible. Note
that these headers do not include lines such as an HTTP Request-Line (ie. the
"GET / HTTP/1.0" part).
For example, when posting data to an HTTP URL as the result of a form
submission on a web page, the web browser would supply a Content-Type header,
Content-Length header, potentially some kind of encoding header, a blank line
and then the entity body.

The User Agent string pointed to by R6 (if R0:0 is set) is an indication to
the underlying protocol module of how the module should identify itself to
remote systems. This controls the User-Agent header for the HTTP protocol
module, for example. The protocol module is free to define its default
identifier as it pleases; however, following the format of the HTTP
User-Agent is recommended where possible and appropriate to the protocol.
Modules may choose to ignore or amend any User-Agent string. For example, the
AcornHTTP module will suffix the client's User-Agent with its own version
number, resulting in complete identifiers such as:

    User-Agent: Acorn Browse/2.06 AcornHTTP/0.82

where the client only specified "Acorn Browse/2.06".

Table of method numbers

        FTP          HTTP and others   Comment
    1   RETR/LIST    GET               ("Get this object" operation)
    2   n/a          HEAD              ("Get entity headers" operation)
    3   n/a          OPTIONS           ("Get server options" operation)
    4   n/a          POST              ("HTTP POST" operation)
    5   n/a          TRACE             ("HTTP TRACE" operation)
    6   n/a          n/a               (Reserved to Acorn - do not use)
    7   n/a          n/a               (Reserved to Acorn - do not use)
    8   STOR         PUT               ("Store this object" operation)
    9   MKD          n/a               ("Create directory" operation)
   10   RMD          n/a               ("Remove directory" operation)
   11   RNFR/RNTO    n/a               ("Rename object" operation)
   12   DELE         DELETE            ("Delete object" operation)
   13   STOU         n/a               ("Store object unique" operation)

Applications for new method codes should be made to Developer Support. The
range 128-254 is reserved for private non-distributed modules. Method numbers
0 and 255 are reserved and must not be used.

The methods specific to FTP listed above are fully implemented in version
0.28 of the FTP Fetcher module.
The methods specific to HTTP listed above are fully implemented in version
0.82 of the AcornHTTP module.


SWI URL_Status (&83E02)

  On entry:
    R0: flags
          bits 31-0    Reserved (0)
    R1: Session identifier

  On exit:
    R0: Status word:
          bit 0     => Connected to server.
          bit 1     => Sent request.
          bit 2     => Sent data.
          bit 3     => Initial response received.
          bit 4     => Transfer in progress.
          bit 5     => All data received.
          bit 6     => Transfer aborted.
          bits 31-7    Reserved (0).
    R1: Preserved.
    R2: server response code
          "HTTP" response code (200, 401 etc.)
    R3: bytes read so far (total body data count)
    R4: total bytes to be transferred in whole transaction if known
        (approximate value only), or -1 if unknown.
    All other registers preserved.

  SWI is not re-entrant.
  Interrupt status undefined.

This SWI is used to monitor the transfer of data from a remote service. It is
protocol independent - the exit status bits are common to all services.
Clients must test this field bit-wise, since the value is cumulative. Clients
must not assume that the states returned in R0 will progress in any
particular combination or order. However, the likely progression during a
fetch for a resource being retrieved over a network (when the bits are
combined into a single decimal value) is: 0,1,3,7,15,31 and then R0:5 set
upon completion, and R0:6 set at any stage when an error has occurred.

Since each protocol module is returning its results according to the HTTP
protocol, R2 can be treated as an HTTP response code whatever the URL being
fetched. For example, the FileFetcher module will indicate file not found
errors by setting the response code to 404 (HTTP's Not Found error code).
Note that in the case of, for example, an HTTP 403 (Forbidden) return, some
explanatory data may be received, too.

If the amount of data to be received is unknown, R4 will contain -1; R3 will
still contain the number of bytes received so far. The R4 value should be
treated as approximate, since the exact interpretation varies between
protocols.
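The bit-wise testing described above can be sketched in C as follows. This is a minimal illustration, not part of the module's published interface: the constant and function names are hypothetical, and only the bit positions and the R3/R4 meanings are taken from the register description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for the status word returned in R0 by SWI URL_Status.
   Bit positions are from the specification; the names are hypothetical. */
enum {
    URL_STATUS_CONNECTED     = 1u << 0,
    URL_STATUS_SENT_REQUEST  = 1u << 1,
    URL_STATUS_SENT_DATA     = 1u << 2,
    URL_STATUS_RESPONSE      = 1u << 3,
    URL_STATUS_IN_PROGRESS   = 1u << 4,
    URL_STATUS_ALL_DATA      = 1u << 5,
    URL_STATUS_ABORTED       = 1u << 6
};

/* True once the fetch can make no further progress: either all data has
   been received or the transfer was aborted. Bits are tested individually
   because the status value is cumulative. */
bool url_fetch_finished(uint32_t status)
{
    return (status & (URL_STATUS_ALL_DATA | URL_STATUS_ABORTED)) != 0;
}

/* True if the transfer was aborted, whatever other bits are set. */
bool url_fetch_failed(uint32_t status)
{
    return (status & URL_STATUS_ABORTED) != 0;
}

/* Rough progress percentage from R3 (bytes read) and R4 (total bytes).
   Returns -1 when the total is unknown (R4 == -1) or not positive.
   R4 is only approximate, so the result is clamped to 100. */
int url_progress_percent(int32_t bytes_read, int32_t total)
{
    if (total <= 0)
        return -1;
    if (bytes_read < 0)
        bytes_read = 0;
    if (bytes_read >= total)
        return 100;
    return (int)(((int64_t)bytes_read * 100) / total);
}
```

A client polling loop would typically call the helpers on each returned status word and stop reading once `url_fetch_finished` reports true.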
When this SWI is called, the URL module invokes the Protocol_Status SWI for
the protocol module concerned with the request.


SWI URL_ReadData (&83E03)

  On entry:
    R0: flags
          bits 31-0    Reserved (0)
    R1: Session identifier.
    R2: client buffer for received data
    R3: size of buffer pointed to by R2

  On exit:
    R0: Status word (see SWI URL_Status)
    R2: Preserved. Contents of buffer modified.
    R4: Number of bytes transferred to R2 buffer.
    R5: Number of bytes still to be read to complete object (if known)
        or -1 if unknown.
    All other registers preserved.

  SWI is not re-entrant.
  Interrupt status undefined.

This SWI is used to read the data pending from a request, find out how much
data has been read on this call and how much more there is remaining to be
read for the request.

R2 is a pointer to a buffer on entry (and R3 is the size of the buffer). On
exit the buffer contains the new data, R4 contains the amount of data written
to the buffer and R5 contains the amount of data left to be read. If the
amount of data left is unknown, R5 will contain -1. R0 always returns the
protocol status code.

In the event of all the data being read (R5 = 0 on exit), a call to URL_Stop
is not required as this is performed automatically when URL_Deregister is
called for the client session. Once all data has been read, a call to
URL_Status returns no further meaningful information; it simply indicates
that the transfer has completed.

The data returned will take the form of a complete HTTP compatible response.
Responses should use HTTP/1.0 if possible and avoid HTTP/1.1. For example,
AcornHTTP will downgrade any higher version responses to HTTP/1.0, having
taken care to remove any features applicable only to the higher version, such
as chunked transfer encodings.

When this SWI is called, the URL module invokes the Protocol_ReadData SWI for
the protocol module concerned with the request.


SWI URL_SetProxy (&83E04)

  On entry:
    R0: flags
          bits 31-0    Reserved (0).
    R1: Session identifier.
    R2: Address of buffer containing a URL base.
    R3: URL method to proxy (address of URL fetch identifier to be proxied).
    R4: 0 => Proxy request.
        1 => Don't proxy request.
        All other values reserved.

  On exit:
    R0: Status word (see SWI URL_ParseURL for details)
    All other registers preserved.

  SWI is not re-entrant.
  Interrupt status undefined.

This call is used to set up a proxy server to use for a session with the URL
module. If R1 is zero then the proxy is considered global and is used for all
sessions. If R1 is a valid session identifier then the proxy server for that
session only is set.

R2 is a pointer to a string containing the base URL to pass the request on to
when a proxy request is made. This is of the form
"http://www-cache.demon.co.uk:8080/" (note the trailing /). A common error is
to omit the port number. If the port number is not specified, then the
default port number is used. See discussion under URL_ProtocolRegister
regarding how the default port number is derived.

R3 is a pointer to a buffer containing the initial part of the URL to proxy -
the URL scheme (eg "http:", "ftp:"). This system has the advantage that
requests to certain hosts can be proxied and not others (eg by giving
"http://www.acorn.co.uk/" as the scheme).

However, if R4 is 1, this indicates that no matter how the proxy settings
have been defined, requests to the base URL should not be proxied in this
case (R3 is undefined).

When a URL_GetURL request is received, the proxy settings are evaluated in
the following order:

  1  Client no-proxy
  2  Client proxy
  3  Global no-proxy
  4  Global proxy

This is to ensure all client settings override global settings and thus
remain safe for the given client - ie. a client which sets up a proxy server
and then defaults all other URLs to no-proxy can, no matter how the global
settings are changed, be sure of where requests will end up.

If R2=0 on entry, then all proxy settings for the specified session are
cleared.
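The evaluation order above can be illustrated with a short C sketch. It is an assumption-laden model, not the URL module's actual code: the structure and function names are hypothetical, and matching is assumed to be a byte-for-byte comparison of the stored prefix against the start of the requested URL.

```c
#include <stddef.h>
#include <string.h>

/* One stored setting (hypothetical layout for this sketch). */
typedef struct {
    const char *url_prefix;  /* initial part of the URL to match (from R3) */
    const char *proxy_url;   /* proxy base URL (from R2); unused for no-proxy */
} proxy_entry;

/* The no-proxy and proxy lists of one scope (a client session, or global). */
typedef struct {
    const proxy_entry *no_proxy;
    size_t             no_proxy_count;
    const proxy_entry *proxy;
    size_t             proxy_count;
} proxy_settings;

/* Return the first entry whose prefix matches the start of url, or NULL.
   Assumption: plain case-sensitive prefix comparison. */
static const proxy_entry *find_match(const proxy_entry *list, size_t n,
                                     const char *url)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(list[i].url_prefix);
        if (strncmp(url, list[i].url_prefix, len) == 0)
            return &list[i];
    }
    return NULL;
}

/* Apply the documented order: client no-proxy, client proxy,
   global no-proxy, global proxy. Returns the proxy URL to use,
   or NULL if the request should go direct. */
const char *choose_proxy(const proxy_settings *client,
                         const proxy_settings *global,
                         const char *url)
{
    const proxy_entry *e;

    if (find_match(client->no_proxy, client->no_proxy_count, url) != NULL)
        return NULL;
    e = find_match(client->proxy, client->proxy_count, url);
    if (e != NULL)
        return e->proxy_url;
    if (find_match(global->no_proxy, global->no_proxy_count, url) != NULL)
        return NULL;
    e = find_match(global->proxy, global->proxy_count, url);
    if (e != NULL)
        return e->proxy_url;
    return NULL;
}
```

Because the client lists are consulted first, a client no-proxy entry wins even when a global proxy entry also matches the same URL.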
Calling this SWI does not result in any calls being made to protocol modules.


SWI URL_Stop (&83E05)

  On entry:
    R0: flags
          bits 31-0    Reserved (0).
    R1: Session identifier.

  On exit:
    R0: Status word (see URL_ParseURL for details)
    All other registers preserved.

  SWI is not re-entrant.
  Interrupt status undefined.

This call aborts a current request if there is one associated with the
session identifier. In the event of no request being associated with the
identifier, an error is generated.

The purpose of this SWI call is to provide the client with a way of enforcing
the termination of a request. The client need not call it merely because all
the data associated with the request has been transferred, although it may do
so if it chooses. The URL_Stop call will be made automatically by the URL
module when the session is deregistered by the client using SWI
URL_Deregister.

When this SWI is called, the URL module invokes the Protocol_Stop SWI for the
protocol module concerned with the request.


SWI URL_Deregister (&83E06)

  On entry:
    R0: flags
          bits 31-0    Reserved (0).
    R1: Session identifier.

  On exit:
    R0: Status word (see SWI URL_ParseURL for details)
    All other registers preserved.

  SWI is not re-entrant.
  Interrupt status undefined.

This call deregisters the client session from the URL module, freeing up any
information the URL module may have kept about the client session (eg proxy
information). The session identifier ceases to be valid and becomes available
for re-issue on a subsequent call to SWI URL_Register.

When this SWI is called, the URL module invokes the Protocol_Stop SWI for the
protocol module concerned, if it has not already done so (eg during the
processing of URL_Stop).


SWI URL_ParseURL (&83E07)

  On entry:
    R0: flags
          bit 0        if set, R5 contains number of words in data block
          bits 31-1    Reserved (0).
    R1: Reason code.
          0 => Return component buffer requirements.
          1 => Return component data in specified buffers.
          2 => Construct full URL from component buffers
          3 => `Quick parse'
    R2: Pointer to base URL.
    R3: Pointer to URL relative to base URL (or NULL if none).
    R4: Pointer to data block of R5 words (unless R1=3) (see below).
    R5: If R0:0 set, size of R4 block in words.

    If R3 is non-NULL, it is assumed to point to a partial URL which needs
    to be resolved with respect to the base URL pointed to by R2. If R3 is
    NULL, then R2 is assumed to point to a full URL.

  On exit:
    R0: flags
          bits 31-0    Reserved (0).
    All other registers preserved.

  SWI is not re-entrant.
  Interrupt status undefined.
  Data block at R4 is updated in line with entry reason code.

This SWI is used to parse URLs into their constituent parts, enabling clients
to extract the various fields from the URL in a reliable manner. The call is
also capable of resolving a relative URL to produce a fully-qualified URL,
and of reconstructing a full URL from a set of components.

The data block referred to above is either a block of integers which will be
updated to contain the size of the required buffer for each element, or a
block containing pointers to buffers for the actual data. All strings are
zero-terminated and all lengths include space for the zero terminator. The
number of entries in the block is specified in R5 if R0:0 is set on entry.
If R0:0 is clear, then the default value of 10 is assumed.

The format of the data block is:

  Offset   Usage
  +  0     Fully canonicalised URL.
  +  4     URL protocol (eg. "http", "ftp") forced to lower-case.
  +  8     Hostname (eg. "www.acorn.com") forced to lower-case.
  + 12     Port (eg. "80").
  + 16     Username - used for FTP authentication and mailto.
  + 20     Password - for FTP.
  + 24     Account - for FTP.
  + 28     Path (eg. "pub/riscos/releases") [See note].
  + 32     Query - for HTTP, things after a query character.
  + 36     Fragment - for HTTP, things after a hash character.
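The block layout in the table can be written down as a C structure. This is a sketch only: the type and field names are hypothetical. Every entry is declared as a 32-bit word so the offsets match the table on any host; on RISC OS the same words hold buffer addresses when the block is used as a block of pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* One 32-bit word per URL component, in the order given in the table.
   Depending on the reason code, each word is either a length or the
   address of a buffer. Field names are hypothetical. */
typedef struct {
    uint32_t full;      /* +0   fully canonicalised URL */
    uint32_t protocol;  /* +4   URL protocol            */
    uint32_t host;      /* +8   hostname                */
    uint32_t port;      /* +12  port                    */
    uint32_t user;      /* +16  username                */
    uint32_t password;  /* +20  password                */
    uint32_t account;   /* +24  account                 */
    uint32_t path;      /* +28  path                    */
    uint32_t query;     /* +32  query                   */
    uint32_t fragment;  /* +36  fragment                */
} url_components;

/* Number of words in the block: the value a caller would pass in R5 when
   R0 bit 0 is set, and the default assumed when it is clear. */
enum { URL_COMPONENT_WORDS = sizeof(url_components) / sizeof(uint32_t) };

/* Compile-time check that the last field sits at the documented offset. */
_Static_assert(offsetof(url_components, fragment) == 36,
               "fragment must be at offset 36");
```

Declaring the block this way lets a client refer to components by name instead of by numeric offset.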
It is anticipated that this SWI will be called twice: the first time to find
the lengths of the buffers, and the second to retrieve a copy of the data
into the buffers.

The URLs pointed to by R2 and R3 (if used) need not be fully-qualified. For
example, R2 may point to "www.acorn.com/browser/". The fully canonicalised
version of the URL at block+0 refers to a fully-qualified, canonicalised
version of it, which in this example would be "http://www.acorn.com/browser/".
During canonicalisation, the port number will be elided if possible. See the
discussion under SWI URL_ProtocolRegister for details of how URL discovers
whether this is possible or not.

[Note] The path will not start with a / unless the URL being parsed
explicitly specified one. This is in keeping with the URL specification. For
example, given the URL "http://www.acorn.com/browser/", the path component is
"browser/", and not "/browser/"; the slash between the hostname and path is a
separator only, not a part of either component.

The entry reason codes are described below.

URL_ParseURL_ReturnLengths (R1 = 0)

When R1 is 0 on entry to the SWI, the data block is treated as a block of
unsigned 32-bit integers. The contents of the block are ignored on entry, but
on exit are filled in with the lengths of the individual components of the
URL. A value of zero is stored for a field which does not exist; non-zero
values include space for a zero-byte terminator.

URL_ParseURL_ReturnData (R1 = 1)

When R1 is 1 on entry to the SWI, the data block is treated as a block of
pointers to buffers to receive the components of the URL. Each of the
pointers in the data block must be either zero, indicating that the caller is
not interested in that field, or point to a buffer which is sufficiently long
to receive the field. The client can ensure this by having previously used
reason code 0 to determine the length required.
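The step between the two calls (turning the lengths from reason code 0 into a block of buffer pointers for reason code 1) might look like the following C sketch. The function name is hypothetical and the SWI calls themselves are not shown; the sketch only covers the allocation, under the assumption that a single allocation shared by all components is acceptable to the client.

```c
#include <stdint.h>
#include <stdlib.h>

#define URL_COMPONENTS 10  /* default number of entries in the data block */

/* Given the lengths returned by reason code 0, allocate one arena large
   enough for every component and fill in the pointer block to be passed
   with reason code 1. Components with length zero get a NULL pointer,
   meaning "not interested". Returns the arena (the caller frees it), or
   NULL if allocation fails. On RISC OS the pointer block is a block of
   32-bit words; char * is used here so the sketch builds on any host. */
char *url_alloc_component_buffers(const uint32_t lengths[URL_COMPONENTS],
                                  char *buffers[URL_COMPONENTS])
{
    size_t total = 0;
    for (int i = 0; i < URL_COMPONENTS; i++)
        total += lengths[i];

    /* Avoid a zero-sized request, whose result is implementation-defined. */
    char *arena = malloc(total != 0 ? total : 1);
    if (arena == NULL)
        return NULL;

    char *next = arena;
    for (int i = 0; i < URL_COMPONENTS; i++) {
        if (lengths[i] == 0) {
            buffers[i] = NULL;      /* field does not exist */
        } else {
            buffers[i] = next;      /* length already includes terminator */
            next += lengths[i];
        }
    }
    return arena;
}
```

Using one arena keeps cleanup to a single `free` call once the client has finished with the parsed components.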
URL_ParseURL_ComposeFromComponents (R1 = 2) When R1 is 2 on entry to the SWI, the data block is treated as containing the broken down fields of a URL. Each of the pointers in the data block must be either zero or point to a buffer containing the value of the component, with the exception of the full URL field, which is a pointer to a buffer to receive the fully canonicalised URL. This buffer is filled in on exit. URL_ParseURL_QuickResolve (R1 = 3) When R1 is 3 on entry to the SWI, R4 points to a buffer for receiving the fully resolved URL. R5 is the length of the buffer. On exit, the buffer is filled in with the fully resolved URL obtained, and R5 is decreased by the length of the URL (including terminating zero byte). Hence R5 will be negative on exit if the buffer wasn't large enough. There is no fixed rule for calculating the minimum buffer length required for the answer. To guarantee that the buffer is large enough, it should be calculated as: length(base URL) + length(relative URL) + 4 Clients are strongly recommended to use this reason code if they wish to resolve a relative URL or canonicalise a URL and are only interested in the fully resolved and canonicalised form of the URL, since it is significantly faster than using reason code 0 and then reason code 1. SWI URL_EnumerateSchemes (&83E08) On entry: R0: flags (currently reserved, must be zero) R1: context (0 for first call) On exit: R0: status flags (currently unused) R1: context for next call (-1 if finished) R2: Pointer to read-only URL fetch scheme (if R1 is not -1) R3: Pointer to read-only help string (if R1 is not -1) R4: Protocol module SWI base (if R1 is not -1) R5: Protocol module version (*100, if R1 is not -1) All other registers preserved. SWI is not re-entrant Interrupt status is undefined URL will not cope gracefully if the protocol module list is updated between calls to this SWI (you may get duplicate modules or miss some out). 
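The context-driven enumeration described for SWI URL_EnumerateSchemes follows a common pattern: start with context 0, pass each returned context back in, and stop at -1. The C sketch below shows that loop only. The function pointer type `enumerate_fn` is a hypothetical stand-in for the real SWI call, and `demo_enumerate` is a fake data source built from values shown in the *URLProtoShow example, so that the loop can be exercised off-target.

```c
#include <stddef.h>

/* Result of one enumeration step (hypothetical names). */
typedef struct {
    const char *scheme;    /* R2: URL fetch scheme         */
    const char *help;      /* R3: help string              */
    unsigned    swi_base;  /* R4: protocol module SWI base */
    unsigned    version;   /* R5: version number * 100     */
} scheme_info;

/* Stand-in for the SWI: takes the context, returns the context for the
   next call (-1 if finished) and fills *out when the result is not -1. */
typedef int (*enumerate_fn)(int context, scheme_info *out);

/* Collect up to capacity entries, starting from context 0. */
size_t collect_schemes(enumerate_fn enumerate, scheme_info *table,
                       size_t capacity)
{
    size_t count = 0;
    int context = 0;

    while (count < capacity) {
        scheme_info info;
        context = enumerate(context, &info);
        if (context == -1)
            break;              /* no more schemes; info is not valid */
        table[count++] = info;
    }
    return count;
}

/* Split a "version * 100" value into its two parts, eg 116 => 1 and 16. */
void split_version(unsigned version, unsigned *major, unsigned *minor)
{
    *major = version / 100;
    *minor = version % 100;
}

/* Fake enumerator for demonstration; a real client would call the SWI. */
int demo_enumerate(int context, scheme_info *out)
{
    static const scheme_info demo_table[] = {
        { "http:", "Acorn HTTP",  0x83f80, 82 },
        { "ftp:",  "FTP Fetcher", 0x4bd00, 28 }
    };
    if (context < 0 || context >= 2)
        return -1;
    *out = demo_table[context];
    return context + 1;
}
```

Since the specification warns that the list may change between calls, a client should treat the collected table as a snapshot that may contain duplicates or omissions.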
SWI URL_EnumerateProxies (&83E09) On entry: R0: flags bit 0 if set, enumerate the no-proxy list. bits 31-1 reserved. Must be zero R1: Session identifier (or zero for global proxies/no-proxies) R2: context (0 for first call) On exit: R0: status flags (currently unused, but corrupted) R1: Preserved R2: context for next call (-1 if finished) R3: Pointer to read-only URL to proxy (if R2 is not -1) R4: Pointer to read-only proxy URL information (if R2 is not -1) All other registers preserved. SWI is not re-entrant Interrupt status is undefined URL will not cope gracefully if the proxy list is updated between calls to this SWI (you may get duplicate entries or miss some out). If R0:0 is set on entry, then R4 will be corrupted on exit and may not contain a meaningful value. The information pointed to by R3 and R4 where applicable is a copy of that which was passed to SWI URL_SetProxy when the setting was made. Protocol module to URL module interface ======================================= This section defines the calls provided by the URL module to enable a fetcher protocol module to interact with it. SWI URL_ProtocolRegister (&83E20) On entry: R0: flags bit 0 if set, R5 contains protocol flags word bit 1 if set, R6 contains the default port number bits 31-2 Reserved (0). R1: Protocol modules SWI base. R2: URL fetch scheme supported eg "http:" etc. R3: version number * 100 eg 116 => version 1.16 R4: informational string Up to 50 characters of descriptive text eg "Acorn HTTP fetcher". R5: Protocol flags word, if R0:0 set. See below. R6: Default port number, if R0:1 set. See below. On exit: R0: flags bits 31-0 Reserved (0). All other registers preserved. SWI is not re-entrant. Interrupt status is undefined. This call is used by a protocol fetcher module to register its SWI base and the type of URL that it accepts with the URL module. The SWIs that are accessible from this SWI base are defined in the following section. 
If the module cannot be registered (eg another module is already claiming that URL base), then an error will be returned. R3 is an integer version number and R4 is a pointer to a string containing more information which will be displayed by the *URLProtoShow command (or 0 if no descriptive text is provided). Typically, it will be called during a protocol module's initialisation code or on a callback set from the module's initialisation code. If the protocol module is registered successfully, then URL will issue a service call Service_URLProtocolModule_ProtocolModule to inform any interested modules. If R0:0 is set, then R5 contains a protocol flags word. This is used to describe to URL how the resolver should treat URLs from this scheme. The current bits defined are: Bit Meaning when set 0 Path is NOT UNIX-like 1 No parsing should be performed on this scheme 2 Scheme allows "user@" to precede the hostname component 3 Hash (ASCII 35) allowed in hostname (eg. for file: URLs) 4 No hostname component (eg. mailto: URLs) 5 Remove *leading* ".." components in pathname. Note that the meanings of set bits are such that zero is a reasonable value to pass for unknown schemes. Note that if URL is requested to resolve URLs using schemes unknown to it, it will assume a protocol flags word value of zero. This may lead to inconsistent behaviour depending on whether the protocol module is loaded or not. If R0:1 is set, then R6 contains the default port number for this scheme. This is used by the URL resolving code to determine if explicitly specified port numbers can be elided from the URL. 
For example, when constructing the canonicalised form of "http://www.acorn.com:80/", the port bit is dropped as it serves no useful purpose, leaving "http://www.acorn.com/" The URL module is primed with knowledge of the following protocols: mailto:, telnet:, finger:, file:, filer_opendir:, filer_run:, local:, gopher:, ftp:, http:, https:, whois: It is not necessary for modules implementing those protocols to set either flag bit and hence no need for them to set R5 or R6. SWI URL_ProtocolDeregister (&83E21) On entry: R0: flags bits 31-0 Reserved (0). R1: SWI base On exit: R0: flags bits 31-0 Reserved (0). R1: number of sessions affected Number of client sessions that were using this module. All other registers preserved. SWI is not re-entrant. Interrupt status is undefined. This call should be used by the protocol module to tell the URL module that it is no longer available. The URL module will raise the appropriate disconnect messages with its clients, and tell the protocol module the number of clients that were affected. Typically, it will be called during a protocol module's finalisation code. If the protocol module is deregistered successfully, then URL will issue a service call Service_URLProtocolModule_ProtocolModule to inform any interested modules. URL module to Protocol module interface ======================================= The protocol module SWI interface is only called by the URL module. URL module clients should never call the ReadData/Status/GetData/Stop SWIs directly. The protocol modules are required to supply a SWI interface. There are currently 4 SWIs that need to be supported which run from SWI_base to SWI_base+3. New SWIs common to all protocol modules will only be added at the low-end of the SWI range. Protocol modules must generate standard SWI not known error (error number &1E6) if they receive a call which they do not understand, so that the URL module can determine that they do not support the SWI. 
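The requirement to reject unknown SWIs with error &1E6 can be sketched as a small dispatcher in C. All names here are hypothetical; the error block layout (a number followed by a message) mirrors the usual RISC OS convention, the offsets are those given in the SWI definitions in this section, and `demo_stop` is a stub included only so the success path can be exercised. Treating a missing handler as "not understood" is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define ERROR_SWI_NOT_KNOWN 0x1E6u

/* Offsets of the four common protocol SWIs from the module's SWI base. */
enum {
    PROTOCOL_GETDATA  = 0,
    PROTOCOL_STATUS   = 1,
    PROTOCOL_READDATA = 2,
    PROTOCOL_STOP     = 3
};

/* Error block: error number followed by a message (hypothetical type). */
typedef struct {
    uint32_t errnum;
    char     errmess[252];
} os_error_block;

/* A handler receives R0-R7 and returns NULL on success or an error. */
typedef const os_error_block *(*swi_handler)(uint32_t regs[8]);

static const os_error_block swi_not_known = {
    ERROR_SWI_NOT_KNOWN, "SWI not known"
};

/* Route a SWI offset to its handler. Offsets outside the common range,
   or with no handler installed, yield the standard "SWI not known"
   error so the URL module can tell the call is unsupported. */
const os_error_block *protocol_swi_dispatch(uint32_t offset,
                                            uint32_t regs[8],
                                            const swi_handler handlers[4])
{
    if (offset > PROTOCOL_STOP || handlers[offset] == NULL)
        return &swi_not_known;
    return handlers[offset](regs);
}

/* Stub handler for demonstration: reports "transfer aborted" (bit 6)
   in the status word returned in R0. */
const os_error_block *demo_stop(uint32_t regs[8])
{
    regs[0] = 1u << 6;
    return NULL;
}
```

A module that adds protocol-specific SWIs at the top of its range would extend the table rather than this range check.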
Note that there is no general requirement to use SWIs from offset 0 into a SWI chunk, although it makes sense to do this. Protocol modules which support multiple protocols should ensure that they do not place their internal "SWI bases" less than 16 SWIs apart to allow space to future expansion. eg. AcornHTTP registers http: as &83F80 and https: as &83F90. Protocol specific SWIs should be added at the top-end of the SWI chunk (ie start at SWI_base+63 and work down) - the AcornHTTP module uses that range to provide clients with access to its HTTP cookie management code, for example. NOTE: the Session identifiers used by the URL module to talk to the protocol modules are NOT the same identifiers used by clients to talk to the URL module. They are NOT interchangeable. SWI Protocol_GetData (SWI_base+0) On entry: R0: flags bits 30-0 => as specified by client in URL_GetURL bit 31 => R7 is valid. R1: Session identifier R2: method (See table earlier in document) R3: URL (including fetch scheme). R4: Pointer to block of data in addition to URL. R5: protocol dependent R6: protocol dependent R7: If R0:31 is set, proxy URL information. See below. On exit: R0: Protocol status word (see SWI URL_Status for details) All other registers are protocol dependent. SWI re-entrancy is protocol module dependent. Interrupt status is protocol module dependent. This call is used to start retrieving data. The protocol module should raise any events for the client via the session identifier provided in R1. The URL module calls this SWI in response to one of its clients calling SWI URL_GetURL. The proxy URL information specified in R7 (if R0:31 is set) gives the location of the proxy to be used in the format of a URL. For example: http://www-cache.demon.co.uk:8080/ This information is supplied by the URL module and not the client. The protocol module must note that on a proxied request, the target URL indicated by R3 may not have the same fetch scheme. 
For example, it might be an ftp: URL being proxied through an HTTP proxy
service.


SWI Protocol_Status (SWI_base+1)

  On entry:
    R0: flags
    R1: Session identifier.

  On exit:
    R0: Protocol status word (see SWI URL_Status for details)
    R2: As URL_Status.
    R3: As URL_Status.
    R4: As URL_Status.
    All other registers are preserved.

  SWI re-entrancy is protocol module dependent.
  Interrupt status is protocol module dependent.

This SWI is used to monitor the transfer of data from the remote service. It
is protocol independent, with the exit status bits of R0 being common to all
fetcher services. R2 should contain the remote server's most recent response
code where possible; note that even in the case of, for example, an HTTP 403
(Forbidden) response, some explanatory data may be received, and thus R3 may
be non-zero.

If the client is unknown to the protocol module then an error should be
returned. If the client's last request has finished, but the client session
has not yet been deregistered, then the protocol module should return the
status code as of the time that the request finished (ie bit 6 or 5 will be
set along with another combination if relevant).

The URL module calls this SWI in response to one of its clients calling SWI
URL_Status.


SWI Protocol_ReadData (SWI_base+2)

  On entry:
    R0: flags
    R1: Session identifier
    R2: Address of client's data buffer.
    R3: Size of client's data buffer.

  On exit:
    R0: Protocol status word.
    R2: As URL_ReadData.
    R3: As URL_ReadData.
    R4: As URL_ReadData.
    R5: As URL_ReadData.
    All other registers are preserved.

  SWI re-entrancy is protocol module dependent.
  Interrupt status is protocol module dependent.

This SWI is used to read the data pending from a request, find out how much
data has been read on this call and how much more there is remaining to be
read for the request. The register usage and description is the same as for
SWI URL_ReadData.

The URL module calls this SWI in response to one of its clients calling SWI
URL_ReadData.
SWI Protocol_Stop (SWI_base+3)

  On entry:
    R0: flags
    R1: Session identifier
  On exit:
    R0: Protocol status word (see SWI URL_Status for details)
    All other registers are preserved.

  SWI re-entrancy is protocol module dependent.
  Interrupt status is protocol module dependent.

  This call aborts the current request, if there is one associated with the
  session identifier.

  The URL module calls this SWI in response to one of its clients calling
  SWI URL_Deregister or SWI URL_Stop.


URL Module Service Calls
========================

The URL fetcher system has been allocated a block of 256 service calls
(&83E00-&83EFF). Two are currently defined. The other 254 are reserved by
Acorn for future use.

Service_URLProtocolModule (&83E00)

  This service call is issued by the URL module to communicate important
  events to the protocol modules.

  On entry:
    R0: reason code (the reason for the service call)
    R1: &83E00 (Service_URLProtocolModule)
    All other registers are reason code dependent.
  On exit:
    All registers must be preserved, unless claiming the service call. In
    all the currently defined cases, the service call must not be claimed.

  Protocol modules must ignore reason codes which they do not understand.

  Defined reason codes:

  URLModuleStarted
    R0: 0         URL module has initialised.
    R1: &83E00    Service_URLProtocolModule
    R2: version   Version number of URL module * 100

    Upon receiving this service call, a protocol module must assume that
    any previous registration is no longer valid, and should re-register
    with the new URL module by issuing SWI URL_ProtocolRegister as usual.
    This service call must not be claimed.

  URLModuleDying
    R0: 1         URL module is dying.
    R1: &83E00    Service_URLProtocolModule
    R2: version   Version number of URL module * 100

    Upon receiving this service call, protocol modules should note that the
    URL module has gone away and not attempt to talk to it any more until a
    future Service_URLProtocolModule/URLModuleStarted service call arrives.
    This service call must not be claimed.
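The rules above for Service_URLProtocolModule (re-register on URLModuleStarted, stop talking to the URL module on URLModuleDying, ignore unknown reason codes, never claim the call) might be sketched in C as follows. This is a minimal model, not the handler of any real protocol module: the proto_state structure, handle_service and register_with_url_module are hypothetical names, the registration function is a stand-in for issuing SWI URL_ProtocolRegister, and the return value of 0 meaning "not claimed" is a convention assumed for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SERVICE_URL_PROTOCOL_MODULE 0x83E00u

/* Reason codes defined for Service_URLProtocolModule. */
enum { URL_MODULE_STARTED = 0, URL_MODULE_DYING = 1 };

/* Hypothetical state kept by a protocol module. */
typedef struct {
    int      url_module_present; /* non-zero while the URL module may be called */
    int      registered;         /* non-zero once registration has been made */
    unsigned url_version;        /* URL module version * 100, from R2 */
} proto_state;

/* Hypothetical stand-in for issuing SWI URL_ProtocolRegister. */
static void register_with_url_module(proto_state *s)
{
    s->registered = 1;
}

/* Returns 0 in every case: in this sketch 0 means "service call not
 * claimed", which is required for all currently defined reason codes. */
static int handle_service(proto_state *s, uint32_t reason,
                          uint32_t service, uint32_t version)
{
    assert(s != NULL);
    if (service != SERVICE_URL_PROTOCOL_MODULE)
        return 0;

    switch (reason) {
    case URL_MODULE_STARTED:
        s->registered = 0;            /* any old registration is invalid */
        s->url_module_present = 1;
        s->url_version = version;
        register_with_url_module(s);  /* re-register with the new module */
        break;
    case URL_MODULE_DYING:
        s->url_module_present = 0;    /* do not call the URL module again */
        s->registered = 0;
        break;
    default:
        break;                        /* unknown reason codes are ignored */
    }
    return 0;
}
```

A module written this way keeps no assumptions about the URL module across a restart: every URLModuleStarted discards the old registration before making a new one.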
  All other reason codes are reserved to Acorn and must not be used.

Service_URLProtocolModule_ProtocolModule (&83E01)

  On entry:
    R0: reason code
    R1: &83E01 (Service_URLProtocolModule_ProtocolModule)
    R2: URL fetch scheme (eg. "http:", "ftp:")
    R3: SWI base chunk of protocol module
    R4: Description of module as shown by *URLProtoShow
  On exit:
    All registers must be preserved, unless claiming the service call. In
    all the currently defined cases, the service call must not be claimed.

  Modules must ignore reason codes which they do not understand.

  Defined reason codes:

  URLProtocolModuleStarted
    R0: 0         Protocol module has just registered

  URLProtocolModuleDying
    R0: 1         Protocol module has just deregistered

  All other reason codes are reserved.


URL Module *-command
====================

The URL module provides a single *-command.

  Syntax:       *URLProtoShow
  Parameters:   None
  Use:          Display information on currently registered protocol
                modules.
  Help text:    *URLProtoShow shows all the current protocols known and
                their SWI bases.
  Example:

  *URLProtoShow
  Base URL  SwiBase  Version  Comment
  =============================================================================
  ---       0x83e00  038      URL             Acorn 1997-8 (Built: 07 May 1998)
  gopher:   0x508c0  010      Gopher Fetcher  Acorn 1997-8 (Built: 17 Feb 1998)
  ftp:      0x4bd00  028      FTP Fetcher     Acorn 1997-8 (Built: 19 Mar 1998)
  file:     0x83f40  038      File Fetcher    Acorn 1997-8 (Built: 04 Jun 1998)
  http:     0x83f80  082      Acorn HTTP      Acorn 1997-8 (Built: 07 May 1998)

  Related SWIs: SWI URL_EnumerateSchemes


URL Errors
==========

The URL module is allocated two ranges of error numbers, each range being
256 long. The first 32 errors are reserved to the URL module and the rest
are reserved to Acorn protocol modules.
  Module    Error range
  URL       &80DE00 - &80DE1F
  HTTP      &80DE20 - &80DE3F
  MAILTO    &80DE40 - &80DE5F
  File      &80DE60 - &80DE7F
  FTP       &80DE80 - &80DE9F
  Gopher    &80DEA0 - &80DEBF
  WhoIs     &80DEC0 - &80DEDF
  Finger    &80DEE0 - &80DEFF
  WAIS      &81EF00 - &81EF1F
  HTTPS     &81EF20 - &81EF3F
  News      &81EF40 - &81EF5F

Error numbers &81EF60-&81EFFF are reserved for Acorn use only.

URL Module Errors

  Error Number  Meaning
  &80DE00       Session ID not found. A client passed an unknown session
                ID in R1 to one of the URL module's SWIs.
  &80DE01       URL ran out of memory
  &80DE02       No matching fetcher for the URL could be found
  &80DE03       SWI not found (URL Module). URL attempted to call a
                fetcher's SWI and received a SWI not known error.
  &80DE04       Session already has had an object fetch performed in it.
                You cannot re-use this session.
  &80DE05       No fetch in progress for this session ID. You have called
                URL_ReadData or URL_Status having already terminated the
                fetch.
  &80DE06       SWI Method already exists. URL already knows of a module
                which provides this method for fetching - another cannot
                register.
  &80DE07       No fetch in progress for this session ID. You have not
                called URL_GetURL before URL_Stop, URL_ReadData or
                URL_Status.
  &80DE08       Message not found in Messages file.
  &80DE09       (No longer used)
  &80DE0A       Unable to parse URL.

Error numbers for protocol modules are not within the scope of this
specification.


Performance Targets
===================

Final code size of the version described by this document should be about
20K.

When fetches are active, more memory will be claimed from the RMA to record
details of the session. The amount claimed depends on the URL being
fetched, plus a small overhead for the session information.

Temporary workspace, equivalent to three times the total combined length of
the base and relative URLs involved, is claimed from the RMA as required
for URL resolution.

Workspace is claimed from the RMA to store details of registered proxies.
All session-specific memory, including proxy information, is freed when the
session is terminated.


References
==========

The following RFC documents are of direct relevance to the URL module:

  RFC 1738 - Uniform Resource Locators
  RFC 1808 - Relative Uniform Resource Locators
  RFC 2068 - HyperText Transfer Protocol specification version 1.1


Glossary
========

  FTP     File Transfer Protocol - an application level protocol for the
          transfer of files between a remote host computer and a local
          client, as defined by RFC 959.

  HTTP    HyperText Transfer Protocol - a protocol designed to transfer
          resources ("documents") from a remote server machine to a local
          client, as defined by RFC 1945 (version 1.0) and RFC 2068
          (version 1.1).

  HTTPS   Secure HyperText Transfer Protocol - HTTP protocol over a
          communication channel encrypted using SSL.

  URL     Uniform Resource Locator, as defined by RFC 1738 - a subclass of
          URIs (Uniform Resource Identifiers, defined in RFC 1630) which
          map onto network access protocols. More commonly, the addresses
          of objects on the World Wide Web.

  NNTP    Network News Transfer Protocol, as defined by RFC 977.

  Gopher  The Internet Gopher Protocol - a distributed document search and
          retrieval protocol.

  SSL     Secure Sockets Layer. A specification for encryption of
          communications on networks.

  WAIS    Wide Area Information Servers, as defined by RFC 1625.