URL Syntax

URL is an abbreviation of "Uniform Resource Locator". URLs are used as references to documents that are located on the Internet, on intranets, on local filesystems etc., and they may even refer to application-internal resources. The following subsections describe the different kinds of URLs and how w3browse handles them. More about the gory details of URLs (and URIs) can be found in [URL/URI/IRI].

Generic URLs

A generic URL consists of two parts: a scheme and a scheme-specific part. The scheme determines how the rest of the URL is to be interpreted. The syntax is:

scheme:rest-of-url

The rest-of-url part may only consist of a restricted set of characters, namely letters, digits and a few graphic symbols, all taken from the US-ASCII character set. Other characters have to be escaped, that is, each byte of the character is replaced by a percent sign followed by two hexadecimal digits giving the byte's value, e.g. a space (byte value 0x20) is replaced by %20. Unescaping reverses this process. Scheme-specific parts that consist of several components, such as those of server-based URLs (see below), may additionally require reserved characters to be escaped if they are to be used literally within a component.
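As a rough illustration, escaping and unescaping can be reproduced in Python with the standard functions urllib.parse.quote and urllib.parse.unquote. The helper names escape and unescape are made up for this sketch. It also assumes UTF-8 as the byte encoding of non-ASCII characters, which is the usual modern convention but not necessarily what w3browse does internally.

```python
from urllib.parse import quote, unquote


def escape(text: str) -> str:
    # Replace every character outside the unreserved set (ASCII letters,
    # digits and "-._~") by "%XX" for each byte of its UTF-8 encoding.
    # safe="" means that "/" is escaped as well.
    return quote(text, safe="", encoding="utf-8")


def unescape(text: str) -> str:
    # Turn each %XX sequence back into a byte and decode the bytes as UTF-8.
    return unquote(text, encoding="utf-8")


print(escape("This is an alert."))          # This%20is%20an%20alert.
print(escape("Grüße/1"))                    # Gr%C3%BC%C3%9Fe%2F1
print(unescape("This%20is%20an%20alert."))  # This is an alert.
```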

Any URL that does not fit into another category is treated as a generic URL, e.g.

javascript:alert('This%20is%20an%20alert.')
about:cookies

Note that some older web browsers do not recognize URL-escaped characters within the javascript scheme.
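Splitting a generic URL into scheme and rest-of-url only requires finding the first colon. The following Python sketch does this and checks the scheme against the character rules of RFC 3986 (a letter followed by letters, digits, "+", "-" or "."). How strictly w3browse itself validates scheme names is not stated here. The function name split_generic is hypothetical.

```python
import string

# Characters allowed in a scheme name according to RFC 3986.
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")


def split_generic(url: str) -> tuple:
    """Split a URL into (scheme, rest-of-url) at the first colon."""
    scheme, sep, rest = url.partition(":")
    if (not sep or not scheme or scheme[0] not in string.ascii_letters
            or not set(scheme) <= SCHEME_CHARS):
        raise ValueError(f"no valid scheme in {url!r}")
    # Scheme names are case-insensitive; the rest is kept as-is (still escaped).
    return scheme.lower(), rest


print(split_generic("javascript:alert('This%20is%20an%20alert.')"))
print(split_generic("about:cookies"))  # ('about', 'cookies')
```

The rest-of-url is deliberately returned in escaped form, because only the handler for the particular scheme knows which components it contains.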

Server-based URLs

Many URLs are used to access resources that are provided by servers located on a network. The schemes of such server-based URLs are usually named after the protocol that is used for transport, and they share a common syntax:

scheme://user:password@host:port/directory/basename?query#fragment

Most parts of such a URL are optional; the shortest useful form is scheme://host/, e.g.

http://localhost/
http://www.aksware.de/
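Python's standard parser urllib.parse.urlsplit applies the same generic decomposition, so it can show where each part of a server-based URL ends up. This is the standard library's parser, not w3browse's, and the URL below is a made-up example that uses every part of the syntax.

```python
from urllib.parse import urlsplit

# A made-up URL that uses every part of the server-based syntax.
url = "http://user:secret@www.example.org:8080/docs/index.html?lang=en#intro"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.netloc)    # user:secret@www.example.org:8080
print(parts.username)  # user
print(parts.password)  # secret
print(parts.hostname)  # www.example.org
print(parts.port)      # 8080
print(parts.path)      # /docs/index.html
print(parts.query)     # lang=en
print(parts.fragment)  # intro

# The shortest useful form has only a host and the root path.
print(urlsplit("http://localhost/").path)  # /
```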

The following schemes that are supported by w3browse fall into the category of server-based URLs:

http
HTTP (Hypertext Transfer Protocol)
https
HTTP secured by SSL/TLS
ftp
FTP (File Transfer Protocol)
ftps
FTP secured by SSL/TLS
wsp
WSP (Wireless Session Protocol)
wsps
WSP secured by WTLS
wstp
WSP+WTP (Wireless Session and Transaction Protocol)
wstps
WSP+WTP secured by WTLS

The URL part user:password@host:port is sometimes called netloc (network location) and specifies the address of a server. The mandatory subpart host gives the DNS name or IP address of the server. The optional subpart :port specifies a different port number in case the desired service is not available on the scheme-specific default port. The optional leading subpart user:password@ specifies a user ID and a password for login-based schemes such as ftp and ftps. When it is used with any other scheme, w3browse generates an appropriate HTTP Authorization: header when connecting to the server.
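The following Python sketch shows how the netloc could be turned into a server address and a set of credentials. It rests on two assumptions. First, it lists only the well-known default ports of http, https and ftp, because the defaults w3browse uses for ftps and the WAP schemes are not given here. Second, the header is built with the "Basic" scheme of RFC 7617, whereas the text above only speaks of an "appropriate" Authorization header. The function names are hypothetical.

```python
import base64
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit

# Well-known default ports for the classic schemes. The defaults w3browse
# uses for ftps and the WAP schemes are not given here, so they are omitted.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def server_address(url: str) -> Tuple[str, int]:
    """Return (host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no default port known for {parts.scheme!r}")
    return parts.hostname, port


def authorization_header(url: str) -> Optional[str]:
    """Build an Authorization header from user:password@, if present.

    Assumption: the "Basic" scheme of RFC 7617 is used.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return None
    # user and password are escaped URL components, so unescape them first.
    credentials = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


print(server_address("http://localhost/"))  # ('localhost', 80)
print(authorization_header("http://Aladdin:open%20sesame@www.example.org/"))
# Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The escaped space in the password shows why the credentials are unescaped before encoding. An escaped ":" or "@" inside a component would otherwise be indistinguishable from a separator.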

The part /directory/basename of a URL is commonly known as the path. It may be regarded as a hierarchy of documents on the server, but note that this hierarchy is not necessarily on or part of a filesystem: it is up to the server to decide what kind of action to perform when a certain path (and query) is requested. The subparts directory/ and basename are each optional; directory is actually a sequence of names that are separated by slashes (/).
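Splitting a path into its directory names and basename is a matter of cutting at the last slash. The Python sketch below does this. It splits at the literal slashes before unescaping, so that an escaped slash (%2F) stays inside a single name. The function name split_path is hypothetical, and dropping empty names (as produced by "//") is a choice made only for this sketch.

```python
from typing import List, Tuple
from urllib.parse import unquote


def split_path(path: str) -> Tuple[List[str], str]:
    """Split /directory/basename into (directory names, basename)."""
    # Everything up to the last "/" is the directory part; the rest is the
    # (possibly empty) basename.
    directory, _, basename = path.rpartition("/")
    # Split at literal slashes first and unescape afterwards, so that "%2F"
    # stays inside one name. Empty names (e.g. from "//") are dropped.
    names = [unquote(name) for name in directory.split("/") if name]
    return names, unquote(basename)


print(split_path("/usr/share/doc/"))    # (['usr', 'share', 'doc'], '')
print(split_path("/docs/index.html"))   # (['docs'], 'index.html')
print(split_path("/a%2Fb/c%20d.html"))  # (['a/b'], 'c d.html')
```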

The so-called query part ?query of a URL often has the form

?name1=value1&name2=value2&...

where named parameters are used to transfer values, e.g. those entered into an HTML form, back to a server, for instance in order to perform a search.
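The standard functions urllib.parse.urlencode and urllib.parse.parse_qsl build and parse query strings of this form in Python. They follow the HTML form encoding (application/x-www-form-urlencoded), in which a space becomes "+". The parameter names and values here are made up.

```python
from urllib.parse import parse_qsl, urlencode

# Encode form values the way HTML forms do: special characters are escaped
# and a space becomes "+".
query = urlencode([("q", "url syntax"), ("lang", "en")])
print(query)  # q=url+syntax&lang=en

# Decode a query string back into (name, value) pairs, keeping their order
# and any repeated names.
pairs = parse_qsl("q=url+syntax&lang=en&lang=de")
print(pairs)  # [('q', 'url syntax'), ('lang', 'en'), ('lang', 'de')]
```

A list of pairs is used instead of a dictionary because a name may legitimately occur more than once.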

The last URL part #fragment is not sent to the server; instead, it is used by the requestor to identify or address a part of the retrieved document. The exact interpretation of the fragment identifier depends on the content-type of that document.
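The following Python sketch separates what a client sends from what it keeps. It computes the request target from path and query (the "origin-form" that HTTP/1.1 puts on its request line, see RFC 9112) and returns the fragment separately for local use. The function name is hypothetical, and nothing here claims to reproduce w3browse's internals.

```python
from typing import Tuple
from urllib.parse import urlsplit


def request_target(url: str) -> Tuple[str, str]:
    """Return (what is sent to the server, fragment kept by the client)."""
    parts = urlsplit(url)
    # Only path and query identify the resource on the server ...
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # ... while the fragment is evaluated locally once the document arrives.
    return target, parts.fragment


print(request_target("http://www.example.org/doc.html?x=1#section2"))
# ('/doc.html?x=1', 'section2')
```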

File URLs

URLs with the scheme file are used to access files on the local filesystem. Their syntax is the same as for server-based URLs, but because no server is involved, the netloc part can be left empty or may be set to the value localhost. The following three forms are all valid and are normalized to the last one by w3browse:

file://localhost/usr/share/doc/
file:///usr/share/doc/
file:/usr/share/doc/
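The normalization described above can be imitated by stripping an empty or localhost netloc. In the following Python sketch the function name is hypothetical. Because the text does not say how w3browse handles any other host in a file URL, the sketch simply rejects it.

```python
def normalize_file_url(url: str) -> str:
    """Normalize file://localhost/x and file:///x to file:/x."""
    if not url.lower().startswith("file:"):
        raise ValueError(f"not a file URL: {url!r}")
    rest = url[len("file:"):]
    if rest.startswith("//"):
        # A netloc is present: it must be empty or "localhost".
        netloc, slash, path = rest[2:].partition("/")
        if netloc.lower() not in ("", "localhost"):
            raise ValueError(f"unsupported host {netloc!r}")
        # Keep the leading slash; an empty path means the root directory.
        rest = slash + path or "/"
    return "file:" + rest


for u in ("file://localhost/usr/share/doc/", "file:///usr/share/doc/",
          "file:/usr/share/doc/"):
    print(normalize_file_url(u))  # file:/usr/share/doc/ (three times)
```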

The part following the file: prefix is really a file or directory name, so every form of name that is valid on the local system (e.g. one with a drive letter) is also valid here. However, path components have to be escaped if they contain special characters, e.g.

file:///c:/documents%20and%20settings/
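The following Python sketch converts a local filename into a file URL of this form. It rests on two assumptions. First, backslashes are treated as Windows path separators, although on POSIX systems a backslash is a legal filename character. Second, only absolute names are accepted, either with a leading slash or with a drive letter. The ":" of a drive letter is left unescaped, since RFC 3986 permits it in path segments. The function name is hypothetical.

```python
import re
from urllib.parse import quote


def path_to_file_url(path: str) -> str:
    """Turn an absolute local filename into a file:/// URL."""
    # Assumption: backslashes are Windows separators.
    path = path.replace("\\", "/")
    if re.match(r"[A-Za-z]:/", path):
        path = "/" + path  # "c:/..." becomes "/c:/..."
    elif not path.startswith("/"):
        raise ValueError(f"absolute path expected: {path!r}")
    # Escape special characters; keep "/" and the ":" of a drive letter.
    return "file://" + quote(path, safe="/:")


print(path_to_file_url("c:/documents and settings/"))
# file:///c:/documents%20and%20settings/
```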

The shortest useful form is file:/ and refers to the root of the local filesystem.

Mailto URLs

Mailto URLs are used to denote e-mail addresses and have the following general format (as implemented in w3browse):

mailto:e-mail-address?from=FROM&to=TO&cc=CC&subject=SUBJECT&body=BODY

Some or all parameters of the query part may be omitted, but the e-mail-address should be given because it becomes the primary To: header field of the e-mail message. The to= and cc= parameters may be repeated. w3browse invokes its e-mail composer automatically when following such a link, e.g.

mailto:aleks_at_aksware_dot_de
mailto:support(at)aksware(dot)de?subject=w3browse
mailto:
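The following Python sketch splits a mailto URL of the format shown above into header fields, collecting repeated to= and cc= parameters into lists. The addresses and the function name are made up, and the sketch says nothing about how w3browse passes these fields to its composer. It unescapes with urllib.parse.unquote rather than parse_qsl, because parse_qsl would turn a "+" (common in e-mail addresses) into a space.

```python
from urllib.parse import unquote


def parse_mailto(url: str) -> dict:
    """Split a mailto URL into from/to/cc/subject/body fields."""
    if not url.lower().startswith("mailto:"):
        raise ValueError(f"not a mailto URL: {url!r}")
    address, _, query = url[len("mailto:"):].partition("?")
    fields = {"from": "", "to": [], "cc": [], "subject": "", "body": ""}
    if address:
        fields["to"].append(unquote(address))  # primary To: recipient
    for pair in query.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        name, value = unquote(name).lower(), unquote(value)
        if name in ("to", "cc"):
            fields[name].append(value)  # these may be repeated
        elif name in fields:
            fields[name] = value  # unknown parameter names are ignored
    return fields


print(parse_mailto("mailto:user@example.org?cc=a@example.org&subject=w3browse%20question"))
```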

The parameter MailDir of the dialog "Open URL Window" and further settings that have been made within the "e-Mail Application" for that environment are in effect when the e-mail composer is invoked.

Internal URLs

These kinds of URLs are special to w3browse and are used to refer to certain application-internal resources. The syntax of so-called internal URLs is similar to that of server-based URLs:

internal://netloc/directory/basename?query#fragment

In this case, however, the netloc part identifies a subsystem of the application, e.g. help or mail. The following internal URLs are available; they are given as URL prefixes together with their respective functions:

internal://help/
Refers to the pages of the built-in help system.
internal://admin/
Provides access to a collection of administration tools.
internal://mail/
Accesses the instance of the e-mail application that is bound to a request context.
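Dispatching on the netloc part could look roughly like the following Python sketch. Only the three subsystem names come from the list above. The function name and its (subsystem, path, query) return value are hypothetical and merely stand in for whatever w3browse does internally.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Subsystem names taken from the list of internal URL prefixes.
SUBSYSTEMS = {"help", "admin", "mail"}


def route_internal(url: str) -> Tuple[str, str, str]:
    """Return (subsystem, path, query) for an internal:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "internal":
        raise ValueError(f"not an internal URL: {url!r}")
    subsystem = parts.netloc.lower()
    if subsystem not in SUBSYSTEMS:
        raise ValueError(f"unknown subsystem {subsystem!r}")
    # Path and query are left for the selected subsystem to interpret.
    return subsystem, parts.path or "/", parts.query


print(route_internal("internal://help/index.html"))  # ('help', '/index.html', '')
```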

Another set of special URLs is introduced by the prefix about: and shares its syntax with generic URLs. These URLs are used to provide some shortcuts to other internal resources and applications:

about:
Redirects to internal://help/about.html
about:bookmarks
Redirects to internal://help/bookmarks.html
about:help
Redirects to internal://help/index.html
about:config
Redirects to internal://admin/w3bconfig/
about:blank
Returns an empty (blank) page.
about:cookies
Accesses the instance of the cookie manager that is bound, if enabled, to a request context.
about:authorization
Accesses the instance of the authorization manager that is bound to a request context.
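The about: shortcuts listed above amount to a lookup table. The following Python sketch encodes the four redirects exactly as listed and marks the remaining three names as answered directly. The function name and its return values are hypothetical. Treating the name after "about:" as case-sensitive is also an assumption.

```python
from typing import Tuple

# Redirect targets as given in the list of about: URLs.
ABOUT_REDIRECTS = {
    "": "internal://help/about.html",
    "bookmarks": "internal://help/bookmarks.html",
    "help": "internal://help/index.html",
    "config": "internal://admin/w3bconfig/",
}
# about: URLs that are answered directly instead of being redirected.
ABOUT_DIRECT = {"blank", "cookies", "authorization"}


def resolve_about(url: str) -> Tuple[str, str]:
    """Return ("redirect", target) or ("direct", name) for an about: URL."""
    scheme, sep, name = url.partition(":")
    if not sep or scheme.lower() != "about":
        raise ValueError(f"not an about: URL: {url!r}")
    if name in ABOUT_REDIRECTS:
        return "redirect", ABOUT_REDIRECTS[name]
    if name in ABOUT_DIRECT:
        return "direct", name
    raise ValueError(f"unknown about: URL: {url!r}")


print(resolve_about("about:config"))  # ('redirect', 'internal://admin/w3bconfig/')
print(resolve_about("about:blank"))   # ('direct', 'blank')
```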

A detailed description of all kinds of internal applications, together with the ways to access them, is given in chapter "Internal Applications".

Restrictions

The protocols wstp and wstps are currently not implemented natively in w3browse, but they may be used together with proxy servers and gateways, because in that case the URL is just handed over to the peer, which then has to perform the dirty work.