URL is an abbreviation of "Uniform Resource Locator". URLs are used as references to documents that are located on the Internet, on intranets, on local filesystems etc., or they may even refer to application-internal resources. In the following subsections, different kinds of URLs together with their handling within w3browse are described. More about the gory details of URLs (and URIs) can be found in [URL/URI/IRI].
A generic URL consists of two parts: a scheme and a scheme-specific part. The scheme determines how the rest of the URL is to be interpreted. The syntax is:
scheme:rest-of-url
The rest-of-url part may only consist of a restricted set of characters such as letters, digits and a few graphic symbols, all taken from the US-ASCII character set. Other characters have to be escaped, that is, they are replaced by another sequence of characters which represents that character, e.g. a space is replaced by %20. Unescaping reverses this process.
Scheme-specific parts that consist of different components, such as those of
server-based URLs (see below), may require certain
reserved characters to be escaped too if they are to be used within a
component.
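As an illustration of escaping and unescaping, here is a minimal Python sketch based on the standard urllib.parse module. It is not taken from w3browse; the helper name escape_component and the sample string are assumptions made for this example.

```python
from urllib.parse import quote, unquote


def escape_component(text: str) -> str:
    """Escape a string so that it can be used inside one URL component."""
    # safe="" also escapes reserved characters such as '/' and '?',
    # which is what is needed when they occur inside a component.
    return quote(text, safe="")


escaped = escape_component("my file/name?.txt")
print(escaped)           # the space becomes %20, '/' becomes %2F, '?' becomes %3F
print(unquote(escaped))  # unescaping restores the original text
```

Passing safe="" is a deliberate choice here: by default quote() leaves the slash untouched, which is fine for a whole path but not for a single component that contains a literal slash.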
Any URL that does not fit into another category is treated as a generic URL, e.g.
javascript:alert('This%20is%20an%20alert.')
about:cookies
Note that some older web browsers do not recognize URL-escaped characters within the javascript scheme.
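The generic syntax scheme:rest-of-url can be taken apart at the first colon. The following Python sketch shows one way to do this; the function name split_generic is hypothetical and the code does not reflect how w3browse itself parses URLs.

```python
from typing import Tuple


def split_generic(url: str) -> Tuple[str, str]:
    """Split a generic URL into (scheme, rest-of-url) at the first colon."""
    scheme, sep, rest = url.partition(":")
    if not sep or not scheme:
        raise ValueError("not a URL with a scheme: %r" % url)
    # Scheme names are case-insensitive, so they are lowercased here.
    return scheme.lower(), rest


print(split_generic("about:cookies"))
print(split_generic("javascript:alert('This%20is%20an%20alert.')"))
```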
Many URLs are used to access resources that are provided by servers which are located on a network. The schemes of such server-based URLs are usually named after the protocol that is used for the transport and share a common syntax:
scheme://user:password@host:port/directory/basename?query#fragment
Most parts of such a URL are optional; the shortest useful form is scheme://host/, e.g.
http://localhost/
http://www.aksware.de/
The following schemes that are supported by w3browse fall into the category of server-based URLs:
http
https
ftp
ftps
wsp
wsps
wstp
wstps
The URL part user:password@host:port is sometimes called netloc (network location) and is used to specify the address of a server. The mandatory subpart host allows the DNS name or IP address of a host to be specified. The optional subpart :port may specify a different port number in case the desired service is not available on the scheme-specific default port. The leading optional subpart user:password@ specifies a user-id and a password for login-based schemes such as ftp and ftps. When used together with other schemes, w3browse generates an appropriate HTTP Authorization: header while connecting to such a server.
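The subparts of the netloc can be extracted with Python's standard urllib.parse module, as the following sketch shows. The function name split_netloc, the sample host and the small table of default ports are assumptions for this example; the defaults of the w3browse-specific schemes are not stated in this text and are therefore left out.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few schemes. This table is an assumption
# of the example and is not taken from w3browse.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def split_netloc(url: str) -> dict:
    """Return user, password, host and effective port of a server-based URL."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        # No explicit :port given, so fall back to the scheme default.
        port = DEFAULT_PORTS.get(parts.scheme)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": port,
    }


print(split_netloc("ftp://user:secret@ftp.example.org:2121/pub/"))
print(split_netloc("http://localhost/"))
```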
The part /directory/basename of a URL is commonly known as path and may be regarded as a hierarchy of documents on the server, but note that this hierarchy is not necessarily on or part of a filesystem. It is up to the server to decide what kind of action to perform when a certain path (and query) is requested. The subparts directory/ and basename are each optional. directory is actually a sequence of names that are separated by slashes (/).
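The division of a path into its directory and basename subparts can be sketched in Python as follows. The function name split_path and the sample URLs are hypothetical; the split is simply made at the last slash of the path.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_path(url: str) -> Tuple[str, str]:
    """Split the path of a URL into (directory, basename)."""
    path = urlsplit(url).path
    # Everything up to and including the last slash is the directory,
    # whatever follows it is the basename. Either part may be empty.
    head, sep, tail = path.rpartition("/")
    return head + sep, tail


print(split_path("http://localhost/pub/docs/readme.txt"))
print(split_path("http://localhost/"))
```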
The so-called query part ?query of a URL often has the form
?name1=value1&name2=value2&...
where named parameters are used to transfer values, e.g. those entered into an HTML form, back to a server, e.g. in order to perform a search.
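Building and parsing a query of this name=value form might look like the following Python sketch. The helper names build_query and parse_query and the sample parameters are assumptions of this example; quote_via=quote is passed so that spaces are escaped as %20, as described earlier, rather than as a plus sign.

```python
from urllib.parse import parse_qsl, quote, urlencode


def build_query(params) -> str:
    """Build a '?name1=value1&name2=value2' query from (name, value) pairs."""
    return "?" + urlencode(params, quote_via=quote)


def parse_query(query: str):
    """Return the (name, value) pairs of a query, unescaping the values."""
    if query.startswith("?"):
        query = query[1:]
    return parse_qsl(query)


print(build_query([("q", "w3browse manual"), ("lang", "en")]))
print(parse_query("?name1=value1&name2=value2"))
```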
The last URL part #fragment is not sent to a server, instead it is used by the requestor to identify or address a part of the retrieved document. The exact interpretation of the fragment identifier depends on the content-type of that document.
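Because the fragment stays on the requesting side, a client typically strips it before a request is made. A minimal Python sketch of this step follows; the function name request_target and the sample URL are hypothetical.

```python
from typing import Tuple
from urllib.parse import urldefrag


def request_target(url: str) -> Tuple[str, str]:
    """Return (URL to request from the server, fragment kept by the client)."""
    without_fragment, fragment = urldefrag(url)
    return without_fragment, fragment


print(request_target("http://localhost/manual.html#urls"))
```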
URLs of type file are used to access files on the local filesystem. The syntax of such URLs is the same as for server-based URLs, but because there is no server involved, the netloc part can be left empty or may be set to the value localhost. The following three forms are all valid and are normalized to the last one by w3browse:
file://localhost/usr/share/doc/
file:///usr/share/doc/
file:/usr/share/doc/
The part following the file: prefix is really a filename or directory, so all variants of them are also valid here, but the path components have to be escaped if they contain special characters, e.g.
file:///c:/documents%20and%20settings/
The shortest useful form is file:/ and refers to the root of the local filesystem.
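The normalization of the three equivalent forms to the short file:/path form can be sketched in Python as shown below. This only mirrors the behaviour described above and is not w3browse's actual code; the function name normalize_file_url is an assumption.

```python
from urllib.parse import urlsplit


def normalize_file_url(url: str) -> str:
    """Normalize a local file URL to the short form file:/path."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: %r" % url)
    # An empty netloc and 'localhost' both denote the local machine.
    if parts.netloc.lower() not in ("", "localhost"):
        raise ValueError("file URL does not refer to the local host: %r" % url)
    # The path is kept in its escaped form.
    return "file:" + parts.path


for form in ("file://localhost/usr/share/doc/",
             "file:///usr/share/doc/",
             "file:/usr/share/doc/"):
    print(normalize_file_url(form))
```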
Mailto URLs are used to denote e-mail addresses and have the following general format (as implemented in w3browse):
mailto:e-mail-address?from=FROM&to=TO&cc=CC&subject=SUBJECT&body=BODY
Some or all parts of the query part may be missing, but the e-mail-address should be given because it is the primary To: header field of an e-mail message. The to= and cc= parts may be repeated multiple times. w3browse invokes its e-mail composer automatically when following such a link, e.g.
mailto:aleks_at_aksware_dot_de
mailto:support(at)aksware(dot)de?subject=w3browse
mailto:
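How the parts of a mailto URL map onto the fields of a message can be illustrated with the following Python sketch. The function name parse_mailto, the shape of the returned dictionary and the example.org addresses are assumptions of this example and say nothing about the internals of the w3browse e-mail composer.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_mailto(url: str) -> dict:
    """Collect the message fields encoded in a mailto URL."""
    parts = urlsplit(url)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URL: %r" % url)
    params = parse_qs(parts.query)

    def first(name):
        # Fields that occur only once are reduced to a single value.
        values = params.get(name)
        return values[0] if values else None

    # The address before the '?' is the primary recipient; to= and cc=
    # may be repeated, so they are kept as lists.
    to = params.get("to", [])
    if parts.path:
        to = [unquote(parts.path)] + to
    return {
        "from": first("from"),
        "to": to,
        "cc": params.get("cc", []),
        "subject": first("subject"),
        "body": first("body"),
    }


print(parse_mailto("mailto:someone@example.org?cc=a@example.org&cc=b@example.org&subject=w3browse"))
```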
The parameter MailDir of the dialog "Open URL Window" and further settings that have been made within the "e-Mail Application" for that environment are in effect when the e-mail composer is invoked.
These kinds of URLs are special to w3browse and are used to refer to certain application-internal resources. The syntax of so-called internal URLs is similar to that of server-based URLs:
internal://netloc/directory/basename?query#fragment
But in this case, the netloc part identifies a certain subsystem of the application, e.g. help or mail.
The following internal URLs are available and are given as URL prefixes
together with their respective function:
internal://help/
internal://admin/
internal://mail/
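Since internal URLs follow the server-based syntax, the addressed subsystem can be read from the netloc position. The Python sketch below demonstrates this; the function name internal_subsystem is hypothetical and the example is not derived from the w3browse sources.

```python
from typing import Tuple
from urllib.parse import urlsplit


def internal_subsystem(url: str) -> Tuple[str, str]:
    """Return (subsystem, path) of an internal:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "internal":
        raise ValueError("not an internal URL: %r" % url)
    # The netloc names the subsystem, e.g. 'help', 'admin' or 'mail'.
    return parts.netloc, parts.path


print(internal_subsystem("internal://help/index.html"))
```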
Another set of special URLs is introduced by the prefix about: and shares its syntax with generic URLs. These URLs are used to provide some shortcuts to other internal resources and applications:
about:                 internal://help/about.html
about:bookmarks        internal://help/bookmarks.html
about:help             internal://help/index.html
about:config           internal://admin/w3bconfig/
about:blank
about:cookies
about:authorization
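Resolving such a shortcut amounts to a simple table lookup, as in the following Python sketch. It assumes that each about: URL listed above is paired with the internal:// URL that directly follows it; shortcuts for which no target is listed are returned unchanged. The names ABOUT_ALIASES and resolve_about are hypothetical.

```python
# Shortcut table, assuming the pairing of about: URLs and internal://
# targets given in the list above. Entries without a listed target
# (about:blank, about:cookies, about:authorization) are left out.
ABOUT_ALIASES = {
    "about:": "internal://help/about.html",
    "about:bookmarks": "internal://help/bookmarks.html",
    "about:help": "internal://help/index.html",
    "about:config": "internal://admin/w3bconfig/",
}


def resolve_about(url: str) -> str:
    """Return the internal URL behind an about: shortcut, if one is known."""
    return ABOUT_ALIASES.get(url, url)


print(resolve_about("about:help"))
print(resolve_about("about:blank"))
```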
A detailed description of all kinds of internal applications together with ways of how to access them is given in chapter "Internal Applications".
The protocols wstp and wstps are currently not implemented natively in w3browse, but they may be used in connection with proxy servers and gateways, because in this case a URL is just handed over to the peer, which then has to perform the dirty work.