Guidelines: URLs

A URL (uniform resource locator) is a standardized address for a resource (e.g., a document, image, sound file or program) on the Internet. The URL system is based on the UNIX file addressing system, and thus, like that system, it is designed to be highly flexible and well suited for maximizing efficiency and usability.

The guidelines for creating URLs for pages on LINMO-compatible web sites state that the URLs should be as simple to read, type, remember and guess as possible. They are:

(1) A consistent system should be used. Consistency is a core principle of LINMO, as it enhances usability, credibility and elegance. Specific benefits of consistent URLs are that they are easier to remember and that specific pages are easier to find, both for users and for the author during site development.

(2) Descriptive names should be used. As is the case with consistency, this makes it easier to remember URLs and find specific pages. With a consistent and descriptive URL naming system, it is often possible for a user to make accurate guesses as to the URLs for pages for which entries cannot be found in an index[1]. Likewise, it can sometimes be more convenient for a user to just type in a URL rather than navigate to an index page and then search the index for the desired topic. Together with consistency of naming, the use of descriptive names also helps minimize the problem of an author accidentally creating multiple pages about the same topic (which can easily happen on a site with hundreds of pages). Another possible advantage is that some search engines might consider descriptive names in their rankings. It is relatively easy to create a descriptive URL for each page on a LINMO-compatible web site, because, in contrast to hard copy (i.e., printed on paper) books and many conventional web sites, each page focuses on a single topic and is not a linear division of a topic or of a sequence of multiple topics.

(3) URLs should be kept as short as practical.
This can contribute to making them easier to both type and remember. Brevity also helps URLs appear in their entirety in the URL box at the top of a web browser without the need for horizontal scrolling. One convenient means of enforcing brevity is to set a maximum number of words (e.g., four in the case of The Linux Information Project). Another is to use acronyms where appropriate.

(4) Commonly used acronyms should be used in URLs, particularly those that are as well known as, or better known than, the set of words that they represent, because they can both shorten URLs and add to their clarity. However, acronyms that are not commonly used should be avoided, and care should be exercised regarding acronyms that can stand for multiple combinations of words that might be relevant to the topic.

(5) Multiple words should be linked with underscores. This is because blank spaces are not acceptable in URLs, separated words are easier to read than non-separated words, and underscores are the best type of separator. Periods should not be used, because they can cause confusion with the functional use of periods in URLs (e.g., to indicate filename extensions). Hyphens should be used only where they would ordinarily be used, i.e., in hyphenated words and phrases.

(6) Only lower case letters, numerals and underscore characters should be used in URLs for web sites that are written in languages that use Roman or Cyrillic alphabets[2]. Lower case letters are consistent with the UNIX file naming system, and their consistent use, even in words that normally begin with or use upper case letters (such as proper nouns and acronyms), avoids possible confusion among users between lower and upper case letters in URLs. Other characters, such as slashes, dollar signs and periods, can have special meanings in URLs, and they should thus be avoided.

(7) Plurals and suffixes should generally be avoided except where they clarify meaning or serve to distinguish between similar terms with different meanings[3]. A major reason for this is to maintain consistency, which, in turn, facilitates finding pages by typing the URL into the browser address box.

________

[1] With a well designed index, it should be an easy matter to find any desired page. However, errors can occur even in the best of indexes, and occasionally users make errors in using them.

[2] Most languages, including English and other Western European languages, are written with derivatives of the Roman alphabet. Russian, Bulgarian and several other Eastern European languages use the Cyrillic alphabet. In most languages written with other types of characters (e.g., Chinese, Japanese, Korean, Hindi, Arabic, Hebrew and Thai), there is no concept of upper case and lower case. A rapid increase is occurring in the number of web sites created in non-European languages, and this is being accompanied by a trend to use the characters of such languages in the URLs instead of Roman letters.

[3] For example, on The Linux Information Project (LINFO) web site, there is a page with the URL string.html and another with the URL strings.html. The former is for a page about strings, which are sequences of characters; the latter is for a page about the strings command (i.e., the command named strings), which is used to extract strings from binary files (i.e., files that contain some non-text data).

Created May 8, 2006.
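
The mechanical parts of guidelines (3), (5) and (6) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool described in the guidelines: the function names suggest_filename and check_filename are hypothetical, the .html extension is taken from the footnote's examples, the four-word limit is the one cited for The Linux Information Project, and truncating to the first four words is a crude stand-in for an author choosing a shorter descriptive name. Hyphens are allowed because guideline (5) permits them in hyphenated words.

```python
import re

MAX_WORDS = 4  # word limit the text cites for The Linux Information Project


def suggest_filename(title: str, extension: str = ".html") -> str:
    """Turn a page topic into a file name following guidelines (3), (5), (6)."""
    # Guideline (6): lower case letters only.
    text = title.lower()
    # Guideline (5): keep hyphens inside hyphenated words; any other run of
    # characters outside a-z, 0-9 and '-' is treated as a word boundary.
    # (Assumption: non-ASCII letters are also treated as boundaries here.)
    words = [w.strip("-") for w in re.split(r"[^a-z0-9-]+", text)]
    words = [w for w in words if w]
    if not words:
        raise ValueError("title contains no usable characters")
    # Guideline (3): enforce brevity with a maximum word count (assumption:
    # simply keep the first MAX_WORDS words), then join with underscores.
    return "_".join(words[:MAX_WORDS]) + extension


def check_filename(name: str) -> list[str]:
    """Return the guideline violations found in a name such as 'strings.html'."""
    problems = []
    # Split off the extension at the last period, if there is one.
    stem, dot, _ext = name.rpartition(".")
    if not dot:
        stem = name
    if stem != stem.lower():
        problems.append("guideline (6): use lower case letters only")
    if re.search(r"[^a-z0-9_-]", stem.lower()):
        problems.append("guidelines (5)/(6): only a-z, 0-9, '_' and '-' allowed")
    if len([w for w in stem.split("_") if w]) > MAX_WORDS:
        problems.append(f"guideline (3): more than {MAX_WORDS} words")
    return problems


print(suggest_filename("Hard Links"))           # hard_links.html
print(check_filename("strings.html"))           # []
print(check_filename("File_Permissions.html"))  # one problem: upper case letters
```

Guidelines (1), (2), (4) and (7) call for editorial judgment (what counts as descriptive, which acronyms are well known, when a plural clarifies meaning), so the sketch makes no attempt to check them.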