Tuesday, 4 January 2011

Open Directory Project

The Open Directory Project (ODP), also known as Dmoz (from directory.mozilla.org, its original domain name), is a multilingual, open-content directory of World Wide Web links. It is owned by Netscape, but it is constructed and maintained by a community of volunteer editors.

ODP uses a hierarchical ontology scheme to organize site listings. Listings on a similar topic are grouped into categories, which can in turn contain smaller subcategories.
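A hierarchy like this is naturally a tree in which each category holds its own listings plus a set of named subcategories. The sketch below is a hypothetical in-memory model, not ODP's actual data structure: the `Category` class, the slash-separated path convention and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical ODP-style category tree."""
    name: str
    listings: list[str] = field(default_factory=list)  # site URLs filed directly here
    children: dict[str, "Category"] = field(default_factory=dict)

    def get_or_create(self, path: str) -> "Category":
        """Walk a slash-separated path such as 'Computers/AI/Artificial_Life',
        creating any missing subcategories along the way."""
        node = self
        for part in path.split("/"):
            node = node.children.setdefault(part, Category(part))
        return node

    def count(self) -> int:
        """Total listings in this category and all of its subcategories."""
        return len(self.listings) + sum(c.count() for c in self.children.values())


# Hypothetical example: two listings filed at different depths under Computers.
root = Category("Top")
root.get_or_create("Computers/AI/Artificial_Life").listings.append("https://example.org/alife")
root.get_or_create("Computers/AI").listings.append("https://example.org/ai")
print(root.children["Computers"].count())  # 2
```

Counting recursively means a parent category reports everything filed beneath it, which is how a broad category can summarize its narrower subcategories.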

ODP was founded in the United States as Gnuhoo by Rich Skrenta and Bob Truel in 1998, while both were working as engineers at Sun Microsystems. Chris Tolles, who headed marketing for network security products at Sun Microsystems, also signed on in 1998 as a co-founder of Gnuhoo, as did Bryn Dole and Jeremy Wenokur. Skrenta had developed TASS, an ancestor of tin, the popular threaded Usenet newsreader for Unix systems. Fittingly, the original category structure of the Gnuhoo directory was loosely based on the Usenet newsgroups then in existence.

The Gnuhoo directory went live on June 5, 1998. After a Slashdot article suggested that Gnuhoo had nothing in common with the spirit of free software, for which the GNU project was known, Richard Stallman and the Free Software Foundation objected to the use of "Gnu", so Gnuhoo was renamed NewHoo. Yahoo! then objected to the use of "Hoo" in the name, prompting the founders to look for yet another name; ZURL was the likely choice. Before the switch to ZURL took place, however, NewHoo was acquired by Netscape Communications Corporation in October 1998 and became the Open Directory Project. Netscape released the ODP data under the Open Directory License. Netscape was acquired by AOL shortly thereafter, and ODP was one of the assets included in the acquisition. AOL later merged with Time Warner.

Content

Gnuhoo borrowed the basic outline for its initial ontology from Usenet. For example, the topic covered by the comp.ai.alife newsgroup was represented by the category Computers/AI/Artificial_Life. The original top-level divisions were Adult, Arts, Business, Computers, Games, Health, Home, News, Recreation, Reference, Regional, Science, Shopping, Society and Sports. While these fifteen top-level categories have remained intact, the ontology of second- and lower-level categories has evolved gradually; significant changes are initiated by discussion among editors and implemented once consensus has been reached.

In July 1998, the directory became multilingual with the addition of the World top-level category. The remainder of the directory lists only English-language sites. By May 2005, seventy-five languages were represented. Since 2002, the non-English portion of the directory has grown faster than the English portion. Whereas the English portion held almost 75% of the sites in 2003, by May 2005 the World category had grown to over 1.5 million sites, roughly one-third of the directory. The ontology in non-English categories generally mirrors that of the English directory, although exceptions reflecting language differences are quite common.

Several of the top-level categories have unique characteristics. The Adult category is not shown on the directory homepage, but it is fully available in the RDF dump that ODP provides. While the bulk of the directory is categorized primarily by topic, the Regional category is categorized primarily by region, which has led many to view ODP as two parallel directories: Regional and Topical.

[Image: Mozzie, DMOZ's mascot]

On November 14, 2000, a special directory within the Open Directory was created for people under 18 years of age. Key factors distinguishing this "Kids and Teens" area from the main directory are:

    * stricter guidelines which limit the listing of sites to those which are targeted or "appropriate" for people under 18 years of age;
    * category names as well as site descriptions use vocabulary which is "age appropriate";
    * age tags on each listing distinguish content appropriate for kids (age 12 and under), teens (13 to 15 years old) and mature teens (16 to 18 years old);
    * Kids and Teens content is available as a separate RDF dump;
    * editing permissions are managed separately, so the Kids and Teens editing community runs parallel to that of the main Open Directory.
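The three age tags above amount to a simple mapping from a reader's age to an audience group, which can then filter listings. The sketch below is a hypothetical illustration only: the `AgeTag` names, the tag strings and the filtering rule are assumptions, not ODP's actual tagging format.

```python
from enum import Enum


class AgeTag(Enum):
    """Audience tags for Kids and Teens listings (names are hypothetical)."""
    KIDS = "kids"            # age 12 and under
    TEENS = "teens"          # 13 to 15 years old
    MATURE_TEENS = "mature"  # 16 to 18 years old


def tag_for_age(age: int) -> AgeTag:
    """Map a reader's age to the audience tag that covers it."""
    if not 0 <= age <= 18:
        raise ValueError(f"age {age} is outside the Kids and Teens range")
    if age <= 12:
        return AgeTag.KIDS
    if age <= 15:
        return AgeTag.TEENS
    return AgeTag.MATURE_TEENS


def visible_listings(listings: dict[str, set[AgeTag]], age: int) -> list[str]:
    """Return, sorted, the URLs whose tags include the reader's age group."""
    tag = tag_for_age(age)
    return sorted(url for url, tags in listings.items() if tag in tags)


# Hypothetical listings, each tagged with the audiences it suits.
listings = {
    "https://example.org/games": {AgeTag.KIDS, AgeTag.TEENS},
    "https://example.org/driving": {AgeTag.MATURE_TEENS},
}
print(visible_listings(listings, 10))  # ['https://example.org/games']
print(visible_listings(listings, 17))  # ['https://example.org/driving']
```

Letting one listing carry several tags, as in the games example, means a site suitable for both kids and teens does not have to be duplicated.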

By May 2005, this portion of the Open Directory included over 32,000 site listings.

Since early 2004, the whole site has been encoded in UTF-8. Before that, English-language categories used ISO 8859-1 and other languages used language-dependent character sets. The RDF dumps have been encoded in UTF-8 since early 2000.
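Moving text from a legacy character set to UTF-8 means decoding the stored bytes with the charset they were written in and re-encoding the result. The sketch below is a generic illustration using Python's standard codecs, not the conversion ODP actually ran; the `to_utf8` helper and the sample string are hypothetical.

```python
def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes stored in a legacy character set and re-encode them as UTF-8.

    ISO 8859-1 maps every byte value 0-255 to a character, so decoding with it
    never fails; other legacy charsets may raise UnicodeDecodeError."""
    return raw.decode(source_encoding).encode("utf-8")


legacy = "Café Müller".encode("iso-8859-1")  # 'é' and 'ü' take one byte each
converted = to_utf8(legacy)
print(len(legacy), len(converted))  # 11 13
print(converted.decode("utf-8"))    # Café Müller
```

The accented letters grow from one byte to two, and decoding the legacy bytes directly as UTF-8 raises `UnicodeDecodeError` here, which illustrates why text in the two encodings cannot simply be mixed.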


Directory listings are maintained by editors. While some editors focus on adding new listings, others focus on maintaining existing ones. These tasks include editing individual listings to correct spelling and grammatical errors and monitoring the status of linked sites. Still others review site submissions to remove spam and duplicates.

Robozilla is a web crawler written to check the status of every site listed in ODP. It periodically flags sites that appear to have moved or disappeared; shortly after each run, the flagged sites are automatically moved to the unreviewed queue, where editors check them and take action when time permits. This process is critical to one of the directory's founding goals: reducing link rot in web directories.
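The check-and-flag cycle described above can be sketched as a status probe plus a classifier that routes suspect sites to a review queue. This is a hypothetical sketch, not Robozilla's code: the rules mapping HTTP status codes to "moved" or "disappeared" are assumptions, and `fetch_status` uses only the standard library's `urllib.request`, disabling redirects so that 3xx responses stay visible.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx responses are reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the host is unreachable."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 3xx (redirects disabled), 4xx, 5xx
        return err.code
    except OSError:  # DNS failure, refused connection, timeout
        return None


def classify(status: Optional[int]) -> str:
    """Assumed rules: unreachable or 404/410 = gone, 3xx = moved, 2xx = ok."""
    if status is None or status in (404, 410):
        return "disappeared"
    if 300 <= status < 400:
        return "moved"
    if 200 <= status < 300:
        return "ok"
    return "error"


def check_listings(
    urls: list[str],
    get_status: Callable[[str], Optional[int]] = fetch_status,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Split URLs into healthy ones and an unreviewed queue of flagged ones."""
    ok, unreviewed = [], []
    for url in urls:
        verdict = classify(get_status(url))
        if verdict == "ok":
            ok.append(url)
        else:
            unreviewed.append((url, verdict))
    return ok, unreviewed


# Offline usage with made-up statuses instead of real network requests.
fake = {"https://a.example": 200, "https://b.example": 301,
        "https://c.example": 404, "https://d.example": None}
ok, queue = check_listings(list(fake), fake.get)
print(ok)     # ['https://a.example']
print(queue)  # [('https://b.example', 'moved'), ('https://c.example', 'disappeared'), ('https://d.example', 'disappeared')]
```

Passing the status lookup in as a parameter keeps the flagging logic testable without network access; in the real crawl, flagged sites still need an editor's judgment, since a redirect or a temporary outage does not always mean a listing is dead.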

Because of the Open Directory's popularity and its resulting influence on search engine rankings (see PageRank), domains with lapsed registrations that are listed in ODP have attracted domain hijacking, an issue that has been addressed by regularly removing expired domains from the directory.

While corporate funding and staff for the ODP have diminished in recent years, volunteers have created editing tools such as link checkers to supplement Robozilla, category crawlers, spell checkers, search tools that directly sift a recent RDF dump, bookmarklets that automate some editing functions, Mozilla-based add-ons, and tools to help work through unreviewed queues.
