Knowledge Organization with Wikipedia: Joining the Free Encyclopaedia and Digital Libraries
Five years after it was founded almost unintentionally, the free-content encyclopaedia Wikipedia is still growing with astonishing success. Many thousands of volunteers have created 4 million articles in more than 100 languages, and wikipedia.org is one of the 20 most visited websites worldwide, an impact libraries can only dream of: according to Alexa (2006), archive.org is ranked around 120th and loc.gov around 1,200th, to mention only the most popular library sites. While libraries work with elaborate rules and experts, there is no dedicated management in Wikipedia. Everyone can directly edit almost any article, and standards only emerge through self-organization. This makes Wikipedia less reliable, while libraries apparently provide objective and accurate information. Nevertheless, many online searchers prefer Wikipedia as their first reference. But the differences are not that wide: libraries and Wikipedia both aim to collect and arrange knowledge and to make it accessible to everyone with information needs. Despite their different methods they share common goals, so cooperation between libraries and Wikipedia makes sense.
This presentation shows several strategies for connecting Wikipedia and digital libraries. Knowledge organisation systems play an especially important role in making Wikipedia part of a digital library (or the other way round). Since 2005 there has been a cooperation between the German Wikipedia and the German National Library that involves the use of the Personennamendatei (PND) name authority file (Hengel and Pfeifer, 2005). Around 100,000 biographical articles in the German Wikipedia are equipped with metadata about persons, and around 20,000 of these records contain a PND number. This number generates a link with which Wikipedia users can navigate directly to library catalogues to find publications by or about a specific person. A reciprocal link from the catalogue to Wikipedia articles is planned, and the method could also be extended to other authority files and maybe even to subject headings (Voss, 2005). Subject indexing in Wikipedia is handled with so-called categories. In fact, this system of tagging Wikipedia articles is the first collaborative tagging system with multiple hierarchical relationships: a collaboratively created thesaurus (Voss, 2006). Based on these categories, mappings between Wikipedia and other information systems could also be established. Methods of thesaurus and ontology matching will help to build concordances once legal restrictions are resolved. First experiments in Wikipedia show that indexing Wikipedia articles with a foreign classification is not suitable, but the German Wikipedia's categories in the field of library and information science could successfully be mapped to the JITA Classification System of Library and Information Science.
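To make the idea of categories as a collaboratively created thesaurus more concrete, here is a minimal sketch in Python. It assumes the category graph has already been extracted into a dictionary mapping each category to its parent categories; the category names, the concordance, and its placeholder notations are hypothetical and do not reflect real Wikipedia data or the actual JITA scheme. Because one category can have several parents, collecting all broader terms is a graph traversal rather than a walk up a tree.

```python
from collections import deque

# Hypothetical sample data (not taken from Wikipedia): each category is mapped
# to its parent (broader) categories. A category may have several parents,
# which turns a simple tree into a polyhierarchy.
PARENTS: dict[str, list[str]] = {
    "Library cataloging": ["Library science", "Metadata"],
    "Metadata": ["Information science", "Data"],
    "Library science": ["Information science"],
    "Data": ["Information science"],
    "Information science": ["Science"],
}

# Hypothetical concordance from categories to classes of an external scheme.
# "class-LS" and "class-IS" are placeholders, not real JITA notations.
CONCORDANCE: dict[str, str] = {
    "Library science": "class-LS",
    "Information science": "class-IS",
}


def broader_terms(category: str) -> set[str]:
    """Return all direct and indirect broader categories of `category`.

    Breadth-first search with a visited set: ancestors reachable along
    several paths are reported once, and a cycle in the category graph
    cannot cause an endless loop.
    """
    seen: set[str] = set()
    queue = deque(PARENTS.get(category, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        queue.extend(PARENTS.get(parent, []))
    return seen


def external_classes(category: str) -> set[str]:
    """Naive mapping: use the category's own concordance entry if it has one,
    otherwise the entries of every broader category that has one."""
    if category in CONCORDANCE:
        return {CONCORDANCE[category]}
    return {CONCORDANCE[c] for c in broader_terms(category) if c in CONCORDANCE}


print(sorted(broader_terms("Library cataloging")))
# ['Data', 'Information science', 'Library science', 'Metadata', 'Science']
print(sorted(external_classes("Library cataloging")))
# ['class-IS', 'class-LS']
```

Returning every mapped ancestor, rather than only the nearest one, keeps the sketch short; a real concordance would probably prefer the most specific match.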
Besides categories, Wikipedia articles themselves can be used directly to index other resources. Wikipedia contains many articles about complex concepts but also articles about explicit entities such as people, organisations, and places. Each article is identified by a unique name, so Wikipedia can also be seen as a controlled vocabulary. Homonyms are handled with disambiguation pages (http://en.wikipedia.org/wiki/Wikipedia:Disambiguation) that list all meanings of a word with links to the corresponding articles, and synonyms are joined with redirects (http://en.wikipedia.org/wiki/Wikipedia:Redirect) that point to the preferred term. Wikipedia is also the first strictly hypertextual encyclopaedia. Methods of network analysis and data mining will provide networks of concepts that can be used for browsing and mapping knowledge. An extension of MediaWiki (the software Wikipedia runs on) adds typed links and supports RDF (Völkel et al., 2006); this lets you create semantic networks with a wiki and may integrate Wikipedia into the promised Semantic Web. Besides normal hyperlinks between Wikipedia articles, there are specific links to other databases that can be used for integrated services. These special links mostly contain a unique identifier per article. Examples are ISBN and ISSN numbers, laws, patent numbers, digital object identifiers, and links to the Internet Movie Database (IMDb). A third type of link connects Wikipedias in different languages (the language versions of Wikipedia are mostly independent and have different highlights and specialities).
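Treating Wikipedia as a controlled vocabulary essentially means normalising any term to preferred article titles. The following is a minimal sketch, assuming redirects and disambiguation pages have already been extracted (for example from a database dump) into plain dictionaries; the function `resolve` and all sample entries are hypothetical and invented for illustration.

```python
# Invented sample data, simplified for illustration (not copied from Wikipedia).
# Redirects map a synonym to the preferred article title.
REDIRECTS: dict[str, str] = {
    "Quicksilver": "Mercury (element)",
    "Hydrargyrum": "Mercury (element)",
}
# Disambiguation pages map a homonym to the articles it may refer to.
DISAMBIGUATION: dict[str, list[str]] = {
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}
# Titles of ordinary articles, i.e. the preferred terms of the vocabulary.
ARTICLES: set[str] = {"Mercury (planet)", "Mercury (element)", "Mercury (mythology)"}


def resolve(term: str) -> list[str]:
    """Resolve a term to preferred article titles.

    - a disambiguation page yields all of its candidate articles;
    - a redirect yields its single target (only one hop is followed);
    - an article title yields itself;
    - an unknown term yields an empty list.
    """
    if term in DISAMBIGUATION:
        return list(DISAMBIGUATION[term])
    preferred = REDIRECTS.get(term, term)
    return [preferred] if preferred in ARTICLES else []


print(resolve("Quicksilver"))       # ['Mercury (element)']
print(resolve("Mercury"))           # all three candidate articles
print(resolve("Mercury (planet)"))  # ['Mercury (planet)']
```

Indexing a resource with the output of `resolve` instead of the raw string collapses synonyms onto one preferred title, while an ambiguous word comes back as a list of candidates that a cataloguer still has to choose from.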
Wikipedia provides a vast number of possibilities to connect its knowledge structure with other systems, especially digital libraries. However, the wiki paradigm, with no firm rules and directions, may be unfamiliar. Its self-organization allows flexible and quick solutions; virtually everything can be changed at any time, but if no one is willing to work on a specific task voluntarily, it will not get done. Also essential to Wikipedia is its restriction to free content. All textual content is licensed under the GNU Free Documentation License (GFDL), which allows anyone to use, modify, and republish the content as long as the authors are named and derivative works are published under the same license. With this in mind, Wikipedia content can be used in portals, for catalogue enrichment, and in other contexts. Connections with other databases facilitate browsing structures across multiple information systems. The various prospects for collaboration have barely been explored.
References
Alexa.com (2006): Traffic Rankings. <http://www.alexa.com/site/ds/top_500> (accessed May 2006)
Hengel, Christel and Pfeifer, Barbara (2005): Kooperation der Personennamendatei (PND) mit Wikipedia. In: Dialog mit Bibliotheken, volume 17, number 3, pages 18-24.
Völkel, Max; Krötzsch, Markus; Vrandecic, Denny; Haller, Heiko; Studer, Rudi (2006): Semantic Wikipedia. In: Proceedings of the 15th international conference on World Wide Web, May 2006. <http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ_id=1055>
Voss, Jakob (2005): Metadata with Personendaten and beyond. In: Proceedings of the first Wikimania conference, August 2005. <http://meta.wikimedia.org/wiki/Transwiki:Wikimania05/Paper-JV2>
Voss, Jakob (2006): Collaborative thesaurus tagging the Wikipedia way. April 2006. <http://arxiv.org/abs/cs/0604036>
About the Author
Jakob Voss studied computer science and library science at Humboldt University, Berlin. He is a member of the board of Wikimedia Germany and has been involved in the German Wikipedia since 2002.