In the old folk tale of the stone soup, hungry travellers arrive in a village. The villagers are too busy to notice their arrival. To attract attention, the travellers set out to prepare a soup in the main square. All they have at their disposal is a kettle and some stones, so they decide to make a ‘stone soup’. That proves sufficient to get the attention they need. Everybody agrees that the soup could use some additional flavours, so the villagers start adding their share of garnishes, and at the end of the day all enjoy a delicious soup.
This tale illustrates the power of collaboration. The preparation of the stone soup is a nice metaphor for the creation of Wikipedia entries. A Wikipedia page typically starts with some basic information, called a stub. Different people then add little facts to it (ingredients/flavours), and finally, through community collaboration, a quality Wikipedia entry arises. From its inception, Wikipedia has taken an approach of gradual growth in distributing knowledge. Topic-focused, encyclopaedic articles naturally lend themselves to interlinking and to evolving over time. Community ownership ensures broad access and, in the great majority of cases, results in high quality.
Since 2005, Wikipedians have made a concerted effort to organise and improve the content of biomedically relevant articles (via, for example, WikiProject Molecular and Cellular Biology and WikiProject Medicine). Recognizing the value of Wikipedia as a repository and foundry of this kind of knowledge, the NIH funded the Gene Wiki project in 2010 to help stimulate continued growth and improve content focused on human genes.
Now in its second iteration, the Gene Wiki project is coordinating with the recently released Wikidata platform as it continues to advance its goal of democratizing access to biomedical knowledge. Wikidata is the centralized linked knowledge base for Wikimedia projects, including the Wikipedias in all languages. Structured elements of Wikipedia articles, such as tables, can now be built dynamically from knowledge captured in Wikidata. As with Wikipedia articles, any Wikidata entry can be edited by anyone (both humans and computer programs). In the other direction, Wikidata provides interfaces, including a SPARQL endpoint, for external applications to query.
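As an illustration of querying in "the other direction", the following minimal Python sketch builds a request for the public Wikidata SPARQL endpoint. It is an assumption-laden example, not the project's own code: the function names `build_gene_query`, `build_request_url` and `fetch_results` are hypothetical, the query simply lists items carrying an Entrez Gene ID (property P351), and the `User-Agent` string is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata query service endpoint.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gene_query(limit: int = 5) -> str:
    """Return a SPARQL query listing items that have an Entrez Gene ID (P351)."""
    return (
        "SELECT ?gene ?geneLabel ?entrez WHERE {\n"
        "  ?gene wdt:P351 ?entrez .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        f"}} LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as GET parameters and ask for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{SPARQL_ENDPOINT}?{params}"


def fetch_results(query: str) -> list:
    """Send the query over the network and return the result bindings.

    Not called in this sketch; the User-Agent value is a placeholder.
    """
    request = Request(
        build_request_url(query),
        headers={"User-Agent": "stone-soup-example/0.1 (illustrative sketch)"},
    )
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `fetch_results(build_gene_query(3))` would return up to three result rows; the URL-building step is kept separate so it can be inspected without network access.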
A Wikidata entry consists of a label, a short description, possible aliases and a set of statements (Figure 1). A statement (Figure 2) captures a property and its value, together with qualifiers and references that provide context and provenance. We are currently converting authoritative resources on genes, proteins, drugs and diseases into the Wikidata data model. We do this by creating so-called bots, software written in Python that interacts with the Wikidata API.
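The structure described above can be sketched as a pair of Python data classes. This is a deliberately simplified model for illustration, not the Wikidata JSON format or the project's bot code: the class names `Item` and `Statement`, the helper methods, and all example values (including the placeholder identifier `"0000"`) are hypothetical. Only the property ID P351 (Entrez Gene ID) refers to a real Wikidata property.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    """One property-value pair with its context and provenance."""
    property_id: str                      # e.g. "P351" (Entrez Gene ID)
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class Item:
    """A Wikidata-style entry: label, description, aliases, statements."""
    label: str
    description: str
    aliases: List[str] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)

    def values_of(self, property_id: str) -> List[str]:
        """Return every value stated for the given property."""
        return [s.value for s in self.statements if s.property_id == property_id]

    def unreferenced(self) -> List[Statement]:
        """Return statements that carry no reference, i.e. no provenance."""
        return [s for s in self.statements if not s.references]


# Placeholder data, invented purely to show the shape of an entry.
example = Item(
    label="example gene",
    description="placeholder human gene used for illustration",
    aliases=["EXMPL1"],
    statements=[
        Statement("P351", "0000", references=[{"stated in": "example source database"}]),
        Statement("P_HYPOTHETICAL", "some value"),
    ],
)
```

A bot following this shape might use a check such as `example.unreferenced()` to confirm that every statement carries provenance before anything is written to Wikidata.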
Just as Wikipedia provides open access to an evolving collection of human-readable articles, Wikidata provides access to an evolving trove of structured data. Together, these public resources provide the means for research communities to efficiently share their insights with each other as well as with the public at large. The knowledge is there to use, outside the scope of any professional limitation. Not only is this a great way of disseminating knowledge, it also opens up scientific knowledge to public scrutiny. Everyone has access to the data, to the references documenting its origin, and to the discussion pages behind each Wikidata item, where they can discuss the quality of the knowledge added. Community input is sent back to the original data owners, which has already led to improvements in the source data. We are working to promote this two-way traffic in such a manner that it leads to higher-quality scientific data and thus improves our collective understanding of the world around us. Our ultimate goal is to make Wikidata the central hub for scientific data in the life sciences (Figure 3).
Wikidata is the stone soup of scientific data. By putting Wikidata on our central square, locals open up their data silos to add flavours and to create a delicious soup.
Team: Andra Waagmeester (1), Sebastian Burgstaller-Muehlbacher (2), Timothy Putman (2), Elvira Mitraka (3), Justin Leong (4), Paul Pavlidis (4), Lynn Schriml (3), Benjamin Good (2), Andrew I. Su (2)