Wikidata: the new hub for cultural heritage
This article is by: Dr Martin Poulter, Wikimedian In Residence at the University of Oxford
There is a site that lets users create customised and unusual lists of artworks: works whose title is an alliteration, self-portraits by female artists, watercolour paintings wider than they are tall, and so on. These queries do not use any gallery or museum’s website or search interface but draw from many collections around the world. The artworks can be presented in various ways, perhaps on a map of the locations they depict, or in a timeline of their creation, colour-coded by the collection where they are held. The data are incomplete, but these are the early days of an ongoing and ambitious project to share data about cultural heritage: all of it.
Wikimedia is a family of charitable projects that are together building an archive of human knowledge and culture, freely shareable and reusable by anyone for any purpose. Wikipedia, the free encyclopedia, is only the best-known part of this effort. Wikidata is a free knowledge base, with facts and figures about tens of millions of items. These data are offered as freely as possible, with no restriction at all on their copying and reuse.
Already, large amounts of data about artworks are being shared through formal partnerships. The University of Barcelona has worked with Wikimedians to share data about Art Nouveau works, recognising that it is far better to have all these data in one place than scattered across various online and offline sources. The National Library of Wales has employed a Wikidata Visiting Scholar to share data about its artworks, including the people and places they depict. The Finnish National Gallery, the Rijksmuseum in Amsterdam and the National Galleries of Scotland are among the institutions that have either formally uploaded catalogue data to Wikidata, or made data freely available for import. To see the sizes of these shared catalogues, one just has to ask Wikidata.
Wikidata queries can be built using SPARQL, a database query language not for the faint-of-geek. However, there is an open community of users sharing and improving queries. The visualisations they create can be shared online or embedded inside other sites or apps. Developers can build applications for the public that are easy to use but offer a distinctive view of Wikidata’s web of knowledge.
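As a rough illustration of what such a query looks like, here is a minimal Python sketch that asks for self-portraits by female artists. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and assumes that self-portraits are recorded with the genre property (P136); the function names and the User-Agent string are invented for this example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service

# Self-portraits by female artists, with English labels where available.
QUERY = """
SELECT ?work ?workLabel ?artistLabel WHERE {
  ?work wdt:P136 wd:Q192110 ;    # genre: self-portrait
        wdt:P170 ?artist .       # creator
  ?artist wdt:P21 wd:Q6581072 .  # sex or gender: female
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
"""


def build_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def flatten(data: dict) -> list:
    """Reduce SPARQL JSON results to one {variable: value} dict per row."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query: str) -> list:
    """Send the query and return the flattened rows (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        # Hypothetical User-Agent; replace with one identifying your own tool.
        headers={"User-Agent": "wikidata-article-example/0.1 (illustrative)"},
    )
    with urllib.request.urlopen(request) as response:
        return flatten(json.load(response))
```

Calling `run_query(QUERY)` would return a list of rows with the keys `work`, `workLabel` and `artistLabel`. The results depend on the current state of Wikidata, so none are shown here.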
One such application is Crotos, a family of tools generating image galleries and maps of art, filtered by format, artist, place depicted and other attributes. Crotos shows images of the art, so it only includes works with a digital image available in Wikimedia Commons. Wikidata itself has no such restriction: it describes art whether or not a freely-shareable scan is available.
So while the Wikidata site itself might not have mass appeal, the service it provides is gradually transforming the online world, providing a single source of data for some of the most popular web sites and apps. Those “infoboxes” summarising key facts and figures at the top of Wikipedia articles are increasingly being driven from Wikidata, so dates, locations and other facts can be entered in one place but appear on hundreds of sites.
The really exciting prospect is that of building visualisations and other interactive educational objects, integrating information from many collections and other data sources. Wikidata would be interesting enough as an art database, but it also shares bibliographic, genealogical, scientific, and other kinds of data, covering modern as well as historical topics. This allows combined queries, such as art by people born in a particular region and time period, or works depicting people described in a particular book.
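The first kind of combined query mentioned above, art by people born in a particular region and time period, might be sketched as follows. This is only one possible formulation: it assumes that birthplaces are linked to the enclosing region through the "located in the administrative territorial entity" property (P131), and the function name and parameters are invented for this example. It builds the SPARQL text only and does not send it anywhere.

```python
import re

QID = re.compile(r"Q[1-9]\d*")  # shape of a Wikidata item identifier


def works_by_birthplace(region_qid: str, first_year: int, last_year: int,
                        limit: int = 50) -> str:
    """Build a SPARQL query for paintings whose creator was born
    in the given region between first_year and last_year inclusive."""
    if not QID.fullmatch(region_qid):
        raise ValueError(f"not a Wikidata item identifier: {region_qid!r}")
    if first_year > last_year:
        raise ValueError("first_year must not be later than last_year")
    return f"""
SELECT ?work ?workLabel ?artistLabel ?born WHERE {{
  ?work wdt:P31 wd:Q3305213 ;              # instance of: painting
        wdt:P170 ?artist .                 # creator
  ?artist wdt:P19 ?birthplace ;            # place of birth
          wdt:P569 ?born .                 # date of birth
  ?birthplace wdt:P131* wd:{region_qid} .  # birthplace lies within the region
  FILTER(YEAR(?born) >= {first_year} && YEAR(?born) <= {last_year})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
"""
```

For example, `works_by_birthplace("Q22", 1800, 1850)` produces a query for paintings by artists born in Scotland (Q22) in the first half of the nineteenth century. Validating the identifier before interpolating it keeps stray text out of the query.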
Wikidata is massively multilingual, using language-independent identifiers and connecting these to names in hundreds of languages as well as to formal identifiers. In a way it is the ultimate authority file: a modern Rosetta Stone connecting identifiers from institutions’ authority files, scholarly databases and other catalogues (Hinojo, 2015).
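To make the idea concrete, the sketch below reads names and external identifiers from a single item. The sample record is hand-made and heavily trimmed, following the general shape of the JSON that Wikidata serves for an item; its identifier value is a placeholder rather than real data, and the helper names are invented for this example.

```python
# Hand-made, trimmed sample in the general shape of Wikidata's item JSON.
# Values are illustrative; the external identifier is a placeholder.
SAMPLE_ENTITY = {
    "id": "Q12418",
    "labels": {
        "en": {"language": "en", "value": "Mona Lisa"},
        "fr": {"language": "fr", "value": "La Joconde"},
    },
    "claims": {
        "P214": [  # VIAF ID: an external identifier
            {"mainsnak": {"datatype": "external-id",
                          "datavalue": {"type": "string",
                                        "value": "PLACEHOLDER"}}}
        ],
        "P170": [  # creator: points at another item, not an external ID
            {"mainsnak": {"datatype": "wikibase-item",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"id": "Q762"}}}}
        ],
    },
}


def label(entity: dict, languages: list) -> str:
    """Return the first available label in order of preference,
    falling back to the language-independent identifier."""
    labels = entity.get("labels", {})
    for lang in languages:
        if lang in labels:
            return labels[lang]["value"]
    return entity["id"]


def external_ids(entity: dict) -> dict:
    """Map each external-identifier property to its list of values."""
    found = {}
    for prop, statements in entity.get("claims", {}).items():
        values = [
            s["mainsnak"]["datavalue"]["value"]
            for s in statements
            if s["mainsnak"].get("datatype") == "external-id"
            and "datavalue" in s["mainsnak"]
        ]
        if values:
            found[prop] = values
    return found
```

With this sample, `label(SAMPLE_ENTITY, ["cy", "fr", "en"])` falls through the missing Welsh label to the French one, while `external_ids(SAMPLE_ENTITY)` keeps only the VIAF entry and ignores the link to the creator's item.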
There are thousands of properties that a Wikidata item can have. Even a small selection of those relevant to art and culture makes it clear that the number of possible queries is astronomical.
* People can also be connected to groups or organisations: member of, founder, employer, educated at.
With so many kinds of data, Wikidata draws in volunteer contributors with varying interests. Just as there are people who will sit down for an evening to improve a Wikipedia article or to categorise images on Wikimedia Commons, there are people fixing and improving Wikidata’s entries and queries. As with Wikipedia, Wikidata benefits from the intersection of different interests. Contributors speak different languages and have different background knowledge. Some are interested in a particular institution’s collection, while others are interested in a particular style of art, others in a given location or historic individual. Hence one entry can attract multiple contributors, each motivated by a different interest.
Over time, Wikidata’s role in Wikipedia will expand. Explore English Wikipedia and you find many list articles, such as List of works by Salvador Dalí or List of Hiberno-Saxon illuminated manuscripts. At the moment, these are all manually maintained, but a program—the ListeriaBot—has been created to turn Wikidata queries into lists suitable for Wikipedia: see for example this (draft) list of paintings of art galleries. Catalan Wikipedia, with a much smaller contributor base than the English language version, is already using the bot to write list articles such as Works of Jacob van Ruisdael, saving many hours of human effort. As automated creation of list articles becomes more widespread, cultural institutions that share catalogue data will help ensure the correctness and completeness of these articles.
Like Wikipedia, Wikidata depends on Verifiability: any statement of fact is expected to cite or link a credible published source. Hence it has active links to catalogues and other formally vetted sites, which usually supply more scholarly detail and primary research than Wikidata itself. So Wikidata is not a replacement for cultural institutions’ catalogues. The hub metaphor is apt: it is a central point, linking together disparate resources and giving them a useful shape. Its credibility will always depend on the formally vetted sources that it cites, and there will always be users who want to check what they read by following up the citations. In practice, this means that sharing ten thousand records with Wikidata is a way to get ten thousand incoming links to the institution’s own catalogue. What’s more, the free reuse of Wikidata means that other sites will use those links.
Wikidata and its partners have a huge task ahead of them, but the potential reward is vast. We could have data on all artworks, browsable in endless and genuinely new ways, with connections to their official catalogues, their physical locations, and scholarly literature. The sooner the cultural sector as a whole gets involved, the sooner we can bring this about.
I am grateful to Wikidata users Jane Darnell (User:Jane023), Magnus Manske (User:Magnus Manske - creator of User:ListeriaBot) and Andy Mabbett (User:Pigsonthewing) for many of the useful links in this article.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Licence.