Expertise from the crowd

In total, the newly reopened Rijksmuseum has over a million artworks, of which about 8 thousand are on display. Like an iceberg, the vast majority of the Rijksmuseum’s collection is hidden. Until recently, that is: the museum now runs a programme to digitize artworks and put them online.

Every year, says Henrike Hövelman of the Rijksmuseum’s online print collection, the museum aims to put another 40 thousand objects onto the Internet. At that pace it will take less than 20 years to disclose the entire print collection of about 700 thousand items.

One problem with this pace of publication is that annotating each year’s batch of items would cost a curator about four years of full-time work (at 10 minutes per item). In addition, curators are oriented towards art history and lack the domain knowledge to annotate specific aspects of an item, such as the name of a depicted bird or flower.
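As a rough check on these figures, the arithmetic can be sketched in a few lines of Python. The item counts and the 10 minutes per item come from the text above; the 1,650 working hours per full-time year is an assumption made here for illustration, as are the function names.

```python
# Back-of-the-envelope check of the annotation workload.
ITEMS_PER_YEAR = 40_000      # objects put online each year (from the article)
MINUTES_PER_ITEM = 10        # annotation time per item (from the article)
COLLECTION_SIZE = 700_000    # size of the print collection (from the article)
WORK_HOURS_PER_YEAR = 1_650  # ASSUMPTION: rough full-time working year, not from the article


def annotation_hours(items: int, minutes_per_item: float) -> float:
    """Total hours needed to annotate `items` objects."""
    return items * minutes_per_item / 60


def curator_years(items: int, minutes_per_item: float, hours_per_year: float) -> float:
    """Full-time working years one curator would need for `items` objects."""
    return annotation_hours(items, minutes_per_item) / hours_per_year


def years_to_publish(collection_size: int, items_per_year: int) -> float:
    """Years needed to put the whole collection online at a fixed yearly pace."""
    return collection_size / items_per_year


if __name__ == "__main__":
    hours = annotation_hours(ITEMS_PER_YEAR, MINUTES_PER_ITEM)
    years = curator_years(ITEMS_PER_YEAR, MINUTES_PER_ITEM, WORK_HOURS_PER_YEAR)
    print(f"Annotation hours per yearly batch: {hours:.0f}")
    print(f"Curator years per yearly batch: {years:.1f}")
    print(f"Years to publish the print collection: "
          f"{years_to_publish(COLLECTION_SIZE, ITEMS_PER_YEAR):.1f}")
```

Under that assumed working year, one yearly batch comes to roughly four curator years, which matches the estimate in the text; a different assumption about working hours shifts the result accordingly.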

In the SEALINCMedia research project (socially enriched access to linked cultural media), two teams led by TU Delft professors Alan Hanjalic and Geert-Jan Houben (EEMCS) aim to engage the crowd, laymen and experts alike, to contribute to the museum’s annotations.

The difficulty with this approach is how to ensure the quality of the contributed information. SEALINCMedia itself is one of sixteen research projects of the public-private COMMIT research community, which is dedicated to solving grand challenges in information and communication science.

PhD student Jasper Oosterman (EEMCS) has done a trial run with 86 prints of birds and plants. He invited a team of experts as well as the crowd (paid volunteers) to add information to prints whose existing descriptions were as vague as ‘blue bird on branch with red flower’. To keep them going, volunteers received a token payment of 5 eurocents for each item they finished.

At the end of the trial run, experts and volunteers had both correctly identified the species in half of the prints. Does that mean that crowd intelligence equals expert judgement? Not really. The crowd came up with the right names, but with a fair amount of noise (wrong tags) as well. Experts tend to write something down only when they’re fairly sure of it. Laymen don’t generally have the same kind of reservations.

Oosterman develops strategies and software to identify potential contributors (Crowd Identification) and to assemble them into a ‘working crowd’. Knowing more about those crowd workers allows him to assign each enrichment task to a person with the right expertise (Activity Planning), which should lead to higher-quality annotations.

The project also explores other ways to improve the quality of the resulting annotations. One is to gauge the trustworthiness of information from the speed at which it is entered. It’s plausible that fast-typed answers are the most assured, but how correct are they really?

Information often improves through a cycle of annotation and review, in which others judge the information their peers have entered. These judges can be other volunteers or experts. Both experts and volunteers are continually evaluated, so that each contributor builds up a degree of trust within a particular domain of knowledge.

Meanwhile, motivational aspects are also being studied. Medals (bronze, silver, gold) might motivate knowledgeable annotators to stay involved. Free entry to the Rijksmuseum may also motivate people to finish a few more items.

Oosterman is about halfway through his PhD research on Web User Demand Elicitation (WUDE), under the supervision of Professor Geert-Jan Houben and Alessandro Bozzon (Web Information Systems at EEMCS). Besides the Rijksmuseum, Oosterman also works with collections from Heritage Delft and the Netherlands Institute for Sound and Vision.

Related PhD research projects within SEALINCMedia are:

– Design, development and evaluation of personalized semantic search strategies by Chris Dijkshoorn (VU Amsterdam)

– Management of trust and authority for accessing, integrating and distributing information resources by Achana Nottamkandath (VU Amsterdam)

– Design, development and evaluation of user interfaces for searching linked data by Mieke Leysen and Myriam Traub (CWI – Centre for Mathematics and Informatics, Amsterdam).

–> This is one of over 40 research projects presented at the Delft Data Science New Year Event, held at the EEMCS building on January 13th.
