A researcher is like a photographer. Scientists collect data and use it to create a virtual impression of reality. When was the photo taken, from which angle, with which lighting and shutter speed? Only the researcher knows the details and how the photo (data) should be interpreted.
Geert-Jan Houben used this analogy in his lecture at the Dies Natalis in January, which revolved around open science. What he means is that opening up your data isn't always that simple. “You have a responsibility to consider it very carefully. You don't want other researchers to jump to the wrong conclusions because they have a different feel for it.”
We had arranged a double interview in TU Delft Library, Wilma van Wezenbeek's place of work. She is the director and as such very interested in open science. Houben is Professor of Data Science in the Faculty of EEMCS. He develops systems for opening up data and making it available for searches.
Now that publishing articles in public registers and open access journals (which are freely available to all) has become common practice, the open access movement is turning its attention to data. In its view, government-funded research, including all its derivatives, publications and data, must be freely available to all.
Why is sharing data so important?
Van Wezenbeek: “Research data must benefit society, particularly if research is funded with public money. It's a logical continuation of the scientific remit. Those who publish the data and those who use it for further research will all benefit. You get more cross-pollination between disciplines. Others will find patterns in your data that you had not seen yourself, because they look at it from a different angle.”
Houben: “Generalisation is a big thing in science. Imagine that you’re studying the performance of racing cyclists. You want to know how it relates to the weather, to humidity, for example. You’ve taken various measurements, bearing in mind all kinds of assumptions and conditions. You probably think that your conclusions apply to cycling performances elsewhere in the world too. If you share your basic data, researchers around the world will be able to reproduce your work. Their studies might show, for example, that the relationship between weather conditions and cycling performance is completely different in Washington DC than it is in the Netherlands. This would provide new insights for you and your US colleagues.”
Van Wezenbeek: “I’m convinced that using other people’s knowledge and sharing your own will make you a better student, lecturer or researcher. That's what I believe in. We've been producing commercial scientific journals for centuries. And they only give a summary of the research. There's so much more going on now. It’s all about doing your bit.”
But it also means going out on a limb. Imagine that someone discovers a mistake in your data. Won’t fear of this put researchers off sharing their data?
Van Wezenbeek: “There's nothing wrong with scientists having to think harder about their data and how to present it.”
Researchers who deliberately manipulate data are more likely to get caught out. Will open science have a cleansing effect?
Van Wezenbeek: “If researchers are more aware that their work might be of interest to many other researchers, they will probably be more inclined to store, describe and process their work in line with the standards. I hear what you’re saying, but ‘cleansing effect’ sounds so severe. As if there's so much sloppy science around. Open science will make researchers think twice. That’s all.”
Houben: “Sharing data creates a form of peer review that goes beyond the peer assessments that already apply in the world of journals. Your data can be verified by a large community. It's like open source software. You share your work with the community, and the community makes a judgement.”
I can still imagine that researchers might not be keen to share their data. After all, it’s the basis of their publications. If other scientists make a breakthrough using your data, you’ve wasted your own chances of publishing in a leading journal. ‘Publish or perish’ is what they say. Could this rat race prevent researchers from publishing their data?
Van Wezenbeek: “One of the best things to come out of the debate on open science is that we’ve started looking differently at how we recognise science. It’s not all about high impact publications. A researcher who knows how to compile data that can be used by other researchers will also earn recognition.”
But does he or she actually get that recognition?
Van Wezenbeek: “We haven’t completely banished the rat race, but we’re getting there.”
Houben: “Competition is good. But should rivalry focus solely on the classic artefacts - publications - or on other artefacts in the research process as well? We could stress the importance of writing good explanatory notes for data and design a rewards system for shared data.”
What do you mean by explanatory notes?
Houben: “Metadata. Descriptions of the data: the conditions under which they were gathered and how they should be interpreted. When a doctor writes you a prescription, he is responsible for telling you how to use the medicine. In the same way, researchers are responsible for explaining how their data should be used. That's what this is all about.”
Some faculties recently introduced data stewards to help researchers release their research data. Is sharing data really that difficult?
Houben: “Every field has its own conventions on the meaning of terms. Take sports researchers. The type of rain they call drizzle could be called mist or steady rain by researchers in other fields. If these researchers use each other’s data, the terms need to be clearly defined. You can’t just store data in a repository without considering aspects like this. Conversely, if you use someone else's data, you must be aware of differences in interpretation between disciplines. Researchers will have to get used to each other and then to each other’s fields before data exchange becomes efficient.”
That sounds a bit onerous. Will standards for storing data be developed?
Houben: “I think it will be a combination. You must be careful not to ‘over-standardise’ or ‘over-automate’. You’ll always need personal contact to interpret certain data (tricky data) correctly. In many cases, you will have to safeguard the privacy of the trial subjects, while also divulging certain details. The data must be meaningful to other researchers. You'll often need to weigh this up. What needs to be published and what doesn’t?”
We mentioned government-funded research. According to the National Open Science Plan, it must always be published. The rule doesn’t apply to research funded by industry, as company interests are at stake. But there’s a large grey area in between. Researchers are paid by the government and use university facilities. So, all research conducted at a university is partly government-funded. How will you deal with this grey area?
Van Wezenbeek: “The plan states that all research must in principle be made public. This can be overridden if there are good reasons for not revealing findings, such as company interests. But open access is the default setting. Data should be ‘as open as possible, as closed as necessary,’ as we put it.”
This article was also published in Delft Outlook, the scientific magazine of TU Delft.
Wilma van Wezenbeek (1967) is Director of TU Delft Library. She is also Programme Manager open access at VSNU and lead author of the report National Plan Open Science.
Geert-Jan Houben (1963) is Professor web information systems and Director of Education at the EEMCS faculty. He is also scientific director of Delft Data Science and holder of the KIVI chair Big Data Science.