Who should own research data?
Monday, Mar 26, 2018, 10:03 PM | Source: Pursuit
The rise of pervasive internet and personal sensor technologies is giving psychological science an unprecedented window into human behaviour. For the first time, it seems feasible that we will be able to predict, explain and influence how we act, not just in the sterile laboratory but in the real and messy world where it actually matters.
The Cambridge Analytica scandal embroiling Facebook, however, is a warning that researchers cannot conduct business as usual. Our personal data has been shown to be vulnerable to manipulation for political ends. As the public takes stock and governments react, we can expect research practices to come under increasing scrutiny too, which could seriously jeopardise the public good that might be derived from studying dense data sets.
What we need is a new way of exploiting data that doesn’t rely solely on trust or the threat of legal ramifications, so that we can realise the full potential of big data without compromising civil liberties.
Personal data warehouse
Given that data is quickly becoming the coin of the realm, it is reasonable to start asking who should own it. Right now we have a model in which tech titans, governments and research institutions discretely assert rights over a bewildering patchwork of data streams. However, it is the users, citizens and participants who are generating the data. Is it unreasonable to think that it is these creators who should own it?
Under a model like this, data would become an asset that participants could license to corporates, governments and academic researchers, either in the interests of the public good or in return for compensation.
People would build a personal data warehouse that might include both generic data with multiple uses, such as GPS coordinates, and responses to surveys posted by researchers. They would be provided with a set of retrieval and visualisation tools that allow them to understand and tend their data. Over time, the value of the data asset would grow as it expands. Researchers might then offer compensation for a given type of data, and participants would consent on a case-by-case basis.
The researchers would be purchasing the right to analyse the data, not the data itself, so participants would remain free to take part in other studies and earn additional compensation from other researchers for the same data.
Scientists are usually interested in the general principles underlying a phenomenon, so we typically aggregate information across many people to increase the certainty of our conclusions. While it is convenient to have access to the raw data, it is not necessary.
There may also be real benefits in creating a degree of separation between researchers and data. In recent years, psychology has been one of a number of disciplines in the grip of the so-called “replication crisis”.
In one large-scale study, a consortium of researchers set out to replicate 100 studies from the psychological literature. They were successful in only 39 per cent of cases, casting doubt on many findings that had been taken as canon. The ability to replicate a study is a cornerstone of the scientific method, and so a great deal of collective soul-searching has resulted.
Changing the status quo
Many factors are likely to have contributed to the current state of affairs. An important one has been the way in which data are “cleaned” to remove outliers. Too close a connection between researchers and data can lead them to construct narratives about certain data points to justify their exclusion, ultimately undermining the integrity of their analysis. If researchers were provided only with the end results of the analysis rather than the data itself, and other researchers could immediately analyse the same data, the scope for these practices would be reduced.
While many organisations would consider the notion of distributing ownership to the creators chilling, there are cogent arguments for changing the status quo.
1. A personal data marketplace would provide visibility into data transactions. People would be aware of what their data were being used for, and it would be easier to formulate and enforce regulatory frameworks to curb undesirable exploitation.
2. Currently, people’s understanding of the relative value of data and the privacy implications of allowing others to access it is rudimentary. If people retain ownership of their data and participate in a data marketplace, they will come to understand collectively which kinds of data are most valuable. The promise then is that a more nuanced understanding of privacy will emerge.
3. Participants would be incentivised to curate their data to ensure it is as complete as possible, since this would increase its value. Data quality is a fundamental challenge for big data enterprises, so anything that encourages people to make sure their data are both accurate and complete would be helpful.
4. Participant ownership of data may lead to increased engagement in, and understanding of, political and scientific processes: citizen science on steroids. Public trust in institutions is waning, with potentially damaging long-term implications. Increased transparency is a precondition for winning that trust back.
Meeting the big data challenge
Change is upon us. New EU privacy laws, the General Data Protection Regulation, take effect in May and will significantly tighten the way in which data is managed in Europe. At the same time, Australia is rolling out “My Health Record” with the objective of storing the health data of 98 per cent of the populace by the end of 2018.
In the light of the Cambridge Analytica story, governments, companies and individuals are all reconsidering their positions on data management. What is the value of different kinds of data? Who should own it? Who should be able to access it and how? If we are to take advantage of the historic opportunity that big data provides, we need answers to these questions now.
Disclosure statement: Professor Dennis is the CEO of a start-up company specialising in providing privacy-preserving experience sampling collection and analysis services.
Banner Image: Getty Images