Why it’s relevant to use private data for public interest purposes
No, the Spanish National Office of Statistics does not know where you are right now … and does not need to know. But it does need to take advantage of private data in aggregate form.
By Nuria Oliver
In October 2019 the Spanish National Office of Statistics (INE) made public a project that leverages information derived from data captured by the mobile network infrastructure to better understand the mobility of Spanish society on a large scale. The project has provoked controversy, raised suspicion and caused alarm, while placing data and its use at the forefront of public debate.
We certainly live in a world of data, often called Big Data. Much of this data is generated by each of us, as the digital footprint left by our interactions with our mobile phones and digital services. In addition, the digitalization of the physical world through all kinds of sensors, both in science (for example, particle accelerators or DNA sequencing) and in our environment through so-called Internet-of-Things devices (for example, pollution, temperature or noise sensors in cities, connected cars …), generates large amounts of additional data. According to Statista, next year we will have more than 6 connected devices per person on the planet. In the last two years, we have generated more data than in the previous 5000 years of our existence.
Together with the data, we need the ability to analyze, interpret and use it to generate value. The “Data Revolution” has therefore been possible thanks not only to the existence of data but also, and more importantly, to the development of the technological and human capabilities needed to capture, store, analyze and take advantage of all this data using Artificial Intelligence algorithms. As a result, the so-called Data Economy will exceed 700 billion euros in Europe this year. Note that this figure has been computed from a business perspective, since most of the data is in the hands of private companies that monetize it in their products and services.
Beyond companies, the Data Revolution has also reached the public sector, opening an unprecedented range of opportunities. In 2012, the United Nations published a report called “Big Data for Development” that marked the beginning of intense worldwide activity around the value of data for improving the world: both by enabling us to better measure the 17 Sustainable Development Goals and by accelerating their achievement.
Thinking about the future, a question inevitably arises: how could this data, and especially the knowledge derived from it, which today is mostly privately held, be shared to maximize its potential for positive social impact? What opportunities are we missing by not using this data for social good? Answering these questions has been part of my work for more than a decade. How we share and use data will largely determine the future of democracy and human progress.
The aspiration is to make better decisions, decisions based on evidence (reflected in the data) that allow us to overcome some of the limitations of human decisions, including corruption, cognitive biases, conflicts of interest or selfishness. The goal is to achieve a fairer, healthier and more egalitarian world thanks to the use of data.
When we talk about data, we need to specify what kind of data we mean. Data can be personal, that is, data that allows a person to be identified, or non-personal, that is, data from which it is impossible to identify anyone. Personal data is protected in Europe by the General Data Protection Regulation (GDPR), which requires, among other things, a legal basis such as informed consent before personal data can be processed. Personal data may cease to be personal when it is aggregated, that is, when data from many people is combined so that it is impossible to identify any individual in the set. In addition, in many scenarios of data use for social good, what is shared is the knowledge derived from the data, not the data itself. For example, rather than sharing individual mobility, which is personal data, one could share the aggregate mobility of thousands of people, or inferred origin-destination matrices, that is, estimates of the percentage of people who move between different parts of a country or region.
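To make the idea of an origin-destination matrix concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by the INE or by any operator: the function name `origin_destination_shares`, the input format and the `min_count` suppression threshold are all assumptions made for this example. It turns individual trips into the share of people moving between areas, and drops any cell backed by too few people, so that the output contains no individual records.

```python
from collections import Counter


def origin_destination_shares(trips, min_count=3):
    """Aggregate individual trips into origin-destination shares.

    trips: iterable of (person_id, origin, destination) tuples (hypothetical format).
    min_count: illustrative threshold; cells with fewer distinct people are dropped.
    Returns {(origin, destination): share of all distinct people}.
    """
    # Count each person at most once per origin-destination pair.
    unique = {(person, origin, dest) for person, origin, dest in trips}
    counts = Counter((origin, dest) for _, origin, dest in unique)
    total_people = len({person for person, _, _ in unique})
    if total_people == 0:
        return {}
    # Keep only cells backed by enough people; person ids never leave this function.
    return {cell: n / total_people for cell, n in counts.items() if n >= min_count}


# Made-up example data: four people, one of them recorded twice on the same route.
example_trips = [
    ("a", "North", "South"),
    ("b", "North", "South"),
    ("c", "North", "South"),
    ("c", "North", "South"),
    ("d", "South", "East"),
]
example_shares = origin_destination_shares(example_trips, min_count=3)
print(example_shares)
```

In this made-up example, the South-to-East flow is backed by a single person and is therefore suppressed, while the North-to-South flow is reported only as a share of the population. A real system would use a far larger threshold and additional safeguards.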
In this context, statistical offices around the world are exploring opportunities to produce official statistics and censuses more accurately, more economically and more frequently through data analysis. Instead of deploying costly questionnaires, these new sources of data could be used to assess the state of our society: how many of us there are, how we move, our socio-economic level, and so on. We are talking about aggregate statistics, that is, information that reflects summary values, such as averages, for a subset of the population. We should not forget that it is our right to live in an informed society, where citizens know the impact of public policies and where there is clear accountability. Hence the value and importance of official statistics.
For example, the UK’s Office for National Statistics is exploring the use of alternative data sources to produce the census from 2021 onwards. There are already projects that take advantage of Internet data and credit card purchases to develop aggregate measures of consumer prices, and projects that model mobility patterns based on data from the mobile phone network infrastructure.
Which brings us to the INE project in Spain. It could have been communicated and interpreted as a pioneering initiative in Europe: one that takes advantage of private data in aggregate form, always respecting people’s privacy, to develop more accurate mobility statistics and therefore help us make better public decisions. Instead, it has been perceived as a direct attack on people’s privacy. In no case would the INE access individual or personal data. In no case would it even access the data itself, only knowledge derived from the data, such as estimates of the number of people in areas with at least 5000 people or estimates of aggregate mobility flows. In no case could the INE identify any person. However, the headlines of most media outlets have stressed the opposite, generating unnecessary alarmism and fostering a loss of confidence in a project from which we would all benefit.
Most of us do not want our data to be used, unfortunately sometimes in an unethical yet legal way, for the profit of a few. We do not want our data to be sold without our consent or our knowledge. But I imagine that many of us would be open to contributing our data to help make better public decisions, decisions that affect us all. What has failed, then?
From my perspective, there has been a communication failure in timing, form and substance, and a failure to educate society about the context of the project and its details: it does not use the data itself but information derived from it, always aggregated over thousands of people and without any connection to any particular citizen. I wish there had been more transparency and citizen involvement, with citizens treated not just as passive subjects but as active participants in this new way of making decisions. I also missed tools for control, so that citizens could decide whether to contribute their data to improve public decision making. It is important to find the balance between the right to an informed society, where decisions are made based on evidence, with transparency and accountability, and the fear of a surveillance society where powerful organizations use (abuse?) data as a tool to exercise their power.
Undoubtedly, the use of private data for public interest purposes is a topic of current relevance and immense potential. It is important to understand that the Data Revolution is not simply about data but also poses great technological, legal, economic, ethical and social challenges, as the INE example illustrates. For this reason, the European Commission formed a high-level expert group a year ago to identify the barriers that prevent us from taking advantage of data for social good and to propose solutions to them. This group, to which I belong, will publish its recommendations in early 2020.
Moreover, given the timeliness, complexity and importance of this topic, the Vodafone Institute, in collaboration with Data-Pop Alliance, published in December 2019 a paper that extensively discusses the multi-faceted challenges that we will need to tackle to turn this opportunity into a reality.
Hopefully our work serves to inspire and educate. Hopefully it contributes to realizing the dream of a more just society thanks to evidence-based decision making. I invite you to join this worldwide movement for the use of data for social good, in a transparent and reliable manner and with a people-centered approach. Because, as we say in the paper, sharing is caring.