Should the private sector ever share data with the public sector, even if for public good?
Well, it depends.
Most of us agree that when companies collect our data as a natural by-product of providing a service – like our movement data or spending patterns – they shouldn’t sell these data to a third party for profit, especially without transparency. We want to know what is being sold. How were our data anonymized? And does that anonymization method even work?
But there is less agreement around not-for-profit data sharing. Most colleagues are comfortable with epidemiologists using private hospital records to monitor disease outbreaks, emergency response teams using customer call records to manage evacuations, and credit card companies sharing data with each other to identify fraud. But there are lingering questions of privacy and responsibility. In what cases is data sharing acceptable, and under what terms? Since these data are often generated by global firms, who decides?
The World Bank Data Council is working with technology companies to answer these questions, not over pastries in conference rooms but through projects with real stakeholders. These projects bring together colleagues across many disciplines: data engineers, data scientists, geospatial specialists, economists, legal counsel, procurement specialists, and sector specialists.
We must collaborate across disciplines, on concrete activities that we can learn from, to be able to create meaningful data sharing principles and, critically, to build the technical architecture that can meet these principles.
More specifically, to know how to responsibly work with proprietary data, we need to know where and how such data have value for public good. We need to know what the challenges are, especially with respect to counterpart technical capacity, anonymization techniques, sustainability, potential conflict of interest, and policy needs.
In April 2018, we launched Data Collaboratives, a pilot activity that resulted in 46 new data sharing ideas and 11 projects under implementation, in collaboration with Waze, Google, Where Is My Transport, Mobike, Mapillary, and DigitalGlobe. For example:
- Building Urban Resilience by Mapping Formal/Informal Transit for Freetown, Sierra Leone: Transport Specialist Fatima Arroyo Arroyo and her team are receiving pro bono data collection and processing support from Where Is My Transport to create the first complete map of the Freetown transit system.
- Data Fusion for Outbreak Prediction: Data Scientist Sam Fraiberger is using the Google Health Trends API, combined with data from Twitter and news outlet APIs, to improve prediction of outbreaks in fragile states.
- Closing the Data Gap in Bicycle Planning and Evaluation Efforts, a Toolkit for Mexico City: Senior Transport Specialist Felipe Targa and his team are using dockless bikeshare trip data from Mobike to improve cycling and pedestrian access and safety in Mexico City.
- Identifying Crash Hotspots in Nairobi and Dakar: Economist Sveta Milusheva and Research Analyst Rob Marty are using the Waze API to analyze traffic speed and user-reported crash data in Nairobi and Dakar as part of their road safety monitoring and impact evaluation programs.
Building on the successes and lessons of this pilot work, we are methodically scaling the initiative. We are inviting new data partners, as well as other international organizations facing similar challenges, starting with our DC neighbors, the IMF and the IDB. Together we have formed a nascent International Consortium for Data Collaboratives. By learning through joint project implementation, the Consortium aims to transform proprietary datasets into ethically and sustainably generated, shareable insights for improving public sector services and infrastructure in emerging economies.