Introduction to Data Goods

Data Goods comprise data, reproducible methods (code), documentation, and sample insights. Unlike a traditional data analysis, which results in a single-use report or visualization, Data Goods are designed to be reused for future updates and projects, thereby building the capacity of the World Bank and partner organizations to quickly and effectively deliver complex data science solutions to pressing global challenges.

Data Goods packages include:

  1. Foundational Datasets. Foundational Datasets comprise all datasets used to prepare the Data Goods. To support replication and reuse of the Data Goods, the documentation includes a description of each data source, including data type, update frequency, access links (including to the World Bank’s data catalogue, the Development Data Hub), and contact information.

  2. Data Products. These are analytical products derived from the Foundational Datasets, which can be further used to generate indicators and insights. All Data Products include documentation, links to original data sources (and/or information on how to access them), and a description of their limitations. Reference resources are also cited, where relevant. In the documentation, each Data Product has its own “chapter”, generated from a Jupyter notebook.

  3. Insights and Indicators. Each Data Goods package may also include additional analytical work, such as dynamic maps, data visualizations, and/or sample indicators. Indicators can be derived from a combination of Foundational Datasets and Data Products. By combining these two inputs, teams can develop a wide range of indicators to meet their project needs. Indicators can be presented side-by-side in an Excel workbook – a format that is generally accessible to the widest audience. Because all indicators are based on the same underlying data, they are comparable with each other, across geographies and across time.

  4. Data Lab Team. For each project, the World Bank Data Lab recruits colleagues from throughout our organization, pooling our collective talents in support of our lending and technical assistance operations. Data Goods documentation lists the members of the team that prepared the Goods, along with their contact information.
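
As an illustration of the data source description mentioned under Foundational Datasets, the following is a minimal sketch of how such a record could be represented and checked for completeness. The class name `DataSourceRecord`, its field names, and the example values (including the URL) are hypothetical and are not taken from any Data Goods repository; only the list of described attributes (data type, update frequency, access link, contact information) comes from the text above.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DataSourceRecord:
    """Hypothetical description of one Foundational Dataset."""
    name: str
    data_type: str
    update_frequency: str
    access_link: str
    contact: str

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left blank."""
        return [key for key, value in asdict(self).items() if not value.strip()]


# Placeholder values for illustration only.
record = DataSourceRecord(
    name="Example dataset",
    data_type="raster",
    update_frequency="monthly",
    access_link="https://example.org/catalogue/entry",
    contact="",
)

print(record.missing_fields())  # ['contact']
```

A check of this kind could be run before publishing documentation, so that every data source lists the information needed for replication.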

How Data Goods are Managed

  1. Dynamic, Web-Hosted Documentation. Unless specified otherwise, all code and documentation used to produce the Data Goods are hosted in a project GitHub repository, to facilitate reuse for future updates and projects, as well as to support collaboration and capacity-building activities.

  2. Data Catalogue. Where possible, all datasets used in the production of Data Goods are added as entries to the World Bank’s Development Data Hub, where they are tagged with metadata, license attributes, and access information.

  3. Internal Project Management and File Sharing System. To facilitate project management across teams, the Lab creates a project SharePoint, which includes project management information (work plan, milestones, check-in slides, log of hours charged, final report), related literature, data files, indicator tables, and links to resources, such as this documentation. An advantage of SharePoint for World Bank use is that all contents are automatically encrypted and tagged as Official Use Only. The project SharePoint is accessible to project team members and, with permission, can be replicated as a basis for future project updates or for similar projects.