This program develops a scalable framework for unlocking high-quality proprietary content through ethical licensing agreements. The initiative bridges the data gap that prevents the development of high-performing AI models for underrepresented languages and domains by facilitating access to datasets for researchers and organizations working on AI for social good.
Explore our curated collection of datasets designed to advance AI research and development for public good.
A comprehensive dataset of Chichewa-language text and audio, including literature, news articles, household surveys, and radio broadcasts. Data sources include the National Statistical Office, Malawi Institute of Education, the Malawi Times, Malawi Capital Radio, and others.
Minimally Invasive Tissue Sampling (MITS) data, including clinical records, imaging data, and patient demographics. The data cover samples taken from five countries: Nepal, Rwanda, Sierra Leone, Uganda, and Bangladesh.
Researchers, academic institutions, and organizations working on AI for social good can apply for access to these datasets. The application process involves submitting a research proposal, demonstrating ethical data handling practices, and committing to open-source publication of derived insights. We prioritize applications that show potential for significant impact on global development challenges and align with the Gates Foundation's mission.
Applications are reviewed on a rolling basis. Typical response time is 2-3 weeks.