pingkit

pingkit — Mobile location data for transport

Workshop materials for Development Data Partnership Day, Washington D.C., 5 June 2026. Event page: https://datapartnership.org/updates/partnership-day/

pingkit is a small Python toolkit plus a pair of teaching notebooks. Together they walk through the end-to-end workflow for mobile-location (GPS ping) data in transport analysis: loading raw ping tables, quality-checking the panel, detecting activity stops, and building a trip-based origin-destination (OD) matrix.

Presenters

Learning objectives

By the end of the workshop an attendee can:

  1. Explain what a GPS ping record contains and how it is collected.
  2. Name 2–3 transport use cases for ping data and articulate the key biases.
  3. Describe how Irys and Quadrant differ on coverage, sampling, and access.
  4. List the main re-identification risks and the standard mitigations (aggregation, k-anonymity).
  5. (Part 2 attendees) Load a ping dataset, run quality-control checks, and build a simple OD matrix with a map.
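
Objectives 4 and 5 meet in one step: an OD matrix is only safe to share once small cells are suppressed. The following is a minimal sketch, assuming trips have already been reduced to (device_id, origin_zone, destination_zone) tuples. The function name, the zone labels, and the threshold k=5 are illustrative and are not taken from pingkit.

```python
from collections import defaultdict


def od_matrix_k_anonymous(trips, k=5):
    """Count distinct devices per (origin, destination) and drop cells below k.

    `trips` is an iterable of (device_id, origin_zone, destination_zone).
    Hypothetical helper for illustration; not the pingkit API.
    """
    devices_per_cell = defaultdict(set)
    for device_id, origin, destination in trips:
        # A set makes repeat trips by the same device count only once.
        devices_per_cell[(origin, destination)].add(device_id)
    return {
        cell: len(devices)
        for cell, devices in devices_per_cell.items()
        if len(devices) >= k  # suppress cells with fewer than k devices
    }


trips = [(f"dev{i}", "A", "B") for i in range(6)]   # six devices travel A -> B
trips += [("dev0", "A", "B")]                       # a repeat trip by dev0
trips += [("dev1", "B", "C"), ("dev2", "B", "C")]   # only two devices B -> C
od = od_matrix_k_anonymous(trips, k=5)
```

Counting distinct devices rather than trips is a deliberate choice here: a cell with many trips from one device would still identify that device.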

Agenda — 90 minutes (45 + 45)

Part 1 — Theory (45 min, stands alone)

Brief Q&A

Part 2 — Hands-On (45 min, for those who stay)

Audience and prerequisites

What’s in this repository

The table of contents is generated from docs/_toc.yml.

A flat map of the key files:

| Path | What it is |
| --- | --- |
| docs/training.md | Part 1 theory chapter: slide-ready Markdown with speaker notes |
| notebooks/01_explore.ipynb | Part 2, hands-on 1: load the sample dataset, run QC |
| notebooks/02_od_matrix.ipynb | Part 2, hands-on 2: detect stops, build a trip-based (time-resolved) OD matrix with k-anonymity, render a flow map |
| src/pingkit/ | Small library: io, quality, od, viz |
| data/sample_pings_dc.parquet | Synthetic dataset (~2.75M pings, 5,000 devices, 7 days, Washington D.C.; heavy-tailed panel, employment-centre commutes); see data/README.md |
| scripts/generate_sample.py | Reproducible generator for the sample dataset (fixed seed) |
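
The stop-detection step listed for notebooks/02_od_matrix.ipynb can be sketched in plain Python. This is a generic dwell-based detector and not necessarily the method the pingkit od module uses. The function names, the 200 m radius, and the 5-minute minimum dwell are assumptions chosen for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stops(pings, radius_m=200.0, min_dwell_s=300.0):
    """Return (start_ts, end_ts, lat, lon) stops for one device.

    `pings` is a time-sorted list of (timestamp_s, lat, lon). A stop is a run
    of consecutive pings that all stay within `radius_m` of the run's first
    ping and that spans at least `min_dwell_s` seconds.
    """
    stops = []
    i = 0
    while i < len(pings):
        t0, lat0, lon0 = pings[i]
        j = i
        # Extend the run while the next ping is still near the anchor ping.
        while (
            j + 1 < len(pings)
            and haversine_m(lat0, lon0, pings[j + 1][1], pings[j + 1][2]) <= radius_m
        ):
            j += 1
        if pings[j][0] - t0 >= min_dwell_s:
            run = pings[i : j + 1]
            mean_lat = sum(p[1] for p in run) / len(run)
            mean_lon = sum(p[2] for p in run) / len(run)
            stops.append((t0, pings[j][0], mean_lat, mean_lon))
            i = j + 1  # continue after the stop
        else:
            i += 1  # no stop anchored here; try the next ping
    return stops


# Made-up trace: dwell at one place, travel, dwell at another.
pings = [
    (0, 38.90, -77.03), (120, 38.90, -77.03), (240, 38.90, -77.03), (360, 38.90, -77.03),
    (480, 38.91, -77.02), (600, 38.92, -77.01),
    (720, 38.93, -77.00), (900, 38.93, -77.00), (1080, 38.93, -77.00),
]
stops = detect_stops(pings)
```

On this trace the detector finds two stops, and consecutive stops can then be paired into trips for the OD matrix. Anchoring each run on its first ping keeps the logic short; a production detector would typically also handle sampling gaps and GPS jitter.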

Getting started

Option A — GitHub Codespaces

  1. Open this repository on GitHub.
  2. Click Code → Codespaces → Create codespace on main.
  3. Wait for the devcontainer to build. The post-create command (uv pip install --system -e .) installs pingkit and all dependencies from pyproject.toml — typically under two minutes.
  4. Open notebooks/01_explore.ipynb, select the Python 3 kernel when prompted, and run cells top to bottom.

See docs/github-codespaces-setup.md for a step-by-step guide, including how to avoid charges on a paid GitHub plan.

Option B — Local install

Requires Python ≥ 3.10.

git clone https://github.com/datapartnership/pingkit.git
cd pingkit
pip install -e .
jupyter lab notebooks/

GeoPandas depends on GDAL, GEOS, and PROJ. If pip install -e . fails locally, install those system libraries first: brew install gdal geos proj on macOS, or the libgdal-dev, libgeos-dev, and libproj-dev packages on Debian/Ubuntu (sudo apt-get install libgdal-dev libgeos-dev libproj-dev).
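
Before opening the notebooks, it can help to confirm that the geospatial stack is importable. The check below is a small sketch that assumes the import names geopandas, shapely, and pyproj; the authoritative dependency list is in pyproject.toml and may differ.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names; adjust to match pyproject.toml.
missing = missing_modules(["geopandas", "shapely", "pyproj"])
print("missing:", missing or "none")
```

An empty result only shows that the packages are installed, not that their compiled libraries load correctly, so importing geopandas in a notebook cell remains the definitive test.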

Re-generating the sample dataset

The committed Parquet at data/sample_pings_dc.parquet is reproducible from a fixed seed:

python scripts/generate_sample.py

Pass --n-devices, --seed, or --output to vary it. See data/README.md for the full schema, generation method, and known limitations of the synthetic data.
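
Reproducibility from a fixed seed means that the same seed yields the same pings on every run. The toy generator below illustrates the idea with the standard library only. It is not scripts/generate_sample.py, and the function name, the bounding box, and the defaults are made up for the example.

```python
import random


def generate_pings(n_devices=3, pings_per_device=2, seed=42):
    """Toy generator: seeded random points in a rough D.C. bounding box."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    rows = []
    for device in range(n_devices):
        for _ in range(pings_per_device):
            lat = rng.uniform(38.80, 39.00)
            lon = rng.uniform(-77.12, -76.91)
            rows.append((f"dev{device}", round(lat, 5), round(lon, 5)))
    return rows


same = generate_pings(seed=42) == generate_pings(seed=42)      # same seed, same rows
different = generate_pings(seed=42) != generate_pings(seed=7)  # new seed, new rows
```

Using a dedicated random.Random instance instead of the module-level functions keeps the output independent of any other code that draws random numbers in the same process.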

Data use and privacy

Follow-up

License

Mozilla Public License 2.0.