Workshop materials for Development Data Partnership Day, Washington D.C., 5 June 2026. Event page: https://datapartnership.org/updates/partnership-day/
pingkit is a small Python toolkit and a pair of teaching notebooks that walk through the end-to-end workflow for working with mobile-location (GPS ping) data in a transport-analysis setting: loading raw ping tables, quality-checking the device panel, detecting activity stops, and building a trip-based origin-destination (OD) matrix.
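To make the stop-detection step concrete, here is a minimal sketch of a common greedy stay-point rule. It is not pingkit's implementation: detect_stops, haversine_m, the column names (device_id, timestamp, lat, lon) and the defaults (200 m radius, 10-minute dwell) are all hypothetical. For each device it grows a cluster while pings stay within a radius of the first ping, and keeps the cluster as a stop if the device dwelt there long enough.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def detect_stops(pings, radius_m=200.0, min_duration="10min"):
    """Hypothetical greedy stay-point detector, run one device at a time.

    Assumes columns device_id, timestamp (datetime64), lat, lon.
    """
    min_dur = pd.Timedelta(min_duration).to_timedelta64()
    records = []
    ordered = pings.sort_values(["device_id", "timestamp"])
    for device, g in ordered.groupby("device_id", sort=False):
        lat = g["lat"].to_numpy()
        lon = g["lon"].to_numpy()
        ts = g["timestamp"].to_numpy()
        i, n = 0, len(g)
        while i < n:
            # Grow the cluster while pings stay within radius_m of anchor ping i.
            j = i + 1
            while j < n and haversine_m(lat[i], lon[i], lat[j], lon[j]) <= radius_m:
                j += 1
            if ts[j - 1] - ts[i] >= min_dur:
                # Dwelt long enough: record a stop and jump past the cluster.
                records.append({
                    "device_id": device,
                    "start": pd.Timestamp(ts[i]),
                    "end": pd.Timestamp(ts[j - 1]),
                    "lat": lat[i:j].mean(),
                    "lon": lon[i:j].mean(),
                    "n_pings": j - i,
                })
                i = j
            else:
                # Too short: treat ping i as movement and try the next anchor.
                i += 1
    return pd.DataFrame(
        records, columns=["device_id", "start", "end", "lat", "lon", "n_pings"]
    )
```

Anchoring on the first ping keeps the rule simple and deterministic; production stop detectors often use a moving centroid or density-based clustering instead, and the radius and dwell thresholds strongly affect how many stops (and hence trips) come out.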
By the end of the workshop an attendee can:
Part 1 — Theory (45 min, stands alone)
Brief Q&A
Part 2 — Hands-On (45 min, for those who stay)
pandas literacy. No prior mobile-data experience required.

The table of contents below is generated from docs/_toc.yml:
A flat map of the key files:
| Path | What it is |
|---|---|
| docs/training.md | Part 1 theory chapter: slide-ready Markdown with speaker notes |
| notebooks/01_explore.ipynb | Part 2 hands-on 1: load the sample dataset, run QC |
| notebooks/02_od_matrix.ipynb | Part 2 hands-on 2: detect stops, build a trip-based (time-resolved) OD matrix with k-anonymity, render a flow map |
| src/pingkit/ | Small library: io, quality, od, viz |
| data/sample_pings_dc.parquet | Synthetic dataset (~2.75M pings, 5,000 devices, 7 days, Washington D.C.; heavy-tailed panel, employment-centre commutes); see data/README.md |
| scripts/generate_sample.py | Reproducible generator for the sample dataset (fixed seed) |
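The OD step in notebooks/02_od_matrix.ipynb combines trips, time bins and k-anonymity. The sketch below illustrates that idea under stated assumptions and is not pingkit's od module: build_od_matrix is hypothetical, and it assumes each stop already carries a zone label (for example from a spatial join) plus device_id, start and end columns. Consecutive stops by the same device form one trip, trips are binned by departure time (the end of the origin stop), and any cell supported by fewer than k distinct devices is suppressed.

```python
import pandas as pd


def build_od_matrix(stops, k=5, freq="60min"):
    """Hypothetical time-resolved OD builder with k-anonymity suppression.

    Assumes one row per stop with columns device_id, start, end (datetime64)
    and zone (the zone the stop falls in).
    """
    s = stops.sort_values(["device_id", "start"]).copy()
    # A trip links each stop to the same device's next stop.
    s["dest_zone"] = s.groupby("device_id")["zone"].shift(-1)
    trips = s.dropna(subset=["dest_zone"]).rename(columns={"zone": "origin_zone"})
    # Time-resolve by departure time, i.e. the end of the origin stop.
    trips = trips.assign(depart_bin=trips["end"].dt.floor(freq))
    od = (
        trips.groupby(["origin_zone", "dest_zone", "depart_bin"])
        .agg(n_trips=("device_id", "size"), n_devices=("device_id", "nunique"))
        .reset_index()
    )
    # k-anonymity: drop any cell backed by fewer than k distinct devices.
    return od[od["n_devices"] >= k].reset_index(drop=True)
```

Thresholding on distinct devices rather than on trip counts matters: one device commuting daily can produce many trips in the same cell without making it any less identifying.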
uv pip install --system -e .) installs pingkit and all dependencies from pyproject.toml, typically in under two minutes.

notebooks/01_explore.ipynb, select the Python 3 kernel when prompted, and run cells top to bottom.

See docs/github-codespaces-setup.md for a step-by-step guide, including how to avoid charges on a paid GitHub plan.
Requires Python ≥ 3.10.
```bash
git clone https://github.com/datapartnership/pingkit.git
cd pingkit
pip install -e .
jupyter lab notebooks/
```
GeoPandas brings in GDAL, GEOS, and PROJ. If pip install -e . fails locally, install those system libraries first: brew install gdal geos proj on macOS, or sudo apt install libgdal-dev libgeos-dev libproj-dev on Debian/Ubuntu.
The committed Parquet at data/sample_pings_dc.parquet is reproducible from a fixed seed:
```bash
python scripts/generate_sample.py
```
Pass --n-devices, --seed, or --output to vary it. See data/README.md for the full schema, generation method, and known limitations of the synthetic data.
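After regenerating the data (or before touching the committed file), a quick panel check helps confirm it looks as described, for example that the panel is heavy-tailed (median pings per device far below the maximum). This is a minimal pandas-only sketch, not pingkit's quality module: device_panel, panel_summary and the min_days threshold are hypothetical, and the column names device_id and timestamp are assumptions to be checked against the schema in data/README.md.

```python
import pandas as pd


def device_panel(pings):
    """Per-device QC table: ping count, distinct active days, first/last ping."""
    daily = pings.assign(day=pings["timestamp"].dt.normalize())
    return daily.groupby("device_id").agg(
        n_pings=("timestamp", "size"),
        active_days=("day", "nunique"),
        first_seen=("timestamp", "min"),
        last_seen=("timestamp", "max"),
    )


def panel_summary(pings, min_days=3):
    """Headline panel-quality numbers; min_days is an illustrative threshold."""
    panel = device_panel(pings)
    return {
        "n_devices": len(panel),
        "n_pings": int(panel["n_pings"].sum()),
        "median_pings_per_device": float(panel["n_pings"].median()),
        "max_pings_per_device": int(panel["n_pings"].max()),
        "share_active_min_days": float((panel["active_days"] >= min_days).mean()),
    }
```

If the column names match, this could be run as panel_summary(pd.read_parquet("data/sample_pings_dc.parquet")), which requires a Parquet engine such as pyarrow.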