Create Longitudinal Panels#
In this step, we create sub-panels A (formal) and B (informal) as described in the methodological notes of this pilot study. Each sub-panel consists of longitudinal mobility data generated by GPS-enabled devices, selected according to whether the device was detected in proximity to Region A or Region B at any point during the time horizon.
The A (formal) and B (informal) sub-panels are respectively defined as follows.
Sub-Panel A:
Devices seen within a 1 km radius of 6 border checkpoints (Region A):
Jdeidet Yabbous (Al-Masnaa)
Tartous (Al-Arida)
Al-Dabbousieh (Al-Abboudiyeh)
Tel Kalakh (Al-Buqayaa)
Joussieh (Al-Qaa)
Matraba
Sub-Panel B:
Devices seen within a 500-meter buffer strip along the Lebanese-Syrian border, excluding the 6 checkpoints (Region B).
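The two definitions above can be sketched as a simple distance rule. This is a minimal illustration under stated assumptions, not the method used in this notebook (which matches records against the hexagon cells stored in the AOI file): the checkpoint coordinates are hypothetical placeholders, the distance to the border line is assumed to be precomputed, and the strip is assumed to extend 500 m from the border line. The names `haversine_m`, `assign_region`, and `CHECKPOINTS` are introduced here for illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical coordinates, for illustration only (NOT the real checkpoint locations).
CHECKPOINTS = {
    "checkpoint_1": (33.70, 35.95),
    "checkpoint_2": (34.63, 36.00),
}


def assign_region(lat, lon, distance_to_border_m,
                  checkpoint_radius_m=1_000, strip_distance_m=500):
    """Return "A", "B", or None for a single GPS record.

    distance_to_border_m is assumed to be computed elsewhere
    (for example, against a border polyline).
    """
    # Region A: within the radius of any checkpoint.
    for cp_lat, cp_lon in CHECKPOINTS.values():
        if haversine_m(lat, lon, cp_lat, cp_lon) <= checkpoint_radius_m:
            return "A"
    # Region B: inside the border strip, checkpoints already excluded above.
    if distance_to_border_m <= strip_distance_m:
        return "B"
    return None
```

Checking Region A first is what enforces the "excluding the 6 checkpoints" clause: a record near a checkpoint is never assigned to Region B, even when it also lies inside the border strip.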
We then generate a working dataset from the raw mobility data by drawing out the devices that compose sub-panels A and B. Finally, we persist the data in an efficient partitioning layout for further analysis.
Data#
# Parameters
# https://papermill.readthedocs.io/en/latest/usage-parameterize.html
DASK_SCHEDULER_ADDRESS = None
AOI = "id=7&name=A"
NAME = "A"
Area of Interest#
In the previous step, we defined the area(s) of interest. Here, we select the one to use for this panel.
import geopandas

# Load the area of interest selected by the AOI parameter and preview it on a map
AOI = geopandas.read_file(f"../../data/interim/aoi/{AOI}.geojson")
AOI.explore()
Mobility Data#
The Syria Economic Monitor team obtained access to mobility data through the proposal Syria Economic Monitor (Outlogic) of the Development Data Partnership. For more information on the data ingestion and preprocessing, please see the Mobility data collection documentation.
Important
The mobility data was ingested and preprocessed on an AWS EC2 instance before being imported into this notebook. For more information, please reach out to the project team.
import dask.dataframe as dd

PATH = [
"../../data/external/outlogic/LB/date=*/*.parquet",
"../../data/external/outlogic/SY/date=*/*.parquet",
]
ddf = dd.read_parquet(
PATH,
columns=[
"uid",
"latitude",
"longitude",
"h3_10",
"datetime",
"country",
"year",
"date",
],
)
Removing rows without uid.
ddf = ddf[~ddf["uid"].isnull()]
Sampling Strategy#
We select sub-panels of devices using convenience sampling, a non-probability form of sampling. The sampling method is a key limitation of this approach.
First, let’s calculate the total number of devices.
ddf["uid"].nunique().compute()
639233
Drawing out Devices#
Now, let’s calculate the sub-panel by drawing out devices detected within the AOI at least once throughout the time horizon.
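# Only "h3_10" is loaded from the mobility data above, so we match it against
# the AOI cells (assumes AOI["hex_id"] holds H3 cells at the same resolution)
devices = ddf[ddf["h3_10"].isin(AOI["hex_id"])]["uid"].unique().compute()
devices = devices.to_frame()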
len(devices)
3349
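To put the two counts reported in this section in perspective, the short calculation below computes the share of devices that end up in the sub-panel for this run. It uses only the two figures shown above; the variable names are illustrative.

```python
total_devices = 639_233  # unique uids in the mobility data
panel_devices = 3_349    # unique uids seen in the AOI at least once

share = panel_devices / total_devices
print(f"{share:.2%}")  # prints "0.52%"
```

Roughly one device in two hundred is retained, which is worth keeping in mind alongside the convenience-sampling limitation noted earlier.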
Finally, we select the mobility panel by keeping every record of the selected devices throughout the time horizon.
ddf = ddf[ddf["uid"].isin(devices["uid"])]
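The selection logic in this section can be illustrated on toy data with plain Python. This is a sketch of the idea only, not of the Dask implementation; all identifiers and cell names below are made up.

```python
# Toy records: (uid, cell). All values are invented for illustration.
pings = [
    ("u1", "cell_a"),  # u1 seen inside the AOI
    ("u1", "cell_x"),  # ...and also outside it
    ("u2", "cell_y"),  # u2 never seen inside the AOI
    ("u3", "cell_b"),
    ("u3", "cell_z"),
    (None, "cell_a"),  # record without a uid
]
aoi_cells = {"cell_a", "cell_b"}

# 1. Remove records without a uid.
pings = [(uid, cell) for uid, cell in pings if uid is not None]

# 2. Draw out devices detected within the AOI at least once.
devices = {uid for uid, cell in pings if cell in aoi_cells}

# 3. Keep ALL records of those devices, inside or outside the AOI.
panel = [(uid, cell) for uid, cell in pings if uid in devices]

print(sorted(devices))  # ['u1', 'u3']
print(len(panel))       # 4
```

Note that the panel keeps the records of `u1` and `u3` outside the AOI as well: a single detection within the AOI qualifies the device, and the device's full trajectory is then retained.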
Repartitioning#
Let’s repartition on country, year and quarter (to reduce the overhead and improve performance). In this step, we convert datetime to the Asia/Damascus timezone and calculate the quarter.
ddf["datetime"] = ddf["datetime"].dt.tz_convert("Asia/Damascus")
ddf["date"] = dd.to_datetime(ddf["date"])
ddf["quarter"] = ddf["datetime"].dt.quarter
# reorder columns
ddf = ddf[
[
"uid",
"latitude",
"longitude",
"h3_10",
"datetime",
"country",
"year",
"quarter",
"date",
]
]
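Converting to local time before computing the quarter matters for records close to a quarter boundary. The standard-library sketch below shows one such case; it assumes the stored timestamps are timezone-aware (UTC here), and it falls back to a fixed UTC+3 offset, which matches Damascus local time on the example date, if the system has no timezone database. The helper `quarter_of` is introduced for illustration.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo

    DAMASCUS = ZoneInfo("Asia/Damascus")
except Exception:
    # Assumption: no tz database available; UTC+3 is correct for the date below.
    DAMASCUS = timezone(timedelta(hours=3))


def quarter_of(ts):
    """Calendar quarter (1-4) of a datetime, from its month."""
    return (ts.month - 1) // 3 + 1


ts_utc = datetime(2021, 6, 30, 21, 30, tzinfo=timezone.utc)
ts_local = ts_utc.astimezone(DAMASCUS)

print(quarter_of(ts_utc))    # 2 (still 30 June in UTC)
print(quarter_of(ts_local))  # 3 (already 1 July in Damascus)
```

Without the conversion, this record would be written to the second-quarter partition even though it occurred on 1 July local time.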
Saving to disk.
ddf.repartition(npartitions=1).to_parquet(
f"../../data/interim/panels/_{NAME}",
write_index=False,
partition_on=["country", "year", "quarter"],
)
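With `partition_on`, the dataset is written as nested `key=value` directories (Hive-style), the same convention as the `date=*` folders of the input data. The sketch below only builds the directory name for one combination of partition values to show the resulting layout. The helper `partition_dir` is hypothetical, and the Parquet file names inside each directory are chosen by the writer and not shown.

```python
from pathlib import PurePosixPath


def partition_dir(root, **keys):
    """Build the key=value directory path for one partition (illustrative helper)."""
    path = PurePosixPath(root)
    for column, value in keys.items():
        path = path / f"{column}={value}"
    return path


NAME = "A"
example = partition_dir(
    f"../../data/interim/panels/_{NAME}", country="LB", year=2021, quarter=3
)
print(example)
# ../../data/interim/panels/_A/country=LB/year=2021/quarter=3
```

Readers that understand this layout can filter on country, year, or quarter and skip the directories that do not match, which is the performance benefit the repartitioning aims for.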