Estimating Population Displacement based on Mobility Data#

In this exploratory analysis, we aim to estimate the location and dwelling period in Lebanon of GPS-enabled devices previously seen in Syria.

Important

This documentation outlines an experimental and exploratory process for creating displacement maps using mobility data. While this approach may help address data scarcity and offers timeliness, among other advantages, the results are subject to important Limitations.

Data#

Area of Interest#

In this exploratory analysis, we use Lebanon’s subnational administrative boundaries maintained by UNOCHA Middle East and North Africa (ROMENA) on the Humanitarian Data Exchange.

import geopandas

# Load Lebanon's administrative level 3 boundaries.
LBN = geopandas.read_file(
    "../../data/shapefiles/lbn_adm_cdr_20200810/lbn_admbnda_adm3_cdr_20200810.shp",
    crs="EPSG:4326",
)

Mobility Data#

Through the Development Data Partnership, the project team obtained a longitudinal panel of human mobility. The panel consisted of anonymized, timestamped geographical points generated by GPS-enabled devices located in Syria, spanning the period from January 1, 2020 to August 31, 2023.

The mobility data panel was provided pro bono by Veraset for the proposal Understanding Lebanon’s Economy through Alternative Data through the Development Data Partnership. During the project’s execution, Veraset Movement’s global daily data feed was ingested and processed through the Geolocation Data Pipeline maintained by the Development Data Partnership. For additional information, please refer to the Mobility Documentation accessible to all World Bank staff.

import dask.dataframe as dd

# Lazily read only the rows of the panel recorded in Syria.
ddf = dd.read_parquet(
    "../../data/interim/panels/v2023.9.4",
    filters=[("country", "=", "SY")],
    dtype_backend="pyarrow",
)

Determining panel of devices seen in Syria#

From the mobility data panel, we select a subpanel of devices. This subpanel is composed of the devices seen in Syria during the following periods: January/2020, July/2020, January/2021, July/2021, January/2022, July/2022 and January-August/2023.

DEVICES_SYR = ddf["uid"].unique().compute()
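
The line above takes every device present in the Syria-filtered panel. If a panel also contained observations outside the listed periods, a month filter would have to be applied first. The following is a minimal sketch of such a filter on a toy pandas frame; the `datetime` column name and all values are assumptions made for illustration, not the panel's actual schema.

```python
import pandas as pd

# Toy stand-in for the Syria panel; `datetime` is a hypothetical column name.
panel = pd.DataFrame(
    {
        "uid": ["a", "a", "b", "c", "d"],
        "datetime": pd.to_datetime(
            ["2020-01-15", "2020-03-02", "2020-07-20", "2021-02-11", "2023-05-30"]
        ),
    }
)

# Periods listed in the text: January and July of 2020-2022, plus January-August 2023.
periods = {f"{year}-{month:02d}" for year in (2020, 2021, 2022) for month in (1, 7)}
periods |= {f"2023-{month:02d}" for month in range(1, 9)}

# Keep only observations whose calendar month falls in one of the periods.
in_period = panel["datetime"].dt.strftime("%Y-%m").isin(periods)
devices_syr = panel.loc[in_period, "uid"].unique()
```

In this toy data, devices `a`, `b` and `d` have at least one observation inside the listed periods, while `c` is only seen in February 2021 and is dropped.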

Importing Stay Locations in Lebanon#

Following Estimating Population Density based on Mobility Data, we import the stay locations in Lebanon for the period from August 1, 2023 to August 31, 2023.

import pandas as pd
STOPS = pd.read_parquet("../../data/locations.pq")

Now, we select only stay locations of devices that are in the panel of devices that were detected in Syria.

# DEVICES_SYR is a Series of unique uids, so it is passed to isin directly.
STOPS = STOPS[STOPS["uid"].isin(DEVICES_SYR)]
Number of stay locations: 3,391

Methodology#

The methodology is a second take on Estimating Population Density based on Mobility Data, looking specifically at the panel of devices detected in Syria. For additional details, please refer to that reference.

Implementation#

locations = STOPS[(STOPS["isHome"]) & (STOPS["home_delta_count"] > 0.25)]
Number of eligible devices: 170
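
The computation behind the device count is not shown. One plausible reading is that it counts distinct `uid` values among the rows that pass the filter. The sketch below illustrates that reading on invented data; the values are made up, and `nunique` is an assumption about how the count was obtained.

```python
import pandas as pd

# Toy stay locations; all values are invented for illustration only.
stops = pd.DataFrame(
    {
        "uid": ["a", "a", "b", "c", "d"],
        "isHome": [True, False, True, True, False],
        "home_delta_count": [0.60, 0.90, 0.10, 0.30, 0.80],
    }
)

# Same two conditions as the filter above: flagged as home AND above the 0.25 threshold.
eligible = stops[stops["isHome"] & (stops["home_delta_count"] > 0.25)]

# Count distinct devices rather than rows, in case a device has several home rows.
n_devices = eligible["uid"].nunique()
print(f"Number of eligible devices: {n_devices:,}")
```

Counting distinct devices instead of rows matters only if a device can have more than one row flagged as home; if each device has at most one, both counts agree.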

Now, let’s transform the home locations into a (sample) population density at the administrative boundary level.

locations = geopandas.GeoDataFrame(
    locations,
    geometry=geopandas.points_from_xy(locations["lng_medoid"], locations["lat_medoid"]),
    crs="EPSG:4326",
)

DENSITY_LOCATIONS = geopandas.GeoDataFrame(
    locations.sjoin(LBN, predicate="within")
    .groupby(["admin3Pcod"])["uid"]
    .count()
    .to_frame("count")
    .merge(LBN, on="admin3Pcod"),
    geometry="geometry",
    crs="EPSG:4326",
)
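
To make the join-and-count step concrete, here is a small sketch on synthetic geometries. The three square areas, the `admin3Pcod` values `A1` to `A3` and the points are all made up. The sketch also differs from the code above in two ways, both of which are illustrative choices rather than the method used here: it counts distinct devices, and it merges from the boundaries with a left join so that areas without any device appear with a count of zero.

```python
import geopandas
from shapely.geometry import box

# Three made-up square areas standing in for administrative boundaries.
areas = geopandas.GeoDataFrame(
    {"admin3Pcod": ["A1", "A2", "A3"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Three made-up home locations: two inside A1, one inside A2, none in A3.
homes = geopandas.GeoDataFrame(
    {"uid": ["a", "b", "c"]},
    geometry=geopandas.points_from_xy([0.2, 0.7, 1.5], [0.5, 0.5, 0.5]),
    crs="EPSG:4326",
)

# Assign each point to the area containing it, then count devices per area.
counts = (
    homes.sjoin(areas, predicate="within")
    .groupby("admin3Pcod")["uid"]
    .nunique()
    .rename("count")
    .reset_index()
)

# Left merge from the boundaries keeps areas with no devices (count = 0).
density = areas.merge(counts, on="admin3Pcod", how="left")
density["count"] = density["count"].fillna(0).astype(int)
```

Whether zero-count areas should be kept depends on the map: the inner merge used above drops them, which leaves those areas unstyled.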

Finally, we map the result:

DENSITY_LOCATIONS[["count", "geometry"]].dropna(subset="geometry").explore(
    column="count",
    cmap="plasma",
    style_kwds={"weight": 0.4},
    legend_kwds={"max_labels": 5},
    vmin=1,
    vmax=DENSITY_LOCATIONS["count"].max(),
)

Estimated dwelling locations of devices previously detected in Syria (2020-2023). The metrics are presented with reference to August 2023 in Lebanon. Source: Veraset Movement.

Findings#

  • The analysis identified \(170\) potential dwelling locations in Lebanon, with reference to August 2023, of devices previously detected in Syria.

  • Even though this approach may address data scarcity and offer timeliness, among other advantages, the results suffer from significant limitations, including a small sample size, sample bias and incomplete coverage.

Limitations#

Caution

  • Sample Bias: The sampled population is composed of GPS-enabled devices drawn from a longitudinal mobility data panel. It is important to emphasize that the sampled population is obtained via convenience sampling and that the mobility data panel represents only a subset of the total population in an area at a given time, specifically only users who have turned on location tracking on their mobile device. Thus, derived metrics do not represent the total population density.

  • Incomplete Coverage: Mobility data is typically collected from sources such as mobile phone networks, GPS devices, or transportation systems. These sources may not be representative of the entire population or of all economic activities, leading to sample bias and potentially inaccurate estimates. Not all individuals or businesses have access to devices or services that generate mobility data. This can result in incomplete coverage and potential underrepresentation of certain demographic groups or economic sectors.

  • Lack of Contextual Information: Mobility data primarily captures movement patterns and geolocation information. It may lack other crucial contextual information, which is essential for accurate estimation of economic activity.