Estimating Activity Through Point of Interest Visits Using Mobility Data
Analyzing the frequency of visits within or near points of interest has the potential to provide insights into the economic ramifications of conflicts and disasters. Similar to the now-discontinued Google Community Mobility Reports, the following methodology aims to monitor fluctuations in mobility, quantified by visit counts, within a set of OpenStreetMap points of interest relative to a baseline.
It is also important to note the inherent limitations of mobility data. Notably, mobility data is typically collected through convenience sampling and lacks the controlled methodology of randomized trials. The adoption and usage of mobile devices differ greatly among urban, suburban, and rural dwellers. Moreover, the composition of the panel can change over time as new apps are introduced, older ones decline in popularity, or important apps are excluded from the panel. One major challenge in estimating activity volumes from mobility data is distinguishing changes in observed activity caused by market shifts (such as changes in app popularity) from genuine changes in on-the-ground mobility patterns.
Data
In this section, we import data from sources available either publicly or through data-sharing agreements. When using non-public data, please carefully abide by your organization's data privacy policy, data classification, and the applicable terms and conditions.
Area of Interest
In this study, the area of interest is Egypt's Red Sea Riviera, encompassing the Red Sea and South Sinai governorates, as shown below.
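import geopandas

# second-level administrative boundaries of Egypt (CAPMAS)
EGY = geopandas.read_file(
    "../../data/egy_admbnda_adm2_capmas_20170421/egy_admbnda_adm2_capmas_20170421.shp",
)
# keep the two governorates that make up the area of interest
AOI = EGY[EGY["ADM1_PCODE"].isin(["EG31", "EG35"])]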
AOI.explore(color="blue")
Points of Interest
Using the Humanitarian OpenStreetMap Team (HOT) exports available via HDX, the project team acquired OpenStreetMap points of interest for Egypt and kept those located within the area of interest.
POI = (
geopandas.read_file(
"https://s3.us-east-1.amazonaws.com/production-raw-data-api/ISO3/EGY/points_of_interest/points/hotosm_egy_points_of_interest_points_shp.zip"
)
.set_crs("EPSG:4326")
.sjoin(AOI, predicate="within")
)
The resulting points of interest table has the columns osm_id, name, tourism, ADM0_PCODE, ADM1_PCODE, ADM2_PCODE, ADM1_EN, ADM2_EN, and geometry.
After importing the points of interest, we keep those tagged with the OpenStreetMap key tourism and create a 100 m buffer around each point.
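TAG = "tourism"  # OpenStreetMap key used to select points of interest

# keep identifiers, the tag, administrative attributes, and geometry
POI = POI[
    [
        "osm_id",
        "name",
        TAG,
        "ADM0_PCODE",
        "ADM1_PCODE",
        "ADM2_PCODE",
        "ADM1_EN",
        "ADM2_EN",
        "geometry",
    ]
]
# filter tag: keep only features with a value for the key
POI = POI[POI[TAG].notna()].copy()
# create a 100 m buffer in a local UTM projection (units in metres), then return to WGS84
utm_crs = POI.estimate_utm_crs()
POI["geometry"] = POI["geometry"].to_crs(utm_crs).buffer(100).to_crs("EPSG:4326")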
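Buffer distances are only metric if the projection preserves distance locally. Web Mercator (EPSG:3857) has a scale factor of about 1/cos(latitude), so a buffer of 100 projected units covers only about 100·cos(latitude) metres on the ground. The minimal sketch below shows that arithmetic on a spherical approximation; the two latitudes are approximate values assumed for illustration.

```python
import math


def ground_distance_m(projected_m: float, latitude_deg: float) -> float:
    """Approximate ground length of a distance measured in Web Mercator (EPSG:3857)
    units, using the spherical scale factor 1 / cos(latitude)."""
    return projected_m * math.cos(math.radians(latitude_deg))


# Approximate latitudes, assumed here for illustration only.
for place, lat in [("Hurghada", 27.26), ("Sharm El Sheikh", 27.92)]:
    print(f"{place}: 100 EPSG:3857 units ~ {ground_distance_m(100, lat):.1f} m on the ground")
```

At these latitudes the shortfall is roughly 11 to 12 percent, which is why a local projection such as UTM (for example via GeoPandas' `estimate_utm_crs()`) is generally preferred for metric buffers.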
To illustrate, we visualize the points of interest on a map. The breakdown of the OpenStreetMap points of interest by category is as follows:

tourism | count
---|---
hotel | 245
viewpoint | 102
attraction | 91
information | 28
apartment | 20
hostel | 17
camp_site | 15
guest_house | 15
artwork | 12
chalet | 12
picnic_site | 7
museum | 7
wilderness_hut | 4
alpine_hut | 2
yes | 2
motel | 2
theme_park | 1
ruins | 1
caravan_site | 1
gallery | 1
Mobility Data
The project team acquired longitudinal human mobility data from Veraset Movement, a panel of human mobility data collected from the locations of GPS-enabled devices. The data consists of anonymized, timestamped geographic points generated by GPS-enabled devices located in Egypt, spanning the period shown below.
import dask.dataframe as dd

# Egypt panel, partitioned by year and date in hive-style directories
ddf = dd.read_parquet(
    ["s3://wbgggscecovid19dev-mobility/country=EG/year=*/date=*/*.parquet"],
)
First, we calculate the cardinality (the total number of records): 1,131,632,964.
Next, we calculate the temporal extent: from October 1, 2022 to March 15, 2024.
We also visualize the spatial density of the mobility data panel.
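As a quick sanity check on scale, the cardinality and temporal extent reported above imply an average daily volume. The sketch assumes records are spread over every day in the range (both endpoints included); days with missing data would raise the true per-day figure for the remaining days.

```python
from datetime import date

# Figures reported above for the Egypt panel.
records = 1_131_632_964
start, end = date(2022, 10, 1), date(2024, 3, 15)

days = (end - start).days + 1  # count both endpoints
records_per_day = records / days
print(f"{days} days, ~{records_per_day / 1e6:.1f} million records per day")
```

This works out to 532 days and roughly 2.1 million records per day across all of Egypt.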
Methodology
Similar to the now-discontinued Google Community Mobility Reports, the methodology outlined here aims to monitor variations in mobility, measured by the frequency of visits to OpenStreetMap points of interest, relative to a baseline. Note that the mobility data reflects only a subset of the population within an area, specifically individuals who have enabled the Location Services setting on their mobile devices; it does not represent total population density. Additionally, the calculation relies on a spatial join that determines whether a device was detected within a point-of-interest buffer at least once on a given day. While straightforward, this approach is simpler than more advanced techniques such as estimating stay locations and visits.
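The section does not spell out how the baseline is computed. As an assumption, the sketch below follows the convention Google used for its Community Mobility Reports: the baseline is the median value for the same day of the week over a fixed reference window, and each day is reported as a percent change from that baseline. The reference window and the synthetic counts are hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def weekday_baselines(counts: dict[date, float], start: date, end: date) -> dict[int, float]:
    """Median count per day of week (0 = Monday) within the window [start, end]."""
    by_weekday: dict[int, list[float]] = {}
    for day, value in counts.items():
        if start <= day <= end:
            by_weekday.setdefault(day.weekday(), []).append(value)
    return {weekday: median(values) for weekday, values in by_weekday.items()}


def percent_change(counts: dict[date, float], baselines: dict[int, float]) -> dict[date, float]:
    """Percent change of each day's count relative to the baseline for its weekday."""
    return {
        day: 100.0 * (value - baselines[day.weekday()]) / baselines[day.weekday()]
        for day, value in counts.items()
        if baselines.get(day.weekday())  # skip weekdays without a non-zero baseline
    }


# Synthetic series: five weeks at 100 devices per day, then one week at 80.
first = date(2022, 10, 1)
counts = {first + timedelta(days=i): (100 if i < 35 else 80) for i in range(42)}
baselines = weekday_baselines(counts, first, first + timedelta(days=34))
changes = percent_change(counts, baselines)
```

Using a per-weekday median rather than a single overall mean keeps regular weekly cycles (for example weekend tourism peaks) from showing up as spurious changes.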
Using Dask-GeoPandas, we first convert the mobility data into a distributed GeoDataFrame of points so that it can be spatially joined with the points of interest specified previously.
import dask_geopandas

# build point geometries from the longitude/latitude columns (WGS84)
gddf = dask_geopandas.from_dask_dataframe(
    ddf[["uid", "hex_id", "date"]],
    geometry=dask_geopandas.points_from_xy(ddf, "longitude", "latitude"),
).set_crs("EPSG:4326")
Next, we execute the spatial join and count the number of unique devices detected for each point of interest category, second-level administrative division (ADM2_PCODE), and day.
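result = (
    # keep pings that fall inside a point-of-interest buffer
    gddf.sjoin(POI, predicate="within")
    # count distinct devices per category, district, and day
    .groupby([TAG, "ADM2_PCODE", "date"], observed=True)["uid"]
    .nunique()
    .to_frame(name="count")
    .reset_index()
    .compute()
)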
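To make the counting rule concrete, here is a small pure-Python sketch of what the spatial join followed by `nunique` computes: a device counts once per category and day if at least one of its pings falls within 100 m of a point of interest. As a simplification, it swaps the buffered polygons for a haversine distance test, drops the administrative dimension, and loops over all pairs instead of using a spatial index. The coordinates and device IDs are made up.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points (spherical model)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def unique_devices(pings, pois, radius_m: float = 100.0) -> dict:
    """Distinct devices per (category, day) seen at least once within radius_m of a POI.

    pings: iterable of (uid, day, lat, lon); pois: list of (category, lat, lon).
    Brute force over all pairs, for illustration only.
    """
    seen = defaultdict(set)
    for uid, day, lat, lon in pings:
        for category, poi_lat, poi_lon in pois:
            if haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m:
                seen[(category, day)].add(uid)
    return {key: len(uids) for key, uids in seen.items()}


# Hypothetical POIs and pings; coordinates are illustrative, not from the dataset.
pois = [("hotel", 27.2579, 33.8116), ("viewpoint", 27.9158, 34.3299)]
pings = [
    ("a", "2023-11-01", 27.2580, 33.8117),  # ~15 m from the hotel
    ("a", "2023-11-01", 27.2581, 33.8116),  # same device, same day: counted once
    ("b", "2023-11-01", 27.2579, 33.8116),  # at the hotel
    ("c", "2023-11-01", 27.2700, 33.8116),  # ~1.3 km away: not counted
    ("b", "2023-11-02", 27.9158, 34.3300),  # ~10 m from the viewpoint
]
counts = unique_devices(pings, pois)
```

Here device `a` contributes one hotel visit on November 1 despite two pings, which is exactly the "detected at least once per day" rule described in the methodology.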
Results
In this section, we visualize the time series of the number of devices detected within each of the following point of interest categories: hotel, viewpoint, attraction, other, and information.
By aggregating visit counts across the entire area of interest, we present a smoothed daily count of the devices detected in each category.
# total daily device count per category across the whole area of interest
data = result.pivot_table(
    values="count", index="date", columns=TAG, aggfunc="sum"
)
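The smoothing applied to the daily counts is not shown. A common choice for daily mobility series is a seven-day moving average, which damps weekday/weekend cycles; the window length here is an assumption. In pandas, the equivalent on the pivoted table would be `data.rolling(7, min_periods=1).mean()`, provided the rows are sorted by date with one row per day. A dependency-free sketch of the same trailing average:

```python
from collections import deque


def trailing_mean(values, window: int = 7) -> list[float]:
    """Trailing moving average over `window` observations; the first window - 1
    points average whatever history is available (like pandas min_periods=1)."""
    recent: deque = deque()
    total = 0.0
    smoothed = []
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            total -= recent.popleft()  # drop the observation that left the window
        smoothed.append(total / len(recent))
    return smoothed


# Synthetic daily counts with a weekly spike, smoothed over seven days.
daily = [7, 0, 0, 0, 0, 0, 0, 7, 0, 0]
smoothed = trailing_mean(daily)
```

Once the window is full, the weekly spike is spread evenly and the smoothed series stays flat at 1.0.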
By administrative divisions
Similarly, by aggregating visit counts within each first-level administrative division (governorate), we present a smoothed daily count of detected devices for each division.
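The step from the district-level `result` (grouped by `ADM2_PCODE`) to governorates is not shown. The sketch below contrasts two plausible options: summing the per-district distinct counts, or recounting distinct devices directly at the governorate level. Summing can count the same device twice when it is detected in two districts on the same day; the same caveat applies to any sum of distinct counts across groups. The district codes and detections are hypothetical.

```python
from collections import defaultdict

# Hypothetical ADM2 -> ADM1 lookup; in the notebook it could be built from the
# ADM2_PCODE and ADM1_PCODE columns of POI. These codes are illustrative.
PARENT = {"EG3101": "EG31", "EG3102": "EG31", "EG3501": "EG35"}


def unique_by(detections, key) -> dict:
    """Distinct devices per key(adm2, day), from (uid, adm2, day) detections."""
    groups = defaultdict(set)
    for uid, adm2, day in detections:
        groups[key(adm2, day)].add(uid)
    return {k: len(uids) for k, uids in groups.items()}


detections = [
    ("a", "EG3101", "2023-11-01"),
    ("a", "EG3102", "2023-11-01"),  # same device in a second district, same day
    ("b", "EG3501", "2023-11-01"),
]

# Option 1: sum the per-district distinct counts into governorates.
per_district = unique_by(detections, lambda adm2, day: (adm2, day))
summed = defaultdict(int)
for (adm2, day), n in per_district.items():
    summed[(PARENT[adm2], day)] += n

# Option 2: count distinct devices directly at the governorate level.
exact = unique_by(detections, lambda adm2, day: (PARENT[adm2], day))
```

Option 1 reports two devices for the first governorate while option 2 reports one; with real data, option 2 corresponds to grouping on `ADM1_PCODE` before calling `nunique`.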
After this initial exercise, we conclude that the available mobility data is not adequate to estimate general population movement or the impact of the 2023 Red Sea Shipping Crisis on Egypt's tourism. Although the mobility data panel contains over a billion records across Egypt (roughly two million per day on average over the period analyzed), the Red Sea and South Sinai governorates account for only a fraction of this volume, and only a small share of those records fall within points of interest. For example, the analysis detects only about 100 daily visitors at hotels.
Limitations
Limitations of using mobility data to estimate economic activity
Warning
Sample Bias: The sampled population is composed of GPS-enabled devices drawn from a longitudinal mobility data panel. It is important to emphasize that the sampled population is obtained via convenience sampling and that the mobility data panel represents only a subset of the total population in an area at a given time, specifically users who turned on location tracking on their mobile devices. Thus, derived metrics do not represent total population density.
Incomplete Coverage: Mobility data is typically collected from sources such as mobile phone networks, GPS devices, or transportation systems. These sources may not be representative of the entire population or all economic activities, leading to sample bias and potentially inaccurate estimates. Not all individuals or businesses have access to devices or services that generate mobility data, which can result in incomplete coverage and the underrepresentation of certain demographic groups or economic sectors.
Lack of Contextual Information: Mobility data primarily captures movement patterns and geolocation information. It may lack other crucial contextual information, such as transactional data, business types, or specific economic activities, which are essential for accurate estimation of economic activity.
Limitations of using points of interest database from OpenStreetMap
Warning
Data Quality: OpenStreetMap (OSM) relies on user contributions, which can vary in quality and may not always be up-to-date. The accuracy and completeness of the points of interest (POI) database in OSM can vary significantly across different regions and categories.
Bias and Incompleteness: OSM data can be biased towards areas or categories that attract more active contributors. Certain regions or types of businesses may be underrepresented, leading to incomplete or skewed data, especially in less-populated or less-developed areas.
Lack of Standardization: OSM does not enforce strict data standards, resulting in variations in the format, categorization, and attribute information of POIs. This lack of standardization can make it challenging to compare and analyze data consistently across different regions or time periods.
Verification and Validation: While OSM relies on community-driven efforts for data verification, the absence of a centralized authority or rigorous validation process can introduce errors and inaccuracies. It may be difficult to ascertain the reliability of the information contained in the POI database.
Limited Contextual Information: The OSM database primarily focuses on geospatial information, such as coordinates and basic attributes of POIs. It may lack additional contextual information, such as detailed business descriptions, operational hours, or transactional data, which can limit its usefulness for comprehensive economic analysis.