mobilyze package

Submodules

mobilyze.activity module

mobilyze.activity.categorize(row)
mobilyze.activity.compute_activity(ddf: DataFrame, start='2022-01-01', end='2023-12-31') → DataFrame

Compute activity levels from GPS mobility data, returning both z-scores and percentage changes relative to a baseline period. Activity is measured as the number of unique devices (uid) detected in each spatial area (hex_id) per day.

The z-score standardizes activity, expressing how far it deviates from the baseline mean in units of standard deviations. Percentage change indicates the relative increase or decrease in activity compared to the baseline.

Parameters
  • ddf (dask.dataframe.DataFrame) – A Dask DataFrame containing mobility data. The DataFrame must include at least the following columns:
      - datetime: Timestamps of device detections.
      - hex_id: Spatial index of the detection location.
      - uid: Unique identifier of each device.

  • start (str, optional) – The start date of the baseline period in “YYYY-MM-DD” format. The default is “2022-01-01”.

  • end (str, optional) – The end date of the baseline period in “YYYY-MM-DD” format. The default is “2023-12-31”.

Returns

A pandas DataFrame containing the calculated activity metrics, with the following columns:
  • hex_id: The spatial area identifier.
  • date: Date of the activity.
  • nunique: Number of unique devices detected per hex_id and day.
  • weekday: Day of the week (0=Monday, 6=Sunday).
  • nunique.mean: Mean baseline device count for each hex_id and weekday.
  • nunique.std: Standard deviation of baseline device counts.
  • n_baseline: Baseline mean device count.
  • n_difference: Difference between the current and baseline device counts.
  • percent_change: Percentage change in device counts relative to the baseline.
  • z_score: Standardized activity level relative to the baseline.

Return type

pandas.DataFrame
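The baseline computation described above can be sketched in plain pandas. This sketch assumes (this page does not confirm it) that the baseline mean and standard deviation are taken over daily unique-device counts for each hex_id and weekday inside [start, end], that n_baseline equals that mean, and that percent_change is scaled to percent. It runs on an in-memory pandas DataFrame rather than Dask, and `activity_sketch` is a hypothetical name, not part of mobilyze.

```python
import pandas as pd


def activity_sketch(df: pd.DataFrame, start: str = "2022-01-01", end: str = "2023-12-31") -> pd.DataFrame:
    """Hypothetical pandas illustration of the z-score / percent-change logic."""
    df = df.assign(date=pd.to_datetime(df["datetime"]).dt.normalize())
    # Number of unique devices per hex_id and day
    counts = df.groupby(["hex_id", "date"])["uid"].nunique().rename("nunique").reset_index()
    counts["weekday"] = counts["date"].dt.weekday  # 0=Monday, 6=Sunday

    # Assumption: baseline statistics per hex_id and weekday, restricted to [start, end]
    in_baseline = counts["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    baseline = (
        counts[in_baseline]
        .groupby(["hex_id", "weekday"])["nunique"]
        .agg(["mean", "std"])
        .rename(columns={"mean": "nunique.mean", "std": "nunique.std"})
        .reset_index()
    )

    out = counts.merge(baseline, on=["hex_id", "weekday"], how="left")
    out["n_baseline"] = out["nunique.mean"]  # assumption: baseline = weekday mean
    out["n_difference"] = out["nunique"] - out["n_baseline"]
    out["percent_change"] = 100 * out["n_difference"] / out["n_baseline"]
    out["z_score"] = out["n_difference"] / out["nunique.std"]
    return out
```

Comparing each day against the same weekday in the baseline removes the ordinary weekly cycle (for example, quieter Sundays) from the anomaly signal.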

mobilyze.activity.compute_ping_interval(ddf: DataFrame)
mobilyze.activity.compute_stops(ddf: DataFrame, stay_locations_kwds={'minutes_for_a_stop': 5.0, 'spatial_radius_km': 0.25}, resolution: int = 7) → DataFrame

Calculate stop locations from a trajectory DataFrame using spatial and temporal parameters.

The function takes a DataFrame containing GPS trajectory data and calculates stop locations, where an entity remains within a spatial radius for at least a given duration. The resulting DataFrame contains additional information about the stops, such as entry and exit times, as well as H3 hexagonal grid IDs representing the stop locations.

Parameters
  • ddf (DataFrame) – Input Dask DataFrame containing GPS trajectory data. It must have the columns ‘hex_id’, ‘latitude’, and ‘longitude’.

  • stay_locations_kwds (dict, optional) –

    A dictionary of parameters controlling the stop detection logic. The default is {“minutes_for_a_stop”: 5.0, “spatial_radius_km”: 0.25}, where:
      - “minutes_for_a_stop”: Minimum duration (in minutes) to consider a location as a stop.
      - “spatial_radius_km”: Spatial radius (in kilometers) defining the area around a point to consider it as a stop.

  • resolution (int, optional) – Resolution level for H3 hexagonal grid representation of stops. Higher values increase the resolution, meaning smaller hexagons. Default is 7.

Returns

A Dask DataFrame with the detected stops, containing the columns:
  • “datetime”: The timestamp of entering the stop.
  • “leaving_datetime”: The timestamp of leaving the stop.
  • “date”: The date of the stop.
  • “hex_id”: The H3 hexagonal grid ID at the specified resolution level, representing the stop location.
  • “geometry”: GeoPandas geometry of the stop points.

Return type

DataFrame
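The stop rule can be illustrated with a simplified, single-device version of the classic stay-point algorithm: from an anchor point, consecutive points are absorbed while they stay within `spatial_radius_km` of the anchor, and the run counts as a stop if it lasts at least `minutes_for_a_stop`. This is a sketch of the general technique under that assumption, not the package's implementation (which operates on Dask partitions and also assigns H3 IDs); `haversine_km` and `detect_stops` are hypothetical helpers.

```python
import math
from datetime import datetime, timedelta
from typing import List, Tuple

# (timestamp, latitude, longitude) for one device, sorted by time
Point = Tuple[datetime, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers."""
    r = 6371.0088  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stops(
    points: List[Point], minutes_for_a_stop: float = 5.0, spatial_radius_km: float = 0.25
) -> List[Tuple[datetime, datetime, float, float]]:
    """Return (entry_time, leaving_time, mean_lat, mean_lon) for each stop.

    In this sketch, leaving_time is the timestamp of the last point inside the radius.
    """
    stops = []
    i, n = 0, len(points)
    while i < n:
        t0, lat0, lon0 = points[i]
        j = i + 1
        # Extend the run while points remain within the radius of the anchor
        while j < n and haversine_km(lat0, lon0, points[j][1], points[j][2]) <= spatial_radius_km:
            j += 1
        run = points[i:j]
        if run[-1][0] - t0 >= timedelta(minutes=minutes_for_a_stop):
            lat = sum(p[1] for p in run) / len(run)
            lon = sum(p[2] for p in run) / len(run)
            stops.append((t0, run[-1][0], lat, lon))
            i = j  # resume after the stop
        else:
            i += 1  # too short: slide the anchor forward
    return stops
```

Both thresholds trade off against each other: a larger radius tolerates more GPS jitter but can merge nearby places into one stop, while a longer minimum duration filters out traffic lights and brief pauses.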

mobilyze.plotting module

mobilyze.plotting.plot_activity(activity: DataFrame, variable='z_score', freq='D')

Plots activity trends over time based on a specified variable and frequency, grouping by administrative divisions (shapeGroup) and shape names (shapeName).

The function generates interactive line plots for each group of activity data, with legends, tooltips, and zoom tools, and returns them in a tabbed layout for comparison across groups.

Parameters
  • activity (pd.DataFrame) – A pandas DataFrame containing activity data with at least the following columns: ‘date’ (datetime), ‘shapeGroup’ (categorical), ‘shapeName’ (categorical), and the variable (e.g., “z_score”) to plot.

  • variable (str, optional) – The column in the DataFrame to be used for plotting. By default, ‘z_score’.

  • freq (str, optional) – The frequency at which to group the data. Must be a valid pandas offset alias. For example, ‘W’ for weekly, ‘M’ for monthly. By default, ‘D’ (daily).

Returns

A Bokeh Tabs object containing one tab for each shapeGroup, with interactive line plots showing the activity trends for each shapeName within the group.

Return type

Tabs

Notes

  • The function groups the data by ‘date’, ‘shapeGroup’, and ‘shapeName’, calculates the mean for each group, and then pivots the data for plotting.

  • The y-axis range is set dynamically based on the minimum and maximum values of the variable.

  • The plot includes various Bokeh interactive tools such as pan, zoom, and hover with tooltips displaying the date and the corresponding value.

  • Tabs are used to display separate plots for each group (shapeGroup).
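The data preparation described in these notes can be sketched in pandas before any Bokeh code runs. The sketch assumes that `freq` is applied through `pandas.Grouper` on the 'date' column; the Bokeh plotting is omitted, and the helper name `prepare_activity` is hypothetical.

```python
import pandas as pd


def prepare_activity(activity: pd.DataFrame, variable: str = "z_score", freq: str = "D") -> dict:
    """Return one wide DataFrame (periods x shapeName) per shapeGroup."""
    # Mean of `variable` per period, shapeGroup and shapeName
    grouped = (
        activity.groupby([pd.Grouper(key="date", freq=freq), "shapeGroup", "shapeName"], observed=True)[variable]
        .mean()
        .reset_index()
    )
    tables = {}
    for group, sub in grouped.groupby("shapeGroup", observed=True):
        # One column per shapeName (one line per area), one row per period
        tables[group] = sub.pivot(index="date", columns="shapeName", values=variable)
    return tables
```

Each table maps naturally onto one tab, with one line per column.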

mobilyze.plotting.plot_boxplot(data, freq='W-SUN')
mobilyze.plotting.plot_gini_curve(income_distribution)

Plots the Lorenz curve and calculates the Gini coefficient.
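The two quantities this function reports can be computed as in the following NumPy sketch. It assumes `income_distribution` is a one-dimensional array of non-negative values and uses the standard definition Gini = 1 − 2 × (area under the Lorenz curve), with the area computed by the trapezoidal rule. The helper `lorenz_gini` is hypothetical and omits plotting.

```python
import numpy as np


def lorenz_gini(values):
    """Return Lorenz-curve coordinates (x, y) and the Gini coefficient."""
    v = np.sort(np.asarray(values, dtype=float))
    if v.size == 0 or v.sum() == 0:
        raise ValueError("values must be non-empty with a positive total")
    # Cumulative share of the total, starting at the origin (0, 0)
    y = np.concatenate([[0.0], np.cumsum(v) / v.sum()])
    # Cumulative share of the population
    x = np.linspace(0.0, 1.0, v.size + 1)
    # Area under the Lorenz curve via the trapezoidal rule
    area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
    return x, y, 1.0 - 2.0 * area
```

A perfectly equal distribution follows the diagonal (Gini 0); with n values and everything concentrated in one, this discrete estimate reaches (n − 1)/n.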

mobilyze.plotting.plot_share_by_quantile(data, num_quantiles=10)#

Plot the share of the total for each quantile. This function divides the input data into specified quantiles and plots the proportion of the total sum that each quantile represents.

Parameters
  • data (array-like) – Input data to be divided into quantiles. Can be a list, NumPy array, or pandas Series.

  • num_quantiles (int, optional) – The number of quantiles to divide the data into. Default is 10.

Returns

Displays a bar chart showing the share of the total sum for each quantile.

Return type

None
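The quantity this function plots can be sketched with pandas, assuming quantile groups are formed on ranks (so ties never produce duplicate bin edges) and numbered from 1 (lowest values) to `num_quantiles`. The page does not specify tie handling, and `share_by_quantile` is a hypothetical helper that returns the shares instead of plotting them.

```python
import numpy as np
import pandas as pd


def share_by_quantile(data, num_quantiles: int = 10) -> pd.Series:
    """Share of the total sum held by each quantile group (lowest values first)."""
    s = pd.Series(np.asarray(data, dtype=float))
    # Rank first so that tied values cannot create duplicate bin edges
    bins = pd.qcut(s.rank(method="first"), q=num_quantiles, labels=range(1, num_quantiles + 1))
    return s.groupby(bins, observed=False).sum() / s.sum()
```

The shares sum to 1, and calling `.plot.bar()` on the result gives the kind of chart described above.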

mobilyze.plotting.plot_spatial_distribution(ddf: DataFrame) → Overlay

Plots the spatial distribution of mobility data points from their coordinates. The plot is rendered with datashader for efficient visualization of large datasets and overlaid on map tiles.

Parameters
  • ddf (dask.dataframe.DataFrame) – A Dask DataFrame containing the ‘longitude’ and ‘latitude’ columns representing the geographic coordinates of mobility data points.

Returns

A Holoviews overlay consisting of the CartoDark map tiles and the spatial distribution of the mobility data points.

Return type

hv.Overlay

mobilyze.plotting.plot_temporal_distribution(ddf: DataFrame)

Plots the temporal distribution of mobility data points.

Parameters
  • ddf (dask.dataframe.DataFrame) – A Dask DataFrame containing the ‘longitude’ and ‘latitude’ columns representing the geographic coordinates of mobility data points.

mobilyze.plotting.plot_visits(data, title='Points of Interest Visit Trends')

Creates a plot showing the number of visits to OpenStreetMap (OSM) points of interest (POI) over time.

Parameters
  • data (pandas.DataFrame) – DataFrame containing visit counts, with dates as the index and POI categories as columns.

  • title (str, optional) – Title of the plot. Default is “Points of Interest Visit Trends”.

Returns

mobilyze.plotting.plot_visits_by_group(df, group)

mobilyze.tessellation module

mobilyze.tessellation.tessellate(gdf: GeoDataFrame, columns=['shapeName'], resolution=7)

Tessellates the geometries in a GeoDataFrame into H3 hexagons.

Parameters
  • gdf (geopandas.GeoDataFrame) – The GeoDataFrame containing the geometries to tessellate.

  • columns (list of str, optional) – The column names in gdf to carry over to the resulting GeoDataFrame. Default is [“shapeName”].

  • resolution (int, optional) – The H3 resolution level for tessellation. Higher resolution results in smaller hexagons. Default is 7.

Returns

A GeoDataFrame containing the tessellated hexagons, indexed by H3 cell ID, with the requested columns and the hexagon geometries.

Return type

geopandas.GeoDataFrame

Raises

Exception – If a geometry type other than “Polygon” or “MultiPolygon” is encountered.

Examples

>>> import geopandas as gpd
>>> from shapely.geometry import Polygon
>>> from mobilyze.tessellation import tessellate
>>> gdf = gpd.GeoDataFrame({
...     'geometry': [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])],
...     'shapeName': ['A']
... })
>>> hexes = tessellate(gdf)  # one row per H3 cell (resolution 7) covering the square, with shapeName and geometry

Module contents