# Create Longitudinal Panels

In this step, we create sub-panels **A** (formal) and **B** (informal) as described in the [methodological notes](README.md) of this pilot study. The sub-panels are composed of longitudinal mobility data generated by GPS-enabled devices based on whether they were detected within the proximity of [Region A or Region B](01a-aoi-and-tessellation.ipynb#regions-a-b) throughout the time horizon. 

The **A** (formal) and **B** (informal) sub-panels are respectively defined as follows.

- Sub-Panel **A**: 
    > Devices seen within a 1 km radius of 6 border checkpoints (Region A)
    - Jdeidet Yabbous (Al-Masnaa)
    - Tartous (Al-Arida) 
    - Al-Dabbousieh (Al-Abboudiyeh)
    - Tel Kalakh (Al-Buqayaa) 
    - Joussieh (Al-Qaa) 
    - Matraba 
    
- Sub-panel **B**: 
    > Devices seen within a 500-meter buffer strip along the Lebanese-Syrian border, excluding the 6 checkpoints (Region B)
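
To make the Region A definition concrete, the following is a minimal sketch of a "within 1 km of a checkpoint" test using the haversine distance. It is only an illustration: the notebook itself selects devices through the tessellated cells of the area of interest rather than through point distances, and the checkpoint coordinates and the names `haversine_m`, `in_region_a`, `CHECKPOINT` and `RADIUS_M` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical checkpoint location (placeholder, not a surveyed position)
CHECKPOINT = (33.70, 35.95)
RADIUS_M = 1_000  # Region A: 1 km radius


def in_region_a(lat, lon, checkpoint=CHECKPOINT, radius_m=RADIUS_M):
    """True if the ping lies within `radius_m` of the checkpoint."""
    return haversine_m(lat, lon, *checkpoint) <= radius_m


near = in_region_a(33.705, 35.95)  # roughly 0.56 km north of the placeholder
far = in_region_a(33.72, 35.95)  # roughly 2.2 km north of the placeholder
```

Region B would need a distance-to-line test against the border geometry instead of a distance-to-point test, which is easier to express with a buffered geometry than with a formula like this one.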

We then generate a working dataset from the raw mobility data by drawing out the devices that compose sub-panels **A** and **B**. Finally, we persist the data in an efficient partitioning layout for further analysis.

In [1]:
import dask.dataframe as dd
import geopandas

In [3]:
from dask.distributed import Client

client = Client(DASK_SCHEDULER_ADDRESS)

## Data

In [2]:
# Parameters
# https://papermill.readthedocs.io/en/latest/usage-parameterize.html
DASK_SCHEDULER_ADDRESS = None

AOI = "id=7&name=A"
NAME = "A"

### Area of Interest

In the previous step, we defined the area(s) of interest. Here, we select the one to use for this sub-panel.

In [4]:
AOI = geopandas.read_file(f"../../data/interim/aoi/{AOI}.geojson")

In [5]:
AOI.explore()

### Mobility Data

The *Syria Economic Monitor* team obtained access to mobility data through the proposal [Syria Economic Monitor (Outlogic)](https://portal.datapartnership.org/readableproposal/407) of the [Development Data Partnership](https://datapartnership.org). For more information on the data ingestion and preprocessing, please see the [Mobility](https://docs.datapartnership.org/collections/mobility/README.html) data collection documentation.

```{important}
Note that the mobility data was ingested and preprocessed on an AWS EC2 instance before being imported into this notebook. For more information, please reach out to the project team.
```


In [6]:
PATH = [
    "../../data/external/outlogic/LB/date=*/*.parquet",
    "../../data/external/outlogic/SY/date=*/*.parquet",
]

In [7]:
ddf = dd.read_parquet(
    PATH,
    columns=[
        "uid",
        "latitude",
        "longitude",
        "h3_10",
        "datetime",
        "country",
        "year",
        "date",
    ],
)

In [8]:
ddf.npartitions

32576

Remove rows without a `uid`.

In [9]:
ddf = ddf[~ddf["uid"].isnull()]

## Sampling Strategy

We select sub-panels of devices using convenience sampling, a non-probability form of sampling. The sampling method is a **key limitation** of this approach.

First, let's calculate the total number of devices.

In [10]:
ddf["uid"].nunique().compute()

639233

### Drawing out Devices

Now, let's calculate the **sub-panel** by drawing out devices detected within the `AOI` at least **once** throughout the time horizon.

In [None]:
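# `ddf` carries the H3 cell index as "h3_10" (see the columns loaded above);
# "hex_id" is the cell column of the AOI GeoDataFrame.
devices = ddf[ddf["h3_10"].isin(AOI["hex_id"])]["uid"].unique().compute()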
devices = devices.to_frame()

In [12]:
devices.to_parquet(f"../../data/interim/devices/devices_{NAME}.snappy.parquet")

# alternatively, read from disk (requires `import pandas as pd`)
# devices = pd.read_parquet(
#     f"../../data/interim/devices/devices_{NAME}.snappy.parquet"
# )

In [13]:
len(devices)

3349

Finally, select the mobility panel by keeping every record of the selected `devices` across the whole time horizon.

In [14]:
ddf = ddf[ddf["uid"].isin(devices["uid"])]
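
The two-step selection above can be sketched on a toy pandas frame. This is an illustration under assumptions, not the notebook's data: the cell labels, device ids and the names `pings`, `aoi_cells` and `panel` are made up.

```python
import pandas as pd

# Toy pings; "h3_10" values are made-up labels, not real H3 indexes
pings = pd.DataFrame(
    {
        "uid": ["a", "a", "b", "c", "c"],
        "h3_10": ["x1", "x9", "x2", "x9", "x8"],
    }
)
aoi_cells = ["x1", "x2"]  # hypothetical AOI cells

# Step 1: devices seen inside the AOI at least once
devices = pings.loc[pings["h3_10"].isin(aoi_cells), "uid"].unique()

# Step 2: keep every ping of those devices, inside or outside the AOI
panel = pings[pings["uid"].isin(devices)]
```

Device `a` keeps its ping in cell `x9` even though that cell is outside the AOI, while device `c`, never seen inside the AOI, is dropped entirely. This is what makes the result a longitudinal panel of devices rather than a spatial clip of pings.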

## Repartitioning

Let's repartition on `country`, `year` and `quarter` to reduce overhead and improve performance. Before doing so, we convert `datetime` to the **Asia/Damascus** timezone and calculate the quarter.

In [15]:
ddf["datetime"] = ddf["datetime"].dt.tz_convert("Asia/Damascus")

ddf["date"] = dd.to_datetime(ddf["date"])
ddf["quarter"] = ddf["datetime"].dt.quarter
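
The order of these operations matters: a timestamp close to a quarter boundary can fall into a different quarter once it is expressed in local time. A minimal pandas sketch, using an illustrative timestamp that is not taken from the data:

```python
import pandas as pd

# A UTC timestamp late on 31 March (illustrative value)
utc = pd.Series(pd.to_datetime(["2021-03-31 22:30:00"], utc=True))

# Same instant expressed in local time
local = utc.dt.tz_convert("Asia/Damascus")

quarter_utc = int(utc.dt.quarter.iloc[0])
quarter_local = int(local.dt.quarter.iloc[0])
```

Here `quarter_utc` is 1 while `quarter_local` is 2, because the local clock has already passed midnight on 1 April. Note that `tz_convert` only works on timezone-aware values; naive timestamps would need `tz_localize` first.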

In [None]:
# reorder columns
ddf = ddf[
    [
        "uid",
        "latitude",
        "longitude",
        "h3_10",
        "datetime",
        "country",
        "year",
        "quarter",
        "date",
    ]
]

Finally, we save the panel to disk.

In [None]:
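# `npartitions` must be passed by keyword; the first positional argument is `divisions`
ddf.repartition(npartitions=1).to_parquet(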
    f"../../data/interim/panels/_{NAME}",
    write_index=False,
    partition_on=["country", "year", "quarter"],
)
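
With `partition_on`, the output is laid out in Hive-style `key=value` directories, one per distinct combination of the partition columns. The sketch below only derives those directory names from a toy frame, to show the expected layout. The values and the names `df`, `keys` and `dirs` are illustrative, and no files are written.

```python
import pandas as pd

# Toy rows with the three partition columns (values are illustrative)
df = pd.DataFrame(
    {
        "country": ["LB", "LB", "SY"],
        "year": [2021, 2021, 2022],
        "quarter": [1, 1, 3],
    }
)

keys = ["country", "year", "quarter"]

# One directory per distinct key combination, in key=value form
dirs = sorted(
    "/".join(f"{k}={v}" for k, v in zip(keys, combo))
    for combo in df[keys].drop_duplicates().itertuples(index=False)
)
```

For the toy frame, `dirs` contains `country=LB/year=2021/quarter=1` and `country=SY/year=2022/quarter=3`. Downstream readers can then restrict a query to a country or period by reading only the matching directories.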