Activity proxies in Myanmar#

Introduction#

This document assesses whether Veraset data can be used as a proxy for mobility in Myanmar.

About Veraset Movement Data#

Veraset Movement is an unfiltered stream of location and proximity data that Veraset has aggregated from data suppliers.

How does Veraset source its data?

Veraset sources data from direct relationships with third-party applications and software development kits (SDKs) as well as data aggregators. It is then cleansed for data sinks and other anomalies and deduplicated by event date. Source

Can Veraset reveal the sources of its data?

While Veraset obtains data from a variety of sources, it does not typically disclose the specific sources of its data. This is to protect the privacy and confidentiality of its data providers. However, Veraset is committed to transparency and ensures that all its data is sourced in accordance with applicable laws and ethical guidelines. Source

Veraset Sample#

This section assesses the Veraset Movement Data sample for Myanmar between January 2020 and March 2024.

Number of Users across the years#

The first step in the analysis consists of determining how many users were observed per day in the region across the years. From 2020 to March 2021, we observe considerable daily variation in the number of users. Around March 2021, we observe a stark decline in the number of users that might be related to the beginning of internet restrictions. In October 2021, the number of users increases, and it falls again around mid-March 2022, which might be related to changes in policies about how location data is collected. After that, the number of users stays stable and considerably lower than in 2020.

../../_images/ff00c91adcb8597348c0ae901642c46655b856049da5fae072e8a10870a03b3f.png
../../_images/c7e7e1172eef9084629530b705d578f3f77d18db04f47fae69d8653d1e7e8df3.png
../../_images/af81b9249b114ba9e26fb7f2b75615428ca2680661b182eac1d6dc167a2ec319.png
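
The daily user counts plotted above can be reproduced with a simple group-by. The following is a minimal sketch on toy data, not the notebook's actual code: the column names `uid` and `timestamp` are assumptions about the ping-level table rather than the documented Veraset schema.

```python
import pandas as pd

# Hypothetical ping-level sample: one row per ping.
# `uid` and `timestamp` are assumed column names, not the Veraset schema.
pings = pd.DataFrame(
    {
        "uid": ["a", "a", "b", "c", "a", "b"],
        "timestamp": pd.to_datetime(
            [
                "2020-01-01 08:00",
                "2020-01-01 17:30",
                "2020-01-01 09:15",
                "2020-01-02 12:00",
                "2020-01-02 13:45",
                "2020-01-03 07:20",
            ]
        ),
    }
)

# Truncate each timestamp to its calendar day, then count distinct users per day.
daily_users = (
    pings.assign(date=pings["timestamp"].dt.floor("D"))
    .groupby("date")["uid"]
    .nunique()
    .rename("users")
)
print(daily_users)
```

On the full dataset the same aggregation would likely be run with Dask rather than in-memory pandas, but the logic is the same.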

Pings per user distribution#

We also studied the number of pings (connections) each user had across the years to understand whether the data can be used for longitudinal analysis. The two boxplots below show the same information, but the boxplot on the right limits its x-axis to 100 so that we can observe in greater detail the users that had fewer than 100 pings across the 4 years. The distribution is heavily skewed: the median user has 7 pings and 75% of users have 28 pings or fewer across the 4 years, while the mean is about 120 because of a small number of very active users.

count    7.132994e+06
mean     1.204889e+02
std      1.454482e+03
min      1.000000e+00
25%      2.000000e+00
50%      7.000000e+00
75%      2.800000e+01
max      5.237190e+05
dtype: float64
../../_images/7cecca86a54b063679ad7a5a6dc187c0114c3758c352133331ded1a8e8c725ea.png
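
A summary like the one above can be obtained by counting rows per user and calling `describe()`. This is a minimal sketch on toy data; the `uid` column name is taken from the snippet later in this document, while the `pings` table itself is hypothetical.

```python
import pandas as pd

# Hypothetical ping-level sample: one row per ping.
pings = pd.DataFrame({"uid": ["a"] * 5 + ["b"] * 1 + ["c"] * 2 + ["d"] * 40})

# Total number of pings per user over the whole period.
pings_per_user = pings.groupby("uid").size()

# Summary statistics in the same form as the output above.
summary = pings_per_user.describe()

# Share of users at or below a given ping count
# (28 is the 75th percentile reported for the real sample).
share_at_most_28 = (pings_per_user <= 28).mean()

print(summary)
print(share_at_most_28)
```

Checking the share of users below a threshold directly is a useful complement to the quartiles, because it makes explicit whether the bound is inclusive.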

Pings per user across each month#

Next, we analyze the distribution of pings per user in each month. Across the years, most users in the sample have fewer than 30 pings per month, which translates to less than approximately 1 ping per day. As above, the second boxplot shows the same information as the first one but limits the x-axis to give greater detail in the area where most of the sample is concentrated, that is, at low numbers of pings per month.

../../_images/1ca46a25ac27dfa62b7fc3d53246bda3afed1ccc5fb5a846c6520ad4a89098d4.png
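
The monthly distribution can be built by grouping pings by user and calendar month. The sketch below is one way this could be done, under assumptions: `agg_by_year_month`, `uid` and `pings/user` are the names used in this document's own snippet, while `timestamp`, `year_month` and `pings/day` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical ping-level sample: one row per ping.
pings = pd.DataFrame(
    {
        "uid": ["a", "a", "a", "b", "b", "a"],
        "timestamp": pd.to_datetime(
            [
                "2020-01-03",
                "2020-01-15",
                "2020-01-28",
                "2020-01-10",
                "2020-02-02",
                "2020-02-20",
            ]
        ),
    }
)

# One row per (user, month) with the number of pings in that month.
agg_by_year_month = (
    pings.assign(year_month=pings["timestamp"].dt.to_period("M"))
    .groupby(["uid", "year_month"])
    .size()
    .reset_index(name="pings/user")
)

# Rough daily rate, using the 30-days-per-month approximation from the text.
agg_by_year_month["pings/day"] = agg_by_year_month["pings/user"] / 30

print(agg_by_year_month)
```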

Users suitable for a longitudinal analysis#

In order to perform a longitudinal analysis, we filter the data to keep only users that had more than 30 pings in at least one month, which leaves around 1,350,000 users. Then, we count the number of months in which each of these users keeps this level of connection intensity. We observe that 75% of these users kept more than 30 pings per month for 2 months or fewer out of the total 51 months. The maximum is 35 months, so none of the users kept this level across the 4 years.

Records where pings per user are greater than 30

2569312
agg_by_year_month[agg_by_year_month["pings/user"] > 30].groupby("uid").size().describe()
count    1.345974e+06
mean     1.908887e+00
std      1.840105e+00
min      1.000000e+00
25%      1.000000e+00
50%      1.000000e+00
75%      2.000000e+00
max      3.500000e+01
dtype: float64
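
The count above includes any month over the threshold, whether or not the months are consecutive. For longitudinal work it may also be worth checking the longest uninterrupted run of qualifying months per user. The following is a minimal sketch of that idea on toy data; the `year_month` column and the `longest_streak` helper are hypothetical and not part of the original analysis.

```python
import pandas as pd

# Hypothetical monthly aggregate using the column names from the snippet above.
agg_by_year_month = pd.DataFrame(
    {
        "uid": ["a", "a", "a", "a", "b", "b"],
        "year_month": pd.PeriodIndex(
            ["2020-01", "2020-02", "2020-03", "2020-06", "2020-01", "2020-02"],
            freq="M",
        ),
        "pings/user": [45, 60, 31, 90, 12, 35],
    }
)

# Keep only user-months above the threshold used in this document.
active = agg_by_year_month[agg_by_year_month["pings/user"] > 30]

# Number of qualifying months per user (not necessarily consecutive).
months_active = active.groupby("uid").size()


def longest_streak(periods: pd.Series) -> int:
    """Length of the longest run of consecutive months among the given periods."""
    # Map each month to an integer so that consecutive months differ by exactly 1.
    idx = sorted(p.year * 12 + p.month for p in periods)
    best = run = 1
    for prev, cur in zip(idx, idx[1:]):
        run = run + 1 if cur - prev == 1 else 1
        best = max(best, run)
    return best


streaks = active.groupby("uid")["year_month"].apply(longest_streak)

print(months_active)
print(streaks)
```

In this toy example user `a` qualifies in four months but only three of them are consecutive, which is the distinction the streak measure is meant to expose.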

Pings by date#

Another way to assess the data is to study the number of pings recorded per date. The series appears to track the number of users per date closely. The orange line marks 2 April 2021, when internet restrictions started according to this Source

../../_images/7b29d95b62db4d0ad3a9698d436d84a1a8c4f6b7ea64a2cbc57f0b360549e1ac.png
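
The visual impression that pings and users move together can be checked with a correlation coefficient. This is a minimal sketch on toy data, assuming a ping-level table with hypothetical `uid` and `timestamp` columns; it uses the Pearson correlation, which is only one of several reasonable choices.

```python
import pandas as pd

# Hypothetical ping-level sample: one row per ping.
pings = pd.DataFrame(
    {
        "uid": ["a", "a", "b", "c", "a", "b"],
        "timestamp": pd.to_datetime(
            [
                "2020-01-01 08:00",
                "2020-01-01 17:30",
                "2020-01-01 09:15",
                "2020-01-02 12:00",
                "2020-01-02 13:45",
                "2020-01-03 07:20",
            ]
        ),
    }
)

# Daily totals: number of pings and number of distinct users.
daily = (
    pings.assign(date=pings["timestamp"].dt.floor("D"))
    .groupby("date")
    .agg(pings=("uid", "size"), users=("uid", "nunique"))
)

# Pearson correlation between the two daily series.
r = daily["pings"].corr(daily["users"])

print(daily)
print(r)
```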

By region#

Finally, we studied the number of pings by date and the number of users by date across five regional groups:

  • The North: Kachin, Shan (North)

  • The Central: Sagaing, Magway, Mandalay

  • The South: Mon, Kayin, Kayah, Tanintharyi, Bago (East)

  • The West: Rakhine, Chin

  • The Other: Yangon, Ayeyarwady, Nay Pyi Taw, Bago (West), Shan (South), Shan (East)

Then, we compared the two variables with the conflict index by region. At first sight, neither the number of pings nor the number of users appears to be correlated with the conflict index.

../../_images/28d5688d393a204da0a2fe31db6cfb2eb772356a3919e3cd7840723ad6b4e3be.png
../../_images/a954dd52752202bd34fbb108227ef4ce2f131be955aecf199f656a41bc4f8006.png
../../_images/7ab036a59831a0fab480831dc387b274863dde39f7c2fe7d11ebfc4cb005f2b9.png
../../_images/236912247976c1c95e25fd42c3c71a6414ca73fd0a304987a155ccefb945f116.png
../../_images/7be33e5db881e903daea776fb11181fcef613e136d9413be1f82fede1a92b9a4.png
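
The regional comparison can be sketched as a mapping from states to the groups listed above, followed by a per-region correlation. The group membership below is copied from the list in this section. Everything else is an assumption made for illustration: the table layouts, the column names `state`, `month`, `pings` and `conflict_index`, and all numeric values, which are toy numbers and not results.

```python
import pandas as pd

# Regional groups as listed in this section.
REGION_GROUPS = {
    "North": ["Kachin", "Shan (North)"],
    "Central": ["Sagaing", "Magway", "Mandalay"],
    "South": ["Mon", "Kayin", "Kayah", "Tanintharyi", "Bago (East)"],
    "West": ["Rakhine", "Chin"],
    "Other": [
        "Yangon", "Ayeyarwady", "Nay Pyi Taw",
        "Bago (West)", "Shan (South)", "Shan (East)",
    ],
}
STATE_TO_REGION = {
    state: region for region, states in REGION_GROUPS.items() for state in states
}

# Hypothetical pings per state and month (toy values).
state_month = pd.DataFrame(
    {
        "state": ["Kachin", "Shan (North)"] * 3 + ["Rakhine", "Chin"] * 3,
        "month": ["2021-01", "2021-01", "2021-02", "2021-02", "2021-03", "2021-03"] * 2,
        "pings": [100, 50, 80, 40, 60, 30, 10, 10, 20, 10, 30, 10],
    }
)

# Hypothetical conflict index per region and month (toy values).
conflict = pd.DataFrame(
    {
        "region": ["North"] * 3 + ["West"] * 3,
        "month": ["2021-01", "2021-02", "2021-03"] * 2,
        "conflict_index": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    }
)

# Aggregate pings to the regional level, then join with the conflict index.
state_month["region"] = state_month["state"].map(STATE_TO_REGION)
region_month = state_month.groupby(["region", "month"], as_index=False)["pings"].sum()
merged = region_month.merge(conflict, on=["region", "month"])

# Pearson correlation between pings and conflict index within each region.
corr_by_region = {
    region: grp["pings"].corr(grp["conflict_index"])
    for region, grp in merged.groupby("region")
}
print(corr_by_region)
```

The toy values are deliberately chosen to give opposite signs in the two regions, which shows why the correlation is worth computing per region rather than on the pooled data.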

Conclusions#

We observe that the volume of Veraset data varies considerably across the studied period. Unfortunately, we cannot attribute this variation to a specific cause, although we hypothesize that it could be related to restrictions on internet access as well as changes in data collection policies.

In order to use this data for mobility analysis, we need users that have several connections per day and maintain that connection intensity across the months. Across the four years, the number of users suitable for this type of longitudinal analysis is low. However, the sample might be useful for analyses over shorter timespans, provided that the data is first assessed for the specific timespan of interest.