Ambient Air Quality in Kochi and Trivandrum#

Air pollution is one of the greatest environmental risks to health, contributing to respiratory and cardiovascular diseases, cancer, and premature death. Tracking air pollution levels can help governments and policymakers make informed decisions to improve public and environmental health.

Several common air pollutants are frequently monitored as indicators of air quality in cities.

Particulate Matter (PM)

  • RSPM/PM10/PM2.5: PM is a common proxy indicator for air pollution. There is strong evidence for the negative health impacts associated with exposure to this pollutant. The major components of PM are sulfates, nitrates, ammonia, sodium chloride, black carbon, mineral dust and water. PM10 refers to particles with a diameter of 10 micrometers or less, while PM2.5 refers to fine particles with a diameter of 2.5 micrometers or less. Fine particles are particularly harmful as they can penetrate deep into the lungs and even enter the bloodstream.

Gaseous Pollutants

  • SO2: SO2 is a colourless gas with a sharp odour. It is produced from the burning of fossil fuels (coal and oil) and the smelting of mineral ores that contain sulfur.

  • NO2: NO2 is a gas that is commonly released from the combustion of fuels in the transportation and industrial sectors.

Data#

Air Pollutant Data#

The data was downloaded from www.kerala.data.gov and is available for the years 1987 to 2002 and 2005 to 2015. Data for the years 2003 and 2004 are not available. Each year is recorded in a separate Excel file.
Each file generally included:

  • The station code where the data was recorded (Stn Code)

  • Date of record (Sampling Date)

  • Place of record (City/Town/Village Area)

  • Agency

  • Type of location

  • SO2 values

  • NO2 values

  • RSPM/PM10 values

  • SPM values

  • PM2.5 values

The completeness of data varies by year and city.

Methodology#

Data for 27 years were concatenated and pre-processed. Columns were standardized whenever possible. However, as the recording format varies from year to year, some human errors may have occurred in the process of standardizing the data. Due to lack of data for some columns, only SO2, NO2 and RSPM/PM10 values were used for analysis and visualization. It is also important to note that data quality varies across cities and years.
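The concatenation step described above could look roughly like the following. This is a minimal sketch, not the author's code: the `COLUMN_ALIASES` mapping, the helper names, and the two tiny stand-in tables with made-up values are assumptions for illustration. In practice each table would come from reading one yearly Excel file, for example with `pd.read_excel`.

```python
import pandas as pd

# Hypothetical mapping from header variants to one standard name.
# The real files may use other spellings; these are only examples.
COLUMN_ALIASES = {
    "City/Town/Village Area": "City/Town/Village/Area",
    "Type of location": "Type of Location",
    "PM2.5": "PM 2.5",
}


def standardize_columns(df):
    """Strip stray whitespace from headers, then map known variants."""
    cleaned = df.rename(columns=lambda c: str(c).strip())
    return cleaned.rename(columns=COLUMN_ALIASES)


def combine_years(frames):
    """Concatenate yearly tables; a column missing in a year becomes NaN."""
    return pd.concat(
        [standardize_columns(f) for f in frames],
        ignore_index=True,
        sort=False,
    )


# Two made-up yearly tables with slightly different headers.
year_a = pd.DataFrame(
    {"Stn Code": [1], "City/Town/Village Area": ["Cochin"], "NO2": [1.0]}
)
year_b = pd.DataFrame(
    {
        "Stn Code": [2],
        "City/Town/Village/Area ": ["Kochi"],  # note the trailing space
        "NO2": [2.0],
        "PM2.5": [3.0],
    }
)

combined = combine_years([year_a, year_b])
print(combined)
```

Keeping the alias table in one place makes it easier to review the manual standardization, which is where the human errors mentioned above are most likely to creep in.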

A look at the data#

Columns within the data (First five entries)#

|   | Stn Code | Sampling Date | Month | Year | State | City/Town/Village/Area | Agency | Type of Location | SO2 | NO2 | RSPM/PM10 | SPM | Location of Monitoring Station | PM 2.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.0 | 10/3/1987 | 3.0 | 1987 | Kerala | Cochin | Kerala Pollution Control Board | Industrial | NaN | 2.9 | NaN | NaN | NaN | NaN |
| 1 | 30.0 | 4/5/1988 | 5.0 | 1988 | Kerala | Cochin | Kerala Pollution Control Board | Industrial Area | NaN | NaN | NaN | 79.0 | NaN | NaN |
| 2 | 31.0 | 4/5/1988 | 5.0 | 1988 | Kerala | Cochin | Kerala Pollution Control Board | Industrial Area | NaN | NaN | NaN | 88.0 | NaN | NaN |
| 3 | 32.0 | 4/5/1988 | 5.0 | 1988 | Kerala | Cochin | Kerala Pollution Control Board | Industrial Area | NaN | NaN | NaN | 98.0 | NaN | NaN |
| 4 | 33.0 | 4/5/1988 | 5.0 | 1988 | Kerala | Cochin | Kerala Pollution Control Board | NaN | NaN | NaN | NaN | 88.0 | NaN | NaN |

All cities available in dataset#

Some cities appear under more than one name or spelling (e.g. ‘Cochin’ and ‘Kochi’ are the same city).
Data preprocessing:

  • Renaming duplicated cities to unified names

  • Removing null values

  • Standardizing date format

  • Aggregating data by yearly average

['Cochin',
 'Kotttayam',
 'Kottayam',
 'Kozhikode',
 'Trivendrum',
 'Palakkad',
 'Kochi',
 'Trivandrum',
 'Alappuzha',
 'Kollam',
 'Malappuram',
 'Thiruvananthapuram',
 'Pathanamthitta',
 'Thissur',
 'Wayanad']
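The preprocessing steps listed above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the author's code: the `CITY_ALIASES` mapping is inferred from the unique values above, the helper name `yearly_average` is hypothetical, the dates are assumed to be day-first (the sample row dated 10/3/1987 has Month 3.0), and the sample values are made up.

```python
import pandas as pd

# Assumed mapping of name variants to one name per city, inferred from
# the list of unique values; the author's actual mapping is not shown.
CITY_ALIASES = {
    "Cochin": "Kochi",
    "Kotttayam": "Kottayam",
    "Trivendrum": "Trivandrum",
    "Thiruvananthapuram": "Trivandrum",
}


def yearly_average(df, pollutant):
    """Unify city names, parse dates, drop nulls, average per city and year."""
    out = df.copy()
    out["City"] = out["City/Town/Village/Area"].replace(CITY_ALIASES)
    # Assumption: dates are day/month/year; unparseable entries become NaT.
    out["Sampling Date"] = pd.to_datetime(
        out["Sampling Date"], format="%d/%m/%Y", errors="coerce"
    )
    # Remove rows without a valid date or without a pollutant reading.
    out = out.dropna(subset=["Sampling Date", pollutant])
    out["Year"] = out["Sampling Date"].dt.year
    return out.groupby(["City", "Year"], as_index=False)[pollutant].mean()


# Made-up sample rows, including one invalid date that gets dropped.
sample = pd.DataFrame(
    {
        "Sampling Date": ["10/3/1987", "4/5/1987", "4/5/1988", "not a date"],
        "City/Town/Village/Area": ["Cochin", "Kochi", "Trivendrum", "Kochi"],
        "NO2": [2.0, 4.0, 6.0, 8.0],
    }
)

result = yearly_average(sample, "NO2")
print(result)
```

Dropping nulls per pollutant, rather than across all columns at once, keeps rows that have a valid reading for one pollutant but not for the others.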

Observations#

  • Kochi has been managing PM10 and SO2 levels well, but NO2 levels are on the rise, especially since mid-2015.

  • Air quality is generally worse in Trivandrum than in Kochi.

Using Machine Learning to predict air pollution levels (NO2) for future years in Kochi#

Random Forest Regressor#

../_images/1244029286212effc9ddbd45dc44f5ef6778dcb7ab62e7a9e8542de317111da3.png

Model Performance#

R2 (the coefficient of determination) measures how much of the variability of the dependent variable the model explains. Its maximum is 1 (perfect predictions); a value of 0 corresponds to always predicting the mean of the dependent variable, and negative values mean the model does worse than that.
RMSE is the square root of the average of the squared differences between predicted and actual values. It expresses the typical size of the prediction error in the same units as the dependent variable.

R-squared: -0.13495601528962808
RMSE: 2.417655303596738

An R2 of -0.135 indicates that the model is performing worse than simply predicting the mean of the dependent variable.
An RMSE of 2.418 indicates that the model’s predictions typically deviate from the actual NO2 levels by about 2.4 units.
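To make the two metrics concrete, here is a small sketch that computes them from their definitions with NumPy. The function names and the toy values are assumptions for illustration and are unrelated to the Kochi data. It shows why a constant mean prediction scores an R2 of 0 and why a worse prediction goes negative.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Square root of the mean squared prediction error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values, made up for the example.
y_true = np.array([10.0, 12.0, 14.0, 16.0])
mean_pred = np.full_like(y_true, y_true.mean())  # always predict the mean
bad_pred = np.array([16.0, 14.0, 12.0, 10.0])    # trend in the wrong direction

print("mean baseline:", r_squared(y_true, mean_pred), rmse(y_true, mean_pred))
print("bad model:    ", r_squared(y_true, bad_pred), rmse(y_true, bad_pred))
```

A negative R2 therefore does not signal a calculation error; it means the residual sum of squares is larger than the spread of the data around its own mean.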

Possible Reasons for Poor Performance#

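  1. The dataset is too small, or the data is too noisy.

  2. The model may not be the best fit. Tree-based models such as Random Forest cannot predict values outside the range seen in training, which limits their use for forecasting a trend into future years.

  3. A single feature (year) might not be sufficient; more features might be helpful.

  4. The model may be overfitting or underfitting the data.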

Future work for improvements#

  1. Adding more features to the model (such as weather and traffic data).

  2. Testing a few more models and comparing the results.

  3. Further data preprocessing.

  4. Model tuning: experimenting with different hyperparameters for the model.
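One way to approach the model comparison is to hold out the most recent years and score each candidate on them, alongside a simple baseline. The sketch below is only an illustration under stated assumptions: the yearly series is synthetic, the helper `compare_models` is hypothetical, and the two candidates (a mean baseline and a least-squares straight line) stand in for whichever models are actually tried.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Square root of the mean squared prediction error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def compare_models(years, values, n_test):
    """Fit on the earliest years and score on the last n_test years."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    x_train, x_test = years[:-n_test], years[-n_test:]
    y_train, y_test = values[:-n_test], values[-n_test:]

    # Baseline: always predict the mean of the training years.
    mean_pred = np.full(n_test, y_train.mean())

    # Candidate: straight-line trend; years are centred for numerical stability.
    centre = x_train.mean()
    slope, intercept = np.polyfit(x_train - centre, y_train, deg=1)
    trend_pred = slope * (x_test - centre) + intercept

    return {
        "mean baseline": rmse(y_test, mean_pred),
        "linear trend": rmse(y_test, trend_pred),
    }


# Synthetic yearly averages with a steady upward trend (not real data).
years = np.arange(2005, 2016)
values = 10.0 + 0.5 * (years - 2005)

scores = compare_models(years, values, n_test=3)
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f}")
```

Splitting by time rather than at random matches the stated goal of predicting future years, and any candidate that cannot beat the mean baseline on the held-out years is unlikely to be useful for forecasting.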