India Air Quality Data

This post describes how to use the India Air Quality Data to train an AI model that forecasts nitrogen dioxide levels.

Table of Contents

  1. Data Introduction
  2. Data Visualization with Machbase Neo
  3. Table Creation and Data Upload in Machbase Neo
  4. Experimental Methodology
  5. Experiment Code
  6. Experimental Results

1. Data Introduction


  • DataHub Serial Number: 2024-13.
  • Data Name: India Air Quality Data.
  • Data Collection Methods: Air quality data for 453 cities in India, collected from 2010 to 2023 by the Central Control Room for Air Quality Management.
STATE | CITIES
Andhra Pradesh | Amaravati, Anantapur, Chittoor, Kadapa, Rajamahendravaram, Tirupati, Vijayawada, Visakhapatnam
Arunachal Pradesh | Naharlagun
Assam | Byrnihat, Guwahati, Nagaon, Nalbari, Silchar, Sivasagar
Bihar | Araria, Arrah, Aurangabad, Begusarai, Bettiah, Bhagalpur, Bihar Sharif, Buxar, Chhapra, Darbhanga, Gaya, Hajipur, Katihar, Kishanganj, Manguraha, Motihari, Munger, Muzaffarpur, Patna, Purnia, Rajgir, Saharsa, Samastipur, Sasaram, Siwan
Chandigarh | Chandigarh
Chhattisgarh | Bhilai, Bilaspur, Chhal, Korba, Kunjemura, Milupara, Raipur, Tumidih
Delhi | Delhi
Gujarat | Ahmedabad, Ankleshwar, Gandhinagar, Nandesari, Surat, Vapi, Vatva
Haryana | Ambala, Bahadurgarh, Ballabgarh, Bhiwani, Charkhi Dadri, Dharuhera, Faridabad, Fatehabad, Gurugram, Hisar, Jind, Kaithal, Karnal, Kurukshetra, Mandikhera, Manesar, Narnaul, Palwal, Panchkula, Panipat, Rohtak, Sirsa, Sonipat, Yamuna Nagar
Himachal Pradesh | Baddi
Jammu and Kashmir | Srinagar
Jharkhand | Dhanbad, Jorapokhar
Karnataka | Bagalkot, Belgaum, Bengaluru, Bidar, Chamarajanagar, Chikkaballapur, Chikkamagaluru, Davanagere, Dharwad, Gadag, Hassan, Haveri, Hubballi, Kalaburagi, Kolar, Koppal, Madikeri, Mangalore, Mysuru, Raichur, Ramanagara, Shivamogga, Tumakuru, Udupi, Vijayapura, Yadgir
Kerala | Eloor, Ernakulam, Kannur, Kochi, Kollam, Kozhikode, Thiruvananthapuram, Thrissur
Madhya Pradesh | Bhopal, Damoh, Dewas, Gwalior, Indore, Jabalpur, Katni, Maihar, Mandideep, Pithampur, Ratlam, Sagar, Satna, Singrauli, Ujjain
Maharashtra | Aurangabad, Chandrapur, Kalyan, Mumbai, Nagpur, Nashik, Navi Mumbai, Pune, Solapur, Thane
Manipur | Imphal
Meghalaya | Shillong
Mizoram | Aizawl
Nagaland | Kohima
Odisha | Baripada, Bileipada, Brajrajnagar, Keonjhar, Nayagarh, Rairangpur, Rourkela, Suakati, Talcher, Tensa
Puducherry | Puducherry
Punjab | Amritsar, Bathinda, Jalandhar, Khanna, Ludhiana, Mandi Gobindgarh, Patiala, Rupnagar
Rajasthan | Ajmer, Alwar, Banswara, Barmer, Bharatpur, Bhiwadi, Bikaner, Chittorgarh, Churu, Dausa, Dholpur, Hanumangarh, Jaipur, Jaisalmer, Jhalawar, Jhunjhunu, Jodhpur, Karauli, Kota, Pali, Pratapgarh, Rajsamand, Sawai Madhopur, Sikar, Sirohi, Sri Ganganagar, Udaipur
Sikkim | Gangtok
Tamil Nadu | Ariyalur, Chengalpattu, Chennai, Coimbatore, Cuddalore, Dindigul, Gummidipoondi, Hosur, Kanchipuram, Ooty, Palkalaiperur, Ramanathapuram, Salem, Thoothukudi, Tirupur, Vellore
Telangana | Hyderabad
Tripura | Agartala
Uttar Pradesh | Agra, Baghpat, Bareilly, Bulandshahr, Firozabad, Ghaziabad, Gorakhpur, Greater Noida, Hapur, Jhansi, Kanpur, Khurja, Lucknow, Meerut, Moradabad, Muzaffarnagar, Noida, Prayagraj, Varanasi, Vrindavan
Uttarakhand | Dehradun, Kashipur, Rishikesh
West Bengal | Asansol, Durgapur, Haldia, Howrah, Kolkata, Siliguri
  • Data Source: Link
  • Raw data size and format: 1.57GB, CSV.
  • Number of tags: 35 (The column structure varies for each city).
TAG | DESCRIPTION
PM10 | Particulate Matter 10 (ug/m³)
PM2.5 | Particulate Matter 2.5 (ug/m³)
CO | Carbon Monoxide (mg/Nm³, mg/m³, ng/m³, ug/m³)
CO2 | Carbon Dioxide (mg/m³)
NO | Nitric Oxide (mg/m³, ppb, ppm, ug/m³)
NO2 | Nitrogen Dioxide (ug/m³)
NOx | Nitrogen Oxides (ppb, ppm, ug/m³)
NH3 | Ammonia (ppb, ug/m³)
SO2 | Sulfur Dioxide (ug/m³)
Temp | Temperature (degrees Celsius or ug/m³)
AT | Air Temperature (degrees Celsius or ug/m³)
BP | Barometric Pressure (W/mt², mg/m³, mmHg)
Benzene | Concentration of Benzene in the air (mg/m³ or ug/m³)
CH4 | Methane (ug/m³)
Eth-Benzene | Concentration of Ethylbenzene in the air (ug/m³)
Gust | Wind Gust (kl/h, km/hr, m/s)
HCHO | Formaldehyde (ug/m³)
Hg | Mercury (ug/m³)
MH | Mixing Height (meters (m))
MP-Xylene | Concentration of Meta-Para Xylene in the air (ug/m³)
NMHC | Non-Methane Hydrocarbons (ug/m³)
O Xylene | Concentration of Ortho-Xylene in the air (ug/m³)
Ozone | Ozone Concentration (ppb, ug/m³)
Power | Power Consumption (Watts (W))
RF | Rainfall (m/s, mm)
RH | Relative Humidity (%, W/mt², degree)
SPM | Suspended Particulate Matter (ug/m³)
SR | Solar Radiation (W/mt², ug/m³)
THC | Total Hydrocarbons (ug/m³)
Toluene | Concentration of Toluene in the air (ug/m³)
VWS | Wind Speed (degree, m/s)
Variance | Variance (n)
WD | Wind Direction (deg, degree C, degree)
WS | Wind Speed (m/s, ug/m³)
Xylene | Concentration of Xylene in the air (ug/m³)

2. Data Visualization with Machbase Neo


  • Data visualization is possible through the Tag Analyzer in Machbase Neo.
  • Select desired tag names and visualize them in various types of graphs.
  • Use the viewer below to access the 2024-13 DataHub in real time, select tag names from the 35 available tags, visualize them, and preview the data patterns.
DataHub Viewer

3. Table Creation and Data Upload in Machbase Neo


  • In the DataHub directory, use setup.wrk located in the India Air Quality Dataset folder to create tables and load data, as illustrated in the image below.

1) Table Creation

  • The table is created immediately upon pressing the "Run" button in the menu.
  • If the india_air_quality table exists, execute the first line and then the second. If it does not exist, start from the second line.

2) Data Upload


  • The data can be loaded into the table in two different ways.
Method 1) Loading the table using TQL in Machbase Neo (since machbase-neo v8.0.29-rc1)

  • Pros

    • Machbase Neo loads the data as soon as you press the run button.
  • Cons

    • Slower table loading speed than Method 2.
Method 2) Loading tables using commands

  • Pros

    • Fast table loading speed.
  • Cons

    • The table loading process is cumbersome.
    • Steps: open a command window, change to the machbase-neo directory, and enter the command.
  • Running the script below from the command shell loads the data into the india_air_quality table at high speed.
curl http://data.yotahub.com/2024-13/datahub-2024-13-India-Air-Quality.csv.gz | machbase-neo shell import --input - --compress gzip --header --method append --timeformat ns india_air_quality
  • If the username and password are not the default sys/manager, add the --user and --password options as shown below.
curl http://data.yotahub.com/2024-13/datahub-2024-13-India-Air-Quality.csv.gz | machbase-neo shell import --input - --compress gzip --header --method append --timeformat ns india_air_quality --user USERNAME --password PASSWORD

4. Experimental Methodology


  • Model Objective: Nitrogen Dioxide Levels Forecasting.
  • Tags Used: AP001 tags.
  • Model Configuration: Temporal Convolutional Network (TCN).
  • Learning Method: Supervised Learning.
    • Train: Model Training.
    • Test: Model Performance Evaluation Based on Nitrogen Dioxide Levels Forecasting.
  • Model Optimizer: Adam.
  • Model Loss Function: Mean Squared Error.
  • Model Performance Metric: Mean Squared Error & R2 Score.
  • Data Loading Method
    • Loading the Entire Dataset.
    • Loading the Batch Dataset.
  • Data Preprocessing
    • MinMax Scaling.
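The preprocessing and supervised setup listed above can be illustrated with a minimal sketch. It is not the post's actual experiment code: the function names, the window length, and the sample NO2 readings are all hypothetical, and it assumes the scaler is fitted on the training split only and that each training sample pairs a run of past values with the next value as its target.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) of the training values only."""
    return min(train), max(train)


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values to [0, 1] using training statistics; a constant series maps to 0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def make_windows(series: Sequence[float], window: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` past values with the next value as the target."""
    return [
        (list(series[i:i + window]), series[i + window])
        for i in range(len(series) - window)
    ]


# Hypothetical NO2 readings (ug/m3), not taken from the dataset.
no2 = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
train, test = no2[:4], no2[4:]

lo, hi = fit_min_max(train)
scaled_train = min_max_scale(train, lo, hi)
scaled_test = min_max_scale(test, lo, hi)  # test values can fall outside [0, 1]
windows = make_windows(scaled_train, window=2)
```

Fitting the minimum and maximum on the training split alone keeps information from the test period out of the model inputs, at the cost of scaled test values that may exceed 1.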

5. Experiment Code


  • Below is the code for each of the two ways to get data from the database.
  • If all the data can be loaded and trained on at once without causing memory errors, Method 1 is the faster and simpler option.
  • If the data is too large to fit in memory, the batch loading approach in Method 2 is the more practical option.

Method 1) Loading the Entire Dataset


  • The code below loads all the data needed for training from the database at once.
  • This is equivalent to loading all the CSV files; the only difference is that the data comes from Machbase Neo.
  • Pros
    • Existing code written for CSV files can be reused (only the loading step differs).
  • Cons
    • Training is not possible if the training data is larger than the available memory.
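As a rough illustration of Method 1, the sketch below builds a single query that covers the whole training period and reads the result into memory in one call. It is an assumption-laden sketch, not the post's code: it assumes Machbase Neo's HTTP query endpoint at /db/query on the default local port 5654, the usual tag-table columns NAME, TIME, and VALUE, and a hypothetical tag name AP001-NO2. The helper names are made up for this example.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import List

# Assumption: a local Machbase Neo server exposing its HTTP query endpoint.
BASE_URL = "http://127.0.0.1:5654/db/query"


def build_query_url(table: str, tag: str, start: str, end: str) -> str:
    """Build one URL that selects every row of `tag` between start and end."""
    sql = (
        f"SELECT TIME, VALUE FROM {table} "
        f"WHERE NAME = '{tag}' "
        f"AND TIME BETWEEN TO_DATE('{start}') AND TO_DATE('{end}') "
        f"ORDER BY TIME"
    )
    params = {"q": sql, "format": "csv", "timeformat": "ns"}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def load_all(url: str) -> List[List[str]]:
    """Download the whole result set into memory as a list of CSV rows."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


# Hypothetical tag name and period; the request itself is not sent here.
url = build_query_url(
    "india_air_quality", "AP001-NO2",
    "2020-01-01 00:00:00", "2020-12-31 23:59:59",
)
```

Calling `load_all(url)` against a running server would return every matching row at once, which is why this approach only works while the result fits in memory.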

Method 2) Loading the Batch Dataset


  • This method loads only one batch of data at a time from Machbase Neo.
  • The code below fetches consecutive time ranges, one batch per range.
  • Pros
    • The model can be trained regardless of how large the dataset is.
  • Cons
    • Training takes longer than with Method 1.
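The idea of fetching one time range per batch can be sketched as follows. This is a minimal illustration under stated assumptions rather than the post's implementation: `fetch` stands in for a database query over one time range, `train_step` stands in for one model update, and the one-day batch size and all names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, List, Tuple

TimeRange = Tuple[datetime, datetime]


def time_batches(start: datetime, end: datetime, step: timedelta) -> Iterator[TimeRange]:
    """Yield consecutive (batch_start, batch_end) ranges that cover start..end."""
    cursor = start
    while cursor < end:
        batch_end = min(cursor + step, end)
        yield cursor, batch_end
        cursor = batch_end


def train_in_batches(
    ranges: Iterable[TimeRange],
    fetch: Callable[[datetime, datetime], List[float]],
    train_step: Callable[[List[float]], None],
) -> int:
    """Fetch one time range at a time so only one batch is held in memory."""
    seen = 0
    for batch_start, batch_end in ranges:
        values = fetch(batch_start, batch_end)  # e.g. one database query
        if values:
            train_step(values)
            seen += len(values)
    return seen


def fake_fetch(batch_start: datetime, batch_end: datetime) -> List[float]:
    """Hypothetical stand-in for the database: one dummy value per hour."""
    hours = int((batch_end - batch_start).total_seconds() // 3600)
    return [0.0] * hours


batches = list(time_batches(datetime(2020, 1, 1), datetime(2020, 1, 4), timedelta(days=1)))
total = train_in_batches(batches, fake_fetch, lambda values: None)
```

Each iteration issues its own query, so memory use is bounded by the batch size, while the repeated round trips to the database account for the longer training time.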

6. Experimental Results


Method 1) Loading the Entire Dataset Result


Method 2) Loading the Batch Dataset Result


  • The R2 score was 0.86 when loading the entire dataset and 0.833 when loading the batch dataset.
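For reference, the two reported metrics can be computed as in the sketch below. It uses the standard definitions of mean squared error and the R2 score in plain Python; the sample values are hypothetical and are not the experiment's predictions.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot; undefined when all targets are identical."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

An R2 of 1 means the predictions match the targets exactly, and a value of 0 means the model does no better than always predicting the mean of the targets.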




※ Various datasets and tutorial codes can be found in the GitHub repository below.

datahub/dataset/2024 at main · machbase/datahub
