Smoke Sensor Data

This post describes how to use the Smoke Sensor Data for anomaly detection with AI-based learning.

Table of Contents

  1. Data Introduction
  2. Data Visualization with Machbase Neo
  3. Table Creation and Data Upload in Machbase Neo
  4. Experimental Methodology
  5. Experiment Code
  6. Experimental Results

1. Data Introduction


  • DataHub Serial Number: 2025-3.
  • Data Name: Smoke Sensor Data.
  • Data Collection Method: Collected through IoT sensors.
  • To secure a quality dataset, sampling was conducted in various environments and fire scenarios:
    • Standard indoor environment
    • Standard outdoor environment
    • Indoor wood fire (firefighter training area)
    • Indoor gas fire (firefighter training area)
    • Outdoor wood, coal, and gas grill fires
    • Outdoor high-humidity environment
  • Data Source: Link
  • Raw data size and format: 5.56MB, CSV.
  • Number of tags: 14.
| Tag Name | Description | Unit |
| --- | --- | --- |
| Temperature [°C] | Air temperature measured in degrees Celsius. | °C |
| Humidity [%] | Air humidity expressed as a percentage. | % |
| TVOC [ppb] | Total Volatile Organic Compounds concentration measured in parts per billion. | ppb |
| eCO2 [ppm] | Equivalent carbon dioxide concentration measured in parts per million. | ppm |
| Raw H2 | Raw molecular hydrogen levels, uncalibrated and not compensated for factors like temperature or bias. | - |
| Raw Ethanol | Raw gaseous ethanol levels, uncalibrated and uncompensated. | - |
| Pressure [hPa] | Atmospheric pressure measured in hectopascals. | hPa |
| PM1.0 | Concentration of particulate matter smaller than 1.0 µm. | µg/m³ |
| PM2.5 | Concentration of particulate matter larger than 1.0 µm but smaller than 2.5 µm. | µg/m³ |
| NC0.5 | Number concentration of particulate matter smaller than 0.5 µm. | #/cm³ |
| NC1.0 | Number concentration of particulate matter between 0.5 µm and 1.0 µm. | #/cm³ |
| NC2.5 | Number concentration of particulate matter between 1.0 µm and 2.5 µm. | #/cm³ |
| CNT | Sample counter, which tracks the number of collected samples. | Count |
| Fire Alarm | Indicates fire presence: 1 if there is a fire, and 0 otherwise. | Binary (0 or 1) |

2. Data Visualization with Machbase Neo


  • Data visualization is possible through the Tag Analyzer in Machbase Neo.
  • Select desired tag names and visualize them in various types of graphs.
  • The viewer below connects to the 2025-3 DataHub in real time: select tag names from the 14 available tags, visualize them, and preview the data patterns.
DataHub Viewer

3. Table Creation and Data Upload in Machbase Neo


  • In the DataHub directory, use setup.wrk located in the Smoke Sensor Dataset folder to create tables and load data, as illustrated in the image below.

1) Table Creation

  • The table is created immediately upon pressing the "Run" button in the menu.
  • If the smoke table exists, execute the first line and then the second. If it does not exist, start from the second line.

2) Data Upload


  • The table can be loaded in two different ways.
Method 1) Loading the table using TQL in Machbase Neo (since machbase-neo v8.0.29-rc1)

  • Pros

    • Machbase Neo loads the data as soon as you press the run button.
  • Cons

    • Slower table loading than Method 2.
Method 2) Loading tables using commands

  • Pros

    • Fast table loading speed.
  • Cons

    • The table loading process is more cumbersome.
    • Steps: open a cmd window, change to the machbase-neo path, and enter the command in the cmd window.
  • Running the script below from the command shell loads the data into the smoke table at high speed.
curl http://data.yotahub.com/2025-3/datahub-2025-3-Smoke.csv.gz | machbase-neo shell import --input - --compress gzip --header --method append --timeformat ns smoke
  • If the account is not the default sys/manager, specify the username and password by adding the --user and --password options as shown below.
curl http://data.yotahub.com/2025-3/datahub-2025-3-Smoke.csv.gz | machbase-neo shell import --input - --compress gzip --header --method append --timeformat ns smoke --user USERNAME --password PASSWORD

4. Experimental Methodology


  • Model Objective: Smoke Anomaly Detection.
  • Tags Used: CNT, Fire Alarm, Humidity[%], PM1.0, Pressure[hPa], Raw Ethanol, Raw H2, TVOC[ppb], Temperature[C], eCO2[ppm].
  • Model Configuration: ResNet1d.
  • Learning Method: Supervised Learning.
    • Train: Model Training.
    • Test: Model Performance Evaluation Based on Smoke Anomaly Detection.
  • Model Optimizer: Adam.
  • Model Loss Function: CrossEntropyLoss.
  • Model Performance Metric: F1 Score.
  • Data Loading Method
    • Loading the Entire Dataset.
    • Loading the Fetch Dataset.
  • Data Preprocessing
    • MinMax Scaling.
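
The MinMax scaling step listed above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the experiment; the function names `fit_min_max` and `transform_min_max` are hypothetical, and fitting the ranges on the training data only is an assumption based on common practice.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of every column in the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(
    rows: Sequence[Sequence[float]],
    ranges: Sequence[Tuple[float, float]],
) -> List[List[float]]:
    """Scale each column to [0, 1] using previously fitted ranges."""
    scaled = []
    for row in rows:
        scaled_row = []
        for value, (lo, hi) in zip(row, ranges):
            span = hi - lo
            # A constant column has zero span; map it to 0.0 instead of dividing by zero.
            scaled_row.append((value - lo) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Toy values standing in for two tags (hypothetical numbers, not from the dataset).
train = [[20.0, 50.0], [25.0, 55.0], [30.0, 60.0]]
ranges = fit_min_max(train)
scaled_train = transform_min_max(train, ranges)
```

Reusing `ranges` when transforming the test split keeps test statistics out of the scaling step; test values outside the training range then fall outside [0, 1].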

5. Experiment Code


  • The code is composed of three parts:
    • Data Information: outputs general information about the data.
    • Visual Information: correlation heatmap, plots, and decomposition of the data.
    • Statistical Test: ADF, KPSS, PP, Ljung-Box, ARCH, and VIF tests on the data.

Smoke Sensor Anomaly Detection

  • Below is the code for each of the two ways to get data from the database.
  • If all the data can be loaded and trained on at once without causing memory errors, Method 1 is the fastest and simplest.
  • If the data is too large and causes memory errors, the batch loading approach in Method 2 is the most efficient.

Method 1) Loading the Entire Dataset


  • The code below loads all the data needed for training from the database at once.
  • It works exactly like loading CSV files (the only difference is that the data is loaded from Machbase Neo).
  • Pros
    • Existing CSV-based code can be reused (only the loading step differs).
  • Cons
    • Training is not possible if the training data exceeds the available memory.
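
A minimal sketch of this approach is shown below, using only the Python standard library. It is not the original experiment code. It assumes a local Machbase Neo server on `http://127.0.0.1:5654`, the `/db/query` HTTP endpoint with `q`, `format`, and `timeformat` parameters, and a tag table with `NAME`, `TIME`, `VALUE` columns; the helper names (`build_query_url`, `pivot_rows`, `load_entire_dataset`) are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumed default: a local Machbase Neo server and its HTTP query endpoint.
NEO_QUERY_URL = "http://127.0.0.1:5654/db/query"


def build_query_url(table, tags, base_url=NEO_QUERY_URL):
    """Build a URL that selects every row of the given tags as CSV."""
    tag_list = ", ".join("'" + t.replace("'", "''") + "'" for t in tags)
    sql = (f"SELECT name, time, value FROM {table} "
           f"WHERE name IN ({tag_list}) ORDER BY time")
    params = {"q": sql, "format": "csv", "timeformat": "ns"}
    return base_url + "?" + urllib.parse.urlencode(params)


def pivot_rows(csv_text):
    """Turn long-format NAME,TIME,VALUE rows into {time: {tag: value}}."""
    wide = defaultdict(dict)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # blank or malformed line
        name, time_ns, value = row
        try:
            t, v = int(time_ns), float(value)
        except ValueError:
            continue  # header line or non-numeric row
        wide[t][name] = v
    return dict(wide)


def load_entire_dataset(table, tags):
    """Fetch all rows in one request and hold them in memory."""
    with urllib.request.urlopen(build_query_url(table, tags)) as resp:
        return pivot_rows(resp.read().decode("utf-8"))
```

Calling `load_entire_dataset("smoke", ["CNT", "PM1.0"])` would hold the whole result in memory, which is exactly the limitation listed under Cons.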

Method 2) Loading the Fetch Dataset


  • Loads data from Machbase Neo in batches of a fixed buffer size.
  • Pros
    • The model can be trained regardless of how large the dataset is.
  • Cons
    • Training takes longer than with Method 1.
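
The idea of fetching one buffer at a time can be sketched as a generator over time windows. This is an illustrative sketch, not the original experiment code: the names `time_windows` and `iter_batches` are hypothetical, the buffer is assumed to be defined as a time span in nanoseconds, and the database call is left to a caller-supplied `fetch` function (an in-memory stand-in is used here so the example runs without a server).

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, float]  # (time in ns, value); a simplified row shape for illustration


def time_windows(start_ns: int, end_ns: int, window_ns: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) windows that together cover [start_ns, end_ns)."""
    if window_ns <= 0:
        raise ValueError("window_ns must be positive")
    lo = start_ns
    while lo < end_ns:
        hi = min(lo + window_ns, end_ns)
        yield lo, hi
        lo = hi


def iter_batches(
    fetch: Callable[[int, int], List[Row]],
    start_ns: int,
    end_ns: int,
    window_ns: int,
) -> Iterator[List[Row]]:
    """Fetch one window at a time so only one batch is in memory at once."""
    for lo, hi in time_windows(start_ns, end_ns, window_ns):
        rows = fetch(lo, hi)
        if rows:  # skip windows with no data
            yield rows


# In-memory stand-in for the database; a real fetch would query Machbase Neo.
data = [(t, float(t)) for t in range(10)]


def fake_fetch(lo: int, hi: int) -> List[Row]:
    return [row for row in data if lo <= row[0] < hi]


batches = list(iter_batches(fake_fetch, 0, 10, 4))
```

In a training loop, each yielded batch would be scaled and passed to the model before the next window is requested, which trades extra round trips to the database for a bounded memory footprint.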

6. Experimental Results


Method 1) Loading the Entire Dataset Result


Method 2) Loading the Fetch Dataset Result


  • Both methods achieve a high F1 score, above 0.85.
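
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for the positive class (Fire Alarm = 1) in plain Python; the labels are toy values rather than results from the experiment, and whether the experiment uses a binary or an averaged F1 is not stated, so the binary form is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
```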

※ Various datasets and tutorial code can be found in the GitHub repository below.

datahub/dataset at main · machbase/datahub
All Industrial IoT DataHub with data visualization and AI source - machbase/datahub
