Kaggle Smart Home Data

Last updated on Mar 25, 2025

This post describes how to use smart home energy usage data and weather data to train an AI model and predict future energy usage.

Data Introduction
Data Visualization with Machbase Neo
Table Creation and Data Upload in Machbase Neo
Experimental Methodology
Experiment Code
Experimental Results

1. Data Introduction

DataHub Serial Number: 2024-1.
Data Name: Smart Home Dataset with weather Information
Data Collection Methods: One-minute power consumption readings of household appliances, measured by sensors, combined with weather conditions for the corresponding area, including temperature, humidity, and other relevant factors.
Data Source: Link
Raw data size and format: 130 MB, CSV.
Number of tags: 28.

Tag	Description
TAG-Barn [kW]	Power consumption in the barn.
TAG-Dishwasher [kW]	Power consumption of the dishwasher.
TAG-Fridge [kW]	Power consumption of the refrigerator.
TAG-Furnace 1 [kW]	Power consumption of Furnace 1.
TAG-Furnace 2 [kW]	Power consumption of Furnace 2.
TAG-Garage door [kW]	Power consumption for operating the garage door.
TAG-Home office [kW]	Power consumption in the home office.
TAG-House overall [kW]	Total power consumption of the house.
TAG-Kitchen 12 [kW]	Power consumption in Kitchen 12.
TAG-Kitchen 14 [kW]	Power consumption in Kitchen 14.
TAG-Kitchen 38 [kW]	Power consumption in Kitchen 38.
TAG-Living room [kW]	Power consumption in the living room.
TAG-Microwave [kW]	Power consumption of the microwave.
TAG-Solar [kW]	Power generation from the solar power system.
TAG-Well [kW]	Power consumption for the well.
TAG-Wine cellar [kW]	Power consumption of the wine cellar.
TAG-gen [kW]	Power generation from the generator.
TAG-use [kW]	Power consumption from the use of appliances.
TAG-apparentTemperature	Apparent temperature (perceived temperature).
TAG-dewPoint	Dew point (temperature at which vapor condenses).
TAG-humidity	Humidity.
TAG-precipIntensity	Precipitation intensity (rain strength).
TAG-precipProbability	Probability of precipitation.
TAG-pressure	Atmospheric pressure.
TAG-temperature	Temperature.
TAG-visibility	Visibility distance (clarity of sight in the atmosphere).
TAG-windBearing	Wind direction.
TAG-windSpeed	Wind speed.

Data Time Range: 2016-01-01 05:00:00 to 2016-01-07 09:58:29.
Number of data records collected: 14,109,480.
CSV data URL: https://data.yotahub.com/2024-1/datahub-2024-1-home.csv.gz
Data Migration: Kaggle Smart Home Data Migration
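The import command in section 3 reads this file with a header row and nanosecond timestamps, which suggests one record per line. Assuming the columns are in NAME, TIME, VALUE order (an assumption, not confirmed here), a minimal Python sketch for reading the compressed CSV directly and grouping the records by tag could look like this. `read_tag_series` is a hypothetical helper, not part of Machbase Neo.

```python
import csv
import gzip
import io
from collections import defaultdict


def read_tag_series(stream, tags=None):
    """Group the rows of a gzipped NAME,TIME,VALUE CSV stream by tag name.

    Assumes a header row, columns in NAME, TIME, VALUE order, and TIME as an
    integer nanosecond epoch. Returns {tag: [(time_ns, value), ...]}.
    """
    series = defaultdict(list)
    # GzipFile(fileobj=...) does not close `stream`, so the caller keeps ownership.
    with io.TextIOWrapper(gzip.GzipFile(fileobj=stream), encoding="utf-8", newline="") as text:
        reader = csv.reader(text)
        next(reader, None)  # skip the header row
        for name, time_ns, value in reader:
            # Keep only the requested tags (all tags when `tags` is None).
            if tags is None or name in tags:
                series[name].append((int(time_ns), float(value)))
    return dict(series)
```

A response from `urllib.request.urlopen` on the CSV data URL can be passed in as the stream. With 14,109,480 records in total, filtering with `tags={"TAG-windBearing", "TAG-windSpeed"}` keeps only the two tags used in the experiment in memory.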

2. Data Visualization with Machbase Neo

Machbase Neo's Tag Analyzer can visualize the data: select the tag names you want and plot them as various types of graphs.

Using the DataHub Viewer below, you can access the 2024-1 DataHub in real time, select any of the 28 tags, visualize them, and preview their data patterns.

DataHub Viewer

Click the link above to see the data

3. Table Creation and Data Upload in Machbase Neo

In the DataHub directory, use setup.wrk located in the Smart Home Dataset folder to create tables and load data, as illustrated in the image below.

1) Table Creation

Pressing the "Run" button in the menu creates the table immediately.
If the home table already exists, execute the first line and then the second; if it does not exist, start from the second line.
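The contents of setup.wrk are not reproduced here. As a rough sketch, assuming its two lines are a `DROP TABLE` followed by a standard Machbase Neo `CREATE TAG TABLE` statement, and assuming Neo's HTTP query endpoint on its default port 5654, the same step could be scripted in Python. The DDL text and the helper names are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NEO_URL = "http://127.0.0.1:5654"  # assumed default Machbase Neo HTTP address

# Assumed equivalents of the two lines in setup.wrk; the real file may differ.
DROP_SQL = "DROP TABLE home"
CREATE_SQL = (
    "CREATE TAG TABLE home ("
    "name VARCHAR(100) PRIMARY KEY, "
    "time DATETIME BASETIME, "
    "value DOUBLE SUMMARIZED)"
)


def query_url(sql, base=NEO_URL):
    """Build a GET URL for the /db/query endpoint with the SQL URL-encoded."""
    return f"{base}/db/query?{urlencode({'q': sql})}"


def run_sql(sql, base=NEO_URL):
    """Execute one statement and return the decoded JSON response."""
    with urlopen(query_url(sql, base)) as resp:
        return json.load(resp)


def setup_table(table_exists, base=NEO_URL):
    """Mirror the instructions above: drop first only if the table exists."""
    statements = [DROP_SQL, CREATE_SQL] if table_exists else [CREATE_SQL]
    return [run_sql(sql, base) for sql in statements]
```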

2) Data Upload

The table can be loaded in two different ways.

Method 1) Loading the table using TQL in Machbase Neo (since machbase-neo v8.0.29-rc1)

Pros
- Machbase Neo loads the data as soon as you press the launch button.
Cons
- Loading is slower than with Method 2.

Method 2) Loading the table using shell commands

Pros
- Fast loading speed.
Cons
- The loading process takes more steps: open a command window, change to the machbase-neo directory, and enter the command.

Running the following command from a command shell loads the data into the home table at high speed.

curl http://data.yotahub.com/2024-1/datahub-2024-1-home.csv.gz | machbase-neo shell import --input -  --compress gzip --header --method append --timeformat ns home

If you use a username and password other than the default sys/manager, add the --user and --password options as shown below.

curl http://data.yotahub.com/2024-1/datahub-2024-1-home.csv.gz | machbase-neo shell import --input -  --compress gzip --header --method append --timeformat ns home --user USERNAME --password PASSWORD
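After the import finishes, one way to check that the load is complete is to compare the table's row count with the 14,109,480 records listed in the data introduction. The sketch below assumes Neo's HTTP query endpoint at the default address and its JSON response layout (`success`, `reason`, `data.rows`); adjust both if your setup differs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EXPECTED_RECORDS = 14_109_480  # record count stated in the data introduction


def count_from_response(payload):
    """Extract the single COUNT(*) value from a /db/query JSON response."""
    if not payload.get("success"):
        raise RuntimeError(payload.get("reason", "query failed"))
    return int(payload["data"]["rows"][0][0])


def loaded_record_count(base="http://127.0.0.1:5654"):
    """Ask the server how many rows the home table holds."""
    url = f"{base}/db/query?{urlencode({'q': 'SELECT COUNT(*) FROM home'})}"
    with urlopen(url) as resp:
        return count_from_response(json.load(resp))
```

`loaded_record_count() == EXPECTED_RECORDS` should then hold once every record has been appended.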

4. Experimental Methodology

Model Objective: Anomaly detection for a specific sensor.
Tags Used: TAG-windBearing, TAG-windSpeed.
Model Configuration: LSTM AutoEncoder.
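Before an LSTM AutoEncoder can be trained on the two wind tags, the series have to be scaled and cut into fixed-length windows. The sketch below applies the MinMax scaling listed under Data Preprocessing to a (time, 2) array of TAG-windBearing and TAG-windSpeed values and stacks overlapping windows. Fitting the scaler on the training split only, the window length, and the step are assumptions for illustration, not settings taken from the notebooks.

```python
import numpy as np


def fit_minmax(train):
    """Per-feature minimum and range, learned from the training split only."""
    train = np.asarray(train, dtype=float)
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # a constant feature would otherwise divide by zero
    return lo, span


def apply_minmax(x, lo, span):
    """Scale with the training statistics (unseen values may fall outside [0, 1])."""
    return (np.asarray(x, dtype=float) - lo) / span


def make_windows(x, window, step=1):
    """Turn a (time, features) array into (n_windows, window, features)."""
    starts = range(0, len(x) - window + 1, step)
    return np.stack([x[s:s + window] for s in starts])
```

The resulting (n_windows, window, 2) array matches the batch-first (batch, time steps, features) layout that LSTM layers commonly expect, and the autoencoder learns to reproduce each window.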

Learning Method: Unsupervised Learning.
- Train: Model Training.
- Validation: Threshold Calculation.
- Test: Model Performance Evaluation Based on Threshold.
Model Optimizer: Adam.
Model Loss Function: Mean Squared Error.
Setting Thresholds:
- Mean + Standard Deviation.
- Maximum Value.
- 99th Percentile - Standard Deviation.
Model Performance Metric: F1 Score.
Data Loading Method
- Loading the Entire Dataset.
- Loading the Batch Dataset.
Data Preprocessing
- MinMax Scaling.
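The validation and test steps above can be sketched as follows: the reconstruction error per window is the same mean squared error used as the loss, the three threshold rules are computed on the validation errors, and test windows whose error exceeds a threshold are flagged and scored with F1. Reading "99th Percentile - Standard Deviation" as the 99th percentile minus the standard deviation is an interpretation, and using NumPy's default population standard deviation is an assumption.

```python
import numpy as np


def reconstruction_error(windows, reconstructed):
    """Per-window mean squared error over time steps and features."""
    return ((windows - reconstructed) ** 2).mean(axis=(1, 2))


def thresholds(val_errors):
    """The three threshold rules, computed from validation errors."""
    mean, std = val_errors.mean(), val_errors.std()
    return {
        "mean+std": mean + std,
        "max": val_errors.max(),
        "p99-std": np.percentile(val_errors, 99) - std,
    }


def f1_score(labels, predicted):
    """F1 for binary anomaly flags (True/1 = anomaly)."""
    labels, predicted = np.asarray(labels, bool), np.asarray(predicted, bool)
    tp = np.sum(labels & predicted)
    fp = np.sum(~labels & predicted)
    fn = np.sum(labels & ~predicted)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For each threshold, the test score is then `f1_score(test_labels, test_errors > threshold)`, where `test_labels` marks the known anomalous windows.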

5. Experiment Code

Below is the code for each of the two ways to get data from the database.
If all the data can be loaded and trained on at once without running out of memory, Method 1 is the fastest and simplest.
If the data is too large and causes memory errors, the batch loading of Method 2 is the more efficient choice.

Method 1) Loading the Entire Dataset

The code below loads all the data needed for training from the database at once.
It works exactly like loading a complete CSV file; the only difference is that the data comes from Machbase Neo.
Pros
- Code that previously used CSV files can be reused; only the loading step changes.
Cons
- Training is not possible if the data does not fit in memory.

The full code can be run in 1. Smart_Home_General.ipynb.
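For Method 1, loading from Machbase Neo instead of a CSV file might look like the sketch below, which pulls each tag's complete series in one request. It assumes Neo's HTTP query endpoint on the default port and its default nanosecond epoch time format; the notebook may use a different client, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

import numpy as np

TAGS = ("TAG-windBearing", "TAG-windSpeed")


def series_sql(tag, table="home"):
    """SQL for one tag's full time series, oldest first."""
    return f"SELECT time, value FROM {table} WHERE name = '{tag}' ORDER BY time"


def rows_to_arrays(payload):
    """Split a /db/query JSON response into (int64 times, float64 values)."""
    if not payload.get("success"):
        raise RuntimeError(payload.get("reason", "query failed"))
    rows = payload["data"]["rows"]
    times = np.array([r[0] for r in rows], dtype=np.int64)
    values = np.array([r[1] for r in rows], dtype=np.float64)
    return times, values


def load_tags(tags=TAGS, base="http://127.0.0.1:5654"):
    """Fetch every row of each tag, one request per tag."""
    data = {}
    for tag in tags:
        url = f"{base}/db/query?{urlencode({'q': series_sql(tag)})}"
        with urlopen(url) as resp:
            data[tag] = rows_to_arrays(json.load(resp))
    return data
```

Timestamps are kept as `int64` because nanosecond epoch values are larger than the range in which `float64` represents integers exactly.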

Method 2) Loading the Batch Dataset

This method loads data from Machbase Neo one batch at a time.
The code below sequentially fetches the time range that corresponds to a single batch.
Pros
- The model can be trained no matter how large the data is.
Cons
- Training takes longer than with Method 1.

The full code can be run in 1.Smart_Home_New_batch.ipynb.
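The time-range batching of Method 2 can be sketched as a generator of consecutive half-open ranges, each covering one batch of readings, plus the query for one range. The one-second default `interval`, the `TO_DATE` format string, and the helper names are assumptions; how `TO_DATE` treats time zones depends on the server settings.

```python
from datetime import datetime, timedelta

SQL_DATE_FMT = "YYYY-MM-DD HH24:MI:SS"  # assumed TO_DATE format string
PY_DATE_FMT = "%Y-%m-%d %H:%M:%S"       # matching strftime pattern


def batch_ranges(start, end, batch_size, interval=timedelta(seconds=1)):
    """Yield consecutive [lo, hi) ranges that each cover batch_size readings."""
    span = batch_size * interval
    lo = start
    while lo < end:
        hi = min(lo + span, end)  # the last range may be shorter
        yield lo, hi
        lo = hi


def batch_sql(tag, lo, hi, table="home"):
    """SQL for one tag's readings inside a single [lo, hi) range."""
    return (
        f"SELECT time, value FROM {table} WHERE name = '{tag}' "
        f"AND time >= TO_DATE('{lo.strftime(PY_DATE_FMT)}', '{SQL_DATE_FMT}') "
        f"AND time < TO_DATE('{hi.strftime(PY_DATE_FMT)}', '{SQL_DATE_FMT}') "
        "ORDER BY time"
    )
```

Each range is fetched, windowed, and passed to one training step before the next range is queried, so only one batch is held in memory at a time. If training windows must not be cut at range boundaries, each range can be extended by `window - 1` readings.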

6. Experimental Results

Performance was evaluated for each method and threshold.

Method 1) Loading the Entire Dataset Result

Method 2) Loading the Batch Dataset Result

For both methods, using the Maximum Value as the threshold gives the highest F1 score, 1.0.

※ Various datasets and tutorial code can be found in the GitHub repository below.