NYC Yellow Taxi Data Migration
Introducing the NYC Yellow Taxi Data Migration
1. Check raw data
A first look at the raw data suggests that basic data cleaning is necessary before upload.
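The raw-data check can be made concrete by counting a few obvious quality problems. This is a minimal sketch, not the author's method. It assumes each record is a dict of strings keyed by the standard TLC yellow-taxi column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, `trip_distance`, `fare_amount`). The helper name `profile_raw_rows` and the sample rows are hypothetical.

```python
from datetime import datetime


def profile_raw_rows(rows):
    """Count basic quality problems in raw trip records (dicts of strings)."""
    issues = {"missing_values": 0, "zero_distance": 0, "non_positive_duration": 0}
    for row in rows:
        # Any empty field makes the row unusable without imputation.
        if any(value in ("", None) for value in row.values()):
            issues["missing_values"] += 1
            continue
        pickup = datetime.fromisoformat(row["tpep_pickup_datetime"])
        dropoff = datetime.fromisoformat(row["tpep_dropoff_datetime"])
        if float(row["trip_distance"]) == 0:
            issues["zero_distance"] += 1
        # A dropoff at or before the pickup cannot be a real trip.
        if dropoff <= pickup:
            issues["non_positive_duration"] += 1
    return issues


# Hypothetical sample rows, for illustration only (not real TLC records).
sample = [
    {"tpep_pickup_datetime": "2025-01-01 00:10:00", "tpep_dropoff_datetime": "2025-01-01 00:25:00",
     "trip_distance": "2.1", "fare_amount": "12.0"},
    {"tpep_pickup_datetime": "2025-01-01 01:00:00", "tpep_dropoff_datetime": "2025-01-01 01:05:00",
     "trip_distance": "0.0", "fare_amount": "5.0"},
    {"tpep_pickup_datetime": "2025-01-01 02:00:00", "tpep_dropoff_datetime": "2025-01-01 01:50:00",
     "trip_distance": "1.0", "fare_amount": "7.0"},
    {"tpep_pickup_datetime": "2025-01-01 03:00:00", "tpep_dropoff_datetime": "2025-01-01 03:10:00",
     "trip_distance": "1.5", "fare_amount": ""},
]
print(profile_raw_rows(sample))
# {'missing_values': 1, 'zero_distance': 1, 'non_positive_duration': 1}
```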

2. Set the Preprocessing Plan
- Data Cleaning
  - Drop unnecessary columns.
  - Calculate the trip duration (dropoff time minus pickup time) and convert it to minutes.
  - Remove cases where the trip duration is more than 1 hour or the distance is 0 ~ 60.
  - Remove the dropoff time.
- Convert to the Name, Time, Value format and convert timestamps to UTC.
Data Cleaning
- Drop unnecessary columns.
- Calculate the trip duration (dropoff time minus pickup time) and convert it to minutes.
- Remove cases where the trip duration is more than 1 hour or the distance is 0 ~ 60.
- Remove the dropoff time.
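The four cleaning steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the logic of convert.py.

- The kept columns (`KEEP_COLUMNS`) and the derived column name `trip_duration_min` are hypothetical.
- The filter is read as keeping trips whose duration is within 0 ~ 60 minutes and whose distance is greater than 0. This is an interpretation of the plan; the actual thresholds in the original script may differ.

```python
from datetime import datetime

# Hypothetical choice of columns to keep; all other raw columns are dropped.
KEEP_COLUMNS = ("tpep_pickup_datetime", "tpep_dropoff_datetime", "trip_distance", "fare_amount")


def clean(rows):
    """Apply the cleaning steps to raw trip records (dicts of strings)."""
    cleaned = []
    for row in rows:
        # 1. Drop unnecessary columns.
        rec = {key: row[key] for key in KEEP_COLUMNS}
        pickup = datetime.fromisoformat(rec["tpep_pickup_datetime"])
        dropoff = datetime.fromisoformat(rec["tpep_dropoff_datetime"])
        # 2. Trip duration in minutes (dropoff minus pickup).
        duration_min = (dropoff - pickup).total_seconds() / 60
        distance = float(rec["trip_distance"])
        # 3. Assumed filter: keep 0 < duration <= 60 minutes and distance > 0.
        if not (0 < duration_min <= 60) or distance <= 0:
            continue
        # 4. The dropoff time is no longer needed once the duration is derived.
        del rec["tpep_dropoff_datetime"]
        rec["trip_distance"] = distance
        rec["fare_amount"] = float(rec["fare_amount"])
        rec["trip_duration_min"] = round(duration_min, 2)
        cleaned.append(rec)
    return cleaned
```

The sketch keeps the pickup time because it becomes the timestamp of each value in the next step.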
Convert to the Name, Time, Value Format and to UTC Time
Once the data frame is restructured as shown in the image below, it will be ready for upload to Machbase Neo.
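A minimal sketch of this restructuring turns each wide trip record into several (Name, Time, Value) rows, with the pickup time converted to UTC epoch nanoseconds. The nanosecond unit matches the `--timeformat ns` flag used at upload.

Assumptions:

- The raw timestamps are naive New York local times.
- Each record's column name becomes its tag name.
- The helper names and the column names in `value_columns` are hypothetical.
- If the system has no time-zone database, the code falls back to a fixed UTC-5 offset. That offset is only correct outside daylight saving time, which holds for January data.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

try:
    NYC_TZ = ZoneInfo("America/New_York")
except ZoneInfoNotFoundError:
    # Fallback: fixed EST offset, valid only outside daylight saving time.
    NYC_TZ = timezone(timedelta(hours=-5))

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_ns(local_str):
    """Interpret a naive New York timestamp and return UTC epoch nanoseconds."""
    local = datetime.fromisoformat(local_str).replace(tzinfo=NYC_TZ)
    delta = local.astimezone(timezone.utc) - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond precision.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def to_name_time_value(records, value_columns=("trip_distance", "fare_amount", "trip_duration_min")):
    """Melt wide records into (name, time_ns, value) rows, one per value column."""
    out = []
    for rec in records:
        ts = to_utc_ns(rec["tpep_pickup_datetime"])
        for col in value_columns:
            out.append((col, ts, float(rec[col])))
    return out
```

For example, `to_utc_ns("2025-01-01 00:00:00")` returns 1735707600000000000, which is 05:00 UTC on the same day.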

3. Data Upload
Finally, the data can be uploaded to Machbase Neo using the command below.
machbase-neo shell import --input ./datahub-2025-1-taxi.csv.gz --compress gzip --header --method append --timeformat ns taxi
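The import command above expects a gzip-compressed CSV (`--compress gzip`) whose first line is a header (`--header`) and whose time column is in nanoseconds (`--timeformat ns`). The sketch below produces such a file from (name, time_ns, value) tuples.

Assumptions:

- The `taxi` table has the usual tag-table columns NAME, TIME, VALUE.
- The header text is illustrative.
- The helper names are hypothetical.

```python
import csv
import gzip
import io


def build_import_bytes(rows):
    """Serialize (name, time_ns, value) tuples as gzip-compressed CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["NAME", "TIME", "VALUE"])
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))


def write_import_file(path, rows):
    """Write the compressed CSV to disk, e.g. ./datahub-2025-1-taxi.csv.gz."""
    with open(path, "wb") as f:
        f.write(build_import_bytes(rows))
```

This version builds the whole file in memory, which keeps it short. For a full month of trips, streaming through `gzip.open(path, "wt")` would use less memory.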
The complete conversion script is available in the machbase/datahub repository (main branch):
datahub/dataset/2025/01.NYC Yellow Taxi/conv/convert.py
4. Check the results after uploading

Run the following query in the Machbase Neo shell to view the statistics of the uploaded taxi table.
select * from v$taxi_stat;
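To sanity-check the upload, expected per-tag figures can be computed locally from the same (name, time_ns, value) rows and compared with the query result. This is a hedged sketch. It assumes `v$taxi_stat` reports at least a row count and a minimum and maximum time per tag name; the exact columns may differ by Machbase Neo version. The helper name `expected_stats` is hypothetical.

```python
def expected_stats(rows):
    """Per-name row count and time range from (name, time_ns, value) rows."""
    stats = {}
    for name, ts, _value in rows:
        s = stats.setdefault(name, {"row_count": 0, "min_time": ts, "max_time": ts})
        s["row_count"] += 1
        s["min_time"] = min(s["min_time"], ts)
        s["max_time"] = max(s["max_time"], ts)
    return stats
```

If the uploaded row counts match these local counts for every tag, no rows were lost during import.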
※ Follow-up post on the AI training process: NYC Taxi Data