A Model for Outlier Detection and Missing Data Imputation in Traffic Time Series Using Temporal Factors
This study proposes an integrated correction method for handling outliers and missing values in real-time traffic data, using data from 1,569 roads in Incheon collected between 2022 and 2024. The method first removes outliers empirically, then applies an integrated pipeline that combines an hourly Z-score with hourly average imputation. To validate the approach, we assembled 35 models by pairing seven outlier-detection techniques with five missing-value imputation methods, including those commonly used in practice. We then ran two verification tests: an experiment with artificially generated outliers and missing values, and a performance comparison using an LSTM prediction model. The proposed method outperformed all other combinations in both tests. This suggests that a simple, statistically based preprocessing strategy that incorporates hourly characteristics can improve urban traffic flow forecasts and has potential for use in real-time environments.
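To make the combination of an hourly Z-score with hourly average imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "hourly" means grouping observations by hour of day, that a threshold of |z| > 3 marks an outlier, and that flagged values are treated as missing and then filled with the mean of the remaining observations for the same hour. The function name `hourly_zscore_impute`, the threshold, and the toy data are all hypothetical; the empirical outlier-removal step that precedes the pipeline is not shown.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean, pstdev


def hourly_zscore_impute(timestamps, values, z_threshold=3.0):
    """Flag per-hour Z-score outliers as missing, then fill gaps with the hourly mean.

    Assumptions (not from the paper): grouping is by hour of day, the
    threshold is 3.0, and missing values are represented as None.
    """
    # Group observed (non-missing) values by hour of day.
    by_hour = defaultdict(list)
    for ts, v in zip(timestamps, values):
        if v is not None:
            by_hour[ts.hour].append(v)

    # Per-hour mean and population standard deviation.
    stats = {h: (fmean(vs), pstdev(vs)) for h, vs in by_hour.items()}

    # Step 1: mark values whose hourly Z-score exceeds the threshold as missing.
    cleaned = []
    for ts, v in zip(timestamps, values):
        if v is not None:
            mean, std = stats[ts.hour]
            if std > 0 and abs(v - mean) / std > z_threshold:
                v = None
        cleaned.append(v)

    # Step 2: recompute hourly means from the cleaned data.
    kept = defaultdict(list)
    for ts, v in zip(timestamps, cleaned):
        if v is not None:
            kept[ts.hour].append(v)
    hourly_mean = {h: fmean(vs) for h, vs in kept.items()}

    # Fill each gap with its hour's mean; hours with no data stay None.
    return [
        v if v is not None else hourly_mean.get(ts.hour)
        for ts, v in zip(timestamps, cleaned)
    ]


# Toy data: 20 daily speed readings at 08:00 with one spike and one gap.
start = datetime(2022, 1, 1, 8)
timestamps = [start + timedelta(days=d) for d in range(20)]
speeds = [30.0] * 20
speeds[5] = 120.0   # injected outlier
speeds[12] = None   # injected missing value

filled = hourly_zscore_impute(timestamps, speeds)
print(filled[5], filled[12])  # prints: 30.0 30.0
```

Computing the statistics per hour of day, and not over the whole series, keeps a normal rush-hour slowdown from being flagged merely because it differs from the daily average. Recomputing the hourly mean after outliers are removed prevents the spike from contaminating the imputed values.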