Abstract

The chemical composition of airborne aerosol particles has been used successfully to determine their origins, e.g., traffic emissions, biomass burning, or ship emissions. Single-Particle Mass Spectrometry (SPMS) is a sensitive measurement technique for analyzing this chemical composition in real time. The mainstream classification methods currently used in the SPMS community to handle these data require intensive manual post-processing, which makes online analysis impossible. A few studies have demonstrated that supervised learning can classify SPMS data automatically and with high accuracy, enabling selective air quality monitoring in real time. However, the generalizability and reliability of these algorithms on SPMS data from different sources (e.g., different SPMS instruments, sampling locations, or weather conditions) remain key open issues. This work investigates the classification generalization capacity (or robustness) of a multilayer perceptron network using two different SPMS datasets. The results show that a model trained on one dataset is sensitive to the disparate characteristic features of the other dataset, causing its prediction accuracy to decrease significantly. In contrast, a model trained on data from both datasets is robust and adapts to both, with over 96 % correct classifications. These results underscore the feasibility and practicability of a uniform approach for automated profiling of data from different sources.