Lightweight Convolutional Recurrent Neural Networks for Sound Event Classification
DOI:
https://doi.org/10.52731/liir.v006.355
Keywords:
Sound Event Classification, Convolutional Neural Network, Convolutional Recurrent Neural Network, Transformer
Abstract
Sound Event Classification (SEC) is essential for applications such as urban noise monitoring and smart home automation, but modern models are often too costly to run and deploy efficiently. This study evaluated three lightweight SEC architectures (CNN, CRNN, and Transformer) on the UrbanSound8K dataset, considering both accuracy and resource consumption. The CRNN performed best, achieving around 90% accuracy with only 175,754 parameters and surpassing the efficiency of the CNN and Transformer models. These results underscore the CRNN's potential for scalable and cost-effective SEC solutions, making it well suited to smart city infrastructure and resource-limited IoT applications.
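To give a rough sense of how a parameter budget of this order arises in a CRNN, the following sketch counts trainable parameters for a small convolutional front end followed by a GRU and a dense classifier. All layer sizes (`n_mels`, `channels`, `hidden`) and the helper names are hypothetical choices for illustration; they are not the architecture evaluated in the study and are not expected to reproduce its 175,754 figure. The only value taken from the dataset is the 10 output classes of UrbanSound8K. The GRU count assumes the PyTorch convention of two bias vectors per gate.

```python
# Back-of-the-envelope parameter count for a small CRNN.
# All layer sizes are hypothetical; this is NOT the model from the study.

def conv2d_params(in_ch, out_ch, kernel=3):
    """Conv2d: kernel*kernel*in_ch weights plus one bias, per output channel."""
    return (kernel * kernel * in_ch + 1) * out_ch

def gru_params(input_size, hidden_size):
    """Single-layer GRU: 3 gates, each with input weights, recurrent
    weights, and two bias vectors (PyTorch convention, an assumption here)."""
    per_gate = input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size
    return 3 * per_gate

def dense_params(in_features, out_features):
    """Fully connected layer: weights plus one bias per output unit."""
    return (in_features + 1) * out_features

def crnn_params(n_mels=64, channels=(16, 32, 64), hidden=64, n_classes=10):
    total = 0
    in_ch, freq = 1, n_mels          # one-channel log-mel spectrogram input
    for out_ch in channels:
        total += conv2d_params(in_ch, out_ch)
        in_ch = out_ch
        freq //= 2                   # assume 2x pooling along frequency per block
    # The GRU sees, per time step, all channels stacked over remaining frequency bins.
    total += gru_params(in_ch * freq, hidden)
    total += dense_params(hidden, n_classes)
    return total

if __name__ == "__main__":
    print("total trainable parameters:", crnn_params())
```

In this hypothetical configuration most of the budget sits in the recurrent layer's input weights, which suggests that the feature size handed from the convolutional stack to the GRU is the main lever for keeping such a model small.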