Error Prevalence in NIDS datasets: A Case Study on CIC-IDS-2017 and CSE-CIC-IDS-2018 (G. Engelen)
PIRAT Research Team

 Published On Dec 5, 2022

Network Intrusion Detection Systems (NIDS) play a critical role in protecting network architectures from harm. In the past decade, Machine Learning has moved to the forefront of research in this field, with many approaches achieving strong performance on benchmark NIDS datasets. The relevance of these performance results, however, is directly tied to the quality of the benchmark datasets used for training, which have so far only been subjected to limited analysis. In this presentation, I will dig deeper into the numerous errors we uncovered in two important and widely used NIDS benchmark datasets, CIC-IDS2017 and CSE-CIC-IDS2018, ranging from issues in data pre-processing, attack simulation, and documentation to faulty ground-truth labels caused by flaws in the underlying labelling logic. I will also talk about how we went about rectifying these errors, and what the field of NIDS needs in terms of dataset quality in order to move forward.

Gints Engelen is currently a 3rd-year PhD student at the DistriNet research group, Department of Computer Science, KU Leuven (Belgium). His research focuses on robust Machine Learning approaches for Network Intrusion Detection Systems (NIDS). During the past two years, he has collaborated with the University of New South Wales and the University of Edinburgh on in-depth analyses of the most widespread and important NIDS datasets.

