
It's time to wake up, folks. A 10-year-old data set for intrusion detection is utterly worthless, as your conclusions will be if you use it. I will never again read further than "benchmark KDD '99 intrusion data set." There is no faster way to communicate to an informed audience that you just don't understand intrusions than by analyzing data that is this old. Such attacks are generations behind those that modern network defenders face today. Understand this: you are solving the problems exemplified by your data set. If your data is 11 years old, so is your problem, and your solution is only as effective as that problem is relevant. Few, if any, attacks from 1999 are relevant today.
Make no mistake about it, I understand the researcher's lament! There is no modern pre-classified data set like those relics of careers gone by. Finding a good corpus is excruciatingly difficult. But in legitimate, scientific, empirical studies, this is absolutely no excuse for using irrelevant data. In fact, without first establishing the relevancy of ANY data set, even those used in the past, one's findings fall apart.
To pick but one example, in the last two issues of IEEE Transactions on Dependable and Secure Computing, two of the three IDS-related articles
The data commonly considered the "gold standard" by academics has not been relevant for at least half a decade. Research done in that period whose findings relied on data from 2001 or earlier is not in any way conclusive, in my professional opinion.
3 comments:
Totally agree... Do you have any practical advice for researchers who want to acquire realistic data sets for their IDS prototypes?
Thanks for the question, I should have covered this in my post.
If you're having trouble collecting a good corpus of data through analysis of your university's perimeter, or are looking to detect attacks typically directed at victims outside academic settings, find a partner in industry. Many large companies are desperate to find new methods for detecting sophisticated attacks and will be willing to work with you to get access to data under an NDA or in an obfuscated manner. Your university or professor(s) should have contacts to help grease the skids here.
Yet another paper published in 2011 using 1999 DARPA data. Sigh.
Perhaps reviewers, not just the authors, are also at fault.