
It's time to wake up, folks. A 10-year-old data set for intrusion detection is utterly worthless, as your conclusions will be if you use it. I will never again read further than "benchmark KDD '99 intrusion data set." There is no faster way to communicate to an informed audience that you just don't understand intrusions than by analyzing data that is this old. Such attacks are generations behind those that modern network defenders face today. Understand this: you are solving the problems exemplified by your data set. If your data is 11 years old, so is your problem, and your solution is only as effective as that problem is relevant. Few, if any, attacks from 1999 are relevant today.
Make no mistake about it, I understand the researcher's lament! There is no modern pre-classified data set like those relics of careers gone by. Finding a good corpus is excruciatingly difficult. But in legitimate, scientific, empirical studies, this is absolutely no excuse for using irrelevant data. In fact, without first establishing the relevancy of ANY data set, even those used in the past, one's findings fall apart.
To pick but one example, in the last two issues of IEEE Transactions on Dependable and Secure Computing, two of the three IDS-related articles
The data commonly considered the "gold standard" by academics has not been relevant for at least half a decade. Research done in that period whose findings relied on 2001 and prior data is not in any way conclusive, in my professional opinion.