Cyber-Defense Datasets for Artificial Intelligence, Machine Learning, and Deep Learning Training
Image Credit: NSWC Crane Corporate Communications

Posted on March 4, 2019 | Completed on March 1, 2019 | By: Scott E. Armistead, Duane Wilson, Ph.D., Paul B. Losiewicz, Roderick A. Nettles

Are there representative computer network traffic datasets openly available to industry to help refine machine learning/deep learning detection and alert techniques?

Cyber-defense and survivability subject matter experts (SMEs) from DSIAC and the Cyber Security and Information Analysis Center (CSIAC) consulted various cyber-defense organizations to determine whether a canonical dataset existed that U.S. government (USG) agencies, industry, and/or academia could access. Although such datasets likely exist within industry as proprietary information, no openly accessible one was found. However, a search of open sources such as Google Dataset Search and Open Source AWS produced a list of datasets that could provide partial solutions. Of these, the best suited for the inquirer was the BATtle of the Attack Detection ALgorithms (BATADAL) dataset. With cyber attacks, including AI-based ones, on the rise both domestically and among partner nations, CSIAC is raising the lack of a canonical dataset for training cyber-defense systems with various USG cyber agencies and will brief the issue at the upcoming NATO cyber working group covering this area.
