It takes an average of 200+ days to detect a cyber breach, while network data is quietly siphoned. Fear not, machine learning is here—the hot new cyber defense. Unfortunately, it is often misunderstood, and its effectiveness is even harder to assess. Read this white paper to understand the two ways machines learn and how to measure their success at detecting unknown malware.
Machine learning is the process by which computer programs become more accurate at a task through exposure to data, called training instances. The task being performed and the type of data provided for training are critical in deciding which techniques to use in the machine learning process.
The most important aspect of the data is whether it is “labeled”. Labeled means that someone has assigned a category of interest to each training instance. The best label depends on what task the machine learning program is seeking to accomplish. For example, if one wants a program to distinguish between cats and dogs, labels of “cat” and “dog” are sufficient. If, however, one wishes to distinguish between breeds of cat or dog, labels such as Pug, Dalmatian, American Shorthair, Siamese, Collie, and German Shepherd are required.
If, however, we instead want to distinguish between young and old animals, we would use labels related to age: newborn, adolescent, adult, senior. The labels, the training instances, and the desired task are inextricably linked. Labeling may seem trivial for something as familiar as dogs and cats, but in general it can be a difficult, expensive, and time-consuming process to attain enough training instances of each label to produce highly accurate machine learning models.
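The idea above—that each training instance pairs observed features with a task-specific label—can be sketched in a few lines of code. This is an illustrative toy, not a real malware detector: the feature values (weight in kg, ear length in cm) and the 1-nearest-neighbor rule are assumptions chosen for simplicity.

```python
# A toy labeled dataset and a 1-nearest-neighbor classifier.
# Each training instance pairs a feature vector with a label;
# swapping the labels (e.g. "cat"/"dog" for "newborn"/"adult")
# changes the task the same data supports.

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict(training, features):
    """Return the label of the training instance nearest to `features`."""
    nearest = min(training, key=lambda inst: distance(inst[0], features))
    return nearest[1]

# Hypothetical labeled training instances: ((weight_kg, ear_cm), label)
training_instances = [
    ((4.0, 6.5), "cat"),
    ((3.5, 7.0), "cat"),
    ((25.0, 10.0), "dog"),
    ((30.0, 12.0), "dog"),
]

print(predict(training_instances, (5.0, 6.0)))  # → cat
```

Note that the classifier itself never sees the words “cat” or “dog” as anything but labels; relabel the same instances by age and the identical code now predicts age categories, which is why the labels, the instances, and the task are inextricably linked.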