Confusion Matrix: The Role of ML in Cyber Security

Vamsi Mathala
9 min read · Jun 7, 2021

In this story, I want to share my views on the confusion matrix, a metric tool used to evaluate the output of a binary classification model in ML, and on whether it offers any solutions in the cyber world.

Let’s start from basic learning…

The Undeniable Evolution of AI and Machine Learning…

Once upon a time, the idea of interacting with Artificial Intelligence (AI) was a far off and potentially terrifying concept. Images of killer robot armies and a world run by cyborgs stealing our jobs often came to mind. But, with recent developments, a future with AI actually seems plausible and much closer than we humans think.

Rather than the typical image of a human-like being, types of AI are already in use throughout the world. From the virtual assistant feature in iPhones, SIRI, to self-driving cars and intuitive chatbots, the reality of AI is in the here and now. And it’s more advanced than you think.

Artificial Intelligence..

According to Stanford Researcher, John McCarthy, “Artificial Intelligence is the science and engineering of making intelligent machines, especially intelligent computer programs. Artificial Intelligence is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.”

Simply put, AI’s goal is to make computers and computer programs smart enough to imitate the behaviour of the human mind.

Knowledge Engineering is an essential part of AI research. Machines and programs need plentiful information about the world in order to act and react like human beings. To implement knowledge engineering, AI must have access to properties, categories, objects and the relations between all of them. Instilling common sense, problem-solving and analytical reasoning power in machines is a difficult and tedious job.

Machine Learning..

Artificial Intelligence and Machine Learning are trending terms nowadays, and they are often confused. Machine Learning (ML) is a subset of Artificial Intelligence. ML is the science of designing and applying algorithms that are able to learn from past cases. If some behaviour occurred in the past, you may be able to predict whether it will happen again. This also means that if there are no past cases, there is no prediction.

ML can be applied to tough problems such as credit card fraud detection, self-driving cars, and face detection and recognition. ML uses complex algorithms that repeatedly iterate over large data sets, analyzing the patterns in the data and enabling machines to respond to situations for which they have not been explicitly programmed. The machines learn from history to produce reliable results. ML algorithms draw on Computer Science and Statistics to predict rational outputs.

Three major areas of ML :
Supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised Learning

In supervised learning, labeled training datasets are provided to the system. Supervised learning algorithms analyse the data and produce an inferred function, which can then be used to map new examples to outputs. Credit card fraud detection is one example of a supervised learning problem.

Unsupervised Learning

Unsupervised Learning is much harder because the data fed to the algorithm is unlabeled rather than organized into labeled datasets. Here the goal is to have the machine learn on its own without any supervision. The correct solution to the problem is not provided, so the algorithm itself finds the patterns in the data. Examples of unsupervised learning include the recommendation engines found on e-commerce sites and Facebook’s friend suggestion mechanism.

Reinforcement Learning

This type of Machine Learning algorithm allows software agents and machines to automatically determine the ideal behaviour within a specific context in order to maximise their performance. Reinforcement learning is defined by characterising a learning problem, not by characterising learning methods. Any method that is well suited to solving that problem is considered a reinforcement learning method. Reinforcement learning assumes that a software agent, e.g. a robot, a computer program or a bot, interacts with a dynamic environment to attain a definite goal. The technique selects the action that would produce the expected output efficiently and rapidly.

Binary Classification and Confusion Matrix..

Classification: In machine learning, Classification, as the name suggests, sorts data into different classes or groups. It is used to predict which class the input data belongs to.

For example, if we are taking a dataset of scores of a cricketer in the past few matches, along with average, strike rate, not outs etc, we can classify him as “in form” or “out of form”.

Classification is the process of assigning new input variables (X) to the class they most likely belong to, based on a classification model, as constructed from previously labeled training data.

There are two types of classification:

  • Binary classification
  • Multi-class classification

Binary Classification:

It is a classification task in which the given data is sorted into two classes. It is basically a prediction about which of two groups an item belongs to.

Several algorithms are commonly used for binary classification. Some of the most common are:

  • Logistic Regression
  • k-Nearest Neighbors
  • Decision Trees
  • Support Vector Machine
  • Naive Bayes
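To give a feel for how one of these algorithms works, here is a minimal sketch of k-Nearest Neighbors applied to the cricketer example from earlier. The function name `knn_predict`, the two features (batting average and strike rate) and all the numbers are hypothetical and only for illustration; a real project would more likely use a tested library implementation.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort the training examples by Euclidean distance to the query point.
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    # Keep the labels of the k closest examples and return the most common one.
    top_labels = [label for _, label in neighbours[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical training data: (batting average, strike rate) per player.
train_points = [(55.0, 140.0), (48.0, 135.0), (60.0, 150.0),
                (12.0, 80.0), (18.0, 90.0), (9.0, 75.0)]
train_labels = ["in form", "in form", "in form",
                "out of form", "out of form", "out of form"]

# A new player whose numbers sit close to the "in form" group.
prediction = knn_predict(train_points, train_labels, (50.0, 138.0), k=3)
```

With an odd k and only two classes, the vote can never end in a tie, which is one reason k = 3 or k = 5 are common choices for binary classification.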

Terms Related…

Precision: Precision in binary classification (Yes/No) is the fraction of positive predictions that are actually positive. In other words, how often does a positive prediction turn out to be correct? This metric can be gamed by returning positive only for the single observation in which we have the most confidence.

Recall: Recall is also known as sensitivity. In binary classification (Yes/No), recall measures how “sensitive” the classifier is in detecting positive cases. To put it another way, how many of the real positives did we “catch” in our sample? This metric can be gamed by classifying every observation as positive.

F1 Score: The F1 score is the harmonic mean of precision and recall, with the best value being 1 and the worst being 0. Precision and recall make an equal contribution to the F1 score.
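Here is a small sketch of these three metrics in Python. It works from the counts of true positives (tp), false positives (fp) and false negatives (fn), which are explained in the confusion matrix section below. The function names and the numbers are my own illustration: an assumed sample of 100 observations with 10 real positives, where a lazy classifier marks everything as positive.

```python
def precision(tp, fp):
    # Of everything predicted positive, the fraction that really was positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Of everything that really was positive, the fraction we caught.
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical "predict everything positive" classifier:
# all 10 real positives are caught, but 90 negatives are flagged too.
tp, fp, fn = 10, 90, 0

p = precision(tp, fp)   # low: most positive predictions are wrong
r = recall(tp, fn)      # perfect: no real positive was missed
f1 = f1_score(p, r)     # pulled down towards the weaker of the two
```

This shows why recall alone can be gamed: it comes out as 1.0 here while precision is only 0.1, and the F1 score stays low because the harmonic mean punishes the weaker metric.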

Confusion Matrix..

The confusion matrix, a popular metric for evaluating binary classification results, can seem confusing at first, just as the name suggests. But it is actually a very cool and interesting way to evaluate your results. Let’s look at it in more detail.

Four outcomes of the confusion matrix

The confusion matrix visualizes the accuracy of a classifier by comparing the actual and predicted classes. The binary confusion matrix is composed of four squares:

Confusion Table
  • TP: True Positive: Positive values correctly predicted as positive
  • FP: False Positive: Negative values incorrectly predicted as positive
  • FN: False Negative: Positive values incorrectly predicted as negative
  • TN: True Negative: Negative values correctly predicted as negative
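The four outcomes can be counted with a few lines of Python. This is only a sketch: the function name `confusion_counts` and the two label lists are made up for illustration, with 1 standing for positive and 0 for negative.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of 0/1 labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1          # predicted positive, really positive
        elif p == 1 and a == 0:
            fp += 1          # predicted positive, really negative
        elif p == 0 and a == 1:
            fn += 1          # predicted negative, really positive
        else:
            tn += 1          # predicted negative, really negative
    return tp, fp, fn, tn

# Hypothetical labels for eight observations.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
```

For these lists the counts come out as 3 true positives, 1 false positive, 1 false negative and 3 true negatives.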

You can compute the accuracy from the confusion matrix: Accuracy = (TP + TN) / (TP + TN + FP + FN)
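As a small worked sketch, accuracy is the share of all predictions that were correct, i.e. (TP + TN) divided by the total number of predictions. The numbers below are hypothetical: 1,000 network connections of which only 10 are attacks, scored by a model that always predicts "negative".

```python
def accuracy(tp, fp, fn, tn):
    # Correct predictions (TP + TN) divided by all predictions.
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

# Hypothetical "always negative" model on 1,000 connections with 10 attacks:
# it never raises a false alarm, but it also misses every attack.
acc = accuracy(tp=0, fp=0, fn=10, tn=990)
```

The accuracy here is 0.99 even though not a single attack was detected, which is why accuracy is usually read together with precision and recall when the classes are imbalanced.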

Example of Confusion Matrix:

The confusion matrix is a useful machine learning tool that allows you to measure Recall, Precision and Accuracy, and it underlies the AUC-ROC curve. The example below explains the terms True Positive, True Negative, False Positive, and False Negative.

True Positive: You predicted positive, and it turned out to be true. For example, you had predicted that France would win the world cup, and it won.

True Negative: You predicted negative, and it turned out to be true. You had predicted that England would not win, and it lost.

False Positive: You predicted positive, and it turned out to be false. You had predicted that England would win, but it lost.

False Negative: You predicted negative, and it turned out to be false. You had predicted that France would not win, but it won.

You should remember that Positive and Negative describe the predicted value, while True and False describe whether that prediction matched the actual value.

Type 1 and Type 2 errors in Confusion Matrix with example:

Type I Error (False Positive) and Type II Error (False Negative)

False Positive:

Blue cross mark → Red cross mark (Actual → Predicted)

Observe the cross mark symbols in the figure, where the blue cross mark represents the Actual Value and the red cross mark represents the Predicted Value. At the age of around 25–30 years, a person named Mr. A is not married in reality. But the prediction says Mr. A is married, which is a wrong prediction (i.e. False), and the predicted value is Yes or 1 (i.e. Positive). So it is known as a “false positive” or Type I Error.

False Negative:

Blue circle mark → Red circle mark (Actual → Predicted)

In this scenario, the blue circle mark and the red circle mark represent the Actual Value and the Predicted Value respectively. In reality, a person named Mr. B is married at the age of around 20–25 years. But the predicted value says that Mr. B is not married. So in this case the prediction is also wrong (i.e. False), and the predicted value is No or 0 (i.e. Negative). It is known as a “false negative” or Type II Error.

*False Positive: when a prediction is wrong and the predicted value is positive. False Negative: when a prediction is wrong and the predicted value is negative.*

Type I Error (false positive): Fire alarm rings when there is no fire.

Type II Error (false negative): Fire alarm fails to ring when there is fire.
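The fire alarm example can be written as a tiny Python function that names the outcome for one pair of values. This is just a sketch; the function name `outcome` and its boolean arguments are my own illustration, with `actual` meaning "there really is a fire" and `predicted` meaning "the alarm rang".

```python
def outcome(actual, predicted):
    """Name the confusion-matrix cell for one (actual, predicted) pair of booleans."""
    if predicted and actual:
        return "true positive"
    if predicted and not actual:
        return "false positive (Type I error)"
    if not predicted and actual:
        return "false negative (Type II error)"
    return "true negative"

# The alarm rings but there is no fire.
alarm_without_fire = outcome(actual=False, predicted=True)

# There is a fire but the alarm stays silent.
fire_without_alarm = outcome(actual=True, predicted=False)
```

The first call gives the Type I error and the second gives the Type II error, matching the two fire alarm cases.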

What is Cybersecurity?

Cybersecurity refers to practices designed to protect networks, devices, and data from attacks and unauthorized access.

Cybersecurity is critical because it encompasses everything that pertains to protecting sensitive data, personally identifiable information, intellectual property, and governmental and industry information systems from password theft, spoofing, phishing, spamming, and other cyber-attacks.

Cybersecurity attacks are growing at an alarming speed and getting more sophisticated with IoT attacks, spam and phishing, crypto-jacking, mobile malware, and ransomware.

How Do AI and ML Affect Cybersecurity?

With the advancement in the field of AI and ML, new methodologies are being introduced to make the cybersecurity domain more automated and less risky.

Anomaly Detection:

We all know that antivirus software is crucial, but the major issue with traditional antivirus software is that it relies on security updates being released after new viruses are detected. Let’s talk about AI-based antivirus software. Unlike traditional software, it performs anomaly detection to monitor program behavior.

Using anomaly detection, AI and ML techniques can spot suspicious behavior such as unrecognized devices joining a network or unusual network traffic. They can also detect host-based anomalies, such as excessive CPU utilization, that may indicate the presence of malware.
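To illustrate the idea of a host-based anomaly, here is a deliberately simple sketch that flags CPU readings lying far from a baseline of normal behaviour. Note that this is plain statistical thresholding (a z-score test), used here as a stand-in; it is not the method any particular AI-based antivirus uses. The function name `find_anomalies`, the threshold of 3 standard deviations and the CPU readings are all assumptions for the example.

```python
from statistics import mean, stdev

def find_anomalies(baseline, new_readings, threshold=3.0):
    """Return readings more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: anything different counts as unusual.
        return [x for x in new_readings if x != mu]
    return [x for x in new_readings if abs(x - mu) / sigma > threshold]

# Hypothetical CPU utilisation (%) of a host during normal operation.
baseline_cpu = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

# New readings, one of which looks nothing like the baseline.
new_cpu = [14, 13, 97, 15]

suspicious = find_anomalies(baseline_cpu, new_cpu)
```

Only the 97% reading is flagged here. A flagged reading is just a hint that something unusual is happening, so in practice it would be one signal among many rather than proof of malware.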

Detects Malicious Attacks:

With the help of machine learning, cybersecurity systems can examine patterns and learn to help prevent similar attacks and respond accordingly. Unsupervised learning can be helpful in detecting malicious attacks that have not been noticed before. Thus, attacks can be recognized at a very early stage and then neutralized so that they don’t affect the system further.

AI can make reasoned decisions in highly complex data environments at a scale that is hard for humans to match, and it can adapt as it gathers new data. Artificial Intelligence and ML can thus find hidden patterns without being explicitly programmed where to look, making it easier for companies and businesses to adapt their security systems as technology evolves. Furthermore, automatic updates to existing software based on complex analysis by these technologies can prevent cybersecurity attacks to a great extent.

Similarly, many cyber crimes can be tackled with Machine Learning using binary classification, and the metric tool we discussed above, the confusion matrix, along with its possible errors, is what we use to analyze the results.
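To make this concrete, here is a small sketch of how the two error types trade off in an intrusion detection setting. It assumes a model that gives each network connection a score between 0 and 1 and flags it as an attack when the score reaches a threshold. The function name `count_errors`, the scores and the labels are all hypothetical.

```python
def count_errors(scores, is_attack, threshold):
    """Flag score >= threshold as an attack; return (false_positives, false_negatives)."""
    # False positive: a normal connection that gets flagged (false alarm).
    fp = sum(1 for s, a in zip(scores, is_attack) if s >= threshold and not a)
    # False negative: a real attack that is not flagged (missed intrusion).
    fn = sum(1 for s, a in zip(scores, is_attack) if s < threshold and a)
    return fp, fn

# Hypothetical model scores for eight connections, with their true labels.
scores    = [0.05, 0.20, 0.35, 0.40, 0.55, 0.70, 0.85, 0.95]
is_attack = [False, False, False, True, False, True, True, True]

strict = count_errors(scores, is_attack, threshold=0.8)   # few alarms
lenient = count_errors(scores, is_attack, threshold=0.3)  # many alarms
```

With the strict threshold there are no false alarms but two attacks slip through, and with the lenient threshold every attack is caught at the price of two false alarms. In security a missed attack (Type II error) is often the costlier mistake, so the threshold is usually chosen with that in mind.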

Here I’m attaching a link to a study, 👉🏻Classification model for accuracy and intrusion detection using machine learning approach👈🏻, which deals with a cyber security threat and describes the solution and results found using the confusion matrix.

Finally…done with my story …Hope you find something interesting..😇😇..Connect me on my LinkedIn for more…

Thanks for reading….signing off👋🏻👋🏻👋🏻

Written by Vamsi Mathala

CSE undergraduate, highly interested to work on industry technologies.
