K Nearest Neighbour

This is my YouTube channel for Data Science and Machine Learning for beginners in English, हिंदी and తెలుగు. Please subscribe and share: https://www.youtube.com/channel/UCtcEuGgCTWzzXBGvwCKm5VA

By Meeraj Kanaparthi. Published 3 years ago. 4 min read.
Photo by Kevin Ku on Unsplash

K Nearest Neighbour (KNN) is a simple and effective way to classify samples into classes.

For example, we can identify a fruit from two features: its colour and its density.

Two-dimensional data for a two-class classification problem

Given a new sample, we place it on the graph. If the new sample lies near the apples, we classify it as an apple.

If another new sample lies near the oranges, we classify it as an orange.

If the sample lies in the middle, it is not clear whether it should be an apple or an orange.

If it is nearer to an orange than to any apple, then we say it is an orange.

K Nearest Neighbour

Instead of a single neighbour, we can look at the 3 nearest neighbours (3-NN), find the majority class among them, and assign that class to the sample.

The single nearest neighbour says apple.

The 3 nearest neighbours say orange.

The class of the sample is decided by a vote among its 3 nearest samples of known class.

What is K? We do not know the right K in advance; it is a parameter we need to tune. We choose K based on experience and adjust it as appropriate for the data.

We take an odd number for K so that a two-class vote cannot end in a tie.
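The voting step can be sketched in a few lines of Python. This is only an illustration: the function name majority_vote and the list of neighbour labels are made up for this example, and the labels are assumed to be already sorted from nearest to farthest.

```python
from collections import Counter

def majority_vote(neighbour_labels, k):
    """Return the most common label among the k nearest neighbours.

    neighbour_labels must already be sorted from nearest to farthest.
    """
    votes = Counter(neighbour_labels[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labels of a sample's neighbours, nearest first.
labels_by_distance = ["apple", "orange", "orange", "apple", "apple"]

one_nn = majority_vote(labels_by_distance, k=1)    # only the nearest neighbour votes
three_nn = majority_vote(labels_by_distance, k=3)  # the three nearest neighbours vote
print(one_nn, three_nn)
```

With these made-up labels, K = 1 and K = 3 give different answers, which is why K has to be chosen with care.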

Distance

We calculate the distance from the new sample to each sample of known class.

We usually take the Euclidean distance (or the absolute distance). Whether a feature is measured in mm or km changes its scale, and therefore changes the distances. With no extra knowledge about the data, we calculate the Euclidean distance and use it to find the K nearest neighbours.
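Here is a small sketch of the two distances and of how the choice of unit can change which neighbour is nearest. The points, the labels "p" and "q", and the helper nearest_label are hypothetical and exist only for this illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Absolute (city-block) distance: sum of per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_label(sample, known, distance=euclidean):
    """Return the label of the known point closest to sample."""
    return min(known, key=lambda item: distance(sample, item[0]))[1]

sample = (0.0, 0.0)
# Hypothetical points: first feature in km, second feature unitless.
known_km = [((1.0, 5.0), "p"), ((3.0, 1.0), "q")]
# The same points with the first feature converted to metres.
known_m = [((1000.0, 5.0), "p"), ((3000.0, 1.0), "q")]

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))
nearest_km = nearest_label(sample, known_km)
nearest_m = nearest_label(sample, known_m)
print(nearest_km, nearest_m)
```

In km the second feature dominates the distance and "q" is nearest; in metres the first feature dominates and "p" is nearest. The data is the same, only the unit changed.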

Linear classification

We have a two-class classification problem, represented in 2 dimensions, with samples in the first class and in the second class. Given a new sample, we need to identify whether it belongs to class 1 or class 2.

Linear classification expresses this decision rule in an extremely simple and effective way, with the help of a line.

Linear Classification

This line is the decision boundary, written as W(t)X = 0, i.e., w(1)x(1) + w(2)x(2) = 0, where x(1) is the first dimension and x(2) is the second dimension. Let us assume the line is x(1) - x(2) = 0 (see the graph).

The first coefficient is 1 and the second coefficient is -1.

If W(t)X is less than 0, the sample falls on one side of the line; if it is greater than 0, it falls on the other side.

Decide as class 1, if W(t)X >= 0

Decide as class 2, if W(t)X < 0

This gives us an effective and extremely simple way of classifying with the help of a few coefficients W. W holds the coefficients, that is, the parameters we are learning.

Given the data {(x, y)}, find W.

So we write this decision boundary as W(t)X = 0. We keep the vector notation because the same equation is valid for 2 dimensions or for D dimensions. In practical problems we will work in D dimensions.

Our decision rule will be sign(W(t)X), which is either + or -.

We call one class + and the other -; it does not matter which fruit is labelled + and which is labelled -.
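The decision rule can be sketched as follows, using the example line x(1) - x(2) = 0 from the text. The function names decision_value and classify are made up for this illustration, and W is fixed by hand here rather than learned from data.

```python
def decision_value(w, x):
    """Compute W(t)X, the dot product of the coefficients and the sample."""
    return sum(wi * xi for wi, xi in zip(w, x))

def classify(w, x):
    """Class 1 if W(t)X >= 0, otherwise class 2."""
    return 1 if decision_value(w, x) >= 0 else 2

w = [1.0, -1.0]  # coefficients of the line x(1) - x(2) = 0

print(classify(w, [3.0, 1.0]))  # W(t)X = 2, positive side
print(classify(w, [1.0, 3.0]))  # W(t)X = -2, negative side
print(classify(w, [2.0, 2.0]))  # W(t)X = 0, exactly on the boundary
```

Because the rule is a dot product, the same two functions work unchanged for samples with D dimensions, as long as w and x have the same length.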

Demo

Please check the description for the demo and the references for the material.

Let's load the libraries numpy, matplotlib and pandas.

We load the iris data and gather the x and y values. Here the y values are the class labels of the samples.

We split the data with a test size of 0.4, which divides it into two pieces: 60 percent of the rows are used for training and 40 percent for testing.

We scale the data using StandardScaler, with its fit and transform functions. Note that StandardScaler does not scale the data between 0 and 1; it standardises each feature to zero mean and unit variance. Scaled data is good for building a KNN model because features with large values no longer dominate the distance over features with small values.
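StandardScaler in sklearn standardises each feature by subtracting its mean and dividing by its standard deviation. The arithmetic for a single feature can be sketched in plain Python; the function name standardise and the sample values are made up for this illustration.

```python
import statistics

def standardise(values):
    """Shift a feature to mean 0 and scale it to standard deviation 1.

    Uses the population standard deviation. A constant feature
    (standard deviation 0) would need special handling and is not
    covered in this sketch.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

feature = [2.0, 4.0, 6.0, 8.0]  # hypothetical measurements of one feature
scaled = standardise(feature)
print(scaled)
```

The scaled values are centred on 0 and include negative numbers, so they are not confined to the range 0 to 1.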

We can use the KNN implementation from the sklearn library (KNeighborsClassifier) and, with its predict function, predict the class of the input data.

Here we pass the training data to fit the model.
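The steps so far can be put together in a short sketch. It is not the exact notebook from the demo: it assumes the copy of the iris data that ships with sklearn (load_iris), and the values random_state=0 and n_neighbors=5 are choices made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# x holds the four measurements per flower, y holds the class labels.
X, y = load_iris(return_X_y=True)

# 60 percent of the rows for training, 40 percent for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0  # random_state is an assumption
)

# Learn the scaling from the training rows only, then apply it to both sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# K = 5 is an assumed value; it is the parameter to tune.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred[:10])
```

Fitting the scaler on the training rows only keeps information from the test rows out of the model.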

Confusion Matrix

In machine learning, and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix).

In abstract terms, the confusion matrix is:

                 Predicted P    Predicted N
    Actual P     TP             FN
    Actual N     FP             TN

Where: P = Positive; N = Negative; TP = True Positive; FP = False Positive; TN = True Negative; FN = False Negative.

Accuracy will yield misleading results if the data set is unbalanced, that is, when the numbers of observations in different classes vary greatly.

For example, if there were 95 cats and only 5 dogs in the data, a particular classifier might classify all the observations as cats.

The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cat class but a 0% recognition rate for the dog class.
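The cat and dog example can be checked with a few lines of plain Python. The helper recognition_rate is a name made up for this sketch.

```python
# 95 cats and 5 dogs, and a classifier that always answers "cat".
actual = ["cat"] * 95 + ["dog"] * 5
predicted = ["cat"] * 100

# Accuracy: the share of all observations that were classified correctly.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

def recognition_rate(label):
    """Share of the observations of one class that were recognised."""
    hits = sum(a == label and p == label for a, p in zip(actual, predicted))
    total = sum(a == label for a in actual)
    return hits / total

cat_rate = recognition_rate("cat")
dog_rate = recognition_rate("dog")
print(accuracy, cat_rate, dog_rate)
```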

Here, from the predicted output and the actual output (the test labels held out by the split), we can compute the confusion matrix.

With the confusion matrix, we can see the true positive results on the test data.

From the same predicted and actual values we can produce a classification report, which shows further measures such as precision, recall, F1 score and support.

Finally, we can compute the accuracy of the KNN model by calling the accuracy_score function of sklearn.
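The three sklearn calls can be sketched on a tiny set of labels. The lists y_test and y_pred below are made up for this illustration and are not results from the demo.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical actual and predicted class labels for six test samples.
y_test = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# Rows are the actual classes, columns are the predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, F1 score and support for each class.
report = classification_report(y_test, y_pred)
print(report)

# Share of samples that were classified correctly.
acc = accuracy_score(y_test, y_pred)
print(acc)
```

In this made-up case one sample of class 1 is predicted as class 2, so it appears off the diagonal in the second row of the matrix.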

Below is the demo in English, हिंदी (Hindi), తెలుగు(Telugu)

English

(Hindi) — हिंदी

(Telugu) — తెలుగు

Please don't forget to subscribe to my channel, Meeraj Kanaparthi, by pressing the subscribe button. I also upload the latest trends and research in live videos, so please click on the bell icon to get alerts.

Medium: https://kmeeraj.medium.com/k-nearest-neighbour-431e0996a268

Github: https://github.com/kmeeraj/machinelearning/tree/develop

Github Demo: https://github.com/kmeeraj/machinelearning/blob/develop/algorithms/K%20Nearest%20Neighbour.ipynb

colab: https://colab.research.google.com/gist/kmeeraj/9c77ec63c31e3a6684be2d6035e292a7/k-nearest-neighbour.ipynb

Gist: https://gist.github.com/kmeeraj/9c77ec63c31e3a6684be2d6035e292a7

Reference : https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_knn_algorithm_finding_nearest_neighbors.htm

Wiki: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

Confusion Matrix: https://en.wikipedia.org/wiki/Confusion_matrix

Sigmoid function: https://en.wikipedia.org/wiki/Sigmoid_function

credit

Music: https://www.bensound.com

Social Media:

https://www.linkedin.com/in/meeraj-k-69ba76189/

https://facebook.com/meeraj.k.35

https://www.instagram.com/meeraj.kanaparthi1/

https://twitter.com/MeerajKanapart2
