Distance-Based Methods in Machine Learning

Distance-Based Methods

Distance-based algorithms are machine algorithms that classify queries by computing distances between these queries and a number of internally stored exemplars. Exemplars that are closest to the query have the largest influence on the classification assigned to the query.

The buzz term similarity distance measure has a wide variety of definitions among math and data mining practitioners. As a result, those terms, concepts, and their usage went way beyond the head for the beginner, who started to understand them for the very first time.

1. Similarity

The similarity measure is the measure of how much alike two data objects are. A similarity measure in a data mining context is a distance with dimensions representing features of the objects. If the distance is small, it will be a high degree of similarity whereas a large distance will be a low degree of similarity.

2. Euclidean Space

Euclidean space encompasses the 3 nos. two-dimensional Euclidean planes perpendicular to each other and the three-dimensional space of Euclidean geometry. Every point in three-dimensional Euclidean space is determined by three coordinates.

Euclidean space can as one possible choice of representation be modeled using Cartesian coordinates. In this case, the Euclidean space is then modeled by the real coordinate space (Rⁿ of the same dimension. In one dimension, this is the real line. In two dimensions, it is the Cartesian plane, and in higher dimensions, it is a coordinate space with three or more real number coordinates.