Design and Analysis of Parallel MapReduce based KNN-join Algorithm for Big Data Classification

Indonesian Journal of Electrical Engineering and Computer Science

Design and Analysis of Parallel MapReduce based KNN-join Algorithm for Big Data Classification

Abstract

In data mining applications, multi-label classification is highly required in many modern applications. Meanwhile, a useful data mining approach is the k-nearest neighbour join, which has high accuracy but time-consuming process. With recent explosion of big data, conventional serial KNN join based multi-label classification algorithm needs to spend a lot of time to handle high volumn of data.  To address this problem, we first design a parallel MapReduce based KNN join algorithm for big data classification. We further implement the algorithm using Hadoop in a cluster with 9 vitual machines. Experiment results show that our MapReduce based KNN join exhibits much higher performance than the serial one. Several interesting phenomenon are observed from the experiment results.

Discover Our Library

Embark on a journey through our expansive collection of articles and let curiosity lead your path to innovation.

Explore Now
Library 3D Ilustration