ON SPEEDING UP K-MEANS CLUSTERING USING GRAPHICS PROCESSING UNITS

  • Luciano Jose Senger (UEPG)
  • William Maukoski
  • Lilian Tais de Gouveia

Abstract

Due to advances in technology, modern farms operate differently from those of the past. Over the years, the amount of data collected by sensors, cameras and other systems has increased. In this scenario, implementing data mining on parallel computing systems is crucial for ensuring scalability and better performance as data volumes continue to grow. This paper presents a case study of a parallel implementation of k-means clustering. Clustering has been used in many applications, including image processing, information retrieval and climatology. However, k-means clustering is known to be computationally expensive when applied to large datasets. The parallel k-means implementation targets general-purpose graphics processing units (GPGPUs). To implement k-means clustering on this parallel architecture and to provide a better software platform for data science research, the Weka k-means implementation was chosen and adapted. First, a profiler was used to identify the most time-consuming portions of the code. Using the profiler reduced the number of lines of code that had to be analyzed for rewriting by 95.57%. Then, a parallel k-means clustering was implemented and evaluated. The results show that, by using parallel k-means clustering on graphics processing units, data mining results can be obtained in less time. The speedup achieved was up to 26 when using all available execution cores.
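The abstract does not include code. As a rough illustration of why k-means lends itself to GPUs, the following is a minimal pure-Python sketch of Lloyd's k-means, assuming Euclidean distance and caller-supplied initial centroids. It is not the authors' code nor Weka's implementation, and the names `assign_point`, `update_centroids` and `kmeans` are hypothetical. The point it shows is that the assignment step computes each point's nearest centroid independently of all other points, which is the kind of work a GPU version would commonly spread across threads (one point per thread).

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance; the square root is unnecessary for argmin."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_point(point: Point, centroids: List[List[float]]) -> int:
    """Nearest-centroid index for ONE point (hypothetical helper).

    It only reads the shared centroids and produces one label, so each call
    is independent: in a GPU version this could be one thread's work.
    """
    best, best_d = 0, math.inf
    for j, c in enumerate(centroids):
        d = squared_distance(point, c)
        if d < best_d:
            best, best_d = j, d
    return best


def update_centroids(points: List[Point], labels: List[int],
                     centroids: List[List[float]]) -> List[List[float]]:
    """Recompute each centroid as the mean of its assigned points."""
    k, dim = len(centroids), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    new = []
    for j in range(k):
        if counts[j] == 0:
            new.append(list(centroids[j]))  # keep an empty cluster's centroid
        else:
            new.append([s / counts[j] for s in sums[j]])
    return new


def kmeans(points: List[Point], initial_centroids: List[Point],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's iteration: alternate assignment and update until labels stop changing."""
    centroids = [[float(v) for v in c] for c in initial_centroids]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Data-parallel step: every point is handled independently.
        new_labels = [assign_point(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])` converges to labels `[0, 0, 1, 1]` and centroids `[[0.0, 0.5], [10.0, 10.5]]`. The update step, by contrast, is a reduction over all points of a cluster, so a parallel version would typically need atomic additions or a parallel reduction rather than fully independent work.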
Published
2020-07-07
Section
Papers