A parallel content-based image retrieval system using spark and tachyon frameworks

Citation:

Mezzoudj S, Behloul A, Seghir R, Saadna Y. A parallel content-based image retrieval system using spark and tachyon frameworks. Journal of King Saud University - Computer and Information Sciences. 2021.

Date Published:

2021

Abstract:

With the rapid growth of large-scale multimedia on the Internet, especially images, building Content-Based Image Retrieval (CBIR) systems for large-scale image collections has become a major challenge. One drawback of CBIR is its very long execution time. In this article, we propose a fast Content-Based Image Retrieval system using Spark (CBIR-S) targeting large-scale image collections. Our system is composed of two steps: (i) an image indexing step, in which we use the MapReduce distributed model on Spark to speed up the indexing process, together with a memory-centric distributed storage system called Tachyon to speed up write operations; and (ii) an image retrieval step, which we speed up by using a parallel k-Nearest Neighbors (k-NN) search method based on the MapReduce model implemented on Apache Spark, in addition to exploiting the caching mechanism of the Spark framework. We show, through a wide set of experiments, the effectiveness of our approach in terms of processing time.
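The retrieval step described above can be illustrated with a minimal sketch of a MapReduce-style k-NN search. This is not the authors' implementation: it uses plain Python lists in place of Spark partitions, assumes Euclidean distance over small feature vectors, and all names (`local_top_k`, `merge_top_k`, `knn_mapreduce`, the toy `partitions` index) are hypothetical. On Spark, the same pattern would roughly correspond to a per-partition map followed by a reduce over the partial results.

```python
import heapq
import math
from functools import reduce


def euclidean(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_top_k(partition, query, k):
    """Map step: the k nearest (distance, image_id) pairs within one partition."""
    scored = ((euclidean(vec, query), image_id) for image_id, vec in partition)
    return heapq.nsmallest(k, scored)


def merge_top_k(left, right, k):
    """Reduce step: merge two candidate lists and keep the k smallest distances."""
    return heapq.nsmallest(k, left + right)


def knn_mapreduce(partitions, query, k):
    """Run the map step on every partition, then fold the partial results."""
    candidates = [local_top_k(p, query, k) for p in partitions]
    return reduce(lambda a, b: merge_top_k(a, b, k), candidates, [])


# Toy index: each partition holds (image_id, feature_vector) pairs.
partitions = [
    [("img0", (0.0, 0.0)), ("img1", (1.0, 1.0))],
    [("img2", (0.2, 0.1)), ("img3", (5.0, 5.0))],
    [("img4", (0.9, 1.2))],
]

result = knn_mapreduce(partitions, query=(0.0, 0.0), k=2)
nearest_ids = [image_id for _, image_id in result]
print(nearest_ids)
```

Because each partition returns at most k candidates, the reduce step only ever merges short lists, so the amount of data exchanged between workers is bounded by k times the number of partitions rather than by the size of the index.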