The Hadoop Distributed File System (HDFS) is widely used for large-scale data storage and processing. Hadoop processes the data stored in HDFS in parallel using the MapReduce programming model.
The work presented in this paper proposes a novel Hadoop plugin for processing image files with the MapReduce model. The plugin introduces image-related I/O formats and new classes for creating records from input files. HDFS is designed to work with a small number of large files.
Therefore, the proposed technique merges many small files into a single large file to prevent the performance loss that stems from handling a large number of small files. In this way, each task can process multiple images in a single run. The effectiveness of the proposed technique is demonstrated through an application scenario that performs face detection on distributed image files.
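As an illustration of the merging step, the sketch below packs a directory of small image files into a single Hadoop SequenceFile keyed by file name. The paper does not publish its plugin code, so the class name ImagePacker and the choice of SequenceFile as the container are assumptions made here for demonstration only.

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagePacker {

    /** Merges every image file under srcDir into a single SequenceFile on HDFS. */
    public static void pack(Configuration conf, Path srcDir, Path dst) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(dst),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(srcDir)) {
                if (status.isDirectory()) {
                    continue; // only pack plain image files
                }
                // Individual images are small, so an int-sized buffer is sufficient.
                byte[] bytes = new byte[(int) status.getLen()];
                try (InputStream in = fs.open(status.getPath())) {
                    IOUtils.readFully(in, bytes, 0, bytes.length);
                }
                // key = original file name, value = raw image bytes
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(bytes));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        pack(conf, new Path(args[0]), new Path(args[1]));
    }
}

Packing the images this way lets HDFS store them in a few large blocks and lets MapReduce split the container into records, instead of scheduling one task per tiny file.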
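The map side can then stream through many packed images in one task. The sketch below is a minimal mapper, assuming the job reads the packed file with Hadoop's stock SequenceFileInputFormat; FaceCountMapper and the detectFaces stub are hypothetical stand-ins for the face-detection scenario and are not the paper's actual classes.

import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FaceCountMapper
        extends Mapper<Text, BytesWritable, Text, IntWritable> {

    @Override
    protected void map(Text fileName, BytesWritable imageBytes, Context context)
            throws IOException, InterruptedException {
        // Decode the raw bytes of one packed image.
        BufferedImage image = ImageIO.read(
                new ByteArrayInputStream(imageBytes.copyBytes()));
        if (image == null) {
            return; // skip records that are not decodable images
        }
        // Emit (file name, number of detected faces) for this image.
        context.write(fileName, new IntWritable(detectFaces(image)));
    }

    /** Placeholder for a real face detector (e.g. a Haar-cascade classifier). */
    private static int detectFaces(BufferedImage image) {
        return 0; // a real implementation would run the detector here
    }
}

A corresponding driver would set job.setInputFormatClass(SequenceFileInputFormat.class) so that each record delivered to the mapper is one complete image rather than one line of text, allowing a single map task to process many images per run.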