hadoop - HDFS concat operation: Does it lead to increased seek time?


I am trying to understand how HDFS implements the concat operation, and I drilled down to the following piece of code.

From the implementation, it seems to me that concat is a metadata operation on the inode of the target file, and the actual blocks are not moved. I am wondering whether this leads to fragmentation and increased seek time, since the blocks end up at different locations on disk (assuming a magnetic disk). Is my assumption correct? If yes, can we avoid this?
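To make that reading of concat concrete, here is a minimal toy model in plain Java. It is not HDFS code: `ToyNamespace` and its methods are hypothetical names invented for illustration, and a "file" is just an ordered list of block IDs. It only shows the idea that the target's block list grows and the sources disappear while no block is copied or relocated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy namespace (hypothetical, not HDFS): each path maps to an ordered list of block IDs. */
class ToyNamespace {
    private final Map<String, List<Long>> files = new LinkedHashMap<>();

    /** Registers a file with the given block IDs. */
    void create(String path, List<Long> blockIds) {
        files.put(path, new ArrayList<>(blockIds));
    }

    /**
     * Metadata-only concat: appends each source's block IDs to the target's
     * list and removes the source entries. No block contents are touched.
     */
    void concat(String target, List<String> sources) {
        List<Long> targetBlocks = files.get(target);
        if (targetBlocks == null) {
            throw new IllegalArgumentException("no such target: " + target);
        }
        // Validate every source first so a failure leaves the namespace unchanged.
        for (String src : sources) {
            if (src.equals(target) || !files.containsKey(src)) {
                throw new IllegalArgumentException("invalid source: " + src);
            }
        }
        for (String src : sources) {
            targetBlocks.addAll(files.remove(src));
        }
    }

    /** Returns a copy of the block list for a path, or null if the path does not exist. */
    List<Long> blocksOf(String path) {
        List<Long> blocks = files.get(path);
        return blocks == null ? null : new ArrayList<>(blocks);
    }

    /** Number of files currently in the namespace. */
    int fileCount() {
        return files.size();
    }
}
```

In this model, concatenating `/b` (block 3) onto `/a` (blocks 1 and 2) leaves one file with blocks 1, 2, 3. The number of files drops, but the number of blocks stays the same, which is the property the question is worried about.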

After a few experiments, I found the answer to my own question. After frequent file concat operations (around 1k per minute), the data node started complaining about too many blocks within about a day, which leads me to believe that concat does indeed lead to fragmentation and an increased number of blocks on disk. The solution I used was to write a separate job that concatenates (and, in my case, compresses) these files into a single splittable archive (note that gzip is not splittable!).
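A rough back-of-the-envelope sketch of why the block count grows so quickly, and why rewriting the data helps. The numbers here are assumptions, not measurements: 1 MiB per small file is a made-up size, 128 MiB is used as the block size, and compression is ignored. `BlockCountEstimate` and its methods are hypothetical helper names.

```java
/** Hypothetical helper for estimating block counts; not part of any Hadoop API. */
class BlockCountEstimate {
    /** Integer ceiling division for positive values. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Blocks held when every small file keeps its own blocks (as after a metadata-only concat). */
    static long blocksWithoutCompaction(long files, long fileBytes, long blockBytes) {
        return files * ceilDiv(fileBytes, blockBytes);
    }

    /** Blocks held when the same bytes are rewritten into one file with full-size blocks. */
    static long blocksAfterCompaction(long files, long fileBytes, long blockBytes) {
        return ceilDiv(files * fileBytes, blockBytes);
    }
}
```

With 1,000 files per minute for one day (1,440,000 files) at the assumed sizes, the first method gives 1,440,000 blocks and the second gives 11,250, since each small file otherwise pins at least one block of its own.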

