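I have a dataset of user IDs and the posts related to each user ID. I want to count the number of posts for each user, and I also want to put all the posts of each user ID together (concatenate the posts, with some separator between them).
Any suggestions on how to go about it?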
IMHO, you can do this with one mapper and one reducer.
Mapper:
- class PostMapper extends Mapper<Object, Text, Text, Text>
- In map(), write the user ID as the key (Text) and the post as the value (Text) to the context.
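A Hadoop-free sketch of what map() does for a single record may help. It assumes, hypothetically, that each input line looks like `userid,post`; the class and method names are illustrative only. In a real job the resulting pair would be emitted with `context.write(new Text(userId), new Text(post))`.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hadoop-free sketch of the map step; the "userid,post" input format is an assumption.
class PostMapSketch {

    // Turns one input line into the (userid, post) pair the mapper would emit.
    public static Map.Entry<String, String> map(String line) {
        // Split on the first comma only, so commas inside the post are kept.
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected 'userid,post' but got: " + line);
        }
        String userId = line.substring(0, comma).trim();
        String post = line.substring(comma + 1).trim();
        return new AbstractMap.SimpleEntry<>(userId, post);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> pair = map("u42, hello, world");
        System.out.println(pair.getKey() + " -> " + pair.getValue()); // prints "u42 -> hello, world"
    }
}
```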
Reducer:
- class PostReducer extends Reducer<Text, Text, Text, Text>
- In reduce(), loop over the Iterable of values and keep (i) a counter that is incremented for every fetched post, and (ii) a string buffer (for example a StringBuilder) to which every fetched post is appended, with a suitable delimiter between posts.
- After the loop completes, write the key (the user ID) and, as the value, the concatenated text to the reducer's context.
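The loop described above can be sketched without Hadoop as a plain method over an `Iterable<String>`; in a real `reduce(Text key, Iterable<Text> values, Context context)` each string would come from `value.toString()`. The `" | "` delimiter and all names here are hypothetical choices.

```java
import java.util.Arrays;

// Hadoop-free sketch of the reduce step for one key (names are hypothetical).
class PostReduceSketch {

    // Hypothetical delimiter; pick something that cannot occur inside a post.
    static final String DELIMITER = " | ";

    // What reduce() computes for one user: the post count and the joined posts.
    public static final class Result {
        public final int count;
        public final String posts;

        Result(int count, String posts) {
            this.count = count;
            this.posts = posts;
        }
    }

    // userId is not needed for the computation; in Hadoop it is written back as the output key.
    public static Result reduce(String userId, Iterable<String> posts) {
        int count = 0;                              // (i) counts every fetched post
        StringBuilder joined = new StringBuilder(); // (ii) accumulates the concatenation
        for (String post : posts) {
            if (count > 0) {
                joined.append(DELIMITER);           // delimiter only between posts
            }
            joined.append(post);
            count++;
        }
        return new Result(count, joined.toString());
    }

    public static void main(String[] args) {
        Result r = reduce("u42", Arrays.asList("first post", "second post"));
        // In Hadoop this would be: context.write(key, new Text(r.posts));
        System.out.println("u42\t" + r.posts + "\t(" + r.count + " posts)");
    }
}
```

Appending the text of each value to a StringBuilder inside the loop is also the safe pattern in real Hadoop code: the framework reuses the same Text instance for every value of the Iterable, so storing the Text references themselves (instead of copying their contents) would leave you with repeated copies of the last post.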
After the job runs successfully, the output file will contain one line per user ID: the user ID and the concatenated posts, separated by a tab.
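Note: remove any tab characters from the posts before concatenating them, otherwise they will be confused with the separator. If you also want the count in the output, put it in front of the value: the count, followed by a tab, followed by the concatenated posts.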
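To see the whole flow and the resulting tab-separated lines, the job can be simulated locally: map each line, group the values by key (Hadoop's shuffle; keys reach each reducer in sorted order, which a TreeMap mimics here), then reduce. Tabs inside posts are replaced with a space, and the count is placed before the posts. The `userid,post` input format, the `" | "` delimiter, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, Hadoop-free simulation of the job; the "userid,post" input format is an assumption.
class PostJobSimulation {

    static final String DELIMITER = " | "; // hypothetical post separator

    public static List<String> run(List<String> inputLines) {
        // Map + shuffle: group posts by userid; TreeMap keeps the keys sorted.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // this sketch simply skips malformed lines
            }
            String userId = line.substring(0, comma).trim();
            String post = line.substring(comma + 1).trim();
            grouped.computeIfAbsent(userId, k -> new ArrayList<>()).add(post);
        }

        // Reduce: count, strip tabs, concatenate, and prefix the count.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            int count = 0;
            StringBuilder joined = new StringBuilder();
            for (String post : e.getValue()) {
                if (count > 0) {
                    joined.append(DELIMITER);
                }
                // A tab inside a post would be mistaken for a field separator.
                joined.append(post.replace('\t', ' '));
                count++;
            }
            // Key, tab, value, where the value itself is "count<TAB>posts".
            output.add(e.getKey() + "\t" + count + "\t" + joined);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "u2,hi there",
                "u1,first\tpost",
                "u1,second post");
        for (String out : run(lines)) {
            System.out.println(out);
        }
    }
}
```

With the sample input this prints `u1<TAB>2<TAB>first post | second post` and then `u2<TAB>1<TAB>hi there`. One difference from a real job: this sketch keeps each user's posts in input order, whereas Hadoop does not guarantee the order of values within a key unless you add a secondary sort.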