hadoop - Encoding columns in Hive


I'm importing a table from MySQL into Hive using Sqoop. Some of the columns are latin1 encoded. Is there a way to either:

  1. set the encoding of those columns to latin1 in Hive, or
  2. convert the columns to UTF-8 while importing with Sqoop?

The --default-character-set option is used to set the character set for the whole database, not for a specific few columns. I was not able to find a Sqoop parameter that converts a table's columns to UTF-8 on the fly; rather, the columns are expected to be set to a fixed type.

$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
    --direct -- --default-character-set=latin1
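One possible way to get a per-column conversion during the import, which this thread does not confirm, is Sqoop's free-form query import (`--query`) combined with MySQL's `CONVERT(... USING utf8)`. The sketch below only builds and prints such a command so it can be reviewed first. The column names `id`, `name`, and `city`, the helper `build_select_list`, and the target directory are assumptions for illustration; whether the converted bytes arrive intact also depends on the JDBC connection's character set.

```shell
#!/bin/sh
# Hypothetical helper: wraps the listed latin1 columns in
# CONVERT(... USING utf8) and leaves every other column untouched.
# usage: build_select_list "all,columns" "latin1,columns"
build_select_list() {
    all_cols=$1
    latin1_cols=$2
    out=""
    old_ifs=$IFS
    IFS=,
    for col in $all_cols; do
        case ",$latin1_cols," in
            *",$col,"*) item="CONVERT($col USING utf8) AS $col" ;;
            *)          item="$col" ;;
        esac
        out="${out:+$out, }$item"
    done
    IFS=$old_ifs
    printf '%s\n' "$out"
}

# Assumed columns of table "bar"; "name" and "city" are the latin1 ones.
select_list=$(build_select_list "id,name,city" "name,city")

# Sqoop requires the literal token $CONDITIONS in a free-form query.
query="SELECT $select_list FROM bar WHERE \$CONDITIONS"

# Printed for review rather than executed; the target directory is made up.
echo "sqoop import --connect jdbc:mysql://server.foo.com/db --query '$query' --split-by id --target-dir /user/foo/bar"
```

A free-form query import needs `--target-dir` and either `--split-by` or a single mapper (`-m 1`), so the command differs from the table-based one above in more than the query itself.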

I believe you need to convert the latin1 columns to UTF-8 first in MySQL, and then you can import them with Sqoop. You can use the following script to convert the columns to UTF-8, which I found here.

mysql --database=dbname -B -N -e "SHOW TABLES" | \
    awk '{print "ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;"}' | \
    mysql --database=dbname &
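Because the script above rewrites every table in the database, it may be safer to generate the statements first and review them before running anything. A minimal sketch of that dry run follows; the function name `gen_alter_statements` and the table names `bar` and `baz` are assumptions for illustration.

```shell
#!/bin/sh
# Reads table names (one per line) on stdin and prints one ALTER TABLE
# statement per non-empty line. Nothing is sent to MySQL.
gen_alter_statements() {
    awk 'NF { printf "ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;\n", $1 }'
}

# Example with two made-up table names instead of live SHOW TABLES output.
printf 'bar\nbaz\n' | gen_alter_statements

# Against a real database the same function could be fed from mysql, e.g.:
#   mysql --database=dbname -B -N -e "SHOW TABLES" | gen_alter_statements > convert.sql
#   (review convert.sql, then)  mysql --database=dbname < convert.sql
```

Writing the statements to a file also makes it easy to delete the lines for tables that should keep their current character set.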
