Kerberos authentication itself is not covered in depth here; readers who need the background can look it up on their own. This post focuses on how to handle the Kerberos authentication problem when using DataX to extract data from MySQL into Hive.
DataX job file (mysql2hive.json) (example):
writer.parameter settings (example):
"haveKerberos": true,
"kerberosKeytabFilePath": "/xxx/kerberos/app_prd.keytab",
"defaultFS": "hdfs://nameservice1",
"kerberosPrincipal": "app_prd@FAYSON.COM"
Place the cluster configuration files core-site.xml, hdfs-site.xml, hive-site.xml and yarn-site.xml under datax/hdfsreader/src/main/resources and datax/hdfswriter/src/main/resources respectively, then repackage the two plugins so the files are bundled in.
In the example here, the cluster configuration is instead attached directly in the job JSON (via hadoopConfig), which is more intuitive, though it does make the file somewhat long. The full job file is as follows (example):
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "xxxx",
                        "password": "xxxx",
                        "column": [
                            "id", "les_id", "grade_id", "edition_id", "subject_id",
                            "course_system_first_id", "course_system_second_id",
                            "course_system_third_id", "course_system_four_id",
                            "custom_points", "deleted", "created_at", "tea_id",
                            "stu_id", "les_uid", "updated_at", "pt"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://xxxx:3306/test?useUnicode=true&characterEncoding=utf8"],
                                "table": ["ods_lesson_course_content_rt_df_tmp"]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {"name": "id", "type": "int"},
                            {"name": "les_id", "type": "int"},
                            {"name": "grade_id", "type": "int"},
                            {"name": "edition_id", "type": "int"},
                            {"name": "subject_id", "type": "int"},
                            {"name": "course_system_first_id", "type": "int"},
                            {"name": "course_system_second_id", "type": "int"},
                            {"name": "course_system_third_id", "type": "int"},
                            {"name": "course_system_four_id", "type": "int"},
                            {"name": "custom_points", "type": "string"},
                            {"name": "deleted", "type": "TINYINT"},
                            {"name": "created_at", "type": "string"},
                            {"name": "tea_id", "type": "int"},
                            {"name": "stu_id", "type": "int"},
                            {"name": "les_uid", "type": "string"},
                            {"name": "updated_at", "type": "string"}
                        ],
                        "defaultFS": "hdfs://nameservice1",
                        "hadoopConfig": {
                            "dfs.nameservices": "nameservice1",
                            "dfs.ha.namenodes.nameservice1": "namenode286,namenode36",
                            "dfs.namenode.rpc-address.nameservice1.namenode286": "xxxx:8020",
                            "dfs.namenode.rpc-address.nameservice1.namenode36": "xxxx:8020",
                            "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
                        },
                        "haveKerberos": "true",
                        "kerberosKeytabFilePath": "/home/xx/kerberos/xxx.keytab",
                        "kerberosPrincipal": "xxx@FAYSON.COM",
                        "encoding": "UTF-8",
                        "fileType": "orc",
                        "fileName": "ods_lesson_course_content_rt_df_orc_2",
                        "path": "/user/hive/warehouse/ods.db/ods_lesson_course_content_rt_df_orc_2/pt=2020-01-20",
                        "writeMode": "append",
                        "fieldDelimiter": "\u0001"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            },
            "errorLimit": {
                "record": 0
            }
        }
    }
}

The above mainly provides a solution to the Kerberos authentication problem when extracting data with DataX. Note, however, that if the Hadoop cluster configuration changes, for example the NameNode hosts or ports are modified, the plugins have to be repackaged. If you would rather avoid that, the settings can also be specified directly in the job configuration file; that solution is attached at the end, and which approach to use is up to you. We chose to package the configuration files into the hdfsreader and hdfswriter plugins because we did not want the job JSON to become too long, and cluster configuration files rarely change much. Of course, a cluster upgrade would require repackaging, whereas the second approach would not.
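For a rough idea of what that second approach can look like, below is a minimal sketch of a hadoopConfig block that carries both the HA settings and the Kerberos-related settings directly in the job JSON, assuming the hdfsreader/hdfswriter plugins forward hadoopConfig entries into the underlying Hadoop Configuration. The security property names are standard Hadoop/HDFS settings rather than values taken from this job, the principals are placeholders, and exactly which properties are required depends on the cluster:

"hadoopConfig": {
    "dfs.nameservices": "nameservice1",
    "dfs.ha.namenodes.nameservice1": "namenode286,namenode36",
    "dfs.namenode.rpc-address.nameservice1.namenode286": "xxxx:8020",
    "dfs.namenode.rpc-address.nameservice1.namenode36": "xxxx:8020",
    "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.security.authentication": "kerberos",
    "dfs.namenode.kerberos.principal": "hdfs/_HOST@FAYSON.COM",
    "dfs.datanode.kerberos.principal": "hdfs/_HOST@FAYSON.COM"
}

With something like this in place, a change to the NameNode hosts or ports only means editing the job file rather than rebuilding the plugins.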