使用Sqoop将数据从MySQL导入HBase (二)

芒果6个月前技术文章151

创建hbase表
create_namespace 'data';
create 'data:data', {NAME => 'cf1'}, {NAME => 'cf2'}, {NAME => 'cf3'};
972DD8CA-50E1-4172-A60E-F3308A8B2903.png
使用sqoop导入(遇到了问题 查看问题2)
sqoop import \
--connect jdbc:mysql://172.16.121.194:3306/test?zeroDateTimeBehavior=CONVERT_TO_NULL \
--username root \
--password 123456 \
--table data2 \
--hbase-table data:data \
--column-family cf1 \
--hbase-row-key code \
-m 1
-----下面这个不用建立表
sqoop import --connect jdbc:mysql://172.16.121.194:3306/test?zeroDateTimeBehavior=CONVERT_TO_NULL --username root --password 123456 --table data2 --hbase-table data --column-family info --hbase-row-key code --hbase-create-table --columns "province,code,city,district,year,model,company,brand,type,use_type,seat_count,engine_no,displacement,power,fuel_type,total_weight,kerb_mass,max_mass,tire_size,tire_count,length,width,height,note,extra_company,original_model,production_company,vehicle_type,register_date,owner_sex" --hbase-create-table --split-by code
image.png
查看 hbase表
scan 'data:data'
image.png
问题1、sqoop从mysql导入hbase报错
2023-12-22 14:58:20,545 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.HBaseAdmin.<init>(Lorg/apache/hadoop/conf/Configuration;)V
    at org.apache.sqoop.mapreduce.HBaseImportJob.jobSetup(HBaseImportJob.java:163)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:268)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
解决方法
查看错误,因为对应HBase版本太高导致。Index of /dist/hbase/1.6.0
根据连接下载hbase1.6版本,本地解压,将lib文件夹中所有jar包上传至$SQOOP_HOME/lib文件夹中
问题2、sqoop从mysql导入hbase报错
因为sqoop导入不能出现空值所以需要转换
使用IFNULL()函数将空值转换为指定的替代值(例如"NA”):
创建daata2的表
CREATE TABLE data2 (
    province VARCHAR(255),
    code INT,
    city VARCHAR(255),
    district VARCHAR(255),
    year INT,
    model VARCHAR(255),
    company VARCHAR(255),
    brand VARCHAR(255),
    type VARCHAR(255),
    use_type VARCHAR(255),
    seat_count INT,
    engine_no VARCHAR(255),
    displacement FLOAT,
    power FLOAT,
    fuel_type VARCHAR(255),
    total_weight FLOAT,
    kerb_mass FLOAT,
    max_mass FLOAT,
    tire_size VARCHAR(255),
    tire_count INT,
    length FLOAT,
    width FLOAT,
    height FLOAT,
    note VARCHAR(255),
    extra_company VARCHAR(255),
    original_model VARCHAR(255),
    production_company VARCHAR(255),
    vehicle_type VARCHAR(255),
    register_date INT,
    owner_sex VARCHAR(255)
);
将空值转换并插入到另一个MySQL表中:
INSERT INTO data2 (province, code, city, district, year, model, company, brand, type, use_type, seat_count, engine_no, displacement, power, fuel_type, total_weight, kerb_mass, max_mass, tire_size, tire_count, length, width, height, note, extra_company, original_model, production_company, vehicle_type, register_date, owner_sex)
SELECT
    IFNULL(province, 'NA') AS province,
    IFNULL(code, -1) AS code,
    IFNULL(city, 'NA') AS city,
    IFNULL(district, 'NA') AS district,
    IFNULL(year, -1) AS year,
    IFNULL(model, 'NA') AS model,
    IFNULL(company, 'NA') AS company,
    IFNULL(brand, 'NA') AS brand,
    IFNULL(type, 'NA') AS type,
    IFNULL(use_type, 'NA') AS use_type,
    IFNULL(seat_count, -1) AS seat_count,
    IFNULL(engine_no, 'NA') AS engine_no,
    IFNULL(displacement, -1.0) AS displacement,
    IFNULL(power, -1.0) AS power,
    IFNULL(fuel_type, 'NA') AS fuel_type,
    IFNULL(total_weight, -1.0) AS total_weight,
    IFNULL(kerb_mass, -1.0) AS kerb_mass,
    IFNULL(max_mass, -1.0) AS max_mass,
    IFNULL(tire_size, 'NA') AS tire_size,
    IFNULL(tire_count, -1) AS tire_count,
    IFNULL(length, -1.0) AS length,
    IFNULL(width, -1.0) AS width,
    IFNULL(height, -1.0) AS height,
    IFNULL(note, 'NA') AS note,
    IFNULL(extra_company, 'NA') AS extra_company,
    IFNULL(original_model, 'NA') AS original_model,
    IFNULL(production_company, 'NA') AS production_company,
    IFNULL(vehicle_type, 'NA') AS vehicle_type,
    IFNULL(register_date, -1) AS register_date,
    IFNULL(owner_sex, 'NA') AS owner_sex
FROM data;


相关文章

Linux 文件锁

1、背景Linux 系统定时任务正在执行时,可能会遇到上个周期的任务还没有执行完,这样便会造成相同的任务同一时间有过个任务进程在执行。如果任务有对互斥资源操作时,有可能产生死锁。2、用法参考flock...

Flinksql Kafka 接收流数据并打印到控制台

Flinksql Kafka 接收流数据并打印到控制台

本文目的使用Flink SQL创建一个流处理作业,将来自Kafka主题"dahua_picrecord"的数据写入到另一个表”print_table”控制台中。使用sql-client前 需要启动ya...

Nginx性能优化

Nginx性能优化

       前言:Nginx作为高性能web服务器,即使不特意调整配置参数也可以处理大量的并发请求。 以下的Nginx配置参数作为参考,具体需根据线上业务情况进行调整。一、worker进程work...

MySQL运维实战(4.4) SQL_MODE之STRICT_TRANS_TABLES和STRICT_ALL_TABLES

如果设置STRICT模式,则如果数据写入时,如果数据不符合字段定义(字符串超出长度、数值类型数据超出范围、违反not null约束等),SQL会报错。如果不设置STRICT模式,会对异常数据进行截断处...

hadoop纠删码

hadoop纠删码

纠删码是CDH6/Hadop3引入的新功能,之前的HDFS都是副本方式容错,默认情况下,一个文件有三个副本,可以容忍任意2个副本(Datanode)不可用,是以牺牲空间的代价提供了数据的可用性,带来了...

开源大数据集群部署(十四)Ranger集成Hbase

开源大数据集群部署(十四)Ranger集成Hbase

在hd1.dtstack.com主机上执行在hmaster和back master上进行安装和执行Ø 解压ranger-2.3.0-hbase-plugin[root@hd1.dtstack.com ...

发表评论    

◎欢迎参与讨论,请在这里发表您的看法、交流您的观点。