ClickHouse cluster integration with Hive (Part 3)

九月 · 2023-12-16 · Technical articles · 793 views

Prerequisite: the Hive component and a ClickHouse cluster are already deployed. This article connects the ClickHouse cluster to Hive.

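1. Configure a local cache for the HDFS file system. Add the following to the ClickHouse server configuration (config.xml) on each node. The Hive engine uses it to cache data read from HDFS, and it takes effect at server startup.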

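<local_cache_for_remote_fs>
    <!-- turn the local cache on when the server starts -->
    <enable>true</enable>
    <!-- directory where cached HDFS file data is stored -->
    <root_dir>local_cache</root_dir>
    <!-- maximum total size of the cache, in bytes (about 533 MiB) -->
    <limit_size>559096952</limit_size>
    <!-- bytes downloaded from HDFS before each flush to local disk (1 MiB) -->
    <bytes_read_before_flush>1048576</bytes_read_before_flush>
</local_cache_for_remote_fs>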
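Before pasting the fragment into config.xml, you may want to sanity-check it. The Python sketch below is a hypothetical helper, not part of ClickHouse. It parses the fragment with the standard-library xml.etree.ElementTree, checks that the four fields used above are present and that the sizes are positive byte counts, and prints the sizes in MiB.

```python
import xml.etree.ElementTree as ET

# Standalone copy of the cache fragment above (hypothetical test input).
FRAGMENT = """
<local_cache_for_remote_fs>
    <enable>true</enable>
    <root_dir>local_cache</root_dir>
    <limit_size>559096952</limit_size>
    <bytes_read_before_flush>1048576</bytes_read_before_flush>
</local_cache_for_remote_fs>
"""

MIB = 1024 * 1024
REQUIRED = ("enable", "root_dir", "limit_size", "bytes_read_before_flush")


def check_cache_config(xml_text: str) -> dict:
    """Parse the fragment and return its values; raise ValueError on problems."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "local_cache_for_remote_fs":
        raise ValueError(f"unexpected root element <{root.tag}>")

    values = {}
    for key in REQUIRED:
        node = root.find(key)
        text = (node.text or "").strip() if node is not None else ""
        if not text:
            raise ValueError(f"missing or empty <{key}>")
        values[key] = text

    # Sizes are plain byte counts; int() raises ValueError on non-numeric text.
    for key in ("limit_size", "bytes_read_before_flush"):
        values[key] = int(values[key])
        if values[key] <= 0:
            raise ValueError(f"<{key}> must be a positive byte count")

    # Accept the usual boolean spellings in ClickHouse config files.
    values["enable"] = values["enable"].lower() in ("true", "1")
    return values


if __name__ == "__main__":
    cfg = check_cache_config(FRAGMENT)
    print(f"enabled={cfg['enable']} root_dir={cfg['root_dir']}")
    print(f"limit_size={cfg['limit_size'] / MIB:.1f} MiB, "
          f"flush every {cfg['bytes_read_before_flush'] / MIB:.1f} MiB")
```

According to the ClickHouse Hive engine documentation, a single query can still bypass the cache with SETTINGS use_local_cache_for_remote_storage = 0 even when the cache is enabled here.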

2. Create an ORC table in Hive and insert a test row

use test;
CREATE TABLE `test`.`test_orc`(
  `f_tinyint` tinyint, 
  `f_smallint` smallint, 
  `f_int` int, 
  `f_integer` int, 
  `f_bigint` bigint, 
  `f_float` float, 
  `f_double` double, 
  `f_decimal` decimal(10,0), 
  `f_timestamp` timestamp, 
  `f_date` date, 
  `f_string` string, 
  `f_varchar` varchar(100),
  `f_bool` boolean, 
  `f_binary` binary, 
  `f_array_int` array<int>, 
  `f_array_string` array<string>, 
  `f_array_float` array<float>, 
  `f_array_array_int` array<array<int>>, 
  `f_array_array_string` array<array<string>>, 
  `f_array_array_float` array<array<float>>)
  PARTITIONED BY ( 
  `day` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/user/hive/warehouse/test.db/test_orc';
  
insert into test.test_orc partition(day='2021-09-18')
select 1, 2, 3, 4, 5, 6.11, 7.22, 8.333,
       current_timestamp(), current_date(),
       'hello world', 'hello world', true, 'hello world',
       array(1, 2, 3), array('hello world', 'hello world'), array(float(1.1), float(1.2)),
       array(array(1, 2), array(3, 4)), array(array('a', 'b'), array('c', 'd')),
       array(array(float(1.11), float(2.22)), array(float(3.33), float(4.44)));
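After the insert, Hive writes the ORC files for partition day='2021-09-18' under the table LOCATION, in a subdirectory named day=2021-09-18 (Hive's default key=value partition layout). The Python sketch below computes that path so you can confirm the files exist, for example with hdfs dfs -ls, before querying from ClickHouse. The helper name is hypothetical, and the escape set is only an approximation of the characters Hive escapes, so the sketch rejects such values instead of reproducing Hive's escaping.

```python
# Hive table LOCATION from the CREATE TABLE statement above.
LOCATION = "hdfs://mycluster/user/hive/warehouse/test.db/test_orc"

# Approximately the characters Hive escapes in partition directory names
# (plus control characters); values containing them are rejected here.
NEEDS_ESCAPING = set("\"#%'*/:=?[\\]^{}") | {chr(c) for c in range(0x20)} | {"\x7f"}


def partition_path(location: str, spec: dict) -> str:
    """Return <location>/<col>=<value>/... for a static partition spec.

    `spec` must list the partition columns in PARTITIONED BY order.
    """
    parts = [location.rstrip("/")]
    for column, value in spec.items():
        if any(ch in NEEDS_ESCAPING for ch in value):
            raise ValueError(f"partition value {value!r} would need Hive escaping")
        parts.append(f"{column}={value}")
    return "/".join(parts)


if __name__ == "__main__":
    # Inspect the printed directory with: hdfs dfs -ls <path>
    print(partition_path(LOCATION, {"day": "2021-09-18"}))
```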

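3. Create the corresponding table in ClickHouse. The Hive engine arguments are the Hive Metastore Thrift URI, the Hive database name, and the Hive table name. The partition column day is declared as a regular column and used as the PARTITION BY key.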

CREATE TABLE test.test_orc
(
    `f_tinyint` Int8,
    `f_smallint` Int16,
    `f_int` Int32,
    `f_integer` Int32,
    `f_bigint` Int64,
    `f_float` Float32,
    `f_double` Float64,
    `f_decimal` Float64,
    `f_timestamp` DateTime,
    `f_date` Date,
    `f_string` String,
    `f_varchar` String,
    `f_bool` Bool,
    `f_binary` String,
    `f_array_int` Array(Int32),
    `f_array_string` Array(String),
    `f_array_float` Array(Float32),
    `f_array_array_int` Array(Array(Int32)),
    `f_array_array_string` Array(Array(String)),
    `f_array_array_float` Array(Array(Float32)),
    `day` String
)
ENGINE = Hive('thrift://172.16.121.0:9083', 'test', 'test_orc')
PARTITION BY day;
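Choosing a ClickHouse type for every Hive column is the error-prone part of this step. The Python sketch below encodes the mapping used by the ClickHouse table above (for example tinyint to Int8, binary to String, decimal to Float64, nested array<...> to Array(...)) and renders a column list. It is a hypothetical helper reflecting this article's choices, not an official mapping table. Types it does not know, such as map or struct, raise an error rather than being guessed.

```python
import re

# Scalar Hive -> ClickHouse types, as used by the ClickHouse table in this article.
SCALARS = {
    "tinyint": "Int8",
    "smallint": "Int16",
    "int": "Int32",
    "bigint": "Int64",
    "float": "Float32",
    "double": "Float64",
    "timestamp": "DateTime",
    "date": "Date",
    "string": "String",
    "boolean": "Bool",
    "binary": "String",
}


def to_clickhouse(hive_type: str) -> str:
    """Map one Hive column type to the ClickHouse type this article uses."""
    t = hive_type.strip().lower()
    if t.startswith("array<") and t.endswith(">"):
        # Recurse on the element type: array<array<int>> -> Array(Array(Int32))
        return f"Array({to_clickhouse(t[len('array<'):-1])})"
    if re.fullmatch(r"decimal(\(\s*\d+\s*,\s*\d+\s*\))?", t):
        return "Float64"  # the article reads decimal(10,0) as Float64
    if re.fullmatch(r"(var)?char\(\s*\d+\s*\)", t):
        return "String"
    if t in SCALARS:
        return SCALARS[t]
    raise ValueError(f"no mapping for Hive type {hive_type!r}")


def column_list(columns: list) -> str:
    """Render (name, hive_type) pairs as the body of a ClickHouse column list."""
    return ",\n".join(f"    `{name}` {to_clickhouse(t)}" for name, t in columns)


if __name__ == "__main__":
    # A subset of the Hive columns; the partition column `day` goes last.
    hive_columns = [
        ("f_tinyint", "tinyint"),
        ("f_decimal", "decimal(10,0)"),
        ("f_varchar", "varchar(100)"),
        ("f_array_array_float", "array<array<float>>"),
        ("day", "string"),
    ]
    print("CREATE TABLE test.test_orc\n(\n" + column_list(hive_columns) + "\n)")
```

The ENGINE and PARTITION BY clauses are still written by hand, since they depend on the Metastore address and on which Hive column is the partition key.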

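4. Query the Hive data from ClickHouse. In clickhouse-client, the \G suffix prints each row vertically. The setting input_format_orc_allow_missing_columns = 1 lets the read proceed when a declared column is missing from the ORC files.

SELECT * FROM test.test_orc SETTINGS input_format_orc_allow_missing_columns = 1\G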
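The same query can also be sent from an application through the ClickHouse HTTP interface, which accepts query-level settings as URL parameters. The Python sketch below assumes the default HTTP port 8123, the default user with no password, and a hypothetical host name. It POSTs the SQL as the request body. Because \G is a clickhouse-client shortcut, the sketch selects an output format with a FORMAT clause instead.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_url(host: str, settings: Optional[dict] = None, port: int = 8123) -> str:
    """URL for the ClickHouse HTTP interface, with settings as URL parameters."""
    params = {name: str(value) for name, value in (settings or {}).items()}
    query_string = f"?{urlencode(params)}" if params else ""
    return f"http://{host}:{port}/{query_string}"


def run_query(url: str, query: str, timeout: float = 30.0) -> str:
    """POST the SQL text as the request body and return the response as text."""
    request = Request(
        url,
        data=query.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url = build_url(
        "clickhouse-node1",  # hypothetical host name
        {"input_format_orc_allow_missing_columns": 1},
    )
    sql = "SELECT day, f_int, f_array_int FROM test.test_orc FORMAT JSONEachRow"
    print(url)
    # print(run_query(url, sql))  # uncomment when a server is reachable
```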


© Copyright 2016-2022 YUNCHE · 浙ICP备2021017017号