Slow rename phase when Spark SQL writes to Hive

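Background

Hive version: 1.2.1; Spark version: 2.3.0

A Spark job that deduplicates 200 million rows took 12.5 h in total: 4 h for the deduplication itself, 2.5 h during which it was unclear what Spark was doing (no logs on the driver, none on the executors), and 6 h for the rename operation.

Part of the rename log: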

2019-09-19 22:34:22,097 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00002-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00002-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,111 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00003-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00003-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,128 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00004-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00004-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,143 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00005-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00005-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,160 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00006-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00006-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,175 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00007-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00007-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,192 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00008-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00008-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,207 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00009-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00009-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,223 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00010-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00010-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,238 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00011-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00011-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,253 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00012-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00012-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,267 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00013-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00013-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,281 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00014-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00014-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,296 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00015-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00015-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,315 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00016-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00016-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,331 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00017-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00017-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,345 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00018-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00018-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
2019-09-19 22:34:22,361 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: hdfs://cluster/apps/hive/warehouse/partitioned/.hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/date=2018-02-24/part-00019-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, dest: hdfs://cluster/apps/hive/warehouse/partitioned/date=2018-02-24/part-00019-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000, Status:true
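
The timestamps in this excerpt give a rough idea of what each rename costs. The following is a minimal Python sketch, not part of the original job: the helper name `mean_rename_interval_ms` is made up for illustration, and the sample lines are shortened copies of the first three log entries. It averages the gap between consecutive `Renaming src` entries. Across the 18 entries shown here the gap comes to roughly 15 ms, and every entry is logged by the `[Driver]` thread, which suggests the files are renamed one at a time, so the rename phase would grow with the number of output files.

```python
import re
from datetime import datetime, timedelta

# Timestamp that precedes "[Driver]" on a Hive "Renaming src" log line.
_TS = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[Driver\].*Renaming src:")


def mean_rename_interval_ms(lines):
    """Average gap in milliseconds between consecutive rename log entries."""
    stamps = []
    for line in lines:
        match = _TS.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    if len(stamps) < 2:
        raise ValueError("need at least two rename entries")
    # Dividing two timedeltas yields a plain float (here: milliseconds).
    span_ms = (stamps[-1] - stamps[0]) / timedelta(milliseconds=1)
    return span_ms / (len(stamps) - 1)


if __name__ == "__main__":
    # Shortened copies of the first three entries; paths are elided with "...".
    sample = [
        "2019-09-19 22:34:22,097 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: ...",
        "2019-09-19 22:34:22,111 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: ...",
        "2019-09-19 22:34:22,128 [Driver] INFO  hive.ql.metadata.Hive  - Renaming src: ...",
    ]
    print(mean_rename_interval_ms(sample))
```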

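How Spark SQL runs a Hive SQL job

  1. It first creates a temporary directory whose name starts with .hive-staging (from version 1.2.1 onward the default location is the target table's own directory) and writes the results there.
  2. Once execution finishes, the contents of the temporary directory are renamed into the corresponding target table directory.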
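
The path layout visible in the rename log illustrates step 2. The sketch below is illustrative only: `final_destination` is a hypothetical helper, and it assumes, based purely on the log lines in this post, that a staged file lives under `.hive-staging_*/-ext-*/` and ends up at the same relative path inside the table directory.

```python
def final_destination(staging_file):
    """Map a staged file to the location it is renamed to in the table directory.

    Assumes the layout seen in the log:
    <table>/.hive-staging_<id>/-ext-<n>/<partition>/<file>  ->  <table>/<partition>/<file>
    """
    parts = staging_file.split("/")
    for i, part in enumerate(parts):
        if part.startswith(".hive-staging"):
            # The staging directory is expected to be followed by an "-ext-NNNNN" directory.
            if i + 1 >= len(parts) or not parts[i + 1].startswith("-ext-"):
                raise ValueError("unexpected staging layout: " + staging_file)
            # Drop both components; everything else stays in place.
            return "/".join(parts[:i] + parts[i + 2:])
    raise ValueError("not a staging path: " + staging_file)


if __name__ == "__main__":
    src = (
        "hdfs://cluster/apps/hive/warehouse/partitioned/"
        ".hive-staging_hive_2019-09-19_19-06-31_561_3933642985231072924-1/-ext-10000/"
        "date=2018-02-24/part-00002-50d4ca62-4853-46d4-a1c9-2c15544290f4.c000"
    )
    print(final_destination(src))
```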

[Screenshot of the relevant source code: 企业微信截图15689482919715.png]

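The code shows two strategies. If the source directory and the destination directory share the same root directory, a copy operation is performed for every file under the source directory. Otherwise a single rename is performed, which only touches NameNode metadata and involves no additional data operations.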
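
The decision described above can be sketched roughly as follows. This is not Hive's actual code: `choose_move_strategy` and the strategy labels are invented for illustration, and reading "same root directory" as "the staging directory sits inside the destination directory" is an assumption of this sketch. The example paths are hypothetical apart from the warehouse path taken from the log.

```python
def _is_under(path, ancestor):
    """True if `path` lies strictly inside the directory `ancestor`."""
    prefix = ancestor.rstrip("/") + "/"
    return path.rstrip("/").startswith(prefix)


def choose_move_strategy(src_dir, dest_dir):
    """Pick how staged output reaches the table, following the two cases in the text."""
    if _is_under(src_dir, dest_dir):
        # Staging directory inside the destination: each file is handled individually.
        return "per_file"
    # Staging directory elsewhere: one rename of the whole directory (metadata only).
    return "rename_directory"


if __name__ == "__main__":
    table = "/apps/hive/warehouse/partitioned"
    # Default layout: staging directory created inside the table directory.
    print(choose_move_strategy(table + "/.hive-staging_hive_example/-ext-10000", table))
    # Staging directory placed under /tmp/hive instead.
    print(choose_move_strategy("/tmp/hive/.hive-staging_hive_example/-ext-10000", table))
```

Under this reading, moving the staging directory out of the table directory is what switches the job from the slow per-file path to the single rename.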

Solution

Change the following parameters in the hive-site.xml configuration file:

<property>
	<name>hive.exec.stagingdir</name>
	<value>/tmp/hive/.hive-staging</value>
	<description>Location of the temporary directory created by Hive jobs</description>
</property>
<property>
	<name>hive.insert.into.multilevel.dirs</name>
	<value>true</value>
	<description>When hive.insert.into.multilevel.dirs is false, the parent directory of the insert target directory must already exist; when true, it is allowed to be missing</description>
</property>
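
After editing the file, it may be worth checking that it still parses and carries the intended values. The following is a small sketch using only the Python standard library. `read_hive_properties` is an illustrative helper, and the sample assumes the two properties sit inside the usual `<configuration>` root element of hive-site.xml.

```python
import xml.etree.ElementTree as ET


def read_hive_properties(xml_text):
    """Return a {name: value} dict for every <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Sample document containing only the two properties changed above.
SAMPLE = """
<configuration>
  <property>
    <name>hive.exec.stagingdir</name>
    <value>/tmp/hive/.hive-staging</value>
  </property>
  <property>
    <name>hive.insert.into.multilevel.dirs</name>
    <value>true</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key, value in read_hive_properties(SAMPLE).items():
        print(key, "=", value)
```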

Title: Slow rename phase when Spark SQL writes to Hive
Author: ludengke95
URL: http://xvhi.ludengke95.xyz/articles/2019/09/20/1568951046771.html