Hive: From Getting Started to In-Depth Analysis (Part 1)

1 Introduction to Hive

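1.1 What Hive Is

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-like querying.

In essence, Hive translates SQL statements into MapReduce programs.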
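To make the SQL-to-MapReduce idea concrete, here is a minimal sketch in plain Python of how a GROUP BY aggregation could be split into map, shuffle, and reduce steps. It is only an illustration of the concept, not Hive's actual query planner; the `logs` table, its columns, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a table logs(user, bytes); the names are illustrative only.
rows = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", 1)]

def map_phase(rows):
    # Map: emit (group key, value) for every input row.
    for user, nbytes in rows:
        yield user, nbytes

def shuffle(pairs):
    # Shuffle: collect all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each group, here with SUM.
    return {key: sum(values) for key, values in groups.items()}

# Rough equivalent of: SELECT user, SUM(bytes) FROM logs GROUP BY user;
result = reduce_phase(shuffle(map_phase(rows)))
print(result)
```

On a real cluster the map and reduce steps run in parallel on different nodes, which is the part Hive saves you from writing by hand.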

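1.2 Why Use Hive

1. The problems we face

The learning cost for staff is too high

Project schedules are too short

I only need a simple environment

How do I get this done with MapReduce?

Complex queries are hard to write

How do I implement a join?

2. Why use Hive

Its interface uses SQL-like syntax, which enables rapid development

It removes the need to write MapReduce code, lowering the learning cost for developers

It is easy to extend with new functionality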

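1.3 Characteristics of Hive

1. Scalability

The cluster that Hive runs on can be scaled freely, generally without restarting the service

2. Extensibility

Hive supports user-defined functions, so users can implement functions that fit their own needs

3. Fault tolerance

Hive has good fault tolerance: a SQL statement can still complete when a node fails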

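1.4 Hive and Hadoop

[Figure: relationship between Hive and Hadoop]

1.5 Hive and Traditional Databases

[Figure: relationship between Hive and traditional databases]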

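1.6 History of Hive

Implemented and open-sourced by Facebook

March 2011: version 0.7.0 released, a major upgrade that added simple indexing, HAVING, and many other advanced features

June 2011: version 0.7.1 released, fixing a number of bugs, such as problems with using JDBC on Windows

December 2011: version 0.8.0 released, a major upgrade that added INSERT INTO, HA, and many other advanced features

February 5, 2012: version 0.8.1 released, fixing a number of bugs, for example allowing Hive to run on both Hadoop 0.20.x and 0.23.0

April 30, 2012: version 0.9.0 released, a major improvement release that added support for Hadoop 1.0.0 and implemented BETWEEN, among other features.

1.7 Future Development of Hive

Add more features found in traditional databases, such as stored procedures

Improve the performance of the generated MapReduce jobs

Provide true data warehouse capabilities

Strengthen the UI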

2 Software Preparation and Environment Planning

2.1 Hadoop Environment

Hadoop installation path: /home/test/Desktop/hadoop-1.0.0/

Hadoop metadata directory: /home/test/data/core/namenode

Hadoop data directory: /home/test/data/core/datanode

Hive installation path: /home/test/Desktop/

Hive data path: /user/hive/warehouse

Hive metadata

Third-party database: Derby or MySQL

2.2 Software

Ubuntu

JDK

java 1.6.0_27

Hadoop

hadoop-1.0.0.tar

Hive

hive-0.8.1.tar

2.3 Project Structure

[Figure: project structure]

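2.4 Hive Configuration Files

1. Overview of the Hive configuration files

hive-site.xml: Hive's configuration file

hive-env.sh: Hive's runtime environment file

hive-default.xml.template: default template

hive-env.sh.template: default configuration for hive-env.sh

hive-exec-log4j.properties.template: default exec logging configuration

hive-log4j.properties.template: default logging configuration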

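2. hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>test</value>
  <description>password to use against metastore database</description>
</property>

3. hive-env.sh

Set the path to Hive's configuration files: export HIVE_CONF_DIR=your path

Set the Hadoop installation path: HADOOP_HOME=your hadoop home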

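2.5 Installation Using the Derby Database

1. What the Derby installation mode is

Apache Derby is a database written entirely in Java, so it is cross-platform, but it has to run inside a JVM

Derby is an open source product distributed under the Apache License 2.0

In this mode the metadata is stored in a Derby database. It is also Hive's default installation mode.

2. Install Hive

Unpack Hive: tar xvf hive-0.8.1.tar -C /home/test/Desktop

Create a symbolic link: ln -s hive-0.8.1 hive

Add environment variables

export HIVE_HOME=/home/test/Desktop/hive
export PATH=….HIVE_HOME/bin:$PATH:.

3. Configure Hive

Change to the hive/conf directory

Create hive-env.sh from hive-env.sh.template

cp hive-env.sh.template hive-env.sh

Edit hive-env.sh

Specify the path to Hive's configuration files

export HIVE_CONF_DIR=/home/test/Desktop/hive/conf

Specify the Hadoop path

HADOOP_HOME=/home/test/Desktop/hadoop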

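4. hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>APP</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mine</value>
  <description>password to use against metastore database</description>
</property>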

5. Start Hive

At the command line, type

hive

Output

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/test/Desktop/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
Hive history file=/tmp/test/hive_job_log_test_201208260529_167273830.txt
hive>

6. Test statements

Create a test table named test

create table test (key string);
show tables;

2.6 Installation Using the MySQL Database

1. Install MySQL

On Ubuntu, install with apt-get

sudo apt-get install mysql-server

Create the hive database

create database hive;

Create the hive user and grant privileges

grant all on hive.* to hive@'%' identified by 'hive';
flush privileges;

2. Install Hive

Unpack Hive:

tar xvf hive-0.8.1.tar -C /home/test/Desktop

Create a symbolic link:

ln -s hive-0.8.1 hive

Add environment variables

export HIVE_HOME=/home/test/Desktop/hive
export PATH=….HIVE_HOME/bin:$PATH:.

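3. Edit hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>true</value>
  <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>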

4. Start Hive

At the command line, type: hive

Output

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/test/Desktop/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
Hive history file=/tmp/test/hive_job_log_test_201208260529_167273830.txt
hive>

5. Test statements

Create a test table named test

create table test (key string);
show tables;

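3 Hive Built-in Operators and Function Development

3.1 Relational Operators

Equality: =

Inequality: <>

Less than: <

Less than or equal: <=

Greater than: >

Greater than or equal: >=

Null test: IS NULL

Non-null test: IS NOT NULL

LIKE comparison: LIKE

Java-style LIKE: RLIKE

REGEXP operation: REGEXP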

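Equality: =

Syntax: A = B

Operand types: all primitive types

Description: TRUE if expression A equals expression B; otherwise FALSE

Example: hive> select 1 from dual where 1=1;

Inequality: <>

Syntax: A <> B

Operand types: all primitive types

Description: NULL if expression A or expression B is NULL; TRUE if A is not equal to B; otherwise FALSE

Example: hive> select 1 from dual where 1 <> 2;

Less than: <

Syntax: A < B

Operand types: all primitive types

Description: NULL if expression A or expression B is NULL; TRUE if A is less than B; otherwise FALSE

Example: hive> select 1 from dual where 1 < 2;

Less than or equal: <=

Syntax: A <= B

Operand types: all primitive types

Description: NULL if expression A or expression B is NULL; TRUE if A is less than or equal to B; otherwise FALSE

Example: hive> select 1 from dual where 1 <= 1;

Greater than or equal: >=

Syntax: A >= B

Operand types: all primitive types

Description: NULL if expression A or expression B is NULL; TRUE if A is greater than or equal to B; otherwise FALSE

Example: hive> select 1 from dual where 1 >= 1;

Null test: IS NULL

Syntax: A IS NULL

Operand types: all types

Description: TRUE if the value of expression A is NULL; otherwise FALSE

Example: hive> select 1 from dual where null is null;

Non-null test: IS NOT NULL

Syntax: A IS NOT NULL

Operand types: all types

Description: FALSE if the value of expression A is NULL; otherwise TRUE

Example: hive> select 1 from dual where 1 is not null;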

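LIKE comparison: LIKE

Syntax: A LIKE B

Operand types: strings

Description: NULL if string A or string B is NULL; TRUE if string A matches the simple SQL pattern B; otherwise FALSE. In B, the character "_" matches any single character and "%" matches any number of characters.

Example: hive> select 1 from dual where 'key' like 'foot%';

hive> select 1 from dual where 'key' like 'foot____';

Note: to negate the comparison, use NOT A LIKE B

hive> select 1 from dual where NOT 'key' like 'fff%';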
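The LIKE rules can be sketched in Python by translating the pattern into a regular expression. This is a minimal emulation for illustration, not Hive's implementation: the function name `sql_like` is hypothetical, the string 'football' is a made-up input, and escape sequences inside the pattern are not handled.

```python
import re

def sql_like(value, pattern):
    """Emulate A LIKE B: '_' is one character, '%' is any run of characters."""
    if value is None or pattern is None:
        return None  # NULL in, NULL out
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is a literal
    # LIKE has to match the whole string, hence fullmatch.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

print(sql_like("football", "foot%"))
print(sql_like("key", "foot%"))
```

Note the contrast with RLIKE, which takes a full regular expression rather than the two wildcards shown here.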

Java-style LIKE: RLIKE

Syntax: A RLIKE B

Operand types: strings

Description: NULL if string A or string B is NULL; TRUE if string A matches the Java regular expression B; otherwise FALSE.

Example: hive> select 1 from dual where 'footbar' rlike '^f.*r$';

Note: to test whether a string consists entirely of digits:

hive> select 1 from dual where '123456' rlike '^\\d+$';

hive> select 1 from dual where '123456aa' rlike '^\\d+$';

REGEXP operation: REGEXP

Syntax: A REGEXP B

Operand types: strings

Description: same behavior as RLIKE

Example: hive> select 1 from dual where 'key' REGEXP '^f.*r$';

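3.2 Logical and Arithmetic Operations

Addition: +

Subtraction: -

Multiplication: *

Division: /

Modulo: %

Bitwise AND: &

Bitwise OR: |

Bitwise XOR: ^

Bitwise NOT: ~

Logical AND: AND

Logical OR: OR

Logical NOT: NOT

Rounding function: round

Rounding to a given precision: round

Floor function: floor

Ceiling function: ceil

Ceiling function: ceiling

Random number function: rand

Natural exponential function: exp

Base-10 logarithm function: log10

Base-2 logarithm function: log2

Logarithm function: log

Power function: pow

Power function: power

Square root function: sqrt

Binary function: bin

Hexadecimal function: hex

Reverse hexadecimal function: unhex

Base conversion function: conv

Absolute value function: abs

Positive modulo function: pmod

Sine function: sin

Arcsine function: asin

Cosine function: cos

Arccosine function: acos

positive function: positive

negative function: negative

UNIX timestamp to date function: from_unixtime

Current UNIX timestamp function: unix_timestamp

Date to UNIX timestamp function: unix_timestamp

Date in a given format to UNIX timestamp function: unix_timestamp

Datetime to date function: to_date

Date to year function: year

Date to month function: month

Date to day function: day

Date to hour function: hour

Date to minute function: minute

Date to second function: second

Date to week function: weekofyear

Date difference function: datediff

Date addition function: date_add

Date subtraction function: date_sub

If function: if

First non-null value function: COALESCE

Conditional function: CASE

String length function: length

String reversal function: reverse

String concatenation function: concat

Concatenation with separator function: concat_ws

Substring function: substr, substring

Uppercase function: upper, ucase

Lowercase function: lower, lcase

Trim function: trim

Left trim function: ltrim

Right trim function: rtrim

Regular expression replace function: regexp_replace

Regular expression extract function: regexp_extract

URL parsing function: parse_url

JSON parsing function: get_json_object

Space string function: space

Repeat string function: repeat

First-character ASCII function: ascii

Left pad function: lpad

Right pad function: rpad

Split string function: split

Set lookup function: find_in_set

Map construction: map

Struct construction: struct

Array construction: array

Array access: A[n]

Map access: M[key]

Struct access: S.x

Map size function: size(Map)

Array size function: size(Array)

Type conversion functions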

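1. Addition: +

Syntax: A + B

Operand types: all numeric types

Description: Returns the sum of A and B. The result type is the smallest common parent type of the types of A and B (see the data type hierarchy). For example, int + int generally yields int, while int + double generally yields double

Example: hive> select 1 + 9 from dual; 10

2. Subtraction: -

Syntax: A - B

Operand types: all numeric types

Description: Returns the result of subtracting B from A. The result type is the smallest common parent type of the types of A and B (see the data type hierarchy). For example, int - int generally yields int, while int - double generally yields double

Example: hive> select 10 - 5 from dual; 5

3. Multiplication: *

Syntax: A * B

Operand types: all numeric types

Description: Returns the product of A and B. The result type is the smallest common parent type of the types of A and B (see the data type hierarchy). Note that if the product exceeds the range of the default result type, cast the operands to a wider numeric type

Example: hive> select 40 * 5 from dual; 200

4. Division: /

Syntax: A / B

Operand types: all numeric types

Description: Returns the result of dividing A by B. The result type is double

Example: hive> select 40 / 5 from dual; 8.0

Note: the highest-precision data type in the Hive version used here is double, which carries only about 16 significant decimal digits, so take particular care with division. A divisor written with more digits than a double can hold is silently rounded:

hive> select ceil(28.0/6.999999999999999999999) from dual limit 1; 4

hive> select ceil(28.0/6.99999999999999) from dual limit 1; 5

5. Modulo: %

Syntax: A % B

Operand types: all numeric types

Description: Returns the remainder of A divided by B. The result type is the smallest common parent type of the types of A and B (see the data type hierarchy).

Example: hive> select 41 % 5 from dual; 1

hive> select 8.4 % 4 from dual; 0.40000000000000036

Note: precision is a significant issue in Hive. For operations like this it is best to fix the precision with round

hive> select round(8.4 % 4 , 2) from dual; 0.4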
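The precision effects in the division and modulo notes are not specific to Hive; they come from IEEE 754 double arithmetic. The short Python check below reproduces them, on the assumption that Python floats and Hive's double behave alike here because both are 64-bit IEEE doubles.

```python
import math

# Python floats are 64-bit IEEE 754 doubles, like Hive's double type.
quotient = 28.0 / 6.99999999999999
print(quotient)              # slightly above 4.0
print(math.ceil(quotient))   # 5, because the quotient is not exactly 4.0

remainder = 8.4 % 4
print(remainder)             # 0.40000000000000036
print(round(remainder, 2))   # 0.4
```

Rounding to the number of decimals you actually need, as the note suggests, hides the representation error.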

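6. Bitwise AND: &

Syntax: A & B

Operand types: all numeric types

Description: Returns the bitwise AND of A and B. The result type is the smallest common parent type of the types of A and B (see the data type hierarchy).

Example: hive> select 4 & 8 from dual; 0

hive> select 6 & 4 from dual; 4

7. Bitwise OR: |

Syntax: A | B

Operand types: all numeric types

Description: Returns the bitwise OR of A and B. The result type is the smallest common parent type of the types of A and B (see the data type hierarchy).

Example: hive> select 4 | 8 from dual; 12

hive> select 6 | 8 from dual; 14

8. Bitwise XOR: ^

Syntax: A ^ B

Operand types: all numeric types

Description: Returns the bitwise XOR of A and B. The result type is the smallest common parent type of the types of A and B (see the data type hierarchy).

Example: hive> select 4 ^ 8 from dual; 12

hive> select 6 ^ 4 from dual; 2

9. Bitwise NOT: ~

Syntax: ~A

Operand types: all numeric types

Description: Returns the bitwise complement of A. The result type is the type of A.

Example: hive> select ~6 from dual; -7

hive> select ~4 from dual; -5

10. Logical AND: AND

Syntax: A AND B

Operand types: boolean

Description: TRUE if both A and B are TRUE; otherwise FALSE. NULL if A or B is NULL

Example: hive> select 1 from dual where 1=1 and 2=2; 1

11. Logical OR: OR

Syntax: A OR B

Operand types: boolean

Description: TRUE if A is TRUE, B is TRUE, or both are TRUE; otherwise FALSE

Example: hive> select 1 from dual where 1=2 or 2=2; 1

12. Logical NOT: NOT

Syntax: NOT A

Operand types: boolean

Description: TRUE if A is FALSE; NULL if A is NULL; otherwise FALSE

Example: hive> select 1 from dual where not 1=2;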

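13. Rounding function: round

Syntax: round(double a)

Return type: BIGINT

Description: Returns the integer part of the double value, rounded half up

Example: hive> select round(3.1415926) from dual; 3

hive> select round(3.5) from dual; 4

hive> create table dual as select round(9542.158) from dual;

hive> describe dual; _c0 bigint

14. Rounding to a given precision: round

Syntax: round(double a, int d)

Return type: DOUBLE

Description: Returns a as a double rounded to d decimal places

Example: hive> select round(3.1415926,4) from dual; 3.1416

15. Floor function: floor

Syntax: floor(double a)

Return type: BIGINT

Description: Returns the largest integer that is less than or equal to the double value

Example: hive> select floor(3.1415926) from dual; 3

hive> select floor(25) from dual; 25

16. Ceiling function: ceil

Syntax: ceil(double a)

Return type: BIGINT

Description: Returns the smallest integer that is greater than or equal to the double value

Example: hive> select ceil(3.1415926) from dual; 4

hive> select ceil(46) from dual; 46

17. Ceiling function: ceiling

Syntax: ceiling(double a)

Return type: BIGINT

Description: Same as ceil

Example: hive> select ceiling(3.1415926) from dual; 4

hive> select ceiling(46) from dual; 46

18. Random number function: rand

Syntax: rand(), rand(int seed)

Return type: double

Description: Returns a random number between 0 and 1. If a seed is given, the sequence of random numbers is reproducible

Example: hive> select rand() from dual; 0.5577432776034763

19. Natural exponential function: exp

Syntax: exp(double a)

Return type: double

Description: Returns e, the base of the natural logarithm, raised to the power a

Example: hive> select exp(2) from dual; 7.38905609893065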

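20. Natural logarithm function: ln

Syntax: ln(double a)

Return type: double

Description: Returns the natural logarithm of a

21. Base-10 logarithm function: log10

Syntax: log10(double a)

Return type: double

Description: Returns the base-10 logarithm of a

Example: hive> select log10(100) from dual; 2.0

22. Base-2 logarithm function: log2

Syntax: log2(double a)

Return type: double

Description: Returns the base-2 logarithm of a

Example: hive> select log2(8) from dual; 3.0

23. Logarithm function: log

Syntax: log(double base, double a)

Return type: double

Description: Returns the logarithm of a to the given base

Example: hive> select log(4,256) from dual; 4.0

24. Power function: pow

Syntax: pow(double a, double p)

Return type: double

Description: Returns a raised to the power p

Example: hive> select pow(2,4) from dual; 16.0

25. Square root function: sqrt

Syntax: sqrt(double a)

Return type: double

Description: Returns the square root of a

Example: hive> select sqrt(16) from dual; 4.0

26. Binary function: bin

Syntax: bin(BIGINT a)

Return type: string

Description: Returns the binary representation of a

Example: hive> select bin(7) from dual; 111

27. Hexadecimal function: hex

Syntax: hex(BIGINT a)

Return type: string

Description: If the argument is an integer, returns its hexadecimal representation; if the argument is a string, returns the hexadecimal representation of that string

Example: hive> select hex(17) from dual; 11

hive> select hex('abc') from dual; 616263

28. Reverse hexadecimal function: unhex

Syntax: unhex(string a)

Return type: string

Description: Returns the string that the hexadecimal string represents

Example: hive> select unhex('616263') from dual; abc

hive> select unhex('11') from dual; -

hive> select unhex(616263) from dual; abc

29. Base conversion function: conv

Syntax: conv(BIGINT num, int from_base, int to_base)

Return type: string

Description: Converts the number num from base from_base to base to_base

Example: hive> select conv(17,10,16) from dual; 11

hive> select conv(17,10,2) from dual; 10001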
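Base conversion is easy to reason about with a small model. The following Python sketch emulates conv for non-negative inputs and bases from 2 to 36; it is an illustration rather than Hive's code, and the handling of negative numbers or invalid digits is deliberately left out.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def conv(num, from_base, to_base):
    """Convert a non-negative number between bases 2..36 and return a string."""
    value = int(str(num), from_base)  # parse the digits in the source base
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, digit = divmod(value, to_base)  # peel off the lowest digit
        out.append(DIGITS[digit])
    return "".join(reversed(out))

print(conv(17, 10, 16))
print(conv(17, 10, 2))
```

With this model, bin(a) corresponds to conv(a, 10, 2) and hex(a) on an integer corresponds to conv(a, 10, 16).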

30. Absolute value function: abs

Syntax: abs(double a), abs(int a)

Return type: double, int

Description: Returns the absolute value of a

Example: hive> select abs(-3.9) from dual; 3.9

hive> select abs(10.9) from dual; 10.9

31. Positive modulo function: pmod

Syntax: pmod(int a, int b), pmod(double a, double b)

Return type: int, double

Description: Returns the non-negative remainder of a divided by b

Example: hive> select pmod(9,4) from dual; 1

hive> select pmod(-9,4) from dual; 3
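The difference between % and pmod shows up only for negative operands. The sketch below models it in Python for integers, assuming that Hive's % follows Java's truncated remainder (the sign of the result follows the dividend); the helper name `java_remainder` is hypothetical.

```python
import math

def java_remainder(a, b):
    # Truncated remainder: the result takes the sign of the dividend,
    # which is how the % operator behaves in Java.
    return int(math.fmod(a, b))

def pmod(a, b):
    # Shift a negative remainder into the range [0, b) for positive b.
    return (java_remainder(a, b) + b) % b

print(java_remainder(-9, 4))  # -1
print(pmod(-9, 4))            # 3
```

This makes pmod handy for bucketing on values that may be negative, such as hash codes.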

32. Sine function: sin

Syntax: sin(double a)

Return type: double

Description: Returns the sine of a

Example: hive> select sin(0.8) from dual; 0.7173560908995228

33. Arcsine function: asin

Syntax: asin(double a)

Return type: double

Description: Returns the arcsine of a

Example: hive> select asin(0.7173560908995228) from dual; 0.8

34. Cosine function: cos

Syntax: cos(double a)

Return type: double

Description: Returns the cosine of a

Example: hive> select cos(0.9) from dual; 0.6216099682706644

35. Arccosine function: acos

Syntax: acos(double a)

Return type: double

Description: Returns the arccosine of a

Example: hive> select acos(0.6216099682706644) from dual; 0.9

36. positive function: positive

Syntax: positive(int a), positive(double a)

Return type: int, double

Description: Returns a

Example: hive> select positive(-10) from dual; -10

hive> select positive(12) from dual; 12

37. negative function: negative

Syntax: negative(int a), negative(double a)

Return type: int, double

Description: Returns -a

Example: hive> select negative(-5) from dual; 5

hive> select negative(8) from dual; -8

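38. UNIX timestamp to date function: from_unixtime

Syntax: from_unixtime(bigint unixtime[, string format])

Return type: string

Description: Converts a UNIX timestamp (the number of seconds since 1970-01-01 00:00:00 UTC) to a time string in the current time zone

Example: hive> select from_unixtime(1323308943,'yyyyMMdd') from dual; 20111208

39. Current UNIX timestamp function: unix_timestamp

Syntax: unix_timestamp()

Return type: bigint

Description: Returns the current UNIX timestamp in seconds

Example: hive> select unix_timestamp() from dual; 1323309615

40. Date to UNIX timestamp function: unix_timestamp

Syntax: unix_timestamp(string date)

Return type: bigint

Description: Converts a date in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp. Returns 0 if the conversion fails.

Example: hive> select unix_timestamp('2011-12-07 13:01:03') from dual; 1323234063

41. Date in a given format to UNIX timestamp function: unix_timestamp

Syntax: unix_timestamp(string date, string pattern)

Return type: bigint

Description: Converts a date in the format given by pattern to a UNIX timestamp. Returns 0 if the conversion fails.

Example: hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss') from dual; 1323234063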
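The timestamp values in these examples depend on the session time zone. The Python sketch below reproduces them under the assumption that the examples were run in UTC+8; that offset is inferred from the numbers, not stated in the text. Python's strftime codes stand in for Hive's Java-style patterns, so `%Y%m%d` plays the role of `yyyyMMdd`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the session time zone is UTC+8, which reproduces the quoted values.
TZ = timezone(timedelta(hours=8))

def from_unixtime(seconds, fmt="%Y-%m-%d %H:%M:%S"):
    # Seconds since the epoch -> formatted local time string.
    return datetime.fromtimestamp(seconds, TZ).strftime(fmt)

def unix_timestamp(text, fmt="%Y-%m-%d %H:%M:%S"):
    # Local time string -> seconds since the epoch.
    local = datetime.strptime(text, fmt).replace(tzinfo=TZ)
    return int(local.timestamp())

print(from_unixtime(1323308943, "%Y%m%d"))
print(unix_timestamp("2011-12-07 13:01:03"))
```

Running the same conversion under a different offset shifts the result by the difference in hours, which is a common source of off-by-a-day bugs in date partitions.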

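42. Datetime to date function: to_date

Syntax: to_date(string timestamp)

Return type: string

Description: Returns the date part of a datetime value.

Example: hive> select to_date('2011-12-08 10:03:01') from dual; 2011-12-08

43. Date to year function: year

Syntax: year(string date)

Return type: int

Description: Returns the year of the date.

Example: hive> select year('2011-12-08 10:03:01') from dual; 2011

hive> select year('2012-12-08') from dual; 2012

44. Date to month function: month

Syntax: month(string date)

Return type: int

Description: Returns the month of the date.

Example: hive> select month('2011-12-08 10:03:01') from dual; 12

hive> select month('2011-08-08') from dual; 8

45. Date to day function: day

Syntax: day(string date)

Return type: int

Description: Returns the day of the month of the date.

Example: hive> select day('2011-12-08 10:03:01') from dual; 8

hive> select day('2011-12-24') from dual; 24

46. Date to hour function: hour

Syntax: hour(string date)

Return type: int

Description: Returns the hour of the date.

Example: hive> select hour('2011-12-08 10:03:01') from dual; 10

47. Date to minute function: minute

Syntax: minute(string date)

Return type: int

Description: Returns the minute of the date.

Example: hive> select minute('2011-12-08 10:03:01') from dual; 3

48. Date to second function: second

Syntax: second(string date)

Return type: int

Description: Returns the second of the date.

Example: hive> select second('2011-12-08 10:03:01') from dual; 1

49. Date to week function: weekofyear

Syntax: weekofyear(string date)

Return type: int

Description: Returns the week number of the date within its year.

Example: hive> select weekofyear('2011-12-08 10:03:01') from dual; 49

50. Date difference function: datediff

Syntax: datediff(string enddate, string startdate)

Return type: int

Description: Returns the number of days from the start date to the end date.

Example: hive> select datediff('2012-12-08','2012-05-09') from dual; 213

51. Date addition function: date_add

Syntax: date_add(string startdate, int days)

Return type: string

Description: Returns the date that is days days after startdate.

Example: hive> select date_add('2012-12-08',10) from dual; 2012-12-18

52. Date subtraction function: date_sub

Syntax: date_sub(string startdate, int days)

Return type: string

Description: Returns the date that is days days before startdate.

Example: hive> select date_sub('2012-12-08',10) from dual; 2012-11-28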
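The date arithmetic functions can be cross-checked with Python's standard library. This is a minimal sketch, assuming that only the leading yyyy-MM-dd part of the input matters and that weekofyear follows ISO 8601 week numbering; the helper `_parse` is hypothetical.

```python
from datetime import date, timedelta

def _parse(text):
    # Only the leading yyyy-MM-dd part is used; any time of day is ignored.
    return date.fromisoformat(text[:10])

def datediff(end, start):
    return (_parse(end) - _parse(start)).days

def date_add(start, days):
    return (_parse(start) + timedelta(days=days)).isoformat()

def date_sub(start, days):
    return date_add(start, -days)

def weekofyear(text):
    return _parse(text).isocalendar()[1]  # ISO 8601 week number

print(datediff("2012-12-08", "2012-05-09"))
print(date_add("2012-12-08", 10))
```

Because the inputs and outputs are plain strings, these functions compose directly, for example date_sub(to_date(...), 7) for a one-week lookback.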

53. If function: if

Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)

Return type: T

Description: Returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull

Example: hive> select if(1=2,100,200) from dual; 200

hive> select if(1=1,100,200) from dual; 100

54. First non-null value function: COALESCE

Syntax: COALESCE(T v1, T v2, …)

Return type: T

Description: Returns the first non-null value among the arguments; returns NULL if all values are NULL

Example: hive> select COALESCE(null,'100','50') from dual; 100

55. Conditional function: CASE

Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END

Return type: T

Description: Returns c if a equals b; returns e if a equals d; otherwise returns f

Example: hive> select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from dual; mary

56. String length function: length

Syntax: length(string A)

Return type: int

Description: Returns the length of string A

Example: hive> select length('abcedfg') from dual; 7

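57. String reversal function: reverse

Syntax: reverse(string A)

Return type: string

Description: Returns string A reversed

Example: hive> select reverse('abcedfg') from dual; gfdecba

58. String concatenation function: concat

Syntax: concat(string A, string B…)

Return type: string

Description: Returns the concatenation of the input strings; any number of input strings is supported

Example: hive> select concat('abc','def','gh') from dual;

abcdefgh

59. Concatenation with separator function: concat_ws

Syntax: concat_ws(string SEP, string A, string B…)

Return type: string

Description: Returns the concatenation of the input strings, with SEP as the separator between them

Example: hive> select concat_ws(',','abc','def','gh') from dual;

abc,def,gh

60. Substring function: substr, substring

Syntax: substr(string A, int start), substring(string A, int start)

Return type: string

Description: Returns the part of string A from position start to the end. Positions start at 1, and a negative start counts back from the end of the string

Example: hive> select substr('abcde',3) from dual; cde

hive> select substring('abcde',3) from dual; cde

hive> select substr('abcde',-1) from dual; e

61. Substring function: substr, substring

Syntax: substr(string A, int start, int len), substring(string A, int start, int len)

Return type: string

Description: Returns the part of string A that starts at position start and has length len

Example: hive> select substr('abcde',3,2) from dual; cd

hive> select substring('abcde',3,2) from dual; cd

hive> select substring('abcde',-2,2) from dual; de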
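The 1-based and negative start positions are the part of substr that most often trips people up. The Python sketch below models the examples above; the treatment of a start of 0 and of out-of-range positions is an assumption and may differ from Hive.

```python
def substr(s, start, length=None):
    """1-based substring; a negative start counts back from the end."""
    if start > 0:
        begin = start - 1
    elif start < 0:
        begin = max(len(s) + start, 0)
    else:
        begin = 0  # assumption: treat position 0 like position 1
    end = len(s) if length is None else begin + max(length, 0)
    return s[begin:end]

print(substr("abcde", 3))
print(substr("abcde", -2, 2))
```

The key difference from Python slicing is the 1-based start: substr('abcde', 3) corresponds to 'abcde'[2:].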

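62. Uppercase function: upper, ucase

Syntax: upper(string A), ucase(string A)

Return type: string

Description: Returns string A in uppercase

Example: hive> select upper('abSEd') from dual; ABSED

hive> select ucase('abSEd') from dual; ABSED

63. Lowercase function: lower, lcase

Syntax: lower(string A), lcase(string A)

Return type: string

Description: Returns string A in lowercase

Example: hive> select lower('abSEd') from dual; absed

hive> select lcase('abSEd') from dual; absed

64. Trim function: trim

Syntax: trim(string A)

Return type: string

Description: Removes spaces from both ends of the string

Example: hive> select trim(' abc ') from dual; abc

65. Left trim function: ltrim

Syntax: ltrim(string A)

Return type: string

Description: Removes spaces from the left end of the string

Example: hive> select ltrim(' abc ') from dual; abc

64. Right trim function: rtrim

Syntax: rtrim(string A)

Return type: string

Description: Removes spaces from the right end of the string

Example: hive> select rtrim(' abc ') from dual; abc

65. Regular expression replace function: regexp_replace

Syntax: regexp_replace(string A, string B, string C)

Return type: string

Description: Replaces the parts of string A that match the Java regular expression B with C. Note that escape characters are needed in some cases. The function is similar to regexp_replace in Oracle.

Example: hive> select regexp_replace('foobar', 'oo|ar', '') from dual; fb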

66. Regular expression extract function: regexp_extract

Syntax: regexp_extract(string subject, string pattern, int index)

Return type: string

Description: Matches the string subject against the regular expression pattern and returns the capture group selected by index. Index 0 returns the whole match.

Example: hive> select regexp_extract('foothebar','foo(.*?)(bar)', 1) from dual; the

hive> select regexp_extract('foothebar','foo(.*?)(bar)', 2) from dual; bar

hive> select regexp_extract('foothebar','foo(.*?)(bar)', 0) from dual; foothebar

Note: escape characters are needed in some cases. In the query below the equals sign is escaped with a double backslash, following the rules of Java regular expressions.

select data_field,
regexp_extract(data_field,'.*?bgStart\\=([^&]+)',1) as aaa,
regexp_extract(data_field,'.*?contentLoaded_headStart\\=([^&]+)',1) as bbb,
regexp_extract(data_field,'.*?AppLoad2Req\\=([^&]+)',1) as ccc
from pt_nginx_loginlog_st
where pt = '2012-03-26' limit 2;
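Capture-group indexing works the same way in most regex engines, so the behavior can be tried out in Python first. This sketch uses Python's `re` module in place of Java's regex engine, which agrees with it for the patterns shown; returning an empty string when nothing matches is an assumption, and the query string in the second call is made-up sample data.

```python
import re

def regexp_extract(subject, pattern, index=1):
    """Return capture group `index` of the first match, or '' if there is none."""
    match = re.search(pattern, subject)
    if match is None:
        return ""
    return match.group(index) or ""

print(regexp_extract("foothebar", r"foo(.*?)(bar)", 1))
# In a Python raw string a single backslash is enough; the HiveQL literal needs two.
print(regexp_extract("a=1&bgStart=42&c=3", r".*?bgStart\=([^&]+)", 1))
```

The pattern `([^&]+)` captures everything up to the next ampersand, which is why it suits key=value log fields like the ones in the query above.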

67. URL parsing function: parse_url

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])

Return type: string

Description: Returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

Example:

hive> select parse_url('http://facebook.com/path2/p.php?k1=v1&k2=v2#Ref1', 'HOST') from dual; facebook.com

hive> select parse_url('http://facebook.com/path2/p.php?k1=v1&k2=v2#Ref1', 'QUERY','k1') from dual; v1
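To see which piece of a URL each partToExtract value refers to, here is a rough Python counterpart built on `urllib.parse`. It is a sketch for illustration only: FILE and USERINFO are left out, and the mapping of names to URL components is my reading of the description above.

```python
from urllib.parse import urlsplit, parse_qs

def parse_url(url, part, key=None):
    parts = urlsplit(url)
    if part == "QUERY" and key is not None:
        # Third argument: pick a single parameter out of the query string.
        values = parse_qs(parts.query).get(key)
        return values[0] if values else None
    table = {
        "HOST": parts.hostname,
        "PATH": parts.path,
        "QUERY": parts.query,
        "REF": parts.fragment,
        "PROTOCOL": parts.scheme,
        "AUTHORITY": parts.netloc,
    }
    return table.get(part)

URL = "http://facebook.com/path2/p.php?k1=v1&k2=v2#Ref1"
print(parse_url(URL, "HOST"))
print(parse_url(URL, "QUERY", "k1"))
```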

68. JSON parsing function: get_json_object

Syntax: get_json_object(string json_string, string path)

Return type: string

Description: Parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.

Example:

hive> select get_json_object('{"store":
    > {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
    > "bicycle":{"price":19.95,"color":"red"}
    > },
    > "email":"amy@only_for_json_udf_test.net",
    > "owner":"amy"
    > }
    > ','$.owner') from dual;

amy
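The path argument uses a small JSONPath-like syntax. The Python sketch below handles only the two forms seen most often, `.name` and `[index]`, and returns native Python values rather than strings; both are simplifications compared with Hive, and the paths beyond `$.owner` are my own illustrations.

```python
import json
import re

SAMPLE = ('{"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],'
          ' "bicycle":{"price":19.95,"color":"red"}},'
          ' "email":"amy@only_for_json_udf_test.net", "owner":"amy"}')

def get_json_object(json_string, path):
    """Resolve a path such as $.store.fruit[0].type; return None on failure."""
    try:
        node = json.loads(json_string)
    except (TypeError, ValueError):
        return None  # invalid JSON input
    if not path.startswith("$"):
        return None
    # Each token is either .name or [index].
    for name, index in re.findall(r"\.([^.\[\]]+)|\[(\d+)\]", path[1:]):
        try:
            node = node[name] if name else node[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return node

print(get_json_object(SAMPLE, "$.owner"))
print(get_json_object(SAMPLE, "$.store.fruit[1].type"))
```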

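69. Space string function: space

Syntax: space(int n)

Return type: string

Description: Returns a string of n spaces

Example:

hive> select space(10) from dual;

hive> select length(space(10)) from dual; 10

70. Repeat string function: repeat

Syntax: repeat(string str, int n)

Return type: string

Description: Returns str repeated n times

Example: hive> select repeat('abc',5) from dual; abcabcabcabcabc

71. First-character ASCII function: ascii

Syntax: ascii(string str)

Return type: int

Description: Returns the ASCII code of the first character of str

Example: hive> select ascii('abcde') from dual; 97

72. Left pad function: lpad

Syntax: lpad(string str, int len, string pad)

Return type: string

Description: Pads str on the left with pad until it is len characters long

Example: hive> select lpad('abc',10,'td') from dual; tdtdtdtabc

Note: unlike GP and Oracle, the pad argument cannot be omitted

73. Right pad function: rpad

Syntax: rpad(string str, int len, string pad)

Return type: string

Description: Pads str on the right with pad until it is len characters long

Example: hive> select rpad('abc',10,'td') from dual; abctdtdtdt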
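Padding with a multi-character pad string repeats the pad and cuts it off where needed, which explains the odd-looking 'tdtdtdt' prefix. A minimal Python model follows; truncating the input when it is already longer than len is an assumption based on how these functions commonly behave.

```python
def lpad(s, length, pad):
    """Left-pad s with repetitions of pad up to length; truncate if s is longer."""
    if len(s) >= length:
        return s[:length]
    fill = (pad * length)[: length - len(s)]  # repeat pad, then cut to size
    return fill + s

def rpad(s, length, pad):
    """Right-pad s with repetitions of pad up to length; truncate if s is longer."""
    if len(s) >= length:
        return s[:length]
    fill = (pad * length)[: length - len(s)]
    return s + fill

print(lpad("abc", 10, "td"))
print(rpad("abc", 10, "td"))
```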

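74. Split string function: split

Syntax: split(string str, string pat)

Return type: array

Description: Splits str around matches of pat and returns the resulting array of strings

Example:

hive> select split('abtcdtef','t') from dual;

["ab","cd","ef"]

75. Set lookup function: find_in_set

Syntax: find_in_set(string str, string strList)

Return type: int

Description: Returns the position of the first occurrence of str in strList, where strList is a comma-separated string. Returns 0 if str is not found

Example: hive> select find_in_set('ab','ef,ab,de') from dual; 2

hive> select find_in_set('at','ef,ab,de') from dual; 0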
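find_in_set compares whole list elements, not substrings, and its positions start at 1. The Python sketch below captures that; returning None for NULL input and 0 for a search string that itself contains a comma are assumptions about edge cases the text does not cover.

```python
def find_in_set(needle, str_list):
    """Return the 1-based position of needle in a comma-separated list, or 0."""
    if needle is None or str_list is None:
        return None
    if "," in needle:
        return 0  # a needle with a comma can never equal a single element
    for position, item in enumerate(str_list.split(","), start=1):
        if item == needle:
            return position
    return 0

print(find_in_set("ab", "ef,ab,de"))
print(find_in_set("at", "ef,ab,de"))
```

Since 0 means "not found", the result can be used directly in a WHERE clause such as find_in_set(x, list) > 0.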

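76. Count aggregate function

Syntax: count(*), count(expr), count(DISTINCT expr[, expr_.])

Return type: bigint

Description: count(*) returns the number of retrieved rows, including rows that contain NULL values; count(expr) returns the number of non-null values of the given field; count(DISTINCT expr[, expr_.]) returns the number of distinct non-null values of the given field

Example: hive> select count(*) from dual; 20

hive> select count(distinct t) from dual; 10

77. Sum aggregate function: sum

Syntax: sum(col), sum(DISTINCT col)

Return type: double

Description: sum(col) returns the sum of col over the result set; sum(DISTINCT col) returns the sum of the distinct values of col

Example: hive> select sum(t) from dual; 100

hive> select sum(distinct t) from dual; 70

78. Average aggregate function: avg

Syntax: avg(col), avg(DISTINCT col)

Return type: double

Description: avg(col) returns the average of col over the result set; avg(DISTINCT col) returns the average of the distinct values of col

Example: hive> select avg(t) from dual; 50

hive> select avg(distinct t) from dual; 30

79. Minimum aggregate function: min

Syntax: min(col)

Return type: double

Description: Returns the minimum value of the col field in the result set

Example: hive> select min(t) from dual; 20

80. Maximum aggregate function: max

Syntax: max(col)

Return type: double

Description: Returns the maximum value of the col field in the result set

Example: hive> select max(t) from dual; 120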

81. Map construction: map

Syntax: map(key1, value1, key2, value2, …)

Description: Builds a map from the given key and value pairs

Example:

hive> create table alex_test as select map('100','tom','200','mary') as t from dual;

hive> describe alex_test;

t map<string,string>

hive> select t from alex_test;

{"100":"tom","200":"mary"}

82. Struct construction: struct

Syntax: struct(val1, val2, val3, …)

Description: Builds a struct from the given arguments

Example:

hive> create table alex_test as select struct('tom','mary','tim') as t from dual;

hive> describe alex_test;

t struct<col1:string,col2:string,col3:string>

hive> select t from alex_test;

{"col1":"tom","col2":"mary","col3":"tim"}

83. Array construction: array

Syntax: array(val1, val2, …)

Description: Builds an array from the given arguments

Example:

hive> create table alex_test as select array("tom","mary","tim") as t from dual;

hive> describe alex_test;
