Basic Hive Operations (2)

The pom file

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 ">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.xiaojie.mm</groupId>
  <artifactId>my_hive</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <hadoop.version>2.6.5</hadoop.version>
    <hive.version>1.2.1</hive.version>
  </properties>

  <dependencies>
    <!-- Hadoop -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.5</version>
    </dependency>

    <!-- Hive -->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.2.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>1.2.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-pdk</artifactId>
      <version>0.10.0</version>
    </dependency>
    <dependency>
      <groupId>javax.jdo</groupId>
      <artifactId>jdo2-api</artifactId>
      <version>2.3-eb</version>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.1.1</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.7</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.7</version>
      <scope>system</scope>
      <systemPath>/home/miao/apps/install/jdk1.7.0_45/lib/tools.jar</systemPath>
    </dependency>
  </dependencies>
</project>

A custom function that converts uppercase to lowercase

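package com.xiaojie.mm;

import org.apache.hadoop.hive.ql.exec.UDF;

public class ToLower extends UDF {

    // Hive calls evaluate for each row; it may be overloaded for other argument types.
    public String evaluate(String field) {
        // A NULL column value arrives as null, so guard against it.
        if (field == null) {
            return null;
        }
        return field.toLowerCase();
    }
}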
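The class above never overrides anything from UDF: Hive finds a method named evaluate on the class by reflection and picks the overload that matches the argument types. The sketch below illustrates that idea in plain Java, without the Hive dependency. ToLowerSketch and EvaluateLookupDemo are hypothetical names used only for illustration, the null guard is an assumption about the desired behavior, and Hive's real resolver is more elaborate than a single getMethod call.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for the ToLower UDF, without the Hive base class,
// so that this sketch compiles and runs on its own.
class ToLowerSketch {
    public String evaluate(String field) {
        // Assumption: a null input should produce a null output.
        return field == null ? null : field.toLowerCase();
    }
}

class EvaluateLookupDemo {
    // Looks up a public method named "evaluate" that takes one parameter of
    // the given type and invokes it. This is a simplified illustration of
    // name-based lookup, not Hive's actual resolver.
    static Object call(Object udf, Object arg, Class<?> argType) {
        try {
            Method m = udf.getClass().getMethod("evaluate", argType);
            return m.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable evaluate method", e);
        }
    }

    public static void main(String[] args) {
        // prints "aaaaa"
        System.out.println(call(new ToLowerSketch(), "AAAAA", String.class));
    }
}
```

Because the lookup is by name and parameter type, adding a second evaluate method with a different parameter type to the same class is enough to support another column type.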

Export the jar and copy it to the machine where Hive runs

scp tolower.jar mini1:/root/apps/

Register the custom function in the Hive client

-- Step 1: add the jar
add JAR /root/apps/tolower.jar;

-- Step 2: the quoted string is the fully qualified class name of the custom function
-- (this is a temporary function, valid only in the current session)
create temporary function tolower as 'com.xiaojie.mm.ToLower';

-- Step 3: use it
select * from a;
+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
| 7     | AAAAA   |
| 1     | 张三    |
| 2     | 李四    |
| 3     | c       |
| 4     | a       |
| 5     | e       |
| 6     | r       |
+-------+---------+--+

select id,tolower(name) from a;
+-----+--------+--+
| id  | _c1    |
+-----+--------+--+
| 7   | aaaaa  |
| 1   | 张三   |
| 2   | 李四   |
| 3   | c      |
| 4   | a      |
| 5   | e      |
| 6   | r      |
+-----+--------+--+

A custom function that looks up a phone number's home region

package com.xiaojie.mm;

import java.util.HashMap;

import org.apache.hadoop.hive.ql.exec.UDF;

public class GetProvince extends UDF {

    public static HashMap<String,String> provinceMap = new HashMap<String,String>();

    static {
        provinceMap.put("183", "hangzhou");
        provinceMap.put("186", "nanjing");
        provinceMap.put("187", "suzhou");
        provinceMap.put("188", "ningbo");
    }

    public String evaluate(int phonenumber) {
        String phone_num = String.valueOf(phonenumber);
        // Take the first three digits of the phone number
        String phone = phone_num.substring(0, 3);
        // "未知" means "unknown"
        String province = provinceMap.get(phone);
        return province == null ? "未知" : province;
    }
}

Original data:

+----------------------+---------------------+--+
| flow_province.phone  | flow_province.flow  |
+----------------------+---------------------+--+
| 1837878              | 12m                 |
| 1868989              | 13m                 |
| 1878989              | 14m                 |
| 1889898              | 15m                 |
| 1897867              | 16m                 |
| 1832323              | 78m                 |
| 1858767              | 88m                 |
| 1862343              | 99m                 |
| 1893454              | 77m                 |
+----------------------+---------------------+--+

After calling the custom function (getpro is the name under which GetProvince was registered, presumably with create temporary function as in the previous example):

select phone,getpro(phone),flow from flow_province;
+----------+-----------+-------+--+
| phone    | _c1       | flow  |
+----------+-----------+-------+--+
| 1837878  | hangzhou  | 12m   |
| 1868989  | nanjing   | 13m   |
| 1878989  | suzhou    | 14m   |
| 1889898  | ningbo    | 15m   |
| 1897867  | 未知      | 16m   |
| 1832323  | hangzhou  | 78m   |
| 1858767  | 未知      | 88m   |
| 1862343  | nanjing   | 99m   |
| 1893454  | 未知      | 77m   |
+----------+-----------+-------+--+
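The evaluate(int) signature above is enough for the 7-digit sample values, but a full 11-digit mobile number does not fit in a Java int (the maximum is 2147483647), and substring(0, 3) throws on inputs shorter than three characters. One possible way around both, sketched below in plain Java without the Hive dependency, is to do the lookup on a String and add a long overload. PrefixLookupSketch, lookup, and the sample numbers in main are hypothetical and not part of the original project; in a real UDF these methods would be named evaluate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Hive-free sketch of the prefix lookup used by GetProvince.
class PrefixLookupSketch {
    // Same value the original UDF returns for unmapped prefixes ("unknown").
    static final String UNKNOWN = "未知";

    private static final Map<String, String> PREFIX_TO_REGION =
            new HashMap<String, String>();

    static {
        PREFIX_TO_REGION.put("183", "hangzhou");
        PREFIX_TO_REGION.put("186", "nanjing");
        PREFIX_TO_REGION.put("187", "suzhou");
        PREFIX_TO_REGION.put("188", "ningbo");
    }

    // String-based lookup: no numeric range limit, and short or null
    // inputs fall back to UNKNOWN instead of throwing.
    static String lookup(String phone) {
        if (phone == null || phone.length() < 3) {
            return UNKNOWN;
        }
        String region = PREFIX_TO_REGION.get(phone.substring(0, 3));
        return region == null ? UNKNOWN : region;
    }

    // Overload for numeric columns; long is wide enough for 11 digits.
    static String lookup(long phone) {
        return lookup(String.valueOf(phone));
    }

    public static void main(String[] args) {
        System.out.println(lookup("18378781234")); // 11-digit number as text
        System.out.println(lookup(1868989L));      // numeric value from the sample data
        System.out.println(lookup("12"));          // too short: falls back to UNKNOWN
    }
}
```

Keeping the lookup on strings also means the same method works whether the phone column is declared as a string or a numeric type.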
