III. Hive Data Analysis and Storage

Data used: https://yunpan.cn/cPHQjv2zPtreC (extraction code: fc50)
1. Create the test table. The data has too many fields, so only the IP is kept and counted:

  hive> create table blog(ip STRING)
      > ROW FORMAT DELIMITED
      > FIELDS TERMINATED BY '\t'
      > STORED AS TEXTFILE;
  OK
  Time taken: 2.832 seconds
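A blog table declared with FIELDS TERMINATED BY '\t' and a single ip column keeps only the first tab-separated field of each input line. Hive ignores extra trailing fields and returns NULL for any declared column that is missing. The sketch below emulates that mapping in Python. It assumes the IP is the first field of each log line, and the sample line is made up for illustration.

```python
from typing import Dict, List, Optional


def parse_line(line: str, columns: List[str], delimiter: str = "\t") -> Dict[str, Optional[str]]:
    """Map one text line onto the declared columns, roughly as a Hive
    delimited text table does: split on the field delimiter, keep only as
    many fields as there are columns, use None (NULL) for missing ones."""
    fields = line.rstrip("\n").split(delimiter)
    return {col: (fields[i] if i < len(fields) else None) for i, col in enumerate(columns)}


# Hypothetical log line (assumption): the IP is the first tab-separated field.
sample = "183.60.214.251\t2016-04-20 10:00:00\tGET /index.html\n"
print(parse_line(sample, ["ip"]))  # {'ip': '183.60.214.251'}
```

This is why a one-column table is enough to analyze IPs from a wide log file, provided the IP really is the first field.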

2. Load data from HDFS into Hive

  load data inpath 'HDFS path' into table blog;
To load from the local file system instead:

  load data local inpath 'local path' into table blog;
For details, see: http://blog.csdn.net/qq_26840065/article/details/51218637
3. Analyze the data and store the results


    1. Store the analysis results in a new table

Faulty statement:

  hive> create table hiveIpcount AS
      > SELECT blog.ip,COUNT(blog.ip) AS count FROM blog
      > GROUP BY blog.ip
      > ORDER BY count DESC;

Corrected statement (after fixing the bug described below):

  hive> create table hivecount
      > ROW FORMAT DELIMITED
      > FIELDS TERMINATED BY ','
      > STORED AS TEXTFILE
      > AS
      > SELECT blog.ip,COUNT(blog.ip) AS count FROM blog
      > GROUP BY blog.ip
      > ORDER BY count DESC;
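Note: Hive can create a table that is populated with data at creation time (CREATE TABLE ... AS SELECT). The result of the query on the blog table is therefore written directly into the hiveIpcount table.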
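The sketch below computes in plain Python what the SELECT ... GROUP BY blog.ip ... ORDER BY count DESC query produces: the number of occurrences of each IP, sorted by that count in descending order. It assumes the ip column contains no NULLs. The IP list is invented for illustration. Ties here keep first-seen order, whereas Hive does not guarantee any order among rows with equal counts.

```python
from collections import Counter
from typing import List, Tuple


def ip_count(ips: List[str]) -> List[Tuple[str, int]]:
    """Emulate: SELECT ip, COUNT(ip) AS count FROM blog GROUP BY ip ORDER BY count DESC.
    Assumes no NULL (None) IPs in the input."""
    counts = Counter(ips)  # GROUP BY ip + COUNT(ip)
    # ORDER BY count DESC (sorted() is stable, so ties keep first-seen order)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


rows = ip_count(["1.1.1.1", "2.2.2.2", "1.1.1.1", "3.3.3.3", "1.1.1.1", "2.2.2.2"])
print(rows)  # [('1.1.1.1', 3), ('2.2.2.2', 2), ('3.3.3.3', 1)]
```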


hive> desc hiveIpcount;
OK
ip                      string                  None                
count                   bigint                  None                
Time taken: 0.161 seconds, Fetched: 2 row(s)


The result table has been created. Its HDFS directory, /user/hive/warehouse/hiveipcount, holds the data files containing the analysis results.
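The files under that directory are plain text, one row per line. When a CREATE TABLE has no ROW FORMAT clause, Hive separates fields with its default delimiter \001 (Ctrl-A), which is invisible in most terminals. With FIELDS TERMINATED BY ',' it uses a comma. The sketch below shows both layouts and renders Ctrl-A as ^A, the way cat -v displays it. The (ip, count) row is hypothetical.

```python
HIVE_DEFAULT_FIELD_DELIM = "\x01"  # Ctrl-A, Hive's default field delimiter


def serialize_row(fields, delimiter=HIVE_DEFAULT_FIELD_DELIM):
    """Join column values into one text line, the way a delimited text table stores a row."""
    return delimiter.join(str(f) for f in fields) + "\n"


def show_control_chars(line):
    """Make the invisible Ctrl-A visible, similar to how `cat -v` prints it (^A)."""
    return line.rstrip("\n").replace("\x01", "^A")


row = ["183.60.214.251", 8]  # hypothetical (ip, count) row
default_line = serialize_row(row)
comma_line = serialize_row(row, delimiter=",")
print(show_control_chars(default_line))  # 183.60.214.251^A8
print(comma_line, end="")                # 183.60.214.251,8
```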



Bug (raised when exporting the result table to MySQL with Sqoop):
Error: java.io.IOException: Can't export data, please check failed map task logs
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: Can't parse input data: '183.60.214.2518'
        at test.__loadFromFields(test.java:249)
        at test.parse(test.java:192)
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
        ... 10 more
Caused by: java.util.NoSuchElementException
        at java.util.ArrayList$Itr.next(ArrayList.java:854)
        at test.__loadFromFields(test.java:244)
        ... 12 more



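The cause is that the statement

  hive> create table hiveIpcount AS
      > SELECT blog.ip,COUNT(blog.ip) AS count FROM blog
      > GROUP BY blog.ip
      > ORDER BY count DESC;

does not set a field delimiter. Queried in Hive, the fields appear to be separated by whitespace. Viewed with hdfs dfs -cat, however, they are separated by Hive's default field delimiter \001 (Ctrl-A), a non-printing character:

[Figure 1]

The export cannot split each line into columns on this delimiter, so loading the data into the MySQL database fails. This is also why the error message shows the IP and the count run together, as in '183.60.214.2518'.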
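The failure can be reproduced in Python with a simplified, hypothetical parser that splits each line on an expected delimiter and needs exactly two fields (ip, count). The sketch assumes the export ran with Sqoop's default comma field delimiter; the original post does not show the sqoop command. It also assumes the failing line was IP 183.60.214.251 with count 8, which is only one plausible reading. A line written with \001 contains no comma, so only one field is found, which corresponds to the NoSuchElementException in the trace.

```python
from typing import List


def parse_export_line(line: str, n_columns: int, delimiter: str = ",") -> List[str]:
    """Hypothetical, simplified export parser: split on the expected delimiter
    and require at least n_columns fields (not Sqoop's actual code)."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < n_columns:
        # Mirrors the "Can't parse input data" error: too few fields found.
        raise ValueError(f"Can't parse input data: '{line.rstrip(chr(10))}'")
    return fields[:n_columns]


bad = "183.60.214.251\x018\n"  # hypothetical row written with Hive's default \x01
good = "183.60.214.251,8\n"    # same row written with FIELDS TERMINATED BY ','
print(parse_export_line(good, 2))  # ['183.60.214.251', '8']
try:
    parse_export_line(bad, 2)
except ValueError as e:
    print(e)  # the \x01 is invisible, so the IP and count appear run together
```

Writing the table with an explicit delimiter, as in the corrected statement, makes the file match what the parser expects.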



Viewing the table's data files with hdfs dfs -cat:

[Figure 2]

[Figure 3]
[Figure 4]
