chimchim66

Hive: detecting consecutive runs in duplicated data and grouping them

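I. Requirement

II. Test case

1. Test data

2. Implementation steps

1. Determine whether consecutive entries into the same class are continuous

2. For continuous rows, mark the start point of each time segment per class and person

3. Assign a group id to each time segment per class and person

4. Get the start and end time of each time segment per class and person

5. Concatenate the ids within each time segment in time order

6. Final concatenated result for each time segment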

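I. Requirement

The goal: within each class, sort the rows by entry time in ascending order, compare each row with its neighbours to decide whether the previous and the next person in the same class are continuous (one row's leave time equals the next row's entry time, for the same name), and generate a group id for every run of continuous rows that meets the grouping condition.

(Unlike the previous post, which suits fairly regular data, the data here is of low quality: rows in the same group can repeat at the same timestamp, and a continuous run can skip over interleaved rows.)

II. Test case

1. Test data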

create table test_detail(
id   bigint comment 'primary key',
num  string comment 'class number',
name string comment 'name',
start_timestamp bigint comment 'time of entering the class',
end_timestamp   bigint comment 'time of leaving the class'
)comment 'test data detail'
row format delimited fields terminated by '\t'
stored as textfile;



--two students in the same class share the same timestamp
insert into table test_detail values(1,'01','桑稚',1667516488000,1667516519035);
insert into table test_detail values(2,'01','桑稚',1667516519035,1667516529809);
insert into table test_detail values(3,'01','温以凡',1667516519035,1667516529809);
insert into table test_detail values(4,'01','桑稚',1667516529809,1667516533990);
insert into table test_detail values(5,'01','桑稚',1667516533990,1667516538492);

--the same student has two separate continuous time segments in the class
insert into table test_detail values(6,'02','段嘉许',1667525190365,1667525196616);
insert into table test_detail values(7,'02','桑延',1667525190365,1667525196616);
insert into table test_detail values(8,'02','段嘉许',1667525196616,1667525203375);
insert into table test_detail values(9,'02','桑延',1667525203375,1667525207599);
insert into table test_detail values(10,'02','段嘉许',1667525207599,1667525224663);
insert into table test_detail values(11,'02','桑延',1667525224663,1667525229056);
insert into table test_detail values(12,'02','段嘉许',1667525224663,1667525229056);
insert into table test_detail values(13,'02','段嘉许',1667525229056,1667525232773);

2. Implementation steps

1. Determine whether consecutive entries into the same class are continuous

select 
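    id              --primary key
   ,num             --class number
   ,name            --name
   ,start_timestamp --time of entering the class
   ,end_timestamp   --time of leaving the class
   --determine whether consecutive entries into the same class are continuous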
   ,case when (start_timestamp=lag(end_timestamp) over(partition by num order by start_timestamp asc )
          and name=lag(name) over(partition by num order by start_timestamp asc )) or
          (end_timestamp=lead(start_timestamp) over (partition by num order by start_timestamp asc)
          and name=lead(name) over(partition by num order by start_timestamp asc )
          )
          then 'continued' --start time equals the previous row's end time with the same name, or end time equals the next row's start time with the same name
          when lag(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lag(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lag(name,1) over(partition by num order by start_timestamp asc )
          or name=lag(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          when lead(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lead(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lead(name,1) over(partition by num order by start_timestamp asc )
          or name=lead(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          else 'discontinued' 
     end   as is_continue
from test_detail
order by start_timestamp
;
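To make the first `when` branch concrete, here is a minimal Python sketch (not from the original post) that replays it on the class `01` test rows. It covers only that first branch: a row counts as continued when its start equals the previous row's end, or its end equals the next row's start, with the same name. Breaking ties on `id` and the helper name `flag_continued` are assumptions made purely for illustration; the Hive query itself does not define an order among rows that share a `start_timestamp`.

```python
# Class 01 test rows: (id, name, start_timestamp, end_timestamp)
rows = [
    (1, "桑稚", 1667516488000, 1667516519035),
    (2, "桑稚", 1667516519035, 1667516529809),
    (3, "温以凡", 1667516519035, 1667516529809),
    (4, "桑稚", 1667516529809, 1667516533990),
    (5, "桑稚", 1667516533990, 1667516538492),
]


def flag_continued(rows):
    """Replay only the first WHEN branch of the query for one class."""
    # Assumption: ties on start_timestamp are broken by id.
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    flags = []
    for i, (_id, name, start, end) in enumerate(ordered):
        prev = ordered[i - 1] if i > 0 else None                # lag(...)
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None  # lead(...)
        follows_prev = prev is not None and start == prev[3] and name == prev[1]
        precedes_next = nxt is not None and end == nxt[2] and name == nxt[1]
        flags.append("continued" if follows_prev or precedes_next else "discontinued")
    return flags


flags = flag_continued(rows)
print(flags)
```

Under that tie-break, only row 3 (温以凡) comes out as `discontinued`: it neither follows nor precedes a row with the same name.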

2. For continuous rows, mark the start point of each time segment per class and person

with is_continue as (
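--determine whether consecutive entries into the same class are continuous
select 
    id              --primary key
   ,num             --class number
   ,name            --name
   ,start_timestamp --time of entering the class
   ,end_timestamp   --time of leaving the class
   --determine whether consecutive entries into the same class are continuous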
   ,case when (start_timestamp=lag(end_timestamp) over(partition by num order by start_timestamp asc )
          and name=lag(name) over(partition by num order by start_timestamp asc )) or
          (end_timestamp=lead(start_timestamp) over (partition by num order by start_timestamp asc)
          and name=lead(name) over(partition by num order by start_timestamp asc )
          )
          then 'continued'
          when lag(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lag(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lag(name,1) over(partition by num order by start_timestamp asc )
          or name=lag(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          when lead(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lead(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lead(name,1) over(partition by num order by start_timestamp asc )
          or name=lead(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          else 'discontinued' 
     end   as is_continue
from test_detail
)

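--mark the start point of each time segment per class and person
select 
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class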
    ,case when lag(end_timestamp) over(partition by num,name order by start_timestamp) is null and 
               end_timestamp=lead(start_timestamp) over(partition by num,name order by start_timestamp) then 1
          when lag(end_timestamp) over(partition by num,name order by start_timestamp) is not null
               and start_timestamp<>lag(end_timestamp) over(partition by num,name order by start_timestamp) then 1 
          else 0
      end as start_point --start point of each time segment per class and person, flagged as 1
from is_continue
where is_continue='continued'  --continuous rows only
order by start_timestamp;
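The start-point rule can be checked by hand on one partition. The sketch below is a Python replay of the `case` expression, assuming that all five rows of 段嘉许 in class `02` were flagged `continued` in step 1 (that depends on how Hive orders rows with equal timestamps, so treat it as an assumption). The helper name `mark_start_points` is hypothetical.

```python
# 段嘉许 in class 02, assumed all 'continued': (id, start_timestamp, end_timestamp)
rows = [
    (6, 1667525190365, 1667525196616),
    (8, 1667525196616, 1667525203375),
    (10, 1667525207599, 1667525224663),
    (12, 1667525224663, 1667525229056),
    (13, 1667525229056, 1667525232773),
]


def mark_start_points(rows):
    """Return 1 for rows that open a new time segment, else 0."""
    ordered = sorted(rows, key=lambda r: r[1])  # order by start_timestamp
    flags = []
    for i, (_id, start, end) in enumerate(ordered):
        prev_end = ordered[i - 1][2] if i > 0 else None                   # lag(end_timestamp)
        next_start = ordered[i + 1][1] if i + 1 < len(ordered) else None  # lead(start_timestamp)
        if prev_end is None and end == next_start:
            flags.append(1)  # first row of the partition, joined to the next row
        elif prev_end is not None and start != prev_end:
            flags.append(1)  # gap after the previous row: a new segment begins
        else:
            flags.append(0)
    return flags


start_points = mark_start_points(rows)
print(start_points)
```

Rows 6 and 10 are flagged: row 6 is the first row of the partition, and row 10 starts at a time that differs from row 8's end.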

3. Assign a group id to each time segment per class and person

with is_continue as (
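--determine whether consecutive entries into the same class are continuous
select 
    id              --primary key
   ,num             --class number
   ,name            --name
   ,start_timestamp --time of entering the class
   ,end_timestamp   --time of leaving the class
   --determine whether consecutive entries into the same class are continuous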
   ,case when (start_timestamp=lag(end_timestamp) over(partition by num order by start_timestamp asc )
          and name=lag(name) over(partition by num order by start_timestamp asc )) or
          (end_timestamp=lead(start_timestamp) over (partition by num order by start_timestamp asc)
          and name=lead(name) over(partition by num order by start_timestamp asc )
          )
          then 'continued'
          when lag(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lag(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lag(name,1) over(partition by num order by start_timestamp asc )
          or name=lag(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          when lead(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lead(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lead(name,1) over(partition by num order by start_timestamp asc )
          or name=lead(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          else 'discontinued' 
     end   as is_continue
from test_detail
) ,
start_point as (
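--mark the start point of each time segment per class and person
select 
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class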
    ,case when lag(end_timestamp) over(partition by num,name order by start_timestamp) is null and 
               end_timestamp=lead(start_timestamp) over(partition by num,name order by start_timestamp) then 1
          when lag(end_timestamp) over(partition by num,name order by start_timestamp) is not null
               and start_timestamp<>lag(end_timestamp) over(partition by num,name order by start_timestamp) then 1 
          else 0
      end as start_point --start point of each time segment per class and person, flagged as 1
from is_continue
where is_continue='continued'  
)
--assign a group id to each time segment per class and person
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,sum(start_point) over(partition by num,name order by start_timestamp,end_timestamp
       rows between unbounded preceding and current row ) as group_id --group id
from start_point
order by start_timestamp;
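The running `sum(start_point)` is what turns the 0/1 flags into a group id: every row inherits the count of start points seen so far in its partition. A small Python sketch of the same idea, assuming the flags `[1, 0, 1, 0, 0]` for one (`num`, `name`) partition already sorted by `start_timestamp`:

```python
from itertools import accumulate

# Start-point flags for one (num, name) partition, in start_timestamp order.
# Assumed values: two segments, opening at the 1st and 3rd rows.
start_points = [1, 0, 1, 0, 0]

# Equivalent of sum(start_point) over(... rows between unbounded preceding and current row)
group_ids = list(accumulate(start_points))
print(group_ids)
```

The first two rows share group 1 and the last three share group 2, so `group_id` can be used as a partition key in the following steps.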

4. Get the start and end time of each time segment per class and person

with is_continue as (
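--determine whether consecutive entries into the same class are continuous
select 
    id              --primary key
   ,num             --class number
   ,name            --name
   ,start_timestamp --time of entering the class
   ,end_timestamp   --time of leaving the class
   --determine whether consecutive entries into the same class are continuous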
   ,case when (start_timestamp=lag(end_timestamp) over(partition by num order by start_timestamp asc )
          and name=lag(name) over(partition by num order by start_timestamp asc )) or
          (end_timestamp=lead(start_timestamp) over (partition by num order by start_timestamp asc)
          and name=lead(name) over(partition by num order by start_timestamp asc )
          )
          then 'continued'
          when lag(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lag(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lag(name,1) over(partition by num order by start_timestamp asc )
          or name=lag(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          when lead(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lead(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lead(name,1) over(partition by num order by start_timestamp asc )
          or name=lead(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          else 'discontinued' 
     end   as is_continue
from test_detail
),
start_point as (
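--mark the start point of each time segment per class and person
select 
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class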
    ,case when lag(end_timestamp) over(partition by num,name order by start_timestamp) is null and 
               end_timestamp=lead(start_timestamp) over(partition by num,name order by start_timestamp) then 1
          when lag(end_timestamp) over(partition by num,name order by start_timestamp) is not null
               and start_timestamp<>lag(end_timestamp) over(partition by num,name order by start_timestamp) then 1 
          else 0
      end as start_point --start point of each time segment per class and person, flagged as 1
from is_continue
where is_continue='continued'  
),
group_id as (
--assign a group id to each time segment per class and person
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,sum(start_point) over(partition by num,name order by start_timestamp,end_timestamp
       rows between unbounded preceding and current row ) as group_id --group id
from start_point
) 

select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,group_id        --group id
    ,min(start_timestamp) over (partition by num,name,group_id) as talk_start --segment start time
    ,max(end_timestamp) over (partition by num,name,group_id)   as talk_end   --segment end time
from group_id
order by start_timestamp
;
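The `min`/`max` windows simply attach the bounds of a segment to every row in it. As a rough Python sketch, assuming the rows of 段嘉许 in class `02` with the group ids 1 and 2 from the previous step (the helper name `segment_bounds` is hypothetical):

```python
# (id, start_timestamp, end_timestamp, group_id) for one (num, name) partition
rows = [
    (6, 1667525190365, 1667525196616, 1),
    (8, 1667525196616, 1667525203375, 1),
    (10, 1667525207599, 1667525224663, 2),
    (12, 1667525224663, 1667525229056, 2),
    (13, 1667525229056, 1667525232773, 2),
]


def segment_bounds(rows):
    """Map each group_id to (earliest start, latest end) of its rows."""
    bounds = {}
    for _id, start, end, gid in rows:
        lo, hi = bounds.get(gid, (start, end))
        bounds[gid] = (min(lo, start), max(hi, end))
    return bounds


bounds = segment_bounds(rows)
for gid, (seg_start, seg_end) in sorted(bounds.items()):
    print(gid, seg_start, seg_end)
```

In the SQL version every row keeps its own `id` and carries the bounds of its segment, which is what the concatenation step partitions on.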

5. Concatenate the ids within each time segment in time order

with is_continue as (
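--determine whether consecutive entries into the same class are continuous
select 
    id              --primary key
   ,num             --class number
   ,name            --name
   ,start_timestamp --time of entering the class
   ,end_timestamp   --time of leaving the class
   --determine whether consecutive entries into the same class are continuous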
   ,case when (start_timestamp=lag(end_timestamp) over(partition by num order by start_timestamp asc )
          and name=lag(name) over(partition by num order by start_timestamp asc )) or
          (end_timestamp=lead(start_timestamp) over (partition by num order by start_timestamp asc)
          and name=lead(name) over(partition by num order by start_timestamp asc )
          )
          then 'continued'
          when lag(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lag(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lag(name,1) over(partition by num order by start_timestamp asc )
          or name=lag(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          when lead(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lead(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lead(name,1) over(partition by num order by start_timestamp asc )
          or name=lead(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          else 'discontinued' 
     end   as is_continue
from test_detail
),
start_point as (
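--mark the start point of each time segment per class and person
select 
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class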
    ,case when lag(end_timestamp) over(partition by num,name order by start_timestamp) is null and 
               end_timestamp=lead(start_timestamp) over(partition by num,name order by start_timestamp) then 1
          when lag(end_timestamp) over(partition by num,name order by start_timestamp) is not null
               and start_timestamp<>lag(end_timestamp) over(partition by num,name order by start_timestamp) then 1 
          else 0
      end as start_point --start point of each time segment per class and person, flagged as 1
from is_continue
where is_continue='continued'  
),
group_id as (
--assign a group id to each time segment per class and person
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,sum(start_point) over(partition by num,name order by start_timestamp,end_timestamp
       rows between unbounded preceding and current row ) as group_id --group id
from start_point
),
min_max as (
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,group_id        --group id
    ,min(start_timestamp) over (partition by num,name,group_id) as talk_start --segment start time
    ,max(end_timestamp) over (partition by num,name,group_id)   as talk_end   --segment end time
from group_id
) 

select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,talk_start      --segment start time
    ,talk_end        --segment end time
    ,concat_ws(' ',collect_set(cast(id as string)) over(partition by num,name,talk_start,talk_end order by start_timestamp asc)) as talk_ids 
from min_max
order by start_timestamp
;
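Because the window has an `order by`, its default frame ends at the current row, so `talk_ids` grows row by row and only the last row of a segment holds the complete list. The Python sketch below imitates that cumulative behaviour for one segment; the rows are 段嘉许's second segment in class `02`, and the helper name `running_ids` is hypothetical.

```python
# One time segment: (id, start_timestamp)
rows = [
    (10, 1667525207599),
    (12, 1667525224663),
    (13, 1667525229056),
]


def running_ids(rows):
    """Return the space-joined ids seen up to and including each row."""
    seen = []
    out = []
    for row_id, _start in sorted(rows, key=lambda r: r[1]):  # order by start_timestamp
        seen.append(str(row_id))       # like cast(id as string) added to the collection
        out.append(" ".join(seen))     # like concat_ws(' ', ...)
    return out


talk_ids = running_ids(rows)
print(talk_ids)
```

This is why an extra filter is still needed to keep just one fully concatenated row per segment.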

6. Final concatenated result for each time segment

with is_continue as (
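--determine whether consecutive entries into the same class are continuous
select 
    id              --primary key
   ,num             --class number
   ,name            --name
   ,start_timestamp --time of entering the class
   ,end_timestamp   --time of leaving the class
   --determine whether consecutive entries into the same class are continuous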
   ,case when (start_timestamp=lag(end_timestamp) over(partition by num order by start_timestamp asc )
          and name=lag(name) over(partition by num order by start_timestamp asc )) or
          (end_timestamp=lead(start_timestamp) over (partition by num order by start_timestamp asc)
          and name=lead(name) over(partition by num order by start_timestamp asc )
          )
          then 'continued'
          when lag(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lag(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lag(name,1) over(partition by num order by start_timestamp asc )
          or name=lag(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          when lead(start_timestamp,1) over (partition by num order by start_timestamp asc)
          =lead(start_timestamp,2) over (partition by num order by start_timestamp asc)
          and (name=lead(name,1) over(partition by num order by start_timestamp asc )
          or name=lead(name,2) over(partition by num order by start_timestamp asc ))
          then 'continued'
          else 'discontinued' 
     end   as is_continue
from test_detail
),
start_point as (
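--mark the start point of each time segment per class and person
select 
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class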
    ,case when lag(end_timestamp) over(partition by num,name order by start_timestamp) is null and 
               end_timestamp=lead(start_timestamp) over(partition by num,name order by start_timestamp) then 1
          when lag(end_timestamp) over(partition by num,name order by start_timestamp) is not null
               and start_timestamp<>lag(end_timestamp) over(partition by num,name order by start_timestamp) then 1 
          else 0
      end as start_point --start point of each time segment per class and person, flagged as 1
from is_continue
where is_continue='continued'  
),
group_id as (
--assign a group id to each time segment per class and person
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,sum(start_point) over(partition by num,name order by start_timestamp,end_timestamp
       rows between unbounded preceding and current row ) as group_id --group id
from start_point
),
min_max as (
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,group_id        --group id
    ,min(start_timestamp) over (partition by num,name,group_id) as talk_start --segment start time
    ,max(end_timestamp) over (partition by num,name,group_id)   as talk_end   --segment end time
from group_id
), 
talk_ids as (
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,talk_start      --segment start time
    ,talk_end        --segment end time
    ,concat_ws(' ',collect_set(cast(id as string)) over(partition by num,name,talk_start,talk_end order by start_timestamp asc)) as talk_ids 
from min_max
)
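--keep only the last row of each time segment, which carries the full concatenation
select
     id              --primary key
    ,num             --class number
    ,name            --name
    ,start_timestamp --time of entering the class
    ,end_timestamp   --time of leaving the class
    ,talk_start      --segment start time
    ,talk_end        --segment end time
    ,talk_ids        --ids concatenated per time segment in ascending time order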
from talk_ids
where end_timestamp=talk_end
order by start_timestamp
;
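The final filter `end_timestamp=talk_end` keeps the row whose own end time is the end of its segment, that is, the row with the complete `talk_ids`. A short Python sketch of that filter, using values for 段嘉许 in class `02` that assume all five of the rows were flagged `continued` in step 1:

```python
# (id, end_timestamp, talk_end, talk_ids) as they would leave the talk_ids CTE
rows = [
    (6, 1667525196616, 1667525203375, "6"),
    (8, 1667525203375, 1667525203375, "6 8"),
    (10, 1667525224663, 1667525232773, "10"),
    (12, 1667525229056, 1667525232773, "10 12"),
    (13, 1667525232773, 1667525232773, "10 12 13"),
]

# where end_timestamp = talk_end
final = [(row_id, talk_ids) for row_id, end, talk_end, talk_ids in rows if end == talk_end]
print(final)
```

One row per segment survives, so the two segments of this student come out as `6 8` and `10 12 13`.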
