Note: this article is based on a talk given by Adam Muise of Hortonworks at the Toronto Hadoop User Group on July 23, 2013. It has been lightly edited and reorganized here for future reference.
Original slides: http://www.slideshare.net/adammuise/2013-jul-23thughivetuningdeepdive
• Scalable SQL processing over data in Hadoop
• Scales to 100PB+
• Structured and unstructured data
Hive | RDBMS
SQL interface. | SQL interface.
Focus on analytics. | May focus on online or analytics.
No transactions. | Transactions usually supported.
Partition adds, no random INSERTs. In-place updates not natively supported (but are possible). | Random INSERT and UPDATE supported.
Distributed processing via map/reduce. | Distributed processing varies by vendor (if available).
Scales to hundreds of nodes. | Seldom scales beyond 20 nodes.
Built for commodity hardware. | Often built on proprietary hardware (especially when scaling out).
Low cost per petabyte. | What's a petabyte? (the author being cheeky here ←_← ‾◡◝)
Note: a Foxit/Adobe rendering bug sometimes turns "ti" into "1" in the extracted slide text (e.g. "na1vely" should read "natively", and "transac1ons" should read "transactions").
• “Joins are evil” – Cal Henderson
– Joins should be avoided in online systems.
• Joins are unavoidable in analytics.
– Making joins fast is the key design point.
• Star schemas use dimension tables small enough to fit in RAM.
• Small tables held in memory by all nodes.
• Single pass through the large table.
• Used for star-schema type joins common in DW.
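A minimal sketch of the map-side join described above, where the small dimension table is broadcast into every mapper's memory (table and column names here are made up for illustration):

```sql
-- The MAPJOIN hint asks Hive to load the small dimension table into
-- memory on each mapper, so the large fact table is scanned in a
-- single pass with no shuffle. With hive.auto.convert.join=true,
-- recent Hive versions can make this conversion automatically.
SELECT /*+ MAPJOIN(d) */ f.amount, d.state_name
FROM fact_sales f
JOIN dim_state d ON f.state_sk = d.state_sk;
```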
Observation 1:
Sorting by the join key makes joins easy.
All possible matches reside in the same area on disk.
Observation 2:
Hash-bucketing a join key ensures all matching values reside on the same node.
Equi-joins can then run with no shuffle.
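These two observations are what Hive's bucketed, sorted tables exploit. As a hedged sketch (table and column names assumed), two tables bucketed and sorted on the same join key with the same bucket count can be joined with a sort-merge-bucket join:

```sql
-- Both tables are bucketed and sorted on the join key, so with the
-- SMB join settings shown later in this article enabled, the
-- equi-join below can run bucket-by-bucket with no shuffle.
CREATE TABLE orders (orderid INT, customersk INT, amount DECIMAL)
CLUSTERED BY (customersk) SORTED BY (customersk) INTO 32 BUCKETS;

CREATE TABLE customers (customersk INT, name STRING)
CLUSTERED BY (customersk) SORTED BY (customersk) INTO 32 BUCKETS;

SELECT o.orderid, c.name, o.amount
FROM orders o JOIN customers c ON o.customersk = c.customersk;
```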
Note: for common join strategies in MapReduce, with sample code, see:
http://my.oschina.net/leejun2005/blog/82523
http://my.oschina.net/leejun2005/blog/111963
http://my.oschina.net/leejun2005/blog/95186
• Bucketing:
– Hash partition values into a configurable number of buckets.
– Usually coupled with sorting.
• Skews:
– Split values out into separate files.
– Used when certain values are frequently seen.
• Replication Factor:
– Increase replication factor to accelerate reads.
– Controlled at the HDFS layer.
• Sorting:
– Sort the values within given columns.
– Greatly accelerates queries when used with ORCFile filter pushdown.
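The skew technique above can be declared directly in the table DDL. A sketch (table, column, and skewed values are assumptions for illustration):

```sql
-- Hypothetical example: 'NULL' and 'unknown' dominate the userid
-- column, so their rows are written into separate directories that
-- queries filtering on other values can prune out entirely.
CREATE TABLE weblogs (userid STRING, url STRING)
SKEWED BY (userid) ON ('NULL', 'unknown')
STORED AS DIRECTORIES;
```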
Note: for Hive local-mode MapReduce, see:
http://superlxw1234.iteye.com/blog/1703546
• Built-in Formats:
– ORCFile
– RCFile
– Avro
– Delimited Text
– Regular Expression
– S3 Logfile
– Typed Bytes
• 3rd-Party Add-ons:
– JSON
– XML
PS: Hive allows mixed formats (e.g. different partitions of a table can use different storage formats).
• Use Case:
– Ingest data in a write-optimized format like JSON or delimited.
– Every night, run a batch job to convert to read-optimized ORCFile.
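The nightly conversion step can be sketched as a single INSERT that rewrites a day's partition into the read-optimized table (table and column names here are assumptions):

```sql
-- sale_staging holds write-optimized delimited/JSON data;
-- sale_orc is the same schema STORED AS ORC. A nightly batch job
-- rewrites the previous day's partition into the ORC table.
INSERT OVERWRITE TABLE sale_orc PARTITION (xdate='2013-07-22')
SELECT id, amount, state
FROM sale_staging
WHERE xdate = '2013-07-22';
```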
• High Compression
– Many tricks used out-of-the-box to ensure high compression rates.
– RLE, dictionary encoding, etc.
• High Performance
– Inline indexes record value ranges within blocks of ORCFile data.
– Filter pushdown allows efficient scanning during precise queries.
• Flexible Data Model
– All Hive types, including maps, structs and unions.
CREATE TABLE sale (
    id int, timestamp timestamp,
    productsk int, storesk int,
    amount decimal, state string
) STORED AS orc;

CREATE TABLE sale (
    id int, timestamp timestamp,
    productsk int, storesk int,
    amount decimal, state string
) STORED AS orc tblproperties ("orc.compress"="NONE");
INSERT INTO sale SELECT * FROM staging SORT BY productsk;
ORCFile skipping speeds up queries like
WHERE productsk = X or WHERE productsk IN (Y, Z).
• Traditional solution to all RDBMS problems:
– Put an index on it!
• Doing this in Hadoop == #fail
Indexes can speed up the execution of GROUP BY queries.
Starting with 0.8, Hive provides a bitmap index, mainly suited to columns with only a few distinct values. Details:
http://flyingdutchman.iteye.com/blog/1869876
http://www.gemini5201314.net/big-data/hadoop-%E4%B8%AD%E7%9A%84%E6%95%B0%E6%8D%AE%E5%80%BE%E6%96%9C.html
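The bitmap index mentioned in the note can be created like this (a sketch; table and column names are assumed, and note that Hive's CREATE INDEX feature was later removed in Hive 3.0):

```sql
-- Bitmap index on a low-cardinality column (Hive 0.8 through 2.x).
CREATE INDEX sale_state_idx ON TABLE sale (state)
AS 'BITMAP'
WITH DEFERRED REBUILD;

-- The index is empty until it is (re)built:
ALTER INDEX sale_state_idx ON sale REBUILD;
```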
• Hadoop:
– Really good at coordinated sequential scans.
– No random I/O. Traditional index pretty much useless.
• Keys to speed in Hadoop:
– Sorting and skipping take the place of indexing.
– Minimizing data shuffle the other key consideration.
• Skipping data:
– Divide data among different files which can be pruned out.
– Partitions, buckets and skews.
– Skip records during scans using small embedded indexes.
– Automatic when you use the ORCFile format.
– Sort data ahead of time.
– Simplifies joins and skipping becomes more effective.
• Partitioning makes queries go fast.
• You will almost always use some sort of partitioning.
• When partitioning you will use one or more virtual columns.

# Notice how xdate and state are not "real" column names.
CREATE TABLE sale (
    id int, amount decimal, ...
) partitioned by (xdate string, state string);

• Virtual columns cause directories to be created in HDFS.
For column pruning and partition pruning, see:
http://my.oschina.net/leejun2005/blog/82529
http://my.oschina.net/leejun2005/blog/82065
• By default, at least one virtual column must be hard-coded.
INSERT INTO sale PARTITION (xdate='2013-03-01', state='CA')
SELECT * FROM staging_table
WHERE xdate = '2013-03-01' AND state = 'CA';
• You can load all partitions in one shot:
– set hive.exec.dynamic.partition.mode=nonstrict;
– Warning: You can easily overwhelm your cluster this way.
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO sale PARTITION (xdate, state)
SELECT * FROM staging_table;
• Virtual columns must be last within the inserted data set.
• You can use the SELECT statement to re-order.
INSERT INTO sale PARTITION (xdate, state='CA')
SELECT id, amount, other_stuff, xdate
FROM staging_table
WHERE state = 'CA';
• mapred.max.split.size and mapred.min.split.size
• Hive processes data in chunks subject to these bounds.
• min too large -> Too few mappers.
• max too small -> Too many mappers.
• Tune these variables until mappers occupy:
– All map slots if you own the cluster.
– Reasonable number of map slots if you don’t.
• Example:
– set mapred.max.split.size=100000000;
– set mapred.min.split.size=1000000;
• Manual today; automatic in a future version of Hive.
• You will need to set these for most queries.
Note: controlling the number of map and reduce tasks in a Hive job:
http://superlxw1234.iteye.com/blog/1582880
• Hive and Map/Reduce maintain some separate memory buffers.
• If Hive maps need lots of local memory, you may need to shrink the map/reduce buffers.
• If your maps spill to disk, try it out.
• Example:
– set io.sort.mb=100;
• All the time:
– set hive.optimize.mapjoin.mapreduce=true;
– set hive.optimize.bucketmapjoin=true;
– set hive.optimize.bucketmapjoin.sortedmerge=true;
– set hive.auto.convert.join=true;
– set hive.auto.convert.sortmerge.join=true;
– set hive.auto.convert.sortmerge.join.noconditionaltask=true;
• When bucketing data:
– set hive.enforce.bucketing=true;
– set hive.enforce.sorting=true;
• These and more are set by default in HDP 1.3.
– Check for them in hive-site.xml
– If not present, set them in your query script
• To prevent data skew in GROUP BY:
– set hive.groupby.skewindata=true;
• Increase the reducers' JVM heap, or tune related parameters, e.g.:
– set mapred.child.java.opts=-Xmx1024m;
• In Hive shell:
CREATE TABLE fact_pos (
    txnid STRING, txntime STRING,
    givenname STRING, lastname STRING,
    postalcode STRING, storeid STRING,
    ind1 STRING, productid STRING,
    purchaseamount FLOAT, creditcard STRING
) PARTITIONED BY (part_dt STRING)
CLUSTERED BY (txnid)
SORTED BY (txnid) INTO 24 BUCKETS
STORED AS ORC tblproperties("orc.compress"="SNAPPY");

The part_dt field is defined in the PARTITIONED BY clause and cannot have the same name as any other column.
set hive.enforce.sorting=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set mapreduce.reduce.input.limit=-1;

FROM pos_staging
INSERT OVERWRITE TABLE fact_pos
PARTITION (part_dt)
SELECT txnid, txntime, givenname, lastname, postalcode, storeid,
       ind1, productid, purchaseamount, creditcard,
       concat(year(txntime), month(txntime)) as part_dt
SORT BY productid;

We use this command to load data from our staging table into our fact table.
hadoop fs -setrep -R -w 5 /apps/hive/warehouse/fact_pos

Increase the replication factor for this high-performance table.
dfs.block.local-path-access.user=hdfs
dfs.client.read.shortcircuit=true
dfs.client.read.shortcircuit.skip.checksum=false

Short-circuit reads allow the mappers to bypass the overhead of opening a port to the datanode if the data is local. The permissions on the local block files need to allow the hdfs user to read them (they should by default already).
See HDFS-2246 for more details.
set hive.mapred.reduce.tasks.speculative.execution=false;
set io.sort.mb=300;
set mapreduce.reduce.input.limit=-1;

select productid, ROUND(SUM(purchaseamount), 2) as total
from fact_pos
where part_dt between '201210' and '201212'
group by productid
order by total desc
limit 100;
For more tuning tips, see the Tuning chapter of Programming Hive:
http://flyingdutchman.iteye.com/blog/1871983
(1)Join Optimizations
• Performance Improvements in Hive 0.11:
• New Join Types added or improved in Hive 0.11:
– In-memory Hash Join: Fast for fact-to-dimension joins.
– Sort-Merge-Bucket Join: Scalable for large-table to large-table
joins.
• More Efficient Query Plan Generation
– Joins done in-memory when possible, saving map-reduce steps.
– Combine map/reduce jobs when GROUP BY and ORDER BY use
the same key.
• More Than 30x Performance Improvement for Star
Schema Join
(2)Star Schema Join Improvements in 0.11
http://my.oschina.net/leejun2005/blog/140462#OSC_h3_2
A brief look at SQL-on-Hadoop systems:
http://kan.weibo.com/con/3564924249512540
Summary:
Highly recommended. It analyzes current SQL-on-Hadoop systems from the essence of big-data query processing,
and recalls the earlier debate in which database researchers called MapReduce "a step backwards"!
Over forty years of database development have produced deep research and experience in data processing and
optimization: advanced indexing, materialized views, cost-based optimization, and nested queries of all kinds.
Latest progress of SQL-on-Hadoop systems, parts (1) and (2):
http://yanbohappy.sinaapp.com/?p=381
http://yanbohappy.sinaapp.com/?p=407