Lesson 62: Study Notes on Parquet Best Practices and Hands-On Code with Spark SQL
Topics of this session:
1 Best practices for using Parquet with Spark SQL
2 Hands-on Parquet with Spark SQL
I. Best Practices for Using Parquet with Spark SQL
1. In the past, the industry's big-data analysis technology stacks were generally built as one of the following two pipelines:
a) Data Source -> HDFS -> MR/Hive/Spark (serving as the ETL layer) -> HDFS Parquet -> Spark SQL/Impala -> Result Service (the results may be stored in a database, or exposed as a data service over JDBC/ODBC);
b) Data Source -> real-time updates into HBase/DB -> export to Parquet -> Spark SQL/Impala -> Result Service (the results may be stored in a database, or exposed as a data service over JDBC/ODBC);
The second pipeline can be replaced entirely by Kafka + Spark Streaming + Spark SQL (internally, Parquet is also strongly recommended as the storage format for the data).
Real-time processing is needed almost everywhere now: face recognition, credit-card fraud detection, and similar applications are all built on stream processing.
2. The recommended pipeline: Data Source -> Kafka -> Spark Streaming -> Parquet -> Spark SQL (Spark SQL can be combined with ML, GraphX, etc.) -> Parquet -> further data mining and other downstream processing. A sketch of the Spark Streaming -> Parquet step follows.
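For reference, here is a minimal Java sketch of the Spark Streaming -> Parquet step of this pipeline, written against the Spark 1.6 APIs used elsewhere in these notes. It is only a sketch: the Kafka broker address, topic name, one-column schema, output path, and class name are all assumptions for illustration, not part of the lesson.

package SparkSQLByJava;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
public class StreamingToParquetSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("StreamingToParquetSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        // Kafka direct stream; the broker address and topic name are placeholders.
        HashMap<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "localhost:9092");
        HashSet<String> topics = new HashSet<String>(Arrays.asList("events"));
        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);
        // Assumed record layout: each Kafka message value is one event string.
        final StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("event", DataTypes.StringType, false)));
        messages.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>() {
            @Override
            public void call(JavaPairRDD<String, String> rdd) {
                SQLContext sqlContext = SQLContext.getOrCreate(rdd.context());
                // Turn the message values into Rows that match the schema above.
                JavaRDD<Row> rows = rdd.values().map(new Function<String, Row>() {
                    @Override
                    public Row call(String value) {
                        return RowFactory.create(value);
                    }
                });
                DataFrame df = sqlContext.createDataFrame(rows, schema);
                // Append each micro-batch to a Parquet directory for later Spark SQL analysis.
                df.write().mode("append").parquet("/data/warehouse/events_parquet");
            }
        });
        jssc.start();
        jssc.awaitTermination();
    }
}

Each 30-second micro-batch is appended to a single Parquet directory, which Spark SQL (or Impala) can then query directly.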
II. A Brief Introduction to Parquet
1. Parquet is a file format based on columnar storage. Columnar storage has the following core advantages (illustrated by the sketch after this list):
a. Data that does not match the query conditions can be skipped, so only the required data is read, which reduces I/O volume.
b. Compression and encoding reduce disk space. Because all values in a column share the same data type, more efficient encodings (such as run-length encoding and delta encoding) can be applied to save additional storage.
c. Only the needed columns are read, and vectorized operations are supported, which yields better scan performance.
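To make advantages a and c concrete, here is a minimal Java sketch (Spark 1.6 API, using the same users.parquet file that appears later in these notes; the class name and the 'red' filter value are just illustrations). Reading only two columns lets Spark SQL prune the remaining columns and, when Parquet filter pushdown is enabled, evaluate the predicate against Parquet column statistics:

package SparkSQLByJava;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
public class ParquetColumnPruningSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("ParquetColumnPruningSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);
        // Read the Parquet file; only the columns referenced below are actually scanned.
        DataFrame users = sqlContext.read().parquet("D:\\DT-IMF\\testdata\\users.parquet");
        // Column pruning: only name and favorite_color are read from disk.
        // Predicate pushdown: the filter can be checked against Parquet statistics before decoding rows.
        DataFrame reds = users.select("name", "favorite_color").filter("favorite_color = 'red'");
        reds.show();
        sc.stop();
    }
}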
III. The code below reads the contents of a Parquet file and prints them:
package SparkSQLByJava;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
public class SparkSQLParquetOps {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("SparkSQLParquetOps");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);
        // Read the Parquet file into a DataFrame.
        DataFrame usersDF = sqlContext.read().parquet("D:\\DT-IMF\\testdata\\users.parquet");
        // Register it as a temporary table for subsequent SQL queries.
        usersDF.registerTempTable("users");
        // Perform multi-dimensional analysis on the data (here, a simple projection).
        DataFrame result = sqlContext.sql("select * from users");
        // Process the result: convert the DataFrame to an RDD and collect the rows to the driver.
        List<Row> listRow = result.javaRDD().collect();
        for (Row row : listRow) {
            System.out.println(row);
        }
    }
}
Console output of the run in Eclipse:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/02 09:17:56 INFO SparkContext: Running Spark version 1.6.0
16/04/02 09:18:07 INFO SecurityManager: Changing view acls to: think
16/04/02 09:18:07 INFO SecurityManager: Changing modify acls to: think
16/04/02 09:18:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(think); users with modify permissions: Set(think)
16/04/02 09:18:09 INFO Utils: Successfully started service 'sparkDriver' on port 60088.
16/04/02 09:18:11 INFO Slf4jLogger: Slf4jLogger started
16/04/02 09:18:11 INFO Remoting: Starting remoting
16/04/02 09:18:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:60101]
16/04/02 09:18:11 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 60101.
16/04/02 09:18:11 INFO SparkEnv: Registering MapOutputTracker
16/04/02 09:18:12 INFO SparkEnv: Registering BlockManagerMaster
16/04/02 09:18:12 INFO DiskBlockManager: Created local directory at C:\Users\think\AppData\Local\Temp\blockmgr-c045274d-ef94-471d-819a-93e044022e60
16/04/02 09:18:12 INFO MemoryStore: MemoryStore started with capacity 1773.8 MB
16/04/02 09:18:12 INFO SparkEnv: Registering OutputCommitCoordinator
16/04/02 09:18:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/04/02 09:18:13 INFO SparkUI: Started SparkUI at http://192.168.56.1:4040
16/04/02 09:18:13 INFO Executor: Starting executor ID driver on host localhost
16/04/02 09:18:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60108.
16/04/02 09:18:13 INFO NettyBlockTransferService: Server created on 60108
16/04/02 09:18:13 INFO BlockManagerMaster: Trying to register BlockManager
16/04/02 09:18:13 INFO BlockManagerMasterEndpoint: Registering block manager localhost:60108 with 1773.8 MB RAM, BlockManagerId(driver, localhost, 60108)
16/04/02 09:18:13 INFO BlockManagerMaster: Registered BlockManager
16/04/02 09:18:17 WARN : Your hostname, think-PC resolves to a loopback/non-reachable address: fe80:0:0:0:d401:a5b5:2103:6d13%eth8, but we couldn't find any external IP address!
16/04/02 09:18:18 INFO ParquetRelation: Listing file:/D:/DT-IMF/testdata/users.parquet on driver
16/04/02 09:18:20 INFO SparkContext: Starting job: parquet at SparkSQLParquetOps.java:16
16/04/02 09:18:20 INFO DAGScheduler: Got job 0 (parquet at SparkSQLParquetOps.java:16) with 1 output partitions
16/04/02 09:18:20 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at SparkSQLParquetOps.java:16)
16/04/02 09:18:20 INFO DAGScheduler: Parents of final stage: List()
16/04/02 09:18:20 INFO DAGScheduler: Missing parents: List()
16/04/02 09:18:20 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at SparkSQLParquetOps.java:16), which has no missing parents
16/04/02 09:18:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 61.5 KB, free 61.5 KB)
16/04/02 09:18:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.6 KB, free 82.1 KB)
16/04/02 09:18:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60108 (size: 20.6 KB, free: 1773.7 MB)
16/04/02 09:18:20 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/04/02 09:18:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at SparkSQLParquetOps.java:16)
16/04/02 09:18:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/04/02 09:18:21 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2180 bytes)
16/04/02 09:18:21 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/04/02 09:18:21 INFO ParquetFileReader: Initiating action with parallelism: 5
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/04/02 09:18:24 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1842 bytes result sent to driver
16/04/02 09:18:24 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3301 ms on localhost (1/1)
16/04/02 09:18:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/04/02 09:18:24 INFO DAGScheduler: ResultStage 0 (parquet at SparkSQLParquetOps.java:16) finished in 3.408 s
16/04/02 09:18:24 INFO DAGScheduler: Job 0 finished: parquet at SparkSQLParquetOps.java:16, took 4.121836 s
16/04/02 09:18:26 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 61.8 KB, free 143.9 KB)
16/04/02 09:18:26 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 19.3 KB, free 163.2 KB)
16/04/02 09:18:26 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:60108 (size: 19.3 KB, free: 1773.7 MB)
16/04/02 09:18:26 INFO SparkContext: Created broadcast 1 from javaRDD at SparkSQLParquetOps.java:23
16/04/02 09:18:28 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/04/02 09:18:28 INFO ParquetRelation: Reading Parquet file(s) from file:/D:/DT-IMF/testdata/users.parquet
16/04/02 09:18:28 INFO SparkContext: Starting job: collect at SparkSQLParquetOps.java:23
16/04/02 09:18:28 INFO DAGScheduler: Got job 1 (collect at SparkSQLParquetOps.java:23) with 1 output partitions
16/04/02 09:18:28 INFO DAGScheduler: Final stage: ResultStage 1 (collect at SparkSQLParquetOps.java:23)
16/04/02 09:18:28 INFO DAGScheduler: Parents of final stage: List()
16/04/02 09:18:28 INFO DAGScheduler: Missing parents: List()
16/04/02 09:18:28 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at javaRDD at SparkSQLParquetOps.java:23), which has no missing parents
16/04/02 09:18:28 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.6 KB, free 167.8 KB)
16/04/02 09:18:28 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.6 KB, free 170.4 KB)
16/04/02 09:18:28 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:60108 (size: 2.6 KB, free: 1773.7 MB)
16/04/02 09:18:28 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/04/02 09:18:28 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at javaRDD at SparkSQLParquetOps.java:23)
16/04/02 09:18:28 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
16/04/02 09:18:28 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,PROCESS_LOCAL, 2179 bytes)
16/04/02 09:18:28 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
16/04/02 09:18:28 INFO ParquetRelation$$anonfun$buildInternalScan$1$$anon$1: Input split: ParquetInputSplit{part: file:/D:/DT-IMF/testdata/users.parquet start: 0 end: 615 length: 615 hosts: []}
16/04/02 09:18:28 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
16/04/02 09:18:28 INFO CatalystReadSupport: Going to read the following fields from the Parquet file:
Parquet form:
message spark_schema {
required binary name (UTF8);
optional binary favorite_color (UTF8);
required group favorite_numbers (LIST) {
repeated int32 array;
}
}
Catalyst form:
StructType(StructField(name,StringType,false), StructField(favorite_color,StringType,true), StructField(favorite_numbers,ArrayType(IntegerType,false),false))
16/04/02 09:18:29 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:60108 in memory (size: 20.6 KB, free: 1773.7 MB)
16/04/02 09:18:29 INFO ContextCleaner: Cleaned accumulator 1
16/04/02 09:18:29 INFO GenerateUnsafeProjection: Code generated in 422.989887 ms
16/04/02 09:18:29 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
16/04/02 09:18:29 INFO InternalParquetRecordReader: at row 0. reading next block
16/04/02 09:18:29 INFO CodecPool: Got brand-new decompressor [.snappy]
16/04/02 09:18:29 INFO InternalParquetRecordReader: block read in memory in 54 ms. row count = 2
16/04/02 09:18:30 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 3532 bytes result sent to driver
16/04/02 09:18:30 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1796 ms on localhost (1/1)
16/04/02 09:18:30 INFO DAGScheduler: ResultStage 1 (collect at SparkSQLParquetOps.java:23) finished in 1.798 s
16/04/02 09:18:30 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/04/02 09:18:30 INFO DAGScheduler: Job 1 finished: collect at SparkSQLParquetOps.java:23, took 1.863220 s
[Alyssa,null,WrappedArray(3, 9, 15, 20)]
[Ben,red,WrappedArray()]
16/04/02 09:18:30 INFO SparkContext: Invoking stop() from shutdown hook
16/04/02 09:18:30 INFO SparkUI: Stopped Spark web UI at http://192.168.56.1:4040
16/04/02 09:18:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/02 09:18:30 INFO MemoryStore: MemoryStore cleared
16/04/02 09:18:30 INFO BlockManager: BlockManager stopped
16/04/02 09:18:30 INFO BlockManagerMaster: BlockManagerMaster stopped
16/04/02 09:18:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/02 09:18:30 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/04/02 09:18:30 INFO SparkContext: Successfully stopped SparkContext
16/04/02 09:18:30 INFO ShutdownHookManager: Shutdown hook called
16/04/02 09:18:30 INFO ShutdownHookManager: Deleting directory C:\Users\think\AppData\Local\Temp\spark-46e1adfd-4a69-42a8-9b91-24fb8dd8da16
16/04/02 09:18:30 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
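As a small follow-up (not part of the lesson's code), the same temporary table can be used for a grouped, "multi-dimensional" query, and the result can be written back out as Parquet for downstream consumers. This is only a sketch: the class name and the output path color_counts.parquet are made up for illustration.

package SparkSQLByJava;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
public class SparkSQLParquetAggOps {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("SparkSQLParquetAggOps");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);
        // Reuse the same users.parquet file and temporary table as in the example above.
        DataFrame usersDF = sqlContext.read().parquet("D:\\DT-IMF\\testdata\\users.parquet");
        usersDF.registerTempTable("users");
        // A simple grouped query: count users per favorite_color.
        DataFrame colorCounts = sqlContext.sql(
                "select favorite_color, count(*) as cnt from users group by favorite_color");
        colorCounts.show();
        // Persist the aggregated result as Parquet for downstream consumers (hypothetical output path).
        colorCounts.write().mode("overwrite").parquet("D:\\DT-IMF\\testdata\\color_counts.parquet");
        sc.stop();
    }
}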
The notes above are from Lesson 62 of Wang Jialin (王家林)'s DT大数据梦工厂 "IMF传奇行动" course.
Wang Jialin is a China evangelist for Spark, Flink, Docker, and Android technologies; president and chief expert of the Spark Asia-Pacific Research Institute; founder of DT大数据梦工厂; a source-code-level expert in Android software/hardware integration; an English pronunciation magician; and a fitness enthusiast.
WeChat public account: DT_Spark
Phone: 18610086859
QQ: 1740415547
WeChat: 18610086859
Sina Weibo: ilovepains
Appendix:
The explanation of Parquet from the official Apache Parquet documentation (http://parquet.apache.org/documentation/latest/):
Apache Parquet (http://parquet.apache.org)
Motivation
We created Parquet to make the advantages of compressed, efficient columnar data
representation available to any project in the Hadoop ecosystem.
Parquet is built from the ground up with complex nested data structures in mind, and uses the
record shredding and assembly algorithm (https://github.com/Parquet/parquet-mr/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper)
described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces.
Parquet is built to support very efficient compression and encoding schemes. Multiple projects
have demonstrated the performance impact of applying the right compression and encoding
scheme to the data. Parquet allows compression schemes to be specified on a per-column level,
and is future-proofed to allow adding more encodings as they are invented and implemented.
Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing
frameworks, and we are not interested in playing favorites. We believe that an efficient, well-
implemented columnar storage substrate should be useful to all frameworks without the cost of
extensive and difficult to set up dependencies.
Modules
The parquet-format project contains format specifications and Thrift definitions of metadata
required to properly read Parquet files.
The parquet-mr project contains multiple sub-modules, which implement the core components of
reading and writing a nested, column-oriented data stream, map this core onto the parquet
format, and provide Hadoop Input/Output Formats, Pig loaders, and other java-based utilities for
interacting with Parquet.
The parquet-compatibility project contains compatibility tests that can be used to verify that
implementations in different languages can read and write each other’s files.
Building
Java resources can be built using mvn package. The current stable version should always be
available from Maven Central.
C++ thrift resources can be generated via make.
Thrift can be also code-genned into any other thrift-supported language.
Releasing
See How to Release (../how-to-release/).
Glossary
Block (hdfs block): This means a block in hdfs and the meaning is unchanged for
describing this file format. The file format is designed to work well on top of hdfs.
File: A hdfs file that must include the metadata for the file. It does not need to actually
contain the data.
Row group: A logical horizontal partitioning of the data into rows. There is no physical
structure that is guaranteed for a row group. A row group consists of a column chunk for
each column in the dataset.
Column chunk: A chunk of the data for a particular column. These live in a particular row
group and are guaranteed to be contiguous in the file.
Page: Column chunks are divided up into pages. A page is conceptually an indivisible unit
(in terms of compression and encoding). There can be multiple page types, which are
interleaved in a column chunk.
Hierarchically, a file consists of one or more row groups. A row group contains exactly one column
chunk per column. Column chunks contain one or more pages.
Unit of parallelization
MapReduce - File/Row Group
IO - Column chunk
Encoding/Compression - Page
File format
This file and the thrift definition should be read together to understand the format.
4-byte magic number "PAR1"
<Column 1 Chunk 1 + Column Metadata>
<Column 2 Chunk 1 + Column Metadata>
...
<Column N Chunk 1 + Column Metadata>
<Column 1 Chunk 2 + Column Metadata>
...
<Column N Chunk 2 + Column Metadata>
...
<Column 1 Chunk M + Column Metadata>
...
<Column N Chunk M + Column Metadata>
File Metadata
4-byte length in bytes of file metadata
4-byte magic number "PAR1"
In the above example, there are N columns in this table, split into M row groups. The file metadata
contains the locations of all the column metadata start locations. More details on what is
contained in the metadata can be found in the thrift files.
Metadata is written after the data to allow for single pass writing.
Readers are expected to first read the file metadata to find all the column chunks they are
interested in. The column chunks should then be read sequentially.
Metadata
There are three types of metadata: file metadata, column (chunk) metadata and page header
metadata. All thrift structures are serialized using the TCompactProtocol.
Types
The types supported by the file format are intended to be as minimal as possible, with a focus on
how the types affect on-disk storage. For example, 16-bit ints are not explicitly supported in the
storage format since they are covered by 32-bit ints with an efficient encoding. This reduces the
complexity of implementing readers and writers for the format. The types are:
- BOOLEAN: 1 bit boolean
- INT32: 32 bit signed ints
- INT64: 64 bit signed ints
- INT96: 96 bit signed ints
- FLOAT: IEEE 32-bit floating point values
- DOUBLE: IEEE 64-bit floating point values
- BYTE_ARRAY: arbitrarily long byte arrays.
Logical Types
Logical types are used to extend the types that parquet can be used to store, by specifying how
the primitive types should be interpreted. This keeps the set of primitive types to a minimum and
reuses parquet’s efficient encodings. For example, strings are stored as byte arrays (binary) with
a UTF8 annotation. These annotations define how to further decode and interpret the data.
Annotations are stored as a ConvertedType in the file metadata and are documented in
LogicalTypes.md (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md).
Nested Encoding
To encode nested columns, Parquet uses the Dremel encoding with definition and repetition
levels. Definition levels specify how many optional fields in the path for the column are defined.
Repetition levels specify at which repeated field in the path the value is repeated. The max
definition and repetition levels can be computed from the schema (i.e. how much nesting there
is). This defines the maximum number of bits required to store the levels (levels are defined for all
values in the column).
Two encodings for the levels are supported: BITPACKED and RLE. Only RLE is now used, as it
supersedes BITPACKED.
Nulls
Nullity is encoded in the definition levels (which is run-length encoded). NULL values are not
encoded in the data. For example, in a non-nested schema, a column with 1000 NULLs would be
encoded with run-length encoding (0, 1000 times) for the definition levels and nothing else.
Data Pages
For data pages, the 3 pieces of information are encoded back to back, after the page header.
We have:
- the definition levels data,
- the repetition levels data,
- the encoded values.
The size specified in the header is for all 3 pieces combined.
The data for the data page is always required. The definition and repetition levels are optional,
based on the schema definition. If the column is not nested (i.e. the path to the column has length
1), we do not encode the repetition levels (it would always have the value 1). For data that is
required, the definition levels are skipped (if encoded, it will always have the value of the max
definition level).
For example, in the case where the column is non-nested and required, the data in the page is
only the encoded values.
The supported encodings are described in Encodings.md (https://github.com/Parquet/parquet-format/blob/master/Encodings.md).
Column chunks
Column chunks are composed of pages written back to back. The pages share a common header
and readers can skip over pages they are not interested in. The data for the page follows the
header and can be compressed and/or encoded. The compression and encoding is specified in
the page metadata.
Checksumming
Data pages can be individually checksummed. This allows disabling of checksums at the HDFS
file level, to better support single row lookups.
Error recovery
If the file metadata is corrupt, the file is lost. If the column metadata is corrupt, that column chunk is
lost (but column chunks for this column in other row groups are okay). If a page header is
corrupt, the remaining pages in that chunk are lost. If the data within a page is corrupt, that page
is lost. The file will be more resilient to corruption with smaller row groups.
Potential extension: With smaller row groups, the biggest issue is placing the file metadata at the
end. If an error happens while writing the file metadata, all the data written will be unreadable.
This can be fixed by writing the file metadata every Nth row group.
Each file metadata would be cumulative and include all the row groups written so far. Combining
this with the strategy used for rc or avro files using sync markers, a reader could recover partially
written files.
Separating metadata and column data.
The format is explicitly designed to separate the metadata from the data. This allows splitting
columns into multiple files, as well as having a single metadata file reference multiple parquet
files.
Configurations
Row group size: Larger row groups allow for larger column chunks which makes it possible
to do larger sequential IO. Larger groups also require more buffering in the write path (or a
two pass write). We recommend large row groups (512MB - 1GB). Since an entire row
group might need to be read, we want it to completely fit on one HDFS block. Therefore,
HDFS block sizes should also be set to be larger. An optimized read setup would be: 1GB
row groups, 1GB HDFS block size, 1 HDFS block per HDFS file.
Data page size: Data pages should be considered indivisible so smaller data pages allow
for more fine grained reading (e.g. single row lookup). Larger page sizes incur less space
overhead (fewer page headers) and potentially less parsing overhead (fewer headers to
process). Note: for sequential scans, it is not expected to read a page at a time; this is not
the IO chunk. We recommend 8KB for page sizes.
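For reference, here is a hedged sketch of how these two sizes are commonly configured when writing Parquet from Spark 1.6. The property names parquet.block.size and parquet.page.size come from parquet-hadoop (they are not mentioned in this document), and whether they are honored depends on the Spark/Parquet versions in use; the 1 GB row group and 8 KB page values simply follow the recommendations quoted above.

package SparkSQLByJava;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
public class ParquetSizingSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("ParquetSizingSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Row group ("block") size and data page size, in bytes (assumed parquet-hadoop property names).
        sc.hadoopConfiguration().setInt("parquet.block.size", 1024 * 1024 * 1024);
        sc.hadoopConfiguration().setInt("parquet.page.size", 8 * 1024);
        // Match the HDFS block size so a row group fits in a single block (only meaningful on HDFS).
        sc.hadoopConfiguration().set("dfs.blocksize", "1073741824");
        SQLContext sqlContext = new SQLContext(sc);
        DataFrame df = sqlContext.read().parquet("D:\\DT-IMF\\testdata\\users.parquet");
        // Rewrite the data with the configured row group and page sizes (hypothetical output path).
        df.write().parquet("D:\\DT-IMF\\testdata\\users_resized.parquet");
        sc.stop();
    }
}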
Extensibility
There are many places in the format for compatible extensions:
- File Version: The file metadata contains a version.
- Encodings: Encodings are specified by enum and more can be added in the future.
- Page types: Additional page types can be added and safely skipped.
Copyright 2014 Apache Software Foundation (http://www.apache.org/). Licensed under the
Apache License v2.0 (http://www.apache.org/licenses/). Apache Parquet and the Apache feather
logo are trademarks of The Apache Software Foundation.