不二程序猿

DATA2001 Final Exam Key Points Summary: Week 2 - Week 12

文章目录

DATA2001 期末知识点概括
前言
Week 2 Data Cleaning and Exploration with Python
- 2.1 Level of Measurements and Type of Data: Categorical Data (Nominal,Dichotomous,Ordinal) , Quantitative(Interval,Ratio)
- 2.2 variance and standard 方差和标准差
- 2.3 Data Cleaning--pandas 数据清理
- 2.4 correlation statistics 相关统计
Week 3 Accessing Data in Relational Databases; Introduction to SQL
- 3.1 What is a Database?什么是数据库
- 3.2 Advantages of Database 数据库好处
- 3.3 Key Database Concepts 数据库概念
- - 3.3.1 Primary Key and Foreign Key 主键和外键
- 3.4 Kinds of Relationships (One-One , One-Many , Many-Many Relationship) 表和表的关系
Week 4 Declarative Data Analysis with SQL
- 4.1 SQL – The Structured Query Language 结构化查询语言(DDL, DML)
- 4.2 Table Constraints and Relational Keys(Primary key,Foreign keys) 表约束和关系键
- 4.3 SQL Domain Constraints 域约束
- 4.4 SQL查询语言（暂无）
Week 5 Scalable Data Analytics The Role of Indexes and Data Partitioning
- 5.1 Where is Data Stored? 数据存储在哪里
- 5.2 How to Access to Data on Secondary Storage FAST?如何快速访问二级存储上的数据
- 5.3 Alternative File Organizations 替代文件组织（Heap Files，Sorted Files）
- - 5.3.1Heap Files（Unordered）堆文件无序
  - 5.3.2 Sorted Files
  - 5.3.3 Column Store – Pros and Cons (暂无）
- 5.4 Index 索引
- - 5.4.1 Indices (指数）
  - 5.4.2 Index Example 索引例子
  - 5.4.3 Index structure choices:结构选择
  - 5.4.4 Index Definition in SQL 在SQL语言里
  - 5.4.5 Clustered Index 聚集索引
  - 5.4.6 Unclustered Index 非聚集索引
  - 5.4.7 Covering Index 覆盖索引
- 5.5 Distributed Data Management 分布式数据管理(Partitioning)
- - 5.5.1 Partitioning 分区
Week 6 Scraping Web Data
- 6.1 (儿童节快乐）Web Scraping – General Approach 网页抓取 - 一般方法
- 6.2 Robots Exclusion Standard 机器人排除标准(robots协议）
- 6.3 Is it Legal? 法外狂徒？
- 6.4 Web Page Retrieval: URLs 网页检索 URL
- 6.5 HTML – Hypertext Markup Language 超文本标记语言
- 6.6 General Structure of a Web Page 网页的一般结构
- 6.7How to Select Content in a Webpage?如何选择网页中的内容？
- 6.8 HTML Document Model (DOM): Element-Tree 文档模型 (DOM)：元素树
Week 7 Semistructured Data; NoSQL
- 7.1 Getting Data via Service-APIs (Web Services) 通过服务 API（Web 服务）获取数据
- 7.2 Semistructured Data 半结构化数据
- 7.3 HTML vs. XML
- 7.4 Logical Document Structure 逻辑文件结构
- 7.5 How to query or filter XML?如何查询或过滤 XML？
- - 7.5.1 XPath Document Model Tree XPath 文档模型树
- 7.6 Semi-Structured data versus Structured Data 半结构化数据与结构化数据
- 7.7 NoSQL
- - 7.7.1NOSQL
- 7.8 MongoDB Data Model 数据模型
- - 7.8.1 MongoDB vs. RDBMS
Week 8 Text Data Processing: Feature Extraction & Analysis
- 8.1 Text data
- 8.2 Machine Learning tasks 机器学习任务
- 8.3 Tokenisation
- 8.4 Normalisation
- 8.5 Indicator Features
- 8.6 Term Frequency Weighting
- 8.7 TF-IDF Weighting 加权
- 8.8 Vector Space Model 向量空间模型
- 8.9 Document Vectors 文档向量
Week 9 (Geo-)Spatial Data
- 9.1 Spatial Data 空间数据
- 9.2 SDBMS vs GIS
- 9.3 Object Model 对象模型
- 9.4 PostGIS: Geometry vs. Geography Type
- 更多week 9 知识点请翻阅 week_9_lecture
Week 10 Time Series Data
- 10.1 Temporal Data 时间数据
- 10.2 Temporal Support in Databases 数据库中的时间支持
- 10.3 Concepts in Temporal Databases
- - 10.3.1 Temporal Data Types 时态数据类型
  - 10.3.2 Kinds of Data 数据种类
  - 10.3.3 Kinds of Temporal Statements
  - 10.3.4 Transaction Time and Valid Time 交易时间和有效时间
Week 11 Image Data Processing
- 11.1 Image Data 图像数据
- 11.2 Types of Images种类
- 11.3 Aspects of Image Processing 图像处理方面
- 11.4 Morphological Image Processing 形态学图像处理
Week 12 Big Data
- 12.1 Big Data: Volume 量
- 12.2 Big Data: Velocity 速度
- 12.3 Big Data: Variety 多样性
- 12.4 Scale-Up 单个升级
- 12.5 The Alternative: Scale-Out 增加数量
- 12.6 MapReduce Overview
- 12.7 MapReduce Discussion
总结

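Foreword

The weekly key points come from the university lectures. They cover the main content of each week; a few minor points may be missing, so look those up in the corresponding lecture. The Chinese and English text is machine translated, so treat it as a rough guide to the meaning.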

Week 2 Data Cleaning and Exploration with Python

2.1 Level of Measurements and Type of Data: Categorical Data (Nominal,Dichotomous,Ordinal) , Quantitative(Interval,Ratio)

Categorical Data

A categorical variable is also known as a discrete or qualitative variable and can have two or more categories.

分类变量也称为离散变量或定性变量，可以有两个或多个类别。
It is further divided into two variants, nominal and ordinal.
它进一步分为两种变体，名义型和有序型。

These variables are sometimes coded as numerical values, or as strings.
这些变量有时被编码为数值或字符串。

Nominal Data

This is unordered categorical data. A variable of this type may be "label-coded" in numeric form, but the numbers have no mathematical interpretation; they are just labels that denote categories. For example, the colours black, red and white can be coded as 1, 2 and 3.

- Dichotomous Data

A dichotomous variable is a type of nominal data that can take only two possible values, e.g. true or false, or presence or absence. Such variables are also sometimes referred to as binary or Boolean variables.

e.g. true (1) or false (0)

Ordinal Data

This is ordered categorical data: there is a strict order for comparing the values, so labelling them with numbers is not completely arbitrary. For example, human height (short, medium and tall) can be coded as short = 1, medium = 2, tall = 3.

– Values are ordered 值是有序的
– No distance is implied 没有暗示距离
– Eg rank, agreement 例如等级、协议

Quantitative Data

Interval Data

It is a variable in which the interval between values has meaning and there is no true zero value.
它是一个变量，其中值之间的间隔有意义并且没有真正的零值。

Ratio Data

It is a variable that has a true zero value, where zero represents the total absence of the quantity being measured. For example, it makes sense to say that a Kelvin temperature of 100 is twice as hot as a Kelvin temperature of 50, because it represents twice as much thermal energy (unlike Fahrenheit temperatures of 100 and 50).

2.2 Variance and Standard Deviation

Samples from two populations with the same mean but different variances. The red population has mean 100 and variance 100 (SD=10) while the blue population has mean 100 and variance 2500 (SD=50).

The more widely spread the distribution, the larger the variance; the more tightly clustered, the smaller the variance.
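
To make the red/blue comparison concrete, here is a minimal sketch using only Python's standard library. The two samples are randomly generated stand-ins (an assumption, not the lecture's data) with the same mean of 100 and standard deviations of 10 and 50.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Two hypothetical samples with the same mean but different spread,
# mirroring the red (SD=10) and blue (SD=50) populations described above.
red = [random.gauss(100, 10) for _ in range(10_000)]
blue = [random.gauss(100, 50) for _ in range(10_000)]

for name, sample in (("red", red), ("blue", blue)):
    var = statistics.pvariance(sample)   # population variance
    sd = statistics.pstdev(sample)       # standard deviation = sqrt(variance)
    print(f"{name}: mean={statistics.mean(sample):.1f} var={var:.0f} sd={sd:.1f}")
```

The standard deviation is the square root of the variance, which is why it is expressed in the same units as the data.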

2.3 Data Cleaning–pandas 数据清理

Missing Data Handling

Pandas provides various functions for handling missing or wrong data.
Part of this is already built into the input functions (cf. read_csv()), where missing values are automatically replaced with NA/NaN.

data2 = data['numGen'].dropna()  # drop the missing values from this column

data['numGen'] = data['numGen'].fillna(0)  # replace NaN with 0

data['numGen'] = data['numGen'].replace(to_replace='<Null>', value=0)  # replace a placeholder string with 0

Fix missing values during import

Some datasets contain placeholders for missing values, such as 'n/a', '--' or 'null'.
It is best to replace them during import to avoid problems later.

import pandas as pd

# strings that should be treated as missing values on import
missing_values = ["--", "<Null>"]
data = pd.read_csv('MajorPowerStations.csv', na_values=missing_values)
data.head()

2.4 correlation statistics 相关统计

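SciPy includes various correlation statistics, each giving a number between -1 and 1.
Values between -1 and 0 indicate a negative correlation, values between 0 and 1 a positive correlation, and 0 means no correlation.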

stats.spearmanr
stats.pearsonr
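
To show what these two statistics measure, here is a hand-written sketch in plain Python. It is not SciPy's implementation; the helper names `pearson`, `ranks` and `spearman` and the sample lists are made up for illustration, and the rank helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank positions (1 = smallest); assumes no tied values for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]      # grows with x, but not linearly
z = [10, 8, 6, 4, 2]       # falls linearly as x grows

print(pearson(x, y), spearman(x, y))
print(pearson(x, z), spearman(x, z))
```

Pearson measures how linear the relationship is, while Spearman only looks at the ordering, so a monotonic but curved relationship scores higher on Spearman than on Pearson.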

Week 3 Accessing Data in Relational Databases; Introduction to SQL

3.1 What is a Database?什么是数据库

A database is a shared collection of logically related data and its description.
数据库是逻辑相关数据及其描述的共享集合。

The database represents the entities (real-world things), the attributes (their relevant properties), and the logical relationships between the entities.
数据库表示实体（现实世界的事物）、属性（它们的相关属性）以及实体之间的逻辑关系。

3.2 Advantages of Database 数据库好处

– Data is managed, so quality can be enforced by the DBMS
管理数据，因此 DBMS 可以强制执行质量

– Improved Data Sharing 改进的数据共享
• Different users get different views of the data 不同的用户对数据有不同的看法
• Efficient concurrent access 高效的并发访问

– Enforcement of Standards 执行标准
• All data access is done in the same way 所有数据访问均以相同方式完成

– Integrity constraints, data validation rules 完整性约束、数据验证规则

– Better Data Accessibility/ Responsiveness 更好的数据可访问性/响应能力
• Use of standard data query language (SQL) 使用标准数据查询语言 (SQL)

– Security, Backup/Recovery, Concurrency 安全、备份/恢复、并发
• Disaster recovery is easier 灾难恢复更容易

Program-Data Independence 程序数据独立性
– Metadata stored in DBMS, so applications don’t need to worry about data formats 元数据存储在 DBMS 中，因此应用程序无需担心数据格式

– Data queries/updates managed by DBMS so programs don’t need to process data access routines
数据查询/更新由 DBMS 管理，因此程序不需要处理数据访问例程

– Results in:
• Reduced application development time 缩短应用程序开发时间
• Increased maintenance productivity 提高维护效率
• Efficient access 高效访问

3.3 Key Database Concepts 数据库概念

– Table – an arrangement of related information stored in columns and rows.
表格 – 存储在列和行中的相关信息的排列。

– Field / Attribute – column in a table, contains homogenous set of data.
字段/属性 – 表中的列，包含同类数据集。

– Field data types - kind of data that can be stored in a field. For example, a field whose data type is Text can store data consisting of either text or number characters, but a Number field can store only numerical data.
字段数据类型 - 可以存储在字段中的数据类型。例如，数据类型为文本的字段可以存储由文本或数字字符组成的数据，但数字字段只能存储数字数据。

– Primary Key (PK) – a field in a table whose value uniquely identifies each record in the table. A PK cannot be null (it must be given).
主键 (PK) – 表中的字段，其值唯一标识表中的每条记录。 PK 不能为空（必须给出）。

– Record – A row in table.
记录 – 表中的一行

3.3.1 Primary Key and Foreign Key 主键和外键

Primary Key

– A primary key is a unique attribute which the database uses to identify a row in a table.
主键是数据库用来标识表中行的唯一属性。

– It is often a unique, auto-incrementing ID which is filled in by the database; in any case it is NEVER NULL (NULL has the special meaning in databases of "unknown" or "not given")

– A primary ID number will only ever be issued once
一个主要的 ID 号码只会发出一次

Foreign Key

– When we need to refer to a record in a separate table we reference its ID as a foreign key.
当我们需要引用单独表中的记录时，我们将其 ID 作为外键引用。

– A foreign key is defined in a second table, but it refers to the primary key or a unique key in the first table.
外键定义在第二个表中，但它指的是第一个表中的主键或唯一键

3.4 Kinds of Relationships (One-One , One-Many , Many-Many Relationship) 表和表的关系

One-One Relationship (1-1 Relationship):
A One-to-One (1-1) relationship is a relationship between two tables in which each row of one table is associated with at most one matching row in the other.

One-Many Relationship (1-M Relationship):
A One-to-Many relationship is a relationship between two tables in which a row from one table can have multiple matching rows in the other table.

Many-to-Many Relationship (M-N Relationship):
A Many-to-Many relationship is one in which a row in either table can have multiple matching rows in the other; in a relational database it is usually represented with a separate linking table that holds foreign keys to both.
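
The primary key / foreign key idea and a one-to-many relationship can be tried out with Python's built-in sqlite3 module. This is only a sketch: the Department and Employee tables and their rows are invented for illustration, and SQLite stands in for the database used in the unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# One-to-many: one department has many employees.
conn.executescript("""
CREATE TABLE Department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES Department(dept_id)
);
""")

conn.execute("INSERT INTO Department VALUES (1, 'Research')")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(10, 'Ada', 1), (11, 'Grace', 1)])

# A foreign key must point at an existing primary key value.
try:
    conn.execute("INSERT INTO Employee VALUES (12, 'Alan', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

rows = conn.execute("""
    SELECT d.name, COUNT(*) FROM Department d
    JOIN Employee e ON e.dept_id = d.dept_id
    GROUP BY d.name
""").fetchall()
print(rows, fk_rejected)
```

The foreign key sits on the "many" side (Employee), and the join follows it back to the single parent row.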

Week 4 Declarative Data Analysis with SQL

4.1 SQL – The Structured Query Language 结构化查询语言(DDL, DML)

– SQL is the standard declarative query language for RDBMS
SQL 是 RDBMS 的标准声明式查询语言

– Describing what data we are interested in, but not how to retrieve it.
描述我们感兴趣的数据，而不是如何检索它。

– Supported commands from roughly two categories:
支持的命令大致分为两类

– DDL (Data Definition Language) 数据定义语言
• Create, drop, or alter the relation schema 创建、删除或更改关系模式
• Example:
CREATE TABLE name ( list_of_columns )

– DML (Data Manipulation Language) 数据操作语言
• for retrieval of information also called query language 用于检索信息，也称为查询语言
•Example:
INSERT, DELETE, UPDATE
SELECT … FROM … WHERE

4.2 Table Constraints and Relational Keys(Primary key,Foreign keys) 表约束和关系键

When creating a table, we can also specify Integrity Constraints for columns
创建表时，我们还可以为列指定完整性约束
– eg. domain types per attribute, or NULL / NOT NULL constraints
例如。每个属性的域类型，或 NULL / NOT NULL 约束

– Primary key: unique, minimal identifier of a relation.
主键：唯一的、最小的关系标识符。
– Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique.
示例包括员工编号、社会保险号等。这就是我们如何保证所有行都是唯一的。

– Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)
外键是使依赖关系（在关系的多方面）能够引用其父关系（在关系的一侧）的标识符

– Must refer to a candidate key of the parent relation 必须引用父关系的候选键
– Like a "logical pointer" 就像一个“逻辑指针”

– Keys can be simple (single attribute) or composite (multiple attributes)
键可以是简单的（单属性）或复合的（多属性）

4.3 SQL Domain Constraints 域约束

SQL supports various domain constraints to restrict attribute to valid domains
SQL 支持各种域约束以将属性限制为有效域

• NULL / NOT NULL whether an attribute is allowed to become NULL (unknown)
是否允许属性变为 NULL（未知）

• DEFAULT to specify a default value
DEFAULT 指定默认值

• CHECK( condition ) a Boolean condition that must hold for every tuple in the db instance
一个布尔条件，该条件必须适用于数据库实例中的每个元组

Example (DDL):

CREATE TABLE Student
(
    sid       INTEGER      PRIMARY KEY,
    name      VARCHAR(20)  NOT NULL,
    gender    CHAR         CHECK (gender IN ('M','F','T')),
    birthday  DATE,
    country   VARCHAR(20),
    level     INTEGER      DEFAULT 1 CHECK (level BETWEEN 1 AND 5)
);
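
To see the domain constraints in action, the Student table can be created in an in-memory SQLite database from Python. This is a sketch under the assumption that SQLite behaves like the unit's DBMS for NOT NULL, DEFAULT and CHECK; the inserted rows and the `is_rejected` helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Student (
    sid      INTEGER     PRIMARY KEY,
    name     VARCHAR(20) NOT NULL,
    gender   CHAR        CHECK (gender IN ('M','F','T')),
    birthday DATE,
    country  VARCHAR(20),
    level    INTEGER     DEFAULT 1 CHECK (level BETWEEN 1 AND 5)
)""")

# DEFAULT: level is filled in when the INSERT leaves it out.
conn.execute("INSERT INTO Student (sid, name, gender) VALUES (1, 'Kim', 'F')")
default_level = conn.execute("SELECT level FROM Student WHERE sid = 1").fetchone()[0]

def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

bad_gender = is_rejected("INSERT INTO Student (sid, name, gender) VALUES (2, 'Lee', 'X')")
null_name = is_rejected("INSERT INTO Student (sid, name) VALUES (3, NULL)")
bad_level = is_rejected("INSERT INTO Student (sid, name, level) VALUES (4, 'Max', 9)")
print(default_level, bad_gender, null_name, bad_level)
```

Each rejected statement fails as a whole, so no invalid row ever reaches the table.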

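4.4 SQL Query Language (not covered yet)

The SQL query language is not covered in detail here; the basics can be found on the Runoob (菜鸟教程) tutorial site.
Link: PostgreSQL tutorial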

Week 5 Scalable Data Analytics The Role of Indexes and Data Partitioning

5.1 Where is Data Stored? 数据存储在哪里

Main Memory (RAM):主要存储内存条
• Expensive 昂贵的
• Volatile 易挥发的

Secondary Storage (HDD):辅助存储硬盘
• Cheap 便宜的
• Stable 稳定的
• BIG 容量大

Tertiary Storage (e.g. Tape): 三级存储（例如磁带）
• Very Cheap 非常便宜
• Stable 稳定的

5.2 How to Access to Data on Secondary Storage FAST?如何快速访问二级存储上的数据

Key Challenge: Secondary storage needed for sheer data volume (and persistence), but it is slow.
主要挑战：纯粹的数据量（和持久性）需要二级存储，但速度很慢……

Approaches:方法
– Block-wise transfer 分块转移
• transfer data in fixed-size chunks (blocks or pages) between storage layers
在存储层之间以固定大小的块（块或页）传输数据

– Caching / Buffering 缓存/缓冲
• Keep ‘hot’ data in memory, use secondary storage for ‘cold’ data
将“热”数据保存在内存中，将“冷”数据使用二级存储

– Optimised File Organisation 优化的文件组织
• Heap Files vs. Sorted Files; Row Stores vs Column Stores
堆文件与排序文件；行存储与列存储

– Indexing 索引

– Partitioning 分区

5.3 Alternative File Organizations 替代文件组织（Heap Files，Sorted Files）

Many alternatives exist, each ideal for some situations, and not so good in others:
存在许多替代方案，每一种都适用于某些情况，但在其他情况下不太好：

– Indexes – data structures to organize records via trees or hashing
通过树或散列来组织记录的数据结构
– like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields
与排序文件一样，它们可以根据某些（“搜索键”）字段中的值加快对记录子集的搜索
– Updates are much faster than in sorted files.
更新比排序文件快得多。

5.3.1 Heap Files (Unordered)

– Heap file: a record can be placed anywhere in the file where there is space (random order)

– Suitable when typical access is a file scan retrieving all records.

– Simplest file structure: contains records in no particular order.

– Access method is a linear scan
• On average, half of the pages in a file must be read to find a record; in the worst case, the whole file
• Efficient if all rows are returned (SELECT * FROM table)
• Very inefficient if only a few rows are requested

– Rows are appended to the end of the file as they are inserted
• Hence the file is unordered

– Deleted rows create gaps in the file
• The file must be periodically compacted to recover space

5.3.2 Sorted Files

– store records in sequential order, based on the value of the search key of each record
Sorted Files – 根据每条记录的搜索键值按顺序存储记录
– best if records must be retrieved in some order, or only a ‘range’ of records is needed.
最好是必须按某种顺序检索记录，或者只需要一个“范围”的记录。

5.3.3 Column Store – Pros and Cons (not covered yet)

The lecture covers this in full. (Slacking off a little here; nobody will notice.)

5.4 Index 索引

5.4.1 Indices

– Idea: Separate location mechanism from data storage
将定位机制与数据存储分离

– Just remember a book index:只需记住一个书籍索引
Index is a set of pages (a separate file) with pointers (page numbers) to the data page which contains the value
索引是一组页（一个单独的文件），带有指向包含值的数据页的指针（页码）

– Instead of scanning through whole book (relation) each time, using the index is much faster to navigate (less data to search)
不是每次都扫描整本书（关系），使用索引导航要快得多（要搜索的数据更少）

– Index typically much smaller than the actual data
索引通常比实际数据小得多

5.4.2 Index Example 索引例子

Here, index is on name attribute of Stations table
索引位于 Stations 表的 name 属性上
– We say name is the search key for this index (it is the attribute which we use to look up data)
name 是这个索引的搜索键（它是我们用来查找数据的属性）

5.4.3 Index structure choices:结构选择

– Tree index: search keys are stored in sorted order in index [this supports range query for search key]
树索引：搜索键按排序顺序存储在索引中[这支持搜索键的范围查询]

– Hash index: search keys are distributed uniformly across “buckets” in index using a “hash function”.
哈希索引：使用“哈希函数”在索引中的“桶”中均匀分布搜索键。

5.4.4 Index Definition in SQL 在SQL语言里

To create an index:
CREATE INDEX name ON relation-name (attribute-list)
Example:

CREATE INDEX StationNameIdx ON Stations(name)

To drop an index:
DROP INDEX index-name

5.4.5 Clustered Index 聚集索引

In a clustered index, both index entries and rows with the actual data are ordered in the same way.
在聚集索引中，索引条目和包含实际数据的行都以相同的方式排序。

– The particular index structure (e.g. hash or tree) dictates how the index entries are organized in the storage structure
特定的索引结构（例如哈希或树）决定了索引条目在存储结构中的组织方式

– For a clustered index, this then dictates how the data rows are organized
对于聚集索引，这决定了数据行的组织方式

– There can be at most one clustered index on a table.
一张表上最多可以有一个聚集索引。
– e.g. the white pages of the phone book in alphabetical order
例如按字母顺序排列的电话簿白页

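– In many DBMSs, the CREATE TABLE statement creates a clustered index on the primary key by default.
– To cluster a table on another attribute in PostgreSQL, first create an index on that attribute, then use the command: CLUSTER table-name USING index-name (this is a one-time reordering; rows inserted later are not kept in order)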

5.4.6 Unclustered Index 非聚集索引

– Index entries and rows are not ordered in the same way.
索引条目和行的排序方式不同。

– There can be many secondary indices on a table.
一个表上可以有很多次要索引。

– Index created by CREATE INDEX is generally an unclustered, secondary index.
CREATE INDEX 创建的索引通常是非聚集的二级索引

5.4.7 Covering Index 覆盖索引

– Goal: Is it possible to answer whole query just from an index?
目标：是否可以仅从索引中回答整个查询？

– Covering Index: an index that contains all attributes required to answer a given SQL query:
• all attributes from the WHERE filter condition
• if it is a grouping query, also all attributes from GROUP BY & HAVING
• all attributes mentioned in the SELECT clause

– Typically a multi-attribute index

– Order of attributes is important: the prefix of the search key must be the attributes from the WHERE clause
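
A covering index can be demonstrated with SQLite, whose query plan explicitly says when an index covers a query. This is a sketch with an invented Stations table and an invented index name (StateNameIdx); other systems report this differently, e.g. as an index-only scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stations (id INTEGER PRIMARY KEY, name TEXT, state TEXT, capacity REAL)")
conn.executemany("INSERT INTO Stations (name, state, capacity) VALUES (?, ?, ?)",
                 [(f"Station {i}", "NSW" if i % 2 else "VIC", float(i)) for i in range(1000)])

# WHERE attribute (state) first, then the attribute needed by SELECT (name).
conn.execute("CREATE INDEX StateNameIdx ON Stations(state, name)")

def plan(sql):
    """Return SQLite's textual query plan for a statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Every attribute the query needs is in the index: no table access required.
covered = plan("SELECT name FROM Stations WHERE state = 'NSW'")
# capacity is not in the index, so the table rows must still be fetched.
not_covered = plan("SELECT name, capacity FROM Stations WHERE state = 'NSW'")
print(covered)
print(not_covered)
```

Adding one attribute that is missing from the index is enough to lose the covering property.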

5.5 Distributed Data Management 分布式数据管理(Partitioning)

Two main physical design techniques:

– Data Partitioning 数据分区

– Storing sub-sets of the original data set at different places
在不同地方存储原始数据集的子集

• can be in different tables in schema on same server, or at remote sites
可以位于同一服务器或远程站点的架构中的不同表中

– Goal is to query smaller data sets & to gain scalability by parallelism
目标是查询较小的数据集并通过并行性获得可扩展性

– Sub-sets can be defined by
可以通过以下方式定义子集

• columns: Vertical Partitioning 列：垂直分区
• rows: Horizontal Partitioning 行：水平分区
(if each partition is stored on a different site also called Sharding)
如果每个分区都存储在不同的站点上，也称为分片）

–Data Replication (Not covered in this unit of study)
数据复制（本研究单元未涵盖）

– Storing copies (‘replicas’) of the same data at more than one place
在多个地方存储相同数据的副本（“副本”）
– Goal is fail safety / availability
目标是故障安全/可用性

5.5.1 Partitioning 分区

– Advantages of Partitioning:分区的优点：
– Easier to manage than a large table 比大桌子更容易管理

– Better availability: 更好的可用性
if one partition is down, others are unaffected if stored on different tablespace / disk
如果一个分区关闭，其他分区不受影响，如果存储在不同的表空间/磁盘上

– Helps with bulk loading, e.g for data warehouse applications
助于批量加载，例如用于数据仓库应用程序

– Queries faster on smaller partitions; can be evaluated in parallel
在较小的分区上查询速度更快；可以并行评估
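
Horizontal and vertical partitioning can be sketched in plain Python. This is only an illustration of the idea, not how a DBMS implements it; the station rows, the helper names and the byte-sum hash are all made up for the example.

```python
from collections import defaultdict

def shard_for(key, num_shards):
    """Pick a partition from the key; a stable hash keeps placement repeatable."""
    return sum(key.encode("utf-8")) % num_shards

def partition_rows(rows, key_column, num_shards):
    """Horizontal partitioning: every row goes, whole, to exactly one shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key_column], num_shards)].append(row)
    return shards

def project(rows, columns):
    """Vertical partitioning: keep only the chosen columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]

stations = [
    {"id": "S1", "name": "Station A", "state": "NSW"},
    {"id": "S2", "name": "Station B", "state": "VIC"},
    {"id": "S3", "name": "Station C", "state": "NSW"},
]

shards = partition_rows(stations, "id", num_shards=2)
names_only = project(stations, ["id", "name"])
print(dict(shards))
print(names_only)
```

A query for one id only has to look in one shard, and the shards can be scanned in parallel; the vertical partition keeps the key so the columns can be joined back together.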

Week 6 Scraping Web Data

6.1 (Happy Children's Day) Web Scraping – General Approach

– Reconnaissance 侦察

– Identify source, and check its structure and content
识别来源，并检查其结构和内容

– Webpage Retrieval 网页检索

– Download one or multiple pages from source
从源下载一个或多个页面
– Typically in a script or program that auto-generates new URLs based on website structure and its URL format
通常在根据网站结构及其 URL 格式自动生成新 URL 的脚本或程序中

– Data Extraction from webpage 从网页中提取数据
– Content parsing, raw data extraction
内容解析、原始数据提取

– Data Cleaning and transformation into required format
数据清理和转换为所需格式

– Data Storage / Analysis / combining with other data sets
数据存储/分析/与其他数据集结合

6.2 Robots Exclusion Standard 机器人排除标准(robots协议）

– Many websites provide a robots.txt file
许多网站提供 robots.txt 文件

– Meant for web crawlers who should check this content first before starting crawling a website
适用于在开始抓取网站之前应先检查此内容的网络爬虫

– Different rules in here这里有不同的规则
• Crawling/scraping allowed at all?是否允许爬行/抓取？

• Only specific subdirectories?
只有特定的子目录?

• Only certain programs (“user-agent”)?
只有某些程序（“用户代理”）？

• Which frequency (“request-rate”)?
哪个频率（“请求率”）？

Cf. https://en.wikipedia.org/wiki/Robots_exclusion_standard

– Be a good net citizen: 做一个好的网民
Check, ask, don’t overload – and don’t steal (check copyright!)
检查、询问、不要超载——也不要偷窃（检查版权！）
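
Checking robots.txt before scraping can be done with the standard library's `urllib.robotparser`. In this sketch the robots.txt content, the user-agent names and the example.com host are all made up, and the rules are parsed from text so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, parsed from text so the sketch needs no network access.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

base = "https://example.com"
allowed = parser.can_fetch("MyScraper", base + "/ships/adamant/1821")   # ordinary path
private = parser.can_fetch("MyScraper", base + "/private/data.html")    # disallowed subdirectory
banned = parser.can_fetch("BadBot", base + "/ships/adamant/1821")       # banned user-agent
delay = parser.crawl_delay("MyScraper")                                  # seconds between requests
print(allowed, private, banned, delay)
```

For a real site, `set_url()` followed by `read()` would fetch the live robots.txt instead of parsing a string.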

6.3 Is it Legal? 法外狂徒？

– Web scraping per se is not illegal; you are free to save all publicly available data on the internet to your computer.
网络抓取本身并不违法，您可以自由地将互联网上所有可用的公开数据保存到您的计算机上。

– The way you will use that data is what might be illegal.
您使用该数据的方式可能是非法的。

– Please read the website terms and conditions, and robots.txt, and make sure you are not doing anything illegal
请阅读网站条款和条件以及 robots.txt，并确保您没有做任何违法的事情

6.4 Web Page Retrieval: URLs 网页检索 URL

– URL – Uniform Resource Locator URL – 统一资源定位符

– “address” format on the web 网络上的“地址”格式
– Example:
• https://convictrecords.com.au/ships/adamant/1821

– General Format 通用格式
• protocol://site/path_to_resource
• Typical protocols: http https ftp
典型协议

– Can be scripted or programmed; more details later and in tutorials
可以编写脚本或编程；稍后和教程中的更多详细信息
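
The protocol://site/path_to_resource format can be taken apart and rebuilt with `urllib.parse`. This is a small sketch: the `ship_url` helper is hypothetical, and the URL pattern is only inferred from the single example above, so generated URLs are not guaranteed to exist.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://convictrecords.com.au/ships/adamant/1821"
parts = urlsplit(url)
print(parts.scheme)   # protocol
print(parts.netloc)   # site
print(parts.path)     # path_to_resource

def ship_url(ship, year):
    """Build a URL following the same pattern (inferred from the one example)."""
    return urlunsplit((parts.scheme, parts.netloc, f"/ships/{ship}/{year}", "", ""))

generated = [ship_url("adamant", year) for year in (1821, 1822)]
print(generated)
```

A scraper would loop over such generated URLs and download each page in turn.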

6.5 HTML – Hypertext Markup Language 超文本标记语言

– Webpages are written in HTML网页是用 HTML 编写的

– Textual markup language that defines structure, content, and design of a page as well as active elements (scripts, forms, etc.)
定义页面结构、内容和设计以及活动元素（脚本、表单等）的文本标记语言

– Typically several additional files linked:通常链接几个附加文件：
• CSS - cascading style sheets CSS - 级联样式表
• Scripts, Images, videos etc. 脚本、图像、视频等

6.6 General Structure of a Web Page 网页的一般结构

– Head 头部
– title, style sheets, scripts, meta-data
标题、样式表、脚本、元数据

– Body 主体
– headings, text, lists, tables, images, forms etc.
标题、文本、列表、表格、图像、表格等

6.7 How to Select Content in a Webpage?

See the lecture for details.

– Four options: 四个选项

– text patterns 文本模式
– DOM navigation DOM 导航
– CSS selectors CSS 选择器
– XPath expressions XPath 表达式

6.8 HTML Document Model (DOM): Element-Tree 文档模型 (DOM)：元素树

Week 7 Semistructured Data; NoSQL

7.1 Getting Data via Service-APIs (Web Services) 通过服务 API（Web 服务）获取数据

– Many websites and web services provide programmable APIs, which allow you to explicitly request data for a program to process, instead of pages to view in a browser
许多网站或 Web 服务提供可编程 API，允许您明确请求数据以供程序处理，而不是在浏览器中查看页面

7.2 Semistructured Data 半结构化数据

– HTML, XML and JSON are examples of so-called semistructured data models
HTML、XML 和 JSON 是所谓的半结构化数据模型的示例
– data with non-rigid structure具有非刚性结构的数据

– Characteristics of semistructured data半结构化数据的特征
– Missing or additional attributes
缺少或额外的属性

– Multiple attributes
多个属性

– Nesting: semistructured objects (‘documents’) are hierarchical / have tree-structure
嵌套：半结构化对象（“文档”）是分层的/具有树状结构

– Different types in different objects
不同对象中的不同类型

– Heterogeneous collections
异构集合

Self-describing, irregular data, no a priori structure
自描述，不规则数据，无先验结构

7.3 HTML vs. XML

– While HTML is mainly for web page design, 虽然 HTML 主要用于网页设计
XML is the more structured “cousin” for data exchange XML 是更结构化的数据交换“表亲”

– Some web services can be asked to send XML rather than HTML pages
可以要求某些 Web 服务发送 XML 而不是 HTML 页面

– Also common in enterprise data exchange, or open data sets
也常见于企业数据交换或开放数据集

7.4 Logical Document Structure 逻辑文件结构

– XML refers to its objects as elements XML 将其对象称为元素

– The top-most element is called the root or document element.
最顶层的元素称为根或文档元素。

– Elements are delimited by start and end tags

– Tree structure! (not a graph) 树结构！（不是图表）

– The only data type for leaf elements is PCDATA (parsed character data)

7.5 How to query or filter XML?如何查询或过滤 XML？

– DOM Navigation DOM 导航

– XML documents represent a tree structure which can be navigated using XML’s Document Object Model (DOM)
XML 文档表示可以使用 XML 的文档对象模型 (DOM) 导航的树结构

– XPath

– XPath expressions allow querying single values, node(s) or whole subtrees within one XML document
XPath 表达式允许在一个 XML 文档中查询单个值、节点或整个子树

– XQuery

– XQuery builds on XPath to specify a declarative query language over a set of XML documents
XQuery 建立在 XPath 之上，以在一组 XML 文档上指定声明性查询语言
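
DOM-style navigation and XPath queries can both be tried with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. The XML document below is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small made-up XML document; "library" is the root (document) element.
doc = """
<library>
  <book year="2019"><title>Databases</title><author>A. Writer</author></book>
  <book year="2021"><title>Data Science</title><author>B. Writer</author></book>
</library>
"""
root = ET.fromstring(doc)

# DOM-style navigation: walk from the root to its children step by step.
first_title = root[0][0].text

# XPath-style queries: describe the path and filter instead of walking it.
titles_2021 = [t.text for t in root.findall("./book[@year='2021']/title")]
all_authors = [a.text for a in root.findall(".//author")]
print(first_title, titles_2021, all_authors)
```

The navigation version breaks if the element order changes, while the XPath version only depends on element names and attributes.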

7.5.1 XPath Document Model Tree XPath 文档模型树

7.6 Semi-Structured data versus Structured Data 半结构化数据与结构化数据

– Relational World关系世界

Schema-first, rich type system for attributes, integrity constraints
模式优先，丰富的属性类型系统，完整性约束

– “First Normal Form”: only atomic type attributes allowed
第一范式”：仅允许原子类型属性

– Semi-structured World 半结构化世界

– Self-describing data with flexible structure
结构灵活的自描述数据

– Nested data model with tree-structure
具有树结构的嵌套数据模型

– optional attributes, grammar, schema and vocabulary
可选属性、语法、模式和词汇

7.7 NoSQL

– Traditional dbms platforms were relational (SQL as query language; relational data model) and also powerful (lots of features for integrity, security, tuning), expensive, resource-intensive, hard to administer
传统的 dbms 平台是关系型的（SQL 作为查询语言；关系型数据模型）并且功能强大（许多功能用于完整性、安全性、调优）、昂贵、资源密集、难以管理

– Mostly focused on scale-up (run on powerful expensive servers to get excellent performance)
主要关注纵向扩展（在功能强大的昂贵服务器上运行以获得出色的性能）

– Rise of cloud computing shifted focus to scale-out on many commodity simple servers, with fault-tolerance
云计算的兴起将重点转移到具有容错性的许多商用简单服务器上的横向扩展

– New systems were designed, and described as “NoSQL” because they gave up features of traditional platforms
设计了新系统，并将其描述为“NoSQL”，因为它们放弃了传统平台的功能

– Simpler data model, simpler queries and updates (e.g. without cross-table joins or triggers), weaker guarantees for consistency and integrity
更简单的数据模型，更简单的查询和更新（例如，没有跨表连接或触发器），一致性和完整性的保证较弱

– Often open-source and sometimes free
通常是开源的，有时是免费的

7.7.1 NoSQL

– Over time, the new platforms added features like joins, triggers and integrity (under pressure from users) while old platforms added support for more diverse data models
随着时间的推移，新平台增加了连接、触发器和完整性等功能（在用户压力下），而旧平台增加了对更多样化数据模型的支持

– The phrase “Not only SQL” has been used for these systems
短语“不仅是 SQL”已用于这些系统

7.8 MongoDB Data Model 数据模型

– Basically a JSON store (JSON type system)
基本上是一个 JSON 存储（JSON 类型系统）

– Flexible schema: documents in a collection do not need to have the same structure
灵活的模式：集合中的文档不需要具有相同的结构

– All documents have an object ID (_id) – either user-defined or automatically generated
所有文档都有一个对象 ID (_id) – 用户定义或自动生成

– Relationships:
either via nested documents (“embedded sub-documents”) or using references
通过嵌套文档（“嵌入式子文档”）或使用引用
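
The document model can be pictured with plain Python dicts in a JSON-like shape. This sketch does not use a MongoDB driver; the documents and field names are hypothetical and only illustrate embedding, references and the flexible schema.

```python
import json

# Hypothetical documents, written as Python dicts in a JSON-like shape.
# Embedded sub-document: the address lives inside the student document.
student = {
    "_id": 1,
    "name": "Kim",
    "address": {"city": "Sydney", "postcode": "2006"},
}

# Reference: the enrolment stores the student's _id instead of a copy.
enrolment = {"_id": 100, "student_id": 1, "unit": "DATA2001"}

# Flexible schema: a second student with no address but an extra field.
other_student = {"_id": 2, "name": "Lee", "languages": ["Python", "SQL"]}

students = {doc["_id"]: doc for doc in (student, other_student)}
referenced = students[enrolment["student_id"]]   # resolve the reference by hand
print(json.dumps(referenced, indent=2))
```

Embedding keeps related data together in one read, while a reference avoids duplication at the cost of a second lookup.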

7.8.1 MongoDB vs. RDBMS

Week 8 Text Data Processing: Feature Extraction & Analysis

8.1 Text data

Text data usually does not have a pre-defined data model, is unstructured and is typically text-heavy, but may contain dates, numbers and facts as well.
文本数据通常没有预定义的数据模型，是非结构化的，通常是大量文本，但也可能包含日期、数字和事实。

This results in ambiguities that make it more difficult to understand than data in structured databases.
这会导致歧义，使其比结构化数据库中的数据更难理解。

8.2 Machine Learning tasks 机器学习任务

– Supervised learning – predict a value where truth is available in the training data
监督学习 – 预测训练数据中的真实值

– Prediction 预言

– Classification (categorical - discrete labels), Regression (quantitative -numeric values)
分类（分类 - 离散标签），回归（定量 - 数值）

– Unsupervised learning – find patterns without ground truth in training data
无监督学习 – 在训练数据中找到没有基本事实的模式

– Clustering 聚类

– Probability distribution estimation 概率分布估计

– Finding association (in features) 寻找关联（在功能中）

– Dimension reduction 降维

Other tasks: Semi-supervised learning, Reinforcement learning
其他任务：半监督学习、强化学习

8.3 Tokenisation

Extract the useful parts.

Split a string (document) into pieces called tokens
将字符串（文档）拆分为称为令牌的部分

– Possibly remove some characters, e.g., punctuation
可能会删除一些字符，例如标点符号

– Remove “stop words” such as “a”, “the”, “and” which are considered irrelevant
删除“停用词”，如“a”、“the”、“and”等被认为不相关的词

8.4 Normalisation

Bring words into one common form.

Map similar words to the same token
将相似的词映射到相同的标记

– Stemming/lemmatisation 词干/词形还原

– Avoid grammatical and derivational sparseness 避免语法和派生稀疏
– E.g., “was” => “be”

– Lower casing, encoding 下壳，编码

– E.g., “Naïve” => “naive”
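
Tokenisation and normalisation together can be sketched in a few lines of standard-library Python. The stop-word set and the lemma lookup here are tiny made-up stand-ins; a real pipeline would use a proper stemmer or lemmatiser.

```python
import re
import unicodedata

STOP_WORDS = {"a", "the", "and"}                    # tiny illustrative stop-word list
LEMMAS = {"was": "be", "is": "be", "were": "be"}    # toy lookup, not a real lemmatiser

def tokenise(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)

def normalise(token):
    """Lower-case, strip accents, then map to a base form if one is known."""
    token = token.lower()
    token = unicodedata.normalize("NFKD", token)
    token = "".join(ch for ch in token if not unicodedata.combining(ch))
    return LEMMAS.get(token, token)

def preprocess(text):
    """Tokenise, normalise, then remove stop words."""
    tokens = [normalise(t) for t in tokenise(text)]
    return [t for t in tokens if t not in STOP_WORDS]

# "\u00ef" is the accented i in "Naive", written as an escape to keep the source ASCII.
result = preprocess("The Na\u00efve approach was simple, and the results were fine.")
print(result)
```

After this step "was" and "were" share one token, so they are counted as the same feature later on.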

8.5 Indicator Features

Record only whether a word occurs.

Binary indicator feature for each word in a document
文档中每个单词的二进制指示符功能

Ignore frequencies
忽略频率

8.6 Term Frequency Weighting

Term frequency 词频

– Give more weight to terms that are common in document
对文档中常见的术语给予更多权重
– TF = |occurrences of term in doc|

– Damping 阻尼

– Sometimes want to reduce impact of high counts
有时想减少高计数的影响
TF = log(|occurrences of term in doc|)

8.7 TF-IDF Weighting 加权

Inverse document frequency (IDF)逆向文档频率 (IDF)

– Give less weight to terms that are common across documents
减少文档中常见术语的权重
• deals with the problems of the Zipf distribution
处理 Zipf 分布的问题
– IDF = log(|total docs|/|docs containing term|)

– TFIDF
– TFIDF = TF * IDF
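
The TF, IDF and TFIDF formulas above can be computed directly. This is a minimal sketch on three made-up documents, using the raw-count TF (not the damped version) and the natural logarithm; the function names are chosen for illustration.

```python
import math
from collections import Counter

# Three tiny made-up documents, already tokenised and normalised.
docs = [
    ["data", "science", "data"],
    ["data", "base"],
    ["web", "science"],
]

def tf(term, doc):
    """Term frequency: raw count of the term in one document."""
    return Counter(doc)[term]

def idf(term, docs):
    """Inverse document frequency: log(total docs / docs containing the term)."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)

def tfidf(term, doc, docs):
    """TFIDF = TF * IDF."""
    return tf(term, doc) * idf(term, docs)

vocabulary = sorted({term for doc in docs for term in doc})
# One row per document, one column per term.
matrix = [[tfidf(term, doc, docs) for term in vocabulary] for doc in docs]
for doc, row in zip(docs, matrix):
    print(doc, [round(v, 2) for v in row])
```

A term that appears in every document would get IDF = log(1) = 0, so it contributes nothing however often it occurs.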

8.8 Vector Space Model 向量空间模型

Documents are represented as vectors in term space
文档在术语空间中表示为向量

– Terms are usually stems
术语通常是词干
– Document vector values can be weighted by, e.g., frequency
文档向量值可以通过例如频率加权

– Queries represented the same as documents
查询表示与文档相同

8.9 Document Vectors 文档向量

All document vectors together: Document-Term-Matrix (Feature-Matrix) 所有文档向量加在一起

Week 9 (Geo-)Spatial Data

9.1 Spatial Data 空间数据

– Spatial data is about objects and entities which have a location and/or a geometry
空间数据是关于具有位置和/或几何形状的对象和实体

– A special form is geospatial data which refers to data or information that identifies the geographic location of features and boundaries on Earth (such as localities, cities, suburbs etc)
一种特殊形式是地理空间数据，它指的是识别地球上特征和边界（例如地点、城市、郊区等）的地理位置的数据或信息

9.2 SDBMS vs GIS

– Spatial Database Management System (SDBMS)
空间数据库管理系统 (SDBMS)

– Handle large amount of spatial data stored in secondary storage.
处理存储在二级存储中的大量空间数据。

– Spatial semantics built into query language
查询语言中内置的空间语义

– Specialized index structure to access spatial data
访问空间数据的专用索引结构

– Geographic Information System (GIS) 地理信息系统 (GIS)

– SDBMS Client SDBMS 客户端

– Characterized by a rich set of geographic analysis functions
以丰富的地理分析功能为特点

– SDBMS allows GIS to scale to large databases, which are now becoming the norm
SDBMS 允许 GIS 扩展到大型数据库，这现已成为常态

– Information in a GIS is typically organized in “layers”. GIS 中的信息通常按“层”组织。
• For example a map will have a layer of “roads”, “train stations”, “suburbs” and “water bodies”.
例如，地图将具有“道路”、“火车站”、“郊区”和“水体”层。
• GIS allows data exploration and integration across layers.
GIS 允许跨层进行数据探索和集成

9.3 Object Model 对象模型

– Object model concepts 对象模型概念

– Objects: distinct identifiable things relevant to an application
对象：与应用程序相关的不同可识别事物
• Objects have attributes and operations
对象具有属性和操作

– Attribute: a simple (e.g. numeric, string) property of an object
属性：对象的简单（例如数字、字符串）属性

– Operations: functions that map object attributes to other objects
操作：函数将对象属性映射到其他对象

9.4 PostGIS: Geometry vs. Geography Type

– Geometry type: 几何类型(平面)：
– shapes on a plane; shortest path between two points is a straight line
平面上的形状；两点之间的最短路径是一条直线

– Geography type 地理类型(球体):
– Basis is a sphere; shortest path between two points is a great-circle arc
基础是一个球体；两点之间的最短路径是圆弧
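
The difference between the two types can be illustrated with two distance functions in plain Python. This is not PostGIS code: the haversine formula on a perfect sphere of radius 6371 km is an assumption made for the sketch, and the city coordinates are approximate.

```python
import math

def planar_distance(p, q):
    """Geometry-style: treat (x, y) as points on a flat plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def great_circle_km(p, q, radius_km=6371.0):
    """Geography-style: haversine distance on a sphere, points as (lon, lat) degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

sydney = (151.21, -33.87)      # approximate (longitude, latitude)
melbourne = (144.96, -37.81)

flat = planar_distance(sydney, melbourne)    # a number in "degrees", not a real-world length
curved = great_circle_km(sydney, melbourne)  # a length in kilometres along the sphere
print(flat, curved)
```

The planar result only becomes meaningful once the coordinates are projected into a flat coordinate system, which is the usual reason to choose one type over the other.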

Week 10 Time Series Data

10.1 Temporal Data 时间数据

– Almost all data is qualified with time (period or point)
几乎所有数据都用时间（周期或点）限定

– Web stores 网上商店
– Data warehousing 数据仓库
– Medical records, loans, … 医疗记录、贷款、…
– Sensor data and time series 传感器数据和时间序列
– Transport information 运输信息

10.2 Temporal Support in Databases

– DBMSs offer only limited support for temporal data management

– Conventional (non-temporal) databases represent a static snapshot

– Management of temporal aspects is left to the application
• This adds complexity to application programs

– Some time data types and functions are available in SQL, e.g. DATE, TIME, DATEADD(), DATEDIFF() (the latter two are vendor-specific functions, e.g. in SQL Server, not standard SQL)
• SQL:2011 added support for temporal tables
• Query support is still very limited

– A temporal database provides built-in support for managing temporal data

– Representation of various temporal aspects, e.g. valid time and transaction time

– Support for multiple calendars and granularities

– Easy formulation of complex queries over time

– Queries over, and modification of, previous states

10.3 Concepts in Temporal Databases

10.3.1 Temporal Data Types

– SQL supports time instants and intervals (but no periods)

– Instant data types:

– DATE
• SQL-92: day, month and year of a time instant (from year 1 to 9999)
• PostgreSQL: date (no time of day) from 4713 BC to 5874897 AD

– TIMESTAMP
• SQL-92: date + time with variable resolution of fractions of a second (default: 1 µs, i.e. six fractional digits)
• PostgreSQL: date + time from 4713 BC to 294276 AD with 1 µs resolution; optional time zone

– TIME
• SQL-92: hours, minutes, seconds and optional fractional digits of a second
• Not really a time instant (no date!); in PostgreSQL with 1 µs resolution

– Interval data types:
– Various specification options, e.g. year-month intervals: INTERVAL YEAR TO MONTH

– Many DBMSs support only time instants, not intervals
– Intervals must then be simulated with two time instants (start + end)
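
As a rough illustration of the "start + end" simulation, the Python sketch below stores an anchored span of time as two instants and implements two typical checks by hand. The half-open convention `[start, end)` and the helper names `contains` and `overlaps` are assumptions of this sketch. In SQL terminology an unanchored duration is an interval, while an anchored start/end pair is a period.

```python
from datetime import datetime, timedelta

# An unanchored length of time (an "interval" in the SQL sense)
loan_duration = timedelta(days=14)

# An anchored span simulated with two instants, half-open: [start, end)
start = datetime(2022, 6, 1, 9, 0)
end = start + loan_duration


def contains(span_start, span_end, instant):
    """True if the instant lies inside [span_start, span_end)."""
    return span_start <= instant < span_end


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open spans share at least one instant."""
    return a_start < b_end and b_start < a_end


print(end)                                             # 2022-06-15 09:00:00
print(contains(start, end, datetime(2022, 6, 10)))     # True
print(overlaps(start, end, end, end + loan_duration))  # False: the spans only touch
```

Every query that needs such logic has to repeat these comparisons, which is the extra application complexity these notes mention.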

10.3.2 Kinds of Data

– User-defined time

– Described by Snodgrass as “an uninterpreted time interval”

– E.g. a birth date or a publication time

– Valid time and transaction time
– Cf. the following examples

– A table can be associated with none, one, two or all three kinds of time

10.3.3 Kinds of Temporal Statements

– Current

– “What is now?”
• E.g. “How many products do we currently have in stock?”

– Sequenced

– “What was, and when?”
• E.g. “Give the history of how many products were in stock.”
• Or “When did the stock level fall below X in the past?”

– Very central, but not directly supported by SQL!

– Nonsequenced

– “What was, at any time?”
• E.g. “How many units of product A did we have in stock at any time?”
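
The three kinds of statement can be sketched over a tiny valid-time table held in a Python list. The table layout, the sample rows and the function names are hypothetical; the point is only how each kind of question treats the time columns.

```python
from datetime import date

# Hypothetical valid-time table: (product, quantity, valid_from, valid_to), half-open periods.
# date.max stands in for "until further notice".
stock = [
    ("A", 10, date(2022, 1, 1), date(2022, 3, 1)),
    ("A", 4, date(2022, 3, 1), date(2022, 5, 1)),
    ("A", 12, date(2022, 5, 1), date.max),
]


def current(rows, today):
    """Current: what holds now? Only the row valid at `today` qualifies."""
    return [(p, q) for p, q, valid_from, valid_to in rows if valid_from <= today < valid_to]


def sequenced_below(rows, threshold):
    """Sequenced: what was, and when? The answer carries its validity period."""
    return [(p, valid_from, valid_to) for p, q, valid_from, valid_to in rows if q < threshold]


def nonsequenced_max(rows):
    """Nonsequenced: what was at any time? The time columns are ignored."""
    return max(q for _, q, _, _ in rows)


print(current(stock, date(2022, 4, 1)))  # [('A', 4)]
print(sequenced_below(stock, 5))         # the period in which stock was below 5
print(nonsequenced_max(stock))           # 12
```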

10.3.4 Transaction Time and Valid Time

– Valid time records when a fact is true in the real world.

– Can move forward and backward (facts about the past or the future can be recorded and corrected)

– Transaction time records the history of database activity.

– Only moves forward (as you cannot go back in history and change things – alas!)

– Therefore allows rollback to an earlier database state (very useful for auditing)
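
A minimal sketch of why transaction time enables rollback: if old versions are closed off rather than overwritten, the state the database held at any earlier moment can still be reconstructed. The `TransactionTimeTable` class and its method names are invented for this illustration and do not correspond to any particular DBMS feature.

```python
from datetime import datetime

FOREVER = datetime.max  # marker for "this version is still current"


class TransactionTimeTable:
    """Append-only store: rows are never overwritten, only closed off."""

    def __init__(self):
        self.rows = []  # each row: [key, value, tx_from, tx_to]

    def put(self, key, value, now):
        for row in self.rows:
            if row[0] == key and row[3] == FOREVER:
                row[3] = now  # close the old version instead of deleting it
        self.rows.append([key, value, now, FOREVER])

    def as_of(self, key, when):
        """Roll back: what did the database record for `key` at time `when`?"""
        for k, value, tx_from, tx_to in self.rows:
            if k == key and tx_from <= when < tx_to:
                return value
        return None


table = TransactionTimeTable()
table.put("salary", 5000, datetime(2022, 1, 1))
table.put("salary", 5500, datetime(2022, 6, 1))

print(table.as_of("salary", datetime(2022, 3, 1)))  # 5000
print(table.as_of("salary", datetime(2022, 7, 1)))  # 5500
```

Because `put` only ever appends and closes rows, the history itself cannot be changed, which is what makes it useful for auditing.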

Week 11 Image Data Processing

11.1 Image Data

– Images can be described as vector graphics or raster data

– Raster images

– A matrix with a fixed number of rows and columns

– Digital images consist of a fixed number of picture elements, called pixels

– Each pixel represents the brightness of a given color
• Color depth => different number of channels

– Raster images can be created in multiple ways

– Digital photography / video

– Image sensors in (scientific) instruments (e.g. satellite images, astronomy, DNA sequencers, microscopes, …)
– Scanners

– Medical instruments (e.g. X-ray, CT, MRI)

11.2 Types of Images

TrueColor or RGB image

Gray-scale image

Binary image

See Lecture 11 for details.
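
The three image types can be related to each other with a few lines of plain Python. This is a sketch under assumptions: the 2x2 sample image is made up, the luma weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 values, not something given in these notes, and the threshold of 128 is arbitrary.

```python
# A tiny 2x2 RGB raster: rows x columns x 3 channels, 8 bits per channel (0-255)
rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_gray(image):
    """RGB -> gray-scale: one brightness value per pixel (BT.601 luma weights, assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in image]


def to_binary(gray, threshold=128):
    """Gray-scale -> binary: each pixel becomes 0 or 1."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]


gray = to_gray(rgb)
binary = to_binary(gray)
print(gray)    # [[76, 150], [29, 255]]
print(binary)  # [[0, 1], [0, 1]]
```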

11.3 Aspects of Image Processing

– Image Enhancement: processing an image so that the result is more suitable for a particular application (sharpening or de-blurring an out-of-focus image, highlighting edges, improving contrast, brightening an image, removing noise).

– Image Restoration: reversing the damage done to an image by a known cause (removing blur caused by linear motion, removing optical distortions).

– Image Segmentation: subdividing an image into constituent parts, or isolating certain aspects of an image (finding lines, circles or particular shapes in an image; identifying cars, trees, buildings or roads in an aerial photograph).

11.4 Morphological Image Processing

– A broad set of operations that process images based on shapes.

– Goal: removing imperfections in images (binary or grayscale)

– Morphological techniques probe an image with a small shape or template called a structuring element.

– The structuring element is a small binary image, i.e. a small matrix of pixels, each with a value of zero or one
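
To show what "probing an image with a structuring element" means, here is a minimal pure-Python sketch of binary erosion, dilation and opening. It is not how an image library implements these operations. It assumes the structuring element is symmetric with its origin at the centre, and that pixels outside the image count as 0.

```python
def _probe(image, selem, y, x):
    """Yield the image pixels covered by the 1-cells of the structuring element centred at (y, x)."""
    h, w = len(image), len(image[0])
    oy, ox = len(selem) // 2, len(selem[0]) // 2
    for j, row in enumerate(selem):
        for i, on in enumerate(row):
            if on:
                yy, xx = y + j - oy, x + i - ox
                # Pixels outside the image count as 0 (a border assumption of this sketch)
                yield image[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0


def erode(image, selem):
    """Keep a pixel only if the structuring element fits entirely inside the foreground."""
    return [[int(all(_probe(image, selem, y, x))) for x in range(len(image[0]))]
            for y in range(len(image))]


def dilate(image, selem):
    """Set a pixel if the structuring element hits at least one foreground pixel."""
    return [[int(any(_probe(image, selem, y, x))) for x in range(len(image[0]))]
            for y in range(len(image))]


def opening(image, selem):
    """Erosion followed by dilation: removes specks smaller than the structuring element."""
    return dilate(erode(image, selem), selem)


square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 3x3 structuring element
noisy = [
    [1, 0, 0, 0, 0],  # the lone 1 in the corner is the imperfection
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in opening(noisy, square):
    print(row)
```

In this example the opening removes the isolated corner pixel and restores the 3x3 block unchanged.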

Week 12 Big Data

12.1 Big Data: Volume

– “Big” is very relative, because of Moore’s Law

– What once was considered big data is nowadays a main-memory problem

– E.g. Excel: at most 65,536 rows in 2003, about 1 million (1,048,576) rows now, and still …

– Nowadays: terabytes to exabytes

12.2 Big Data: Velocity

– Conventional scientific research:

– Months to gather data from hundreds of cases, weeks to analyze the data and years to publish.

– Example: the Iris flower data set by Edgar Anderson and Ronald Fisher from 1936

– At the other end of the scale: Twitter
– On average 6,000 tweets/sec, roughly 500 million per day or 200 billion per year
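
The per-day and per-year figures follow from the per-second rate by simple arithmetic, shown here as a quick check. The only input is the average of 6,000 tweets per second quoted above; the rounded figures in the notes are approximations of these results.

```python
TWEETS_PER_SECOND = 6_000  # average rate quoted in the notes

SECONDS_PER_DAY = 24 * 60 * 60
tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
tweets_per_year = tweets_per_day * 365

print(f"{tweets_per_day:,} tweets/day")    # 518,400,000
print(f"{tweets_per_year:,} tweets/year")  # 189,216,000,000
```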

12.3 Big Data: Variety

– Structured data, such as CSV or RDBMS tables

– Semi-structured data, such as JSON or XML

– Unstructured data, e.g. text, e-mails, images, video

– An estimated 80% of enterprise data is unstructured

– Study by Forrester Research: variety is the biggest challenge in Big Data

12.4 Scale-Up

– The traditional approach:

– To scale with increasing load, buy more powerful, larger hardware
• from a single workstation

• to a dedicated database server

• to a large massively parallel database appliance

12.5 The Alternative: Scale-Out

A single server has limits…
For real Big Data processing, we need to scale out to a cluster of multiple servers (nodes).

12.6 MapReduce Overview

– Scan large volumes of data

– Map: extract some interesting information

– Shuffle and sort intermediate results

– Reduce: aggregate intermediate results

– Generate final output

– Key idea: provide an abstraction at the point of these two operations (map and reduce)

– Higher-order functions
– Cf. map functions in functional programming languages such as Lisp or Haskell
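
The phases listed above can be imitated on a single machine with Python's built-in higher-order tools. This word-count sketch is only an analogy: a real MapReduce framework distributes the map and reduce calls across many nodes, and the function names `map_phase`, `shuffle_and_sort`, `reduce_phase` and `map_reduce` are made up for this example.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def map_phase(document):
    """Map: extract (key, value) pairs, here one (word, 1) pair per word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: bring all values with the same key together."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_phase(key, values):
    """Reduce: aggregate the values of one key, here by summing them."""
    return key, reduce(lambda a, b: a + b, values)


def map_reduce(documents):
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(intermediate))


counts = map_reduce(["big data", "Big ideas need big data"])
print(counts)  # {'big': 3, 'data': 2, 'ideas': 1, 'need': 1}
```

Only `map_phase` and `reduce_phase` are specific to the word-count problem. The surrounding plumbing stays the same for other jobs, which is the abstraction these notes refer to.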

12.7 MapReduce Discussion

Pros:

– Very flexible thanks to user-defined functions

– Great scalability because of the functional-programming (FP) approach

– Easy parallelism thanks to stateless functions

– Fault tolerance

Cons:
– Requires programming skills and functional thinking
– Relatively low-level; even filtering has to be coded manually
– Complex frameworks
– Batch-processing oriented

Summary

Good luck, everyone: may you know every question you studied for, and get every guess right!
