Outline

5.1 Information Theory
5.2 Information Technology
5.3 Data quality
5.4 Data cleaning
5.5 Data fusion
5.6 Data storage
5.7 Data mining
5.8 Multimedia information processing

5.3 Data Quality

Uncertain Data

Data uncertainty occurs during:

Data collection
Data transmission
Data processing

Causes of Data Uncertainty

Environmental factors
Low battery power
Packet losses

Classification of Data Uncertainty

Source Classification (key point)

Name	Example
Undesirable uncertainty	Noisy sensor data
	Imprecise GPS data
	Unreliable extracted/integrated data
Desirable uncertainty	Medical data with generalized attributes
	Cloaked trajectory data

Granularity Classification

Tuple uncertainty
Attribute uncertainty

Correlation Classification

Independent uncertainty
Correlated uncertainty
Uncertainty with local correlations

Meaning of Data Quality (key point)

Generally, you have a problem if the data doesn't mean what you think it does, or should.
Data quality problems are expensive and pervasive.

Conventional Definition of Data Quality

Name	Explanation
Accuracy	The data was recorded correctly
Completeness	All data was recorded
Uniqueness	Each entity is recorded once
Timeliness	The data is kept up to date
Consistency	The data agrees with itself
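
As a rough illustration of the dimensions in the table above, the sketch below checks completeness, uniqueness, and timeliness on a few made-up sensor records. The record layout (`id`, `value`, `ts`), the `quality_report` name, and the 60-second freshness threshold are assumptions for illustration and do not come from the notes.

```python
def quality_report(records, now, max_age_s=60):
    """Count records that violate three of the quality dimensions.

    records: list of dicts with the hypothetical keys 'id', 'value', 'ts'.
    """
    seen_ids = set()
    report = {"incomplete": 0, "duplicate": 0, "stale": 0}
    for rec in records:
        # Completeness: every field must be present and not None.
        if any(rec.get(key) is None for key in ("id", "value", "ts")):
            report["incomplete"] += 1
            continue
        # Uniqueness: each id should be recorded once.
        if rec["id"] in seen_ids:
            report["duplicate"] += 1
        seen_ids.add(rec["id"])
        # Timeliness: the record must be recent enough.
        if now - rec["ts"] > max_age_s:
            report["stale"] += 1
    return report


records = [
    {"id": 1, "value": 21.5, "ts": 100},
    {"id": 2, "value": None, "ts": 110},  # missing value
    {"id": 1, "value": 21.5, "ts": 100},  # recorded twice
    {"id": 3, "value": 22.0, "ts": 10},   # out of date
]
print(quality_report(records, now=120))
# {'incomplete': 1, 'duplicate': 1, 'stale': 1}
```

Accuracy and consistency are harder to check mechanically, because they need a trusted reference value or domain rules to compare against.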

5.4 Data Cleaning

Data cleaning is the process of detecting and correcting (or removing) errors and inconsistencies in data in order to improve data quality.
It aims to identify data that is incomplete, incorrect, inaccurate, irrelevant, and so on.

Data Cleaning Tasks (key point)

Fill in missing values
Identify outliers and smooth out noisy data
Correct inconsistent data
Resolve redundancy caused by data integration

Methods to Handle Noisy Data

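Name	Explanation
Binning	Sort the data into bins and smooth the values within each bin
Regression	Fit the data to a regression function
Clustering	Cluster the data, then detect and remove elements that do not belong to any major cluster
Combined inspection	Combine computer and human inspection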
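
Binning can be sketched in a few lines. The example below assumes equal-depth bins and smoothing by bin means, which is one common variant; the notes do not say which variant is intended, and the function name and sample values are made up.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-depth bins, and replace
    each value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, bin_size=3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```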

Sensor Cleaning Pipeline

The pipeline uses the temporal and spatial characteristics of sensor data.

Step 1: Point

Operates on: a single value of a sensor stream.
Purpose: filter individual values
① Errant (dirty / faulty) RFID tags
② Obvious outliers
③ Conversion of raw data into tuples
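
A minimal sketch of the Point step, assuming raw readings arrive as `tag,timestamp,value` strings and that a list of valid tags and a plausible value range are known in advance. The string format, the function name, and the thresholds are hypothetical.

```python
def point_stage(raw_readings, known_tags, low, high):
    """Filter single readings and convert them to (tag, time, value) tuples."""
    tuples = []
    for raw in raw_readings:
        tag, ts, value = raw.split(",")
        if tag not in known_tags:      # errant tag
            continue
        value = float(value)
        if not low <= value <= high:   # obvious outlier
            continue
        tuples.append((tag, int(ts), value))
    return tuples


raw = ["A1,1,20.5", "ZZ,1,20.7", "A1,2,999.0", "B2,2,21.0"]
print(point_stage(raw, known_tags={"A1", "B2"}, low=-40.0, high=85.0))
# [('A1', 1, 20.5), ('B2', 2, 21.0)]
```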

Step 2: Smoothing

Purpose: interpolate (insert) lost readings
① Temporal interpolation
② Outlier detection
Method: window-based queries
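
The window-based idea can be sketched as follows: a lost reading is replaced by the mean of the readings that fall inside a sliding window around it. The list-with-`None` representation and the symmetric window are assumptions made for this example.

```python
def smooth_stage(readings, window):
    """Fill lost readings (None) with the mean of the values in a sliding window.

    readings: list indexed by time step; None marks a lost reading.
    window: number of time steps to look at on each side.
    """
    filled = []
    for t, value in enumerate(readings):
        if value is not None:
            filled.append(value)
            continue
        lo, hi = max(0, t - window), min(len(readings), t + window + 1)
        neighbours = [v for v in readings[lo:hi] if v is not None]
        # Leave the gap open if the window holds no readings at all.
        filled.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return filled


print(smooth_stage([20.0, None, 22.0, None, None, None, 30.0], window=1))
# [20.0, 21.0, 22.0, 22.0, None, 30.0, 30.0]
```

A wider window fills longer gaps but reacts more slowly to real changes in the measured value.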

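Step 3: Merge

Purpose: spatial interpolation
Example: within one spatial granule, compute the average of the readings from the different motes, ignoring individual readings that lie more than two standard deviations from the mean.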
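
The Merge example can be written out directly. This sketch assumes all readings in the list belong to one spatial granule and uses the population standard deviation; the function name and sample values are made up.

```python
from statistics import mean, pstdev


def merge_stage(readings):
    """Average the readings of one spatial granule, ignoring readings that lie
    more than two standard deviations from the mean."""
    mu = mean(readings)
    sigma = pstdev(readings)
    kept = [r for r in readings if abs(r - mu) <= 2 * sigma]
    return mean(kept)


granule = [20.0] * 9 + [50.0]  # nine agreeing motes and one faulty mote
print(merge_stage(granule))
# 20.0
```

With only a handful of motes no reading can lie more than two population standard deviations from the mean, so the filter only has an effect in granules with enough readings.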

Step 4: Arbitrate

Purpose: resolve disagreements between readings
① Remove conflicting readings
② De-duplicate readings
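
One possible reading of the Arbitrate step is sketched below: repeated sightings of a tag are collapsed into counts, and a tag claimed by several readers is assigned to the reader that saw it most often. The majority rule, the function name, and the data layout are assumptions; the notes do not specify how conflicts are resolved.

```python
from collections import Counter


def arbitrate_stage(readings):
    """readings: (tag, reader) pairs seen within one time window.

    Returns one reader per tag.
    """
    counts = Counter(readings)  # de-duplication: identical pairs become counts
    best = {}
    for (tag, reader), n in counts.items():
        # Conflict resolution: keep the reader with the most sightings.
        if tag not in best or n > best[tag][1]:
            best[tag] = (reader, n)
    return {tag: reader for tag, (reader, _) in best.items()}


window = [("T1", "shelf_A"), ("T1", "shelf_A"), ("T1", "shelf_B"), ("T2", "shelf_B")]
print(arbitrate_stage(window))
# {'T1': 'shelf_A', 'T2': 'shelf_B'}
```

On a tie the reader seen first wins, which is an arbitrary choice of this sketch.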

Step 5: Virtualize

Purpose: multi-source integration

5.5 Data Fusion

Concept (key point)
Data fusion combines data from multiple sources and gathers that information in order to achieve inferences that are more efficient and potentially more accurate than those achieved by means of a single source.
Fill-in-the-blank points
Sensors only give an estimate of the measured physical property.
The nature of the errors often determines the preferred fusion algorithm.

Three Processing Architectures

Data-level fusion: direct fusion of sensor data.
Feature-level fusion: representation of sensor data via feature vectors, with subsequent fusion of the feature vectors.
Decision-level fusion: processing of each sensor's data to achieve high-level inferences or decisions, which are subsequently combined.
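
Decision-level fusion can be illustrated with a majority vote over per-sensor decisions. Majority voting is only one possible combination rule, chosen here for brevity; the function name and labels are hypothetical.

```python
from collections import Counter


def decision_level_fusion(decisions):
    """Combine the decisions made separately for each sensor by majority vote."""
    return Counter(decisions).most_common(1)[0][0]


print(decision_level_fusion(["fire", "no fire", "fire"]))
# fire
```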

Data-level Fusion

Condition for use: the sensors are measuring the same physical phenomenon.
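
When several sensors measure the same quantity, their raw values can be fused directly. The sketch below uses an inverse-variance weighted average, which assumes independent, unbiased errors with known variances; this is one standard choice and not necessarily the method the course has in mind.

```python
def fuse_measurements(measurements):
    """Inverse-variance weighted average of (value, variance) pairs.

    Returns the fused value and the variance of the fused estimate.
    """
    weights = [1.0 / variance for _, variance in measurements]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return fused, 1.0 / total


value, variance = fuse_measurements([(20.0, 1.0), (22.0, 1.0)])
print(value, variance)
# 21.0 0.5
```

The fused variance is smaller than that of either sensor alone, which is the sense in which fusion can be more accurate than a single source.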

5.6 Data Storage

Database System

Database: a collection of persistent data.
Data: known facts that can be recorded and have an implicit meaning.
Database Management System (DBMS): a software system that supports the creation, population, and querying of a database.
Database System: DBMS + database.

DBMS Functions

Define a database: specify a particular database in terms of its data types, structures, and constraints.
Construct or load the initial database: build or load the initial database contents on a secondary storage medium.
Manipulate the database (insert, delete, update, and query):
① Retrieval, modification
② Accessing the database through Web applications
Share a database: allow multiple users and programs to access the database simultaneously.
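
The define, construct, and manipulate functions map onto ordinary SQL statements. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `reading` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Define: data types, structure, and constraints.
conn.execute(
    "CREATE TABLE reading ("
    " sensor_id TEXT NOT NULL,"
    " ts INTEGER NOT NULL,"
    " value REAL,"
    " PRIMARY KEY (sensor_id, ts))"
)

# Construct: load the initial contents.
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [("s1", 1, 20.5), ("s1", 2, 20.7), ("s2", 1, 19.9)],
)

# Manipulate: modification, then retrieval.
conn.execute("UPDATE reading SET value = 21.0 WHERE sensor_id = 's1' AND ts = 2")
rows = conn.execute(
    "SELECT ts, value FROM reading WHERE sensor_id = ? ORDER BY ts", ("s1",)
).fetchall()
print(rows)
# [(1, 20.5), (2, 21.0)]
conn.close()
```

Sharing is not shown here, since an in-memory database is visible only to the connection that created it.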

Data Storage Solutions (key point)

Name	Abbreviation
Direct Attached Storage	DAS
Network Attached Storage	NAS
Storage Area Network	SAN

Direct Attached Storage (DAS)
Characteristics: storage devices are attached directly to servers, which are the only point of access.

Network Attached Storage (NAS)
Characteristics: more reliable than DAS, limited by LAN bandwidth.

Storage Area Network (SAN)
Characteristics: more expensive

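5.7 Data Mining

Major Data Mining Tasks

Name	Explanation
Classification	Predicts the class of an item
Association Rule Discovery	Discovers association rules
Clustering	Finds classes (groups) of items
Sequential Pattern Discovery	Discovers sequential patterns
Deviation Detection	Detects deviations
Forecasting	Predicts future values
Description	Describes the data
Link Analysis	Finds links and associations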

Classification

Definition
Find a model for the class attribute as a function of the values of the other attributes.
Test set
A test set is used to determine the accuracy of the model.
Classification methods

Decision tree
Naive Bayesian classifiers
Association rules
Neural networks
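
To show how a model is learned from labelled data and then scored on a separate test set, the sketch below uses a nearest-centroid classifier. It is chosen only because it fits in a few lines and is not one of the methods listed above; the data and names are made up.

```python
def train_centroids(samples):
    """samples: list of (features, label). Returns the mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((x - c) ** 2 for x, c in zip(features, centroids[label]))
    return min(centroids, key=distance)


def accuracy(centroids, test_set):
    """Fraction of test samples whose predicted class matches the true class."""
    hits = sum(predict(centroids, f) == label for f, label in test_set)
    return hits / len(test_set)


training_set = [([1.0, 1.0], "low"), ([2.0, 1.0], "low"),
                ([8.0, 9.0], "high"), ([9.0, 8.0], "high")]
test_set = [([1.5, 2.0], "low"), ([8.5, 8.0], "high"), ([6.0, 6.0], "low")]

model = train_centroids(training_set)
print(accuracy(model, test_set))  # two of the three test samples are classified correctly
```

The test samples are kept out of training so that the accuracy reflects how the model behaves on data it has not seen.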

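Clustering

Definition
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that data points in the same cluster are more similar to one another than to data points in other clusters.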
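
As one concrete instance of clustering, the sketch below runs plain k-means on 2-D points, using squared Euclidean distance as the (dis)similarity measure. The choice of k-means, the starting centroids, and the sample points are assumptions for illustration.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return clusters, centroids


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, centroids = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters)
# [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```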

5.8 Multimedia Information Processing

Definition
Multimedia is a combination of text, graphics, sound, animation, and video that is delivered interactively to the user by electronic or digitally manipulated means.

Digital Image Processing

Digital image
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels.
Pixel values
Pixel values typically represent gray levels, colours, opacities, etc.
Fill-in-the-blank point: digitization implies that a digital image is an approximation of a real scene.
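
The idea that a digital image is a finite grid of pixel values, and therefore only an approximation, can be seen in a tiny example. The sketch assumes an 8-bit grayscale image stored as a list of rows and reduces it to fewer gray levels; the function name and pixel values are made up.

```python
def quantize(image, levels):
    """Reduce an 8-bit grayscale image (list of rows) to the given number of gray levels."""
    step = 256 // levels
    return [[(pixel // step) * step for pixel in row] for row in image]


image = [
    [0, 64, 128, 255],
    [10, 70, 130, 250],
]
print(quantize(image, levels=4))
# [[0, 64, 128, 192], [0, 64, 128, 192]]
```

After quantization the two rows become identical, which shows how coarser digitization discards detail from the original scene.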

Major Tasks of Digital Image Processing

Improvement of pictorial information for human interpretation.
Processing of image data for storage, transmission, and representation for autonomous machine perception.
