Information Processing for IoT

Outline

5.1 Information Theory
5.2 Information Technology
5.3 Data quality
5.4 Data cleaning
5.5 Data fusion
5.6 Data storage
5.7 Data mining
5.8 Multimedia information processing

5.3 Data quality

Uncertain Data

  • Data uncertainty occurs during:
    ① Data collection
    ② Data transmission
    ③ Data processing

Causes of Data Uncertainty

  • Environmental factors
  • Low battery power
  • Packet losses

Classification of Data Uncertainty

  • Source classification, by where the uncertainty comes from (key point)
    ① Undesirable uncertainty: noisy sensor data, imprecise GPS data, unreliable extracted/integrated data
    ② Desirable uncertainty: medical data with generalized attributes, cloaked trajectory data
  • Granularity classification
    ① Tuple uncertainty
    ② Attribute uncertainty
  • Correlation classification
    ① Independent uncertainty
    ② Correlated uncertainty
    ③ Uncertainty with local correlations

Meaning of Data Quality (key point)

  • Generally, you have a problem if the data doesn’t mean what you think it does, or what it should.
  • Data quality problems are expensive and pervasive.

Conventional Definition of Data Quality

  • Accuracy: the data was recorded correctly
  • Completeness: all data was recorded
  • Uniqueness: each entity is recorded once
  • Timeliness: the data is kept up to date
  • Consistency: the data agrees with itself
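
Some of these dimensions can be turned into simple automated checks. The following is a minimal sketch, assuming hypothetical sensor records stored as dictionaries; the field names (`sensor_id`, `ts`, `temp`) and all values are illustrative assumptions, not part of the notes. Accuracy and consistency are left out because they need reference values or domain rules.

```python
# Hypothetical sensor records; field names and values are assumptions for illustration.
records = [
    {"sensor_id": "s1", "ts": 100, "temp": 21.5},
    {"sensor_id": "s1", "ts": 100, "temp": 21.5},   # duplicate: violates uniqueness
    {"sensor_id": "s2", "ts": 101, "temp": None},   # missing value: violates completeness
    {"sensor_id": "s3", "ts": 40, "temp": 22.0},    # old reading: violates timeliness
]

def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def duplicates(rows):
    """Rows that repeat an earlier (sensor_id, ts) key."""
    seen, dups = set(), []
    for r in rows:
        key = (r["sensor_id"], r["ts"])
        if key in seen:
            dups.append(r)
        seen.add(key)
    return dups

def stale(rows, now, max_age):
    """Rows older than `max_age` time units at time `now`."""
    return [r for r in rows if now - r["ts"] > max_age]

print(completeness(records, "temp"))
print(len(duplicates(records)))
print([r["sensor_id"] for r in stale(records, now=102, max_age=10)])
```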

5.4 Data Cleaning

Data cleaning is the process of detecting and correcting (or removing) errors and inconsistencies in data in order to improve its quality.
Its aim is to identify data that is incomplete, incorrect, inaccurate, irrelevant, and so on.

Data cleaning tasks (key point)

  • Fill in missing values
  • Identify outliers and smooth out noisy data
  • Correct inconsistent data
  • Resolve redundancy caused by data integration

Methods to Handle Noisy Data

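  • Binning: partition the data into bins and process it bin by bin, smoothing away extreme values
  • Regression: fit the data to a regression function
  • Clustering: detect elements that do not belong to any large cluster and remove them
  • Combined inspection: combine computer and human inspection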
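
Binning can be illustrated with one common variant, smoothing by bin means. This is a minimal sketch that assumes equal-frequency bins; the function name and the sample values are illustrative, and other variants (such as smoothing by bin boundaries) are equally valid.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_ = ordered[start:start + bin_size]
        mean = sum(bin_) / len(bin_)
        smoothed.extend([mean] * len(bin_))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# prints [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```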

Sensor Cleaning Pipeline


Uses the temporal and spatial characteristics of sensor data.

Step 1: Point
  • Operates on: a single value of the sensor stream
  • Purpose: filter individual values
    ① Errant (dirty / faulty) RFID tags
    ② Obvious outliers
    ③ Conversion of raw data into tuples


Step 2: Smoothing
  • Purpose: interpolate (insert) lost readings
    ① Temporal interpolation
    ② Outlier detection
  • Method: window-based queries
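
The smoothing step can be sketched as a sliding-window fill. This is only an illustration under assumptions: lost readings are marked as `None`, the window is symmetric, and the replacement is the window mean; the function name and parameters are hypothetical.

```python
def fill_lost_readings(stream, half_window=2):
    """Replace each lost reading (None) with the mean of the valid readings
    found within `half_window` samples on either side of it."""
    filled = list(stream)
    for i, value in enumerate(stream):
        if value is None:
            lo = max(0, i - half_window)
            window = [v for v in stream[lo:i + half_window + 1] if v is not None]
            if window:                      # leave the gap if the window has no valid reading
                filled[i] = sum(window) / len(window)
    return filled

print(fill_lost_readings([20.0, None, 22.0, 23.0, None, None, 26.0], half_window=1))
# prints [20.0, 21.0, 22.0, 23.0, 23.0, 26.0, 26.0]
```

A wider window tolerates longer gaps but blurs fast changes in the signal, so the window size is a trade-off that depends on the sensor's sampling rate.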


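Step 3: Merge
  • Purpose: spatial interpolation
  • Example: within one spatial granule, compute the average of the readings from the different motes, ignoring any individual reading that lies more than two standard deviations from the mean.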
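
The merge example above can be sketched as follows. The function name and sample readings are hypothetical, and the use of the population standard deviation is an assumption, since the notes do not say which estimator is meant.

```python
from statistics import mean, pstdev

def merge_granule(readings, k=2.0):
    """Average the readings of one spatial granule, ignoring any reading
    that lies more than `k` standard deviations from the granule mean."""
    mu = mean(readings)
    sigma = pstdev(readings)
    kept = [r for r in readings if abs(r - mu) <= k * sigma]
    return mean(kept)

# Five motes agree; the sixth reports a faulty value that is filtered out.
print(merge_granule([20.1, 20.3, 19.9, 20.0, 20.2, 35.0]))
```

With very few motes per granule a single faulty reading inflates the standard deviation so much that it can no longer be more than two deviations away, so this rule needs a reasonable number of readings to be effective.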
Step 4: Arbitrate
  • Purpose: remove
    ① Conflicting readings
    ② Duplicates (de-duplication)
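
One possible arbitration rule, shown here only as an assumption since the notes do not specify one, is to assign each RFID tag to the single reader that saw it most often within a time window. The function name and the sample reads are hypothetical.

```python
from collections import Counter

def arbitrate(reads):
    """Given (tag_id, reader_id) reads from one time window, resolve conflicts
    and duplicates by assigning each tag to the reader that saw it most often."""
    counts = Counter(reads)                      # (tag, reader) -> number of reads
    best = {}
    for (tag, reader), n in counts.items():
        if tag not in best or n > best[tag][1]:
            best[tag] = (reader, n)
    return {tag: reader for tag, (reader, _) in best.items()}

reads = [("t1", "A"), ("t1", "A"), ("t1", "B"), ("t2", "B")]
print(arbitrate(reads))
# prints {'t1': 'A', 't2': 'B'}
```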
Step 5: Virtualize
  • Purpose: multi-source integration

5.5 Data Fusion

  • Concept (key point)
    Data fusion combines data from multiple sources and gathers that information in order to achieve inferences that are more efficient and potentially more accurate than those achieved by means of a single source.

  • Fill-in-the-blank points
    Sensors only give an estimate of the measured physical property.
    The nature of the errors often determines the preferred fusion algorithm.

Three Processing Architectures

  • Data-level fusion: direct fusion of sensor data.
  • Feature-level fusion: representation of sensor data via feature vectors, with subsequent fusion of the feature vectors.
  • Decision-level fusion: processing of each sensor to achieve high-level inferences or decisions, which are subsequently combined.



Data-level Fusion

  • Condition of use: the sensors are measuring the same physical phenomenon.
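
When several sensors measure the same quantity, their raw readings can be fused directly. The sketch below uses an inverse-variance weighted average, which is one common choice and not necessarily the method the course has in mind; the function name and the sample values are assumptions.

```python
def fuse(measurements):
    """Fuse (value, variance) pairs from sensors observing the same quantity
    with an inverse-variance weighted average. Returns (value, variance)."""
    weights = [1.0 / var for _, var in measurements]
    total = sum(weights)
    value = sum(w * x for w, (x, _) in zip(weights, measurements)) / total
    return value, 1.0 / total

# A precise sensor (variance 1.0) and a noisier one (variance 4.0).
value, variance = fuse([(20.0, 1.0), (22.0, 4.0)])
print(value, variance)
```

The fused value lies closer to the more precise sensor, and the fused variance is smaller than either input variance, which matches the claim that fusion can be more accurate than a single source. The weighting assumes independent errors, so the nature of the errors decides whether it is appropriate.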

5.6 Data Storage

Database System

  • Database: a collection of persistent data
  • Data: known facts that can be recorded and that have an implicit meaning
  • Database Management System (DBMS): a software system that supports the creation, population, and querying of a database
  • Database System: DBMS + database

DBMS Functions

  • Define a database: specify a particular database in terms of its data types, structures, and constraints.
  • Construct or load the initial database contents on a secondary storage medium.
  • Manipulate the database (insert, delete, update, query):
    ① Retrieval, modification
    ② Accessing the database through Web applications
  • Share a database: allow multiple users and programs to access the database concurrently.
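
The define, construct, and manipulate functions can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch; the table name, columns, and values are hypothetical, and an in-memory database is used so nothing is written to disk. Sharing is not shown, since it involves several concurrent clients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")            # in-memory database for the demo

# Define: data types, structure, and constraints.
conn.execute("""
    CREATE TABLE reading (
        sensor_id TEXT    NOT NULL,
        ts        INTEGER NOT NULL,
        temp      REAL,
        PRIMARY KEY (sensor_id, ts)
    )
""")

# Construct: load the initial contents.
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [("s1", 100, 21.5), ("s1", 101, 21.7), ("s2", 100, 19.8)],
)

# Manipulate: a modification followed by a retrieval.
conn.execute("UPDATE reading SET temp = 20.0 WHERE sensor_id = 's2'")
rows = conn.execute(
    "SELECT sensor_id, AVG(temp) FROM reading GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
conn.commit()
print(rows)
```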

Data Storage Solutions (key point)

  • Direct Attached Storage (DAS)
    Characteristics: storage devices are attached directly to servers, which are the only point of access.
  • Network Attached Storage (NAS)
    Characteristics: more reliable than DAS, but limited by LAN bandwidth.
  • Storage Area Network (SAN)
    Characteristics: more expensive.



5.7 Data Mining

Major Data Mining Tasks

  • Classification: predict the class of an item
  • Association rule discovery
  • Clustering: find the classes (groups) of items
  • Sequential pattern discovery
  • Deviation detection
  • Forecasting
  • Description
  • Link analysis: find links and associations

Classification

  • Definition
    Find a model for the class attribute as a function of the values of the other attributes.

  • Test set
    A test set is used to determine the accuracy of the model.

  • Classification methods
    ① Decision trees
    ② Naive Bayesian classifiers
    ③ Association rules
    ④ Neural networks
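
The role of the test set can be shown with a very small classifier. The sketch below uses a 1-nearest-neighbour rule as a stand-in because it fits in a few lines; it is not one of the methods listed above, and the data and labels are hypothetical.

```python
def predict_1nn(train, x):
    """Return the label of the training example closest to x (1-nearest neighbour)."""
    nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test examples whose predicted label equals the true label."""
    hits = sum(predict_1nn(train, x) == label for x, label in test)
    return hits / len(test)

# Hypothetical (temperature, humidity) -> label data.
train = [((20, 40), "normal"), ((21, 45), "normal"), ((35, 10), "alarm"), ((38, 12), "alarm")]
test = [((22, 42), "normal"), ((36, 11), "alarm"), ((30, 20), "normal")]

print(accuracy(train, test))   # two of the three test examples are classified correctly
```

The test examples are kept out of the training data, so the accuracy estimates how the model behaves on items it has not seen.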

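Definition of Clustering

Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that data points in the same cluster are more similar to one another than to data points in other clusters.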
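
As an illustration of this definition, the sketch below runs k-means (Lloyd's algorithm) on one-dimensional points, using absolute distance as the similarity measure. The algorithm choice, function name, starting centers, and data are all assumptions made for the example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's k-means on 1-D points, starting from the given centers.
    Returns the final centers and the clusters of the last assignment step."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else old for c, old in zip(clusters, centers)]
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], centers=[0.0, 6.0])
print(centers)
print(clusters)
```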

5.8 Multimedia Information Processing

  • Definition
    Multimedia is a combination of text, graphics, sound, animation, and video that is delivered interactively to the user by electronic or digitally manipulated means.

Digital Image Processing

  • Digital image
    A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels.
  • Pixel values
    Pixel values typically represent gray levels, colours, opacities, etc.
  • Fill-in-the-blank: digitization implies that a digital image is an approximation of a real scene.
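
Why a digital image is only an approximation can be seen by digitizing a continuous scene: the scene is sampled at a finite number of positions, and each sample is quantized to a finite number of gray levels. The sketch below assumes the scene is a function `f(x, y)` with values in [0, 1]; the function name and the ramp scene are hypothetical.

```python
def digitize(scene, width, height, levels=256):
    """Sample a continuous scene f(x, y) with values in [0, 1] at the pixel centers
    of a width x height grid and quantize each sample to a gray level
    in the range 0 .. levels - 1."""
    image = []
    for row in range(height):
        y = (row + 0.5) / height
        image.append([
            min(levels - 1, int(scene((col + 0.5) / width, y) * levels))
            for col in range(width)
        ])
    return image

def ramp(x, y):
    """Hypothetical scene: brightness increases smoothly from left to right."""
    return x

print(digitize(ramp, width=4, height=2, levels=4))
# prints [[0, 1, 2, 3], [0, 1, 2, 3]]
```

The smooth ramp becomes four flat steps, so both the spatial detail between pixel centers and the brightness differences within one gray level are lost.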

Major Tasks of Digital Image Processing

  • Improvement of pictorial information for human interpretation.
  • Processing of image data for storage, transmission, and representation for autonomous machine perception.

Processing level

