Introduction to Big Data (Video Download)
We are going to deliver a series of tutorials on the following concepts, one by one:
First we will start with BigData basics, then move from Hadoop to the cloud, and finally discuss how to use BigData solutions with cloud platforms. We will cover the BigData and cloud platform solutions available in the current market, such as Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, IBM Bluemix, Pivotal Cloud Foundry, Yahoo Cloud Platform, etc.
Finally, we will discuss how to develop applications using the Spring Cloud and Spring Hadoop modules. BigData and Cloud are both really big subjects, so it may take some time to discuss all these concepts in detail with real-time examples. Please bear with us.
In this first post of the series, we discuss BigData basics.
We are now living in the Big Data era.
A few years ago, systems, organizations, and applications used only structured data (structured data means data in the form of rows and columns). It was very easy to use relational databases (RDBMS) and older tools to store, manage, process, and report on this data.
Recently, however, the nature of data has changed. Systems, organizations, and applications now generate huge amounts of data in a variety of formats at a very fast rate.
That means the data is no longer simple structured data (it is not in the form of simple rows and columns). Much of it is raw data without any fixed format. It is very difficult, or even impossible, to use old technologies, traditional relational databases, and tools to store, manage, process, analyze, and report on this kind of data.
How do we solve this problem, then? This is where BigData solutions come into the picture.
Big Data solutions are designed to solve exactly these problems.
Let us start by understanding what BigData is and how important it is in our lives.
There is no single straightforward definition of BigData. However, we will try to answer this question in different ways.
In simple words, Big Data is a technique for solving data problems that cannot be solved using traditional databases and tools.
Put another way, BigData does not just mean a huge amount of data. It means a huge amount of data generated at a very fast rate in different formats.
Big Data is a technique to store, process, manage, analyze, and report on a huge amount of varied data, at the required speed and within the required time, to allow real-time analysis and reaction.
BigData is data that has the following three characteristics, known as the “BigData Characteristics”:
Volume means “how much data is generated”. Nowadays, organizations, people, and systems generate or receive vast amounts of data, from terabytes (TB) to petabytes (PB) to exabytes (EB) and more.
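To get a feel for these units, the following minimal sketch converts a raw byte count into a readable figure. It assumes decimal (SI) units, where each unit is 1,000 times the previous one; the function name `human_readable` and the sample byte counts are hypothetical and chosen only for illustration.

```python
# Decimal (SI) storage units; each one is 1000x the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes):
    """Format a byte count using the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at EB if the value is larger still.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

# Hypothetical sample volumes.
print(human_readable(500 * 10**12))
print(human_readable(10**18))
```

Moving from TB to PB to EB is a factor of 1,000 each time, which is why data at this scale quickly outgrows a single machine.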
Velocity means “how fast data is produced”. Nowadays, organizations, people, and systems generate huge amounts of data at a very fast rate.
Variety means “different forms of data”. Nowadays, organizations, people, and systems generate huge amounts of data at a very fast rate in different formats. We will discuss the different data formats in detail soon.
BigData refers to the 3V (VVV) paradigm. The three “Vs” paradigm (Volume, Velocity, Variety) of Big Data was defined by Doug Laney in 2001.
If our organization's data fits this 3Vs paradigm, we have a BigData problem, so we should use BigData solutions to solve it.
The 3Vs paradigm alone is not enough to get value from our BigData. There is another V (the 4th V), which is the most important one for every BigData problem.
Veracity means “the quality, correctness, or accuracy of the captured data”. Of the 4Vs, it is the most important for any BigData solution, because without correct data there is no use in storing large amounts of data at a fast rate and in different formats. The data should deliver correct business value.
So this 4th V answers the following questions:
How accurate is the data in predicting business value?
Do the results of a big data analysis actually make sense?
BigData 4Vs in simple terminology:
V (Volume): the amount of data
V (Variety): the number of types of data
V (Velocity): the speed at which data is generated and processed
V (Veracity): the correctness of data
We are living in the data era, or information era. Data is the most important factor for all organizations, for the following reasons or benefits:
And more.
Nowadays, Big Data is very important for organizations and companies from medium to large size, because it enables them to gather, store, manage, and manipulate “extremely large amounts of data, extremely high velocity of data, and an extremely wide variety of data”:
By following this Big Data 4Vs paradigm, we get many benefits, as shown below:
By using the BigData 4Vs paradigm, organizations can get many benefits by answering “what, who, when, where, how” kinds of questions:
In the BigData 3V paradigm, one V refers to Variety, which means generating or receiving data in different formats.
In the data era, people, systems, devices, and organizations generate or receive the following types of data formats.
Structured data means data that is in the form of rows and columns, so it is very easy to store in relational databases.
In simple words, anything that can be stored in the form of rows and columns is structured data.
For example: relational database data (online subscriptions, transactional data, etc.).
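As a minimal sketch of what “rows and columns” looks like in practice, the following uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and sample records are hypothetical; they only illustrate the online subscription example.

```python
import sqlite3

# In-memory database; the table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)"
)

# Each tuple is one row; every row has the same three columns.
rows = [
    (1, "alice@example.com", "monthly"),
    (2, "bob@example.com", "yearly"),
]
conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", rows)

# Because the shape of the data is fixed, querying it is straightforward.
yearly = conn.execute(
    "SELECT email FROM subscriptions WHERE plan = ?", ("yearly",)
).fetchall()
print(yearly)
conn.close()
```

Every record fits the same fixed set of columns, which is what makes this kind of data a natural match for an RDBMS.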
Semi-structured data means data that is formatted in some way, but not in the form of rows and columns. It is possible to store it in relational databases, but it is somewhat complex to manage and tends to perform poorly.
For example:
In log files, columns are separated by whitespace characters (characters used to align things horizontally or vertically, for instance a space, a tab, or a newline).
Observe the following JBoss server log file:
09:20:01,054 INFO [org.jboss.modules] (main) JBoss Modules version 1.3
09:20:01,652 INFO [org.jboss.as.process.Host Controller.status] (main) JBAS012017: Starting process 'Host Controller'
09:20:05,079 INFO [org.jboss.as.process.Server: myserver.status] (ProcessController-threads - 10) JBAS012017: Starting process 'Server: myserver'
17:01:58,833 INFO [org.jboss.as.process] (Shutdown thread) JBAS012016: Shutting down process controller
17:02:03,408 INFO [org.jboss.as.process.Host Controller.status] (Shutdown thread) JBAS012018: Stopping process 'Host Controller'
17:02:15,246 INFO [org.jboss.as.process.Server: myserver.status] (ProcessController-threads - 9) JBAS012018: Stopping process 'Server: myserver'
17:03:02,990 INFO [org.jboss.as.process.Server:myserver.status] (reaper for Server: myserver) JBAS012010: Process 'Server: myserver' finished with an exit status of 0
17:03:13,170 INFO [org.jboss.as.process.Host Controller.status] (reaper for Host Controller) JBAS012010: Process 'Host Controller' finished with an exit status of 0
17:03:13,195 INFO [org.jboss.as.process] (Shutdown thread) JBAS012015: All processes finished; exiting
In the log file above, the first column (the timestamp) is separated from the second column (the logging level) by whitespace. The text is semi-formatted, not fully formatted.
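Because the format is only partly regular, such lines are usually turned into fields with a pattern rather than read as table rows. The following is a minimal sketch using Python's `re` module; the pattern and the field names (`time`, `level`, `logger`, `thread`, `message`) are assumptions inferred from the sample lines above, not an official JBoss log specification.

```python
import re

# Pattern for lines shaped like the sample above:
# timestamp, level, [logger], (thread), then the free-text message.
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

sample = "09:20:01,054 INFO [org.jboss.modules] (main) JBoss Modules version 1.3"
record = parse_line(sample)
print(record["time"], record["level"], record["message"])
```

The timestamp and level are easy to extract, but the message part remains free text, which is why this kind of data counts as semi-structured.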
An XML document is another example. It is also semi-formatted, using XML start and end tags.
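As a minimal sketch, the following parses a small XML document with Python's built-in `xml.etree.ElementTree` module. The document is a hypothetical stand-in; its element names (`employees`, `employee`, `name`, `city`) and values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names and values are illustrative only.
XML_TEXT = """
<employees>
  <employee id="1">
    <name>Alice</name>
    <city>Hyderabad</city>
  </employee>
  <employee id="2">
    <name>Bob</name>
  </employee>
</employees>
""".strip()

root = ET.fromstring(XML_TEXT)

# The tags give the data some shape, but records need not share the same
# fields: the second employee has no <city> element.
records = [
    (emp.get("id"), emp.findtext("name"), emp.findtext("city", default="(unknown)"))
    for emp in root.findall("employee")
]
print(records)
```

The tags describe each value, yet nothing forces every record to carry the same fields, so the data does not map cleanly onto fixed rows and columns.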
Unstructured data means data that is not formatted in any way. It is very difficult to store and query this kind of data in relational databases.
For example: audio files, videos, text typed by call centre executives, photos, sensor data, web data, mobile data, GPS data, social media data, etc. are unstructured data.
If we open an image file (for instance, a JPEG file) in a text editor, we see only binary data, which has no row-and-column structure at all.
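The following minimal sketch shows why such a file looks like noise in a text editor. To stay runnable without a real image on disk, it hard-codes the first bytes of a typical JPEG/JFIF file; the helper name `looks_like_jpeg` is hypothetical.

```python
# First bytes of a typical JPEG/JFIF file: the start-of-image marker FF D8,
# the APP0 marker FF E0, a segment length, and the "JFIF" identifier.
# Hard-coded here so the sketch runs without a real image file.
header = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]) + b"JFIF\x00"

def looks_like_jpeg(data):
    """JPEG files begin with the two-byte start-of-image marker FF D8."""
    return data[:2] == b"\xff\xd8"

try:
    header.decode("utf-8")
    readable = True
except UnicodeDecodeError:
    # The byte 0xFF is never valid in UTF-8, so this is not plain text.
    readable = False

print(looks_like_jpeg(header), readable)
```

A program can recognize the file type from its leading bytes, but the content cannot be read as text or split into columns.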
Nowadays, people, machines, devices, organizations, and the Internet generate multi-structured data, meaning a combination of structured, semi-structured, and unstructured data. It is very hard to store and manage this kind of data using traditional technologies, databases, and tools.
Here Big Data solutions solve this problem in an efficient and cost-effective way.
If we use BigData solutions to store, manage, process, and report on our data, we get the following benefits:
The following is a list of the most popular BigData solutions available in the market.
Most organizations are using or moving to BigData, so it is not possible to list all of them here.
We list only some popular organizations that use and benefit from Big Data solutions.
Facebook is one of the most popular social networking websites. Worldwide, around 1,000 million users use Facebook. It collects around 500 TB (terabytes) of data per day from user subscriptions, likes, posts, relationship information, audio, videos, pictures, etc.
Google also uses its BigData cloud platform to manage data for applications such as Gmail, Google+, Google Search, YouTube, etc.
In India, UIDAI (Unique Identification Authority of India) manages all Aadhaar card information. It also uses BigData solutions to manage that huge amount of data.
RedBus is India's largest online bus ticket and hotel booking organization. It also uses BigData solutions to manage its huge amount of data at a very high traffic rate.
Two world-famous online shopping giants, eBay and Amazon, also use BigData solutions to manage their customer data, product information, etc.
Many airlines today (for example, British Airways, Singapore Airlines, etc.) use BigData solutions to store and manage their aircraft and customer information.
Yahoo also uses its BigData cloud platform solutions to manage data for applications such as Yahoo Mail, Yahoo Search, Flickr, etc.
Safari Books Online is an online subscription service that gives individuals and organizations access to its online books, tutorials, and videos.
The New York Stock Exchange is one of the most famous stock exchanges in the world. It generates about 5 TB (terabytes) of data per day.
That is all for this introduction to BigData. We will discuss more BigData concepts and Hadoop basics in my coming posts.
Please drop me a comment if you like my post or have any issues or suggestions.
Translated from: https://www.journaldev.com/8734/introduction-to-bigdata