How are big data and machine learning related?
The answers follow:
1.
Big data and machine learning are not directly related, but used together they can do real wonders. (Note: there is no direct connection, but the two work better together.)
Machine learning and big data: in most cases, the learning comes from extensive calculations over existing datasets to build a model. An ordinary single system cannot handle calculations over very large datasets, and data volumes grow day by day, so the resulting model has to be updated accordingly. To achieve this, we need distributed computing, either using big data technologies such as Apache Mahout, Spark, or RHadoop, or running the initial analytics in projects like Hive or Pig and feeding the output to machine learning algorithms that generate the model.
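As a rough illustration of why distributing the work helps, here is a minimal sketch in plain Python rather than Mahout, Spark, or Hive. It assumes the model is a simple straight-line fit, and the data, partition layout, and function names are all hypothetical. Each partition is reduced to a few sums, the sums are merged, and the model is fitted from the merged totals, so no single machine ever needs the full dataset.

```python
from functools import reduce

def partition_stats(rows):
    """Summarize one partition of (x, y) pairs as sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def merge_stats(a, b):
    """Combine the statistics of two partitions by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(stats):
    """Closed-form least-squares line from the merged statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical data, split the way a cluster might hold it.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]

total = reduce(merge_stats, (partition_stats(p) for p in partitions))
slope, intercept = fit_line(total)
```

When new data arrives, only its own statistics need to be computed and merged into `total` before refitting, which is one simple way a model can be updated as the data grows.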
2.
You can apply machine learning algorithms to big data, and you can apply big data processing techniques to machine learning. (Note: the two technologies feed into each other.)
An example of the first case would be training a neural network or a logistic regression model on a large dataset using online gradient descent.
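To make the first case concrete, here is a minimal sketch of online gradient descent for logistic regression in plain Python. The example stream, the learning rate, and the function names are assumptions made for illustration. The point is that the model is updated one example at a time, so the full dataset never has to fit in memory.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(weights, bias, features, label, lr=0.1):
    """One online gradient descent step on a single (features, label) example."""
    pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    error = pred - label  # gradient of the log loss with respect to the logit
    new_weights = [w - lr * error * x for w, x in zip(weights, features)]
    new_bias = bias - lr * error
    return new_weights, new_bias

def example_stream():
    """Hypothetical stream: the label is 1 when the single feature is positive."""
    points = [-2.0, -1.0, 1.0, 2.0]
    for _ in range(200):
        for x in points:
            yield [x], 1 if x > 0 else 0

# Consume the stream one example at a time.
weights, bias = [0.0], 0.0
for features, label in example_stream():
    weights, bias = sgd_update(weights, bias, features, label)

def predict(features):
    """Probability that the label is 1 under the current model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
```

In a real setting the generator would read from disk or a message queue instead of a fixed list of points.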
An example of the second case would be parallelizing gradient descent so that it runs in a MapReduce environment.
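The second case can be sketched in the same spirit. This is a single-process imitation of the MapReduce pattern, not code for Hadoop itself, and the shards, learning rate, and function names are hypothetical. Each shard computes a partial gradient (the map step), the partial gradients are summed (the reduce step), and the driver applies one update per round.

```python
from functools import reduce

def shard_gradient(shard, w, b):
    """Map step: sum of squared-error gradients over one shard of (x, y) pairs."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(shard)

def combine(a, b):
    """Reduce step: add partial gradients and example counts."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def train(shards, lr=0.05, steps=2000):
    """Batch gradient descent where each step is one map and reduce round."""
    w = b = 0.0
    for _ in range(steps):
        # On a cluster, each shard's gradient would be computed in parallel.
        partials = [shard_gradient(s, w, b) for s in shards]
        gw, gb, n = reduce(combine, partials)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical data following y = 0.5 * x + 2, spread over three shards.
shards = [
    [(0.0, 2.0), (1.0, 2.5)],
    [(2.0, 3.0), (3.0, 3.5)],
    [(4.0, 4.0)],
]
w, b = train(shards)
```

Because the gradient of a sum is the sum of the gradients, splitting the data across shards does not change the result. Only the current `w` and `b` have to be sent to every worker each round.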
In machine learning, large datasets usually mean you need to use simpler algorithms, and those algorithms perform much better on large datasets than they do on small ones.
3.
There are two types of insights anyone can get from a dataset:
Q1. Direct (group by / join / sum / max / average)
Q2. Inductive, i.e. inferred (if something is ..., then something else is ..., else anything is ...)
Note that insights of the first type are always exact, so you compute them with tools like Excel for small data and Hadoop for big data.
Inductive insights, on the other hand, are approximations drawn from looking at the data. With a small amount of data, a human can try to infer things from charts, graphs, and so on. When the data is huge, however, inferring rules from it is beyond human capacity. This is exactly where machine learning comes in.
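The difference between the two types can be shown with a small sketch. The records, field meanings, and function names below are hypothetical. The first function computes a direct insight, an exact group-by average. The second infers a rule of the form "if the order value is at least t, the customer returns" by trying each candidate threshold and keeping the one with the fewest mistakes, which is a very simple form of learning a rule from data.

```python
from collections import defaultdict

# Hypothetical records: (region, order_value, customer_returned)
orders = [
    ("north", 20.0, False),
    ("north", 35.0, False),
    ("south", 60.0, True),
    ("south", 80.0, True),
    ("north", 55.0, True),
    ("south", 25.0, False),
]

def average_by_region(rows):
    """Direct insight: an exact group-by average."""
    totals = defaultdict(lambda: [0.0, 0])
    for region, value, _ in rows:
        totals[region][0] += value
        totals[region][1] += 1
    return {region: s / n for region, (s, n) in totals.items()}

def learn_threshold(rows):
    """Inductive insight: pick the threshold rule with the fewest mistakes."""
    best_t, best_errors = None, len(rows) + 1
    for t in sorted(value for _, value, _ in rows):
        # Count rows where the rule "value >= t" disagrees with what happened.
        errors = sum((value >= t) != returned for _, value, returned in rows)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

averages = average_by_region(orders)
threshold, mistakes = learn_threshold(orders)
```

The averages are facts about this dataset. The threshold is a guess about future orders, and it may turn out to be wrong on data the rule has not seen.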
4.
One of the biggest reasons we use big data is to extract some meaning out of it, so that we can make better decisions. And that’s what machine learning does! It is the science of training systems to learn from data and produce an appropriate response without being explicitly programmed for it. On the flip side, without big data, machine learning would be largely irrelevant, because to learn anything from data you need a large number of ‘training examples’, both so that as many scenarios as possible are covered and so that a few erroneous records do not lead to faulty training.
So, they are deeply interconnected. (In short: large datasets keep the learned model from being biased.)
5.
I have often seen these terms used interchangeably, which is totally wrong.
Big data has more to do with high-performance computing, while machine learning is a part of data science. In big data, large volumes of data that could not otherwise be processed in a reasonable amount of time are processed quickly using various techniques and tools. In machine learning, a system learns from past experience and builds a model that is likely to handle future instances correctly.
One of the main reasons big data and machine learning are used together is that big data processing often serves as a preprocessing step for machine learning.
6.
Machine learning is the science of studying patterns in data. These patterns explain how the data is correlated, and those correlations are used to make predictions.
Big data is the art of working with large amounts of data. Machine learning can be done on a smaller set of data, but the larger the data, the better the predictions.
So, if I were to give a short answer: when you have a lot of structured or unstructured data that you want to study for patterns, you use big data tools to handle it and run your machine learning algorithms on it to find patterns that support a business use case.
7.
Machine Learning - Builds models. When people hear the term “machine learning”, they picture robots that walk, climb, or clean houses. In reality, machine learning starts a lot closer to home. When you open your email, spam has been filtered out from your important messages by an algorithm that has learnt to classify “spam” and “not spam”. Your Facebook news feed features posts from your closest friends because an algorithm has examined your likes, tags, and photos to work out who you connect with most. When you upload a photo and the website identifies your face, that is powered by a facial recognition algorithm. When you use a search engine, you see the best and most relevant content first because of a sophisticated search ranking algorithm. In short, machine learning permeates our lives: it builds models that let algorithms learn on their own.
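As a rough idea of how a spam filter can learn to classify “spam” and “not spam”, here is a minimal naive Bayes sketch in plain Python. Naive Bayes is one common technique, chosen here as an assumption, and the training messages and function names are hypothetical. Real filters train on far more data and use richer features.

```python
import math
from collections import Counter

# Hypothetical labelled messages.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "not spam"),
    ("lunch meeting tomorrow", "not spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "not spam": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["not spam"])
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the message."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training)
```

Calling `classify("cheap money", model)` scores the message under both labels and returns the more likely one. Nobody wrote a rule saying that "money" signals spam; the counts in the training data did that.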
Data Mining - An analytic process designed to explore data and find patterns in it. It is the practice of applying algorithms (mostly machine learning algorithms) to find patterns in data.
Artificial Intelligence - Behaves and reasons. The science of developing a system or software that mimics how a human responds and behaves in a given circumstance. As a field with an extremely broad scope, AI has divided its goal into multiple chunks. Later, each chunk became a separate field of study to solve its own problem.
Major AI goals:
Reasoning
Knowledge Representation
Computer Vision
Machine Learning
Natural Language
Robotics
General intelligence, or strong AI
Machine learning is a field that emerged from one of the AI goals: helping machines learn on their own to solve the problems they come across.
Natural language processing is another such field, which emerged from the AI goal of helping machines communicate with real humans.
Computer vision is a field that emerged from the AI goal of identifying and distinguishing the objects a machine can see.
Robotics is a field that emerged from the AI goal of giving a machine a physical form so that it can perform physical actions.
Well-known talks and papers on big data, algorithms, and machine learning:
https://www.quora.com/What-are-the-best-talks-lectures-related-to-big-data-algorithms-machine-learning