by Shubhi Asthana

Essential libraries for Machine Learning in Python

Python is often the language of choice for developers who need to apply statistical techniques or data analysis in their work. It is also used by data scientists whose tasks need to be integrated with web apps or production environments.

Python really shines in the field of machine learning. Its combination of consistent syntax, shorter development time and flexibility makes it well-suited to developing sophisticated models and prediction engines that can plug directly into production systems.

One of Python’s greatest assets is its extensive set of libraries.

Libraries are sets of routines and functions that are written in a given language. A robust set of libraries can make it easier for developers to perform complex tasks without rewriting many lines of code.

Machine learning is largely based on mathematics: specifically, mathematical optimization, statistics and probability. Python libraries help researchers and mathematicians who have less of a software development background to “do machine learning” without much friction.

Below are some of the most commonly used libraries in machine learning:

Scikit-learn for working with classical ML algorithms

Scikit-learn is one of the most popular ML libraries. It supports many supervised and unsupervised learning algorithms. Examples include linear and logistic regression, decision trees, clustering methods such as k-means, and so on.
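
As a rough illustration of the supervised and unsupervised sides, here is a minimal sketch that fits a logistic regression and a k-means model on the iris dataset that ships with Scikit-learn. The dataset choice, the split size and the variable names are my own assumptions for the example, not anything prescribed by the library.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Supervised: learn from labelled samples, then score on held-out data
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised: group the same samples without looking at the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(f"held-out accuracy: {accuracy:.2f}")
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```

Both estimators follow the same `fit` / `predict` pattern, which is a large part of why the library is easy to pick up.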

It builds on two basic libraries of Python, NumPy and SciPy. It adds a set of algorithms for common machine learning and data mining tasks, including clustering, regression and classification. Even tasks like transforming data, feature selection and ensemble methods can be implemented in a few lines.
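
To give a feel for the "few lines" claim, the sketch below chains data transformation, feature selection and an ensemble model into one pipeline. The breast cancer dataset, `k=10` and the random forest are assumptions picked for illustration; other transformers and estimators slot in the same way.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(
    StandardScaler(),                      # transform: zero mean, unit variance
    SelectKBest(f_classif, k=10),          # feature selection: keep 10 of 30 features
    RandomForestClassifier(n_estimators=100, random_state=0),  # ensemble method
)

# 5-fold cross-validation refits the whole pipeline on each training fold
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```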

For a novice in ML, Scikit-learn is a more-than-sufficient tool to work with, until you start implementing more complex algorithms.

TensorFlow for Deep Learning

If you are in the world of machine learning, you have probably heard about, tried or implemented some form of deep learning algorithm. Are they necessary? Not all the time. Are they cool when done right? Yes!

The interesting thing about TensorFlow is that a program you write in Python can be compiled and run on either your CPU or your GPU. So you don’t have to write C++ or CUDA code yourself to run on GPUs.
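
Here is a minimal sketch of that idea, assuming TensorFlow 2.x is installed. The function name `matmul_sum` is hypothetical; the point is that the same Python code is traced into a graph by `tf.function` and placed on a GPU when one is visible, or on the CPU otherwise.

```python
import tensorflow as tf

# Use the first GPU if TensorFlow can see one, otherwise fall back to the CPU
gpus = tf.config.list_physical_devices("GPU")
device = "/GPU:0" if gpus else "/CPU:0"


@tf.function  # traces the Python function into a TensorFlow graph
def matmul_sum(a, b):
    return tf.reduce_sum(tf.matmul(a, b))


with tf.device(device):
    a = tf.ones((2, 3))
    b = tf.ones((3, 2))
    result = matmul_sum(a, b)

print(device, float(result))
```

No CUDA code appears anywhere in the script; device placement is handled by the library.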

It uses a system of multi-layered nodes that allows you to quickly set up, train, and deploy artificial neural networks with large datasets. This is what allows Google to identify objects in photos or understand spoken words in its voice-recognition app.
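
The set up, train and use cycle can be sketched with the Keras API bundled in TensorFlow 2.x. Everything about the data here is made up for the example (random inputs with a simple rule for the label), and the layer sizes are arbitrary assumptions rather than recommendations.

```python
import numpy as np
import tensorflow as tf

# Synthetic data: the label is 1 when the four features sum to a positive number
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# Set up: a small stack of layers
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train: a few passes over the data
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Use: predicted probabilities for the first three samples
probs = model.predict(X[:3], verbose=0)
print(probs.ravel())
```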

Theano is also for Deep Learning

Theano is another good Python library for numerical computation, and is similar to NumPy. Theano allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

What sets Theano apart is that it takes advantage of the computer’s GPU. This allows it to make data-intensive calculations up to 100 times faster than when run on the CPU alone. Theano’s speed makes it especially valuable for deep learning and other computationally complex tasks.

Theano reached version 1.0.0 in 2017, a release with a lot of new features, interface changes and improvements. Its developers announced that major development would stop after that release.

Pandas for data extraction and preparation

Pandas is a very popular library that provides high-level data structures which are simple to use as well as intuitive.

It has many inbuilt methods for grouping, combining data and filtering as well as performing time series analysis.
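
A short sketch of those four operations on a tiny, made-up sales table follows. The column names, store names and cities are invented for the example.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"]),
    "store": ["A", "B", "A", "B"],
    "units": [10, 4, 7, 12],
})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Pune", "Austin"]})

# Grouping: total units per store
totals = sales.groupby("store")["units"].sum()

# Combining: attach each store's city to every sale
merged = sales.merge(stores, on="store")

# Filtering: keep only the larger sales
big = merged[merged["units"] >= 10]

# Time series: weekly totals, with weeks ending on Sunday
weekly = sales.set_index("date")["units"].resample("W").sum()

print(totals, big, weekly, sep="\n\n")
```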

Pandas can easily load data from different sources such as SQL databases and CSV, Excel and JSON files, and then lets you clean, reshape and transform it. There are two main structures in the library:

  • “Series”: one-dimensional
  • “DataFrame”: two-dimensional
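
To make the loading step and the two structures concrete, here is a small sketch. It reads CSV and JSON from in-memory strings so that it runs without any files; with real data you would pass a file path instead. The names and numbers are invented for the example.

```python
import io

import pandas as pd

csv_text = "name,score\nana,90\nben,75\n"
json_text = '[{"name": "ana", "age": 31}, {"name": "ben", "age": 28}]'

# Load from two different sources (in-memory stand-ins for files)
scores = pd.read_csv(io.StringIO(csv_text))
ages = pd.read_json(io.StringIO(json_text))

# A single column is a Series (one-dimensional)
column = scores["score"]

# The combined table is a DataFrame (two-dimensional)
table = scores.merge(ages, on="name")

print(type(column).__name__, column.ndim)
print(type(table).__name__, table.ndim, table.shape)
```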

For more details on how to use Series and DataFrames, check out my other blog post.

Matplotlib for data visualization

The best and most sophisticated ML is meaningless if you can’t communicate it to other people.

So how do you actually extract value from all the data you have? How do you inspire your business analysts and tell them “stories” full of “insights”?

This is where Matplotlib comes to the rescue. It is a standard Python library, widely used by data scientists for creating 2D plots and graphs. It’s pretty low-level, meaning it requires more commands to generate nice-looking graphs and figures than some higher-level libraries do.

However, the flip side of that is flexibility. With enough commands, you can make just about any kind of graph you want with Matplotlib. You can build diverse charts, from histograms and scatterplots to non-Cartesian coordinates graphs.
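
As a sketch of that range, the snippet below draws a histogram, a scatter plot and a polar (non-Cartesian) plot side by side. The data is random and the `Agg` backend is an assumption so that the script also runs without a display; drop that line to get an interactive window.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=500)
theta = np.linspace(0, 2 * np.pi, 200)

fig = plt.figure(figsize=(12, 4))

ax_hist = fig.add_subplot(1, 3, 1)
ax_hist.hist(values, bins=30)
ax_hist.set_title("Histogram")

ax_scatter = fig.add_subplot(1, 3, 2)
ax_scatter.scatter(values[:-1], values[1:], s=8)
ax_scatter.set_title("Scatter plot")

# Non-Cartesian axes: angle and radius instead of x and y
ax_polar = fig.add_subplot(1, 3, 3, projection="polar")
ax_polar.plot(theta, 1 + np.cos(3 * theta))
ax_polar.set_title("Polar plot")

fig.tight_layout()
```

Each panel needed its own handful of commands, which is the low-level trade-off described above.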

It supports different GUI backends across the major operating systems, and can also export graphics to common vector and raster formats such as PDF, SVG, PNG and JPG.
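
Exporting is a single call per file, as this sketch shows. The temporary directory and the file name `figure` are assumptions for the example; the output format is picked from the file extension.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# Write the same figure as two vector formats and one raster format
out_dir = Path(tempfile.mkdtemp())
saved = []
for ext in ("pdf", "svg", "png"):
    path = out_dir / f"figure.{ext}"
    fig.savefig(path)
    saved.append(path)

print([p.name for p in saved])
```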

Seaborn is another data visualization library

Seaborn is a popular visualization library that builds on Matplotlib’s foundations. It is a higher-level library, meaning it’s easier to generate certain kinds of plots, including heat maps, time series, and violin plots.
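
For comparison with the Matplotlib commands, here is a sketch of a heat map and a violin plot, each produced by a single Seaborn call. The data is randomly generated and the column names are invented for the example; the `Agg` backend is again an assumption so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Three groups of 50 values with different means
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 50),
    "value": np.concatenate([rng.normal(loc, 1.0, 50) for loc in (0, 1, 2)]),
})

# Correlation matrix of four random columns
corr = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("wxyz")).corr()

fig, (ax_heat, ax_violin) = plt.subplots(1, 2, figsize=(10, 4))
sns.heatmap(corr, annot=True, cmap="viridis", ax=ax_heat)
sns.violinplot(data=df, x="group", y="value", ax=ax_violin)
fig.tight_layout()
```

Seaborn draws onto ordinary Matplotlib axes, so the figure can still be adjusted or saved with the usual Matplotlib calls.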

Conclusion

This is a collection of some of the most important Python libraries for machine learning. They are worth exploring and getting familiar with if you plan to work with Python and data science.

Did I miss any important Python ML library? If so, please mention it in the comments below. Even though I tried to cover the most useful libraries, I may have left out others that deserve a look.

Questions or feedback? I’d love to hear from you. Please feel free to leave a comment, or connect with me on Twitter/LinkedIn.

Translated from: https://www.freecodecamp.org/news/essential-libraries-for-machine-learning-in-python-82a9ada57aeb/
