Introduction to Statistics in Python

Data Analytics

What Is Statistics

Statistics is the discipline that concerns the collection, organization, analysis, interpretation and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.

Central Tendency:

A measure of central tendency is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages.

Dispersion:

Dispersion is the extent to which a distribution is stretched or squeezed. Common measures of statistical dispersion are the variance, standard deviation, and interquartile range.
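
The three measures named above can be sketched in plain Python. This is a minimal illustration, not code from the original article: the `sample` list is hypothetical, the variance uses the n - 1 (sample) denominator, and the interquartile range uses a simple index-based quantile rule, which is only one of several conventions.

```python
import math


def variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def standard_deviation(xs):
    """Square root of the sample variance, in the same units as the data."""
    return math.sqrt(variance(xs))


def interquartile_range(xs):
    """Difference between the values at the 75% and 25% positions of the sorted data."""
    s = sorted(xs)
    return s[int(0.75 * len(s))] - s[int(0.25 * len(s))]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, for illustration only
print(variance(sample), standard_deviation(sample), interquartile_range(sample))
```

The standard deviation is often easier to interpret than the variance because it is expressed in the original units, while the interquartile range is less sensitive to extreme values.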

Correlation:

Correlation, or dependence, is any statistical relationship, whether causal or not, between two random variables or bivariate data. In the broadest sense, correlation is any statistical association, though it commonly refers to the degree to which a pair of variables are linearly related.
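
For the common "linearly related" sense, one standard measure is the Pearson correlation coefficient. The sketch below is an illustration added here, not part of the original article; the function name `correlation` and the input lists are assumptions.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant sequence has no linear relationship to measure
    return cov / (dev_x * dev_y)


xs = [1, 2, 3, 4, 5]    # hypothetical data
ys = [2, 4, 6, 8, 10]   # exactly 2 * x, so perfectly linearly related
print(correlation(xs, ys))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship, although the variables may still be related in a non-linear way.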

Simpson’s Paradox:

Simpson’s paradox, which goes by several names, is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.
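
A small numeric example may make the reversal concrete. The counts below are illustrative only and are not taken from the article: two treatments, A and B, are compared within two groups and then with the groups combined.

```python
# (successes, trials) for each treatment within each group; illustrative counts only
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


def overall_rate(groups):
    """Success rate after pooling all groups of one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials


for group in ("small", "large"):
    a = rate(*results["A"][group])
    b = rate(*results["B"][group])
    print(f"{group}: A = {a:.3f}, B = {b:.3f}")

print(f"combined: A = {overall_rate(results['A']):.3f}, "
      f"B = {overall_rate(results['B']):.3f}")
```

With these counts, A has the higher success rate within each group, yet B has the higher rate once the groups are pooled, because the two treatments were applied to the groups in very different proportions.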

What Is Data Analytics at a High Level

Data Analytics solutions offer a convenient way to leverage business data. But the number of solutions on the market can be daunting — and many may seem to cover a different category of analytics. How can organizations make sense of it all? Start by understanding the different types of analytics, including descriptive, diagnostic, predictive, and prescriptive analytics.

  • Descriptive Analytics tells you what happened in the past.

  • Diagnostic Analytics helps you understand why something happened in the past.

  • Predictive Analytics predicts what is most likely to happen in the future.

  • Prescriptive Analytics recommends actions you can take to affect those outcomes.

Applied Statistics Methods in Python

Imagine we need to analyze the number of friends that each member of our staff has. The friend counts are stored in a Python list like the one below:

num_friends = [100, 49, 41, 40, 25, 100, 100, 100, 41, 41, 49, 59,
               25, 25, 4, 4, 4, 4, 4, 4, 10, 10, 10, 10]

We will display num_friends as a histogram with matplotlib:
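
The plotting code did not survive in this copy of the article. The following is a minimal sketch of one way to draw such a chart, assuming matplotlib is installed; the helper name `plot_friend_counts`, the title, and the axis labels are assumptions rather than the author's code.

```python
from collections import Counter

num_friends = [100, 49, 41, 40, 25, 100, 100, 100, 41, 41, 49, 59,
               25, 25, 4, 4, 4, 4, 4, 4, 10, 10, 10, 10]

# Count how many staff members have each number of friends.
friend_counts = Counter(num_friends)


def plot_friend_counts(counts):
    """Draw a bar chart: friend count on the x-axis, number of people on the y-axis."""
    # Imported inside the function so the counting above works without matplotlib.
    import matplotlib.pyplot as plt

    xs = sorted(counts)
    ys = [counts[x] for x in xs]
    plt.bar(xs, ys)
    plt.title("Histogram of friend counts")
    plt.xlabel("# of friends")
    plt.ylabel("# of people")
    plt.show()
```

Calling `plot_friend_counts(friend_counts)` should open a window with one bar per distinct friend count.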

The resulting histogram:

[Figure: Histogram of friend counts]

Central Tendencies

  • mean

We would like to compute the mean number of friends:

def mean(x):
    """Return the arithmetic mean of the values in x."""
    return sum(x) / len(x)

Applying this function to num_friends gives:

35.791666666666664
  • median

The median is a simple measure of central tendency. To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values.
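
The function for this step is not shown in this copy of the article. A minimal sketch that follows the description above (sort, then take the middle value or the average of the two middle values) could look like this; the name `median` and its parameter are assumptions.

```python
def median(v):
    """Return the middle value of v, or the average of the two middle values."""
    n = len(v)
    sorted_v = sorted(v)
    midpoint = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return sorted_v[midpoint]
    # Even number of observations: average the two middle values.
    return (sorted_v[midpoint - 1] + sorted_v[midpoint]) / 2
```

Python's standard library also provides `statistics.median`, which applies the same rule.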

Applying this method to num_friends gives:

25.0
  • quantile

A generalization of the median is the quantile, which is the value below which a certain percentage of the data lies. (The median is the value below which 50% of the data lies.)

def quantile(x, p):
    """Return the value in x at quantile p, where 0 <= p < 1."""
    p_index = int(p * len(x))  # position of the quantile in sorted order
    return sorted(x)[p_index]

Applying quantile to num_friends with p = 0.8 gives:

59
  • mode (or most common values)

Applying a mode function to num_friends returns:

[4]
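
The mode function itself is not shown in this copy of the article. One way to obtain the result above, sketched here under the assumption that the mode is returned as a list because several values can tie for most common:

```python
from collections import Counter


def mode(x):
    """Return a list of the most common value(s) in x."""
    counts = Counter(x)
    max_count = max(counts.values())
    # Keep every value that occurs as often as the most frequent one.
    return [value for value, count in counts.items() if count == max_count]
```

In num_friends the value 4 appears six times, more often than any other value, so the list contains only 4.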

Conclusion

Studying statistics helps us understand the fundamental concepts of data analysis and data science in general. There is much more to statistics, such as hypothesis testing, correlation, and estimation, that I have not gone over here, so feel free to learn more about them.

Translated from: https://towardsdatascience.com/introduction-to-statistics-in-python-6f5a8876c994
