Marginal, Joint and Conditional Probabilities Explained by a Data Scientist

Probability plays a very important role in Data Science, because Data Scientists regularly attempt to draw statistical inferences that can be used to predict or analyse data better.

Statistical inference is the process of using data analysis to deduce properties of an underlying distribution of probability (Source: Wikipedia), hence understanding random variables and their probability distributions is a required skill to work on many Data Science problems.

I am going to start this discussion with a scenario, which we will use throughout to learn about probability distributions.

Scenario

A survey was carried out with 500 strangers in London’s West End to determine people’s favorite sports. The options were Football and Rugby, with everything else grouped together as Other. The results of the survey are displayed in Figure 1.

Figure 1: The results of the survey

Figure 1 shows raw counts, so it is not quite a probability distribution. To get one, we simply divide each number in Figure 1 by 500 (the number of observations); the result is the table in Figure 2.
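
The division step can be sketched in a few lines of Python. The figures themselves are not reproduced in this text, so the counts below are partly reconstructed: the male row and the female Rugby cell are back-calculated from probabilities quoted later in the article (0.24, 0.46, 0.25 and 0.05), while the split of the remaining 205 female respondents between Football and Other is purely an illustrative assumption.

```python
# Survey counts keyed by (gender, sport).
# Cells marked "assumed" are illustrative placeholders, not values from the article.
counts = {
    ("Male", "Football"): 120,
    ("Male", "Rugby"): 100,
    ("Male", "Other"): 50,
    ("Female", "Football"): 75,   # assumed
    ("Female", "Rugby"): 25,
    ("Female", "Other"): 130,     # assumed
}

total = sum(counts.values())  # 500 respondents

# Dividing every count by the number of observations gives the probabilities.
joint = {cell: n / total for cell, n in counts.items()}

print(joint[("Male", "Football")])  # 0.24
```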

Figure 2: Probability Distribution

Joint Probability

The joint probability is the probability of two events occurring together, written P(A and B) or P(A,B). For example, using Figure 2 we can see that the joint probability of someone being male and liking football is 0.24.

Figure 3: The Joint Probability Distribution

Note: The cells highlighted in Figure 3 (the Joint Probability Distribution) must sum to 1 because everyone in the distribution must be in one of the cells.

The joint probability is symmetrical, meaning that P(Male and Football) = P(Football and Male). We can also use it to find two other types of distribution: the marginal distribution and the conditional distribution.
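
Both properties of the joint distribution, that its cells sum to 1 and that the order of the events does not matter, can be checked with a small sketch. The table is hard-coded here; the two female cells marked "assumed" are illustrative placeholders chosen to be consistent with P(Female) = 0.46, and `p_joint` is a hypothetical helper name.

```python
import math

# Joint probabilities keyed by (gender, sport).
joint = {
    ("Male", "Football"): 0.24,
    ("Male", "Rugby"): 0.20,
    ("Male", "Other"): 0.10,
    ("Female", "Football"): 0.15,  # assumed
    ("Female", "Rugby"): 0.05,
    ("Female", "Other"): 0.26,     # assumed
}

def p_joint(a, b):
    """Return P(a and b), accepting the two events in either order."""
    return joint[(a, b)] if (a, b) in joint else joint[(b, a)]

# Every respondent falls in exactly one cell, so the cells sum to 1.
cells_sum_to_one = math.isclose(sum(joint.values()), 1.0)

# Symmetry: P(Male and Football) equals P(Football and Male).
is_symmetric = p_joint("Male", "Football") == p_joint("Football", "Male")

print(cells_sum_to_one, is_symmetric)
```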

Marginal Distribution

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables (Source: Wikipedia). If that was too much jargon, to put it simply: the marginal probability is the probability of an event irrespective of the outcome of another variable, written P(A) or P(B).

Figure 4: The Marginal Distribution

Note: Whether we ignore the gender or the sport our Marginal Distributions must sum to 1.

A fun fact about marginal probabilities is that they all appear in the margins of the table. How cool is that? Hence P(Female) = 0.46 completely ignores which sport the female prefers, and P(Rugby) = 0.25 completely ignores the gender.
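
Computing a marginal amounts to summing the joint probabilities over the variable being ignored. A minimal sketch follows; the two female cells marked "assumed" are illustrative placeholders (the article's table is not reproduced in this text), chosen so the female row still sums to the quoted 0.46.

```python
from collections import defaultdict

# Joint probabilities keyed by (gender, sport).
joint = {
    ("Male", "Football"): 0.24,
    ("Male", "Rugby"): 0.20,
    ("Male", "Other"): 0.10,
    ("Female", "Football"): 0.15,  # assumed
    ("Female", "Rugby"): 0.05,
    ("Female", "Other"): 0.26,     # assumed
}

p_gender = defaultdict(float)  # marginal over sport: P(gender)
p_sport = defaultdict(float)   # marginal over gender: P(sport)

for (gender, sport), p in joint.items():
    p_gender[gender] += p  # row totals
    p_sport[sport] += p    # column totals

print(round(p_gender["Female"], 2))  # 0.46
print(round(p_sport["Rugby"], 2))    # 0.25
```

Rounding is applied only for display, because summing floating-point values can leave tiny representation errors.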

Conditional Probability

The conditional probability concept is one of the most fundamental in probability theory and in my opinion is a trickier type of probability. It defines the probability of one event occurring given that another event has occurred (by assumption, presumption, assertion or evidence).

Figure 5: Expression of the Conditional Probability
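
The figure itself is not reproduced here. In standard notation, the definition of conditional probability (presumably what the figure shows) is:

```latex
P(A \mid B) = \frac{P(A, B)}{P(B)}, \qquad P(B) > 0
```

Here P(A, B) is the joint probability of A and B, and P(B) is the marginal probability of the condition.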

To make sense of this, let’s use Figure 2 again. If we want to calculate the probability that a person likes Rugby given that they are female, we take the joint probability that the person is female and likes rugby, P(Female and Rugby), and divide it by the probability of the condition. In this case the condition is that the person is female, and P(Female) can be read from the margin as 0.46. With P(Female and Rugby) = 0.05, we get 0.05 / 0.46 = 0.11 (to 2 decimal places).

Let's write that up neater:

P(Female, Rugby) = 0.05

P(Female) = 0.46

P(Rugby | Female) = 0.05 / 0.46 = 0.11 (to 2 decimal places).

If we continued to fill in the probability of preferring each sport given that the respondent is female, we would have a Conditional Probability Distribution.
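
Filling in the whole conditional distribution might look like the sketch below: take the female row of the joint table and divide each cell by P(Female). Only P(Female, Rugby) = 0.05 and P(Female) = 0.46 come from the article; the Football and Other cells are assumed placeholders, so only the Rugby result should be read as the article's figure.

```python
# Female row of the joint table, keyed by sport.
joint_female = {
    "Football": 0.15,  # assumed
    "Rugby": 0.05,
    "Other": 0.26,     # assumed
}

# Marginal probability of the condition, P(Female), is the row total (0.46).
p_female = sum(joint_female.values())

# P(sport | Female) = P(Female, sport) / P(Female)
cond_female = {sport: p / p_female for sport, p in joint_female.items()}

print(round(cond_female["Rugby"], 2))  # 0.11
```

Like any probability distribution, the conditional probabilities over all sports sum to 1.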

Wrap Up

This guide is a very simple introduction to joint, marginal and conditional probability. Being a Data Scientist and knowing about these distributions may still get you death stares from the envious Statisticians, but at least this time it’s because they are just angry people rather than because you are wrong. I am joking!

Translated from: https://towardsdatascience.com/marginal-joint-and-conditional-probabilities-explained-by-data-scientist-4225b28907a4