Algorithm Study | Expectation DP + Probability DP

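(I was sure I had already posted this three or four days ago.)
During our recent training camp I studied probability and combinatorics, and today we happened to cover probability DP and expectation DP as well. I find them super fun, way more interesting than other kinds of DP, so I'm taking the chance to share my notes and takeaways. (I'm definitely not going to tell you it's because the other DPs are too hard for me to learn, sob.)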

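Probability DP
When senior qko lectured today, his summary of probability DP was spot on: "Probability DP usually computes an actual result. During the DP, the current state is obtained by combining the probabilities of all its sub-states, so probability DP only uses the 'dynamic' part of DP and not the 'programming' (planning) part; that is, there are only transitions and no decisions."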

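Honestly, I'm still hazy on DP that needs planning, that is, DP that needs decisions qaq, so I've grown quite fond of probability DP, where a single formula carries you all the way.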

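To be fair, my understanding of probability DP is closer to the recurrences from high school: all you have to do is work out the recurrence, then evaluate it in one pass, either front to back (plain probability DP) or back to front (expectation DP). The algorithm is very simple and the code is incredibly easy to write; the harder part is deriving the recurrence.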

Example (plain probability DP)
Problem link

Rimi learned a new thing about integers, which is - any positive integer greater than 1 can be divided by its divisors. So, he is now playing with this property. He selects a number N. And he calls this D.
In each turn he randomly chooses a divisor of D (1 to D). Then he divides D by the number to obtain new D. He repeats this procedure until D becomes 1. What is the expected number of moves required for N to become 1.

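In short: you are given a number, you divide it by one of its divisors, and you repeat this until the number becomes 1. The question is how many divisions that takes in expectation.
Asking "how many times" makes it look like an expectation problem, but it is not a plain expectation DP. Take 50 as an example. Its divisors are {1, 2, 5, 10, 25, 50}. Clearly, dividing a number by one of its divisors leaves another of its divisors; if the number left is 10, the problem becomes how many moves it takes to turn 10 into 1. So the count needed for 50 (call it E(50)) depends on the counts already known for 1, 2, 5, 10, 25 and on 50 itself. That fixes the computation order: from front to back, smallest number first.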
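Suppose X has N divisors X1, X2, …, XN, where XN = X itself. Each move costs 1 and leaves a uniformly random divisor, so
f(X) = 1 + 1/N * (f(X1) + f(X2) + … + f(XN))
Since f(XN) = f(X) also appears on the right-hand side and is still unknown at this point, we rearrange the formula.
The final simplified form is f(X) = (N + f(X1) + f(X2) + … + f(X(N-1))) / (N-1)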
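To make the simplification explicit, here is a worked version of the algebra. It is only a sketch: it assumes the divisors of X are numbered so that the last one, X_N, is X itself (a labeling introduced here for illustration) and that every move counts as exactly 1. The number 50 from the example above serves as a check.

```latex
\begin{align*}
f(X) &= 1 + \frac{1}{N}\sum_{k=1}^{N} f(X_k), \qquad X_N = X \\
\Bigl(1 - \frac{1}{N}\Bigr) f(X) &= 1 + \frac{1}{N}\sum_{k=1}^{N-1} f(X_k) \\
f(X) &= \frac{N + \sum_{k=1}^{N-1} f(X_k)}{N-1} \\[6pt]
f(2) &= f(5) = \frac{2 + 0}{1} = 2, \quad
f(10) = \frac{4 + 0 + 2 + 2}{3} = \frac{8}{3}, \quad
f(25) = \frac{3 + 0 + 2}{2} = \frac{5}{2} \\
f(50) &= \frac{6 + 0 + 2 + 2 + \frac{8}{3} + \frac{5}{2}}{5} = \frac{91}{30} \approx 3.0333
\end{align*}
```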
Precompute every f(x) with one function before answering the queries.
The code is nothing difficult~

#include <cstdio>
#include <cmath>
using namespace std;

double f[100009];

void ans()
{
	f[1] = 0;
	for(int i=2;i<=100000;i++)
	{ 
		int cnt = -1;// start at -1 so that cnt ends up as N-1
		double sum =0 ;
		for(int j=1;j*j<=i;j++)// integer test instead of calling sqrt(i) every iteration
		{
			if(i%j == 0)
			{
				cnt++;
				sum += f[j] +1;
				if(j*j != i)
				{
					cnt++;
					sum+=f[i/j]+1; // for j == 1 this term is f[i] itself, still 0 here, so only the +1 counts
				}
			}
		}
		f[i] = sum/cnt;
	} 
}



int main()
{
	int T;
	scanf("%d",&T);
	ans();
	for(int i=1;i<=T;i++)
	{
		int x;
		scanf("%d",&x);
		printf("Case %d: %.6f\n",i,f[x]);
	}
	return 0;
}


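Next, an expectation DP example. My approach to this problem comes from reading Liu Rujia. He says expectation DP differs from probability DP in that, in probability DP, the starting value is known, for example f[1] = 0 in the problem above.
Expectation DP is different: you do not know the expectation at the start (I have always read "expectation" as "average"), but you do know it at the final state. The goal has already been reached there, so the dp value of the finished state must be 0.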

So how do we build the recurrence for an expectation DP?

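Say we want the current state dp[i][j]. We list every case that can follow from the current state, multiply the value of each resulting state by the probability of that case, and add them up (together with the cost of the step itself), which gives dp[i][j].
From this derivation, the current value depends on the values of later states, so the answer has to be computed from back to front, i.e. with a reversed loop.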

Here is an example problem: POJ 2096

Ivan is fond of collecting. Unlike other people who collect post stamps, coins or other material stuff, he collects software bugs. When Ivan gets a new program, he classifies all possible bugs into n categories. Each day he discovers exactly one bug in the program and adds information about it and its category into a spreadsheet. When he finds bugs in all bug categories, he calls the program disgusting, publishes this spreadsheet on his home page, and forgets completely about the program.
Two companies, Macrosoft and Microhard are in tight competition. Microhard wants to decrease sales of one Macrosoft program. They hire Ivan to prove that the program in question is disgusting. However, Ivan has a complicated problem. This new program has s subcomponents, and finding bugs of all types in each subcomponent would take too long before the target could be reached. So Ivan and Microhard agreed to use a simpler criteria — Ivan should find at least one bug in each subsystem and at least one bug of each category.
Macrosoft knows about these plans and it wants to estimate the time that is required for Ivan to call its program disgusting. It’s important because the company releases a new version soon, so it can correct its plans and release it quicker. Nobody would be interested in Ivan’s opinion about the reliability of the obsolete version.
A bug found in the program can be of any category with equal probability. Similarly, the bug can be found in any given subsystem with equal probability. Any particular bug cannot belong to two different categories or happen simultaneously in two different subsystems. The number of bugs in the program is almost infinite, so the probability of finding a new bug of some category in some subsystem does not reduce after finding any number of bugs of that category in that subsystem.
Find an average time (in days of Ivan’s work) required to name the program disgusting.

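The statement is long and reading it gave me a headache (tears of someone who is bad at English).
Roughly: there are n categories of bugs and s subsystems, and Ivan needs at least one bug of every category and at least one bug in every subsystem.
Let dp[i][j] be the expected number of days still needed to reach the goal, given that bugs of i categories have been found so far and that they lie in j subsystems.
Clearly dp[n][s] = 0, because the goal is already reached, and dp[0][0] is the answer we want.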

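According to the statement, four states can follow each dp[i][j]: the new bug falls in an already seen category and an already seen subsystem (stay at dp[i][j]), in a new category only (dp[i+1][j]), in a new subsystem only (dp[i][j+1]), or in both a new category and a new subsystem (dp[i+1][j+1]).
That gives a recurrence; the details are similar to the previous problem. Be careful when simplifying; I'm definitely not going to tell you that I got WA three times from botching it.
After simplification the formula is
dp[i][j] = ((n-i) * j * dp[i+1][j] + i * (s-j) * dp[i][j+1] + (n-i) * (s-j) * dp[i+1][j+1] + n * s) / (n * s - i * j)
Then evaluate this formula from back to front to get the answer.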
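For reference, the simplification can be written out as follows. This is a sketch that assumes the category and the subsystem of each day's bug are chosen independently and uniformly, so an already seen category has probability i/n and an already seen subsystem has probability j/s.

```latex
\begin{align*}
dp[i][j] &= 1
  + \frac{ij}{ns}\,dp[i][j]
  + \frac{(n-i)\,j}{ns}\,dp[i+1][j]
  + \frac{i\,(s-j)}{ns}\,dp[i][j+1]
  + \frac{(n-i)(s-j)}{ns}\,dp[i+1][j+1] \\
(ns - ij)\,dp[i][j] &= ns
  + (n-i)\,j\,dp[i+1][j]
  + i\,(s-j)\,dp[i][j+1]
  + (n-i)(s-j)\,dp[i+1][j+1] \\
dp[i][j] &= \frac{(n-i)\,j\,dp[i+1][j] + i\,(s-j)\,dp[i][j+1] + (n-i)(s-j)\,dp[i+1][j+1] + ns}{ns - ij}
\end{align*}
```

The four probabilities add up to 1, since ij + (n-i)j + i(s-j) + (n-i)(s-j) = ns.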
Here is the code:

#include <cstdio>
#include <cstring>
using namespace std;

const int maxn = 1009;

double dp[maxn][maxn];

int main()
{
	int n,s;
	scanf("%d%d",&n,&s);
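	memset(dp,0,sizeof dp);// border entries dp[n+1][*] and dp[*][s+1] stay 0 and are only ever multiplied by 0
	dp[n][s] = 0;// goal state: nothing left to find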
	for(int i = n;i >= 0;i--)
	{
		for(int j = s;j >= 0;j--)
		{
			if(i == n && j == s) continue;
			dp[i][j]=(((n-i)*j*dp[i+1][j] + i*(s-j)*dp[i][j+1] + (n-i)*(s-j)*dp[i+1][j+1] + n*s)*(1.0)) / ((n*s-i*j)*(1.0));
		}
	}
	printf("%.4f\n",dp[0][0]);
	return 0;
}

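That is my understanding of expectation DP and probability DP after working through these problems. In summary, both are all about deriving the formula: analyze the state and the states that can follow it, then derive the recurrence from the relevant probability and expectation formulas owo.
That's the end of this post qwq.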
