Coursera Probabilistic Graphical Models: Analysis of the Week 2 Programming Assignment

Bayes Nets for Genetic Inheritance

 

1. Building the Bayesian network for genetic inheritance

This part asks us to build the Bayesian network shown below:

 

[Figure 1: Bayesian network for genetic inheritance]

 

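In the figure, variables 1, 2 and 3 are the genotypes of the two parents and their child, and variables 4, 5 and 6 are the corresponding phenotypes. A genotype is in turn determined by alleles. The three dashed boxes mark the three basic template factors that make up the whole network; these correspond to questions 2-4 of this programming assignment. The complete Bayesian network is then built by combining these template factors.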

 

phenotypeGivenGenotypeMendelianFactor.m Mendelian factor for phenotype given genotype

Input:

isDominant = 1;

genotypeVar = 1;

phenotypeVar = 3;

 

Expected output:

phenotypeFactor = struct('var', [3,1], 'card', [2,3], 'val', [1,0,1,0,0,1]);

 

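The fields of the factor phenotypeFactor have the following structure and meaning:

var: the variables in the factor's scope and their order. Here ('var', [3,1]) means the factor is the conditional distribution of the phenotype (phenotypeVar = 3) given the genotype (genotypeVar = 1).

card: short for cardinalities, the number of states of each variable. Here the phenotype has 2 states (trait present or absent) and the genotype has 3 (AA, Aa, aa).

val: the value (here, a probability) for each joint assignment of the variables, with the first variable in var changing fastest. This question follows Mendelian inheritance, and isDominant = 1 means the allele is dominant, so genotypes AA and Aa show the trait while aa does not.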
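To make the ordering of val concrete, the following sketch enumerates the assignments for card = [2, 3]. It uses MATLAB's built-in ind2sub as a stand-in for the assignment's IndexToAssignment helper, on the assumption that both let the first variable change fastest; the state labels are only for illustration.

```matlab
card = [2, 3];                 % [phenotype states, genotype states]
val  = [1, 0, 1, 0, 0, 1];     % expected output for isDominant = 1

% Enumerate all joint assignments; the first variable changes fastest.
[phenotype, genotype] = ind2sub(card, 1 : prod(card));
assignments = [phenotype(:), genotype(:)];

phenotypeNames = {'trait', 'no trait'};
genotypeNames  = {'AA', 'Aa', 'aa'};
for ii = 1 : prod(card)
    fprintf('val(%d) = P(%s | %s) = %g\n', ii, ...
        phenotypeNames{assignments(ii, 1)}, ...
        genotypeNames{assignments(ii, 2)}, val(ii));
end
```

Reading the printed lines in pairs shows one conditional distribution per genotype, which is a handy way to check any factor whose first variable is the child.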

 

Reference code:

 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%INSERT YOUR CODE HERE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
phenotypeFactor.var = [phenotypeVar, genotypeVar];
% phenotypeFactor.card(1) : number of possible phenotypes
% phenotypeFactor.card(2) : number of genotypes
phenotypeFactor.card = [2, 3];

assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);

% Phenotype: trait = 1, no trait = 2 | Genotype: AA = 1, Aa = 2, aa = 3
% Dominant case: AA and Aa show the trait, aa does not.
Index = [find(assignments(:, 1) == 1 & assignments(:, 2) == 1);
         find(assignments(:, 1) == 1 & assignments(:, 2) == 2);
         find(assignments(:, 1) == 2 & assignments(:, 2) == 3)]';

phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));
phenotypeFactor.val(Index) = 1;
if isDominant == 0
    % Recessive case: flipping every entry swaps trait / no trait for each
    % genotype. double() keeps val numeric instead of logical.
    phenotypeFactor.val = double(~phenotypeFactor.val);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 

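phenotypeGivenGenotypeFactor.m Factor for phenotype given genotype

Many genotype-phenotype relationships do not strictly follow Mendel's laws; the genotype only affects the probability of showing a trait. This section models that more realistic case on top of the Mendelian one: alphaList(i) is the probability of showing the trait given genotype i.

Input: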

alphaList = [0.8; 0.6; 0.1];

genotypeVar = 1;

phenotypeVar = 3;

 

Expected output:

phenotypeFactorAlpha = struct('var', [3,1], 'card', [2,3], 'val', [0.8,0.2,0.6,0.4,0.1,0.9]);
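Because the phenotype is the first variable in the scope, val simply interleaves alphaList(i) and 1 - alphaList(i). A minimal vectorized sketch (an alternative formulation, not the assignment's reference solution) that reproduces the expected output:

```matlab
alphaList = [0.8; 0.6; 0.1];          % P(trait | genotype i)

% Row 1: P(trait | genotype), row 2: P(no trait | genotype).
cpd = [alphaList(:)'; 1 - alphaList(:)'];

% Column-major flattening makes the phenotype index change fastest.
val = reshape(cpd, 1, []);
disp(val);

% Each column is a distribution over the two phenotypes, so it sums to 1.
columnSums = sum(cpd, 1);
disp(columnSums);
```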

 

Reference code:

 

 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%INSERT YOUR CODE HERE
% The number of genotypes is the length of alphaList.
% The number of phenotypes is 2.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
phenotypeFactor.var = [phenotypeVar, genotypeVar];
% phenotypeFactor.card(1) : number of possible phenotypes
% phenotypeFactor.card(2) : number of genotypes
phenotypeFactor.card = [2, length(alphaList)];

assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);

phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));
for ii = 1 : length(alphaList)
    phenotypeFactor.val(assignments(:, 1) == 1 & assignments(:, 2) == ii) = alphaList(ii);
    phenotypeFactor.val(assignments(:, 1) == 2 & assignments(:, 2) == ii) = 1 - alphaList(ii);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 

 

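genotypeGivenAlleleFreqsFactor.m Factor for genotype given allele frequencies

This section asks for the probability distribution over genotypes given the allele frequencies. The one thing to watch is that genotypes Aa and aA are the same, so a heterozygous genotype gets probability 2 * P(A) * P(a).

Input: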

alleleFreqs = [0.1; 0.9];

genotypeVar = 1;

 

Expected output:

genotypeFactor = struct('var', [1], 'card', [3], 'val', [0.01,0.18,0.81]);
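These values follow from drawing the two alleles independently: with frequencies 0.1 and 0.9, P(AA) = 0.01, P(Aa) = 2 * 0.1 * 0.9 = 0.18 and P(aa) = 0.81. A small sketch of that calculation, which folds the ordered pairs (a, b) and (b, a) into one genotype; the genotype order AA, Aa, aa is an assumption chosen to match the expected output:

```matlab
alleleFreqs = [0.1; 0.9];
numAlleles = numel(alleleFreqs);

% orderedPairs(a, b) = P(first copy is allele a, second copy is allele b).
orderedPairs = alleleFreqs * alleleFreqs';

val = zeros(1, numAlleles * (numAlleles + 1) / 2);
g = 0;
for a = 1 : numAlleles
    for b = a : numAlleles
        g = g + 1;
        if a == b
            val(g) = orderedPairs(a, a);                      % homozygous
        else
            val(g) = orderedPairs(a, b) + orderedPairs(b, a); % heterozygous
        end
    end
end
disp(val);
```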

 

Reference code:

 

 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%INSERT YOUR CODE HERE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
genotypeFactor.var = genotypeVar;
% One state per genotype, i.e. per row of genotypesToAlleles.
genotypeFactor.card = size(genotypesToAlleles, 1);

genotypeFactor.val = zeros(1, prod(genotypeFactor.card));

for ii = 1 : genotypeFactor.card
    if genotypesToAlleles(ii, 1) == genotypesToAlleles(ii, 2)
        % Homozygous: both copies are the same allele.
        genotypeFactor.val(ii) = alleleFreqs(genotypesToAlleles(ii, 1)) ^ 2;
    else
        % Heterozygous: the two orderings (Aa and aA) are the same genotype.
        genotypeFactor.val(ii) = 2 * alleleFreqs(genotypesToAlleles(ii, 1)) * ...
                                  alleleFreqs(genotypesToAlleles(ii, 2));
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 

 

 

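genotypeGivenParentsGenotypesFactor.m Factor for child genotype given the parents' genotypes

Here var has three entries: [child's genotype variable, parent one's genotype variable, parent two's genotype variable].

Input: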

numAlleles = 2;

genotypeVarChild = 3;

genotypeVarParentOne = 1;

genotypeVarParentTwo = 2;

 

Expected output:

genotypeFactorPar = struct('var', [3,1,2], 'card', [3,3,3], 'val', [1,0,0,0.5,0.5,0,0,1,0,0.5,0.5,0,0.25,0.5,0.25,0,0.5,0.5,0,1,0,0,0.5,0.5,0,0,1]);

 

Of course, numAlleles may also be 3, 4, 5 and so on; making good use of the generateAlleleGenotypeMappers function provided with the assignment takes care of that.
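A quick way to sanity-check a factor like this is to reshape val so that each column holds the child's distribution for one pair of parental genotypes. The sketch below does this for the expected output above; the variable names are only illustrative.

```matlab
val = [1,0,0, 0.5,0.5,0, 0,1,0, 0.5,0.5,0, 0.25,0.5,0.25, ...
       0,0.5,0.5, 0,1,0, 0,0.5,0.5, 0,0,1];
numGenotypes = 3;

% Rows: child genotype. Columns: (parent one, parent two) pairs,
% with parent one changing fastest.
cpd = reshape(val, numGenotypes, numGenotypes ^ 2);

% Every column must be a probability distribution.
columnSums = sum(cpd, 1);

% Both parents Aa (genotype 2): column index = 2 + (2 - 1) * numGenotypes.
bothHetero = cpd(:, 2 + (2 - 1) * numGenotypes)';
disp(bothHetero);   % the classic 1:2:1 Mendelian ratio
```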

 

Reference code:

 

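%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%INSERT YOUR CODE HERE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% A straightforward sketch; it assumes genotypesToAlleles was produced by
% generateAlleleGenotypeMappers(numAlleles) earlier in the file.
genotypeFactor.var = [genotypeVarChild, genotypeVarParentOne, genotypeVarParentTwo];
numGenotypes = size(genotypesToAlleles, 1);
genotypeFactor.card = [numGenotypes, numGenotypes, numGenotypes];
genotypeFactor.val = zeros(1, prod(genotypeFactor.card));

assignments = IndexToAssignment(1 : prod(genotypeFactor.card), genotypeFactor.card);

for ii = 1 : size(assignments, 1)
    childAlleles = sort(genotypesToAlleles(assignments(ii, 1), :));
    parentOneAlleles = genotypesToAlleles(assignments(ii, 2), :);
    parentTwoAlleles = genotypesToAlleles(assignments(ii, 3), :);
    % Each parent passes on one of its two alleles with probability 1/2,
    % so each of the four (a, b) combinations has probability 1/4.
    for a = parentOneAlleles
        for b = parentTwoAlleles
            if isequal(sort([a, b]), childAlleles)
                genotypeFactor.val(ii) = genotypeFactor.val(ii) + 0.25;
            end
        end
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%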

 

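constructGeneticNetwork.m Building the genetic Bayesian network

Comparing with the network diagram at the start, the three template factors that make up the network are now complete. What remains is to assemble them.

Input: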

pedigree = struct('parents', [0,0;1,3;0,0]);

pedigree.names = {'Ira','James','Robin'};

alleleFreqs = [0.1; 0.9];

alphaList = [0.8; 0.6; 0.1];

 

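Expected output:

sampleFactorList = load('sampleFactorList.mat'); % The output is too long to list; just run the program. The input structure is what matters here.

Here pedigree describes the relationships between the people: row i of parents lists the two parents of person i ([0,0] if that person has no parents in the pedigree), and names gives their names. For everything else, refer to the exercises above.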
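To illustrate how the pedigree is read, this sketch walks over the sample input and reports which template factor each person's genotype would use. The isFounder and description variables are illustrative names, not part of the assignment code.

```matlab
pedigree = struct('parents', [0,0; 1,3; 0,0]);
pedigree.names = {'Ira', 'James', 'Robin'};
numPeople = numel(pedigree.names);

% A person whose parents row is all zeros is a founder.
isFounder = all(pedigree.parents == 0, 2);

description = cell(1, numPeople);
for ii = 1 : numPeople
    if isFounder(ii)
        description{ii} = sprintf('%s: founder, genotype from allele frequencies', ...
            pedigree.names{ii});
    else
        description{ii} = sprintf('%s: child of %s and %s', pedigree.names{ii}, ...
            pedigree.names{pedigree.parents(ii, 1)}, ...
            pedigree.names{pedigree.parents(ii, 2)});
    end
    disp(description{ii});
end
```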

 

Reference code:

  

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%INSERT YOUR CODE HERE
% Variable numbers:
% 1 - numPeople: genotype variables
% numPeople+1 - 2*numPeople: phenotype variables
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for ii = 1 : numPeople
    if sum(pedigree.parents(ii, :)) == 0
        factorList(ii) = genotypeGivenAlleleFreqsFactor(alleleFreqs, ii);
    else
        factorList(ii) = genotypeGivenParentsGenotypesFactor(numAlleles, ii, pedigree.parents(ii, 1), pedigree.parents(ii, 2));
    end
end
for ii = numPeople + 1 : 2 * numPeople
    factorList(ii) = phenotypeGivenGenotypeFactor(alphaList, ii - numPeople, ii);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 

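2. Building the decoupled Bayesian network

The network built above does not model the chromosome pair explicitly. Here, taking into account that genes come in pairs, we build a new Bayesian network, shown below: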

 

[Figure 2: decoupled Bayesian network with two gene copies per person]

 

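As the figure shows, once gene pairs are introduced, each person's genotype is represented by two gene-copy variables, one inherited from each parent, and the phenotype depends on both. This is presumably why the assignment calls it a "decoupled" network: the single genotype variable is split into two simpler variables, and new template factors model the relationships between them.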
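One practical effect of splitting the genotype into two copies is smaller tables. A rough size comparison for the child factors, assuming numAlleles = 3 (the counts follow from the cardinalities alone, not from anything specific to the assignment code):

```matlab
numAlleles = 3;
numGenotypes = numAlleles * (numAlleles + 1) / 2;   % unordered allele pairs

% Coupled network: child genotype given the two parental genotypes.
coupledEntries = numGenotypes ^ 3;

% Decoupled network: one child copy given the two copies of a single parent.
decoupledEntries = numAlleles ^ 3;

fprintf('coupled: %d entries, decoupled: %d entries per child copy\n', ...
    coupledEntries, decoupledEntries);
```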

 

phenotypeGivenCopiesFactor.m Factor for phenotype given the two gene copies

Input:

alphaListThree = [0.8; 0.6; 0.1; 0.5; 0.05; 0.01];

numAllelesThree = 3;

genotypeVarMotherCopy = 1;

genotypeVarFatherCopy = 2;

phenotypeVar = 3;

 

Expected output:

phenotypeFactorPar = struct('var', [3,1,2], 'card', [2,3,3], 'val', [0.8,0.2,0.6,0.4,0.1,0.9,0.6,0.4,0.5,0.5,0.05,0.95,0.1,0.9,0.05,0.95,0.01,0.99]);

 

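Note that the length of alphaList equals the number of genotypes, i.e. the number of unordered pairs of alleles, which is the number of rows of genotypesToAlleles (6 for numAlleles = 3).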
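To see where that count comes from, this sketch enumerates the unordered allele pairs for numAlleles = 3. It is a hypothetical stand-in for generateAlleleGenotypeMappers, assumed to use the ordering implied by the expected output above (pairs in ascending order):

```matlab
numAlleles = 3;
numGenotypes = numAlleles * (numAlleles + 1) / 2;

genotypesToAlleles = zeros(numGenotypes, 2);         % one row per genotype
allelesToGenotypes = zeros(numAlleles, numAlleles);  % symmetric lookup table
g = 0;
for a = 1 : numAlleles
    for b = a : numAlleles
        g = g + 1;
        genotypesToAlleles(g, :) = [a, b];
        allelesToGenotypes(a, b) = g;
        allelesToGenotypes(b, a) = g;   % (a, b) and (b, a) are the same genotype
    end
end
disp(genotypesToAlleles);
disp(allelesToGenotypes);
```

With this ordering, alphaList(2) belongs to the allele pair (1, 2) regardless of which parent contributed which copy, which is why the two mixed-order entries in the expected output share the value 0.6.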

 

Reference code:

 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%INSERT YOUR CODE HERE
% The number of phenotypes is 2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
phenotypeFactor.var = [phenotypeVar, geneCopyVarOne, geneCopyVarTwo];
phenotypeFactor.card = [2, numAlleles, numAlleles];

assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);

phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));

% Sort each allele pair so that (a, b) and (b, a) map to the same genotype.
% This assumes each row of genotypesToAlleles is in ascending order.
allelesCombination = sort(assignments(:, [2, 3]), 2);
for ii = 1 : size(genotypesToAlleles, 1)
    % Rows whose allele pair equals genotype ii. The comparison relies on
    % implicit expansion (MATLAB R2016b+ or Octave broadcasting).
    isGenotype = all(allelesCombination == genotypesToAlleles(ii, :), 2);
    phenotypeFactor.val(isGenotype & assignments(:, 1) == 1) = alphaList(ii);
    phenotypeFactor.val(isGenotype & assignments(:, 1) == 2) = 1 - alphaList(ii);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 

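constructDecoupledGeneticNetwork.m Building the decoupled genetic network

After that, we only need to replace the phenotype-given-genotype factor with the phenotype-given-copies factor and adjust the variable numbering of the earlier network accordingly.

Input: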

pedigree = struct('parents', [0,0;1,3;0,0]);

pedigree.names = {'Ira','James','Robin'};

alleleFreqsThree = [0.1; 0.7; 0.2];

alleleListThree = {'F', 'f', 'n'};

alphaListThree = [0.8; 0.6; 0.1; 0.5; 0.05; 0.01];

 

Expected output:

sampleFactorListDecoupled = load('sampleFactorListDecoupled.mat'); % This output is also too long to list.
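The main bookkeeping in the decoupled network is the variable numbering. A small sketch of the layout for the three-person pedigree above, assuming the convention that copy one is inherited from parent one and copy two from parent two:

```matlab
names = {'Ira', 'James', 'Robin'};
numPeople = numel(names);

copyOneVar   = 1 : numPeople;                   % copy inherited from parent one
copyTwoVar   = numPeople + (1 : numPeople);     % copy inherited from parent two
phenotypeVar = 2 * numPeople + (1 : numPeople); % phenotype variables

for ii = 1 : numPeople
    fprintf('%-6s copy one = %d, copy two = %d, phenotype = %d\n', ...
        names{ii}, copyOneVar(ii), copyTwoVar(ii), phenotypeVar(ii));
end
```

For James (person 2) this gives variables 2, 5 and 8, so his phenotype factor has scope [8, 2, 5].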

 

Reference code:

 

 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INSERT YOUR CODE HERE
% Variable numbers:
% 1 - numPeople: first parent copy of gene variables
% numPeople+1 - 2*numPeople: second parent copy of gene variables
% 2*numPeople+1 - 3*numPeople: phenotype variables
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Copy inherited from parent one: depends on both copies of parent one.
for ii = 1 : numPeople
    if sum(pedigree.parents(ii, :)) == 0
        factorList(ii) = childCopyGivenFreqsFactor(alleleFreqs, ii);
    else
        factorList(ii) = childCopyGivenParentalsFactor(numAlleles, ii, pedigree.parents(ii, 1), pedigree.parents(ii, 1) + numPeople);
    end
end
% Copy inherited from parent two: depends on both copies of parent two.
for ii = numPeople + 1 : 2 * numPeople
    if sum(pedigree.parents(ii - numPeople, :)) == 0
        factorList(ii) = childCopyGivenFreqsFactor(alleleFreqs, ii);
    else
        factorList(ii) = childCopyGivenParentalsFactor(numAlleles, ii, pedigree.parents(ii - numPeople, 2), pedigree.parents(ii - numPeople, 2) + numPeople);
    end
end
% Phenotype: depends on the person's own two copies.
for ii = 2 * numPeople + 1 : 3 * numPeople
    factorList(ii) = phenotypeGivenCopiesFactor(alphaList, numAlleles, ii - 2 * numPeople, ii - numPeople, ii);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 

 

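Happily, all that is left now is the sigmoid Bayesian network.

3. Building a Bayesian network where the phenotype is determined by multiple genes

Anyone who has taken a machine learning course will be familiar with the sigmoid, so I won't go into detail here. The principle is the same as logistic regression (or a single sigmoid unit): the probability of showing the trait is the sigmoid of the summed weights of the alleles present, so it should not be hard to follow.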

 

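[Figure 3: sigmoid network for a phenotype determined by multiple genes]

[Figure 4: logistic regression model, from Hsuan-Tien Lin's course slides]

The second figure above is from the slides of Hsuan-Tien Lin's Machine Learning Foundations course; the model is exactly the same.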

 

constructSigmoidPhenotypeFactor.m Building the sigmoid phenotype factor

Input:

alleleWeights = {[3, -3], [0.9, -0.8]};

geneCopyVarParentOneList = [1; 2];

geneCopyVarParentTwoList = [4; 5];

phenotypeVar = 3;

 

Expected output:

phenotypeFactorSigmoid = struct('var', [3,1,2,4,5], 'card', [2,2,2,2,2], 'val', [0.999590432835014,0.000409567164986080,0.858148935099512,0.141851064900488,0.997762151478724,0.00223784852127629,0.524979187478940,0.475020812521060,0.858148935099512,0.141851064900488,0.0147740316932731,0.985225968306727,0.524979187478940,0.475020812521060,0.00273196076301106,0.997268039236989,0.997762151478724,0.00223784852127629,0.524979187478940,0.475020812521060,0.987871565015726,0.0121284349842742,0.167981614866076,0.832018385133925,0.524979187478940,0.475020812521060,0.00273196076301106,0.997268039236989,0.167981614866076,0.832018385133925,0.000500201107079564,0.999499798892920]);

 

Just pay attention to the order of the gene-copy variables. (I certainly won't admit that I got the order wrong at first...)
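As a check on the ordering, the first and third entries of val can be reproduced by hand. The sketch assumes that computeSigmoid is the standard logistic function 1 / (1 + exp(-z)) and defines its own stand-in for it:

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));      % stand-in for computeSigmoid
alleleWeights = {[3, -3], [0.9, -0.8]};

% Scope order is [3, 1, 2, 4, 5]: phenotype, then the gene 1 and gene 2
% copies from parent one, then the gene 1 and gene 2 copies from parent two.

% Entry 1: every copy carries allele 1.
z1 = alleleWeights{1}(1) + alleleWeights{2}(1) + ...
     alleleWeights{1}(1) + alleleWeights{2}(1);
p1 = sigmoid(z1);

% Entry 3: variable 1 (gene 1, parent one) switches to allele 2.
z3 = alleleWeights{1}(2) + alleleWeights{2}(1) + ...
     alleleWeights{1}(1) + alleleWeights{2}(1);
p3 = sigmoid(z3);

fprintf('z1 = %.1f, p1 = %.6f\nz3 = %.1f, p3 = %.6f\n', z1, p1, z3, p3);
```

Entries 2 and 4 are the complements 1 - p1 and 1 - p3, since the phenotype index changes fastest.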

 

Reference code:

 

 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INSERT YOUR CODE HERE
% Note that computeSigmoid.m will be useful for this function.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
phenotypeFactor.var = [phenotypeVar, geneCopyVarOneList(:)', geneCopyVarTwoList(:)'];
% Gene k has length(alleleWeights{k}) alleles; this holds for both of its
% copies, so the per-gene cardinalities are repeated for the two lists.
numAllelesPerGene = cellfun(@length, alleleWeights(:))';
phenotypeFactor.card = [2, numAllelesPerGene, numAllelesPerGene];

phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));
assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);

numGenes = length(geneCopyVarOneList);
for ii = 1 : prod(phenotypeFactor.card)
    % Sum the weights of the alleles present in every gene copy.
    sumWeights = 0;
    for k1 = 1 : numGenes
        sumWeights = sumWeights + alleleWeights{k1}(assignments(ii, k1 + 1));
    end
    for k2 = 1 : length(geneCopyVarTwoList)
        sumWeights = sumWeights + alleleWeights{k2}(assignments(ii, k2 + numGenes + 1));
    end
    if assignments(ii, 1) == 1
        phenotypeFactor.val(ii) = computeSigmoid(sumWeights);      % trait present
    else
        phenotypeFactor.val(ii) = 1 - computeSigmoid(sumWeights);  % trait absent
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 

 

Finally, here is a screenshot of my score. Ha ha... nothing to brag about, really (0.0'). Onward!

[Figure 5: screenshot of the assignment score]

Reposted from: https://www.cnblogs.com/polaris-shi/p/9499424.html
