Hu Benben's Notes on Master's Projects, Project 1: A Hierarchical Structured Reasoning Scenario

Project 1: a hierarchical structured reasoning scenario:

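"Were it to benefit my country, I would lay down my life; how could I shun it for fear of personal misfortune?" This algorithm applies to all hierarchical-structure reasoning. I stress: all of it! What follows is just one example.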


Requirements analysis and dataset construction

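In the earth-science projects I took part in, the biggest headache was not building the algorithm, nor the requirements analysis (I say that against my conscience; usually the clients do not know what their requirements are either). The biggest headache was having no dataset. Can deep learning do the job nowadays? Of course. Is it easy? No. Why? Because there is no dataset. So getting hold of a dataset and making it usable is the first principle of any project. Others will only give you what they think might be useful, not what you actually find useful, so treasure what you get.
Back to the point. The requirement was: given the fluvial facies hierarchy and an arbitrary number of fluvial facies names as input, return the most probable chain from the sub-parent nodes up to the top parent node. The level of every node is fixed, so the question is how to find the most probable node level by level. There are three cases. In the first, every name is an existing node and all of them lie on one parent-child chain, which is easy. In the second, some names sit on sibling nodes, or on siblings of the most probable parent node. In the third, the input contains noise names that do not exist among the nodes at all.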
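To make the three cases concrete, here is a minimal sketch that labels an input list as one of them. The tiny `CHILDREN` map, the helper names and the labels `chain`, `siblings` and `noise` are my own assumptions for illustration; the project itself does not classify inputs this way.

```python
# Hypothetical toy hierarchy: parent -> direct children (a small subset of the tree).
CHILDREN = {
    "Fluvial Facies": ["Meandering river", "Braided river"],
    "Meandering river": ["River channel", "Point bar"],
    "Braided river": ["Channel bar", "Diara"],
}

# Invert to child -> parent so that every node can be walked up to the root.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def ancestors(name):
    """Return the chain from `name` up to the root, starting with `name` itself."""
    chain = [name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def classify(names):
    """Label an input list as 'chain', 'siblings' or 'noise'."""
    known = set(PARENT) | set(CHILDREN)
    if not names or any(n not in known for n in names):
        return "noise"      # case 3: at least one name is not a node
    # Case 1: the ancestor chain of the deepest name contains every other name.
    deepest = max(names, key=lambda n: len(ancestors(n)))
    if set(names) <= set(ancestors(deepest)):
        return "chain"
    return "siblings"       # case 2: the names spread over sibling branches


print(classify(["Point bar", "Meandering river"]))
print(classify(["Point bar", "River channel"]))
print(classify(["Point bar", "Granite"]))
```

Only the first case can be answered by walking one chain; the other two are why a scoring step is needed.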
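The project required the dataset to be stored in a knowledge graph (the next chapter, on knowledge graph construction and a "domestic Neo4j", covers this with a detailed case). For now, we can simply think of the knowledge graph as a database such as MySQL. The finished graph looks like this:

[Image 1: the constructed knowledge graph]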

Analysis of the reasoning idea:

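At this point many ideas come to mind, such as decision trees or hash trees. When I received this small project, the strategy I thought of was based on proportions. Put simply, a name that exists under a parent is counted and a name that does not is simply not counted, so if an input is wrong, or a node was built incorrectly, the overall proportions at query time barely change. That is why I build on proportions. The next step is a clustering algorithm. Taken literally, clustering means that things that get along are grouped together and things that do not can get lost. So when a batch of names comes in, we do not rush to derive every path one by one. We first look at which parents have these names among their child nodes, extract those parents, and record each one's share. Then we take the extracted parents and compute their share under their own parents. Working upward layer by layer, we must end at the single top-level parent node. This makes the project much simpler, and we always know what to do next.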
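The layer-by-layer share idea can be sketched in a few lines. This is only an illustration under my own assumptions: the small `CHILDREN` map is a hand-picked subset of the tree, and `share_by_parent` and `roll_up` are hypothetical helper names, not part of the project code.

```python
# Hypothetical subset of the hierarchy: parent -> direct children.
CHILDREN = {
    "Fluvial Facies": ["Meandering river", "Braided river"],
    "Meandering river": ["River channel", "Point bar"],
    "Braided river": ["Channel bar", "Diara"],
    "River channel": ["Gravel", "Plant trunk"],
    "Point bar": ["Sandstone", "Foreset bed"],
    "Channel bar": ["Gravel", "Coarse sand"],
}


def share_by_parent(names):
    """For each parent, the share of the input names found among its direct children."""
    wanted = set(names)
    shares = {}
    for parent, kids in CHILDREN.items():
        hits = wanted & set(kids)
        if hits:
            shares[parent] = len(hits) / len(wanted)
    return shares


def roll_up(names):
    """Repeat the share computation one level at a time until no parent is left."""
    levels = []
    current = list(names)
    while True:
        shares = share_by_parent(current)
        if not shares:
            break
        levels.append(shares)
        current = list(shares)  # the matched parents become the next level's input
    return levels


# "Nonsense" is a noise name: it lowers every share equally, so the ranking is unchanged.
for level in roll_up(["Gravel", "Plant trunk", "Nonsense"]):
    print(level)
```

In this toy run River channel holds two of the three input names and Channel bar one, and the last level contains only Fluvial Facies, which is the "single top-level parent" the paragraph above describes.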
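We therefore convert the graph into layers of parent-child relations. As a light example, here is a simple listing of three to four levels.

The comment 0 marks the top-level parent node and its branches, and the numbers that follow mark each successive level below it.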
# 0
table.add_document("Fluvial Facies", ["Meandering river","Braided river","Straight river","Anastomosed river"])
# 1
table.add_document("Meandering river", ["River channel", "Point bar", "Erosion ditch", "Natural levee", "Embankment","Flood fan","Flood plain","Meander river","Oxbow lake","Coarse clastic material",
                                        "Coarse gravelly material","Plant trunk","Lenticle","Gravel","Sand","Conglomerate","Lmbricate structure","Crosslamination",
                                        "Arcose", "Lithic sandstone","Sand body","Scour surface","Large and medium water bedding","Small sand lamination","Large scale lateral accretion structure","Conglomerate",
                                        "Coarse grained sand","Medium fine sand","Fine grained sand","Fine sand or coarse silt","Large trough cross bedding","Parallel bedding","Clintheriform bedding","Sand grain bedding",
                                        "Gravelly coarse sandstone","Medium grained sandstone","Thin layers of siltstone","Sandstone","Siltstone","Small clintheriform progradation cross bedding",
                                        "Gravelly sandstone","Gravel","Large scale progradational cross bedding","Conglomeratic sandstone","Progradational cross bedding",
                                        "Mudstone lenticle","Medium-coarse grained sandstone","Large clintheriform type foreset series","Foreset bed","Silty mudstone",
                                        "Lenticular","Sand","Parallel bedding","Thin layers of forebedding cross bedding",
                                        "Small trough cross-bedding","Silty sand","Small sand laminae","Small trough layer","Large clintheriform cross bedding",
                                        "Large trough bedding","Fine point bar","Rough point bar","Erosion ditch sand bar", "Siltation and deposition in erosion ditch","Coarse to medium debris",
                                        "Lenticle","Argillaceous","Retained sediment","Caliche nodule", "Breccia structure","Small fold and fault","Suspended load", "Siltstone","Mudstone",
                                        "A thin bedded interbed of silty and argillaceous rocks","The level of laminated","Dry crack","Plant roots","Worm boring","Fine sandstone", "Siltstone","Medium cross-bedding",
                                        "Small cross-bedding","Scour structure","Filling structure","Graded bedding","Fine silt and clay", "Caliche nodule","Horizontal bedding","Small sand grain bedding",
                                        "Overlying sand grain bedding","Mud crack","Wormtrail","Back swamp","Peat layer","Freshwater lake life","River flood lake",
                                        "Siltstone","Argillaceous rock","Horizontal lamina","Intermittent sand grain bedding","Caliche nodule","Iron nodules","Erosion ditch cutoff", "Neck cutoff","Sand grain bedding","Mud","Silt","Siltstone", "Mudstone"])
# 1
table.add_document("Braided river", ["Channel bar", "Diara","Lenticular","Bottom scour structure","Trough structure","Impact pit structure","Large trough cross bedding","Large wedge cross bedding",
                                     "Tabular cross bedding","Scour surface","Retrograde sand grain bedding","Longitudinal bar","Transverse bar","Diagonal bar","Imbricate structure", "Coarse crumb material",
                                     "Grain structure","Gravel","Massive","Coarse sand","Reacting surface structure","Sand","Interlayers","Sandy thallus","Gravel sheet","Gravel layer","Sand body","Sand ripples",
                                     "Massive structure","Parallel Bedding"])
# 1
table.add_document("Straight river", ["Scour pit", "Shoal","deep groove"])
# 1 4
table.add_document("Anastomosed river", ["Channel subfacies","Diara subfacies","Flood plain subfacies"])
# 2
table.add_document("River channel", ["Coarse clastic material", "Coarse gravelly material","Plant trunk","Lenticle","Gravel","Sand","Conglomerate","Lmbricate structure","Crosslamination","Lag conglomerate"])
# 2
table.add_document("Point bar", ["Arcose", "Lithic sandstone","Sand body","Scour surface","Large and medium water bedding","Small sand lamination","Large scale lateral accretion structure","Conglomerate",
                                 "Coarse grained sand","Medium fine sand","Fine grained sand","Fine sand or coarse silt","Large trough cross bedding","Parallel bedding","Clintheriform bedding","Sand grain bedding",
                                 "Gravelly coarse sandstone","Medium grained sandstone","Thin layers of siltstone","Sandstone","Siltstone","Small clintheriform progradation cross bedding",
                                 "Gravelly sandstone","Gravel","Large scale progradational cross bedding","Conglomeratic sandstone","Progradational cross bedding",
                                 "Mudstone lenticle","Medium-coarse grained sandstone","Large clintheriform type foreset series","Foreset bed","Silty mudstone",
                                 "Lenticular","Sand","Parallel to the bedding","Thin layers of forebedding cross bedding",
                                 "Small trough cross-bedding","Silty sand","Small sand laminae","Small trough layer","Large clintheriform cross bedding",
                                 "Large trough bedding","Fine point bar","Rough point bar","Lag conglomerate"])
# 2
table.add_document("Erosion ditch", ["Erosion ditch sand bar", "Siltation and deposition in erosion ditch","Coarse to medium debris","Lenticle","Argillaceous","Retained sediment"])
# 2
table.add_document("Natural levee", ["Caliche nodule", "Breccia structure","Small fold and fault"])
# 2
table.add_document("Embankment", ["Suspended load", "Siltstone","Mudstone","A thin bedded interbed of silty and argillaceous rocks","The level of laminated","Dry crack","Plant roots","Worm boring"])
# 2
table.add_document("Flood fan", ["Fine sandstone", "Siltstone","Medium cross-bedding","Small cross-bedding","Scour structure","Filling structure","Graded bedding"])
# 2
table.add_document("Flood plain", ["Fine silt and clay", "Caliche nodule","Horizontal bedding","Small sand grain bedding","Overlying sand grain bedding","Mud crack","Wormtrail",
                                   "Back swamp","Peat layer","Freshwater lake life","River flood lake","Siltstone","Argillaceous rock","Horizontal lamina","Intermittent sand grain bedding",
                                   "Caliche nodule","Iron nodules"])
# 2
table.add_document("Meander river", ["Erosion ditch cutoff", "Neck cutoff","Sand grain bedding","Mud","Silt"])
# 2
table.add_document("Oxbow lake", ["Siltstone", "Mudstone"])
# 2 (Diara)
table.add_document("Diara", ["Sand body", "Lenticular","Bottom scour structure","Trough structure","Impact pit structure","Large trough cross bedding","Large wedge cross bedding","Tabular cross bedding",
                             "Scour surface","Gravel","Retrograde sand grain bedding"])
# 2 (Channel bar)
table.add_document("Channel bar", ["Longitudinal bar","Transverse bar","Diagonal bar","Imbricate structure", "Coarse crumb material","Grain structure","Gravel","Massive",
                                   "Coarse sand","Reacting surface structure","Sand","Interlayers","Sandy thallus","Gravel sheet","Gravel layer","Sand body","Sand ripples",
                                   "Massive structure","Parallel Bedding"])
# 3 (children of Channel bar)
table.add_document("Longitudinal bar", ["Imbricate structure", "Coarse crumb material","Grain structure","Gravel","Massive","Coarse sand","Reacting surface structure"])
# 3
table.add_document("Transverse bar", ["Gravel", "Massive","Sand","Interlayers"])
# 3
table.add_document("Diagonal bar", ["Gravel","Sandy thallus","Gravel sheet","Gravel layer","Sand body","Sand ripples","Imbricate structure","Massive structure","Parallel Bedding","Sand","Interlayers"])
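The level numbers in the comments above are written by hand. As a hedged sketch, they could also be derived from the parent-child relations with a breadth-first walk. The `CHILDREN` map below is an assumption: it keeps only the direct child nodes of a few parents, and `levels_from_root` is a hypothetical helper.

```python
from collections import deque

# Hypothetical subset of the listing above: parent -> direct child nodes only.
CHILDREN = {
    "Fluvial Facies": ["Meandering river", "Braided river",
                       "Straight river", "Anastomosed river"],
    "Braided river": ["Channel bar", "Diara"],
    "Channel bar": ["Longitudinal bar", "Transverse bar", "Diagonal bar"],
}


def levels_from_root(root):
    """Breadth-first walk that maps every reachable node to its depth (root = 0)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in CHILDREN.get(node, []):
            if child not in depth:      # keep the first (shallowest) depth found
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


for name, level in levels_from_root("Fluvial Facies").items():
    print(level, name)
```

Grouping nodes by the computed depth gives the same levels as the hand-written comments without relying on the order in which documents were added.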


Solution:


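Once the hierarchy is built, the last step is to pick a clustering algorithm. I used TF-IDF (strictly speaking a term-weighting scheme; here its scores do the grouping). Why? Because I work in natural language processing, and to my mind named entity recognition as a whole is held up by TF-IDF and part-of-speech tagging (later posts will cover this). So I know the principle well, which makes things simple: take a TF-IDF implementation, modify it slightly, add the structural requirement, rank the candidates at each level by likelihood, and then link the most likely node at one level to the most likely node at the next. For the ranking, Python's built-in sort is enough. Why not some fancy sorting algorithm? Please, no showing off; concise code that everyone can read is what matters. Don't ask me how TF-IDF is implemented; that is too basic, and plenty of posts online explain it. It is enough to know that it plays the role of the clustering step here, to be able to read the code, and to find where to change it. Finally, an output is needed for the case where nothing matches. The code is as follows: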
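Before the listing, a hedged aside on the scoring itself. The sketch below is a stand-in that mimics the two calls the listing relies on, `add_document` and `similarities`, using a plain term frequency times a smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1. The class `TinyTfIdf` is my own assumption; the internals of the `tfidf` package used in the project are not shown in this post and may differ.

```python
import math
from collections import Counter


class TinyTfIdf:
    """Stand-in for the `TfIdf` table; an assumption, not the real package."""

    def __init__(self):
        self.docs = []  # (name, Counter of child names), in insertion order

    def add_document(self, name, words):
        self.docs.append((name, Counter(words)))

    def idf(self, term):
        # Smoothed inverse document frequency: names found under fewer parents weigh more.
        df = sum(1 for _, counts in self.docs if term in counts)
        return math.log((1 + len(self.docs)) / (1 + df)) + 1

    def similarities(self, query):
        """Return one [name, score] pair per document, in insertion order."""
        result = []
        for name, counts in self.docs:
            total = sum(counts.values()) or 1  # avoid dividing by zero for empty documents
            score = sum(counts[t] / total * self.idf(t) for t in set(query))
            result.append([name, score])
        return result


table = TinyTfIdf()
table.add_document("Transverse bar", ["Gravel", "Massive", "Sand", "Interlayers"])
table.add_document("Oxbow lake", ["Siltstone", "Mudstone"])
table.add_document("Natural levee", ["Caliche nodule", "Breccia structure", "Small fold and fault"])

# Rank parents by score; the unknown name contributes nothing to any parent.
ranked = sorted(table.similarities(["Gravel", "Sand", "Unknown name"]),
                key=lambda pair: pair[1], reverse=True)
print(ranked)
```

A name that is absent from every document has a term frequency of zero everywhere, which is how noise names drop out of the ranking. The project's own listing, which uses the `tfidf` package, follows.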

from tfidf import TfIdf
table = TfIdf()
# 0
table.add_document("Fluvial Facies", ["Meandering river","Braided river","Straight river","Anastomosed river"])
# 1
table.add_document("Meandering river", ["River channel", "Point bar", "Erosion ditch", "Natural levee", "Embankment","Flood fan","Flood plain","Meander river","Oxbow lake","Coarse clastic material",
                                        "Coarse gravelly material","Plant trunk","Lenticle","Gravel","Sand","Conglomerate","Lmbricate structure","Crosslamination",
                                        "Arcose", "Lithic sandstone","Sand body","Scour surface","Large and medium water bedding","Small sand lamination","Large scale lateral accretion structure","Conglomerate",
                                        "Coarse grained sand","Medium fine sand","Fine grained sand","Fine sand or coarse silt","Large trough cross bedding","Parallel bedding","Clintheriform bedding","Sand grain bedding",
                                        "Gravelly coarse sandstone","Medium grained sandstone","Thin layers of siltstone","Sandstone","Siltstone","Small clintheriform progradation cross bedding",
                                        "Gravelly sandstone","Gravel","Large scale progradational cross bedding","Conglomeratic sandstone","Progradational cross bedding",
                                        "Mudstone lenticle","Medium-coarse grained sandstone","Large clintheriform type foreset series","Foreset bed","Silty mudstone",
                                        "Lenticular","Sand","Parallel bedding","Thin layers of forebedding cross bedding",
                                        "Small trough cross-bedding","Silty sand","Small sand laminae","Small trough layer","Large clintheriform cross bedding",
                                        "Large trough bedding","Fine point bar","Rough point bar","Erosion ditch sand bar", "Siltation and deposition in erosion ditch","Coarse to medium debris",
                                        "Lenticle","Argillaceous","Retained sediment","Caliche nodule", "Breccia structure","Small fold and fault","Suspended load", "Siltstone","Mudstone",
                                        "A thin bedded interbed of silty and argillaceous rocks","The level of laminated","Dry crack","Plant roots","Worm boring","Fine sandstone", "Siltstone","Medium cross-bedding",
                                        "Small cross-bedding","Scour structure","Filling structure","Graded bedding","Fine silt and clay", "Caliche nodule","Horizontal bedding","Small sand grain bedding",
                                        "Overlying sand grain bedding","Mud crack","Wormtrail","Back swamp","Peat layer","Freshwater lake life","River flood lake",
                                        "Siltstone","Argillaceous rock","Horizontal lamina","Intermittent sand grain bedding","Caliche nodule","Iron nodules","Erosion ditch cutoff", "Neck cutoff","Sand grain bedding","Mud","Silt","Siltstone", "Mudstone"])
# 1
table.add_document("Braided river", ["Channel bar", "Diara","Lenticular","Bottom scour structure","Trough structure","Impact pit structure","Large trough cross bedding","Large wedge cross bedding",
                                     "Tabular cross bedding","Scour surface","Retrograde sand grain bedding","Longitudinal bar","Transverse bar","Diagonal bar","Imbricate structure", "Coarse crumb material",
                                     "Grain structure","Gravel","Massive","Coarse sand","Reacting surface structure","Sand","Interlayers","Sandy thallus","Gravel sheet","Gravel layer","Sand body","Sand ripples",
                                     "Massive structure","Parallel Bedding"])
# 1
table.add_document("Straight river", ["Scour pit", "Shoal","deep groove"])
# 1 4
table.add_document("Anastomosed river", ["Channel subfacies","Diara subfacies","Flood plain subfacies"])
# 2
table.add_document("River channel", ["Coarse clastic material", "Coarse gravelly material","Plant trunk","Lenticle","Gravel","Sand","Conglomerate","Lmbricate structure","Crosslamination","Lag conglomerate"])
# 2
table.add_document("Point bar", ["Arcose", "Lithic sandstone","Sand body","Scour surface","Large and medium water bedding","Small sand lamination","Large scale lateral accretion structure","Conglomerate",
                                 "Coarse grained sand","Medium fine sand","Fine grained sand","Fine sand or coarse silt","Large trough cross bedding","Parallel bedding","Clintheriform bedding","Sand grain bedding",
                                 "Gravelly coarse sandstone","Medium grained sandstone","Thin layers of siltstone","Sandstone","Siltstone","Small clintheriform progradation cross bedding",
                                 "Gravelly sandstone","Gravel","Large scale progradational cross bedding","Conglomeratic sandstone","Progradational cross bedding",
                                 "Mudstone lenticle","Medium-coarse grained sandstone","Large clintheriform type foreset series","Foreset bed","Silty mudstone",
                                 "Lenticular","Sand","Parallel to the bedding","Thin layers of forebedding cross bedding",
                                 "Small trough cross-bedding","Silty sand","Small sand laminae","Small trough layer","Large clintheriform cross bedding",
                                 "Large trough bedding","Fine point bar","Rough point bar","Lag conglomerate"])
# 2
table.add_document("Erosion ditch", ["Erosion ditch sand bar", "Siltation and deposition in erosion ditch","Coarse to medium debris","Lenticle","Argillaceous","Retained sediment"])
# 2
table.add_document("Natural levee", ["Caliche nodule", "Breccia structure","Small fold and fault"])
# 2
table.add_document("Embankment", ["Suspended load", "Siltstone","Mudstone","A thin bedded interbed of silty and argillaceous rocks","The level of laminated","Dry crack","Plant roots","Worm boring"])
# 2
table.add_document("Flood fan", ["Fine sandstone", "Siltstone","Medium cross-bedding","Small cross-bedding","Scour structure","Filling structure","Graded bedding"])
# 2
table.add_document("Flood plain", ["Fine silt and clay", "Caliche nodule","Horizontal bedding","Small sand grain bedding","Overlying sand grain bedding","Mud crack","Wormtrail",
                                   "Back swamp","Peat layer","Freshwater lake life","River flood lake","Siltstone","Argillaceous rock","Horizontal lamina","Intermittent sand grain bedding",
                                   "Caliche nodule","Iron nodules"])
# 2
table.add_document("Meander river", ["Erosion ditch cutoff", "Neck cutoff","Sand grain bedding","Mud","Silt"])
# 2
table.add_document("Oxbow lake", ["Siltstone", "Mudstone"])
# 2 (Diara)
table.add_document("Diara", ["Sand body", "Lenticular","Bottom scour structure","Trough structure","Impact pit structure","Large trough cross bedding","Large wedge cross bedding","Tabular cross bedding",
                             "Scour surface","Gravel","Retrograde sand grain bedding"])
# 2 (Channel bar)
table.add_document("Channel bar", ["Longitudinal bar","Transverse bar","Diagonal bar","Imbricate structure", "Coarse crumb material","Grain structure","Gravel","Massive",
                                   "Coarse sand","Reacting surface structure","Sand","Interlayers","Sandy thallus","Gravel sheet","Gravel layer","Sand body","Sand ripples",
                                   "Massive structure","Parallel Bedding"])
# 3 (children of Channel bar)
table.add_document("Longitudinal bar", ["Imbricate structure", "Coarse crumb material","Grain structure","Gravel","Massive","Coarse sand","Reacting surface structure"])
# 3
table.add_document("Transverse bar", ["Gravel", "Massive","Sand","Interlayers"])
# 3
table.add_document("Diagonal bar", ["Gravel","Sandy thallus","Gravel sheet","Gravel layer","Sand body","Sand ripples","Imbricate structure","Massive structure","Parallel Bedding","Sand","Interlayers"])

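# Read one name per line; an empty line ends the input.
inplist = []
char =input()
while char!="":
    inplist.append(char)
    print(char)
    char=input()
# Example input: ["Gravel","Parallel bedding"]
# The code below treats each entry of `a` as a [document name, score] pair,
# listed in the order in which the documents were added.
a = table.similarities(inplist)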
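def takeSecond(list1):
    # Sort key: the score in a [name, score] pair
    return list1[1]
strFF1 = []
strFF2 = []
strFF3 = []
# Level 1: documents 1-4 are the four river types (document 0 is the root, Fluvial Facies).
# The range used to be (0,4), which included the root and skipped Anastomosed river.
for i in range(1,5):
    strFF1.append(a[i])
strFF1.sort(key=takeSecond,reverse=True)
# Level 2: documents 5-15, River channel through Channel bar
for i in range(5,16):
    strFF2.append(a[i])
strFF2.sort(key=takeSecond,reverse=True)
# Level 3: documents 16-18, the three bar types under Channel bar
for i in range(16,19):
    strFF3.append(a[i])
strFF3.sort(key=takeSecond,reverse=True)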

strF1=[]
strF2=[]
strF3=[]
for i in strFF1:
    if i[1]!=0.0:
        strF1.append(str(i[0]) + " " + str(i[1]))
for i in strFF2:
    if i[1]!=0.0:
        strF2.append(str(i[0]) + " " + str(i[1]))
for i in strFF3:
    if i[1]!=0.0:
        strF3.append(str(i[0]) + " " + str(i[1]))
# print(strFF1)
# print("=======================")
# print(strFF2)
# print("=======================")
# print(strFF3)
print("=======================")
print("The probability that the first hierarchy is not 0:")
print(strF1)
print("=======================")
print("The probability that the second hierarchy is not 0:")
print(strF2)
print("=======================")
print("The probability that the third hierarchy is not 0:")
print(strF3)
print("=========================")
print("The reasoning is as follows:")
MeanderingRiverPool=["River channel", "Point bar", "Erosion ditch", "Natural levee", "Embankment","Flood fan","Flood plain","Meander river","Oxbow lake"]

BraidedRiverPool=["Channel bar", "Diara"]

StraightRiverPool=["Scour pit","Shoal","deep groove"]

AnastomosedRiverPool=["Channel subfacies","Diara subfacies","Flood plain subfacies"]

TLLPool=["Longitudinal bar","Transverse bar","Diagonal bar"]

flagF1=0
F1=''
flagF2=0
F2=''
flagF3=0
F3=''
flagF4=0
F4=''
for i in range(len(strF1)):
    if strF1[i].find("Meandering river")!=-1:
        flagF1=1
        F1=F1+strF1[i]
for i in range(len(strF1)):
    if strF1[i].find("Braided river")!=-1:
        flagF2=1
        F2 = F2 + strF1[i]
for i in range(len(strF1)):
    if strF1[i].find("Straight river")!=-1:
        flagF3=1
        F3 = F3 + strF1[i]
for i in range(len(strF1)):
    if strF1[i].find("Anastomosed river")!=-1:
        flagF4=1
        F4 = F4 + strF1[i]
if strF1:
    if strF2:
        for f2 in range(len(strF2)):
            # MeanderingRiver
            for MP in range(len(MeanderingRiverPool)):
                S1=''
                S1=F1
                if (strF2[f2].find(MeanderingRiverPool[MP]) != -1):
                    S1 = S1+'--->'+strF2[f2]
                    print(S1)
            # BraidedRiver
            for BP in range(len(BraidedRiverPool)):
                S2=''
                S2=F2
                if (strF2[f2].find(BraidedRiverPool[BP]) != -1):
                    S2 = S2+'--->'+strF2[f2]
                    print(S2)
                    if strF3:
                        for f3 in range(len(strF3)):
                            for TL in range(len(TLLPool)):
                                M21 = ''
                                M21 = S2
                                if (strF3[f3].find(TLLPool[TL]) != -1 and strF2[f2].find("Channel bar") != -1):
                                    M21 = M21 + '--->' + strF3[f3]
                                    print(M21)
    if flagF3==1:
        print(F3)
    if flagF4==1:
        print(F4)

    # # AnastomosedRiver
    # for AP in range(len(AnastomosedRiverPool)):
    #     S4 = ''
    #     S4 = F4
    #     if (strF2[f2].find(AnastomosedRiverPool[AP]) != -1):
    #         S4 = S4 + '--->' + strF2[f2]
    #         print(S4)




else:
    print("Sorry, please run the program again and re-enter the names.")
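The listing above links the levels through hard-coded pools such as MeanderingRiverPool and TLLPool. As an alternative sketch, not the project's method, the same linking can be driven by a child-to-parent map so that new nodes need no extra loops. The `PARENT` map, the sample scores and `build_chains` are assumptions made up for this illustration.

```python
# Hypothetical child -> parent map for part of the tree used above.
PARENT = {
    "Meandering river": "Fluvial Facies",
    "Braided river": "Fluvial Facies",
    "Point bar": "Meandering river",
    "Channel bar": "Braided river",
    "Diara": "Braided river",
    "Transverse bar": "Channel bar",
}


def build_chains(scores):
    """Link every scored node to its scored ancestors, best-scoring chain first.

    `scores` maps node name -> similarity; nodes with a score of 0 are ignored.
    """
    hits = {name: s for name, s in scores.items() if s > 0}
    # A chain ends at a hit that is not the parent of another hit.
    parents_of_hits = {PARENT.get(name) for name in hits}
    chains = []
    for name in hits:
        if name in parents_of_hits:
            continue
        path, node = [], name
        while node in hits:          # climb while the ancestor was scored too
            path.append(node)
            node = PARENT.get(node)
        chains.append(list(reversed(path)))
    chains.sort(key=lambda path: sum(hits[n] for n in path), reverse=True)
    return chains


# Made-up scores, only to show the shape of the result.
sample = {"Braided river": 0.4, "Channel bar": 0.3, "Transverse bar": 0.5,
          "Meandering river": 0.1, "Point bar": 0.0}
chains = build_chains(sample)
if chains:
    for path in chains:
        print("--->".join(path))
else:
    print("Sorry, please run the program again and re-enter the names.")
```

Ordering the chains by the sum of their scores is one possible choice; ranking by the deepest node's score alone would be another.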

Sample output:

[Image 2: sample run output]

[Image 3: sample run output]

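This is one stage of the project. I have published a patent on it, "基于数据挖掘和树状结构的河流相知识图谱反推方法" (roughly, "A method for reverse inference over a fluvial facies knowledge graph based on data mining and a tree structure"), which describes the overall technical workflow; I will cover all the techniques in detail later. This blog series mainly covers how to approach problems, how to think them through, and some problem-solving tricks. Since you can all pick up the technology yourselves, I need not say more. Message me privately if you want the dataset, since the GitHub upload is not finished yet; the next post will include links to all of the code and data.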
