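Machine Learning in Action: Decision Trees (with data set)

Environment: Anaconda, Jupyter Notebook
Python version: 3.6.6

Data set: lense.txt (the code below opens it as lenses.txt)
Extraction code: 9wsp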

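1. Decision trees

Decision trees are among the most widely used data-mining algorithms. In a decision-tree flowchart, a rectangle is a decision block and an oval is a terminating block, meaning a conclusion has been reached and the process can stop. The arrows leaving a decision block are branches; each one leads to another decision block or to a terminating block.

The biggest drawback of the k-nearest-neighbors algorithm is that it gives no insight into the structure of the data. The main advantage of a decision tree is that its output is very easy to understand.

A decision-tree algorithm reads a data set, and one of its main jobs is to bring out the knowledge contained in that data. It can therefore take an unfamiliar data set and extract a series of rules from it. The process in which the machine creates these rules from the data set is the machine-learning process.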

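1.1 Constructing a decision tree

Decision trees

Pros: computationally cheap, results easy to understand, tolerant of missing values, able to handle irrelevant features.

Cons: prone to overfitting.

Works with: numeric and nominal values.

First we discuss how information theory is used to split a data set, then write code that applies the theory to a concrete data set, and finally write code that builds the decision tree.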

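Pseudocode for the branch-creating function createBranch():

Check whether every item in the data set belongs to the same class:

If so, return the class label

Else

    find the best feature for splitting the data set

    split the data set

    create a branch node

        for each subset of the split

            call createBranch and add the result to the branch node

    return the branch node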

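General approach to decision trees

(1) Collect: any method.

(2) Prepare: this tree-building algorithm works only on nominal values, so numeric values must be discretized.

(3) Analyze: any method; once the tree is built, inspect the plot to check that it matches expectations.

(4) Train: construct the tree data structure.

(5) Test: calculate the error rate with the learned tree.

(6) Use: this step applies to any supervised learning algorithm; a decision tree additionally helps you understand the inherent meaning of the data.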
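Step (2) says numeric values must be discretized before the tree is built. A minimal sketch of one way to do that follows; the function name discretize, the cut points and the bin names are all assumptions made up for illustration, not part of the original code.

```python
def discretize(value, cut_points, names):
    """Map a number to the name of the first bin whose upper bound exceeds it."""
    for cut, name in zip(cut_points, names):
        if value < cut:
            return name
    # Larger than every cut point: the value falls in the last bin
    return names[-1]

# Hypothetical cut points and bin names, chosen only for illustration
AGE_CUTS = [30, 50]
AGE_NAMES = ['young', 'middle', 'old']

ages = [22, 30, 47, 63]
binned = [discretize(a, AGE_CUTS, AGE_NAMES) for a in ages]
print(binned)  # ['young', 'middle', 'middle', 'old']
```

After this step each numeric column holds a small set of nominal values, which is the form the splitting code in this post expects.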

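This post splits the data set with the ID3 algorithm, which decides how to split the data and when to stop splitting. Each split uses exactly one feature.
Suppose we now want to decide whether to split the data on the first feature or on the second. Before answering, we need a quantitative way to judge a split.
In everyday life a rare event draws attention when it happens (news of a plane crash attracts enormous attention precisely because crashes are so improbable), while commonplace events go unnoticed. In other words, rare events carry more information. In statistical terms, the lower the probability of an event, the more information its occurrence carries: information content falls as probability rises (it is the negative logarithm of the probability).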
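To make "rare events carry more information" concrete, here is a small sketch that measures information content in bits as the negative base-2 logarithm of the probability. The function name self_information and the two probabilities are assumptions chosen only for illustration.

```python
from math import log2

def self_information(p):
    """Information content, in bits, of an event with probability p (0 < p <= 1)."""
    return -log2(p)

# Hypothetical probabilities, chosen only for illustration
rare = self_information(1 / 1024)   # a very unlikely event
common = self_information(1 / 2)    # as likely as a fair coin flip
print(rare, common)                 # 10.0 1.0
```

The unlikely event is worth ten bits, the coin flip only one.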

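1.1.1 Information gain

The guiding principle for splitting a data set is to make disordered data more ordered.

Information theory lets us quantify the information in the data before and after a split.

The change in information between the state before a split and the state after it is called the information gain.

The feature that yields the highest information gain is the best one to split on.

Before we can judge which split is best, we need to learn how to compute information gain. The measure of information for a set is called Shannon entropy, or simply entropy.

We receive countless messages in everyday life, but only those you care about (or that are useful to you) count as information.

Entropy is defined as the expected value of the information. If the items to be classified can fall into several classes, the information of the symbol x_i is defined as

$$ l(x_i) = -\log_2 p(x_i) $$

where p(x_i) is the probability of choosing that class.
To compute the entropy, we take the expected value of the information over all possible values of all classes:

$$ H = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) $$

where n is the number of classes.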

def dataSet():
    dataSet = [[1,1,'yes'],[1,1,'yes'],[1,0,'no'],[0,1,'no'],[0,1,'no']]
    labels = ['no surfacing','flippers']
    return dataSet,labels

myDat,labels = dataSet()

myDat

[[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]

labels

['no surfacing', 'flippers']

# Compute the Shannon entropy of a data set
from math import log
def caclShannonEnt(dataSet):
    # Total number of instances
    numEntries = len(dataSet)
    labelCounts = {}
    # 1. Build a dictionary of all possible classes
    for featVec in dataSet:
        # The class label is the last column of each row
        currentLabel = featVec[-1]
        # If the label is not a key yet, add it; each value counts how often that class occurs
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnv = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key])/numEntries
        # 2. Logarithm base 2
        shannonEnv -= prob*log(prob,2)
    return shannonEnv

caclShannonEnt(myDat)

0.9709505944546686

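The higher the entropy, the more mixed the data. Let's add one more class to the data set and watch how the entropy changes.

Once we can compute entropy, we can split the data set in the way that gives the largest information gain.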

myDat[0][-1] = 'maybe'
myDat

[[1, 1, 'maybe'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]

caclShannonEnt(myDat)

1.3709505944546687
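The two entropy values above can be cross-checked with a shorter sketch built on collections.Counter. The function name entropy and the two label lists are illustrative assumptions that mirror the class columns of myDat before and after the 'maybe' change; this is a sketch, not the original implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

two_classes = ['yes', 'yes', 'no', 'no', 'no']
three_classes = ['maybe', 'yes', 'no', 'no', 'no']
print(round(entropy(two_classes), 4))    # 0.971
print(round(entropy(three_classes), 4))  # 1.371
```

Splitting the two 'yes' rows into 'maybe' and 'yes' adds a class without adding rows, so the labels are more mixed and the entropy rises.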

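1.1.2 Splitting the data set

Besides measuring entropy, a classification algorithm has to split the data set and measure the entropy of the resulting subsets to judge whether the split was a good one. We compute the entropy once for the split produced by each feature and then pick the feature whose split is best.

# Listing: split the data set on a given feature
def splitDataSet(dataSet,axis,value):
    # 1. Create a new list so the original data set is not modified
    retDataSet = []
    for featVec in dataSet:
        if (featVec[axis] == value):
            # 2. Keep the row, but cut out the feature we split on
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet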

splitDataSet(myDat,0,1)

[[1, 'maybe'], [1, 'yes'], [0, 'no']]

splitDataSet(myDat,0,0)

[[1, 'no'], [1, 'no']]

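Iterate over every feature, split the data set on each of its values with splitDataSet(), and weight the Shannon entropy of each subset by the subset's share of the data; the feature with the largest information gain is the best one to split on. The lenses outputs recorded later in this post (the build trace, the tree and the plot coordinates) came from a version of this function that accumulated prob*log(prob,2) instead of prob*caclShannonEnt(subDataSet). That version ignores the class labels, which is why the trace always reports bestFeat: 0; with the version below the root split for the lenses data is tearRate rather than age.

def chooseBestFeatureToSplit(dataSet):
    # The last column holds the class label, so it is not a feature
    numFeatures = len(dataSet[0])-1
    baseEntropy = caclShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        # print('i:',i)
        featList = [example[i] for example in dataSet]
        # print(featList)
        uniqueVals = set(featList)
        # print(uniqueVals)
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet,i,value)
            prob = len(subDataSet)/float(len(dataSet))
            # Weight the entropy of each subset by the fraction of rows it holds
            newEntropy += prob*caclShannonEnt(subDataSet)
            # print(value,prob,subDataSet)
        if (baseEntropy - newEntropy > bestInfoGain):
            bestInfoGain = baseEntropy - newEntropy
            bestFeature = i
    return bestFeature

chooseBestFeatureToSplit(myDat)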
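To see the numbers behind the choice, the sketch below prints the information gain of each feature. It uses the fish data with its original 'yes'/'no' labels (not the 'maybe' variant), and the names entropy, info_gain and fish are assumptions introduced only for this illustration.

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Shannon entropy of the class labels stored in the last column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def info_gain(rows, axis):
    """Entropy before the split minus the weighted entropy after it."""
    remainder = 0.0
    for value in set(row[axis] for row in rows):
        subset = [row for row in rows if row[axis] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

fish = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
gains = [info_gain(fish, axis) for axis in range(2)]
print([round(g, 3) for g in gains])  # [0.42, 0.171]
```

Feature 0 ('no surfacing') has the larger gain, so it is the feature to split on first.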

1.1.3 Building the decision tree recursively

# Majority vote
import operator
def majorityCnt(classList):
    classCount = {}
    # Count how many times each class label occurs
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # Sort by count, largest first, and return the most common label
    sortedClassCount = sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)
    return sortedClassCount[0][0]

def createTree(dataSet,labels):
    classList = [example[-1] for example in dataSet]
    # Stop when every row has the same class
    if classList.count(classList[0])==len(classList):
        return classList[0]
    # Stop when no features are left; fall back to a majority vote
    if len(dataSet[0])==1:
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    print('bestFeat:',bestFeat)
    bestFeatLabel = labels[bestFeat]
    print('bestFeatLabel:',bestFeatLabel)
    myTree = {bestFeatLabel:{}}
    # Note: this removes the label from the caller's list as well
    del(labels[bestFeat])
    featValues = [example[bestFeat] for example in dataSet]
    print('featValues:',featValues)
    uniqueVals = set(featValues)
    print('uniqueVals:',uniqueVals)
    for value in uniqueVals:
        # Copy the labels so recursive calls do not disturb each other
        subLabels = labels[:]
        print(splitDataSet(dataSet,bestFeat,value))
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet,bestFeat,value),subLabels)
        print('myTree:',myTree)
    return myTree

myTree = createTree(myDat,labels)

bestFeat: 0
bestFeatLabel: no surfacing
featValues: [1, 1, 1, 0, 0]
uniqueVals: {0, 1}
[[1, 'no'], [1, 'no']]
myTree: {'no surfacing': {0: 'no'}}
[[1, 'maybe'], [1, 'yes'], [0, 'no']]
bestFeat: 0
bestFeatLabel: flippers
featValues: [1, 1, 0]
uniqueVals: {0, 1}
[['no']]
myTree: {'flippers': {0: 'no'}}
[['maybe'], ['yes']]
myTree: {'flippers': {0: 'no', 1: 'maybe'}}
myTree: {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'maybe'}}}}

1.2 Plotting trees in Python with Matplotlib annotations

import matplotlib.pyplot as plt

# Box styles for decision nodes and leaf nodes, and the arrow style
decisionNode = dict(boxstyle='sawtooth',fc='0.8')
leafNode = dict(boxstyle='round4',fc='0.8')
arrow_args = dict(arrowstyle='<-')

def plotNode(nodeText,centerPt,parentPt,nodeType):
    # nodeText is the text to display, centerPt is the center of the text box,
    # and parentPt is the point the arrow to the text comes from
    createPlot.ax1.annotate(nodeText,xytext=centerPt,textcoords="axes fraction",\
                        xy=parentPt,xycoords="axes fraction",\
                       va="center",ha="center",bbox=nodeType,arrowprops=arrow_args)
def createPlot():
    fig = plt.figure(1,facecolor='white')
    fig.clf()
    createPlot.ax1 = plt.subplot(111,frameon=False)
    plotNode('decision node',(0.5,0.1),(0.1,0.5),decisionNode)
    plotNode('leaf node',(0.8,0.1),(0.3,0.8),leafNode)
    plt.show()

# Count the leaf nodes
def getNumLeafs(myTree):
    numNode = 0
    firstStr = list(myTree.keys())[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        # A dict is a subtree; anything else is a leaf
        if type(secondDict[key]).__name__ == 'dict':
            numNode += getNumLeafs(secondDict[key])
        else:
            numNode += 1
    return numNode

getNumLeafs(myTree)

# Get the depth of the decision tree
def getTreeDepth(myTree):
    maxDepth = 0
    firstStr = list(myTree.keys())[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':
            thisDepth = 1 + getTreeDepth(secondDict[key])
        else:
            thisDepth = 1
        # Keep the deepest branch seen so far
        if thisDepth > maxDepth:
            maxDepth = thisDepth
    return maxDepth

getTreeDepth(myTree)

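Next, the function retrieveTree returns trees stored in advance, so we don't have to build a tree from the data every time we test the code.

# Predefined trees for testing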
def retrieveTree(i):
    listOfTrees = [
        {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}},
        {'no surfacing': {0: 'no', 1: {'flippers': {0: {'head': {0: 'no', 1: 'yes'}}, 1: 'no'}}}}
    ]
    return listOfTrees[i]

myTree = retrieveTree(0)

myTree

{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
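Because the tree is just nested dictionaries, the rules it encodes can be read off by walking every path from the root to a leaf. The sketch below does that for the first test tree; the function name tree_to_rules is an assumption introduced for illustration and is not part of the original code.

```python
def tree_to_rules(tree, path=()):
    """Yield (conditions, label) pairs, one per leaf of a nested-dict tree."""
    feature = next(iter(tree))
    for value, child in tree[feature].items():
        step = path + ((feature, value),)
        if isinstance(child, dict):
            # A dict is a subtree: keep descending
            yield from tree_to_rules(child, step)
        else:
            # Anything else is a class label
            yield step, child

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
rules = list(tree_to_rules(tree))
for conditions, label in rules:
    tests = ' and '.join('%s == %r' % (f, v) for f, v in conditions)
    print('if %s then %s' % (tests, label))
```

Each printed line is one rule, for example "if no surfacing == 1 and flippers == 1 then yes".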

print('getNumLeaf: %d,getNumDepth: %d' %(getNumLeafs(myTree),getTreeDepth(myTree)))

getNumLeaf: 3,getNumDepth: 2

labels = ['no surfacing', 'flippers']

# Draw the text between a parent node and a child node
def plotMidText(cntrPt,parentPt,txtString):
    # x coordinate of the midpoint
    xMid = (parentPt[0]- cntrPt[0])/2.0 + cntrPt[0]
    # y coordinate of the midpoint
    yMid = (parentPt[1] - cntrPt[1])/2.0 + cntrPt[1]
    # Draw the branch label at the midpoint
    createPlot.ax1.text(xMid,yMid,txtString,va='center',ha='center',rotation=30)
# Draw the decision tree
def plotTree(myTree,parentPt,nodeTxt):
    # Number of leaves and depth of this (sub)tree
    numLeafs = getNumLeafs(myTree)
    depth = getTreeDepth(myTree)
    # On Python 3, keys() is a view, so convert it to a list before indexing
    firstSides = list(myTree.keys())
    firstStr = firstSides[0]
    # Center this node above the leaves it spans
    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs))/2.0/plotTree.totalw,plotTree.yOff)
    print('c:',cntrPt)
    plotMidText(cntrPt,parentPt,nodeTxt)
    plotNode(firstStr,cntrPt,parentPt,decisionNode)
    secondDict = myTree[firstStr]
    # Move down one level before drawing the children
    plotTree.yOff = plotTree.yOff - 1.0/plotTree.totalD
    print('d:',plotTree.yOff)
    for key in secondDict.keys():
        # If secondDict[key] is a subtree, i.e. a dict
        if type(secondDict[key]) is dict:
            # Draw the subtree recursively
            plotTree(secondDict[key],cntrPt,str(key))
        else:
            # A leaf: advance one slot to the right and draw it
            plotTree.xOff = plotTree.xOff + 1.0/plotTree.totalw
            print('e:',plotTree.xOff)
            plotNode(secondDict[key],(plotTree.xOff,plotTree.yOff),cntrPt,leafNode)
            plotMidText((plotTree.xOff,plotTree.yOff),cntrPt,str(key))
    # Move back up one level when this subtree is done
    plotTree.yOff = plotTree.yOff + 1.0/plotTree.totalD
    print('f:',plotTree.yOff)

def createPlot(inTree):
    fig = plt.figure(1,facecolor='white')
    fig.clf()
    axprops = dict(xticks=[],yticks=[])
    createPlot.ax1 = plt.subplot(111,frameon=False, **axprops)
    plotTree.totalw = float(getNumLeafs(inTree))
    plotTree.totalD = float(getTreeDepth(inTree))
    plotTree.xOff = -0.5/plotTree.totalw
    plotTree.yOff = 1.0
    plotTree(inTree,(0.5,1.0),'')
    plt.show()

axprops = dict(xticks=[],yticks=[])

xticks is the list of positions at which ticks are drawn on the x axis, and yticks is the same for the y axis; passing empty lists hides the ticks.

createPlot(retrieveTree(0))

c: (0.5, 1.0)
d: 0.5
e: 0.16666666666666666
c: (0.6666666666666666, 0.5)
d: 0.0
e: 0.5
e: 0.8333333333333333
f: 0.5
f: 1.0
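The 'e:' values in the trace are the x coordinates of the leaves. The sketch below reproduces that arithmetic without Matplotlib: the axes are divided into one slot per leaf, the offset starts half a slot to the left of the axes, and each leaf advances it by one slot. The function name leaf_positions is an assumption introduced for illustration.

```python
def leaf_positions(tree):
    """Return the x coordinate plotTree assigns to each leaf, left to right."""
    def count_leaves(node):
        if not isinstance(node, dict):
            return 1
        return sum(count_leaves(child) for child in node[next(iter(node))].values())

    total = count_leaves(tree)
    # Same arithmetic as plotTree: start half a slot left of the axes,
    # then advance one slot (1/total) per leaf
    x_off = -0.5 / total
    xs = []
    for _ in range(total):
        x_off += 1.0 / total
        xs.append(x_off)
    return xs

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
xs = leaf_positions(tree)
print(xs)
```

For this three-leaf tree the leaves land at about 1/6, 1/2 and 5/6 of the axes width, so each sits in the middle of its slot.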

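# Parameters: inTree -- the decision tree model
#             featLabels -- the names of the features
#             testVec -- the input to classify
# Returns classLabel, the predicted class value (map it back through the labels to get a name)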
def classify(inTree,featLabels,testVec):
    firstStr = list(inTree.keys())[0]
    secondDict = inTree[firstStr]
    featIndex = featLabels.index(firstStr)
    key = testVec[featIndex]
    valueOfFeat = secondDict[key]
    if isinstance(valueOfFeat,dict):
        classLabel = classify(valueOfFeat,featLabels,testVec)
    else:
        classLabel = valueOfFeat
    return classLabel

classify(myTree,labels,(1,0))

'no'
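classify looks up secondDict[key] directly, so a feature value that never appeared in the training data raises a KeyError. A minimal sketch of a more forgiving variant follows; the function name classify_or_default and its default parameter are assumptions introduced for illustration, not part of the original code.

```python
def classify_or_default(tree, feat_labels, test_vec, default=None):
    """Walk a nested-dict tree; return `default` if a feature value has no branch."""
    node = tree
    while isinstance(node, dict):
        feature = next(iter(node))
        value = test_vec[feat_labels.index(feature)]
        branches = node[feature]
        if value not in branches:
            # The tree never saw this value during training
            return default
        node = branches[value]
    return node

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
labels = ['no surfacing', 'flippers']
print(classify_or_default(tree, labels, (1, 1)))             # yes
print(classify_or_default(tree, labels, (1, 2), 'unknown'))  # unknown
```

Returning a default keeps a batch of predictions running; whether that is preferable to failing loudly depends on the application.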

myTree = retrieveTree(1)

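# Store the decision tree with the pickle module
def storeTree(inputTree,filename):
    import pickle
    # pickle writes bytes, so the file must be opened in binary mode ('wb');
    # opening it with 'w' as the book does raises
    # "write() argument must be str, not bytes" on Python 3
    with open(filename,'wb') as fw:
        pickle.dump(inputTree,fw) # save inputTree to fw
    # The with block closes the file, so no explicit close() is needed

def grabTree(filename):
    import pickle
    # The file was written in binary mode, so read it back with 'rb'
    with open(filename,'rb') as fr:
        return pickle.load(fr) # load the tree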

storeTree(myTree,'classifierStorage.txt')

grabTree('classifierStorage.txt')

{'no surfacing': {0: 'no',
  1: {'flippers': {0: {'head': {0: 'no', 1: 'yes'}}, 1: 'no'}}}}
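One reason pickle suits this tree is that it preserves the integer keys 0 and 1, whereas a JSON round trip turns dictionary keys into strings. The sketch below shows the difference; the temporary file name tree.pkl is an assumption chosen for illustration.

```python
import json
import os
import pickle
import tempfile

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

# Round-trip through a temporary file so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tree.pkl')
    with open(path, 'wb') as fw:
        pickle.dump(tree, fw)
    with open(path, 'rb') as fr:
        restored = pickle.load(fr)

via_json = json.loads(json.dumps(tree))

print(restored == tree)   # True
print(via_json == tree)   # False: JSON turned the keys 0 and 1 into '0' and '1'
```

A tree restored from JSON would therefore need its keys converted back before classify could look up integer feature values.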

# Each line of lenses.txt holds tab-separated feature values followed by the class
with open('lenses.txt') as fr:
    lenses = [inst.strip().split('\t') for inst in fr.readlines()]
lenseLabels = ['age','prescript','astigmatic','tearRate']
lenseTree = createTree(lenses,lenseLabels)
lenseTree

bestFeat: 0
bestFeatLabel: age
featValues: ['young', 'young', 'young', 'young', 'young', 'young', 'young', 'young', 'pre', 'pre', 'pre', 'pre', 'pre', 'pre', 'pre', 'pre', 'presbyopic', 'presbyopic', 'presbyopic', 'presbyopic', 'presbyopic', 'presbyopic', 'presbyopic', 'presbyopic']
uniqueVals: {'pre', 'presbyopic', 'young'}
[['myope', 'no', 'reduced', 'no lenses'], ['myope', 'no', 'normal', 'soft'], ['myope', 'yes', 'reduced', 'no lenses'], ['myope', 'yes', 'normal', 'hard'], ['hyper', 'no', 'reduced', 'no lenses'], ['hyper', 'no', 'normal', 'soft'], ['hyper', 'yes', 'reduced', 'no lenses'], ['hyper', 'yes', 'normal', 'no lenses']]
bestFeat: 0
bestFeatLabel: prescript
featValues: ['myope', 'myope', 'myope', 'myope', 'hyper', 'hyper', 'hyper', 'hyper']
uniqueVals: {'hyper', 'myope'}
[['no', 'reduced', 'no lenses'], ['no', 'normal', 'soft'], ['yes', 'reduced', 'no lenses'], ['yes', 'normal', 'no lenses']]
bestFeat: 0
bestFeatLabel: astigmatic
featValues: ['no', 'no', 'yes', 'yes']
uniqueVals: {'yes', 'no'}
[['reduced', 'no lenses'], ['normal', 'no lenses']]
myTree: {'astigmatic': {'yes': 'no lenses'}}
[['reduced', 'no lenses'], ['normal', 'soft']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['soft']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}
myTree: {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}
myTree: {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}
[['no', 'reduced', 'no lenses'], ['no', 'normal', 'soft'], ['yes', 'reduced', 'no lenses'], ['yes', 'normal', 'hard']]
bestFeat: 0
bestFeatLabel: astigmatic
featValues: ['no', 'no', 'yes', 'yes']
uniqueVals: {'yes', 'no'}
[['reduced', 'no lenses'], ['normal', 'hard']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['hard']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}}}
[['reduced', 'no lenses'], ['normal', 'soft']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['soft']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}
myTree: {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}
myTree: {'age': {'pre': {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}}}
[['myope', 'no', 'reduced', 'no lenses'], ['myope', 'no', 'normal', 'no lenses'], ['myope', 'yes', 'reduced', 'no lenses'], ['myope', 'yes', 'normal', 'hard'], ['hyper', 'no', 'reduced', 'no lenses'], ['hyper', 'no', 'normal', 'soft'], ['hyper', 'yes', 'reduced', 'no lenses'], ['hyper', 'yes', 'normal', 'no lenses']]
bestFeat: 0
bestFeatLabel: prescript
featValues: ['myope', 'myope', 'myope', 'myope', 'hyper', 'hyper', 'hyper', 'hyper']
uniqueVals: {'hyper', 'myope'}
[['no', 'reduced', 'no lenses'], ['no', 'normal', 'soft'], ['yes', 'reduced', 'no lenses'], ['yes', 'normal', 'no lenses']]
bestFeat: 0
bestFeatLabel: astigmatic
featValues: ['no', 'no', 'yes', 'yes']
uniqueVals: {'yes', 'no'}
[['reduced', 'no lenses'], ['normal', 'no lenses']]
myTree: {'astigmatic': {'yes': 'no lenses'}}
[['reduced', 'no lenses'], ['normal', 'soft']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['soft']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}
myTree: {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}
myTree: {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}
[['no', 'reduced', 'no lenses'], ['no', 'normal', 'no lenses'], ['yes', 'reduced', 'no lenses'], ['yes', 'normal', 'hard']]
bestFeat: 0
bestFeatLabel: astigmatic
featValues: ['no', 'no', 'yes', 'yes']
uniqueVals: {'yes', 'no'}
[['reduced', 'no lenses'], ['normal', 'hard']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['hard']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}}}
[['reduced', 'no lenses'], ['normal', 'no lenses']]
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': 'no lenses'}}
myTree: {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': 'no lenses'}}}}
myTree: {'age': {'pre': {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}, 'presbyopic': {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': 'no lenses'}}}}}}
[['myope', 'no', 'reduced', 'no lenses'], ['myope', 'no', 'normal', 'soft'], ['myope', 'yes', 'reduced', 'no lenses'], ['myope', 'yes', 'normal', 'hard'], ['hyper', 'no', 'reduced', 'no lenses'], ['hyper', 'no', 'normal', 'soft'], ['hyper', 'yes', 'reduced', 'no lenses'], ['hyper', 'yes', 'normal', 'hard']]
bestFeat: 0
bestFeatLabel: prescript
featValues: ['myope', 'myope', 'myope', 'myope', 'hyper', 'hyper', 'hyper', 'hyper']
uniqueVals: {'hyper', 'myope'}
[['no', 'reduced', 'no lenses'], ['no', 'normal', 'soft'], ['yes', 'reduced', 'no lenses'], ['yes', 'normal', 'hard']]
bestFeat: 0
bestFeatLabel: astigmatic
featValues: ['no', 'no', 'yes', 'yes']
uniqueVals: {'yes', 'no'}
[['reduced', 'no lenses'], ['normal', 'hard']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['hard']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}}}
[['reduced', 'no lenses'], ['normal', 'soft']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['soft']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}
myTree: {'prescript': {'hyper': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}
[['no', 'reduced', 'no lenses'], ['no', 'normal', 'soft'], ['yes', 'reduced', 'no lenses'], ['yes', 'normal', 'hard']]
bestFeat: 0
bestFeatLabel: astigmatic
featValues: ['no', 'no', 'yes', 'yes']
uniqueVals: {'yes', 'no'}
[['reduced', 'no lenses'], ['normal', 'hard']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['hard']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}}}
[['reduced', 'no lenses'], ['normal', 'soft']]
bestFeat: 0
bestFeatLabel: tearRate
featValues: ['reduced', 'normal']
uniqueVals: {'reduced', 'normal'}
[['no lenses']]
myTree: {'tearRate': {'reduced': 'no lenses'}}
[['soft']]
myTree: {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}
myTree: {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}
myTree: {'prescript': {'hyper': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}
myTree: {'age': {'pre': {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}, 'presbyopic': {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses', 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': 'no lenses'}}}}, 'young': {'prescript': {'hyper': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}, 'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses', 'normal': 'hard'}}, 'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}}}

{'age': {'pre': {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses',
      'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}},
    'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses',
        'normal': 'hard'}},
      'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}},
  'presbyopic': {'prescript': {'hyper': {'astigmatic': {'yes': 'no lenses',
      'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}},
    'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses',
        'normal': 'hard'}},
      'no': 'no lenses'}}}},
  'young': {'prescript': {'hyper': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses',
        'normal': 'hard'}},
      'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}},
    'myope': {'astigmatic': {'yes': {'tearRate': {'reduced': 'no lenses',
        'normal': 'hard'}},
      'no': {'tearRate': {'reduced': 'no lenses', 'normal': 'soft'}}}}}}}}

createPlot(lenseTree)

c: (0.5, 1.0)
d: 0.75
c: (0.16666666666666666, 0.75)
d: 0.5
c: (0.07142857142857142, 0.5)
d: 0.25
e: 0.023809523809523808
c: (0.09523809523809523, 0.25)
d: 0.0
e: 0.07142857142857142
e: 0.11904761904761904
f: 0.25
f: 0.5
c: (0.23809523809523808, 0.5)
d: 0.25
c: (0.19047619047619047, 0.25)
d: 0.0
e: 0.16666666666666666
e: 0.21428571428571427
f: 0.25
c: (0.2857142857142857, 0.25)
d: 0.0
e: 0.26190476190476186
e: 0.3095238095238095
f: 0.25
f: 0.5
f: 0.75
c: (0.47619047619047616, 0.75)
d: 0.5
c: (0.4047619047619047, 0.5)
d: 0.25
e: 0.3571428571428571
c: (0.4285714285714285, 0.25)
d: 0.0
e: 0.4047619047619047
e: 0.45238095238095233
f: 0.25
f: 0.5
c: (0.5476190476190476, 0.5)
d: 0.25
c: (0.5238095238095237, 0.25)
d: 0.0
e: 0.49999999999999994
e: 0.5476190476190476
f: 0.25
e: 0.5952380952380951
f: 0.5
f: 0.75
c: (0.8095238095238094, 0.75)
d: 0.5
c: (0.7142857142857142, 0.5)
d: 0.25
c: (0.6666666666666665, 0.25)
d: 0.0
e: 0.6428571428571428
e: 0.6904761904761905
f: 0.25
c: (0.7619047619047619, 0.25)
d: 0.0
e: 0.7380952380952381
e: 0.7857142857142858
f: 0.25
f: 0.5
c: (0.9047619047619049, 0.5)
d: 0.25
c: (0.8571428571428572, 0.25)
d: 0.0
e: 0.8333333333333335
e: 0.8809523809523812
f: 0.25
c: (0.9523809523809526, 0.25)
d: 0.0
e: 0.9285714285714288
e: 0.9761904761904765
f: 0.25
f: 0.5
f: 0.75
f: 1.0

你可能感兴趣的:(机器学习)

机器学习是怎么一步一步由神经网络发展到今天的Transformer架构的？ yuanpan 机器学习神经网络 transformer
机器学习和神经网络的发展经历了一系列重要的架构和技术阶段。以下是更全面的总结，涵盖了从早期神经网络到卷积神经网络之前的架构演变：1.早期神经网络：感知机（Perceptron）时间：1950年代末至1960年代。背景：感知机由FrankRosenblatt提出，是第一个具有学习能力的神经网络模型。它由单层神经元组成，可以用于简单的二分类任务。特点：输入层和输出层之间直接连接，没有隐藏层。使用简单的
奇异值分解（SVD）文弱_书生乱七八糟神经网络人工智能
奇异值分解(SVD)介绍奇异值分解(SVD)，这是最强大的矩阵分解技术之一。SVD广泛应用于机器学习、数据科学和其他计算领域，用于降维、降噪和矩阵近似等应用。与仅适用于方阵的特征分解不同，SVD可以应用于任何矩阵，使其成为一种多功能工具。在这里煮啵将分解SVD背后的理论，通过手动计算示例进行分析，并展示如何在Python中实现SVD。在本节结束时，您将清楚地了解SVD的强大功能及其在机器学习中的应
yum install locate出现Error: Unable to find match: locate解决方案爱编程的喵喵 Linux解决方案 linux locate yum 解决方案
大家好，我是爱编程的喵喵。双985硕士毕业，现担任全栈工程师一职，热衷于将数据思维应用到工作与生活中。从事机器学习以及相关的前后端开发工作。曾在阿里云、科大讯飞、CCF等比赛获得多次Top名次。现为CSDN博客专家、人工智能领域优质创作者。喜欢通过博客创作的方式对所学的知识进行总结与归纳，不仅形成深入且独到的理解，而且能够帮助新手快速入门。本文主要介绍了yuminstalllocate出现
【人工智能机器学习基础篇】——深入详解无监督学习之降维：PCA与t-SNE的关键概念与核心原理猿享天开人工智能数学基础专讲人工智能机器学习无监督学习降维
深入详解无监督学习之降维：PCA与t-SNE的关键概念与核心原理在当今数据驱动的世界中，数据维度的增多带来了计算复杂性和存储挑战，同时也可能导致模型性能下降，这一现象被称为“维度诅咒”（CurseofDimensionality）。降维作为一种重要的特征提取和数据预处理技术，旨在通过减少数据的维度，保留其主要信息，从而简化数据处理过程，并提升模型的性能。本文将深入探讨两种广泛应用于无监督学习中的降
Flink启动任务 swg321321 flink 大数据
Flink以本地运行作为解读例如：第一章Python机器学习入门之pandas的使用提示：写完文章后，目录可以自动生成，如何生成可参考右边的帮助文档文章目录Flink前言StreamExecutionEnvironmentLocalExecutorMiniClusterStreamGraph二、使用步骤1.引入库2.读入数据总结前言提示：这里可以添加本文要记录的大概内容：例如：随着人工智能的不断发
计算机专业毕业设计题目推荐（新颖选题）本科计算机人工智能专业相关毕业设计选题大全✅ 会写代码的羊毕设选题课程设计人工智能毕业设计毕设题目毕业设计题目 ai AI编程
文章目录前言最新毕设选题（建议收藏起来）本科计算机人工智能专业相关的毕业设计选题毕设作品推荐前言2025全新毕业设计项目博主介绍：✌全网粉丝10W+,CSDN全栈领域优质创作者，博客之星、掘金/华为云/阿里云等平台优质作者。技术范围：SpringBoot、Vue、SSM、HLMT、Jsp、PHP、Nodejs、Python、爬虫、数据可视化、小程序、大数据、机器学习等设计与开发。主要内容：免费功能
【机器学习】建模流程 CH3_CH2_CHO 什么？！是机器学习！！机器学习人工智能线性回归逻辑回归
1、数据获取1.1来源数据获取是机器学习建模的第一步，常见的数据来源包括数据库、API、网络爬虫等。数据库是企业内部常见的数据存储方式，例如：MySQL、Oracle等关系型数据库，以及MongoDB等非关系型数据库，它们能够存储大量的结构化和非结构化数据API（应用程序编程接口）提供了从外部获取数据的便捷方式，例如：社交媒体平台的API可以获取用户发布的内容和互动信息网络爬虫则适用于从网页中提取
机器学习课堂4线性回归模型+特征缩放木尘152132 机器学习线性回归 python
一、实验2-2，线性回归模型，计算模型在训练数据集和测试数据集上的均方根误差代码：#2-2线性回归模型importpandasaspdimportnumpyasnpimportmatplotlib.pyplotasplt#参数设置iterations=3000#迭代次数learning_rate=0.0001#学习率m_train=3000#训练样本的数量flag_plot_lines=False
【机器学习】模型拟合 CH3_CH2_CHO 什么？！是机器学习！！机器学习人工智能欠拟合过拟合
1、欠拟合1.1现象欠拟合是机器学习和统计建模中的一种常见问题，表现为模型无法充分捕捉数据中的潜在规律和模式。无论是训练数据还是测试数据，模型的预测误差都居高不下。在实际应用中，欠拟合的模型往往显得过于简单和粗糙，无法对数据进行有效的拟合和描述。1.2原因模型过于简单是导致欠拟合的主要原因：例如，使用直线去拟合具有明显曲线趋势的数据，或者使用低阶多项式去拟合高阶的复杂函数关系。这种情况下，模型的表
基于Python的智能决策支持系统：实现智能化决策的关键要素 AI天才研究院 DeepSeek R1 &大数据AI人工智能大模型自然语言处理人工智能语言模型编程实践开发语言架构设计
文章目录基于Python的智能决策支持系统：实现智能化决策的关键要素11.背景介绍2.核心概念与联系数据收集与预处理模型构建与训练决策规则生成与优化决策结果评估与反馈3.核心算法原理具体操作步骤数据挖掘算法机器学习算法优化算法4.数学模型和公式详细讲解举例说明线性回归模型最小二乘法5.项目实践：代码实例和详细解释说明6.实际应用场景金融领域医疗领域供应链管理智能制造7.工具和资源推荐编程语言和开发
下一代模型技术演进与场景应用突破智能计算研究中心其他
内容概要当前模型技术正经历多维度的范式跃迁，可解释性模型与自动化机器学习（AutoML）成为突破传统黑箱困境的核心路径。在底层架构层面，边缘计算与量子计算的融合重构了算力分配模式，联邦学习技术则为跨域数据协作提供了安全可信的解决方案。主流框架如TensorFlow和PyTorch持续迭代优化能力，通过动态参数压缩与自适应超参数调优策略，显著提升模型部署效率。应用层创新呈现垂直化特征，医疗诊断模型通
TypeScript语言的计算机视觉苏墨瀚包罗万象 golang 开发语言后端
使用TypeScript进行计算机视觉：一个现代化的探索引言随着人工智能和机器学习的快速发展，计算机视觉（ComputerVision）成为了一个极具活力的研究领域。计算机视觉旨在使计算机能够“看”和“理解”数字图像或视频中的内容。近年来，TypeScript作为一种现代化的编程语言，因其类型安全和更好的开发体验，逐渐在前端和后端开发中得到了广泛应用。本文将探讨如何使用TypeScript进行计算
人工智能之数学基础：数学对人工智能技术发展的作用每天五分钟玩转人工智能机器学习深度学习之数学基础人工智能深度学习机器学习神经网络自然语言处理数学
本文重点数学是人工智能技术发展的基础，它提供了人工智能技术所需的数学理论和算法，包括概率论、统计学、线性代数、微积分、图论等等。本文将从以下几个方面探讨数学对人工智能技术发展的作用。概率论和统计学概率论和统计学是人工智能技术中最为重要的数学分支之一。概率论和统计学的应用范围非常广泛，包括机器学习、数据挖掘、自然语言处理、计算机视觉等领域。在人工智能技术中，概率论和统计学主要用于处理不确定性的问题，
人工智能之数学基础：线性子空间每天五分钟玩转人工智能机器学习深度学习之数学基础人工智能深度学习线性代数线性子空间线性空间
本文重点在前面的课程中，我们学习了线性空间，本文我们我们在此基础上学习线性子空间。在应用中，线性子空间的概念被广泛应用于信号处理、机器学习、图像处理等领域。子空间的性质子空间是线性空间的一部分，它需要满足下面的性质：设V是数域F上的线性空间，W是V的一个非空子集。如果W对于V中的加法运算和数乘运算也构成F上的一个线性空间，则称W为V的线性子空间（或称向量子空间）。具体来说，设V是一个线性空间，W是
详解离线安装Python库爱编程的喵喵 Python基础课程 python 离线安装 requirements
大家好，我是爱编程的喵喵。双985硕士毕业，现担任全栈工程师一职，热衷于将数据思维应用到工作与生活中。从事机器学习以及相关的前后端开发工作。曾在阿里云、科大讯飞、CCF等比赛获得多次Top名次。现为CSDN博客专家、人工智能领域优质创作者。喜欢通过博客创作的方式对所学的知识进行总结与归纳，不仅形成深入且独到的理解，而且能够帮助新手快速入门。本文主要介绍了详解离线安装Python库，希望能对
ESG证书：AI预测未来十年职场人的黄金入场券 ESG学习圈 pandas python django
当ChatGPT开始撰写ESG报告，当机器学习模型精准预测企业碳排放轨迹，一场由AI驱动的ESG革命正在颠覆传统可持续发展领域。根据彭博新能源财经预测，到2030年全球ESG资产管理规模将突破50万亿美元，而AI技术将成为撬动这个万亿级市场的核心杠杆。一、AI透视下的ESG黄金时代在微软开发的AI模型ESG-NOW系统中，通过分析全球4300家上市公司近十年的环境数据，成功预测2025年新能源行业
【Dive Into Stable Diffusion v3.5】1：开源项目正式发布——深入探索SDv3.5模型全参/LoRA/RLHF训练 Donvink 大模型 #AIGC stable diffusion AIGC 人工智能机器学习深度学习
目录1引言2项目简介3快速上手3.1下载代码3.2环境配置3.3项目结构3.4下载模型与数据集3.5运行指令3.6核心参数说明3.6.1通用参数3.6.2优化器/学习率3.6.3数据相关4结语1引言在人工智能和机器学习领域，生成模型的应用越来越广泛。StableDiffusion作为其中的佼佼者，因其强大的图像生成能力而备受关注。今天，我的开源项目DiveIntoStableDiffusionv3
知识库在意图识别中扮演着**数据支撑**和**语义理解辅助**的双重角色 PersistDZ 大数据与AI 人工智能
知识库在意图识别中扮演着数据支撑和语义理解辅助的双重角色，而训练智能客服的意图识别Agent需要结合知识库的结构化数据与机器学习技术。以下是详细解析：一、知识库在意图识别中的作用1.提供标注数据意图标签定义：知识库中存储了预先定义的意图分类体系（如“订单查询”“退换货”“投诉”等），为模型提供明确的训练目标。标注样本：知识库包含大量用户对话历史及其对应的意图标签，是训练监督学习模型的核心数据源。2
近期计算机领域的热点技术 0dayNu1L 云计算量子计算人工智能
随着科技的飞速发展，计算机领域的新技术、新趋势层出不穷。本文将探讨近期计算机领域的几个热点技术趋势，并对它们进行简要的分析和展望。一、人工智能与机器学习人工智能（AI）和机器学习（ML）是近年来计算机领域最为热门的话题之一。AI和ML技术已经广泛应用于图像识别、自然语言处理、智能推荐等领域，并取得了显著的成果。随着技术的不断进步，AI和ML将更深入地渗透到各个行业，为人类社会带来更多便利和效益。在
计算机专业毕业设计题目推荐（新颖选题）本科计算机科学专业相关毕业设计选题大全✅ 会写代码的羊毕设选题课程设计计算机网络毕设选题毕设系统毕设题目计算机科学专业
文章目录前言最新毕设选题（建议收藏起来）本科计算机科学专业相关的毕业设计选题毕设作品推荐前言2025全新毕业设计项目博主介绍：✌全网粉丝10W+,CSDN全栈领域优质创作者，博客之星、掘金/华为云/阿里云等平台优质作者。技术范围：SpringBoot、Vue、SSM、HLMT、Jsp、PHP、Nodejs、Python、爬虫、数据可视化、小程序、大数据、机器学习等设计与开发。主要内容：免费功能设计
Linux安装Anaconda和Jupyter 硬水果糖人工智能 Linux linux jupyter 运维
一、了解Anaconda和Jupyter引言：Anaconda是一个流行的开源数据科学平台，广泛用于数据分析、机器学习、人工智能等领域。它是一个集成了大量科学计算和数据科学工具的Python和R编程语言环境。Anaconda的主要目标是简化数据科学和机器学习的开发流程，提供一个易于安装和管理的环境。而预装了大量常用的Python和R库，这些库涵盖了数据科学的各个方面，包括：数据分析：Pandas、
ChatGPT、DeepSeek、GIS与Python机器学习强强联合！地质灾害风险评估、易发性分析、信息化建库及灾后重建 WangYan2022 DeepSeek ChatGPT 地下水地质灾害 DeepSeek ChatGPT GIS 灾后重建
在地质灾害频繁肆虐的当下，精准开展风险评价刻不容缓。如今，一门极具创新性的教程震撼登场，它将ChatGPT、DeepSeek等前沿技术与GIS、Python以及机器学习深度交融，为学员打造出前所未有的学习体验，助力大家在地质灾害风险评价领域强势突围，一路领先。前沿技术融合，铸就智能学习核心动力教程最闪耀的亮点之一，便是大胆引入了ChatGPT和DeepSeek技术。它们恰似无所不能的“数据魔法师”
Hessian 矩阵是什么 ZhangJiQun&MXP 教学 2021 AI python 2024大模型以及算力矩阵线性代数算法人工智能机器学习
Hessian矩阵是什么目录Hessian矩阵是什么Hessian矩阵的性质及举例说明**1.对称性****2.正定性决定极值类型****特征值为2（正），因此原点(0,0)(0,0)(0,0)是极小值点。****3.牛顿法中的应用****4.特征值与曲率方向****5.机器学习中的实际意义**一、定义与公式二、实例分析Hessian矩阵是多元函数二阶偏导数构成的方阵，用于分析函数局部曲率、判断极
LoRA中黑塞矩阵、Fisher信息矩阵是什么 ZhangJiQun&MXP 教学 2021 论文 2024大模型以及算力矩阵机器学习人工智能 transformer 深度学习算法线性代数
LoRA中黑塞矩阵、Fisher信息矩阵是什么1.三者的核心概念黑塞矩阵（Hessian）二阶导数矩阵，用于优化问题中判断函数的凸性（如牛顿法），或计算参数更新方向（如拟牛顿法）。Fisher信息矩阵（FisherInformationMatrix,FIM）统计学中衡量参数估计的不确定性，反映数据中包含的关于参数的信息量。在机器学习中常用于自然梯度下降（NaturalGradientDescent
神经网络基础之正则化硬水果糖人工智能神经网络人工智能机器学习
引言：正则化（Regularization）是机器学习中一种用于防止模型过拟合技术。核心思想是通过在模型损失函数中添加一个惩罚项（PenaltyTerm），对模型的复杂度进行约束，从而提升模型在新数据上的泛化能力。一、正则化目的防止过拟合：当模型过于复杂（例如神经网络层数过多、参数过多）时，容易在训练数据上“记忆”噪声或细节，导致在测试数据上表现差。简化模型：正则化通过限制模型参数的大小或数量，迫
决策树算法全解析：从零基础到Titanic实战，一文搞定机器学习经典模型吴师兄大模型 0基础实现机器学习入门到精通算法机器学习决策树人工智能深度学习编程开发语言
Langchain系列文章目录01-玩转LangChain：从模型调用到Prompt模板与输出解析的完整指南02-玩转LangChainMemory模块：四种记忆类型详解及应用场景全覆盖03-全面掌握LangChain：从核心链条构建到动态任务分配的实战指南04-玩转LangChain：从文档加载到高效问答系统构建的全程实战05-玩转LangChain：深度评估问答系统的三种高效方法（示例生成、手
图像处理篇---图像预处理 Ronin-Lotus 图像处理篇深度学习篇程序代码篇图像处理人工智能 opencv python 深度学习计算机视觉
文章目录前言一、通用目的1.1数据标准化目的实现1.2噪声抑制目的实现高斯滤波中值滤波双边滤波1.3尺寸统一化目的实现1.4数据增强目的实现1.5特征增强目的实现：边缘检测直方图均衡化锐化二、分领域预处理2.1传统机器学习（如SVM、随机森林）2.1.1特点2.1.2预处理重点灰度化二值化形态学操作特征工程2.2深度学习（如CNN、Transformer）2.2.1特点2.2.2预处理重点通道顺序
【大模型科普】AIGC技术发展与应用实践（一文读懂AIGC）人工智能
【专栏介绍】⌈⌈⌈人工智能与大模型应用⌋⌋⌋人工智能（AI）通过算法模拟人类智能，利用机器学习、深度学习等技术驱动医疗、金融等领域的智能化。大模型是千亿参数的深度神经网络（如ChatGPT），经海量数据训练后能完成文本生成、图像创作等复杂任务，显著提升效率，但面临算力消耗、数据偏见等挑战。当前正加速与教育、科研融合，未来需平衡技术创新与伦理风险，推动可持续发展。文章目录一、AIGC概述（一）什么是
【产品小白】什么是AI产品经理百事不可口y 产品经理的一步一步人工智能产品经理学习产品运营内容运营用户运营
一、AI产品经理的定义与角色定位AI产品经理是人工智能技术与商业应用之间的核心桥梁，负责将复杂的AI技术转化为满足市场需求的产品。需同时具备技术理解力、商业洞察力和用户思维，既要参与算法选型与数据建模，又要定义产品功能与市场策略，是贯穿产品全生命周期的关键角色。与传统互联网产品经理相比，AI产品经理的独特之处在于：技术深度参与：需理解机器学习、自然语言处理（NLP）、计算机视觉等技术原理，并参与数
数据增强：扩充数据集提升模型泛化能力 AI天才研究院计算 AI大模型企业级应用开发实战 ChatGPT 计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
1.背景介绍1.1.数据增强的重要性在机器学习领域，模型的泛化能力至关重要。一个泛化能力强的模型能够在未见数据上表现良好，而过拟合的模型则会在训练数据上表现出色，但在新数据上表现糟糕。数据增强是一种有效提升模型泛化能力的技术，它通过对现有数据进行各种变换，人为地扩充数据集，从而增加训练数据的数量和多样性。1.2.数据增强的应用场景数据增强广泛应用于各种机器学习任务中，包括：图像识别:对图像进行旋转
