Basic Sampling Algorithms and Their Python Implementation

The code for this article is available at: code

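Every now and then, sampling algorithms show up in machine learning and even in deep neural networks. This article focuses only on simple, entry-level sampling algorithms and their Python implementations.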

  • use more samples for the more complicated model

Simple random sampling

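All observation units of the surveyed population are numbered, and a subset of them is then drawn at random by lot (without replacement) to form the sample. (Drawing k units out of n gives $C_n^k$ possible samples.)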

  • Advantage: simple to carry out.

  • Disadvantage: when the population is large, numbering every unit is impractical.
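
As a small check on the counting behind sampling without replacement, the sketch below enumerates every possible sample of size k from a tiny population and compares the count with the binomial coefficient. The population and the name `all_possible_samples` are made up for illustration; under simple random sampling each of these subsets is equally likely. `math.comb` requires Python 3.8 or later.

```python
import itertools
import math

def all_possible_samples(population, k):
    """Enumerate every size-k subset that sampling without replacement could yield."""
    return list(itertools.combinations(population, k))

population = ['a', 'b', 'c', 'd', 'e', 'f']   # illustrative population, n = 6
subsets = all_possible_samples(population, 3)

# The number of distinct samples equals the binomial coefficient C(6, 3).
print(len(subsets), math.comb(len(population), 3))
```

Enumeration like this is only feasible for very small populations, since the number of subsets grows very quickly with n.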


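import random

def loadDataset(filename):
    # Read a tab-separated file into a list of rows (each row is a list of strings).
    dataMat = []
    with open(filename) as fp:
        for line in fp:
            curLine = line.strip().split('\t')
            dataMat.append(curLine)
    return dataMat

def simple_sampling(dataMat, num):
    # Draw num distinct rows without replacement.
    try:
        return random.sample(dataMat, num)
    except ValueError:
        # random.sample raises ValueError when num exceeds the population size;
        # in that case the function prints a message and returns None.
        print('sample larger than population')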

Systematic sampling

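Also known as mechanical sampling or equal-interval sampling: the observation units of the population are first put in some order and divided into n parts, and then one unit is drawn from each part, at equal intervals, to form the sample.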

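  • Advantage: easy to understand and simple to carry out.

  • Disadvantage: when the population is periodic or has an increasing or decreasing trend, the sample is prone to bias. (Perhaps shuffle before sampling?)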

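def systematic_sampling(dataMat, num):
    # Block size; requires 0 < num <= len(dataMat), otherwise the blocks are empty.
    k = len(dataMat) // num
    # Draw one random unit from each of the num consecutive blocks of size k.
    samples = [random.choice(dataMat[i*k:(i+1)*k]) for i in range(num)]
    return samples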
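
The function above draws one random unit from each block. A common textbook variant of systematic sampling instead picks a random starting point and then takes every k-th unit. The sketch below is one way this could look; `interval_sampling` and the toy periodic population are assumptions for illustration, not part of the original code. It also illustrates the periodicity drawback: when the period of the data matches the interval, every sampled unit has the same value, and shuffling a copy first removes that pattern.

```python
import random

def interval_sampling(data, num, start=None):
    """Take every k-th unit after a random start, where k = len(data) // num."""
    if num <= 0 or num > len(data):
        raise ValueError('num must be between 1 and len(data)')
    k = len(data) // num
    if start is None:
        start = random.randrange(k)       # random offset inside the first block
    return [data[start + i * k] for i in range(num)]

# Toy population with period 4: 0, 1, 2, 3, 0, 1, 2, 3, ...
periodic = [i % 4 for i in range(20)]

# The interval k = 20 // 5 = 4 matches the period, so all sampled values coincide.
biased = interval_sampling(periodic, 5)
print(biased)

# Shuffling a copy first breaks the periodic pattern before sampling.
shuffled = periodic[:]
random.shuffle(shuffled)
print(interval_sampling(shuffled, 5))
```

Shuffling removes the dependence on the original ordering, at which point the procedure behaves much like simple random sampling.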

Random sampling with replacement (repetition random sampling)

def repetition_random_sampling(dataMat, num):
    # Each draw is independent, so the same unit may be picked more than once.
    samples = [random.choice(dataMat) for i in range(num)]
    return samples
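
The standard library also offers `random.choices` (Python 3.6+), which draws with replacement in a single call. The sketch below is a minimal alternative, with `sample_with_replacement` as a hypothetical name. Because each draw is independent, the sample may contain repeats and may even be larger than the population.

```python
import random

def sample_with_replacement(data, num):
    """Draw num units independently; the same unit may appear more than once."""
    if not data:
        raise ValueError('population is empty')
    return random.choices(data, k=num)

rows = [['a', '1'], ['b', '2'], ['c', '3']]   # illustrative rows

# Asking for more units than the population holds is fine with replacement.
print(sample_with_replacement(rows, 5))
```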

Test

Data set:

a   1
b   2
c   3
d   4
e   5
f   6
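
To reproduce the test, the data set has to exist as a file. The sketch below writes the six rows shown above to disk, assuming the two columns are separated by a tab (which is what the loader splits on) and that the file is named `data.txt`; `write_sample_data` is a hypothetical helper.

```python
def write_sample_data(path):
    """Write the six-row demo data set as tab-separated lines."""
    rows = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5), ('f', 6)]
    with open(path, 'w') as fp:
        for label, value in rows:
            fp.write(f'{label}\t{value}\n')

write_sample_data('./data.txt')
```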

Client code:

def main():
    dataMat = loadDataset('./data.txt')
    # print(simple_sampling(dataMat, 3))
    # simple_sampling(dataMat, 7)   # more than the 6 rows: prints the error message
    # print(systematic_sampling(dataMat, 2))
    print(repetition_random_sampling(dataMat, 3))

if __name__ == '__main__':
    main()

