Python histogram fitting: fitting a curve to a histogram in Python

From the documentation of matplotlib.pyplot.hist:

Returns

n : array or list of arrays

The values of the histogram bins. See normed and weights for a description of the possible semantics. If input x is an array, then this is an array of length nbins. If input is a sequence arrays [data1, data2,..], then this is a list of arrays with the values of the histograms for each of the arrays in the same order.

bins : array

The edges of the bins. Length nbins + 1 (nbins left edges and right edge of last bin). Always a single array even when multiple data sets are passed in.

patches : list or list of lists

Silent list of individual patches used to create the histogram or list of such list if multiple input datasets.

As you can see, the second return value is actually the edges of the bins, so it contains one more item than there are bins.

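The simplest way to get the bin centers is:

import numpy as np
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

This adds half of each bin's width (computed with np.diff) to the bin's left border. The last border is excluded because it is the right border of the rightmost bin.

The result is the array of bin centers, which has the same length as n.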
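The length relationship between edges, counts and centers can be checked with a short sketch. The sample data, the seed and the choice of 20 bins are assumptions made only for illustration; np.histogram is used here so that no plot is needed.

```python
import numpy as np

# Hypothetical sample data, used only for illustration.
rng = np.random.default_rng(0)
sample = rng.normal(10, 5, size=1000)

# np.histogram returns the counts and the bin edges (one more edge than bins).
counts, edges = np.histogram(sample, bins=20)

# Left edge plus half the bin width gives the center of each bin.
centers = edges[:-1] + np.diff(edges) / 2

# Averaging neighbouring edges is an equivalent formulation.
centers_alt = (edges[:-1] + edges[1:]) / 2

print(len(edges), len(counts), len(centers))
```

Both formulations give the same centers; which one to use is a matter of taste.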

Note that if you have numba, you can speed up the borders-to-centers calculation:

import numba as nb

@nb.njit
def centers_from_borders_numba(b):
    centers = np.empty(b.size - 1, np.float64)
    for idx in range(b.size - 1):
        centers[idx] = b[idx] + (b[idx+1] - b[idx]) / 2
    return centers

def centers_from_borders(borders):
    return borders[:-1] + np.diff(borders) / 2

It is considerably faster:

bins = np.random.random(100000)
bins.sort()

# Make sure they are identical
np.testing.assert_array_equal(centers_from_borders_numba(bins), centers_from_borders(bins))

# Compare the timings (%timeit is an IPython/Jupyter magic command)
%timeit centers_from_borders_numba(bins)
# 36.9 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit centers_from_borders(bins)
# 150 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
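The %timeit magic only exists inside IPython or Jupyter. In a plain script, a similar measurement can be sketched with the standard library's timeit module. This sketch times only the NumPy version; the call count of 100 is an arbitrary assumption, and the numbers will differ from machine to machine.

```python
import timeit

import numpy as np


def centers_from_borders(borders):
    # Left border plus half the bin width.
    return borders[:-1] + np.diff(borders) / 2


rng = np.random.default_rng(0)
bins = np.sort(rng.random(100_000))

# Call the function n_calls times and report the average time per call.
n_calls = 100
total_seconds = timeit.timeit(lambda: centers_from_borders(bins), number=n_calls)
print(f"{total_seconds / n_calls * 1e6:.1f} µs per call")
```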

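Even though it is faster, numba is a heavy dependency that you should not add lightly. It is fun to play with and really fast, but below I will use the NumPy version because it will be more helpful to most future visitors.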

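As for the general task of fitting a function to a histogram: you need to define the function you want to fit to the data, and then you can use scipy.optimize.curve_fit. For example, if you want to fit a Gaussian curve:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit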

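Then define the function to fit and a sample dataset. The sample dataset is only for the purpose of this question; you should use your own dataset and define the function you want to fit:

def gaussian(x, mean, amplitude, standard_deviation):
    # The factor 0.5 makes standard_deviation the actual standard deviation
    return amplitude * np.exp(-0.5 * ((x - mean) / standard_deviation) ** 2)

x = np.random.normal(10, 5, size=10000)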

Fit the curve and plot it:

bin_heights, bin_borders, _ = plt.hist(x, bins='auto', label='histogram')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit')
plt.legend()
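The second return value of curve_fit, discarded above, is the covariance matrix of the fitted parameters. A minimal sketch of how it could be used to estimate parameter uncertainties follows. Several things here are assumptions for illustration and not part of the original example: the Gaussian is written with the conventional factor 1/2 in the exponent so that the third parameter is the standard deviation, the starting values are derived from the data instead of being fixed numbers, and the seed and the 50 bins are arbitrary.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, mean, amplitude, standard_deviation):
    # Conventional Gaussian shape; standard_deviation is the true sigma here.
    return amplitude * np.exp(-0.5 * ((x - mean) / standard_deviation) ** 2)


# Hypothetical sample data with known mean 10 and standard deviation 5.
rng = np.random.default_rng(42)
sample = rng.normal(10, 5, size=10000)

heights, borders = np.histogram(sample, bins=50)
centers = borders[:-1] + np.diff(borders) / 2

# Data-driven starting values (an assumption, not the p0 used in the text).
p0 = [sample.mean(), heights.max(), sample.std()]

popt, pcov = curve_fit(gaussian, centers, heights, p0=p0)

# The square roots of the diagonal are one-sigma parameter uncertainties.
perr = np.sqrt(np.diag(pcov))

fit_mean, fit_amplitude, fit_std = popt
print("mean:", fit_mean, "+/-", perr[0])
print("std: ", fit_std, "+/-", perr[2])
```

Starting values close to the data tend to make the optimizer converge more reliably than fixed numbers, which matters more for real datasets than for this synthetic one.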

Note that you can also use NumPy's np.histogram and Matplotlib's bar plot (plt.bar) instead. The difference is that np.histogram does not return the "patches" array, and that Matplotlib's bar plot needs the bin widths:

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit', c='red')
plt.legend()

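Of course, you can also fit other functions to a histogram. I generally like Astropy's models for fitting because you do not need to create the functions yourself, and it also supports compound models and different fitters.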

For example, to fit a Gaussian curve to the dataset using Astropy:

from astropy.modeling import models, fitting

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2

t_init = models.Gaussian1D()
fit_t = fitting.LevMarLSQFitter()
t = fit_t(t_init, bin_centers, bin_heights)

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.figure()
plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, t(x_interval_for_fit), label='fit', c='red')
plt.legend()

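You can fit a different model to the data just by replacing:

t_init = models.Gaussian1D()

with a different model. For example, a Lorentz1D (similar to a Gaussian but with wider tails):

t_init = models.Lorentz1D()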

With my sample data this is not a good model, but if there is already an Astropy model that matches your needs, it is really easy to use.
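One way to put a number on "not a good model" is to compare the sum of squared residuals of both fits. The sketch below does this with scipy.optimize.curve_fit and two hand-written functions rather than with Astropy. The lorentzian helper, its parameter names, the data-driven starting values, the seed and the 50 bins are all assumptions made for illustration; the helper follows the usual Lorentzian shape and is not Astropy's implementation.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, mean, amplitude, standard_deviation):
    # Conventional Gaussian with the factor 1/2 in the exponent.
    return amplitude * np.exp(-0.5 * ((x - mean) / standard_deviation) ** 2)


def lorentzian(x, center, amplitude, fwhm):
    # Hypothetical helper: Lorentzian peak with full width at half maximum fwhm.
    half_width = fwhm / 2
    return amplitude * half_width ** 2 / ((x - center) ** 2 + half_width ** 2)


# Hypothetical normally distributed sample data.
rng = np.random.default_rng(1)
sample = rng.normal(10, 5, size=100_000)

heights, borders = np.histogram(sample, bins=50)
centers = borders[:-1] + np.diff(borders) / 2

# Data-driven starting values; 2.355 * sigma approximates a Gaussian's FWHM.
start_gauss = [sample.mean(), heights.max(), sample.std()]
start_lorentz = [sample.mean(), heights.max(), 2.355 * sample.std()]

popt_gauss, _ = curve_fit(gaussian, centers, heights, p0=start_gauss)
popt_lorentz, _ = curve_fit(lorentzian, centers, heights, p0=start_lorentz)

# A smaller sum of squared residuals means the curve follows the bins more closely.
ssr_gauss = np.sum((heights - gaussian(centers, *popt_gauss)) ** 2)
ssr_lorentz = np.sum((heights - lorentzian(centers, *popt_lorentz)) ** 2)

print("Gaussian SSR:  ", ssr_gauss)
print("Lorentzian SSR:", ssr_lorentz)
```

For normally distributed data the Gaussian should come out with the clearly smaller residual, because the Lorentzian's wide tails overshoot the outer bins.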
