sklearn.cluster.estimate_bandwidth(X, quantile=0.3, n_samples=None, random_state=0, n_jobs=1)
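Literal meaning: estimate the bandwidth.
Estimate the bandwidth to use with the mean-shift algorithm.
The result is the kernel bandwidth that mean-shift needs, for example the bandwidth argument of sklearn.cluster.MeanShift.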
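Note that this function takes time at least quadratic in n_samples. For large datasets, it's wise to set that parameter to a small value.
In other words, the running time grows at least with the square of the number of samples used.
For large datasets, it is best to set n_samples to a small value.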
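As a minimal sketch of that advice, the snippet below caps the number of rows used for the estimate. The synthetic array X and the sizes 5000 and 500 are made up for illustration; they are not from the scikit-learn documentation.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth

# Synthetic data, purely illustrative: 5000 points in 2 dimensions.
rng = np.random.RandomState(0)
X = rng.normal(size=(5000, 2))

# Estimate the bandwidth from a random subsample of 500 rows
# instead of all 5000, which keeps the quadratic cost manageable.
bw_sub = estimate_bandwidth(X, quantile=0.3, n_samples=500, random_state=0)
print("bandwidth from 500 sampled rows:", bw_sub)
```

The estimate from a subsample will generally differ a little from the one computed on the full data, so this trades some accuracy for speed.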
Parameters:
X : array-like, shape=[n_samples, n_features]
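X: an array of shape [number of samples, number of features], i.e. one row per sample and one column per feature.
The argument passed in looks like this: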
>>> X
array([[ 0.5337214 , -0.32436143],
[ 0.9196253 , -0.14691451],
[-0.92399022, -0.74531192],
...,
[-1.07208459, -0.82480682],
[ 0.82008655, 1.87431013],
[-1.57024603, -2.00017509]])
quantile : float, default 0.3
Should be in the range [0, 1]. 0.5 means that the median of all pairwise distances is used.
quantile: a float, default 0.3.
A larger quantile picks a larger distance and therefore gives a larger bandwidth.
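To see how quantile affects the result, here is a small sketch that calls the function with three different values on the same data. The data set and the specific quantile values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth

# Synthetic data, purely illustrative.
rng = np.random.RandomState(42)
X = rng.normal(size=(400, 2))

# Same data and seed, only the quantile changes.
bw_low = estimate_bandwidth(X, quantile=0.1, random_state=0)
bw_mid = estimate_bandwidth(X, quantile=0.3, random_state=0)
bw_high = estimate_bandwidth(X, quantile=0.5, random_state=0)

for q, bw in [(0.1, bw_low), (0.3, bw_mid), (0.5, bw_high)]:
    print("quantile=%.1f -> bandwidth=%.4f" % (q, bw))
```

A small bandwidth tends to make mean-shift find many small clusters, while a large one merges them, so quantile is usually the knob to tune first.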
n_samples : int, optional
The number of samples to use. If not given, all samples are used.
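The bandwidth is estimated from this many samples drawn at random from X; if it is not specified, all samples are used.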
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
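Literal meaning: random state.
Accepted values: an int, a RandomState instance, or None. The description gives None as the default, although the signature at the top shows random_state=0.
If it is an int, the integer seeds the random number generator, so the same integer reproduces the same random choices.
If it is a RandomState instance, that instance is used directly as the random number generator.
If it is None, the global RandomState instance behind np.random is used.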
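The effect of a fixed seed can be checked with a short sketch like the one below. The data and the seed value 7 are arbitrary assumptions; the point is only that two calls with the same integer seed pick the same subsample.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth

# Synthetic data, purely illustrative.
rng = np.random.RandomState(1)
X = rng.normal(size=(1000, 2))

# random_state only matters when rows are subsampled via n_samples.
bw_a = estimate_bandwidth(X, n_samples=200, random_state=7)
bw_b = estimate_bandwidth(X, n_samples=200, random_state=7)

# A RandomState instance can be passed instead of an int.
bw_c = estimate_bandwidth(X, n_samples=200,
                          random_state=np.random.RandomState(7))

print(bw_a, bw_b, bw_c)
```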
n_jobs : int, optional (default = 1)
The number of parallel jobs to run for neighbors search. If -1, then the number of jobs is set to the number of CPU cores.
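This is the number of parallel jobs for the nearest-neighbor search; if it is -1, one job per CPU core is used.
Roughly speaking, it is the number of parallel workers (whether these are threads or processes depends on the backend).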
Returns:
bandwidth : float
A float: the estimated bandwidth.
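Putting it together, a typical use is to feed the returned value into MeanShift. The following is a sketch under assumptions: the two-blob data set from make_blobs and the quantile of 0.2 are chosen only for illustration.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Two well-separated synthetic blobs, purely illustrative.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5]],
                  cluster_std=0.5, random_state=0)

# Step 1: estimate the bandwidth from the data.
bandwidth = estimate_bandwidth(X, quantile=0.2, random_state=0)

# Step 2: run mean-shift with that bandwidth.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = ms.fit_predict(X)

n_clusters = len(set(labels))
print("bandwidth:", bandwidth)
print("clusters found:", n_clusters)
```

If the number of clusters looks wrong, adjusting quantile and re-estimating the bandwidth is usually the first thing to try.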