Python数据处理pandas、numpy等第三方库函数笔记(持续更新)

说明

因为在平时学习中,对于pandas、numpy等python库的一些函数用法时常忘记,特在此做个汇总与整理,便于下次查找~

numpy

  1. nonzero(a)
    :返回非零的数组下标
    举例:

    >>x = np.array([[1,0,0], [0,2,0], [1,1,0]])
    >>x
    array([[1, 0, 0],
    [0, 2, 0],
    [1, 1, 0]])
    >>np.nonzero(x)
    (array([0, 1, 2, 2], dtype=int64), array([0, 1, 0, 1], dtype=int64))

  2. in1d
    :Test whether each element of a 1-D array is also present in a second array.
    Returns a boolean array the same length as ar1 that is True where an element of ar1 is in ar2 and False otherwise.
    举例:

    >> test = np.array([0, 1, 2, 5, 0])
    >> states = [0, 2]
    >> mask = np.in1d(test, states)
    >> mask
    array([ True, False, True, False, True], dtype=bool)
    >> test[mask]
    array([0, 2, 0])
    >> mask = np.in1d(test, states, invert=True)
    >> mask
    array([False, True, False, True, False], dtype=bool)
    >> test[mask]
    array([1, 5])

train_test_split


  1. sklearn.model_selection.train_test_split(*arrays, **options)
    举例:

>> import numpy as np
>> from sklearn.model_selection import train_test_split
>> X, y = np.arange(10).reshape((5, 2)), range(5)
>> X
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>> list(y)
[0, 1, 2, 3, 4]
>> X_train, X_test, y_train, y_test = train_test_split(
… X, y, test_size=0.33, random_state=42)

>> X_train
array([[4, 5],
[0, 1],
[6, 7]])
>> y_train
[2, 0, 3]
>> X_test
array([[2, 3],
[8, 9]])
>> y_test
[1, 4]

GroupKFold


  1. sklearn.model_selection.GroupKFold(n_splits=3)
    举例:

>> from sklearn.model_selection import GroupKFold
>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
>> y = np.array([1, 2, 3, 4])
>> groups = np.array([0, 0, 2, 2])
>> group_kfold = GroupKFold(n_splits=2)
>> group_kfold.get_n_splits(X, y, groups)
>> print(group_kfold)
GroupKFold(n_splits=2)
>> for train_index, test_index in group_kfold.split(X, y, groups):
… print(“TRAIN:”, train_index, “TEST:”, test_index)
… X_train, X_test = X[train_index], X[test_index]
… y_train, y_test = y[train_index], y[test_index]
… print(X_train, X_test, y_train, y_test)

TRAIN: [0 1] TEST: [2 3]
[[1 2]
[3 4]] [[5 6]
[7 8]] [1 2] [3 4]
TRAIN: [2 3] TEST: [0 1]
[[5 6]
[7 8]] [[1 2]
[3 4]] [3 4] [1 2]

shuffle()

返回随机排序后的序列。
numpy.random.shuffle(x)
Examples

>> arr = np.arange(10)
>> np.random.shuffle(arr)
>> arr
[1 7 5 2 9 4 3 6 0 8]
Multi-dimensional arrays are only shuffled along the first axis:

>> arr = np.arange(9).reshape((3, 3))
>> np.random.shuffle(arr)
>> arr
array([[3, 4, 5],
[6, 7, 8],
[0, 1, 2]])

array、asarray

array和asarray都可以将结构数据转化为ndarray,但是主要区别就是当数据源是ndarray时,array仍然会copy出一个副本,占用新的内存,但asarray不会。所以如下:
Python数据处理pandas、numpy等第三方库函数笔记(持续更新)_第1张图片

range()、arange()

range生成一个序列,arange生成一个ndarray

numpy.random.rand(m,n)

返回m*n维的数组,数值取件(0,1)

你可能感兴趣的:(机器学习,python,numpy,pandas)