Several Ways to Sample in Python

1) random.sample(population,k):

    Chooses k unique random elements from a population sequence.  (Older
    Python versions also accepted a set; since Python 3.11 the population
    must be a sequence.)
    
    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).
    
    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.
    
    To choose a sample in a range of integers, use range as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(range(10000000), 60)
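
A minimal sketch of the two points above: slicing the result into prize tiers, and sampling from a large `range` without building a list. The ticket numbers, the variable names, and the seed value are assumptions made for illustration only.

```python
import random

# A private, seeded generator keeps the sketch repeatable (the seed is arbitrary).
rng = random.Random(42)

# Hypothetical raffle tickets numbered 1..100.
tickets = list(range(1, 101))

# Draw 5 distinct winners; the original list is left unchanged.
winners = rng.sample(tickets, 5)

# The result is in selection order, so any slice is itself a valid random sample.
grand_prize = winners[:1]
second_place = winners[1:]

# Sampling from a range object avoids materialising ten million integers.
big_sample = rng.sample(range(10_000_000), 60)

print(grand_prize, second_place)
print(len(big_sample), len(set(big_sample)))
```

Because sampling is without replacement, `winners` never contains the same ticket twice, and `tickets` still holds all 100 entries afterwards.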

2) numpy.random.choice(a,size=None,replace=True,p=None):

    Parameters
    -----------
    a : 1-D array-like or int
        If an ndarray, a random sample is generated from its elements.
        If an int, the random sample is generated as if a were np.arange(a)
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  Default is None, in which case a
        single value is returned.
    replace : boolean, optional
        Whether the sample is with or without replacement.  Default is
        True (with replacement).
    p : 1-D array-like, optional
        The probabilities associated with each entry in a.
        If not given the sample assumes a uniform distribution over all
        entries in a.
    
    Returns
    --------
    samples : single item or ndarray
        The generated random samples
    
    Raises
    -------
    ValueError
        If a is an int and less than zero, if a or p are not 1-dimensional,
        if a is an array-like of size 0, if p is not a vector of
        probabilities, if a and p have different lengths, or if
        replace=False and the sample size is greater than the population
        size
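
The parameters above can be tried out with a short sketch like the following. The `colors` array, the probability vector, and the seed are assumptions chosen for illustration; `replace` is passed explicitly so the behaviour is visible at each call.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical population.
colors = np.array(["red", "green", "blue", "yellow"])

# With replacement: the same element may appear more than once.
with_repl = np.random.choice(colors, size=6, replace=True)

# Without replacement: all drawn elements are distinct.
without_repl = np.random.choice(colors, size=3, replace=False)

# Non-uniform sampling: p must have the same length as a and sum to 1.
weighted = np.random.choice(colors, size=1000, replace=True,
                            p=[0.7, 0.1, 0.1, 0.1])

# An int population behaves like np.arange(5); a tuple size gives a 2-D result.
grid = np.random.choice(5, size=(2, 3), replace=True)

# Asking for more items than exist without replacement raises ValueError.
try:
    np.random.choice(colors, size=10, replace=False)
    oversample_error = None
except ValueError as exc:
    oversample_error = exc

print(with_repl, without_repl, grid.shape)
print(type(oversample_error).__name__)
```

With `p` set as above, roughly 70% of the entries in `weighted` should be `"red"`, though the exact count depends on the seed.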

3) pandas.DataFrame.sample(n=None,frac=None,replace=False,weights=None,random_state=None,axis=None):

    Parameters
    ----------
    n : int, optional
        Number of items from axis to return. Cannot be used with `frac`.
        Default = 1 if `frac` = None.
    frac : float, optional
        Fraction of axis items to return. Cannot be used with `n`.
    replace : boolean, optional
        Sample with or without replacement. Default = False.
    weights : str or ndarray-like, optional
        Default 'None' results in equal probability weighting.
        If passed a Series, will align with target object on index. Index
        values in weights not found in sampled object will be ignored and
        index values in sampled object not in weights will be assigned
        weights of zero.
        If called on a DataFrame, will accept the name of a column
        when axis = 0.
        Unless weights are a Series, weights must be same length as axis
        being sampled.
        If weights do not sum to 1, they will be normalized to sum to 1.
        Missing values in the weights column will be treated as zero.
        inf and -inf values not allowed.
    random_state : int or numpy.random.RandomState, optional
        Seed for the random number generator (if int), or numpy RandomState
        object.
    axis : int or string, optional
        Axis to sample. Accepts axis number or name. Default is stat axis
        for given data type (0 for Series and DataFrames, 1 for Panels).
    
    Returns
    -------
    A new object of same type as caller.
    
    Examples
    --------
    Generate an example ``Series`` and ``DataFrame``:
    
    >>> s = pd.Series(np.random.randn(50))
    >>> s.head()
    0   -0.038497
    1    1.820773
    2   -0.972766
    3   -1.598270
    4   -1.095526
    dtype: float64
    >>> df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
    >>> df.head()
              A         B         C         D
    0  0.016443 -2.318952 -0.566372 -1.028078
    1 -1.051921  0.438836  0.658280 -0.175797
    2 -1.243569 -0.364626 -0.215065  0.057736
    3  1.768216  0.404512 -0.385604 -1.457834
    4  1.072446 -1.137172  0.314194 -0.046661
    
    Next extract a random sample from both of these objects...
    
    3 random elements from the ``Series``:
    
    >>> s.sample(n=3)
    27   -0.994689
    55   -1.049016
    67   -0.224565
    dtype: float64
    
    And a random 10% of the ``DataFrame`` with replacement:
    
    >>> df.sample(frac=0.1, replace=True)
               A         B         C         D
    35  1.981780  0.142106  1.817165 -0.290805
    49 -1.336199 -0.448634 -0.789640  0.217116
    40  0.823173 -0.078816  1.009536  1.015108
    15  1.421154 -0.055301 -1.922594 -0.019696
    6  -0.148339  0.832938  1.787600 -1.383767
    
    You can use `random_state` for reproducibility:
    
    >>> df.sample(n=5, random_state=1)
               A         B         C         D
    37 -2.027662  0.103611  0.237496 -0.165867
    43 -0.259323 -0.583426  1.516140 -0.479118
    12 -1.686325 -0.579510  0.985195 -0.460286
    8   1.167946  0.429082  1.215742 -1.636041
    9   1.197475 -0.864188  1.554031 -1.505264
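
The docstring examples do not exercise `weights` or `axis`, so here is a small sketch that does. The frame, its column names `item` and `weight`, and the seed value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the "weight" column holds unnormalised sampling weights.
df = pd.DataFrame({
    "item": list("abcdef"),
    "weight": [0.0, 1.0, 2.0, 3.0, np.nan, 4.0],
})

# Pass a column name as weights (axis=0). Weights are normalised to sum to 1;
# rows with a zero or missing weight cannot be selected.
picked = df.sample(n=3, weights="weight", random_state=1)

# frac selects a fraction of the rows instead of a fixed count.
half = df.sample(frac=0.5, random_state=1)

# The same random_state gives the same rows again.
again = df.sample(frac=0.5, random_state=1)

# axis=1 samples columns rather than rows.
one_col = df.sample(n=1, axis=1, random_state=1)

print(picked)
print(half.equals(again))
print(one_col.columns.tolist())
```

Rows `a` (weight 0) and `e` (missing weight) never appear in `picked`, which is a quick way to check that the weights were applied.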
