skewness = skewness[abs(skewness) > 0.75]
print("There are {} skewed numerical features to Box Cox transform".format(skewness.shape[0]))
from scipy.special import boxcox1p
skewed_features = skewness.index
lam = 0.15
for feat in skewed_features:
#all_data[feat] += 1
all_data[feat] = boxcox1p(all_data[feat], lam)
help(boxcox1p)
Help on ufunc object:
boxcox1p = class ufunc(builtins.object)
| Functions that operate element by element on whole arrays.
|
| To see the documentation for a specific ufunc, use `info`. For
| example, ``np.info(np.sin)``. Because ufuncs are written in C
| (for speed) and linked into Python with NumPy's ufunc facility,
| Python's help() function finds this page whenever help() is called
| on a ufunc.
|
| A detailed explanation of ufuncs can be found in the docs for :ref:`ufuncs`.
|
| Calling ufuncs:
| ===============
|
| op(*x[, out], where=True, **kwargs)
| Apply `op` to the arguments `*x` elementwise, broadcasting the arguments.
|
| The broadcasting rules are:
|
| * Dimensions of length 1 may be prepended to either array.
| * Arrays may be repeated along dimensions of length 1.
|
| Parameters
| ----------
| *x : array_like
| Input arrays.
| out : ndarray, None, or tuple of ndarray and None, optional
| Alternate array object(s) in which to put the result; if provided, it
| must have a shape that the inputs broadcast to. A tuple of arrays
| (possible only as a keyword argument) must have length equal to the
| number of outputs; use `None` for outputs to be allocated by the ufunc.
| where : array_like, optional
| Values of True indicate to calculate the ufunc at that position, values
| of False indicate to leave the value in the output alone.
| **kwargs
| For other keyword-only arguments, see the :ref:`ufunc docs `.
|
| Returns
| -------
| r : ndarray or tuple of ndarray
| `r` will have the shape that the arrays in `x` broadcast to; if `out` is
| provided, `r` will be equal to `out`. If the function has more than one
| output, then the result will be a tuple of arrays.
|
| Methods defined here:
|
| __call__(self, /, *args, **kwargs)
| Call self as a function.
|
| __repr__(self, /)
| Return repr(self).
|
| __str__(self, /)
| Return str(self).
|
| accumulate(...)
| accumulate(array, axis=0, dtype=None, out=None, keepdims=None)
|
| Accumulate the result of applying the operator to all elements.
|
| For a one-dimensional array, accumulate produces results equivalent to::
|
| r = np.empty(len(A))
| t = op.identity # op = the ufunc being applied to A's elements
| for i in range(len(A)):
| t = op(t, A[i])
| r[i] = t
| return r
|
| For example, add.accumulate() is equivalent to np.cumsum().
|
| For a multi-dimensional array, accumulate is applied along only one
| axis (axis zero by default; see Examples below) so repeated use is
| necessary if one wants to accumulate over multiple axes.
|
| Parameters
| ----------
| array : array_like
| The array to act on.
| axis : int, optional
| The axis along which to apply the accumulation; default is zero.
| dtype : data-type code, optional
| The data-type used to represent the intermediate results. Defaults
| to the data-type of the output array if such is provided, or the
| the data-type of the input array if no output array is provided.
| out : ndarray, None, or tuple of ndarray and None, optional
| A location into which the result is stored. If not provided or `None`,
| a freshly-allocated array is returned. For consistency with
| :ref:`ufunc.__call__`, if given as a keyword, this may be wrapped in a
| 1-element tuple.
|
| .. versionchanged:: 1.13.0
| Tuples are allowed for keyword argument.
| keepdims : bool
| Has no effect. Deprecated, and will be removed in future.
|
| Returns
| -------
| r : ndarray
| The accumulated values. If `out` was supplied, `r` is a reference to
| `out`.
|
| Examples
| --------
| 1-D array examples:
|
| >>> np.add.accumulate([2, 3, 5])
| array([ 2, 5, 10])
| >>> np.multiply.accumulate([2, 3, 5])
| array([ 2, 6, 30])
|
| 2-D array examples:
|
| >>> I = np.eye(2)
| >>> I
| array([[ 1., 0.],
| [ 0., 1.]])
|
| Accumulate along axis 0 (rows), down columns:
|
| >>> np.add.accumulate(I, 0)
| array([[ 1., 0.],
| [ 1., 1.]])
| >>> np.add.accumulate(I) # no axis specified = axis zero
| array([[ 1., 0.],
| [ 1., 1.]])
|
| Accumulate along axis 1 (columns), through rows:
|
| >>> np.add.accumulate(I, 1)
| array([[ 1., 1.],
| [ 0., 1.]])
|
| at(...)
| at(a, indices, b=None)
|
| Performs unbuffered in place operation on operand 'a' for elements
| specified by 'indices'. For addition ufunc, this method is equivalent to
| `a[indices] += b`, except that results are accumulated for elements that
| are indexed more than once. For example, `a[[0,0]] += 1` will only
| increment the first element once because of buffering, whereas
| `add.at(a, [0,0], 1)` will increment the first element twice.
|
| .. versionadded:: 1.8.0
|
| Parameters
| ----------
| a : array_like
| The array to perform in place operation on.
| indices : array_like or tuple
| Array like index object or slice object for indexing into first
| operand. If first operand has multiple dimensions, indices can be a
| tuple of array like index objects or slice objects.
| b : array_like
| Second operand for ufuncs requiring two operands. Operand must be
| broadcastable over first operand after indexing or slicing.
|
| Examples
| --------
| Set items 0 and 1 to their negative values:
|
| >>> a = np.array([1, 2, 3, 4])
| >>> np.negative.at(a, [0, 1])
| >>> print(a)
| array([-1, -2, 3, 4])
|
| ::
|
| Increment items 0 and 1, and increment item 2 twice:
|
| >>> a = np.array([1, 2, 3, 4])
| >>> np.add.at(a, [0, 1, 2, 2], 1)
| >>> print(a)
| array([2, 3, 5, 4])
|
| ::
|
| Add items 0 and 1 in first array to second array,
| and store results in first array:
|
| >>> a = np.array([1, 2, 3, 4])
| >>> b = np.array([1, 2])
| >>> np.add.at(a, [0, 1], b)
| >>> print(a)
| array([2, 4, 3, 4])
|
| outer(...)
| outer(A, B, **kwargs)
|
| Apply the ufunc `op` to all pairs (a, b) with a in `A` and b in `B`.
|
| Let ``M = A.ndim``, ``N = B.ndim``. Then the result, `C`, of
| ``op.outer(A, B)`` is an array of dimension M + N such that:
|
| .. math:: C[i_0, ..., i_{M-1}, j_0, ..., j_{N-1}] =
| op(A[i_0, ..., i_{M-1}], B[j_0, ..., j_{N-1}])
|
| For `A` and `B` one-dimensional, this is equivalent to::
|
| r = empty(len(A),len(B))
| for i in range(len(A)):
| for j in range(len(B)):
| r[i,j] = op(A[i], B[j]) # op = ufunc in question
|
| Parameters
| ----------
| A : array_like
| First array
| B : array_like
| Second array
| kwargs : any
| Arguments to pass on to the ufunc. Typically `dtype` or `out`.
|
| Returns
| -------
| r : ndarray
| Output array
|
| See Also
| --------
| numpy.outer
|
| Examples
| --------
| >>> np.multiply.outer([1, 2, 3], [4, 5, 6])
| array([[ 4, 5, 6],
| [ 8, 10, 12],
| [12, 15, 18]])
|
| A multi-dimensional example:
|
| >>> A = np.array([[1, 2, 3], [4, 5, 6]])
| >>> A.shape
| (2, 3)
| >>> B = np.array([[1, 2, 3, 4]])
| >>> B.shape
| (1, 4)
| >>> C = np.multiply.outer(A, B)
| >>> C.shape; C
| (2, 3, 1, 4)
| array([[[[ 1, 2, 3, 4]],
| [[ 2, 4, 6, 8]],
| [[ 3, 6, 9, 12]]],
| [[[ 4, 8, 12, 16]],
| [[ 5, 10, 15, 20]],
| [[ 6, 12, 18, 24]]]])
|
| reduce(...)
| reduce(a, axis=0, dtype=None, out=None, keepdims=False)
|
| Reduces `a`'s dimension by one, by applying ufunc along one axis.
|
| Let :math:`a.shape = (N_0, ..., N_i, ..., N_{M-1})`. Then
| :math:`ufunc.reduce(a, axis=i)[k_0, ..,k_{i-1}, k_{i+1}, .., k_{M-1}]` =
| the result of iterating `j` over :math:`range(N_i)`, cumulatively applying
| ufunc to each :math:`a[k_0, ..,k_{i-1}, j, k_{i+1}, .., k_{M-1}]`.
| For a one-dimensional array, reduce produces results equivalent to:
| ::
|
| r = op.identity # op = ufunc
| for i in range(len(A)):
| r = op(r, A[i])
| return r
|
| For example, add.reduce() is equivalent to sum().
|
| Parameters
| ----------
| a : array_like
| The array to act on.
| axis : None or int or tuple of ints, optional
| Axis or axes along which a reduction is performed.
| The default (`axis` = 0) is perform a reduction over the first
| dimension of the input array. `axis` may be negative, in
| which case it counts from the last to the first axis.
|
| .. versionadded:: 1.7.0
|
| If this is `None`, a reduction is performed over all the axes.
| If this is a tuple of ints, a reduction is performed on multiple
| axes, instead of a single axis or all the axes as before.
|
| For operations which are either not commutative or not associative,
| doing a reduction over multiple axes is not well-defined. The
| ufuncs do not currently raise an exception in this case, but will
| likely do so in the future.
| dtype : data-type code, optional
| The type used to represent the intermediate results. Defaults
| to the data-type of the output array if this is provided, or
| the data-type of the input array if no output array is provided.
| out : ndarray, None, or tuple of ndarray and None, optional
| A location into which the result is stored. If not provided or `None`,
| a freshly-allocated array is returned. For consistency with
| :ref:`ufunc.__call__`, if given as a keyword, this may be wrapped in a
| 1-element tuple.
|
| .. versionchanged:: 1.13.0
| Tuples are allowed for keyword argument.
| keepdims : bool, optional
| If this is set to True, the axes which are reduced are left
| in the result as dimensions with size one. With this option,
| the result will broadcast correctly against the original `arr`.
|
| .. versionadded:: 1.7.0
|
| Returns
| -------
| r : ndarray
| The reduced array. If `out` was supplied, `r` is a reference to it.
|
| Examples
| --------
| >>> np.multiply.reduce([2,3,5])
| 30
|
| A multi-dimensional array example:
|
| >>> X = np.arange(8).reshape((2,2,2))
| >>> X
| array([[[0, 1],
| [2, 3]],
| [[4, 5],
| [6, 7]]])
| >>> np.add.reduce(X, 0)
| array([[ 4, 6],
| [ 8, 10]])
| >>> np.add.reduce(X) # confirm: default axis value is 0
| array([[ 4, 6],
| [ 8, 10]])
| >>> np.add.reduce(X, 1)
| array([[ 2, 4],
| [10, 12]])
| >>> np.add.reduce(X, 2)
| array([[ 1, 5],
| [ 9, 13]])
|
| reduceat(...)
| reduceat(a, indices, axis=0, dtype=None, out=None)
|
| Performs a (local) reduce with specified slices over a single axis.
|
| For i in ``range(len(indices))``, `reduceat` computes
| ``ufunc.reduce(a[indices[i]:indices[i+1]])``, which becomes the i-th
| generalized "row" parallel to `axis` in the final result (i.e., in a
| 2-D array, for example, if `axis = 0`, it becomes the i-th row, but if
| `axis = 1`, it becomes the i-th column). There are three exceptions to this:
|
| * when ``i = len(indices) - 1`` (so for the last index),
| ``indices[i+1] = a.shape[axis]``.
| * if ``indices[i] >= indices[i + 1]``, the i-th generalized "row" is
| simply ``a[indices[i]]``.
| * if ``indices[i] >= len(a)`` or ``indices[i] < 0``, an error is raised.
|
| The shape of the output depends on the size of `indices`, and may be
| larger than `a` (this happens if ``len(indices) > a.shape[axis]``).
|
| Parameters
| ----------
| a : array_like
| The array to act on.
| indices : array_like
| Paired indices, comma separated (not colon), specifying slices to
| reduce.
| axis : int, optional
| The axis along which to apply the reduceat.
| dtype : data-type code, optional
| The type used to represent the intermediate results. Defaults
| to the data type of the output array if this is provided, or
| the data type of the input array if no output array is provided.
| out : ndarray, None, or tuple of ndarray and None, optional
| A location into which the result is stored. If not provided or `None`,
| a freshly-allocated array is returned. For consistency with
| :ref:`ufunc.__call__`, if given as a keyword, this may be wrapped in a
| 1-element tuple.
|
| .. versionchanged:: 1.13.0
| Tuples are allowed for keyword argument.
|
| Returns
| -------
| r : ndarray
| The reduced values. If `out` was supplied, `r` is a reference to
| `out`.
|
| Notes
| -----
| A descriptive example:
|
| If `a` is 1-D, the function `ufunc.accumulate(a)` is the same as
| ``ufunc.reduceat(a, indices)[::2]`` where `indices` is
| ``range(len(array) - 1)`` with a zero placed
| in every other element:
| ``indices = zeros(2 * len(a) - 1)``, ``indices[1::2] = range(1, len(a))``.
|
| Don't be fooled by this attribute's name: `reduceat(a)` is not
| necessarily smaller than `a`.
|
| Examples
| --------
| To take the running sum of four successive values:
|
| >>> np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]
| array([ 6, 10, 14, 18])
|
| A 2-D example:
|
| >>> x = np.linspace(0, 15, 16).reshape(4,4)
| >>> x
| array([[ 0., 1., 2., 3.],
| [ 4., 5., 6., 7.],
| [ 8., 9., 10., 11.],
| [ 12., 13., 14., 15.]])
|
| ::
|
| # reduce such that the result has the following five rows:
| # [row1 + row2 + row3]
| # [row4]
| # [row2]
| # [row3]
| # [row1 + row2 + row3 + row4]
|
| >>> np.add.reduceat(x, [0, 3, 1, 2, 0])
| array([[ 12., 15., 18., 21.],
| [ 12., 13., 14., 15.],
| [ 4., 5., 6., 7.],
| [ 8., 9., 10., 11.],
| [ 24., 28., 32., 36.]])
|
| ::
|
| # reduce such that result has the following two columns:
| # [col1 * col2 * col3, col4]
|
| >>> np.multiply.reduceat(x, [0, 3], 1)
| array([[ 0., 3.],
| [ 120., 7.],
| [ 720., 11.],
| [ 2184., 15.]])
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| identity
| The identity value.
|
| Data attribute containing the identity element for the ufunc, if it has one.
| If it does not, the attribute value is None.
|
| Examples
| --------
| >>> np.add.identity
| 0
| >>> np.multiply.identity
| 1
| >>> np.power.identity
| 1
| >>> print(np.exp.identity)
| None
|
| nargs
| The number of arguments.
|
| Data attribute containing the number of arguments the ufunc takes, including
| optional ones.
|
| Notes
| -----
| Typically this value will be one more than what you might expect because all
| ufuncs take the optional "out" argument.
|
| Examples
| --------
| >>> np.add.nargs
| 3
| >>> np.multiply.nargs
| 3
| >>> np.power.nargs
| 3
| >>> np.exp.nargs
| 2
|
| nin
| The number of inputs.
|
| Data attribute containing the number of arguments the ufunc treats as input.
|
| Examples
| --------
| >>> np.add.nin
| 2
| >>> np.multiply.nin
| 2
| >>> np.power.nin
| 2
| >>> np.exp.nin
| 1
|
| nout
| The number of outputs.
|
| Data attribute containing the number of arguments the ufunc treats as output.
|
| Notes
| -----
| Since all ufuncs can take output arguments, this will always be (at least) 1.
|
| Examples
| --------
| >>> np.add.nout
| 1
| >>> np.multiply.nout
| 1
| >>> np.power.nout
| 1
| >>> np.exp.nout
| 1
|
| ntypes
| The number of types.
|
| The number of numerical NumPy types - of which there are 18 total - on which
| the ufunc can operate.
|
| See Also
| --------
| numpy.ufunc.types
|
| Examples
| --------
| >>> np.add.ntypes
| 18
| >>> np.multiply.ntypes
| 18
| >>> np.power.ntypes
| 17
| >>> np.exp.ntypes
| 7
| >>> np.remainder.ntypes
| 14
|
| signature
|
| types
| Returns a list with types grouped input->output.
|
| Data attribute listing the data-type "Domain-Range" groupings the ufunc can
| deliver. The data-types are given using the character codes.
|
| See Also
| --------
| numpy.ufunc.ntypes
|
| Examples
| --------
| >>> np.add.types
| ['??->?', 'bb->b', 'BB->B', 'hh->h', 'HH->H', 'ii->i', 'II->I', 'll->l',
| 'LL->L', 'qq->q', 'QQ->Q', 'ff->f', 'dd->d', 'gg->g', 'FF->F', 'DD->D',
| 'GG->G', 'OO->O']
|
| >>> np.multiply.types
| ['??->?', 'bb->b', 'BB->B', 'hh->h', 'HH->H', 'ii->i', 'II->I', 'll->l',
| 'LL->L', 'qq->q', 'QQ->Q', 'ff->f', 'dd->d', 'gg->g', 'FF->F', 'DD->D',
| 'GG->G', 'OO->O']
|
| >>> np.power.types
| ['bb->b', 'BB->B', 'hh->h', 'HH->H', 'ii->i', 'II->I', 'll->l', 'LL->L',
| 'qq->q', 'QQ->Q', 'ff->f', 'dd->d', 'gg->g', 'FF->F', 'DD->D', 'GG->G',
| 'OO->O']
|
| >>> np.exp.types
| ['f->f', 'd->d', 'g->g', 'F->F', 'D->D', 'G->G', 'O->O']
|
| >>> np.remainder.types
| ['bb->b', 'BB->B', 'hh->h', 'HH->H', 'ii->i', 'II->I', 'll->l', 'LL->L',
| 'qq->q', 'QQ->Q', 'ff->f', 'dd->d', 'gg->g', 'OO->O']
相关参考
转载:https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.special.boxcox1p.html
scipy.special.boxcox1p(x, lmbda) =
Compute the Box-Cox transformation of 1 + x.
The Box-Cox transformation computed by boxcox1p is:
y = ((1+x)**lmbda - 1) / lmbda if lmbda != 0 log(1+x) if lmbda == 0
Returns nan if x < -1. Returns -inf if x == -1 and lmbda < 0.
New in version 0.14.0.
Parameters: | x : array_like
lmbda : array_like
|
---|---|
Returns: | y : array
|
Examples
>> boxcox1p(1e-4, [0, 0.5, 1]) array([ 9.99950003e-05, 9.99975001e-05, 1.00000000e-04]) >>> boxcox1p([0.01, 0.1], 0.25) array([ 0.00996272, 0.09645476])