pandas tutorial: filtering data in a DataFrame

Original link: https://blog.csdn.net/hnanxihotmail/article/details/81625854

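>>> import pandas as pd
>>> import numpy as np
>>> # We are working with DataFrame again today. Its filtering features are
>>> # remarkably capable and can save a lot of work. Without further ado, here
>>> # are several examples of more complex data filtering.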

>>> # First, create a DataFrame; it holds the data shown below
>>> df=pd.DataFrame(np.random.randn(6,4),columns=list('ABCD'))
>>> df
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
1 -0.015045  1.367264 -0.617322 -1.068358
2  0.502788  0.305497 -0.819171 -0.331027
3  2.585354 -0.043285  1.056259 -0.079882
4  0.316549 -1.464567  1.504431  0.803362
5 -1.097251 -0.706594 -1.393058 -0.251690
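Because `np.random.randn` draws fresh values on every run, the numbers you see will differ from the ones printed above. A minimal sketch of a reproducible variant follows; the seed value `0` and the use of `np.random.default_rng` are assumptions added here, not part of the original session.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (0) so that every run yields the same values.
rng = np.random.default_rng(0)

# Same shape and column names as the DataFrame in the session above.
df = pd.DataFrame(rng.standard_normal((6, 4)), columns=list("ABCD"))

print(df.shape)  # (6, 4)
```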

>>> # Suppose we want the rows whose value in column D is greater than 0
>>> df[df.D>0]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
4  0.316549 -1.464567  1.504431  0.803362
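The expression `df.D>0` is itself a boolean Series, and indexing the DataFrame with it keeps the rows where it is True. The sketch below spells out those two steps on a small frame whose values are hypothetical, chosen only so the result is easy to check by eye.

```python
import pandas as pd

# Hypothetical values, not taken from the session above.
df = pd.DataFrame({"A": [1, 2, 3, 4], "D": [-0.5, 0.2, 0.0, 1.5]})

mask = df["D"] > 0    # boolean Series: False, True, False, True
positive = df[mask]   # keeps only the rows where mask is True

print(positive.index.tolist())  # [1, 3]
```

Note that `0.0 > 0` is False, so the row with `D == 0.0` is dropped.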

>>> # Use & to combine several conditions (logical AND). | combines conditions too, but as a logical OR.
>>> df[(df.D>0)&(df.C<0)]
Empty DataFrame
Columns: [A, B, C, D]
Index: []
>>> df[(df.D<0)&(df.C>0)]
          A         B         C         D
3  2.585354 -0.043285  1.056259 -0.079882
>>> df[(df.D<0.5)&(df.C>1.5)]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
>>> df[(df.D<0.5)|(df.C>1.5)]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
1 -0.015045  1.367264 -0.617322 -1.068358
2  0.502788  0.305497 -0.819171 -0.331027
3  2.585354 -0.043285  1.056259 -0.079882
4  0.316549 -1.464567  1.504431  0.803362
5 -1.097251 -0.706594 -1.393058 -0.251690
>>> df[(df.D<0.5)|(df.C>1.52)]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
1 -0.015045  1.367264 -0.617322 -1.068358
2  0.502788  0.305497 -0.819171 -0.331027
3  2.585354 -0.043285  1.056259 -0.079882
5 -1.097251 -0.706594 -1.393058 -0.251690
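Each condition above is wrapped in parentheses because `&` and `|` bind more tightly than `<` and `>` in Python; without them the expression is parsed differently and typically raises an error. The sketch below shows AND, OR, and the `~` (NOT) operator side by side. The values are hypothetical, and `~` does not appear in the original session.

```python
import pandas as pd

# Hypothetical values, picked so each row falls into a known case.
df = pd.DataFrame({
    "C": [1.6, -0.6, 1.1, 1.5],
    "D": [0.2, -1.0, -0.1, 0.8],
})

d_small = df["D"] < 0.5   # True for rows 0, 1, 2
c_large = df["C"] > 1.5   # True for row 0 only (1.5 > 1.5 is False)

both = df[d_small & c_large]         # AND: both conditions hold
either = df[d_small | c_large]       # OR: at least one condition holds
neither = df[~(d_small | c_large)]   # NOT: no condition holds

print(both.index.tolist())     # [0]
print(either.index.tolist())   # [0, 1, 2]
print(neither.index.tolist())  # [3]
```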

>>> # If we only need columns A and B, while C and D are used just for filtering, write it like this; only columns A and B are returned
>>> df[['A','B']][(df.D>0)&(df.C<0)]
Empty DataFrame
Columns: [A, B]
Index: []
>>> df[['A','B']][(df.D<0)&(df.C>0)]
          A         B
3  2.585354 -0.043285
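The form above selects the columns first and then filters the rows in a second indexing step. An alternative that does both in one step is `df.loc[row_mask, column_list]`. This is a sketch of that alternative, not the method used in the original session, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical values, not taken from the session above.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [1.0, -1.0, 2.0],
    "D": [-0.1, -0.2, 0.3],
})

# Rows where D is negative and C is positive: only row 0 qualifies.
mask = (df["D"] < 0) & (df["C"] > 0)

# Select rows by mask and columns by name in a single .loc call.
result = df.loc[mask, ["A", "B"]]

print(result.index.tolist())    # [0]
print(result.columns.tolist())  # ['A', 'B']
```

A single `.loc` call is also the safer choice if you later want to assign to the selected cells, since chained indexing may operate on a copy.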

>>> index = (df.D<0)&(df.C>0)
>>> index
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool
>>> df(index)
Traceback (most recent call last):
  File "", line 1, in
    df(index)
TypeError: 'DataFrame' object is not callable
>>> df[index]
          A         B         C         D
3  2.585354 -0.043285  1.056259 -0.079882
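As the session shows, the mask must be applied with square brackets (`df[index]`); `df(index)` tries to call the DataFrame as a function and fails. Once the mask is stored in a variable it can be reused for more than row selection. The sketch below, on hypothetical values, counts the matches and lists their index labels.

```python
import pandas as pd

# Hypothetical values, not taken from the session above.
df = pd.DataFrame({"C": [1.5, -0.6, 1.0], "D": [0.2, -1.0, -0.1]})

# Only row 2 has D < 0 and C > 0.
mask = (df["D"] < 0) & (df["C"] > 0)

n_matches = int(mask.sum())               # True counts as 1, False as 0
matching_labels = df.index[mask].tolist() # index labels of matching rows

print(n_matches)        # 1
print(matching_labels)  # [2]
```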

>>> # We can also use the isin method to filter for specific values; put the values to match in a list, such as alist
>>> alist=[-0.079882,0.687050,0.3685412]
>>> df['D'].isin(alist)
0    False
1    False
2    False
3    False
4    False
5    False
Name: D, dtype: bool
>>> alist=[0.246329]
>>> df['D'].isin(alist)
0    False
1    False
2    False
3    False
4    False
5    False
Name: D, dtype: bool
>>> df[df['D'].isin(alist)]
Empty DataFrame
Columns: [A, B, C, D]
Index: []
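Both `isin` calls above return all False even though `-0.079882` and `0.246329` appear in the printed table. The likely reason is that pandas prints floats rounded to six decimals, while `isin` compares the full stored values exactly. One way to match floats approximately is `np.isclose`; the sketch below is an assumption-laden illustration, with a hypothetical stored value and a hypothetical tolerance of `1e-6`.

```python
import numpy as np
import pandas as pd

# Hypothetical stored value with more digits than the six that would be printed.
df = pd.DataFrame({"D": [0.24632912345, -1.068358, 0.803362]})

# Exact comparison: the stored float is not equal to the rounded literal.
exact = df["D"].isin([0.246329])

# Approximate comparison within a tolerance (atol chosen for illustration).
close = np.isclose(df["D"].to_numpy(), 0.246329, atol=1e-6)

print(bool(exact.any()))          # False
print(df[close].index.tolist())   # [0]
```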

>>> df=pd.DataFrame(np.random.normal(6,4),columns=list('ABCD'))
Traceback (most recent call last):
  File "", line 1, in
    df=pd.DataFrame(np.random.normal(6,4),columns=list('ABCD'))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 422, in __init__
    raise ValueError('DataFrame constructor not properly called!')
ValueError: DataFrame constructor not properly called!
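The call above fails because the first two positional arguments of `np.random.normal` are `loc` (mean) and `scale` (standard deviation), not a shape, so `np.random.normal(6,4)` returns a single float rather than a 6x4 array. A sketch of what was presumably intended, passing the shape through the `size` parameter:

```python
import numpy as np
import pandas as pd

# One float drawn from a normal distribution with mean 6 and std 4.
scalar = np.random.normal(6, 4)

# A 6x4 array of standard-normal draws; the shape goes in `size`.
data = np.random.normal(size=(6, 4))
df = pd.DataFrame(data, columns=list("ABCD"))

print(np.ndim(scalar))  # 0
print(df.shape)         # (6, 4)
```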

>>> df=pd.DataFrame(np.arange(16).reshape(4,4),columns=list('ABCD'))
>>> df
    A   B   C   D
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> alist=[11]
>>> df['D'].isin(alist)
0    False
1    False
2     True
3    False
Name: D, dtype: bool
>>> df[df['D'].isin(alist)]
   A  B   C   D
2  8  9  10  11
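With integer data `isin` matches exactly, as the session shows. The list may hold several values, values that never occur are simply ignored, and prefixing the mask with `~` keeps the rows that are not in the list. The sketch below reuses the same 4x4 frame; the list name `wanted` and the `~` negation are additions for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the session: column D holds 3, 7, 11, 15.
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("ABCD"))

wanted = [3, 11, 99]  # 99 does not occur in D and matches nothing

kept = df[df["D"].isin(wanted)]      # rows whose D is in the list
dropped = df[~df["D"].isin(wanted)]  # rows whose D is not in the list

print(kept.index.tolist())     # [0, 2]
print(dropped.index.tolist())  # [1, 3]
```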

