Basics: read in a .csv file; partition

[pandas cheat sheet] (https://www.dataquest.io/blog...

import pandas as pd

Task 1: read in and partition

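df_original = pd.read_csv("csv_file_name")
count_rec = len(df_original)
# len(df) counts all rows; df['attri_1'].count() would skip NaN values in that column
split = int(0.8 * count_rec)
# int(...) is needed because slice bounds must be integers
df_train = df_original[:split]
df_test = df_original[split:]
# the first 80% of rows go to training, the remaining 20% to testing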
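
Slicing off the first 80% keeps the file's original row order, which can bias the split if the CSV happens to be sorted. A minimal sketch of a shuffled split, assuming a small made-up DataFrame in place of the CSV file (the toy data, the `random_state` value, and the name `df_shuffled` are illustrative and not part of the original notes):

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV file; 'attri_1' is the
# placeholder column name used throughout these notes.
df_original = pd.DataFrame({
    "attri_1": list("aabbccddee"),
    "attri_2": range(10),
})

# Shuffle all rows reproducibly, then cut at the 80% mark.
df_shuffled = df_original.sample(frac=1, random_state=0).reset_index(drop=True)
split = int(0.8 * len(df_shuffled))
df_train = df_shuffled.iloc[:split]
df_test = df_shuffled.iloc[split:]

print(len(df_train), len(df_test))
```

Whether shuffling is appropriate depends on the data: for time-ordered records, the plain ordered slice may be the safer choice.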

Task 2: divide or count by specific column value

Method1: boolean indexing
# Find rows where attri_1 equals certain_value
# df.loc[...] selects by label or, as here, by a boolean mask
df_c1 = df_train.loc[df_train['attri_1'] == certain_value]
# '==' can be replaced by '!=' to select the non-matching rows

# Or, to match any value in a list of values
df_c1 = df_train.loc[df_train['attri_1'].isin(some_values)]
# To exclude those values, add '~' at the beginning
df_c1 = df_train.loc[~df_train['attri_1'].isin(some_values)]

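# Combined conditions; each condition must be wrapped in '( )'
# e.g. values strictly between two placeholder bounds a and b
df_c1 = df_train.loc[(df_train['attri_1'] > a) & (df_train['attri_1'] < b)]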
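
Task 2 also mentions counting, which the same boolean masks handle directly. A small sketch under assumed data (the toy values and the names `mask`, `n_match`, and `counts` are made up for illustration):

```python
import pandas as pd

# Hypothetical training data; only the column name 'attri_1' comes from the notes.
df_train = pd.DataFrame({
    "attri_1": [1, 2, 2, 3, 5, 8],
    "attri_2": ["x", "y", "x", "y", "x", "y"],
})

# Combined condition: each comparison sits in its own parentheses because
# & and | bind more tightly than the comparison operators.
mask = (df_train["attri_1"] > 1) & (df_train["attri_1"] < 5)
df_c1 = df_train.loc[mask]

# Counting: a boolean mask sums to the number of matching rows,
# and value_counts() tallies every distinct value in one pass.
n_match = int(mask.sum())
counts = df_train["attri_1"].value_counts()

print(n_match)
print(counts.loc[2])
```

Keeping the mask in its own variable makes it reusable for both selecting and counting without evaluating the condition twice.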

Method2: label indexing
# Add attri_1 as a second index level, then take the cross-section for value
df_c1 = df_train.set_index('attri_1', append=True, drop=False).xs(value, level=1)

Method3: df.query()
# Prefix a Python variable with '@'; a bare name is read as a column name
df_c1 = df_train.query('attri_1 == @value')
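
To check that the three methods select the same rows, they can be run side by side. This is only a sketch on assumed toy data; the values and the names `by_mask`, `by_xs`, and `by_query` are hypothetical:

```python
import pandas as pd

# Hypothetical data; 'attri_1' is the placeholder column from the notes.
df_train = pd.DataFrame({
    "attri_1": ["a", "b", "a", "c"],
    "attri_2": [10, 20, 30, 40],
})
value = "a"

# Method 1: boolean indexing
by_mask = df_train.loc[df_train["attri_1"] == value]

# Method 2: label indexing via a temporary second index level
by_xs = df_train.set_index("attri_1", append=True, drop=False).xs(value, level=1)

# Method 3: query string; '@value' refers to the Python variable above
by_query = df_train.query("attri_1 == @value")

# Each result should hold the same attri_2 values in the same order.
print(list(by_mask["attri_2"]))
print(list(by_xs["attri_2"]))
print(list(by_query["attri_2"]))
```

Boolean indexing is usually the most direct of the three; `query()` tends to read better when several conditions are combined.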

Reference

Stack Overflow. Available at https://stackoverflow.com/que...
