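When working with public datasets, pandas is commonly used to read the data. Some columns may contain list data, in a form like the following: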
df = pd.read_csv(csv_file)
##############################
[[41.145633, -8.610822], [41.145732, -8.610885], [41.146065, -8.610588], [41.146632, -8.609616], [41.147055, -8.608833], [41.147892, -8.608266], [41.148441, -8.607501], [41.148351, -8.606718], [41.148261, -8.60607], [41.147883, -8.604756], [41.147541, -8.603874], [41.146551, -8.603838], [41.145948, -8.604864], [41.1453, -8.604612], [41.145327, -8.604603], [41.144733, -8.604603], [41.144157, -8.604846], [41.143428, -8.603208], [41.142033, -8.601903], [41.139279, -8.601336], [41.137857, -8.601228], [41.137569, -8.601363], [41.137533, -8.602587], [41.13792, -8.604027], [41.137164, -8.606124], [41.136381, -8.60796], [41.135679, -8.608176], [41.134734, -8.607888], [41.132988, -8.607321], [41.131386, -8.606835], [41.131044, -8.608194], [41.129523, -8.608662], [41.128884, -8.60994], [41.128857, -8.611281], [41.128389, -8.611272], [41.127921, -8.611263], [41.127039, -8.611308], [41.126211, -8.611452], [41.125113, -8.611875], [41.124663, -8.612028], [41.123862, -8.612946], [41.123421, -8.614134], [41.122818, -8.613702], [41.12199, -8.615079], [41.120343, -8.615286], [41.118759, -8.615214], [41.118156, -8.616375], [41.117823, -8.618256], [41.117823, -8.618238], [41.117814, -8.618247], [41.117724, -8.618778], [41.117319, -8.6211], [41.118147, -8.621892], [41.118831, -8.621685], [41.118849, -8.621685], [41.11884, -8.621676], [41.118894, -8.62146], [41.118984, -8.62092], [41.119191, -8.620182], [41.119308, -8.619759], [41.119326, -8.619444], [41.118552, -8.618841], [41.117832, -8.617653], [41.118273, -8.615322]]
After preprocessing, the cleaned data is usually saved with the DataFrame's built-in method for writing CSV files:
df.to_csv(csv_wri_file, index = False)
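However, this method does not preserve the column's type. CSV stores every value as text, so when pandas later reads csv_wri_file, the column comes back as str, in the following form: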
df = pd.read_csv(csv_wri_file)
##############################
[(41.160942, -8.621019), (41.160753, -8.621037), (41.160375, -8.621658), (41.160285, -8.621703), (41.16024, -8.621379), (41.160825, -8.621154), (41.161257, -8.621064), (41.161563, -8.620533), (41.161707, -8.620245), (41.162049, -8.619471), (41.162157, -8.619291), (41.16249, -8.618472), (41.162427, -8.617221), (41.162319, -8.616213), (41.162013, -8.613486), (41.161743, -8.611551), (41.161725, -8.611542), (41.161725, -8.611551), (41.161743, -8.611542), (41.161743, -8.611524), (41.161554, -8.61003), (41.161203, -8.608221), (41.160924, -8.606781), (41.160645, -8.605188), (41.159988, -8.602965), (41.159565, -8.601543), (41.159448, -8.601246), (41.159448, -8.601264), (41.159466, -8.601273), (41.159331, -8.600724), (41.158872, -8.599005), (41.158845, -8.598978), (41.158107, -8.598771), (41.156091, -8.599176), (41.154444, -8.599482), (41.153715, -8.599653), (41.152869, -8.599788), (41.151636, -8.599977), (41.151186, -8.600094), (41.151213, -8.600085), (41.150727, -8.600166), (41.149503, -8.600382), (41.149368, -8.600139), (41.149179, -8.599095), (41.148666, -8.598645), (41.148423, -8.598501), (41.148972, -8.597367), (41.149782, -8.59608), (41.150268, -8.595099), (41.150277, -8.595099), (41.150565, -8.594514), (41.150727, -8.593272), (41.150241, -8.591472), (41.149728, -8.589564), (41.149215, -8.587566), (41.148927, -8.586378), (41.14881, -8.585658), (41.1489, -8.585604), (41.148891, -8.585586)]
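The type change can be reproduced on a small scale. The following is a minimal sketch, assuming a toy DataFrame with a hypothetical column named `trajectory` and an in-memory buffer standing in for csv_wri_file; it is not the original dataset.

```python
import io

import pandas as pd

# Toy data; the column name "trajectory" is a hypothetical placeholder.
df = pd.DataFrame(
    {"trajectory": [[(41.160942, -8.621019), (41.160753, -8.621037)]]}
)

# Write to an in-memory buffer instead of a real file, then read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_back = pd.read_csv(buffer)

# Compare the cell type before and after the CSV round trip.
original_type = type(df.loc[0, "trajectory"])
reloaded_type = type(df_back.loc[0, "trajectory"])
print(original_type, reloaded_type)
```

The first type is the in-memory list and the second is the plain string that comes back from the CSV file.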
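To recover the list, parse each string with literal_eval from the ast module:
from ast import literal_eval
for idx in range(df.shape[0]):
    # Pick the column by name and the row by position, then parse the string.
    # (iloc accepts integer positions only, so a column name cannot be passed to it.)
    curr_row = literal_eval(df['column_name'].iloc[idx])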
literal_eval converts a string written in list form into an actual list; it takes a str as its argument.
After conversion:
<class 'list'>
[(41.160942, -8.621019), (41.160753, -8.621037), (41.160375, -8.621658), (41.160285, -8.621703), (41.16024, -8.621379), (41.160825, -8.621154), (41.161257, -8.621064), (41.161563, -8.620533), (41.161707, -8.620245), (41.162049, -8.619471), (41.162157, -8.619291), (41.16249, -8.618472), (41.162427, -8.617221), (41.162319, -8.616213), (41.162013, -8.613486), (41.161743, -8.611551), (41.161725, -8.611542), (41.161725, -8.611551), (41.161743, -8.611542), (41.161743, -8.611524), (41.161554, -8.61003), (41.161203, -8.608221), (41.160924, -8.606781), (41.160645, -8.605188), (41.159988, -8.602965), (41.159565, -8.601543), (41.159448, -8.601246), (41.159448, -8.601264), (41.159466, -8.601273), (41.159331, -8.600724), (41.158872, -8.599005), (41.158845, -8.598978), (41.158107, -8.598771), (41.156091, -8.599176), (41.154444, -8.599482), (41.153715, -8.599653), (41.152869, -8.599788), (41.151636, -8.599977), (41.151186, -8.600094), (41.151213, -8.600085), (41.150727, -8.600166), (41.149503, -8.600382), (41.149368, -8.600139), (41.149179, -8.599095), (41.148666, -8.598645), (41.148423, -8.598501), (41.148972, -8.597367), (41.149782, -8.59608), (41.150268, -8.595099), (41.150277, -8.595099), (41.150565, -8.594514), (41.150727, -8.593272), (41.150241, -8.591472), (41.149728, -8.589564), (41.149215, -8.587566), (41.148927, -8.586378), (41.14881, -8.585658), (41.1489, -8.585604), (41.148891, -8.585586)]
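As an alternative to the row-by-row loop, the whole column can be parsed in one step. The sketch below shows two possible ways: passing literal_eval through the `converters` parameter of `pd.read_csv`, or calling `Series.apply` after reading. The column names `trip_id` and `trajectory` and the inline CSV text are hypothetical stand-ins for csv_wri_file.

```python
import io
from ast import literal_eval

import pandas as pd

# Inline CSV standing in for the saved file; column names are hypothetical.
csv_text = (
    "trip_id,trajectory\n"
    '1,"[(41.160942, -8.621019), (41.160753, -8.621037)]"\n'
)

# Option 1: parse the column while reading.
df_conv = pd.read_csv(
    io.StringIO(csv_text),
    converters={"trajectory": literal_eval},
)

# Option 2: read as str first, then parse the whole column at once.
df_apply = pd.read_csv(io.StringIO(csv_text))
df_apply["trajectory"] = df_apply["trajectory"].apply(literal_eval)

print(type(df_conv.loc[0, "trajectory"]))
print(df_apply.loc[0, "trajectory"][0])
```

Both options assume every cell in the column holds a valid Python literal; rows with missing or malformed values would likely need separate handling before calling literal_eval.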