1. Merging data with pandas concat, e.g. appending a row (https://blog.csdn.net/weixin_47661174/article/details/124698328)
pd.concat([df1,df2])
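A minimal sketch of appending one row with concat. The frames and column names here are hypothetical, and ignore_index=True is an optional choice that renumbers the result:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
new_row = pd.DataFrame({"a": [5], "b": [6]})  # hypothetical one-row frame

# ignore_index=True renumbers the rows 0..n-1 instead of keeping each frame's own index
combined = pd.concat([df1, new_row], ignore_index=True)
print(combined)
```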
2. Converting between Series, DataFrame (pandas), and ndarray (numpy) (https://blog.csdn.net/qq_36743482/article/details/114678409)
ndarray => Series
npa = np.arange(12)
ser = pd.Series(npa)
Series => ndarray
npa_s = np.array(ser)
ndarray => DataFrame
npa2 = npa.reshape(3, -1)
df = pd.DataFrame(npa2)
DataFrame => ndarray
npa_d = np.array(df)
npa_v = df.values  # npa_d and npa_v are identical
DataFrame -> Series
type(df[0]) # pandas.core.series.Series
Series -> DataFrame
pd.DataFrame(ser)
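The same conversions can also be written with the to_numpy() and to_frame() methods. This is a sketch of an alternative spelling, not what the linked post uses; the column name "value" is made up for illustration:

```python
import numpy as np
import pandas as pd

npa = np.arange(12)
ser = pd.Series(npa)
df = pd.DataFrame(npa.reshape(3, -1))

arr_from_ser = ser.to_numpy()                 # Series -> ndarray
arr_from_df = df.to_numpy()                   # DataFrame -> ndarray
frame_from_ser = ser.to_frame(name="value")   # Series -> one-column DataFrame ("value" is hypothetical)
```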
3. Two ways to convert a Series to a DataFrame in Python (https://zhuanlan.zhihu.com/p/469512251)
pd.DataFrame([j.to_dict()])  # j is a Series; Series also provides to_frame(), to_dict(), and similar methods
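4. Reading specific rows with pandas (https://blog.csdn.net/weixin_39025679/article/details/109216669)
https://blog.csdn.net/bianxia123456/article/details/111396760
df.loc[0:m]  # label-based slice on the DataFrame; unlike a Python slice, the end label m is included
Collected notes on python.pandas.DataFrame: initialization, writing from a dict, writing via slices, saving to CSV (https://zhuanlan.zhihu.com/p/489099818)
df3.loc[['No.1','No.3'],['name','color']]  # lists inside loc[] select specific rows and columns
       name         color
No.1   apple        red
No.3   watermelon   green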
5. Locating a row, selecting columns, and summing a column with pandas
for i in range(len(all_data)):
    # print(all_data['飞靶号'][0])
    # print(all_data[i])
    if all_data['飞靶号'].iloc[i] == '退电品':  # compare row i, not always row 0
        print(i)
        print(all_data.iloc[i])
        # all_data = np.delete(all_data,i,axis=0)
        exit(1)
X=all_data[['温度','PH','L','A','B']]
y=all_data[['二次染时']]
X.head()
x_train[['温度']].apply(lambda x:x.sum())
6. Initializing an empty numpy array with dtype object (https://www.cnpython.com/qa/1341898)
a = np.empty((12,), dtype=object)
7. Python: appending a row or a column to a numpy array; adding, deleting, querying, and modifying numpy arrays (https://blog.csdn.net/qq_40765537/article/details/105869910)
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([2,5,8])
print(np.r_[a,[b]])
Output:
[[1 2 3]
 [4 5 6]
 [7 8 9]
 [2 5 8]]
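For comparison, a short sketch that appends the same vector once as a row and once as a column. np.vstack and np.column_stack are used here as alternatives to np.r_; they are not taken from the linked post:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([2, 5, 8])

with_row = np.vstack([a, b])        # b becomes a new last row, shape (4, 3)
with_col = np.column_stack([a, b])  # b becomes a new last column, shape (3, 4)
print(with_row)
print(with_col)
```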
8. [Python data processing] Writing a DataFrame to Excel with pandas (https://blog.csdn.net/chengyikang20/article/details/90139384)
Saving data generated in PyCharm with pandas' to_excel raises "numpy.ndarray object has no attribute to_excel" when it is called on an ndarray (https://blog.csdn.net/m0_67870771/article/details/124603745)
import pandas as pd
file_path = 'E:/data/2.xlsx'  # target location, file name, and file type
df = pd.DataFrame(data)  # wrap the ndarray in a DataFrame first
df.to_excel(file_path)  # call to_excel on the DataFrame
9. Summing columns in numpy, Python (https://www.csdn.net/tags/MtTaAgxsMDQ2MjQtYmxvZwO0O0OO0O0O.html)
x.sum(axis=0)  # sums down the rows, giving one total per column
10. Extracting a row or a column of a matrix with numpy in Python (https://www.yisu.com/zixun/179241.html)
A row of the matrix:
a[1]
Out[32]: array([3, 4, 5])
A column of the matrix:
a[:,1]
Out[33]: array([1, 4, 7])
11. Selecting specific rows and columns in numpy (https://blog.csdn.net/goodxin_ie/article/details/109659893)
x[[0,1]][:,[0,3]]
Out[31]:
array([[0, 3],
[4, 7]])
x_test = np.empty((1, 15), dtype=object)  # test data set
test_feibahao = x_test[:, 0]  # take the 飞靶号 column (column 0) of the test set
12. Deleting rows in numpy (including multiple rows) (https://blog.csdn.net/God_WZH/article/details/122575683)
https://blog.csdn.net/A_JI_97/article/details/116235753
Delete rows:
x1 = np.delete(x, 0, axis=0)
y1 = np.delete(y, 1, axis=0)
print(x1)
print(y1)
13. Swapping rows and columns in numpy (https://blog.csdn.net/m0_37294838/article/details/102743533)
14. How to easily export a numpy array (matrix) from Python to Excel (https://www.cnpython.com/qa/1678585)
import numpy
numpy.savetxt(r'your\location\yourfile.csv', numpy_array, delimiter=',')  # raw string so the backslashes are not read as escapes
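15. Merging two numpy matrices in Python (http://t.zoukankan.com/itdyb-p-5735911.html)
Two matrices a and b were generated at random; the merge operations follow:
hstack() joins horizontally (side by side, adding columns)
np.hstack((a,b))
array([[ 8., 5., 1., 9.],
       [ 1., 6., 8., 5.]])
vstack() joins vertically (one on top of the other, adding rows)
np.vstack((a,b))
array([[ 8., 5.],
       [ 1., 6.],
       [ 1., 9.],
       [ 8., 5.]])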
16. Python tutorial: initializing a numpy array to a single value (https://blog.csdn.net/sinat_38682860/article/details/111314885)
import numpy as np
a = np.ones((4,4)) * 10
[[10. 10. 10. 10.]
[10. 10. 10. 10.]
[10. 10. 10. 10.]
[10. 10. 10. 10.]]
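np.full builds the same kind of array in one call. This is a small sketch of an alternative that the linked post may not cover; the dtype follows the fill value:

```python
import numpy as np

a_int = np.full((4, 4), 10)      # integer array, every element 10
a_float = np.full((4, 4), 10.0)  # float array, equal to np.ones((4, 4)) * 10
print(a_float)
```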
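17. Introduction to data analysis: comparing, filtering, and deduplicating numpy array data (https://blog.csdn.net/ayouleyang/article/details/103757741)
18. [Numpy] Computing the mean, median, and mode with numpy (https://blog.csdn.net/u013066730/article/details/108844068)
import numpy as np
# Mean
np.mean(nums)
# Median
np.median(nums)
# Mode (taken from SciPy)
from scipy import stats
stats.mode(nums)[0][0]  # this indexing matches older SciPy; where mode returns a scalar for 1-D input, use stats.mode(nums).mode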
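If depending on SciPy's return shape is a concern, the mode can be sketched with numpy alone. The sample data is made up, and on ties this returns the smallest of the most frequent values:

```python
import numpy as np

nums = np.array([1, 2, 2, 3, 3, 3, 4])  # hypothetical sample

# np.unique returns the sorted distinct values and how often each occurs
values, counts = np.unique(nums, return_counts=True)
mode_value = values[np.argmax(counts)]  # first value with the highest count
print(mode_value)
```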
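19. Reading the values of columns that share the same name from an Excel sheet in Python (https://blog.csdn.net/qq_41821067/article/details/121798607)
import pandas as pd
df = pd.read_excel('test1.xls',header=0)  # the Excel file sits in the same folder as the .py script
result = []
for s_li in df.columns:
    # print the column name
    print(s_li)
    # pandas renames duplicate headers to 'I', 'I.1', ..., so match by substring
    if 'I' in str(s_li):
        result.append(df[s_li])
print(result)
pd.DataFrame(result).to_excel(r'F:\python_project\result.xls')  # output path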
20. Summary statistics for each column
import statistics
l = ['温度','PH','DL1','DA1','DB1','DL2','一次染时']
l = ['wendu','ph','dl1','da1','db1','dl2','yici']  # romanized column names; this overrides the list above
print(pandas_data.shape)
for i in l:
    print(i+' variance: %f' % np.var(pandas_data[i]), i+' standard deviation: %f' % np.std(pandas_data[i]), i+' max: %f' % np.max(pandas_data[i]),
          i+' min: %f' % np.min(pandas_data[i]), i+' mean: %f' % np.mean(pandas_data[i]), i+' median: %f' % np.median(pandas_data[i]))
    print(i+' mode:', statistics.mode(pandas_data[i]))
    print()
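21. Selecting one row or column, or N rows and N columns, from a DataFrame (https://blog.csdn.net/qq_42140717/article/details/124350979)
Select one row when its index label is known:
df.loc[a]
Select one row by position when the index label is unknown:
df[1:2]  # the slice includes the start and excludes the end, so the second row is [1:2]
Select N rows by position when the index labels are unknown:
df[0:10]
Select a column by name:
df['name']
Select a column by position when its name is unknown:
df.iloc[:,2]
Select N columns by name:
df[['name','name2']]
Select N rows of a named column (pass a list of names to get M columns):
df['name'][0:4]
Select N rows and M columns by position:
df.iloc[0:N,0:M]
iloc selects purely by integer position. loc selects by index label and column name; if the index contains duplicate labels, loc returns every matching row rather than a single one. Plain indexing without loc or iloc, as in df['name'][0:4], picks the named column first and then slices its rows by position.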
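The difference between label-based and position-based selection can be sketched on a small made-up frame. The column names and index labels below are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["apple", "banana", "cherry"], "price": [3, 1, 5]},
    index=["x", "y", "x"],  # the label "x" is deliberately duplicated
)

by_label = df.loc["x"]    # label lookup: both rows labelled "x" come back
by_position = df.iloc[1]  # position lookup: the second row, as a Series

df2 = df.reset_index(drop=True)  # default integer index 0, 1, 2
loc_slice = df2.loc[0:1]         # label slice: end label included, 2 rows
iloc_slice = df2.iloc[0:1]       # position slice: end excluded, 1 row
```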
22. How to select specific rows and columns of a numpy array
https://blog.csdn.net/goodxin_ie/article/details/109659893
b = a[c]  # first take the desired rows
b = b[:,d]  # then take the desired columns
print(b)
x[[0,1]][:,[0,3]]
Out[31]:
array([[0, 3],
[4, 7]])
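The two-step selection can also be done in one indexing operation with np.ix_. This is a sketch assuming x is a 3x4 array built from arange, which matches the output shown above:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)  # assumed input
rows, cols = [0, 1], [0, 3]

two_step = x[rows][:, cols]       # rows first, then columns
one_step = x[np.ix_(rows, cols)]  # np.ix_ builds an open mesh that selects the same block
print(one_step)
```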
23. Concatenating and merging numpy arrays in Python (https://blog.csdn.net/qq_39516859/article/details/80666070)
Horizontal stacking:
np.hstack((a,b))
array([[ 0, 1, 2, 0, 2, 4],
       [ 3, 4, 5, 6, 8, 10],
       [ 6, 7, 8, 12, 14, 16]])
np.concatenate((a,b),axis=1)
array([[ 0, 1, 2, 0, 2, 4],
       [ 3, 4, 5, 6, 8, 10],
       [ 6, 7, 8, 12, 14, 16]])
24. Pandas warns while reading a CSV file: DtypeWarning: Columns (3) have mixed types. Specify dtype option on import or set low_memory=False.
data = pd.read_csv(f, low_memory=False)
25. Several ways to read a CSV file in Python (with examples) (https://blog.csdn.net/qq_43160348/article/details/124331781)
import pandas as pd
df = pd.read_csv('../data_pro/audito_whole.csv')
print(df)
26. [Python] Filtering rows that contain nulls, or rows that do not (https://blog.csdn.net/qq_40264559/article/details/124508563)
test = test[test['性别'].notna()]  # drop the rows where 性别 is null
test
27. Creating an empty DataFrame in pandas and adding rows and columns to it (https://blog.csdn.net/qq_53817374/article/details/123771713)
import pandas as pd
df = pd.DataFrame(data=None,columns=['时间','车牌','北纬','东经'])
df
Concatenation (pandas.concat usage explained) (https://cloud.tencent.com/developer/news/372041)
pd.concat([df1,df2,df3])  # axis=0 by default, so the frames are stacked along axis 0 (row-wise)
28. [Python notes] Reading each row of data with pandas
for indexs in data.index:
    print(data.loc[indexs].values[0:-1])  # every value in the row except the last
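29. Handling the pandas warning: A value is trying to be set on a copy of a slice from a DataFrame
quchong = df_all.drop_duplicates(subset='虚拟飞靶号').copy()  # work on an explicit copy
print(quchong.shape)
quchong.insert(loc=6, column='hour', value='')
new_data = quchong.copy()
for i in range(quchong.shape[0]):
    # assign in one indexing step (.iloc[row, col]) instead of the chained new_data['hour'].iloc[i] = ...
    new_data.iloc[i, new_data.columns.get_loc('hour')] = int(quchong['一次化抛进槽时间'].iloc[i][11:13])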
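30. Five common ways to add a new column in pandas (https://www.jb51.net/article/251192.htm)
df.insert(loc=2, column='c', value=3)  # insert column c, filled with 3, at position 2 (after the last column of a two-column frame)
print('After inserting column c:\n', df)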
31. Deduplicating data in Python (pandas) (https://blog.csdn.net/qq_39012566/article/details/98633780)
(1) Deduplicate on whole rows.
DataFrame.drop_duplicates()
(2) Deduplicate on a single column.
DataFrame.drop_duplicates(subset='column_name')
32. Extracting (selecting) rows in pandas with multiple AND, OR, NOT conditions (https://blog.csdn.net/qq_18351157/article/details/105403779)
print((df['age'] < 35) & ~(df['state'] == 'NY'))
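The printed expression is a boolean mask; indexing the frame with it returns the matching rows. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"age": [24, 42, 30, 68], "state": ["NY", "CA", "CA", "NY"]})

# & is AND, | is OR, ~ is NOT; each comparison needs its own parentheses
mask = (df["age"] < 35) & ~(df["state"] == "NY")
selected = df[mask]
print(selected)
```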
33. Sorting data in pandas (https://blog.csdn.net/weixin_47661174/article/details/124697231)
df.sort_values(by="aqi")
34. Setting different legend entries (https://blog.csdn.net/qq_44039983/article/details/123510020)
plt.legend(['line1', 'line2'])
35. Implementing in and not in with pandas: usage and notes (https://blog.csdn.net/weixin_43064185/article/details/91374033)
IN
something.isin(somewhere)
NOT IN
~something.isin(somewhere)
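Applied to a column, the pattern looks like this. The column name and values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Beijing", "Shanghai", "Shenzhen", "Hangzhou"]})
wanted = ["Beijing", "Shenzhen"]

in_rows = df[df["city"].isin(wanted)]       # IN: rows whose city is in the list
not_in_rows = df[~df["city"].isin(wanted)]  # NOT IN: the remaining rows
```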
36. pandas: grouping with groupby() to compute the mean, maximum, and so on (https://blog.csdn.net/DoyWang/article/details/109137700)
df.groupby('group_column')['value_column'].mean()
a = sort_new_data.groupby('虚拟飞靶号')[['GLOSS_P1','GLOSS_P3']].mean()
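A small sketch of both forms on made-up data: a single column gives a Series, a list of columns gives a DataFrame. The names "group", "v1", and "v2" are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "v1": [1.0, 3.0, 5.0],
    "v2": [2.0, 4.0, 6.0],
})

single = df.groupby("group")["v1"].mean()         # Series indexed by group
multi = df.groupby("group")[["v1", "v2"]].mean()  # DataFrame, one row per group
print(multi)
```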
37. The pandas join operation explained (https://blog.csdn.net/bitcarmanlee/article/details/113311113)
import pandas as pd
def joindemo():
    age_df = pd.DataFrame({'name': ['lili', 'lucy', 'tracy', 'mike'],
                           'age': [18, 28, 24, 36]})
    score_df = pd.DataFrame({'name': ['tony', 'mike', 'akuda', 'tracy'],
                             'score': ['A', 'B', 'C', 'B']})
    # join matches age_df['name'] against the other frame's index, so index score_df by name first
    result = age_df.join(score_df.set_index('name'), on='name')
    print(result)
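When both frames hold the key as an ordinary column, pd.merge is another way to express the same thing. This sketch reuses the two frames from above; merge is offered as an alternative, not as the linked post's method:

```python
import pandas as pd

age_df = pd.DataFrame({"name": ["lili", "lucy", "tracy", "mike"],
                       "age": [18, 28, 24, 36]})
score_df = pd.DataFrame({"name": ["tony", "mike", "akuda", "tracy"],
                         "score": ["A", "B", "C", "B"]})

inner = pd.merge(age_df, score_df, on="name", how="inner")  # only names present in both frames
left = pd.merge(age_df, score_df, on="name", how="left")    # every row of age_df; missing scores become NaN
print(left)
```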
38. Finding the maximum and minimum of each column in pandas (https://blog.csdn.net/Mtf007/article/details/108909604)
df.min()
Returns the minimum of each column.
df.max()
Returns the maximum of each column.
39. Writing to different sheets with pandas to_excel() without overwriting them (https://blog.csdn.net/shykevin/article/details/111244838)
with pd.ExcelWriter('789.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False, header=True)
    df2.to_excel(writer, sheet_name='Sheet2', index=False, header=True)
    df3.to_excel(writer, sheet_name='Sheet3', index=False, header=True)
40. Inserting rows and columns at a given position in a DataFrame (https://blog.csdn.net/weixin_46599926/article/details/126164876)
Insert data as the first column:
df.insert(0,"col0",[99,99])
41. A brief look at np.vstack() and np.hstack() in numpy (https://blog.csdn.net/nanhuaibeian/article/details/100597342)
42. Inserting rows and columns at a given position in a DataFrame (https://blog.csdn.net/weixin_46599926/article/details/126164876)
Insert data as the first column:
df.insert(0,"col0",[99,99])
43. Deleting a row or column in pandas (https://blog.csdn.net/weixin_43914402/article/details/121077282)
test_data = test_data.drop(test_data[test_data['虚拟飞靶号'] == i].index)  # drop returns a new DataFrame, so assign the result
44. Configuring plot axes
plt.rcParams['font.sans-serif']=['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus']=False  # render the minus sign correctly with this font
fig = plt.figure(figsize=(35, 12))
plt.scatter(feiba, pr,c='r', alpha=0.5, marker='.')  # predicted values
plt.scatter(feiba, y_test,c='b', alpha=0.5, marker='.')  # true values
plt.legend(['预测值','真实值'])
plt.xticks(rotation=-90)
plt.xlabel('飞靶号')
plt.ylabel('化抛时间')
45. Details of using df.isnull (https://blog.csdn.net/ningyanggege/article/details/80752299)
46. How to move a column to another position in pandas (https://blog.csdn.net/Ghjkku/article/details/125021162)
https://blog.csdn.net/weixin_43848614/article/details/126315910
mid = df['采集时间']  # keep the values of the 采集时间 column
df.pop('采集时间')  # remove the column
df.insert(0, '采集时间', mid)  # reinsert it as the first column
47. Python: renaming the columns of a pandas table (https://blog.csdn.net/weixin_44556353/article/details/125295463)
Assign to the columns attribute to redefine all column names at once:
data.columns = ['city','name','post','pay','request','number']
48. astype(): converting DataFrame columns to strings
pd_1 = pd_1[need_columns].astype('string')
49. Changing the column order of a DataFrame (https://blog.csdn.net/The_dream1/article/details/122688517)
order = ['date', 'time', 'open', 'high', 'low', 'close', 'volumefrom', 'volumeto']
df = df[order]
50. Python pandas: transposing a DataFrame (swapping rows and columns) (https://blog.csdn.net/shenyinwudi/article/details/118639251)
df_T = pd.DataFrame(df.values.T,columns=index_row,index=index_colums)
print(df_T)
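51. Pandas data analysis 25: styling DataFrames (https://blog.csdn.net/weixin_46277779/article/details/126344626)
df.head().style.highlight_null(null_color='blue')  # newer pandas versions take color='blue' instead of null_color
Highlight the maximum of each column (yellow by default):
df.head().style.highlight_max()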
52. Python pandas DataFrame: counting the elements of a column above or below a threshold (https://cloud.tencent.com/developer/ask/sof/356849)
import pandas as pd
df = pd.DataFrame({'c1': ['A', 'B','C','D','E'], 'c2': [3, 1, 0,2,5]})
count = (df['c2'] >= 3).sum()  # count the True values; .count().shape[0] would return the number of columns instead
print(count) # prints 2
53. Modifying a single value
jiaozheng_pd.at[i, '预测值'] = jiaozheng_pd.at[i, '一次化抛时间'] + 10
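54. Differences between at, loc, and iloc in a pandas DataFrame
at accepts only labels (for example a column name), not integer positions; use iat for position-based access to a single value.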
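A minimal sketch of the single-value accessors on a made-up frame. The index labels "r1" and "r2" and the column names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

df.at["r1", "b"] = 30        # at: one cell, addressed by row label and column label
value = df.iat[1, 0]         # iat: one cell, addressed by integer positions
same_cell = df.loc["r2", "a"]  # loc reaches the same cell by labels
```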
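55. How to add elements to a numpy array in Python (https://m.py.cn/jishu/jichu/23441.html)
list_b = np.empty([0,3], dtype=int)
for i in range(10000):
    list_b = np.append(list_b, [[1,2,3]], axis=0)  # axis=0 appends a row; without axis the result is flattened to 1-D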
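Because np.append copies the whole array on every call, a common alternative is to collect rows in a Python list and convert once at the end. A sketch of that pattern, with a smaller loop count chosen only for illustration:

```python
import numpy as np

rows = []
for i in range(1000):
    rows.append([1, 2, 3])  # appending to a list is cheap

arr = np.array(rows, dtype=int)  # single conversion to a (1000, 3) array
print(arr.shape)
```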
56. Adding a new column to a DataFrame in Python (https://blog.csdn.net/julyclj55555/article/details/122450287)
Name the new column and assign to it:
data['addlist'] = [1, 2]