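Iris practice dataset (this file has a flaw: the index column and the Species column are both double-quoted strings, which causes some loading and display problems and, in turn, shows that pandas is still the easiest to use)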
"index","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
"1",5.1,3.5,1.4,0.2,"setosa"
"2",4.9,3,1.4,0.2,"setosa"
"3",4.7,3.2,1.3,0.2,"setosa"
"4",4.6,3.1,1.5,0.2,"setosa"
"5",5,3.6,1.4,0.2,"setosa"
"6",5.4,3.9,1.7,0.4,"setosa"
"7",4.6,3.4,1.4,0.3,"setosa"
"8",5,3.4,1.5,0.2,"setosa"
"9",4.4,2.9,1.4,0.2,"setosa"
"10",4.9,3.1,1.5,0.1,"setosa"
Download link (iris)
https://download.csdn.net/download/qq_27437073/88475286?spm=1001.2014.3001.5503
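import csv
import numpy as np

path = r"D:\DevelopWorkSpace\vsCodeWorkSpaces\加载数据集示例\iris.csv"
# newline='' is the setting the csv module recommends for files passed to csv.reader
with open(path, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)   # the first row holds the column names
    data = list(reader)      # remaining rows; every field is a string
    data = np.array(data)
print(headers)
# Show the first 5 rows
print(data[0:5])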
['index', 'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']
[['1' '5.1' '3.5' '1.4' '0.2' 'setosa']
['2' '4.9' '3' '1.4' '0.2' 'setosa']
['3' '4.7' '3.2' '1.3' '0.2' 'setosa']
['4' '4.6' '3.1' '1.5' '0.2' 'setosa']
['5' '5' '3.6' '1.4' '0.2' 'setosa']]
path = r"D:\DevelopWorkSpace\vsCodeWorkSpaces\加载数据集示例\iris.csv"
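(1) The r prefix makes the path a raw string, so the backslashes are kept literally instead of being read as escape sequences.
(2) reader = csv.reader(f,delimiter = ',')
delimiter = ',' sets the field separator used in the dataset.
(3) next(reader) returns the next row; the first call returns the header row.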
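Because csv.reader returns every field as a string, the array printed above holds strings such as '5.1' rather than numbers. The following is a minimal sketch of converting the measurement columns to floats. It reads a small in-memory sample (the first three data rows shown at the top of this post) instead of the real path, and the names `sample`, `features` and `labels` are illustrative only.

```python
import csv
import io

# First rows of iris.csv, held in memory so the sketch runs without the file
sample = '''"index","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
"1",5.1,3.5,1.4,0.2,"setosa"
"2",4.9,3,1.4,0.2,"setosa"
"3",4.7,3.2,1.3,0.2,"setosa"
'''

reader = csv.reader(io.StringIO(sample), delimiter=',')
headers = next(reader)   # the first call consumes the header row
rows = list(reader)      # remaining rows, every field still a str

# Column 0 is the index, columns 1-4 are measurements, column 5 is the species
features = [[float(value) for value in row[1:5]] for row in rows]
labels = [row[5] for row in rows]

print(headers[1:5])
print(features[0], labels[0])
```

Note that csv.reader has already removed the double quotes, so the species compares equal to 'setosa' without any extra stripping.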
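from numpy import loadtxt

def add_two(x):
    # Map the quoted species name to a numeric class code.
    # Depending on the NumPy version and the encoding argument, x arrives as bytes or str.
    transText = x.decode('utf-8') if isinstance(x, bytes) else x
    if transText == '"setosa"':
        return 1.0
    elif transText == '"versicolor"':
        return 2.0
    elif transText == '"virginica"':
        return 3.0
    # Fail loudly instead of silently returning None for an unknown label
    raise ValueError(f"unexpected species: {transText!r}")

path = r"D:\DevelopWorkSpace\vsCodeWorkSpaces\加载数据集示例\iris.csv"
# Column 0 (the quoted index) is left out via usecols, so it never has to be parsed as a float
with open(path, 'r') as datapath:
    data = loadtxt(datapath, delimiter=",", skiprows=1, usecols=(1, 2, 3, 4, 5), converters={5: add_two})
print(data.shape)
print(data[:3])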
(150, 5)
[[5.1 3.5 1.4 0.2 1. ]
[4.9 3. 1.4 0.2 1. ]
[4.7 3.2 1.3 0.2 1. ]]
loadtxt(datapath, delimiter=",",skiprows=1,usecols = (1,2,3,4,5),converters={5:add_two})
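(1) skiprows=1 skips the first row (the header).
(2) usecols = (1,2,3,4,5) reads only columns 1 to 5 (zero-based), which leaves out the quoted index column.
(3) converters={5:add_two} passes the contents of column 5 through the add_two function.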
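The three parameters above can be tried without the file on disk. The sketch below feeds loadtxt a few in-memory rows. `SPECIES_CODES` and `species_to_code` are hypothetical names introduced for illustration, the versicolor and virginica rows are illustrative values in the same format as the file, and the converter accepts both bytes and str because what NumPy hands to a converter depends on its version and on the `encoding` argument.

```python
import io
import numpy as np

# Hypothetical lookup table: species name -> numeric class code
SPECIES_CODES = {"setosa": 1.0, "versicolor": 2.0, "virginica": 3.0}

def species_to_code(field):
    # Accept bytes or str, then drop whitespace and the surrounding double quotes
    text = field.decode("utf-8") if isinstance(field, bytes) else field
    return SPECIES_CODES[text.strip().strip('"')]

sample = io.StringIO(
    '"index","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"\n'
    '"1",5.1,3.5,1.4,0.2,"setosa"\n'
    '"51",7,3.2,4.7,1.4,"versicolor"\n'
    '"101",6.3,3.3,6,2.5,"virginica"\n'
)

data = np.loadtxt(
    sample,
    delimiter=",",
    skiprows=1,                        # skip the header row
    usecols=(1, 2, 3, 4, 5),           # leave out the quoted index column
    converters={5: species_to_code},   # key 5 is the column number in the file
)

print(data.shape)
print(data[:, -1])
```

A dictionary lookup raises KeyError on an unknown label, which tends to surface data problems earlier than an if/elif chain that falls through and returns None.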
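from pandas import read_csv
from pandas import set_option

path = r"D:\DevelopWorkSpace\vsCodeWorkSpaces\加载数据集示例\iris.csv"
data = read_csv(path)
# Size of the data: (150, 6), i.e. 150 rows and 6 columns
print(data.shape)
# Show the first 10 rows (rows 0-9)
print(data[:10])
# Show the first 50 rows
# print(data.head(50))
# Show the data type of each column
print(data.dtypes)
# Overview of set_option parameters
# pd.set_option('display.max_rows', xxx)                    # maximum number of rows displayed
# pd.set_option('display.min_rows', xxx)                    # minimum number of rows displayed
# pd.set_option('display.max_columns', xxx)                 # maximum number of columns displayed
# pd.set_option('display.max_colwidth', xxx)                # maximum number of characters per column
# pd.set_option('display.precision', 2)                     # floating-point precision
# pd.set_option('display.float_format', '{:,}'.format)      # comma as thousands separator
# pd.set_option('display.float_format', '{:,.2f}'.format)   # set floating-point precision
# pd.set_option('display.float_format', '{:.2f}%'.format)   # percent formatting
# pd.set_option('plotting.backend', 'altair')               # change the plotting backend
# pd.set_option('display.max_info_columns', 200)            # maximum number of columns shown by info
# pd.set_option('display.max_info_rows', 5)                 # threshold for counting nulls in info
# pd.describe_option()                                      # show all options with descriptions
# pd.reset_option('all')                                    # reset all options
set_option('display.max_colwidth', 100)
# Use the full option name; recent pandas versions reject the short form 'precision' as ambiguous
set_option('display.precision', 2)
# Summary statistics:
#   count
#   mean
#   standard deviation
#   minimum
#   maximum
#   25%
#   median, i.e. 50%
#   75%
print(data.describe())
# Count how often each Petal.Length value occurs
# (group by 'Species' instead to see the class distribution)
#   Petal.Length    column name
#   1.0    1        value and number of occurrences
#   1.1    1
count_class = data.groupby('Petal.Length').size()
# count_class2 = data.groupby('Petal.Length')
print(count_class)
# Correlation between attributes
#   coefficient =  1: perfect positive linear correlation between the variables
#   coefficient = -1: perfect negative linear correlation between the variables
#   coefficient =  0: no linear correlation between the variables
# Keep only the numeric columns so the string column Species does not break corr() in newer pandas
correlations = data.select_dtypes(include='number').corr(method='pearson')
print(correlations)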
(150, 6)
index Sepal.Length Sepal.Width Petal.Length Petal.Width Species
0 1 5.1 3.5 1.4 0.2 setosa
1 2 4.9 3.0 1.4 0.2 setosa
2 3 4.7 3.2 1.3 0.2 setosa
3 4 4.6 3.1 1.5 0.2 setosa
4 5 5.0 3.6 1.4 0.2 setosa
5 6 5.4 3.9 1.7 0.4 setosa
6 7 4.6 3.4 1.4 0.3 setosa
7 8 5.0 3.4 1.5 0.2 setosa
8 9 4.4 2.9 1.4 0.2 setosa
9 10 4.9 3.1 1.5 0.1 setosa
index int64
Sepal.Length float64
Sepal.Width float64
Petal.Length float64
Petal.Width float64
Species object
dtype: object
index Sepal.Length Sepal.Width Petal.Length Petal.Width
count 150.00 150.00 150.00 150.00 150.00
mean 75.50 5.84 3.06 3.76 1.20
std 43.45 0.83 0.44 1.77 0.76
min 1.00 4.30 2.00 1.00 0.10
25% 38.25 5.10 2.80 1.60 0.30
50% 75.50 5.80 3.00 4.35 1.30
75% 112.75 6.40 3.30 5.10 1.80
max 150.00 7.90 4.40 6.90 2.50
Petal.Length
1.0 1
1.1 1
1.2 2
1.3 7
1.4 13
1.5 13
1.6 7
1.7 4
1.9 2
3.0 1
3.3 2
3.5 2
3.6 1
3.7 1
3.8 1
3.9 3
4.0 5
4.1 3
4.2 4
4.3 2
4.4 4
4.5 8
4.6 3
4.7 5
4.8 4
4.9 5
5.0 4
5.1 8
5.2 2
5.3 2
5.4 2
5.5 3
5.6 6
5.7 3
5.8 3
5.9 2
6.0 2
6.1 3
6.3 1
6.4 1
6.6 1
6.7 2
6.9 1
dtype: int64
index Sepal.Length Sepal.Width Petal.Length Petal.Width
index 1.00 0.72 -0.40 0.88 0.90
Sepal.Length 0.72 1.00 -0.12 0.87 0.82
Sepal.Width -0.40 -0.12 1.00 -0.43 -0.37
Petal.Length 0.88 0.87 -0.43 1.00 0.96
Petal.Width 0.90 0.82 -0.37 0.96 1.00
See the comments in the code for explanations.
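As a complement to the pandas code above: grouping by Petal.Length counts measurement values, whereas the class distribution comes from grouping by Species. The sketch below shows that, and also computes the correlation on the measurement columns only, leaving out the index column, which is just a row number. It is a rough illustration on a tiny in-memory sample (the versicolor and virginica rows are illustrative values in the file's format), so the numbers it prints are not the full-dataset results.

```python
import io
import pandas as pd

sample = io.StringIO(
    '"index","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"\n'
    '"1",5.1,3.5,1.4,0.2,"setosa"\n'
    '"2",4.9,3,1.4,0.2,"setosa"\n'
    '"51",7,3.2,4.7,1.4,"versicolor"\n'
    '"52",6.4,3.2,4.5,1.5,"versicolor"\n'
    '"101",6.3,3.3,6,2.5,"virginica"\n'
    '"102",5.8,2.7,5.1,1.9,"virginica"\n'
)
data = pd.read_csv(sample)

# Class distribution: number of rows per species
class_counts = data.groupby("Species").size()
print(class_counts)

# Pearson correlation on the four measurement columns only:
# drop the row-number column, then keep numeric columns (this excludes Species)
measurements = data.drop(columns="index").select_dtypes(include="number")
correlations = measurements.corr(method="pearson")
print(correlations.round(2))
```

Selecting the numeric columns explicitly avoids depending on how a given pandas version treats string columns inside corr().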