Reading a CSV file with pandas, skipping the irregular non-numeric rows, and computing the mean of a DataFrame column

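While processing CSV files with pandas recently, I ran into an annoying problem: the CSV files exported by the system contain some unneeded rows before the header, and the number of these rows differs from file to file.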

1. The network element CSV log file to process

Directory: Lange_N41_RSRP0309

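CDL-A CASE0.csv (contents shown verbatim; the first four lines are metadata: network element name, task type, save time, and network element version; the fifth line is the header: time, NR DU cell ID, downlink and uplink RLC total throughput (bps), downlink and uplink MAC total throughput (bps))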

网元名称,BJIGNB01_turn
任务类型,性能监测-小区性能监测
保存时间,2022-03-09 11:01:23
网元版本,BTS5900 V100R017C00SPC100
时间,"NR DU小区标识","下行RLC总吞吐率(bps)","上行RLC总吞吐率(bps)","下行MAC总吞吐率(bps)","上行MAC总吞吐率(bps)"
 03-09 11:01:21(949),"21","1101284672","964120","1219751160","1149152"
 03-09 11:01:22(969),"21","1086584088","914360","1207563360","1206648"
 03-09 11:01:23(949),"21","1093880872","924128","1216729816","1164952"
 03-09 11:01:24(949),"21","1081807848","934496","1204736448","1252160"
 03-09 11:01:25(969),"21","1054864328","904768","1167371600","1196128"
 03-09 11:01:26(939),"21","998016480","976184","1112138088","1240240"
 03-09 11:01:27(949),"21","978282432","910072","1096194072","1166848"
 03-09 11:01:28(939),"21","976951624","841608","1077764224","1134752"
 03-09 11:01:29(938),"21","1026227488","932736","1153665256","1188672"
 03-09 11:01:30(939),"21","1022991936","967576","1141611488","1231488"
 03-09 11:01:31(949),"21","1038911560","896408","1150961320","1179952"
 03-09 11:01:32(969),"21","1078508792","902184","1205576336","1201392"
 03-09 11:01:33(966),"21","1056336608","923544","1196652776","1211680"
 03-09 11:01:34(966),"21","1067465240","1009912","1166485264","1281136"
 03-09 11:01:35(949),"21","1096801368","943936","1210285464","1221208"
 03-09 11:01:36(959),"21","1092690616","926336","1218678328","1203920"
 03-09 11:01:37(959),"21","1070899552","907096","1195140520","1192960"
 03-09 11:01:38(969),"21","1071070040","928384","1185417888","1202424"
 03-09 11:01:39(949),"21","1073769536","939680","1188301792","1211008"
 03-09 11:01:40(949),"21","1024114560","920208","1142975656","1174312"
 03-09 11:01:41(969),"21","978075368","913864","1096556272","1192792"
 03-09 11:01:42(936),"21","959354232","909072","1068592088","1172376"
 03-09 11:01:43(946),"21","959739176","926232","1061650296","1177160"
 03-09 11:01:44(949),"21","942416528","874848","1071083096","1135632"
 03-09 11:01:45(938),"21","879348336","887456","974437352","1121584"
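
To make the problem concrete, here is a minimal sketch using an abbreviated in-memory copy of the file above (only two data rows are kept). The helper names `naive_read_fails` and `count_preamble_lines` are my own, purely for illustration. With the pandas versions I am familiar with, a plain `pd.read_csv` call rejects this layout, because the first line has two fields while the real header has six.

```python
import io
import pandas as pd

# Abbreviated copy of the export: four metadata lines, the header, two data rows.
sample = (
    '网元名称,BJIGNB01_turn\n'
    '任务类型,性能监测-小区性能监测\n'
    '保存时间,2022-03-09 11:01:23\n'
    '网元版本,BTS5900 V100R017C00SPC100\n'
    '时间,"NR DU小区标识","下行RLC总吞吐率(bps)","上行RLC总吞吐率(bps)",'
    '"下行MAC总吞吐率(bps)","上行MAC总吞吐率(bps)"\n'
    ' 03-09 11:01:21(949),"21","1101284672","964120","1219751160","1149152"\n'
    ' 03-09 11:01:22(969),"21","1086584088","914360","1207563360","1206648"\n'
)


def naive_read_fails(text):
    """Return True if a plain pd.read_csv call rejects the text."""
    try:
        pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError:
        return True
    return False


def count_preamble_lines(text, marker="下行MAC总吞吐率(bps)"):
    """Return how many lines precede the first line containing marker, or -1."""
    for i, line in enumerate(text.splitlines()):
        if marker in line:
            return i
    return -1


print(naive_read_fails(sample))
print(count_preamble_lines(sample))
```

In this sample the header sits on the fifth line, so four lines precede it. In other exports that count changes, which is why a fixed skip count is not enough.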

 

2. Skipping the irregular rows at the top of the CSV

read_csv_pandas_avg.py

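# -*- coding: UTF-8 -*-

# Script that automatically reads logs in the "web-export CSV" format
import os
import pandas as pd

# Path of the CSV file to read (raw string, so single backslashes are enough)
csv_filepath = r"D:\myproject\read_csv_calculate_avg\report\Lange_N41_RSRP0309\CDL-A CASE0.csv"


# df1 = df.iloc[:, 0:4]  # select columns 1 through 4
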
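### 1. Function that skips the unneeded rows at the start of each CSV file
def skip_to(fle, marker='下行MAC总吞吐率(bps)","上行MAC总吞吐率(bps)', encoding='gbk', **kwargs):
    if os.stat(fle).st_size == 0:
        raise ValueError("File is empty")
    # Decode with the file's encoding here, because the lines are read from this text handle
    with open(fle, encoding=encoding) as f:
        pos = 0
        cur_line = f.readline()
        # Advance until the header line, remembering where that line starts
        while marker not in cur_line:
            if not cur_line:  # end of file reached without finding the header
                raise ValueError("Header line not found")
            pos = f.tell()
            cur_line = f.readline()
        f.seek(pos)  # rewind to the start of the header line
        return pd.read_csv(f, **kwargs)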

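# 2. Read the CSV file into memory
df = skip_to(csv_filepath, encoding='gbk')  # read the CSV via the skip_to() function
df = df["下行MAC总吞吐率(bps)"]  # keep only the downlink MAC total throughput column
avg = df.mean()  # mean of that column

print(df)
print(avg)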
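
Since the number of leading rows differs from file to file, the same idea can be applied to every CSV in a directory. The following is a rough sketch of one way to do that, not the script above: it swaps the `tell()`/`seek()` approach for the `skiprows` parameter of `pd.read_csv`. The function names `find_header_row` and `column_means` are hypothetical, and the sketch assumes every file is GBK-encoded and has the target column name in its header line.

```python
import glob
import os
import pandas as pd


def find_header_row(path, marker, encoding="gbk"):
    """Return the 0-based line number of the first line containing marker."""
    with open(path, encoding=encoding) as f:
        for i, line in enumerate(f):
            if marker in line:
                return i
    raise ValueError(f"Header marker not found in {path}")


def column_means(directory, column="下行MAC总吞吐率(bps)", encoding="gbk"):
    """Map each CSV file name in directory to the mean of the given column."""
    means = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # Skip exactly as many lines as precede the header in this file
        header_row = find_header_row(path, column, encoding)
        df = pd.read_csv(path, skiprows=header_row, encoding=encoding)
        means[os.path.basename(path)] = df[column].mean()
    return means
```

Calling `column_means(r"D:\myproject\read_csv_calculate_avg\report\Lange_N41_RSRP0309")` would then return a dictionary with one mean per file. Each file is opened twice in this version, which is simple but may be worth revisiting for very large logs.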
