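I have recently been running some experiments that require text processing: extracting key fields from text files, collecting them into a table, and analyzing the result. These are brief notes on how I did it.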
1. The requirement: extract the values of the following fields from each result file:
'gpu_sim_insn',
'gpu_ipc',
'L1I_total_cache_accesses',
'L1D_total_cache_accesses',
'gpgpu_n_tot_thrd_icount',
'gpgpu_n_tot_w_icount',
'gpgpu_n_mem_read_local',
'gpgpu_n_mem_write_local',
'gpgpu_n_mem_read_global',
'gpgpu_n_mem_write_global',
'gpgpu_n_mem_texture',
'gpgpu_n_mem_const',
'gpgpu_n_load_insn',
'gpgpu_n_store_insn',
'gpgpu_n_shmem_insn',
'gpgpu_n_tex_insn',
'gpgpu_n_const_mem_insn',
'gpgpu_n_param_mem_insn'
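# Script: collect the fields listed above from every file in a directory into res.txt
import os
# Directory that contains the files to process
path = 'D:\\GPUClusters\\Stargazer-master\\EXP_RESULT'
# Output file (opened before os.chdir, so it is created in the directory the script was started from)
fout = open("res.txt", 'w')
# Names of the fields to extract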
x = [
'gpu_sim_insn',
'gpu_ipc',
'L1I_total_cache_accesses',
'L1D_total_cache_accesses',
'gpgpu_n_tot_thrd_icount',
'gpgpu_n_tot_w_icount',
'gpgpu_n_mem_read_local',
'gpgpu_n_mem_write_local',
'gpgpu_n_mem_read_global',
'gpgpu_n_mem_write_global',
'gpgpu_n_mem_texture',
'gpgpu_n_mem_const',
'gpgpu_n_load_insn',
'gpgpu_n_store_insn',
'gpgpu_n_shmem_insn',
'gpgpu_n_tex_insn',
'gpgpu_n_const_mem_insn',
'gpgpu_n_param_mem_insn'
]
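# Change into the data directory
os.chdir(path)
# Iterate over every file in the directory
for filename in os.listdir():
    if not os.path.isfile(filename):
        continue  # skip subdirectories
    with open(filename, 'r') as fs:
        # Process the file line by line
        for line in fs:
            a = line.split()
            if a != [] and a[0] in x:
                # The field name is the first token on the line, its value the last
                fout.write(a[-1] + '\t')
                if a[0] == 'gpgpu_n_param_mem_insn':
                    # Last field of interest: stop reading this file
                    break
    # One output row per input file
    fout.write('\n')
fout.close()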
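The same extraction can also be written as a small function that returns the values for one file, which makes it easier to add a header row and to try the logic on in-memory text. The following is only a sketch, not the script above: `extract_fields`, `write_table`, the shortened `FIELDS` list and the sample lines are all hypothetical. It assumes, as the script does, that the field name is the first whitespace-separated token on a line and the value is the last. Unlike the script, it keeps the last occurrence of a field if it appears more than once.

```python
import csv
import io

# Hypothetical subset of the field list, for illustration only
FIELDS = ['gpu_sim_insn', 'gpu_ipc', 'gpgpu_n_load_insn']

def extract_fields(lines, fields):
    """Return {field: value}, taking the first token as name and the last as value."""
    found = {}
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] in fields:
            found[tokens[0]] = tokens[-1]  # a later occurrence overwrites an earlier one
    return found

def write_table(rows, fields, out):
    """Write a tab-separated table with a header row; missing fields become empty cells."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.get(name, '') for name in fields])

# Made-up sample lines in the assumed "name = value" layout
sample = [
    "gpu_sim_insn = 1000",
    "some unrelated line",
    "gpu_ipc = 2.5",
    "gpgpu_n_load_insn = 40",
]
buf = io.StringIO()
write_table([extract_fields(sample, FIELDS)], FIELDS, buf)
print(buf.getvalue())
```

Passing an open file instead of the `io.StringIO` buffer would write the same table to disk.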
# For example, if the files you want to read are under d:\work, the code can be written like this:
import os
path = 'd:\\work'  # or path = r'd:\work'
os.chdir(path)
for filename in os.listdir():
    with open(filename, 'r') as f:
        for eachline in f:
            pass  # process eachline here
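Changing the working directory is not strictly required. As a sketch of an alternative, the directory can be passed to `os.listdir` and joined with each name, which also makes it easy to skip subdirectories. The helper name `iter_file_lines` and the file names in the demonstration are hypothetical.

```python
import os
import tempfile

def iter_file_lines(directory):
    """Yield (filename, line) for every regular file directly under directory."""
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        with open(full_path, 'r') as f:
            for line in f:
                yield name, line.rstrip('\n')

# Demonstration on a throwaway directory with one file and one subdirectory
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, 'a.txt'), 'w') as f:
        f.write('one\ntwo\n')
    os.mkdir(os.path.join(d, 'sub'))
    result = list(iter_file_lines(d))

print(result)
```

Because the working directory never changes, relative paths used elsewhere in a script (such as an output file) keep their meaning.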
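Python makes it easy to read the contents of a text file into a string variable that you can work with. File objects provide three "read" methods: .read(), .readline() and .readlines(). Each of them accepts an optional argument that limits how much data is read per call, but they are usually called without one. .read() reads the whole file at once and is typically used to load the file contents into a single string. Although .read() gives the most direct string representation of the file, it is unnecessary for sequential, line-oriented processing, and it cannot be used this way if the file is larger than the available memory.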
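To make the difference concrete, here is a small sketch that reads the same file once with .read() and once in limited-size pieces with .read(n). The temporary file and its contents are made up for the demonstration, and the chunk size of 8 characters is arbitrary.

```python
import os
import tempfile

# Create a small throwaway file to read back (contents are made up)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('line 1\nline 2\nline 3\n')
    path = tmp.name

# .read() with no argument returns the whole file as one string
with open(path, 'r') as f:
    whole = f.read()

# .read(n) returns at most n characters per call, and '' at end of file
chunks = []
with open(path, 'r') as f:
    while True:
        chunk = f.read(8)
        if not chunk:
            break
        chunks.append(chunk)

os.remove(path)
print(repr(whole))
print(chunks)
```

Reading in chunks keeps memory use bounded, at the cost of chunk boundaries that do not line up with line endings.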
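.readline() and .readlines() are very similar: .readline() returns a single line per call, while .readlines() reads the whole file and returns a list of its lines. Both appear in loops like the following:
fh = open('c:\\autoexec.bat')
for line in fh.readlines():
    print(line)
fh.close()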
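The following sketch puts the line-oriented approaches side by side. It uses `io.StringIO` as a stand-in for an open text file so that it runs without any real file; the text itself is made up.

```python
import io

# io.StringIO stands in for an open text file, so no real file is needed
text = 'first\nsecond\nthird\n'

# .readlines() reads everything and returns a list of lines
all_lines = io.StringIO(text).readlines()

# .readline() returns one line per call and '' once the end is reached
fh = io.StringIO(text)
one_by_one = []
while True:
    line = fh.readline()
    if line == '':
        break
    one_by_one.append(line)

# Iterating over the file object also yields one line at a time
iterated = [line for line in io.StringIO(text)]

print(all_lines)
print(one_by_one == all_lines, iterated == all_lines)
```

All three produce the same lines, each still ending in its newline character. Iterating over the file object is usually the simplest choice for large files, because it does not build the full list in memory.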
Sample contents of record.txt:
boy:what's your name?
girl:my name is lebaishi,what about you?
boy:my name is wahaha.
girl:i like your name.
==============================================
girl:how old are you?
boy:I'm 16 years old,and you?
girl:I'm 14.what is your favorite color?
boy:My favorite is orange.
girl:I like orange too!
==============================================
boy:where do you come from?
girl:I come from SH.
boy:My home is not far from you,I live in Jiangsu province.
girl:Let's be good friends.
boy:OK!
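Requirement: split the data in the file (record.txt) and save it according to the following rules: the text spoken by the boy and by the girl, without the "boy:" or "girl:" prefix, goes into boy_N.txt and girl_N.txt respectively, where N is the number of the section and sections are separated by the ====== lines.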
def save_to_file(boy_log, girl_log, version):
    # Write one section's lines to boy_<version>.txt and girl_<version>.txt
    filename_boy = 'boy_' + str(version) + ".txt"
    filename_girl = 'girl_' + str(version) + ".txt"
    with open(filename_boy, "w") as fb, open(filename_girl, "w") as fg:
        fb.writelines(boy_log)
        fg.writelines(girl_log)

def process(filename):
    boy_log = []
    girl_log = []
    version = 1
    with open(filename, "r") as f:
        for eachline in f:
            if eachline[:6] != "======":
                # Split on the first colon only, so colons inside the message are kept
                mylist = eachline.split(":", 1)
                if mylist[0] == "boy":
                    boy_log.append(mylist[-1])
                elif mylist[0] == "girl":
                    girl_log.append(mylist[-1])
            else:
                # Separator line: save the current section and start a new one
                save_to_file(boy_log, girl_log, version)
                version += 1
                boy_log = []
                girl_log = []
    # Save the last section, which has no trailing separator
    save_to_file(boy_log, girl_log, version)

if __name__ == "__main__":
    fn = "record.txt"
    process(fn)
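A variation on the same idea, sketched here under the assumption that any text before the first colon is a speaker name, groups the lines in memory instead of writing files directly. The function name `split_sections` is hypothetical; the sample lines are taken from record.txt above.

```python
from collections import defaultdict

def split_sections(lines):
    """Return a list of sections; each maps speaker -> list of that speaker's messages."""
    sections = [defaultdict(list)]
    for line in lines:
        if line.startswith('======'):
            sections.append(defaultdict(list))  # a separator starts a new section
            continue
        speaker, sep, message = line.partition(':')
        if sep:  # ignore lines that contain no colon
            sections[-1][speaker].append(message)
    return [dict(s) for s in sections]

sample = [
    "boy:what's your name?\n",
    "girl:my name is lebaishi,what about you?\n",
    "==============================================\n",
    "girl:how old are you?\n",
    "boy:I'm 16 years old,and you?\n",
]
sections = split_sections(sample)
for number, section in enumerate(sections, start=1):
    for speaker, messages in section.items():
        # The file name each list of messages would be written to
        print(f"{speaker}_{number}.txt", messages)
```

Keeping the parsing separate from the file writing means a third speaker would need no code changes, and the result can be checked before anything is written to disk.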
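#!/usr/bin/python
import os

# Dump the ifconfig output to a file, then parse that file
os.system("ifconfig > ip.info")

def get_ip():
    # Return the IPv4 address of eth0, or None if it is not found.
    # Relies on the older ifconfig layout: "inet addr:<address> ..."
    flag = 0  # 1 while we are inside the eth0 block
    with open("ip.info", 'r') as fs:
        for line in fs:
            a = line.split()
            if a != [] and a[0] == "eth0":
                flag = 1
            if a != [] and a[0] == "lo":
                flag = 0
            if flag == 0:
                continue
            for item in a:
                if a[0] == "inet" and item[0:5] == "addr:":
                    return item[5:]
    return None

ip = get_ip()
print(ip)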
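The temporary ip.info file can be avoided by capturing the command output directly. The sketch below separates parsing from running the command so that the parser can be tried on a string. `parse_ip` is a hypothetical helper, the sample text and its addresses are placeholders, and the code assumes the same older ifconfig layout ("inet addr:...") as the script above, with interface names starting in the first column.

```python
import re
import subprocess

def parse_ip(ifconfig_text, interface='eth0'):
    """Return the first 'inet addr:' value in the block for interface, or None."""
    in_block = False
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new interface block
            in_block = line.split()[0] == interface
        if in_block:
            match = re.search(r'inet addr:(\S+)', line)
            if match:
                return match.group(1)
    return None

def get_ip(interface='eth0'):
    """Run ifconfig and parse its output directly, without a temporary file."""
    result = subprocess.run(['ifconfig'], capture_output=True, text=True)
    return parse_ip(result.stdout, interface)

# Made-up sample in the older ifconfig layout; the addresses are placeholders
sample = (
    "eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:00\n"
    "          inet addr:192.0.2.10  Bcast:192.0.2.255  Mask:255.255.255.0\n"
    "\n"
    "lo        Link encap:Local Loopback\n"
    "          inet addr:127.0.0.1  Mask:255.0.0.0\n"
)
print(parse_ip(sample))
```

On a machine where ifconfig is installed and prints this layout, `get_ip()` would return the eth0 address; on systems with a different output format the pattern would need to be adapted.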