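I have recently been running some experiments that require text processing: extracting the key fields from text output, collecting them into a table, and analyzing the result. This post is a brief record of how I did it.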
1. The requirement: extract the values of the following fields from every result file in a directory:
'gpu_sim_insn', 'gpu_ipc', 'L1I_total_cache_accesses', 'L1D_total_cache_accesses', 'gpgpu_n_tot_thrd_icount', 'gpgpu_n_tot_w_icount', 'gpgpu_n_mem_read_local', 'gpgpu_n_mem_write_local', 'gpgpu_n_mem_read_global', 'gpgpu_n_mem_write_global', 'gpgpu_n_mem_texture', 'gpgpu_n_mem_const', 'gpgpu_n_load_insn', 'gpgpu_n_store_insn', 'gpgpu_n_shmem_insn', 'gpgpu_n_tex_insn', 'gpgpu_n_const_mem_insn', 'gpgpu_n_param_mem_insn'
    import os

    # Directory that contains the result files to process
    path = 'D:\\GPUClusters\\Stargazer-master\\EXP_RESULT'

    # Output file; it is opened before os.chdir(), so it is created in the
    # original working directory rather than inside the results directory
    fout = open("res.txt", 'w')

    # Fields to extract
    x = [
        'gpu_sim_insn', 'gpu_ipc',
        'L1I_total_cache_accesses', 'L1D_total_cache_accesses',
        'gpgpu_n_tot_thrd_icount', 'gpgpu_n_tot_w_icount',
        'gpgpu_n_mem_read_local', 'gpgpu_n_mem_write_local',
        'gpgpu_n_mem_read_global', 'gpgpu_n_mem_write_global',
        'gpgpu_n_mem_texture', 'gpgpu_n_mem_const',
        'gpgpu_n_load_insn', 'gpgpu_n_store_insn',
        'gpgpu_n_shmem_insn', 'gpgpu_n_tex_insn',
        'gpgpu_n_const_mem_insn', 'gpgpu_n_param_mem_insn'
    ]

    # Change into the results directory
    os.chdir(path)

    # Visit every file in the directory
    for filename in os.listdir():
        with open(filename, 'r') as fs:
            # Scan the file line by line
            for line in fs:
                a = line.split()
                # A matching line starts with a field name and ends with its value
                if a != [] and a[0] in x:
                    fout.write(a[-1] + '\t')
                    # The last field ends the row, so stop reading this file
                    if a[0] == 'gpgpu_n_param_mem_insn':
                        fout.write('\n')
                        break

    fout.write('\n')
    fout.close()
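The extraction step can also be pulled out into small functions that are easy to check without the real result files. The following is only a sketch: the names `extract_fields` and `to_row` and the sample lines are hypothetical, and it assumes, as the script does, that the field name is the first token on a line and the value is the last one. Unlike the script, it keeps only the first occurrence of each field and fills missing fields with a placeholder so that columns stay aligned.

```python
def extract_fields(lines, fields):
    """Return {field: value} for the first occurrence of each wanted field.

    Assumes the field name is the first whitespace-separated token on a line
    and the value is the last token.
    """
    wanted = set(fields)
    found = {}
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] in wanted and tokens[0] not in found:
            found[tokens[0]] = tokens[-1]
    return found


def to_row(found, fields, missing="NA"):
    """Build one tab-separated table row in the order given by `fields`."""
    return "\t".join(found.get(name, missing) for name in fields)


# Hypothetical sample lines, made up for illustration only
sample = [
    "gpu_sim_insn = 1000",
    "some unrelated line",
    "gpu_ipc = 2.5",
]
fields = ["gpu_sim_insn", "gpu_ipc", "gpgpu_n_load_insn"]
row = to_row(extract_fields(sample, fields), fields)
print(row)
```

Writing one such row per file, preceded by a header row built from `fields`, gives a table that opens directly in a spreadsheet.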
For example, if the files you want to read are under d:\work, the code can be written like this:

    import os

    path = 'd:\\work'  # or: path = r'd:\work'
    os.chdir(path)
    for filename in os.listdir():
        with open(filename, 'r') as file:
            for eachline in file:
                pass  # process eachline here
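Python makes it very easy to read the contents of a text file into a string variable that you can work with. File objects provide three "read" methods: .read(), .readline() and .readlines(). Each of them accepts an optional argument that limits how much data is read per call, but they are usually called without one. .read() reads the whole file at once and is typically used to put the file contents into a single string variable. Although .read() gives the most direct string representation of the file contents, it is unnecessary for sequential, line-oriented processing, and it cannot be used at all if the file is larger than the available memory.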
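The size argument mentioned above can be used to read a file piece by piece instead of all at once. A minimal sketch, in which the helper name `read_in_chunks` is hypothetical and `io.StringIO` stands in for a real file opened in text mode:

```python
import io


def read_in_chunks(fh, chunk_size):
    """Yield successive pieces of at most chunk_size characters from fh."""
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:  # read() returns an empty string at end of file
            break
        yield chunk


# io.StringIO behaves like a text file here, so no file on disk is needed
fh = io.StringIO("abcdefghij")
chunks = list(read_in_chunks(fh, 4))
print(chunks)
```

Only one chunk is held in memory at a time, which is the point when the file is too large for a single .read().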
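.readline() and .readlines() are very similar: .readline() returns one line per call, while .readlines() returns all lines as a list. They are typically used in a structure like the following: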
    fh = open('c:\\autoexec.bat')
    for line in fh.readlines():
        print(line)
    fh.close()
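To compare the read methods side by side, the sketch below writes a small temporary file and reads it back four ways. The file contents and variable names are made up for illustration. The last variant iterates over the file object itself, which yields one line at a time and is the usual choice for large files.

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    tmp_path = tmp.name

with open(tmp_path) as fh:
    whole = fh.read()            # the entire file as one string

with open(tmp_path) as fh:
    first_line = fh.readline()   # one line per call, newline included

with open(tmp_path) as fh:
    all_lines = fh.readlines()   # list of all lines, held in memory at once

with open(tmp_path) as fh:
    # Iterating over the file object yields one line at a time
    streamed = [line.rstrip("\n") for line in fh]

os.remove(tmp_path)
print(repr(whole), repr(first_line), all_lines, streamed)
```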
    boy:what's your name?
    girl:my name is lebaishi,what about you?
    boy:my name is wahaha.
    girl:i like your name.
    ==============================================
    girl:how old are you?
    boy:I'm 16 years old,and you?
    girl:I'm 14.what is your favorite color?
    boy:My favorite is orange.
    girl:I like orange too!
    ==============================================
    boy:where do you come from?
    girl:I come from SH.
    boy:My home is not far from you,I live in Jiangsu province.
    girl:Let's be good friends.
    boy:OK!

Requirement: split the data in the file above (record.txt) and save it as follows: the boy's lines go to boy_N.txt and the girl's lines go to girl_N.txt, without the speaker prefix, where N (1, 2, 3) is the number of the section between the separator lines.
    def save_to_file(boy_log, girl_log, version):
        """Write one section's lines to boy_<version>.txt and girl_<version>.txt."""
        with open('boy_' + str(version) + '.txt', 'w') as fb:
            fb.writelines(boy_log)
        with open('girl_' + str(version) + '.txt', 'w') as fg:
            fg.writelines(girl_log)

    def process(filename):
        boy_log = []
        girl_log = []
        version = 1
        with open(filename, 'r') as file:
            for eachline in file:
                if eachline[:6] != "======":
                    # Split on the first colon only, so that colons inside a
                    # sentence stay part of the text
                    mylist = eachline.split(":", 1)
                    if mylist[0] == "boy":
                        boy_log.append(mylist[-1])
                    else:
                        girl_log.append(mylist[-1])
                else:
                    # A separator line ends the current section:
                    # save it and start collecting the next one
                    save_to_file(boy_log, girl_log, version)
                    version += 1
                    boy_log = []
                    girl_log = []
        # Save the last section, which has no separator line after it
        save_to_file(boy_log, girl_log, version)

    if __name__ == "__main__":
        fn = "record.txt"
        process(fn)
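The splitting logic can be checked in memory before any files are written. The sketch below is one possible variant, not the script above: the function name `split_sections` is hypothetical, it groups lines by whatever speaker name precedes the first colon instead of hard-coding boy and girl, and it skips lines that contain no colon.

```python
def split_sections(lines, separator_prefix="======"):
    """Group 'speaker:text' lines into sections split at separator lines.

    Returns a list with one dict per section, mapping each speaker name to
    the list of that speaker's utterances in their original order.
    """
    sections = [{}]
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(separator_prefix):
            sections.append({})      # a separator starts a new section
            continue
        if ":" not in line:
            continue                 # skip blank or malformed lines
        speaker, text = line.split(":", 1)
        sections[-1].setdefault(speaker, []).append(text)
    return sections


# A few lines taken from record.txt
sample = [
    "boy:what's your name?\n",
    "girl:my name is lebaishi,what about you?\n",
    "==============================================\n",
    "girl:how old are you?\n",
    "boy:I'm 16 years old,and you?\n",
]
sections = split_sections(sample)
print(sections)
```

Each dict in the result corresponds to one numbered pair of output files, so writing them out is a loop over `enumerate(sections, start=1)`.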
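    #!/usr/bin/python
    import os

    # Dump the ifconfig output to a file, then parse it. The parsing assumes
    # the older net-tools format, where the address appears as "inet addr:x.x.x.x".
    os.system("ifconfig > ip.info")

    def get_ip():
        # flag is 1 while we are inside the eth0 block and 0 otherwise
        flag = 0
        with open("ip.info", 'r') as fs:
            for line in fs:
                a = line.split()
                if a != [] and a[0] == "eth0":
                    flag = 1
                if a != [] and a[0] == "lo":
                    flag = 0
                if flag == 0:
                    continue
                for item in a:
                    if a[0] == "inet" and item[0:5] == "addr:":
                        return item[5:]

    ip = get_ip()
    print(ip)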
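The same lookup can be sketched as a function that takes the ifconfig text as a string, which makes it testable on machines without ifconfig. This is an illustration under assumptions: the function name `extract_ipv4` is hypothetical, the sample text and its addresses are made up, and only the older net-tools layout ("inet addr:x.x.x.x" on an indented line under the interface name) is handled.

```python
import re


def extract_ipv4(ifconfig_text, interface="eth0"):
    """Return the IPv4 address listed under `interface`, or None if absent.

    Assumes the older net-tools ifconfig layout: the interface name starts
    in column 0 and its details follow on indented lines.
    """
    current = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # A line starting in column 0 begins a new interface block
            current = line.split()[0]
        elif current == interface:
            match = re.search(r"inet addr:(\d+\.\d+\.\d+\.\d+)", line)
            if match:
                return match.group(1)
    return None


# Hypothetical ifconfig output, made up for illustration only
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""

print(extract_ipv4(SAMPLE))
```

Tracking the current interface by name, rather than with a flag that is switched off at "lo", keeps the function correct when eth0 is not the first block in the output.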