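Background: last night a friend needed to turn a .txt file into a properly formatted CSV or Excel file, so I looked into how to do it and am writing it down here.
Main libraries used: csv and re.
A quick demo of the csv module: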
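import csv

# newline='' stops csv from writing an extra blank line between rows on Windows
with open('./test.csv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='excel')
    # 'e, f, g' contains commas, so csv quotes it and it stays a single cell
    writer.writerow(['a', 'b', 'c', 'e, f, g'])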
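To see exactly what the demo writes, the sketch below sends the same row to an in-memory io.StringIO buffer instead of test.csv. The buffer is used only so the result can be printed, and to_csv_text is a hypothetical helper name. The output shows that the excel dialect wraps the comma-containing field in double quotes and ends each row with \r\n. Reading the text back returns the field as one cell.

```python
import csv
import io


def to_csv_text(rows):
    """Write rows with the 'excel' dialect to an in-memory buffer and return the text."""
    buf = io.StringIO(newline='')
    writer = csv.writer(buf, dialect='excel')
    writer.writerows(rows)
    return buf.getvalue()


text = to_csv_text([['a', 'b', 'c', 'e, f, g']])
# The field containing commas is wrapped in double quotes; the row ends with \r\n
print(repr(text))  # 'a,b,c,"e, f, g"\r\n'

# Reading the text back confirms 'e, f, g' is still a single cell
print(next(csv.reader(io.StringIO(text, newline=''))))  # ['a', 'b', 'c', 'e, f, g']
```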
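Requirements: write the contents of the text file to a CSV in the format described below.
If there are adjacent ‘[ ]’ groups, merge them into a single ‘[ ]’.
If a ‘[ ]’ follows the number after 'starts', delete it.
Header: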
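Both rules can also be expressed directly with re.sub. This is a sketch of an alternative to the replace() chain used in get_format, not the author's approach. merge_brackets and drop_bracket_after_price are hypothetical helper names. The price pattern assumes prices look like the samples: digits with optional ',000' groups.

```python
import re


def merge_brackets(line):
    """Merge adjacent '[...]' groups, e.g. '[a, b], [c]' -> '[a, b, c]'."""
    # '],' followed by optional spaces and '[' marks the seam between two groups
    return re.sub(r'\]\s*,\s*\[', ', ', line)


def drop_bracket_after_price(line):
    """Remove a '[...]' group that directly follows the number after 'starts'."""
    # Group 1 keeps 'starts ... 48,000'; an optional ', [...]' right after it is dropped.
    # Lines with no such trailing group are returned unchanged.
    return re.sub(r'(starts.*?\d+(?:,\d{3})*)\s*,?\s*\[[^\]]*\]', r'\1', line)


sample = ("{003}, SN, Lxia, 4 dr saloon, [dark blue, matte silver, beige], "
          "[higher trim, better kit], starts from. CNY 53,990")
print(merge_brackets(sample))
```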
title = names | Origin | Brand | Description | Color Options | Pricing
Input text format:
{001}, N, Lia, 5 dr hatchback, [purple, light yellow, white, pink], starts from. CNY 48,000
{002}, CN, Lia, 4 dr saloon, [blue, light yellow, silver], starts from. CNY 50,500
{003}, SN, Lxia, 4 dr saloon, [dark blue, matte silver, beige], [higher trim, better kit], starts from. CNY 53,990
{004}, CN, Ti'an, 5 dr compact SUV, [metallic white, metallic grey, metallic blue, matte silver, metallic red, matte blue], starts from. CNY 135,000
{005}, FN, Ti'an, 5 dr mid-size SUV, [matte silver, matte grey, metallic azure, metallic black], [better kit], starts from. CNY 155,000
...
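Every sample line has the same shape: four plain fields, then one or more ‘[ ]’ groups, then the 'starts from.' price. Assuming that layout holds, a single regex with named groups can extract all six columns at once. This is only a sketch: LINE_RE and parse_line are hypothetical names. A line that does not fit the layout, such as the '...' placeholder, returns None instead of raising.

```python
import re

# Hypothetical pattern for the sample layout: four comma-free fields,
# one or more '[...]' groups, then 'starts ...' up to the last price digits
LINE_RE = re.compile(
    r'^\s*(?P<code>[^,]+),\s*(?P<origin>[^,]+),\s*(?P<brand>[^,]+),\s*(?P<desc>[^,]+),\s*'
    r'(?P<groups>(?:\[[^\]]*\]\s*,\s*)+)'
    r'(?P<price>starts.*?\d+(?:,\d{3})*)'
)


def parse_line(line):
    """Return the six output columns for one input line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Collect the contents of every '[...]' group and merge them into one bracket
    items = re.findall(r'\[([^\]]*)\]', m.group('groups'))
    colors = '[' + ', '.join(s.strip() for s in items) + ']'
    return [m.group('code').strip(), m.group('origin').strip(),
            m.group('brand').strip(), m.group('desc').strip(),
            colors, m.group('price')]
```

Each returned list can be passed straight to csv.writer.writerow.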
Code:
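import os
import re
import sys
import csv


def get_format(path, res_path):
    title = 'Code names | Place of Origin | Brand Name | Vehicle Description | Color Options | Pricing Information'
    # Strip the spaces that split('|') leaves around each header name
    title_head = [t.strip() for t in title.split('|')]
    # newline='' avoids blank rows on Windows; utf-8-sig writes a BOM so Excel detects UTF-8
    with open(res_path, 'w', newline='', encoding='utf-8-sig') as out_f:
        csvwrite = csv.writer(out_f, dialect='excel')
        csvwrite.writerow(title_head)
        with open(path, mode='r', encoding='utf-8') as in_f:
            for line in in_f:
                # Skip blank lines and lines without a price, where findall(...)[0] would fail
                if 'starts' not in line:
                    continue
                res = line.split(',')
                # The first four fields contain no commas, so a plain split is enough
                one = res[0].strip()
                two = res[1].strip()
                three = res[2].strip()
                four = res[3].strip()
                # Greedy match from the first '[' up to 'starts', then drop every bracket:
                # this merges adjacent '[ ]' groups into one comma-separated list
                fives = re.findall(r'\[.*starts', line)[0].replace('starts', '').replace('[', '').replace(']', '').strip().strip(',')
                # Price text from 'starts' to the last digit; cut off any '[ ]' that follows
                six = re.findall(r'starts.*\d+', line)[0]
                res_six = six.split('[')[0].strip().rstrip(',')
                # Wrap the merged list in one pair of brackets; writing the list [fives]
                # directly would put "['...']" into the cell
                csvwrite.writerow([one, two, three, four, '[{}]'.format(fives), res_six])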
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('Error: please provide the correct path to the input file')
        sys.exit(1)
    path = sys.argv[1]
    # result.csv is written next to the input file
    res_dir = os.path.dirname(path)
    res_path = os.path.join(res_dir, 'result.csv')
    print('>>> Source file: {}\n>>> Output file: {}'.format(path, res_path))
    get_format(path, res_path)
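The original loop indexes re.findall(...)[0]. Any non-blank line that lacks a ‘[ ]’ group before 'starts', or digits after it, will therefore stop the whole run with an IndexError. The '...' placeholder in the sample is one example. A small pre-check that lists such lines before converting could look like the sketch below. find_unparsable_lines is a hypothetical helper, and it simply reuses the two patterns from get_format.

```python
import re


def find_unparsable_lines(path):
    """Return (line_number, text) for non-blank lines the two get_format patterns can't match."""
    problems = []
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                continue  # blank lines carry no data
            # If either pattern finds nothing, re.findall(...)[0] would raise IndexError
            if not re.search(r'\[.*starts', text) or not re.search(r'starts.*\d+', text):
                problems.append((lineno, text))
    return problems
```

An empty list means every data line has the shape the converter expects.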