1. Task Description
## Task Description
The data to be processed in this task has 101,227 lines in total. A sample:
```txt
18 Jogging 102271561469000 -13.53 16.89 -6.4
18 Jogging 102271641608000 -5.75 16.89 -0.46
18 Jogging 102271681617000 -2.18 16.32 11.07
18 Jogging 3.36
18 Downstairs 103260201636000 -4.44 7.06 1.95
18 Downstairs 103260241614000 -3.87 7.55 3.3
18 Downstairs 103260321693000 -4.06 8.08 4.79
18 Downstairs 103260365577000 -6.32 8.66 4.94
18 Downstairs 103260403083000 -5.37 11.22 3.06
18 Downstairs 103260443305000 -5.79 9.92 2.53
6 Walking 0 0 0 3.214402
```

### Step 1
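Delete every row of the dataset that contains abnormal data.
For example, the 4th line of the sample above (`18 Jogging 3.36`) has only 3 elements while every other line has 6, so it is abnormal and must be deleted. Likewise, the 3rd element (the timestamp) of the last line (`6 Walking 0 0 0 3.214402`) is clearly wrong, so that line is abnormal as well and must be deleted.
The dataset may contain other kinds of anomalies too.
After all rows have been processed, write them to the file `test1`, with the elements of each row separated by commas.
The file `test1` has 100,471 lines. A sample:

```txt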
6,Walking,23445542281000,-0.72,9.62,0.14982383
6,Walking,23445592299000,-4.02,11.03,3.445948
6,Walking,23470662276000,0.95,14.71,3.636633
...
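```

### Step 2
Count the occurrences of every movement in `test1` and print the counts to the screen. Then round each count down to a multiple of 100, write that many rows of the movement to the file `test2`, and discard the surplus rows. For example, if `Jogging` is counted `3021` times, print `Movement: Jogging Amount: 3021` and write the first 3000 of those rows to `test2`.
The file `test2` has 100,200 lines.

### Step 3
Read the data in `test2`, take the last 3 elements of each row, and write them to the file `test3`, separated by spaces.
The file `test3` has 100,200 lines. A sample:

```txt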
-0.72 9.62 0.14982383
-4.02 11.03 3.445948
0.95 14.71 3.636633
...
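```

### Step 4
Read the data in `test3`, treating each line as one group. Elements within a group are separated by spaces, groups are separated by commas, and every 20 groups form one line. Write the result to the file `finally`.
The file `finally` has 5,010 lines. A sample:

```txt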
-0.72 9.62 0.14982383,-4.02 11.03 3.445948,0.95 14.71 3.636633,-3.57 5.75 -5.407278,-5.28 8.85 -9.615966,-1.14 15.02 -3.8681788,7.86 11.22 -1.879608,6.28 4.9 -2.3018389,0.95 7.06 -3.445948,-1.61 9.7 0.23154591,6.44 12.18 -0.7627395,5.83 12.07 -0.53119355,7.21 12.41 0.3405087,6.17 12.53 -6.701211,-1.08 17.54 -6.701211,-1.69 16.78 3.214402,-2.3 8.12 -3.486809,-2.91 0 -4.7535014,-2.91 0 -4.7535014,-4.44 1.84 -2.8330324
```

## Deliverables
- Four `*.py` files:
  - `test1.py`
  - `test2.py`
  - `test3.py`
  - `finally.py`
- Four files generated by running the Python scripts:
  - `test1`
  - `test2`
  - `test3`
  - `finally`
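Since the task states the expected line count of every output file, it can be handy to check the generated files before handing them in. A minimal sketch, assuming the files use the names from the spec (`test1`, `test2`, `test3`, `finally`) and sit in the current working directory; `count_lines` and `EXPECTED_LINES` are hypothetical names, and the numbers are copied from the task description.

```python
import os

# Expected line counts, copied from the task description
EXPECTED_LINES = {'test1': 100471, 'test2': 100200, 'test3': 100200, 'finally': 5010}

def count_lines(path):
    """Count the lines of a text file, reading it one line at a time."""
    with open(path, 'r') as f:
        return sum(1 for _ in f)

# Compare every output file that exists in the current directory
for name, expected in EXPECTED_LINES.items():
    if os.path.exists(name):
        actual = count_lines(name)
        status = 'OK' if actual == expected else 'MISMATCH'
        print('%s: %d lines (expected %d) %s' % (name, actual, expected, status))
```

Files that do not exist yet are simply skipped, so the check can be rerun after each step.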
2. Procedure
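Boiled down, the job takes these steps:

1. Convert the file to `.csv` (put simply, add commas)
2. Delete the abnormal rows (check the number of elements and the timestamp)
3. Count each movement and keep a multiple of 100 rows of it (`test2`)
4. Keep the last three columns (`test3`), then join every 20 lines into one (`finally`)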
1. Convert the data to CSV format; in plain words, replace every space with a comma.
```python
# Replace every space with a comma to turn the raw data into CSV
with open('OriginalData', 'r') as fp, open('OriginalData.csv', 'w') as fp_new:
    for row in fp:
        fp_new.write(row.replace(' ', ','))
```
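Replacing each single space with a comma works only if fields are separated by exactly one space: a doubled space would produce an empty field and change the field count, so an otherwise valid row could later be dropped as abnormal. If the raw file might contain repeated or trailing spaces (an assumption; the sample shows none), `str.split()` with no argument is a more forgiving tokenizer, because it splits on any run of whitespace and never yields empty fields. A minimal sketch, where `line_to_fields` and `convert` are hypothetical helper names:

```python
import csv
import io

def line_to_fields(line):
    """Split a raw line on any run of whitespace; never yields empty fields."""
    return line.split()

def convert(lines):
    """Turn whitespace-separated raw lines into CSV text, one row per line."""
    buf = io.StringIO()
    # lineterminator='\n' overrides the csv module's default of '\r\n'
    writer = csv.writer(buf, lineterminator='\n')
    for line in lines:
        writer.writerow(line_to_fields(line))
    return buf.getvalue()

raw = ['18 Jogging 102271561469000 -13.53 16.89 -6.4\n',
       '18  Jogging 3.36 \n']  # double and trailing spaces on purpose
print(convert(raw), end='')
# 18,Jogging,102271561469000,-13.53,16.89,-6.4
# 18,Jogging,3.36
```

For the real file, `fp_new.write(convert(fp))` inside the `with` block above would do the same job, at the cost of building the whole output in memory first.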
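2. Delete the abnormal rows.

Put the CSV file in the current working directory so it can be opened with a relative path: `open('OriginalData.csv', 'r', newline='')`.

A row counts as abnormal if it does not have exactly 6 elements or its 3rd element (the timestamp) is 0. Equivalently, a row is kept only if `len(row) == 6 and float(row[2]) != 0`: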
```python
import csv

with open('OriginalData.csv', 'r', newline='') as csv_in_file:
    with open('text1.csv', 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        for row in filereader:
            # Keep rows with exactly 6 fields and a non-zero timestamp;
            # `and` short-circuits, so float() only runs on 6-field rows
            if len(row) == 6 and float(row[2]) != 0:
                filewriter.writerow(row)
```
(Figure: contents of the resulting `text1.csv`)
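The filter above calls `float(row[2])` directly, so a 6-field row whose timestamp is not a number (for example an empty field) would raise `ValueError` and stop the script. Since the task warns that other anomalies may exist, a slightly more defensive check could also require the numeric columns to parse. This is a sketch under that assumption: `is_valid_row` is a hypothetical helper, and treating unparsable rows as abnormal is an assumed rule, not one stated in the task.

```python
def is_valid_row(row):
    """Return True if a parsed CSV row looks like a normal record.

    Assumed rules: exactly 6 fields, fields 3 to 6 parse as numbers,
    and the timestamp (3rd field) is not 0.
    """
    if len(row) != 6:
        return False
    try:
        timestamp = float(row[2])
        for value in row[3:]:
            float(value)  # raises ValueError for empty or non-numeric fields
    except ValueError:
        return False
    return timestamp != 0

rows = [
    ['18', 'Jogging', '102271561469000', '-13.53', '16.89', '-6.4'],
    ['18', 'Jogging', '3.36'],
    ['6', 'Walking', '0', '0', '0', '3.214402'],
    ['6', 'Walking', '23445542281000', '-0.72', '', '0.14982383'],
]
print([is_valid_row(r) for r in rows])  # [True, False, False, False]
```

Inside the loop, `if is_valid_row(row): filewriter.writerow(row)` would replace the inline condition.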
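3. Count the occurrences of each movement (Jogging and the others), round each count down to a multiple of 100, and write the result to `test2`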
```python
import csv

# The six movement labels that appear in the data
movements = ['Walking', 'Jogging', 'Upstairs', 'Downstairs', 'Sitting', 'Standing']

with open('text1.csv', 'r', newline='') as csv_in_file:
    filereader = csv.reader(csv_in_file)

    # First pass: count the rows of each movement and print the totals
    counts = dict.fromkeys(movements, 0)
    for row in filereader:
        if row[1] in counts:
            counts[row[1]] += 1
    for name in movements:
        print('Movement: %s Amount: %d' % (name, counts[name]))

    # Round each count down to a multiple of 100: this many rows are kept
    quota = {name: n // 100 * 100 for name, n in counts.items()}

    # Second pass: rewind the input and write rows until each quota is used up
    csv_in_file.seek(0, 0)
    with open('text2.csv', 'w', newline='') as csv_out_file:
        filewriter = csv.writer(csv_out_file)
        for row_list in filereader:
            if quota.get(row_list[1], 0) > 0:
                filewriter.writerow(row_list)
                quota[row_list[1]] -= 1
```
(Figure: output of the run)
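The two-pass approach above (count, rewind with `seek`, then write) can also be expressed with `collections.Counter` from the standard library. The sketch below works on an in-memory list of rows so it is easy to test; `keep_multiple_of_100` is a hypothetical name, and like the loop above it keeps the first rows of each movement in file order.

```python
from collections import Counter

def keep_multiple_of_100(rows):
    """Keep the first (count // 100 * 100) rows of each movement, in order."""
    counts = Counter(row[1] for row in rows)
    for movement, amount in counts.items():
        print('Movement: %s Amount: %d' % (movement, amount))
    # Remaining quota per movement, rounded down to a multiple of 100
    quota = {movement: amount // 100 * 100 for movement, amount in counts.items()}
    kept = []
    for row in rows:
        if quota[row[1]] > 0:
            kept.append(row)
            quota[row[1]] -= 1
    return kept

# Hypothetical data: 3021 Jogging rows followed by 150 Walking rows
rows = [['18', 'Jogging']] * 3021 + [['6', 'Walking']] * 150
kept = keep_multiple_of_100(rows)
print(len(kept))  # 3100
```

Reading the whole of `text1.csv` into a list first (about 100,000 short rows) avoids the rewind, at a modest memory cost.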
Read the data in `test2`, take the last 3 columns of each row, and write them to the file `test3`, separated by spaces.
```python
import csv

value = [3, 4, 5]  # indexes of the last three columns
with open('text2.csv', 'r', newline='') as csv_in_file:
    with open('text3.csv', 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        for row in filereader:
            # Keep only the columns listed in `value`
            row_output = [row[index] for index in value]
            filewriter.writerow(row_output)

# Convert the comma-separated text3.csv into the space-separated file text3
with open('text3.csv', 'r') as fp, open('text3', 'w') as fp_new:
    for row in fp:
        fp_new.write(row.replace(',', ' '))
```
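Writing an intermediate `text3.csv` and then replacing commas is not strictly necessary: each row can be sliced and joined with spaces in a single pass. A minimal sketch, assuming the input is comma-separated as in `test2`; `last_three_columns` is a hypothetical helper name, and `row[-3:]` picks the same columns as `value = [3, 4, 5]` for 6-field rows.

```python
import csv
import io

def last_three_columns(csv_text):
    """Return the last 3 fields of every CSV row, joined by single spaces."""
    out = []
    for row in csv.reader(io.StringIO(csv_text)):
        out.append(' '.join(row[-3:]) + '\n')
    return ''.join(out)

sample = ('6,Walking,23445542281000,-0.72,9.62,0.14982383\n'
          '6,Walking,23445592299000,-4.02,11.03,3.445948\n')
print(last_three_columns(sample), end='')
# -0.72 9.62 0.14982383
# -4.02 11.03 3.445948
```

With real files, the same idea is a loop over `csv.reader` on `text2.csv` that writes `' '.join(row[-3:]) + '\n'` straight to `text3`.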
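4. Read the data in `test3`, treating each line as one group; elements within a group are separated by spaces, groups are separated by commas, and every 20 groups make one line. Write the result to the file `finally`.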
```python
# Join the lines of text3 with commas, starting a new line after every 20th
with open('text3', 'r') as fp, open('finally', 'w') as fp_new:
    for count, row in enumerate(fp, start=1):
        if count % 20 != 0:
            row = row.replace('\n', ',')
        fp_new.write(row)
```
(Figure: preview of the generated `finally` file)
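The counter-based loop works here because 100,200 is a multiple of 20 (100,200 / 20 = 5,010, matching the expected size of `finally`). If the line count were not a multiple of 20, the last output line would end with a stray comma and no newline. A sketch that groups lines explicitly and handles a short final group, with `group_lines` as a hypothetical name:

```python
def group_lines(lines, size=20):
    """Join every `size` lines (newlines stripped) with commas, one group per line."""
    stripped = [line.rstrip('\n') for line in lines]
    groups = []
    for start in range(0, len(stripped), size):
        groups.append(','.join(stripped[start:start + size]) + '\n')
    return groups

lines = ['%d 0 0\n' % i for i in range(45)]
groups = group_lines(lines)
print(len(groups))  # 3
print(groups[-1], end='')  # 40 0 0,41 0 0,42 0 0,43 0 0,44 0 0
```

With real files, `fp_new.writelines(group_lines(fp))` inside the `with` block above would produce the same `finally` file for a line count that is a multiple of 20.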