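Problem
- Extract from a graph database the IDs of every three nodes that form a triangle, then deduplicate the results.
- Each row holds three node IDs. Two rows that contain the same three IDs, in any order, count as the same triangle.
- For example, the first and second rows of the data below describe the same triangle.
Raw data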
# test.csv
0000068d96366d5e052ed65eaecb1112 74a995fb9b79e84232b7510644688cd8 dfb644784e6292b8d7f499fc53229dda
0000068d96366d5e052ed65eaecb1112 dfb644784e6292b8d7f499fc53229dda 74a995fb9b79e84232b7510644688cd8
0000068d96366d5e052ed65eaecb1112 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6
000007f62fb9ee58cec1391e55b9c200 836598c70ab46fb1c621fe548dd363cc a5545ef32843ecfa7038692a1dbbe305
000008dc6580ce3d3313e35417c0aa65 8c658523dbcf012383fb12aec76f220f b193cb6de781dbacc04ce2ccb96d43dc
00001d74b7b878b076ab2d84d5de4296 d2a5854ae3efa1ee4e7a4070e841f108 4c0fd736ef3845072bdd9a318a71a96f
000021d8344e102a857740ee319c40e9 c27c333b1586b239bb302b23c70ce274 9a29e26feb9d254a8f70c9ad94e8d1dc
000021d8344e102a857740ee319c40e9 9a29e26feb9d254a8f70c9ad94e8d1dc c27c333b1586b239bb302b23c70ce274
000021d8344e102a857740ee319c40e9 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6
0000419d8eec64dd2582336713a51ad3 714215e7fdf0f3f32073df73caecd44f 3255e40d3de83c0ad9ebd35569e601b4
0000434e444cab7991535ab1dc7e0a43 2ab9986c1d02386ca74155e55fcdb64c 16de98e5c7f9751fc6b66526ce875736
0000434e444cab7991535ab1dc7e0a43 16de98e5c7f9751fc6b66526ce875736 2ab9986c1d02386ca74155e55fcdb64c
00004e274353b7bd829ad789930a799c a76fdec4ace13644f6f50934c53fb484 18ee36edb4a3ed1c9d35cca2a01af37c
00004e274353b7bd829ad789930a799c 18ee36edb4a3ed1c9d35cca2a01af37c a76fdec4ace13644f6f50934c53fb484
00009475ecac55a5816427acd81a2873 d1c17f8ff05ee3243fc084ae45042891 73b357fc3e0f5e538521c9589762e113
00009f5e45eb2f2c37ed8e349ba63e76 c8fbf4f7114038db015eb8debf0bfba8 1f111ee13d222ab4f4d3d128af201aea
00009f5e45eb2f2c37ed8e349ba63e76 1f111ee13d222ab4f4d3d128af201aea c8fbf4f7114038db015eb8debf0bfba8
00009f5e45eb2f2c37ed8e349ba63e76 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6
0000b229f3216a8ca929c1a90292665e 5aaf57319e251c04fe8b3d4f1d687cad 2f1f7611449c9834dc8d1df8e745e7c1
0000f2546c76a8412c130228dca0ea8d b24401256e3bd6d30c32bbedb9a90956 24338e2d8656a53303f3dced1d6af82f
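Approach: the same three values should produce the same single value, no matter in which order they are processed.
For example: a+b+c = c+b+a, and a^b^c = c^b^a.
Note: glad I have not handed everything I learned back to my teachers.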
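The order-independence claim can be checked quickly. The sketch below is only an illustration: `xor_key` is a hypothetical helper name and the three values are arbitrary, not taken from test.csv.

```python
from itertools import permutations


def xor_key(a: int, b: int, c: int) -> int:
    """Combine three integers with XOR; argument order does not matter."""
    return a ^ b ^ c


# Arbitrary illustrative values (assumption: not taken from test.csv)
values = (0x1A, 0x2B, 0x3C)

# Collect the result of every ordering; a set keeps only distinct results
keys = {xor_key(*p) for p in permutations(values)}
sums = {sum(p) for p in permutations(values)}

print(len(keys), len(sums))  # → 1 1
```

All six orderings give one XOR value and one sum, which is the property the deduplication relies on.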
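Method 1
Encode the three node IDs as bytes, XOR them together into a single value, and deduplicate on that value.
The XOR result is not convenient to print, so it is Base64-encoded into a readable string.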
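from base64 import b64encode

vs = set()  # XOR keys that have already been written
with open('test.csv', 'r') as fr, open('out.csv', 'w') as fw:
    for line in fr:
        # The sample rows are whitespace-separated, so split on whitespace
        v0, v1, v2 = line.split()
        # XOR the UTF-8 bytes of the three IDs position by position
        xored = bytes(i ^ j ^ k for i, j, k in zip(v0.encode('utf-8'), v1.encode('utf-8'), v2.encode('utf-8')))
        # Base64 turns the raw bytes into a printable string
        v = b64encode(xored).decode('utf-8')
        if v not in vs:
            # Keep only the first row seen for each key
            fw.write(f'{v0} {v1} {v2} {v}\n')
            vs.add(v)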
Final result
# out.csv
0000068d96366d5e052ed65eaecb1112 74a995fb9b79e84232b7510644688cd8 dfb644784e6292b8d7f499fc53229dda Y2IzPz03aT40MTI9am5jb2cwNmZoPmMwYGJnaDA2MWs=
0000068d96366d5e052ed65eaecb1112 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6 Oz8yOm1mOjk8N2A+NDFiYjFuMj01NGZqNDc2ZTVnN2E=
000007f62fb9ee58cec1391e55b9c200 836598c70ab46fb1c621fe548dd363cc a5545ef32843ecfa7038692a1dbbe305 aTYzMTxqYzIwPzQ+NmAxaDdjYjhjZTYwPDVkaDAyY2Y=
000008dc6580ce3d3313e35417c0aa65 8c658523dbcf012383fb12aec76f220f b193cb6de781dbacc04ce2ccb96d43dc amI/NmtvYDQ3YGNnNzZgNGgwYzIxMzcyMDljMmdgYjA=
00001d74b7b878b076ab2d84d5de4296 d2a5854ae3efa1ee4e7a4070e841f108 4c0fd736ef3845072bdd9a318a71a96f YGFhY21mMGNiYjRmYjw3YjExMmc/NTw1OWxnZTM6P2g=
000021d8344e102a857740ee319c40e9 c27c333b1586b239bb302b23c70ce274 9a29e26feb9d254a8f70c9ad94e8d1dc amM1amQwYTxnYzU3YTc1OWIxMzdlazYyaTJsODUzNm4=
000021d8344e102a857740ee319c40e9 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6 Oz8yOm9hZmU2NWdtM2VlZjluN29lMjZqZmNsZDBmY2o=
0000419d8eec64dd2582336713a51ad3 714215e7fdf0f3f32073df73caecd44f 3255e40d3de83c0ad9ebd35569e601b4 NDMxN2AwbDdtZWZrY2QyNmQ8amMzZjQxZGthYGVkMmE=
0000434e444cab7991535ab1dc7e0a43 2ab9986c1d02386ca74155e55fcdb64c 16de98e5c7f9751fc6b66526ce875736 M2c2bDQzZzNmZ2JoZW8wPDswYzQ2YTUyMmBsNmdgM2Y=
00004e274353b7bd829ad789930a799c a76fdec4ace13644f6f50934c53fb484 18ee36edb4a3ed1c9d35cca2a01af37c YD9jM2M2NGc3ZDExNGVnM2dgbGE3bWo/OzYyZjM+NjQ=
00009475ecac55a5816427acd81a2873 d1c17f8ff05ee3243fc084ae45042891 73b357fc3e0f5e538521c9589762e113 YzIxMjtlaTAwNmRgZWNmMjNiZzVpOjU+aTo3Z2UxPzE=
00009f5e45eb2f2c37ed8e349ba63e76 c8fbf4f7114038db015eb8debf0bfba8 1f111ee13d222ab4f4d3d128af201aea Ym5nY243NmM2YGNgMz80NWUyNDI+bGVpOmJjZGRmM28=
00009f5e45eb2f2c37ed8e349ba63e76 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6 Oz8yOmQ2NzgxNDZqMDNlZDJsZTxpZ2A7bDA0MTczMWU=
0000b229f3216a8ca929c1a90292665e 5aaf57319e251c04fe8b3d4f1d687cad 2f1f7611449c9834dc8d1df8e745e7c1 NzdgMGAzMDlrYjlnPjo7Y2M/Mj9hMTNnZGE7P2RiNzA=
0000f2546c76a8412c130228dca0ea8d b24401256e3bd6d30c32bbedb9a90956 24338e2d8656a53303f3dced1d6af82f YDY3N25mNWU4MDFiZDtjMTIzZDI2MzI4Nz42aDNgPzQ=
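After thinking it over for a night, Method 1 still felt awkward: the three hexadecimal numbers should be XORed into a single hexadecimal number.
A closer look at the official documentation for bytes turned up the bytes.fromhex and hex methods. The official docs really are the best guide.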
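As a small illustration of those two methods, the sketch below parses hex strings with `bytes.fromhex`, XORs them byte by byte, and prints the result with `hex()`. The helper name `xor_hex` and the short sample strings are assumptions made for this example.

```python
def xor_hex(*hex_ids: str) -> str:
    """XOR equal-length hex strings byte by byte and return the result as hex."""
    raw = [bytes.fromhex(h) for h in hex_ids]
    # zip() would silently truncate to the shortest input, so check lengths first
    if len({len(r) for r in raw}) != 1:
        raise ValueError("all IDs must have the same length")
    out = bytearray(len(raw[0]))
    for r in raw:
        for idx, byte in enumerate(r):
            out[idx] ^= byte
    return out.hex()


# Short made-up inputs (assumption: not taken from test.csv)
print(xor_hex("00ff", "0f0f", "f0f0"))  # → ff00
```

The explicit length check is a design choice: an ID of unexpected length raises an error instead of producing a shortened key.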
Method 2
Read the three node IDs as bytes, XOR them together into a single value, and deduplicate on that value.
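vs = set()  # XOR keys that have already been written
with open('test.csv', 'r') as fr, open('out.csv', 'w') as fw:
    for line in fr:
        # The sample rows are whitespace-separated, so split on whitespace
        v0, v1, v2 = line.split()
        # Parse each hex ID into raw bytes, XOR byte by byte, and format the result as hex
        v = bytes(i ^ j ^ k for i, j, k in zip(bytes.fromhex(v0), bytes.fromhex(v1), bytes.fromhex(v2))).hex()
        if v not in vs:
            # Keep only the first row seen for each key
            fw.write(f'{v0} {v1} {v2} {v}\n')
            vs.add(v)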
Final result
# out.csv
0000068d96366d5e052ed65eaecb1112 74a995fb9b79e84232b7510644688cd8 dfb644784e6292b8d7f499fc53229dda ab1fd70e432d17a4e06d1ea4b9810010
0000068d96366d5e052ed65eaecb1112 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6 bf2a6fa9e59e41bb172d54f1670c5c3a
000007f62fb9ee58cec1391e55b9c200 836598c70ab46fb1c621fe548dd363cc a5545ef32843ecfa7038692a1dbbe305 2631c1c20d4e6d1378d8ae60c5d142c9
000008dc6580ce3d3313e35417c0aa65 8c658523dbcf012383fb12aec76f220f b193cb6de781dbacc04ce2ccb96d43dc 3df6469259ce14b270a4133669c2cbb6
00001d74b7b878b076ab2d84d5de4296 d2a5854ae3efa1ee4e7a4070e841f108 4c0fd736ef3845072bdd9a318a71a96f 9eaa4f08bb6f9c59130cf7c5b7ee1af1
000021d8344e102a857740ee319c40e9 c27c333b1586b239bb302b23c70ce274 9a29e26feb9d254a8f70c9ad94e8d1dc 5855f08cca558759b137a26062787341
000021d8344e102a857740ee319c40e9 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6 bf2a48fc47e63ccf9774c241f85b0dc1
0000419d8eec64dd2582336713a51ad3 714215e7fdf0f3f32073df73caecd44f 3255e40d3de83c0ad9ebd35569e601b4 4317b0774ef4ab24dc1a3f41b0afcf28
0000434e444cab7991535ab1dc7e0a43 2ab9986c1d02386ca74155e55fcdb64c 16de98e5c7f9751fc6b66526ce875736 3c6743c79eb7e60af0a46a724d34eb39
00004e274353b7bd829ad789930a799c a76fdec4ace13644f6f50934c53fb484 18ee36edb4a3ed1c9d35cca2a01af37c bf81a60e5b116ce5e95a121ff62f3e64
00009475ecac55a5816427acd81a2873 d1c17f8ff05ee3243fc084ae45042891 73b357fc3e0f5e538521c9589762e113 a272bc0622fde8d23b856a5a0a7ce1f1
00009f5e45eb2f2c37ed8e349ba63e76 c8fbf4f7114038db015eb8debf0bfba8 1f111ee13d222ab4f4d3d128af201aea d7ea754869893d43c260e7c28b8ddf34
00009f5e45eb2f2c37ed8e349ba63e76 2902e25dde191f5f596a3756bed1a3ce 96288b79adb133ba4b69b5f97716eee6 bf2af67a364303c925ee0c9b5261735e
0000b229f3216a8ca929c1a90292665e 5aaf57319e251c04fe8b3d4f1d687cad 2f1f7611449c9834dc8d1df8e745e7c1 75b093092998eebc8b2fe11ef8bffd32
0000f2546c76a8412c130228dca0ea8d b24401256e3bd6d30c32bbedb9a90956 24338e2d8656a53303f3dced1d6af82f 96777d5c841bdba123d2652878631bf4
This looks much cleaner and makes a lot more sense.
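The same hex key can also be computed by treating each ID as one large integer. This is a variation on Method 2, not the code used above; `xor_hex_int` is a hypothetical helper name, and it assumes all three IDs have the same number of hex digits.

```python
def xor_hex_int(v0: str, v1: str, v2: str) -> str:
    """XOR three hex strings as integers; pad the result to the input width."""
    width = len(v0)  # assumption: all three IDs have the same length
    value = int(v0, 16) ^ int(v1, 16) ^ int(v2, 16)
    # Zero-pad so leading zeros are kept, as bytes.hex() would keep them
    return format(value, f'0{width}x')


# First 8 hex digits of each ID in the first row of test.csv
print(xor_hex_int("0000068d", "74a995fb", "dfb64478"))  # → ab1fd70e
```

The printed value matches the first 8 digits of the key in the first output row.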
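Summary
- Comparing unordered sets element by element is awkward to implement. Instead, derive a single value that represents each unordered set, and compare those values.
- Read the official documentation more often.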
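One caveat about the representative value: an XOR key is compact, but different sets can share it (for instance, 1 ^ 2 ^ 3 and 1 ^ 4 ^ 5 are both 0), in which case a distinct triangle would be dropped. A sorted tuple of the IDs avoids this. The sketch below swaps in that technique; `dedupe_triangles` and the sample rows are made up for illustration.

```python
def dedupe_triangles(rows):
    """Yield the first occurrence of each triangle, ignoring the order of its IDs."""
    seen = set()
    for row in rows:
        # Sorting gives the same tuple for the same IDs in any order
        key = tuple(sorted(row))
        if key not in seen:
            seen.add(key)
            yield row


# Made-up rows (assumption: real rows would come from test.csv)
rows = [("a", "b", "c"), ("a", "c", "b"), ("a", "d", "e")]
print(list(dedupe_triangles(rows)))  # → [('a', 'b', 'c'), ('a', 'd', 'e')]
```

The sorted tuple is larger than an XOR key, but two rows get the same key only when they contain exactly the same IDs.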