由于明文符合某一语言的语法规范, 因此其字符出现的频率(词频)分布并不是均匀的, 符合一定统计规律. 以英文为例, 四字母词的排版就符合一定规律.
若将通过对密文暴力破解得到的明文进行词频统计, 即计算其四字母词概率的对数, 在长度相同的情况下,概率对数越大(绝对值越小), 与自然语言越接近.
ngram_score.py
如下文件ngram_score.py
中代码用于检测伪明文与真实明文的接近程度, 会对给定的伪明文进行评分, 评分越高(绝对值越小), 则越可能为自然语言, 即越可能为明文.
english_quadgrams.txt
需要与以下破解代码文件置于同一目录.'''
Allows scoring of text using n-gram probabilities
17/07/12
'''
from math import log10
class ngram_score(object):
def __init__(self,ngramfile,sep=' '):
''' load a file containing ngrams and counts, calculate log probabilities '''
self.ngrams = {}
for line in file(ngramfile):
key,count = line.split(sep)
self.ngrams[key] = int(count)
self.L = len(key)
self.N = sum(self.ngrams.itervalues())
#calculate log probabilities
for key in self.ngrams.keys():
self.ngrams[key] = log10(float(self.ngrams[key])/self.N)
self.floor = log10(0.01/self.N)
def score(self,text):
''' compute the score of text '''
score = 0
ngrams = self.ngrams.__getitem__
for i in xrange(len(text)-self.L+1):
if text[i:i+self.L] in self.ngrams: score += ngrams(text[i:i+self.L])
else: score += self.floor
return score
以下为一段用单表代换密码加密的密文:
UNGLCKVVPGTLVDKBPNEWNLMGVMTTLTAZXKIMJMBBANTLCMOMVTNAAMILVTMCGTHMKQTLBMVCMXPIAMTLBMVGLTCKAUILEDMGPVLDHGOMIZWNLMGBZLGKSMAZBMKOMKTWNLMGBZKTLCKAMHMIMDMVGBZLXBLCSAZTBMMOMTVPGMOMVKJLTQPXCBPNEJLBBLUILVDKJKZ
请求出对应的明文(小写字母,单词之间不需空格)
直接利用文件subcipher.ipynb
中的代码, 如下
import random
from ngram_score import ngram_score
#参数初始化
ciphertext ='UNGLCKVVPGTLVDKBPNEWNLMGVMTTLTAZXKIMJMBBANTLCMOMVTNAAMILVTMCGTHMKQTLBMVCMXPIAMTLBMVGLTCKAUILEDMGPVLDHGOMIZWNLMGBZLGKSMAZBMKOMKTWNLMGBZKTLCKAMHMIMDMVGBZLXBLCSAZTBMMOMTVPGMOMVKJLTQPXCBPNEJLBBLUILVDKJKZ'
parentkey = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
#只是用来声明key是个字典
key = {'A':'A'}
#读取quadgram statistics
fitness = ngram_score('english_quadgrams.txt')
parentscore = -99e9
maxscore = -99e9
j = 0
print('---------------------------start---------------------------')
while 1:
j = j+1
#随机打乱key中的元素
random.shuffle(parentkey)
#将密钥做成字典
#密文:明文
for i in range(len(parentkey)):
key[parentkey[i]] = chr(ord('A')+i)
#用字典一一映射解密
decipher = ciphertext
for i in range(len(decipher)):
decipher = decipher[:i]+key[decipher[i]]+decipher[i+1:]
parentscore = fitness.score(decipher)#计算适应度
#在当前密钥下随机交换两个密钥的元素从而寻找是否有更优的解
count = 0
while count < 1000:
a = random.randint(0,25)
b = random.randint(0,25)
#随机交换父密钥中的两个元素生成子密钥,并用其进行解密
parentkey[a],parentkey[b]= parentkey[b],parentkey[a]
key[parentkey[a]],key[parentkey[b]] = key[parentkey[b]],key[parentkey[a]]
decipher = ciphertext
for i in range(len(decipher)):
decipher = decipher[:i]+key[decipher[i]]+decipher[i+1:]
score = fitness.score(decipher)
#此子密钥代替其对应的父密钥,提高明文适应度
if score > parentscore:
parentscore = score
count = 0
else:
#还原
parentkey[a],parentkey[b]=parentkey[b],parentkey[a]
key[parentkey[a]],key[parentkey[b]]=key[parentkey[b]],key[parentkey[a]]
count = count+1
#print(count,parentscore)
#输出该key和明文
if parentscore > maxscore:
maxscore = parentscore
print ('Currrent key: '+''.join(parentkey))
print ('Iteration total:', j)
print ('Plaintext: ', decipher,maxscore)
sys.stdout.flush() HillCryptosyste
buticannotsingaloudquietnessismyfarewellmusicevensummerinsectsheapsilenceformesilentiscambridgetonightveryquietlyitakemyleaveasquietlyasicameheregentlyiflickmysleevesnotevenawispofcloudwillibringaway
以下是一段用HILL密码加密的密文:(密钥是二维矩阵)
RIAIXVVYYOQQBOCFHKBSQGALRWNAXJDSNGNGYLUFUJADTAHANGNGYLDBQGCHOXHAPEITCHGRQQWUSFXVVYAQHOFWMBOBUQILGCMGQTVZRUCRKXITIPLCIQOBDWSIQGHCNAAIWHILVCFSITUEJBQBJBXVEULECHQVYRQIMOELCRYRWHHUMLEDHUCHSFWHXVHKBSAQAPAVHAUEJBWNQBBBAPSUNAQPQGGJ
请解密出对应明文。
需要对原文件subcipher.ipynb
中代码进行修改, 将单表代换部分修改问HILL密码. 修改后代码如下.
import random
from ngram_score import ngram_score
#参数初始化
ciphertext ='RIAIXVVYYOQQBOCFHKBSQGALRWNAXJDSNGNGYLUFUJADTAHANGNGYLDBQGCHOXHAPEITCHGRQQWUSFXVVYAQHOFWMBOBUQILGCMGQTVZRUCRKXITIPLCIQOBDWSIQGHCNAAIWHILVCFSITUEJBQBJBXVEULECHQVYRQIMOELCRYRWHHUMLEDHUCHSFWHXVHKBSAQAPAVHAUEJBWNQBBBAPSUNAQPQGGJ'
#读取quadgram statistics
fitness = ngram_score('english_quadgrams.txt')
score = -99e9
maxscore = -99e9
j = 0 #记录迭代次数
H = HillCryptosystem(AlphabeticStrings(), 2)
C = H.encoding(ciphertext)
print('---------------------------start---------------------------')
for K in H.key_space(): # 遍历密钥空间
if not K.is_invertible(): # 若秘钥的矩阵是不可逆的则跳过
continue
decipher = H.deciphering(K, C)
score = fitness.score(str(decipher))
#输出该key和明文
if score > maxscore:
maxscore = score
print ('Currrent key: ')
print (K)
print ('Plaintext: ', decipher)
print ('Score: ', score)
sys.stdout.flush()
---------------------------start---------------------------
Currrent key:
[1 0]
[0 1]
Plaintext: RIAIXVVYYOQQBOCFHKBSQGALRWNAXJDSNGNGYLUFUJADTAHANGNGYLDBQGCHOXHAPEITCHGRQQWUSFXVVYAQHOFWMBOBUQILGCMGQTVZRUCRKXITIPLCIQOBDWSIQGHCNAAIWHILVCFSITUEJBQBJBXVEULECHQVYRQIMOELCRYRWHHUMLEDHUCHSFWHXVHKBSAQAPAVHAUEJBWNQBBBAPSUNAQPQGGJ
Score: -1761.0604450089113
Currrent key:
[3 0]
[0 1]
Plaintext: XIAIZVHYIOOQJOSFLKJSOGALXWNAZJBSNGNGILYFYJADPALANGNGILBBOGSHWXLAFEUTSHCROQQUGFZVHYAQLOTWEBWBYQULCCEGOTHZXUSRMXUTUPVCUQWBBWGIOGLCNAAIQHULHCTSUTYEDBOBDBZVKUVESHOVIROIEOKLSRIRQHLUELKDLUSHGFQHZVLKJSAQAPAVLAYEDBQNOBJBAPGUNAOPOGCJ
Score: -1696.4826419233734
Currrent key:
[0 3]
[1 2]
Plaintext: ARUAJXUVGYMQEBJCQHOBAQVAWRANFXEDCNCNFYXUHUBAWTEHCNCNFYHDAQBCHOEHAPBIBCTGMQSWHSJXUVOAAHEFBMROSUHIOGUMNQDVERNCBKBIRICLAIROODISAQWHANUAFWHIEVUFBIOUDJHQDJJXEEULBCFQHYSQOMBENCHYFWCHNMHECHBCHSFWJXQHOBOAFAHAEHOUDJHWHQRBFAMSANDQAQZG
Score: -1686.4556916705883
Currrent key:
[0 1]
[3 4]
Plaintext: UXIAZZWHIIMOEJLSSLIJCOLAIXANNZOBGNGNFINYRYDASPILGNGNFIXBCONSNWILKFRUNSJCMOIQHGZZWHQAWLYTLERWYYJUUCQEPOXHGXXSBMRUNUWVOURWSBKGCOKLANIAVQJUAHUTRUMYPDXOPDZZGKYVNSROLIEOYEXKXSLIVQCLVEPKCLNSHGVQZZSLIJQAPAVAILMYPDBQXORJPAWGANLOCOBC
Score: -1666.975608158808
Currrent key:
[4 1]
[5 0]
Plaintext: IXIEVTYLOEQGOPFMKJSRGOLMWRANJNSHGDGDLWFAJCDIAJARGDGDLWBFGOHAXAAREFTCHARYQGUEFKVTYLQIOLWZBMBCQMLYCKGITOZFUDRSXUTCPACLQUBCWJISGOCFANIEHELYCNSXTCEGBBBSBBVTUQEZHAVCRMICOMLSRSRMHEUBLEDOUBHAFKHEVTKJSRQIPOVEAREGBBNUBSBPPOUYANPMGOJU
Score: -1645.804727894536
Currrent key:
[0 3]
[3 6]
Plaintext: AXUAJZUHGIMOEJJSQLOJAOVAWXANFZEBCNCNFIXYHYBAWPELCNCNFIHBAOBSHWELAFBUBSTCMOSQHGJZUHOAALETBERWSYHUOCUENODHEXNSBMBURUCVAURWOBIGAOWLANUAFQHUEHUTBUOYDDHODDJZEKUVBSFOHISOOEBKNSHIFQCLNEHKCLBSHGFQJZQLOJOAFAHAELOYDDHQHORJFAMGANDOAOZC
Score: -1636.6702711616374
Currrent key:
[8 3]
[5 0]
Plaintext: UNUUHJIHWGOMWRTWMDGBCAVIQJANDFGRCPCPVSTKDUBOAJARCPCPVSJHCALOZUARKNPOLOXGOMYSTUHJIHOOWNQRJOJEOSVUSOCUPARDYRXAZOPOFESPOAJEQBUICASJANUULSVUSRGHPOKOJDJUJDHJYEKHLOHSXUUSWOVOXAXULSYPVABUYPLOTULSHJMDGBOOFSHUARKOJDNUJUJRFSYMANFQCADM
Score: -1622.28538113562
Currrent key:
[8 3]
[1 4]
Plaintext: UNGEPHURKWQSOTBUGLEVCAFMYHANTBYTYDYDLOZCPEZIIHSTYDYDLOVRCAJILESTKNNIJILWQSOOFEPHURMIWNITHIBGEOHEQIOEPALLQTXAXINIXGODOABGOVEMCAAHANGEBOHEKTSRNIIIDLVEDLPHQGWRJIXOJEKOUITIXAJEBOUDVANEUDJIFEBOPHGLEVMIVOTESTIIDLZEVEBTVOASANZYCAFS
Score: -1509.820783666875
Currrent key:
[10 7]
[ 1 4]
Plaintext: QNKEZHQRIWSSGTTUKLYVMARMOHANXBOTODODBOHCZEHIWHETODODBOJRMAPIBEETINNIPIBWSSGOREZHQRUICNWTDITGYODESIGEZABLSTVAVINIVGGDGATGGVYMMAAHANKETODEITERNIWIFLJEFLZHSGCRPIVOPEIOQIXIVAPETOQDJANEQDPIRETOZHKLYVUIJOXEETWIFLHEJETTJOASANHYMARS
Score: -1492.9242246562796
Currrent key:
[20 1]
[ 1 4]
Plaintext: INSETHIREWWSQTDUSLMVGAPMUHANFBUTUDUDHOXCTEXIYHCTUDUDHOLRGABIHECTENNIBIHWWSQOPETHIRKIONYTVIDGMOVEWIQETAHLWTRARINIRGQDQADGQVMMGAAHANSEDOVEETCRNIYIJLLEJLTHWGORBIROBEEOIIFIRABEDOIDLANEIDBIPEDOTHSLMVKILOFECTYIJLXELEDTLOASANXYGAPS
Score: -1454.8276553642136
Currrent key:
[ 2 17]
[ 3 8]
Plaintext: CNUEBHYRSWUSYTXUALIVIAVMYHANXBMTCDCDROHCREBIMHOTCDCDROPRIAPIBEOTONFIPIJWUSQODEBHYROIKNATHILGCOLEEIAEVAHLGTBATIFIVGODEALGWVEMIAGHANUEDOLEITQRFIYIBLPEBLBHGGGRPINOTEAOUIDIBATEDOMDTAJEMDPIDEDOBHALIVOIFOHEOTYIBLFEPELTFOISANLYIAPS
Score: -1440.5385389242883
Currrent key:
[20 1]
[ 3 8]
Plaintext: INCERHSRUWCSSTBUALGVGATMSHANBBWTIDIDDOPCDERIWHETIDIDDOVRGAVIREETENHIVIXWCSMOZERHSREIONATPIFGIOFEQIAETAPLYTRALIHITGEDQAFGKVQMGAYHANCEZOFEGTMRHISIRLVERLRHYGYRVINOLEAOCIZIRALEZOWDLAXEWDVIZEZORHALGVEIHOPEETSIRLHEVEFTHOGSANFYGAVS
Score: -1405.0489191990162
Currrent key:
[12 3]
[ 7 20]
Plaintext: OXCETTELEEAGAPRMEJOREOTMYRANDNUHIDIDNWTAHCRISJIRIDIDNWTFEOLAZAIRUFTCLATYAGGENKTTELEISLOZHMNCMMRYMKSIROBFEDHSNUTCFAOLCUNCIJESEOCFANCETERYSNAXTCWGLBTSLBTTEQIZLALCVMYCUMFSHSVMTEABDEDOABLANKTETTEJOREIHOPEIRWGLBBUTSNPHOUYANDMEORU
Score: -1398.6684497659912
Currrent key:
[ 4 21]
[11 11]
Plaintext: BNCEGHRREWASNTRURLBVEATMLHNNQBHTVDVDNOTCHERIFHVTVDVDNOGREALIZEVTHNTILITWASGONEGHRREIFNBTHINGMOREMISERAOLRTHANITIFGBDCANGVVEMEAPHNNCETOREFTNRTIWIYLTEYLGHEGVRLILOVEYOUIFIHAVETONDDADENDLINETOGHRLBVEIHOPEVTWIYLBETEATHOUSNNDYEARS
Score: -1330.116096349177
Currrent key:
[ 4 21]
[23 10]
Plaintext: ONCUTJEHEGAMARRWEDOBEATIYJANDFURIPIPNSTKHUROSJIRIPIPNSTHEALOZUIRUNTOLOTGAMGSNUTJEHEOSNORHONEMSRUMOSURABDERHANOTOFEOPCANEIBEIEACJANCUTSRUSRAHTOWOLDTULDTJEEIHLOLSVUYSUOFOHAVUTSAPDADUAPLONUTSTJEDOBEOHSPUIRWOLDBUTUNRHSUMANDQEARM
Score: -1313.3267792056824
Currrent key:
[ 4 21]
[11 24]
Plaintext: ONCETHEREWASATRUELOVEATMYHANDBUTIDIDNOTCHERISHITIDIDNOTREALIZEITUNTILITWASGONETHEREISNOTHINGMOREMISERABLETHANITIFGODCANGIVEMEACHANCETORESTARTIWILLTELLTHEGIRLILOVEYOUIFIHAVETOADDADEADLINETOTHELOVEIHOPEITWILLBETENTHOUSANDYEARS
Score: -946.2853518865237
oncetherewasatrueloveatmyhandbutididnotcherishitididnotrealizeituntilitwasgonethereisnothingmoremiserablethanitifgodcangivemeachancetorestartiwilltellthegirliloveyouifihavetoaddadeadlinetotheloveihopeitwillbetenthousandyears
已知一背包加密的公钥为
{615436700291,415460700271,15508700231,846430100773,677471501215,139578302079,179168604148,789306608798,563224517265,364498233536,229056467022,670323428329,115934481316,44989786476,518624653302,149955258190,728568829281,796899516776,546782575075,178164449829,356328899658,712657799316,569303048254,223205396187,446410792374,892821584748,524144817108,132888933895,611875519857,877653387647,839906074973,35774353074}
,密文为6020587936087
,试求明文二进制表示(形如:01001010000001110101001100001110
).
代码根据LLL算法进行修改, 如图对原A矩阵加上了一行约束条件, guess
是猜测结果中"1"的个数.
pubKey = [615436700291,415460700271,15508700231,846430100773,677471501215,139578302079,179168604148,789306608798,563224517265,364498233536,229056467022,670323428329,115934481316,44989786476,518624653302,149955258190,728568829281,796899516776,546782575075,178164449829,356328899658,712657799316,569303048254,223205396187,446410792374,892821584748,524144817108,132888933895,611875519857,877653387647,839906074973,35774353074]
nbit = len(pubKey)
encoded = 6020587936087
def LLL_by_guess(guess):
# create a large matrix of 0's (dimensions are public key length +1)
A = Matrix(ZZ, nbit + 2, nbit + 2) # 矩阵加一行一列
# fill in the identity matrix
print(f'start {guess}')
for i in range(nbit):
A[i, i] = 1
# replace the bottom row with your public key
for i in range(nbit):
A[i, nbit] = pubKey[i]
#添加最后一行前nbit个置为1
for i in range(nbit):
A[i, nbit+1] = 1
# last element is the encoded message
A[nbit, nbit] = -int(encoded)
# 设置-guess
A[nbit+1, nbit+1] = -guess
res = A.LLL(delta=1, algorithm='NTL:LLL')
for i in range(0, nbit + 2):
# print solution
M = res.row(i).list()
flag = True
for m in M:
if m != 0 and m != 1:
flag = False
break
if flag:
print('***************')
print (i, M)
print('res=', end='') # 输出结果
for i in M[:-2]:
print(i, end='')
print('')
for i in range(1, nbit+1):
LLL_by_guess(i)
10111010011001110101001100001000
利用4字母法进行解密