Paper translation helper: calling the clipboard and Google Translate from Python 3

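My English is poor and reading papers is hard work, so Google Translate and the Eudic dictionary (欧陆词典) are my good friends.
Copying a paragraph from a PDF into Google Translate is the thing I do most often.
But deleting the line breaks by hand every time is a real nuisance.
So I wrote a small tool in Python.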

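What it does: reads the copied text from the Windows clipboard, cleans up the formatting, calls the Google Translate API, and prints the result.
Environment: Win10, Python 3.5.2 |Anaconda 4.2.0 (64-bit)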

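The main program is Clipboard.py; run it from here. It handles reading and writing the clipboard and the text formatting.
First, take care with exception handling around the clipboard: once opened, it must be closed with cb.CloseClipboard(), otherwise copy and paste stop working elsewhere (if you find copy and paste have stopped working, closing Python fixes it).
Second, watch the encoding. In Python 3 every str is Unicode, so when reading from the clipboard pass win32con.CF_UNICODETEXT as the format, not win32con.CF_TEXT. The latter returns bytes, and converting that to str causes plenty of trouble.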
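
The "always close what you opened" rule can also be written as a context manager. This is only a sketch, not part of the tool: FakeClipboard and the helper names opened and read_text are made up for illustration, and the fake class stands in for win32clipboard so the example runs on any OS.

```python
from contextlib import contextmanager


class FakeClipboard:
    """Hypothetical stand-in for win32clipboard, so the sketch runs anywhere."""

    def __init__(self):
        self.is_open = False
        self.data = "copied text"

    def OpenClipboard(self):
        self.is_open = True

    def CloseClipboard(self):
        self.is_open = False

    def GetClipboardData(self, fmt):
        # Mimic the TypeError the real module raises when the format is unavailable.
        if fmt != "CF_UNICODETEXT":
            raise TypeError("no data in the requested format")
        return self.data


@contextmanager
def opened(cb):
    """Open the clipboard and always close it again, even if the body raises."""
    cb.OpenClipboard()
    try:
        yield cb
    finally:
        cb.CloseClipboard()


def read_text(cb):
    """Return the clipboard text, or None when no text is available."""
    try:
        with opened(cb) as c:
            return c.GetClipboardData("CF_UNICODETEXT")
    except TypeError:
        return None


fake = FakeClipboard()
text = read_text(fake)
```

With the real module, the same opened() helper should work by passing win32clipboard itself and win32con.CF_UNICODETEXT as the format, since it only relies on OpenClipboard and CloseClipboard.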

# -*- coding: utf-8 -*-
"""
Created on Fri Oct 19 10:48:45 2018

@author: BigFly
"""
import win32clipboard as cb
import win32con
from translate import google_translate

def gettext():
    cb.OpenClipboard()
    try:
        t = cb.GetClipboardData( win32con.CF_UNICODETEXT)
    except TypeError:
        print("There is no text in the clipboard.")
    else:
        return t
    finally:
        cb.CloseClipboard()

def settext(aString):
    cb.OpenClipboard()
    try:
        cb.EmptyClipboard()
        cb.SetClipboardData( win32con.CF_UNICODETEXT, aString)
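    except Exception:
        print("Error in settext(): could not write to the clipboard.")
    finally:
        # Always close the clipboard, even if writing failed, so copy and paste keep working.
        cb.CloseClipboard()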
    
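# Remove parenthesised citations: a balanced (...) group is deleted
# when it contains any of the strings in flags.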
def deletBracket(source,flags,pad_sym=chr(0)):
    code={"(":1, ")":-1}
    index = [i for i in range(len(source)) if source[i]=="(" or source[i]==")"]
    match,start=0,-1
    for i in index:
        match+= code[ source[i] ]
        if start<0 and match==1:
            start = i
        if match==0:
            concent=source[start: i+1]
            check=sum([concent.find(flag) for flag in flags])+len(flags)
            if check > 0:
                source=source.replace(concent,pad_sym*len(concent),1)
            start=-1
    return source.replace(pad_sym,"")
    
source= gettext()
if source:
    source= source.replace(chr(0),"")
    # Line breaks: drop \r and turn \n into a space to join hard-wrapped lines
    source=source.replace("\r","")
    source=source.replace("\n"," ")
    # Sentence splitting: pad_sym temporarily protects abbreviations such as "e.g." from the split
    pad_sym=chr(0)
    source=source.replace("e.g. ","e.g."+pad_sym)
    source=source.replace("i.e. ","i.e."+pad_sym)
    source=source.replace("Eq. ","Eq."+pad_sym)
    source=source.replace("Mr. ","Mr."+pad_sym)
    
    source=source.replace(". ",". \r\n")
    source=source.replace(pad_sym," ")
    # Remove parenthesised citations (author and year references)
    source=deletBracket(source,["et al.", ", 201", ", 200", ", 199"],pad_sym)
    source=source.replace("  "," ")
    
    settext(source)
    print(source)
    print("[ %d ]"%(len(source)))
    print(google_translate(source))

'''

Our architectures
will have only one representation at one resolution besides
the pooling layers and the convolutional layers that initialize
the needed numbers of channels. Take the architecture in
Table 1 as an example. There are two processes for each
resolution. The first one is the transition process, which
computes the initial features with the dimensions of the next
resolution, then down samples it to 1=4 using a 2×2 average
pooling. A convolutional operation is needed here because
F is assumed to have the same input and output sizes. The
next process is using GUNN to update this feature space
gradually. Each channel will only be updated once, and all
channels will be updated after this process. Unlike most of
the previous networks, after this two processes, the feature
transformations at this resolution are complete. There will
be no more convolutional layers or blocks following this feature representation, i.e., one resolution, one representation.
Then, the network will compute the initial features for the
next resolution, or compute the final vector representation of
the entire image by a global average pooling. By designing
networks in this way, SUNN networks usually have about
20 layers before converting to GUNN-based networks.
'''
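
The line-joining and sentence-splitting steps of the script can be tried without a clipboard. Below is a minimal sketch of the same idea as a pure function; the names split_sentences, PAD and ABBREVIATIONS are mine, and it breaks lines with a plain "\n" where the script above uses ". \r\n".

```python
PAD = chr(0)  # placeholder character that does not occur in normal text
ABBREVIATIONS = ["e.g. ", "i.e. ", "Eq. ", "Mr. "]


def split_sentences(text):
    """Join hard-wrapped lines, then put each sentence on its own line."""
    text = text.replace("\r", "").replace("\n", " ")
    # Protect abbreviations so their trailing ". " is not taken as a sentence end.
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr[:-1] + PAD)
    text = text.replace(". ", ".\n")
    # Restore the protected spaces.
    return text.replace(PAD, " ")


sample = "See Eq. 3 for\r\ndetails. The next\r\nstep is pooling."
result = split_sentences(sample)
print(result)
# See Eq. 3 for details.
# The next step is pooling.
```

Any abbreviation missing from the list (for example "Fig. ") would still be split, so the list has to grow with the papers you read.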

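The program that calls Google Translate; I took ready-made code from the web and changed it slightly.
Original: https://blog.csdn.net/yingshukun/article/details/53470424

translate.py
I changed how the returned data is handled:
The parsed JSON response is a list of length 9. Its first element, [0], holds the translation; the rest are alternative translations and other things we don't need.
That first element is itself a list, with one entry per line or sentence plus one more; the last entry is the pinyin of the translation.
In the code, result is already that first element, so joining the translated text from result[:-1] gives what we want.
This file can be run directly to test the translation.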
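
To make the indexing concrete, here is a small sketch that runs on a hand-written list instead of a real response. The sample is an assumption: it is trimmed to three outer elements, and everything other than the first field of each segment is a placeholder. Only the shape described above (segments first, pinyin entry last) matters. join_translation is a name I made up.

```python
# Hand-written sample in the shape described above (NOT real API output).
sample_response = [
    [
        ["一位老妇人养了一只猫。", "An old woman had a cat. ", None, None, 3],
        ["这只猫很老了。", "The cat was very old.", None, None, 3],
        [None, None, "(pinyin placeholder)", None],  # last entry: pinyin
    ],
    None,
    "en",
]


def join_translation(response):
    """Concatenate the translated text of every segment except the pinyin entry."""
    segments = response[0]
    return "".join(segment[0] for segment in segments[:-1])


translated = join_translation(sample_response)
print(translated)
```

Dropping the last entry is what keeps the None in the pinyin row out of the join, which would otherwise raise a TypeError.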

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 23 18:58:26 2018

@author: BigFly
"""

import requests  
from HandleJs import Py4Js    

js=Py4Js()

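def google_translate(content):
    if len(content) > 4891:
        print("The text to translate exceeds the length limit!")
        return
    tk = js.getTk(content)
    param = {'tk': tk, 'q': content}
    # Adjacent string literals are concatenated, so the URL contains no stray newlines or spaces.
    url = ("http://translate.google.cn/translate_a/single?client=t&sl=en"
           "&tl=zh-CN&hl=zh-CN&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss"
           "&dt=t&ie=UTF-8&oe=UTF-8&clearbtn=1&otf=1&pc=1&srcrom=0&ssel=0&tsel=0&kc=2")
    # The response is JSON that parses into a nested list; element 0 holds the translated segments.
    result = requests.get(url, params=param).json()[0]
    # The last entry is the pinyin, so drop it and join the translated text.
    return "".join([text[0] for text in result[:-1]])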

if __name__ == "__main__":    
    content = """An old woman had a cat. 
The cat was very old; she could not run quickly, and she could not bite, because she was so old. 
One day the old cat saw a mouse; she jumped and caught the mouse. 
But she could not bite it; so the mouse got out of her mouth and ran away, because the cat could not bite it.
Then the old woman became very angry because the cat had not killed the mouse. 
She began to hit the cat. The cat said, "Do not hit your old servant. 
I have worked for you for many years, and I would work for you still, but I am too old. 
Do not be unkind to the old, but remember what good work the old did when they were young."""
    print(google_translate(content))

HandleJs.py
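This part uses JavaScript to generate the tk value. The tk is computed from the text submitted for translation and seems to act as a kind of checksum; I don't know the details.
Note that the package providing the execjs module is named PyExecJS: pip install PyExecJS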

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 23 18:57:54 2018

@author: BigFly
"""
import execjs
 
class Py4Js():
    def __init__(self):
        self.ctx = execjs.compile("""
        function TL(a) {
        var k = "";
        var b = 406644;
        var b1 = 3293161072;
        
        var jd = ".";
        var $b = "+-a^+6";
        var Zb = "+-3^+b+-f";
    
        for (var e = [], f = 0, g = 0; g < a.length; g++) {
            var m = a.charCodeAt(g);
            128 > m ? e[f++] = m : (2048 > m ? e[f++] = m >> 6 | 192 : (55296 == (m & 64512) && g + 1 < a.length && 56320 == (a.charCodeAt(g + 1) & 64512) ? (m = 65536 + ((m & 1023) << 10) + (a.charCodeAt(++g) & 1023),
            e[f++] = m >> 18 | 240,
            e[f++] = m >> 12 & 63 | 128) : e[f++] = m >> 12 | 224,
            e[f++] = m >> 6 & 63 | 128),
            e[f++] = m & 63 | 128)
        }
        a = b;
        for (f = 0; f < e.length; f++) a += e[f],
        a = RL(a, $b);
        a = RL(a, Zb);
        a ^= b1 || 0;
        0 > a && (a = (a & 2147483647) + 2147483648);
        a %= 1E6;
        return a.toString() + jd + (a ^ b)
    };
    function RL(a, b) {
        var t = "a";
        var Yb = "+";
        for (var c = 0; c < b.length - 2; c += 3) {
            var d = b.charAt(c + 2),
            d = d >= t ? d.charCodeAt(0) - 87 : Number(d),
            d = b.charAt(c + 1) == Yb ? a >>> d: a << d;
            a = b.charAt(c) == Yb ? a + d & 4294967295 : a ^ d
        }
        return a
    }
    """)
        
    def getTk(self,text):
        return self.ctx.call("TL",text)
    

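Demo:

Select the text in the PDF and copy it.

Run Clipboard.py and both the English and the Chinese results appear. One sentence per line, parenthesised citations removed, nice and clean.

The formatted English is also put back on the clipboard, so it can be pasted elsewhere directly (this is handy for making slides):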

Deep neural networks have become the state-of-the-art systems for image recognition as well as other vision tasks .
The architectures keep going deeper, e.g., from five convolutional layers to 1001 layers .
The benefit of deep architectures is their strong learning capacities because each new layer can potentially introduce more non-linearities and typically uses larger receptive fields .
In addition, adding certain types of layers will not harm the performance theoretically since they can just learn identity mapping.
This makes stacking up layers more appealing in the network designs.
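
The citations are gone from the output above, which is why a space is left before some full stops. For comparison, here is a shorter way to get a similar effect with a regular expression. It is an alternative sketch, not the depth-counting routine used in Clipboard.py, and unlike that routine it does not handle nested parentheses. strip_citations and the sample sentence are made up; the flags are the ones the script passes.

```python
import re

# Substrings that mark a parenthesised group as a citation (same flags as the script).
FLAGS = ("et al.", ", 201", ", 200", ", 199")


def strip_citations(text):
    """Delete (...) groups that look like citations; keep all other parentheses."""

    def drop_if_citation(match):
        group = match.group(0)
        return "" if any(flag in group for flag in FLAGS) else group

    # \([^()]*\) matches a parenthesised group with no parentheses nested inside it.
    cleaned = re.sub(r"\([^()]*\)", drop_if_citation, text)
    return cleaned.replace("  ", " ")


sample = "Deep networks (He et al., 2016) use skip connections (see Table 1) widely."
result = strip_citations(sample)
print(result)
# Deep networks use skip connections (see Table 1) widely.
```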

Hmm, I still need to study English properly rather than rely on this.
