[DIY]给word中的汉字批量加拼音(注音,全篇加拼音)

缘起:师兄发起印经,需要给word电子版的经书加注音。
word自带的加注音只能一次加30个字,一本经书有两万四千多字。网上找了下,好像没有现成的软件,但是python可以很简单的把汉字转换成带声调的拼音。再结合office的vba宏录制,摸索了一下,弄出了个批量给word加注音的脚本。分享如下:

[DIY]给word中的汉字批量加拼音(注音,全篇加拼音)_第1张图片
效果

一、python汉字转拼音

请参考:https://blog.csdn.net/r_coder/article/details/79419318

1.首先需要下载、安装python环境。我安装的python3.8版本。这一步很简单,搜索一下,下载安装就可以了。【记得安装过程中,把“加入环境变量”这个选项勾上。】

2.安装python的拼音库
网上有两个版本的python汉字转拼音的demo,一个是基于pypinyin库的,另一个是基于xpinyin库的。
基于xpinyin库的试出来拼音不带声调,后来用的pypinyin库。

安装完python之后,打开命令行(CMD),输入如下安装命令:
pip install pypinyin
然后等待安装完成。这个过程需要联网。

3.将下面代码复制下来,保存保存到D盘下,名为wordTOPinyin.py,即:(d:/wordTOPinyin.py)。
(保存路径可以是其它,但需要同时修改后面VBA代码中脚本的路径)

import sys, getopt
import pypinyin

word = sys.argv[1:]

s = ''
for i in pypinyin.pinyin(word):
    s = s + ''.join(i) + " "

print(s)

二、VBA宏,给汉字加注音

MS office默认就安装了VBA开发环境,用Alt+F11打开VBA编辑器。
WPS需要单独安装一个VBA_for_WPS的包,需要网上找一下。也是用Alt+F11打开VBA编辑器。

VBA代码:

Attribute VB_Name = "wordToPinyinByBlockSpecified"
Option Explicit

Public Declare Function timeGetTime Lib "winmm.dll" () As Long

Public g_manualCharSet As Object
Public g_specifiedCharSet As Object
Public g_exceptionCharSet As Object

Function isNeedZhuyin(ByVal tmpChar) As Boolean
            
            ' 非汉字,跳出
            If LenB(StrConv(tmpChar, vbFromUnicode)) <> 2 Then
                    isNeedZhuyin = False
                    Exit Function
            End If
            
            ' 手动指定字符 和 例外字符 两种模式,有点像黑白名单,没有做成优先级的方式,简单点,二选一就可以了
            If True Then
                    ' 特殊符号不需要加注音,跳出
                    If "" <> g_exceptionCharSet(tmpChar) Then
                            isNeedZhuyin = False
                            Exit Function
                    End If
                    
                    isNeedZhuyin = True
                    Exit Function
                    
            Else
                    '是指定的字才加注音
                    If "" <> g_specifiedCharSet(tmpChar) Then
                            isNeedZhuyin = True
                            Exit Function
                    End If
                    
                    isNeedZhuyin = False
                    Exit Function
                
            End If
            
End Function

Function initExceptionCharSet()

        Set g_exceptionCharSet = CreateObject("Scripting.Dictionary")

        '不需要加注音的字符,加到这个集合里
        
        g_exceptionCharSet.Add "。", "ok"
        g_exceptionCharSet.Add ",", "ok"
        g_exceptionCharSet.Add "、", "ok"
        g_exceptionCharSet.Add "?", "ok"
        g_exceptionCharSet.Add "(", "ok"
        g_exceptionCharSet.Add ")", "ok"
        g_exceptionCharSet.Add "(", "ok"
        g_exceptionCharSet.Add ")", "ok"
        g_exceptionCharSet.Add ":", "ok"
        g_exceptionCharSet.Add ":", "ok"
        g_exceptionCharSet.Add " ", "ok"
        g_exceptionCharSet.Add "[", "ok"
        g_exceptionCharSet.Add "]", "ok"
        g_exceptionCharSet.Add "-", "ok"
        g_exceptionCharSet.Add "+", "ok"
        g_exceptionCharSet.Add "*", "ok"
        g_exceptionCharSet.Add "《", "ok"
        g_exceptionCharSet.Add "》", "ok"
        g_exceptionCharSet.Add "【", "ok"
        g_exceptionCharSet.Add "】", "ok"
        g_exceptionCharSet.Add "“", "ok"
        g_exceptionCharSet.Add "”", "ok"
        g_exceptionCharSet.Add Chr$(9), "ok"
        g_exceptionCharSet.Add Chr$(10), "ok"
        g_exceptionCharSet.Add Chr$(13), "ok"

  
        '下面是手动注音的字,多音字可以放到这里先不注音,后面再逐一手动注音
        g_exceptionCharSet.Add "夫", "ok"
        g_exceptionCharSet.Add "还", "ok"
    
    
End Function

Function initManualCharSet()

        Set g_manualCharSet = CreateObject("Scripting.Dictionary")
        
        '如果一个字,不是所有出现的地方都相同发音,那就需要后期手动修改
    
        g_manualCharSet.Add "南", "ná"
        g_manualCharSet.Add "无", "mó"
        g_manualCharSet.Add "唵", "ong"
        g_manualCharSet.Add "尽", "jìn"
        g_manualCharSet.Add "晒", "lì"
    
End Function

Function initSpecifiedCharSet()

        Set g_specifiedCharSet = CreateObject("Scripting.Dictionary")

        g_specifiedCharSet.Add "伽", "ok"
        g_specifiedCharSet.Add "梵", "ok"
        'g_specifiedCharSet.Add "乐", "ok"
        g_specifiedCharSet.Add "苾", "ok"
        g_specifiedCharSet.Add "刍", "ok"

End Function

Function getPinyin(ByVal text As String) As String

        Dim Result As String
        Dim cmd As String
        
        Dim wExec As Object
        Dim shellObj As Object
        Set shellObj = CreateObject("WScript.Shell")
        
        'cmd = "C:\Users\taihang\appdata\local\programs\python\python37\pythonw.exe D:\wordToPinyin.py "
        cmd = "pythonw D:\wordToPinyin.py " & text
        
        Set wExec = shellObj.Exec(cmd)
        Result = wExec.StdOut.ReadAll
    
        '如果运行时同时做其它操作,这里有时候会中段,可能是什么信号,懒得处理了,F5继续运行就可以了
        
        '去掉空格和后面的换行,否则注音显示会歪向左边
    '    spaceIdx = InStr(Result, " ")
    '    If 0 <> spaceIdx Then
    '        Result = Mid(Result, 1, spaceIdx - 1)
    '    End If
        
        getPinyin = Result
    
        Set wExec = Nothing
        Set shellObj = Nothing
    
End Function

Function isExceptionChar(ByVal char As String) As Boolean
    '有些字符不需要加注音
    
    If "" <> g_exceptionCharSet(tmpChar) Then
            isExceptionChar = True
    Else
            isExceptionChar = False
    End If
    
End Function

Function getManualPinyin(ByRef tmpChar As String, ByRef charPinyin As String)

        Dim tmpPinyin As String
        tmpPinyin = g_manualCharSet(tmpChar)
        If "" <> tmpPinyin Then
            charPinyin = tmpPinyin
        End If
        
End Function

Function countChar(areaStart As Long, areaEnd As Long)
    '手动算字数
    
    Dim charNum As Long
            
            '全选时要修正一下结束位置,否则全选时会死循环结束不了
            Selection.EndKey unit:=wdStory
            docEnd = Selection.End
            If docEnd < areaEnd Then
                    areaEnd = docEnd
            End If
    
            '恢复光标到起始位置
            Selection.Start = areaStart
            Selection.End = areaStart
            
            charNum = 0
            Do While Selection.End < areaEnd
        
                    If charNum Mod 256 = 0 Then
                            ' 防止假死
                            DoEvents
                            'Debug.Print "正在计算字数: " & Selection.End & "/" & areaEnd & "    百分比:" & Format((Selection.End - areaStart) / (areaEnd - areaStart), "Percent") & "%"
                    End If
            
                    Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
                    charNum = charNum + 1
        
            Loop
            
            countChar = charNum
            
End Function

Function showDebugLog(ByVal prompt As String, _
        ByVal allBeginTimeSec As Double, ByVal allDoneCount As Long, ByVal allNum As Long, _
        ByVal phaseBeginTimeSec As Double, ByVal phaseDoneCount As Long, ByVal phaseNum As Long)

        Dim nowTimeSec As Double
        
        Dim allAvrCharCount As Double
        Dim allCostTimeSec As Double
        
        Dim phaseAvrCharCount As Double
        Dim phaseCostTimeSec As Double
        
        Dim tmpMS As Long
        'tmpMS = timeGetTime()
        
        If (0 = allDoneCount) Or (0 = phaseDoneCount) Then
                Exit Function
        End If
        
        nowTimeSec = Timer()
        
        allCostTimeSec = nowTimeSec - allBeginTimeSec
        If 0 <> allCostTimeSec Then
                allAvrCharCount = allDoneCount / allCostTimeSec
        Else
                allAvrCharCount = 1
        End If
        
        phaseCostTimeSec = nowTimeSec - phaseBeginTimeSec
        If 0 <> phaseCostTimeSec Then
                phaseAvrCharCount = phaseDoneCount / (nowTimeSec - phaseBeginTimeSec)
        Else
                phaseAvrCharCount = 1
        End If
        
        Debug.Print prompt & "    all:" & Format(allCostTimeSec / 24 / 3600, "hh:mm:ss") & _
                "    " & Format((allNum - allDoneCount) / allAvrCharCount / 24 / 3600, "hh:mm:ss") & _
                "    " & allDoneCount & "/" & allNum & _
                "    " & Format(allAvrCharCount, "Standard") & "words/second" & _
                "    " & Format(allDoneCount / allNum, "Percent") & _
                "    phase:" & Format(phaseCostTimeSec / 24 / 3600, "hh:mm:ss") & _
                "    " & Format((phaseNum - phaseDoneCount) / phaseAvrCharCount / 24 / 3600, "hh:mm:ss") & _
                "    " & phaseDoneCount & "/" & phaseNum & _
                "    " & Format(phaseAvrCharCount, "Standard") & "words/count" & _
                "    " & Format(phaseDoneCount / phaseNum, "Percent")

End Function

Sub addPinyin()
        '给选中区域加注音
        
        '避免被系统信号打断,好像有点效果
        Application.EnableCancelKey = False
        
        Dim blockCharLimit As Long
        '可自行修改每次处理的字符数量的大小
        blockCharLimit = 100
    
        Dim textForPinyin As String
        Dim tmpChar As String
        Dim pinyin As String
        Dim blockText As String
        
        Dim cursor1 As Long
        Dim cursor2 As Long
        Dim areaStart As Long
        Dim areaEnd As Long
        Dim docEnd As Long
        Dim lastPos As Long
        Dim blockAddedCharCount As Long
        Dim charPinyin As String
        Dim pinyinNum As Long
        
        Dim blockStart As Long
        Dim blockEnd As Long
        Dim blockReadCharCount As Long
        Dim blockReadMovedCharCount As Long
        Dim blockAddMovedCharCount As Long
        
        Dim allReadMovedCharCount As Long
        Dim charNum As Long
        Dim allAddedMovedCharCount As Long
        
        Dim costTimeSec As Double
        Dim avrCharCount As Double
        
        Dim mainLoopCount As Long
        
        Dim logContent As String
        
        Dim appBeginTimeSec As Double
        Dim endTimeSec As Double
        Dim tmpTimeSec As Double
        Dim blockBeginTimeSec As Double
        Dim addBeginTimeSec As Double
        
        'charNum =countChar(areaStart, areaEnd)
        charNum = Selection.Characters.Count
        areaStart = Selection.Start
        areaEnd = Selection.End
        
        appBeginTimeSec = Timer()
        
        initManualCharSet       '手动指定的读音
        initSpecifiedCharSet     '需要加读音的字
        initExceptionCharSet    ' 例外字符
        
        Dim pinyinArr() As String
        
        Debug.Print ""
        
        
        ' test
        'textForPinyin = getPinyin("】")
        

        ' 收集需要加注音的字
        textForPinyin = ""
        blockText = ""
        allReadMovedCharCount = 0
        
        mainLoopCount = 0
        'blockStart = areaStart
        allAddedMovedCharCount = 0
        blockEnd = areaStart
        Selection.Start = areaStart
        Selection.End = areaStart
        Do While (allReadMovedCharCount < charNum)
        
                blockBeginTimeSec = Timer()
                
                '防止假死
                DoEvents
                
                mainLoopCount = mainLoopCount + 1
        
                Debug.Print "begin to collect words"
                
                blockStart = Selection.End
                Selection.Start = blockStart
                
                'blockLoopCount = 0
                
                blockText = ""
                
                blockReadCharCount = 0
                blockReadMovedCharCount = 0
                
                Do While (allReadMovedCharCount < charNum)
                
                        '防止假死
                        DoEvents
                        
                        '进度显示
                        If allReadMovedCharCount Mod 64 = 0 Then
                                showDebugLog "step1/2    collect words", appBeginTimeSec, allReadMovedCharCount + 1, charNum, blockBeginTimeSec, blockReadCharCount + 1, blockCharLimit
                        End If
                        
                        ' 找句末,非汉字,或者不需要加注音的符号
                        'Do While (allReadMovedCharCount < charNum) And (blockReadCharCount < blockCharLimit)
                        
                                '防止假死
                                'DoEvents
                                
                                lastPos = Selection.End
                                Selection.Start = Selection.End
                                '光标往下移一个字
                                Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
                                Selection.Start = lastPos
                                
                                allReadMovedCharCount = allReadMovedCharCount + 1
                                blockReadMovedCharCount = blockReadMovedCharCount + 1
                                
                                tmpChar = Selection.text
                                
                                If True = isNeedZhuyin(tmpChar) Then
                                        blockText = blockText + tmpChar
                                        blockReadCharCount = blockReadCharCount + 1
                                ElseIf (blockCharLimit <= blockReadCharCount) Then
                                        ' 达到限制后,遇到句末才结束。python汉字转语音模块,多音字会根据词自动选择相应的读音
                                        Exit Do
                                End If
                        
                        'Loop

                Loop
                
                blockEnd = Selection.End
                
                pinyin = getPinyin(blockText)
                pinyinArr = Split(pinyin, " ")
                pinyinNum = UBound(pinyinArr) + 1
                
                Debug.Print "begin to add pinyin"
                
                addBeginTimeSec = Timer()
                
                ' add pinyin word by word
                charPinyin = ""
                Selection.Start = blockStart
                Selection.End = blockStart
                blockAddedCharCount = 0
                blockAddMovedCharCount = 0
                Do While (blockAddMovedCharCount < blockReadMovedCharCount)
                
                        '防止假死
                        DoEvents
                        
                        '进度显示
                        If blockAddedCharCount Mod 16 = 0 Then
                                showDebugLog "step2/2    add pinyin", appBeginTimeSec, allAddedMovedCharCount + 1, charNum, addBeginTimeSec, blockAddMovedCharCount + 1, blockReadMovedCharCount
                        End If
                        
                        allAddedMovedCharCount = allAddedMovedCharCount + 1
                        blockAddMovedCharCount = blockAddMovedCharCount + 1
                        
                        Selection.Start = Selection.End
                        lastPos = Selection.End
                        Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
                        Selection.Start = lastPos
                        tmpChar = Selection.text
                        
                        'Debug.Print "加注音:" & blockAddedCharCount & "/" & blockReadCharCount & "    " & tmpChar
                        
                        If True = isNeedZhuyin(tmpChar) Then
                        
                                charPinyin = pinyinArr(blockAddedCharCount)
                                
                                '如果是手动指定读音的字,就用手动指定的读音
                                getManualPinyin tmpChar, charPinyin
                                
                                
                                 ' 不需要加注音,跳出
                                'If False = isExceptionChar(tmpChar) Then
                                    
                                        '注音操作后,字的大小会增长,当前Selection.End等属性会自动变化
                                        cursor1 = lastPos
                                        cursor2 = Selection.End
                                        'cursor2 = cursor1 + 1
                                        With Selection
                                                '.text = ""
                                                .SetRange Start:=cursor1, End:=cursor2
                                                .Range.PhoneticGuide text:=charPinyin, Alignment:=wdPhoneticGuideAlignmentCenter, Raise:=26, FontSize:=12, FontName:="微软雅黑 Light"
                                                '.SetRange Start:=cursor1, End:=cursor2
                                        End With
                                        
                                'End If
                                
                                blockAddedCharCount = blockAddedCharCount + 1
                                
                        End If
                        
                Loop
                
                
'                '如果add中移动光标的数量与read中的不一致,就需要修正光标位置
'                If (blockReadMovedCharCount <> blockAddMovedCharCount) Then
'                        Selection.Start = Selection.End
'                        Selection.MoveRight unit:=wdCharacter, Count:=blockReadMovedCharCount - blockAddMovedCharCount, Extend:=wdMove
'                End If
                
        Loop
        
        Set g_manualCharSet = Nothing
        Set g_specifiedCharSet = Nothing
        Set g_exceptionCharSet = Nothing
        
        ' 恢复选中区域
        Selection.Start = areaStart
        
        endTimeSec = Timer()
        
        logContent = "all cost time:" & Format((endTimeSec - appBeginTimeSec) / 24 / 3600, "hh:mm:ss") & "    all words count:" & charNum & _
            "    average:" & Format(charNum / (endTimeSec - appBeginTimeSec), "Standard") & "word/second"
        
        Debug.Print logContent
        MsgBox logContent
        

End Sub

如果直接复制网页网页上的代码到VB编辑器中,汉字会乱码。先把网页代码复制到记事本(notepad),然后另存为ANSI编码的文件,后缀名为.bas。否则中文字符在vba编辑器中会变成乱码。然后修改,在VB编辑器中导入文件,这样比较省事。

打开VB编辑器的【视图】->【立即窗口】可以查看日志和大概进度。

踩过的坑:汉字加上注音后,当前字的范围会变化,加上描述属性后,大小会增加,Selection.Start和Selection.End都会变动,后面内容的位置也会因此变化,用代码+1的方式来移动光标会出现各种乱码。(最新测试好像又可以了。)

三、运行

有一些边界条件没有测试,常用的就是选中一段文字,也可直接Crtl+a全选,然后
WPS选择【开发工具】->【宏】(MS选择【视图】->【宏】),
选择 addPinyin 宏,运行,就可以看到word中在一个字一个字的加注音。

运行期间,最好不要操作word(可以做切换到其他的应用),不要切换到word的其他文档去,不要点击文档里的内容。否则就会变成给当前查看的文档、或者点击处的内容加注音,而且可能导致死循环。(如果死循环了,可以杀掉进程重新运行,或者在VBA编辑器里选择【运行】->【重新设置】以重置。代码里没有加自动保存的内容,重置可以保留已经处理过的结果。)

运行时会看到光标会移动两遍,第一遍是遍历选中内容,这个比较快;第二遍是加注音,慢一些。

想找一下看看能不能多文档并行处理的方法,不知道怎么在不用select的情况下光标MoveRight,也没找到文档(懒)。想要多开,就把文档拆分成多个,用虚拟机或者多台电脑。

四、收尾,人工校对,重新调整排版

加了注音后,WPS Word里看排版会变化,很多换行,不知道怎么设置,用MS Word打开就好很多。转成常用的pdf格式即可。

五、定制

initManualCharSet 和 initExceptionCharSet 两个函数中的内容可以根据需要自行修改。
一般来说,文档里的中文字符都需要加到initExceptionCharSet中。可以先把未加拼音的原文档备份一下,先试几页,大概就知道需要排除那些符号了。

需要自定义拼音的格式,就先按要求修改一个例子,录制成宏,然后把代码中.Range.PhoneticGuide这一行后面的属性修改成录制的宏里面对应的属性。

文档中的汉字号,需要加到initExceptionCharSet中去,不然会出现注音一行也会带符号的情况。

加上注音后,WPS Word中无法搜索,MS Word中可以搜索单个字。
已经有注音的字,会被判断为不是汉字,已有注音不会被改变。

第一个版本是一个字一个字的转换的,效率比较慢,两万四千个字大概两个小时加完。后来改成批量的,时间缩短到四十分钟左右。

有师兄有更高效率的方案,请告诉我一下,非常感激。

感谢给python库做贡献的前辈们!
感谢写示例demo的前辈们!

补:专业加拼音软件 adobe indesign

内容里不能有跳转链接,不然会出问题。
如果经常用,可以自定义菜单,给某个菜单卡新建一个组,把这个宏命令加进去,成为一个固定按钮。

你可能感兴趣的:([DIY]给word中的汉字批量加拼音(注音,全篇加拼音))