Recognizing speech in audio files (WAV) with Python's SpeechRecognition library

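I've been experimenting with speech-to-text recently and picked up a few related things along the way, so I'm writing them down here.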

I. The Python SpeechRecognition library

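SpeechRecognition is a third-party Python library (it is not part of the standard library) that provides one interface to several speech recognition engines and web APIs. There is plenty of material about it online; for details, see for example: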

https://blog.csdn.net/alice_tl/article/details/89684369

II. Usage

  1. Install the SpeechRecognition library
    pip install speechrecognition

  2. Define one wrapper function per recognition engine
    import speech_recognition as sr

    # A single Recognizer instance shared by all the wrapper functions below
    r = sr.Recognizer()

    # Google Web Speech API
    def google(audio):
        try:
            print("Google: ")
            return r.recognize_google(audio)
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))
            return None

    # Wit.ai
    def wit(audio):
        WIT_AI_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx"  # Wit.ai keys are 32-character uppercase alphanumeric strings
        try:
            return r.recognize_wit(audio, key=WIT_AI_KEY)
        except sr.UnknownValueError:
            print("Wit.ai could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from Wit.ai service; {0}".format(e))
            return None

    # Microsoft Bing Voice Recognition
    # (Microsoft has retired the Bing Speech API, and newer SpeechRecognition
    # releases may no longer provide recognize_bing.)
    def bing(audio):
        BING_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        try:
            return r.recognize_bing(audio, key=BING_KEY)
        except sr.UnknownValueError:
            print("Microsoft Bing Voice Recognition could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
            return None

    # IBM Speech to Text
    # (Depending on the SpeechRecognition version, recognize_ibm may expect an
    # API key instead of a username/password; check the docs for your version.)
    def ibm(audio):
        IBM_USERNAME = "xxxxxxxxxxxxxxxxxxxxxxxxxx"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        IBM_PASSWORD = "xxxxxxxxxxxxxxxxx"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
        try:
            return r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD, show_all=False)
        except sr.UnknownValueError:
            print("IBM Speech to Text could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from IBM Speech to Text service; {0}".format(e))
            return None

    # CMU Sphinx (works offline; requires the pocketsphinx package)
    def sphinx(audio):
        try:
            text = r.recognize_sphinx(audio)
            # Only report success after recognition has actually returned
            print("-------------Sphinx successfully recognized the audio ---------")
            return text
        except sr.UnknownValueError:
            print("Sphinx could not understand audio")
            return None
        except sr.RequestError as e:
            print("Sphinx error; {0}".format(e))
            return None

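    Note that only Sphinx can be used offline (it requires the pocketsphinx package, installed with pip install pocketsphinx); all the other engines need an internet connection. Google can be used without registering, because the library falls back to a generic default API key intended for personal or testing use; the others require you to register for a key or credentials.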

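  3. Use these functions to recognize a specific audio file. Note that sr.AudioFile only reads WAV (PCM), AIFF/AIFF-C and native FLAC files, so other formats such as MP3 must be converted first, for example to WAV with pydub (which needs ffmpeg installed to decode MP3).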

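    import speech_recognition as sr
    from pydub import AudioSegment  # pip install pydub; decoding MP3 requires ffmpeg

    # Assumes r and the wrapper functions (e.g. sphinx) from step 2 are
    # defined in the same script.


    def speech_to_text(path_file):
        # Convert the MP3 input to WAV; "audio.wav" is written to the current working directory
        song = AudioSegment.from_mp3(path_file)
        song.export("audio.wav", format="wav")

        # AudioFile is initialized with the path to an audio file and provides
        # a context manager for reading and processing the file's contents
        with sr.AudioFile("audio.wav") as source:
            audio = r.record(source)  # read the whole file into an AudioData object

        print("Submitting to speech-to-text:")
        determined = sphinx(audio)  # instead of sphinx, you can use google, wit, ibm or bing here
        print(determined)
        return determined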
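
Before handing a converted file to sr.AudioFile, it can help to confirm that the export really produced an uncompressed PCM WAV and to look at its basic parameters. The sketch below uses only Python's standard wave module, which itself only parses uncompressed WAV data and raises wave.Error otherwise; the helper name describe_wav is hypothetical and not part of SpeechRecognition or pydub.

```python
import wave


def describe_wav(path_or_file):
    """Return basic parameters of a PCM WAV file (hypothetical helper).

    Accepts a path or a binary file object, like wave.open itself.
    wave.open raises wave.Error if it cannot parse the data as WAV,
    so calling this also acts as a quick sanity check on a converted file.
    """
    with wave.open(path_or_file, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        frame_rate = wav.getframerate()  # frames per second (Hz)
        n_frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_width_bytes": sample_width,
        "frame_rate_hz": frame_rate,
        "duration_s": n_frames / frame_rate,
    }
```

For example, print(describe_wav("audio.wav")) right after the export step shows the channel count, sample width, sample rate and duration of the file that will be passed to the recognizer.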

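Step 3 hard-codes a single engine, and the note in step 2 points out that only Sphinx works offline. If you want to try several engines and keep the first usable answer, a small helper can chain the wrapper functions. This is a minimal sketch, not part of SpeechRecognition: the name first_result is hypothetical, and it assumes each engine is a function like sphinx or google above that takes an AudioData object and returns the recognized text, or None (or the string "None") when it fails.

```python
def first_result(audio, engines):
    """Try each recognizer in order and return the first usable transcript.

    engines is a sequence of callables (e.g. the sphinx/google wrappers)
    that take an AudioData object and return text, or None / "None" on
    failure. Returns (engine_name, text), or (None, None) if all fail.
    """
    for engine in engines:
        text = engine(audio)
        # Treat None, the placeholder string "None" and empty text as failures
        if text and text != "None":
            name = getattr(engine, "__name__", repr(engine))
            return name, text
    return None, None
```

For example, inside speech_to_text the sphinx(audio) call could become engine, determined = first_result(audio, [sphinx, google]), which tries offline Sphinx first and falls back to Google. Note that this only falls back when an engine reports failure; it cannot tell whether a returned transcript is accurate.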