Official documentation: AssemblyAI | Overview
Parameter configuration and library imports
# Import libraries
import time
import requests
import pyaudio
import wave
from tqdm import tqdm
# Recording and API parameters
FRAME_PER_BUFFER = 4096  # frames per buffer
FORMAT = pyaudio.paInt32  # sample format (32-bit integer)
CHANNELS = 2
RATE: int = 44100  # sample rate in Hz
upload_endpoint = "https://api.assemblyai.com/v2/upload"  # upload endpoint
transcript_endpoint = "https://api.assemblyai.com/v2/transcript"  # transcription endpoint
API_KEY_ASSEMBLYAI is the key unique to each user; it can be found on the home page of the official site:
AssemblyAI Speech-to-Text API | Automatic Speech Recognition
headers = {'authorization': API_KEY_ASSEMBLYAI}
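Hard-coding the key in the script makes it easy to leak by accident. A minimal sketch, assuming the key is stored in an environment variable named ASSEMBLYAI_API_KEY (a name chosen here for illustration, not one the API requires), could build the headers like this:

```python
import os


def build_headers(env_var="ASSEMBLYAI_API_KEY"):
    """Build the request headers from an API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"authorization": api_key}
```

With this helper, `headers = build_headers()` would take the place of the hard-coded assignment.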
def record_audio(wave_out_path, record_second):
    # Create the PyAudio instance and open an input stream
    p = pyaudio.PyAudio()
    stream = p.open(
        channels=CHANNELS,
        rate=RATE,
        format=FORMAT,
        input=True,
        frames_per_buffer=FRAME_PER_BUFFER
    )
    print("Recording:")
    frames = []
    for i in tqdm(range(0, int(RATE / FRAME_PER_BUFFER * record_second))):
        data = stream.read(FRAME_PER_BUFFER)
        frames.append(data)
    stream.stop_stream()
    stream.close()
    p.terminate()
    # Write the captured frames to a WAV file
    audio_obj = wave.open(wave_out_path, "wb")
    audio_obj.setnchannels(CHANNELS)
    audio_obj.setsampwidth(p.get_sample_size(FORMAT))
    audio_obj.setframerate(RATE)
    audio_obj.writeframes(b"".join(frames))  # join all chunks into a single bytes object
    audio_obj.close()
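Because the loop count is truncated to an integer, the recording comes out slightly shorter than the requested duration. The sketch below reproduces that arithmetic with the parameters used above; the helper name recording_plan is hypothetical.

```python
def recording_plan(rate, frames_per_buffer, seconds):
    """Return (number of buffer reads, actual recorded duration in seconds)."""
    n_reads = int(rate / frames_per_buffer * seconds)
    actual_seconds = n_reads * frames_per_buffer / rate
    return n_reads, actual_seconds


n_reads, actual = recording_plan(44100, 4096, 5)
print(n_reads, round(actual, 2))  # 53 4.92
```

Requesting 5 seconds therefore yields 53 buffer reads and roughly 4.92 seconds of audio; a smaller buffer size would narrow the gap.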
# Upload a local file
def upload(filename):
    def read_file(filename, chunk_size=5242880):
        # Yield the file in 5 MiB chunks so it is never held in memory all at once
        with open(filename, 'rb') as _file:
            while True:
                data = _file.read(chunk_size)
                if not data:
                    break
                yield data
    try:
        upload_response = requests.post(upload_endpoint,
                                        headers=headers,
                                        data=read_file(filename))
        audio_url = upload_response.json()['upload_url']
        return audio_url
    except (requests.RequestException, KeyError, ValueError):
        print("Upload failed, please try again")
        return None
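The upload passes a generator as the request body, which lets requests stream the file chunk by chunk. The following sketch shows how that chunked reader behaves on its own, using a small temporary file and a tiny chunk size in place of a real audio file (both are assumptions for the demonstration).

```python
import os
import tempfile


def read_file(filename, chunk_size=5242880):
    """Yield the file's contents in chunks of at most chunk_size bytes."""
    with open(filename, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data


# Demonstrate with a 10-byte temporary file and 4-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

chunks = list(read_file(path, chunk_size=4))
os.remove(path)
print(chunks)  # [b'0123', b'4567', b'89']
```

The last chunk is simply shorter than the others, and the generator stops once read() returns an empty bytes object.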
The response from the upload endpoint contains the following field:
{
"upload_url": "https://bit.ly/3yxKEIY"
}
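Send a request to the transcription endpoint with two parameters: the upload_url returned by the upload endpoint, and a language_code (which selects the transcription language; each language has its own code, see the official documentation for details). From the JSON response, only the id is needed.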
# Transcribe
def transcribe(audio_url):
    transcript_request_json = {
        "audio_url": audio_url,
        "language_code": "es"  # Spanish
    }
    transcript_response = requests.post(
        transcript_endpoint,
        json=transcript_request_json,
        headers=headers)
    id = transcript_response.json()['id']
    return id
Response JSON:
{
"id": "5551722-f677-48a6-9287-39c0aafd9ac1"
...
}
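Polling: query the transcript repeatedly to check for status updates. The "status" field moves through "queued", "processing", and "completed" (or "error" if the job fails). Wait until the status becomes "completed", then read the transcription result from the text field of the JSON.
The polling endpoint given in the official documentation:
endpoint = "https://api.assemblyai.com/v2/transcript/YOUR-TRANSCRIPT-ID-HERE"
that is,
polling_endpoint = transcript_endpoint + '/' + id
Code: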
# Poll the transcript status
def poll(id):
    polling_endpoint = transcript_endpoint + '/' + id
    polling_response = requests.get(polling_endpoint, headers=headers)
    return polling_response.json()
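# Get the result
def get_transcription_result_url(audio_url):
    id = transcribe(audio_url)
    print("Transcribing...")
    time_start = time.time()
    while True:  # keep polling until the transcription completes or fails
        data = poll(id)
        if data['status'] == 'completed':
            time_end = time.time()
            return data['text'], None, time_end - time_start  # also return the elapsed time
        elif data['status'] == 'error':
            return data['text'], data["error"], None
        time.sleep(3)  # pause between polls (3 s is an arbitrary choice) instead of hammering the API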
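A polling loop with no upper bound can hang forever if a job never leaves the "queued" or "processing" state. As a rough sketch of one way to bound it, the helper below takes the status fetcher as a parameter so it can be tried without network access; the name wait_for_transcript, the 3-second interval, and the 300-second timeout are all assumptions, not values from the AssemblyAI documentation.

```python
import time


def wait_for_transcript(fetch_status, interval=3.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until the transcript completes, fails, or times out.

    fetch_status is any zero-argument callable returning the polling JSON as a
    dict, for example lambda: poll(id). Returns a (text, error) tuple.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["status"] == "completed":
            return data["text"], None
        if data["status"] == "error":
            return None, data.get("error")
        if time.monotonic() >= deadline:
            return None, "timed out waiting for transcription"
        sleep(interval)


# Simulated sequence of responses, standing in for real API calls.
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hola"},
])
result = wait_for_transcript(lambda: next(responses), sleep=lambda s: None)
print(result)  # ('hola', None)
```

In the real script the call would look like `wait_for_transcript(lambda: poll(id))`, with the default sleep left in place.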
JSON response; the text field holds the transcription result:
{
"acoustic_model": "assemblyai_default",
"audio_duration": 12.0960090702948,
"audio_url": "https://bit.ly/3yxKEIY",
"confidence": 0.956,
"dual_channel": null,
"format_text": true,
"id": "5551722-f677-48a6-9287-39c0aafd9ac1",
"language_model": "assemblyai_default",
"language_code": "es",
"punctuate": true,
"status": "completed",
"text": "Ya sabes Demonios en la tele así y para que la gente se exponga a ser rechazada en la tele o humillada por el factor miedo o.",
...
}
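The whole process is somewhat tedious (transcription takes about 10 seconds, depending on your network speed), but overall it is fairly simple to implement and the final recognition result is fairly accurate. It cannot run in real time, however (real-time transcription is a paid feature), and although this API is free to use, it comes with usage time limits. See the official documentation for details.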