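Lately I've wanted to do some NLP projects. Having finished one on Chinese text, I wanted to try English next. But scrapers pointed at sites inside China naturally bring back Chinese, and I have never tried scraping sites outside China, so I wasn't confident about that route. I decided to use translation instead. Along the way ChatGPT gave me plenty of methods, but every one of them failed for one reason or another. PS: GPT, you big liar!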
from youdaoapi import YoudaoApi

def translate_to_english(chinese_text):
    youdao = YoudaoApi()
    english_translation = youdao.translate(chinese_text, to_lang='en')
    return english_translation

chinese_string = "你好,世界!"
english_translation = translate_to_english(chinese_string)
print("English Translation:", english_translation)
NOTE: the youdaoapi package was useless.
from translate import Translator

def translate_to_english(chinese_text):
    translator = Translator(to_lang="en")
    english_translation = translator.translate(chinese_text)
    return english_translation

chinese_string = "你好,世界!"
english_translation = translate_to_english(chinese_string)
print("English Translation:", english_translation)
NOTE: the translate package was useless.
from googletrans import Translator

def translate_to_english(chinese_text):
    translator = Translator()
    translated = translator.translate(chinese_text, src='zh-cn', dest='en')
    return translated.text

chinese_string = "你好,世界!"
english_translation = translate_to_english(chinese_string)
print("English Translation:", english_translation)
NOTE: the Google package was useless.
So: apply for the Baidu text translation API! The whole process is on this page: https://console.bce.baidu.com/ai/?_=1652768945367&fromai=1#/ai/machinetranslation/overview/index
# -*- coding: utf-8 -*-
# This code shows an example of text translation from Simplified Chinese to English.
# This code runs on Python 2.7.x and Python 3.x.
# You may install `requests` to run this code: pip install requests
# Please refer to `https://api.fanyi.baidu.com/doc/21` for complete api document
import requests
import random
import json

def get_access_token():
    """
    Generate the auth credential (Access Token) from the AK and SK.
    client_id: API Key
    client_secret: Secret Key
    :return: access_token, or None on error
    """
    url = "https://aip.baidubce.com/oauth/2.0/token"
    # Fill in the API Key and Secret Key from your own Baidu console; do not publish real keys.
    params = {"grant_type": "client_credentials", "client_id": 'YOUR_API_KEY', "client_secret": 'YOUR_SECRET_KEY'}
    # .get() returns None when the response carries no access_token, matching the docstring.
    return requests.post(url, params=params).json().get("access_token")

def baidu_translate(q):
    token = get_access_token()
    if not token:
        raise RuntimeError('Could not get an access token; check the API Key and Secret Key.')
    url = 'https://aip.baidubce.com/rpc/2.0/mt/texttrans/v1?access_token=' + token
    # For list of language codes, please refer to `https://ai.baidu.com/ai-doc/MT/4kqryjku9#语种列表`
    from_lang = 'zh'  # source language: Chinese
    to_lang = 'en'  # target language: English
    term_ids = ''  # terminology library IDs, comma-separated if there are several
    # Build request
    headers = {'Content-Type': 'application/json'}
    payload = {'q': q, 'from': from_lang, 'to': to_lang, 'termIds': term_ids}
    # Send request
    r = requests.post(url, params=payload, headers=headers)
    result = r.json()
    # Show response
    # print(json.dumps(result, indent=4, ensure_ascii=False))
    # Return the translation of the first segment
    return result['result']['trans_result'][0]['dst']
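
The last line of baidu_translate fails with a bare KeyError whenever the request goes wrong, which makes a bad key or an exhausted quota hard to spot. Here is a minimal sketch of a more defensive way to read the response. It assumes that a failed call comes back as JSON with `error_code` and `error_msg` fields and that multi-line input comes back as several `trans_result` entries; both are worth checking against the API document. `extract_translation` is a hypothetical helper name, not part of any Baidu SDK.

```python
def extract_translation(result):
    """Return the translated text from a parsed text-translation response.

    Assumption (not confirmed by the original post): a failed call carries
    'error_code' and 'error_msg'; a successful one carries
    result -> trans_result -> [{'src': ..., 'dst': ...}, ...].
    """
    if 'error_code' in result:
        raise RuntimeError('Baidu translate failed: {} {}'.format(
            result.get('error_code'), result.get('error_msg')))
    segments = result.get('result', {}).get('trans_result', [])
    if not segments:
        raise RuntimeError('Unexpected response shape: {!r}'.format(result))
    # Join every returned segment instead of keeping only the first one.
    return '\n'.join(segment['dst'] for segment in segments)
```

With this helper, the final line of baidu_translate could become `return extract_translation(result)`.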
Screenshot of the output from baidu_translate:
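Yay! Enjoy. While scraping, call this function to turn each text into English before saving it as CSV or the like, and this little problem is solved. Worth writing down! --<-<-<@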
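
The "translate, then save as CSV" step could look roughly like this. It is only a sketch for Python 3: `save_translated_csv`, the column names and the file name are made up for illustration, and the translation function is passed in as a parameter so that baidu_translate (or anything else that maps a string to a string) can be plugged in.

```python
import csv

def save_translated_csv(texts, translate, path):
    """Write each scraped text next to its English translation.

    texts: iterable of Chinese strings (for example, scraped comments)
    translate: any function str -> str, such as baidu_translate
    path: where to write the CSV file
    """
    # utf-8-sig adds a BOM so Excel detects the encoding of the Chinese column.
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['chinese', 'english'])
        for text in texts:
            writer.writerow([text, translate(text)])

# Hypothetical usage:
# save_translated_csv(scraped_texts, baidu_translate, 'translated.csv')
```

Each row triggers one API call, so for a large crawl it may be worth pausing between rows to stay within the account's rate limit.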