I. Basic usage: the `SpeechRecognition` library
Installing the required libraries
First install `SpeechRecognition` and `pyaudio` (the latter is used for audio input):
```bash
pip install SpeechRecognition pyaudio
```
*Note: on some systems, installing `pyaudio` may require extra build dependencies or audio device permissions to be configured.*
Reading an audio file and transcribing it
The following example shows how to extract speech from a WAV file and convert it to text:
```python
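import speech_recognition as sr


def transcribe_audio(file_path):
    """Transcribe a WAV file and print the recognized text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(file_path) as source:
        print("Reading audio file...")
        # Read the entire file into an AudioData object
        audio_data = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio_data, language='zh-CN')
        print("Transcription:", text)
    except sr.UnknownValueError:
        # The speech could not be understood
        print("Sorry, could not make out the speech...")
    except sr.RequestError:
        # The recognition service could not be reached
        print("Network connection problem...")


# Example usage
transcribe_audio("your_audio_file.wav")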
```
*Replace `your_audio_file.wav` with the path to your actual audio file.*
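Before sending a file for transcription, it can help to confirm that it really is a PCM WAV file and to see its basic properties. The following is a minimal sketch using only Python's standard `wave` module; the helper name `describe_wav` and the dictionary keys are illustrative choices, not part of `SpeechRecognition`.

```python
import wave


def describe_wav(file_path):
    """Return basic properties of a PCM WAV file (standard library only)."""
    with wave.open(file_path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by the frame rate
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

If `wave.open` raises `wave.Error`, the file is probably not uncompressed PCM WAV and may need converting first.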
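II. Advanced usage: real-time speech recognition
Capturing and recognizing audio in real time
The following code captures speech from the microphone in real time and converts it to text: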
```python
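import speech_recognition as sr


def real_time_recognition():
    """Listen on the default microphone and print each recognized phrase."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate for ambient noise before prompting the user,
        # so the start of their speech is not consumed by calibration
        recognizer.adjust_for_ambient_noise(source)
        print("Please speak...")
        while True:
            try:
                # Blocks until a phrase has been captured
                audio = recognizer.listen(source)
                text = recognizer.recognize_google(audio, language='zh-CN')
                print(f"Result: {text}")
            except sr.UnknownValueError:
                print("Could not recognize the speech")
            except sr.RequestError:
                print("Network connection problem...")


if __name__ == "__main__":
    real_time_recognition()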
```
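*This code keeps listening to the microphone and prints each recognition result, which makes it a reasonable starting point for a voice input tool.*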
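The `while True` loop above has no exit condition other than interrupting the program. One possible way to end it cleanly is to check each recognized phrase against a few stop words. This is a minimal sketch; the helper name `is_stop_command` and the stop words themselves are hypothetical and should be adapted.

```python
# Hypothetical stop words; adjust to the phrases your users would say
STOP_PHRASES = ("退出", "停止", "stop", "quit")


def is_stop_command(text, stop_phrases=STOP_PHRASES):
    """Return True if the recognized text contains any stop phrase."""
    normalized = text.strip().lower()
    # Simple substring match, so "please stop now" also counts
    return any(phrase in normalized for phrase in stop_phrases)
```

Inside the loop, calling `if is_stop_command(text): break` right after printing the result would leave the `with` block and release the microphone. Because this is a substring match, a word that merely contains a stop phrase also triggers it.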
III. Other considerations
Choosing a recognition engine
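`SpeechRecognition` supports several engines (such as Google, IBM Watson, and Microsoft Bing), each exposed as a `recognize_*` method on the `Recognizer` object; which methods are available depends on the installed version. You can choose one according to your needs. For example, to use Wit.ai instead of Google:
```python
text = recognizer.recognize_wit(audio_data, key="YOUR_WIT_AI_KEY")
```
*`YOUR_WIT_AI_KEY` is a placeholder for your own key. Baidu speech recognition does not appear among the library's documented recognizers, so there is no `recognize_baidu` method to call; using Baidu means registering on the Baidu AI platform, obtaining an API key, and calling Baidu's own SDK.*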
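One way to act on the choice of engine is a fallback chain: try a preferred engine first and move on to the next one if it fails. The sketch below is deliberately library-agnostic; `recognize_with_fallback` and its parameters are hypothetical names, not part of `SpeechRecognition`.

```python
def recognize_with_fallback(audio_data, engines, errors=(Exception,)):
    """Try each (name, recognize_fn) pair in order.

    Returns (name, text) from the first engine that succeeds, and raises
    RuntimeError if every engine raises one of the given error types.
    """
    failures = []
    for name, recognize_fn in engines:
        try:
            return name, recognize_fn(audio_data)
        except errors as exc:
            # Remember why this engine failed, then try the next one
            failures.append(f"{name}: {exc!r}")
    raise RuntimeError("all engines failed: " + "; ".join(failures))
```

With `SpeechRecognition`, `engines` could hold entries such as `("google", lambda a: recognizer.recognize_google(a, language="zh-CN"))`, and `errors` would typically be `(sr.UnknownValueError, sr.RequestError)` so that unrelated bugs are not silently swallowed.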
Handling other audio formats
If the audio is not in WAV format, you can convert it with the `pydub` library:
```bash
pip install pydub
```
Example code:
```python
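import io
from pydub import AudioSegment
import speech_recognition as sr


def convert_and_recognize(file_path):
    # pydub relies on ffmpeg to decode non-WAV formats such as MP3
    audio = AudioSegment.from_file(file_path)
    # Export as WAV into an in-memory buffer instead of a file on disk
    wav_buffer = io.BytesIO()
    audio.export(wav_buffer, format="wav")
    wav_buffer.seek(0)
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_buffer) as source:
        # recognize_google expects AudioData, so record the source first
        audio_data = recognizer.record(source)
    text = recognizer.recognize_google(audio_data, language='zh-CN')
    print(text)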
```
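Conversion is only needed for formats that `sr.AudioFile` cannot read directly. As a rough sketch, a file extension check can decide whether to convert first. The extension set below assumes the formats the library documents as supported (PCM WAV, AIFF/AIFF-C, and native FLAC); the helper name `needs_conversion` is hypothetical, and an extension is only a hint about the actual contents.

```python
from pathlib import Path

# Assumption: extensions for formats sr.AudioFile is documented to read directly
NATIVE_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def needs_conversion(file_path):
    """Return True if the file extension suggests converting to WAV first."""
    return Path(file_path).suffix.lower() not in NATIVE_EXTENSIONS
```

A caller could use `convert_and_recognize` when this returns `True` and pass the path straight to `sr.AudioFile` otherwise.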
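Improving recognition accuracy
- Convert the audio to WAV format
- Use `adjust_for_ambient_noise` to reduce interference from background noise
- Set `language` to match the language actually spoken (for example, `language='en-US'` for English audio)
With the methods above, you can convert audio files to text or build a real-time speech recognition application. Choosing tools and tuning strategies to fit your needs can improve both recognition accuracy and the user experience.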