Voice Design for Multimodal Interaction
- Author: Shelton Ma
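Backend-generated audio stream (recommended)
The server generates the audio stream directly and pushes the audio data to the client in real time over a stream.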
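1. Workflow
- The user sends a request and submits text
- The backend calls the TTS (Text-to-Speech) engine to generate an audio stream
- The backend returns the audio stream to the frontend (e.g. over WebSocket or an HTTP stream)
- The frontend plays the audio stream, giving a real-time response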
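For the workflow above to feel real-time, the backend usually cannot wait for the whole text before calling TTS. One common approach, shown here only as a minimal sketch, is to cut the streamed text into complete sentences and hand each one to the TTS engine as soon as it is ready. The helper name `drainSentences` and the set of sentence terminators are assumptions made for illustration, not part of the original design.

```javascript
// Hypothetical helper: pull complete sentences out of a growing text buffer.
// CJK terminators (。！？) always end a sentence; ASCII terminators (. ! ?)
// count only when followed by whitespace, so "3.14" is not split.
function drainSentences(buffer) {
  const boundary = /[。！？]|[.!?](?=\s)/g;
  const sentences = [];
  let start = 0;
  let match;
  while ((match = boundary.exec(buffer)) !== null) {
    const end = match.index + match[0].length;
    const sentence = buffer.slice(start, end).trim();
    if (sentence) sentences.push(sentence);
    start = end;
  }
  // Whatever follows the last terminator is incomplete and stays buffered.
  return { sentences, rest: buffer.slice(start) };
}

// Usage: feed simulated streaming deltas and collect sentences ready for TTS.
let buffer = '';
const ready = [];
for (const delta of ['你好', '。今天', '天气不错！', '再见']) {
  buffer += delta;
  const { sentences, rest } = drainSentences(buffer);
  ready.push(...sentences);
  buffer = rest;
}
console.log(ready, buffer);
```

When the text stream ends, any leftover in `buffer` would be sent to TTS as a final fragment.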
2. Using gTTS + Hono.js
Backend implementation
```javascript
import { Hono } from 'hono';
import OpenAI from 'openai';

const app = new Hono();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/chat-to-speech', async (c) => {
  const { userMessage } = await c.req.json();
  // Reject empty input before spending API calls on it
  if (typeof userMessage !== 'string' || !userMessage.trim()) {
    return c.json({ error: 'userMessage is required' }, 400);
  }

  // Step 1: generate the reply text with GPT (streamed)
  const gptResponse = await openai.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [{ role: 'user', content: userMessage }],
    stream: true,
  });

  // Step 2: concatenate the streamed chunks into the full text
  let fullResponse = '';
  for await (const chunk of gptResponse) {
    fullResponse += chunk.choices[0]?.delta?.content || '';
  }

  // Step 3: convert the text to speech with TTS
  const ttsResponse = await openai.audio.speech.create({
    model: 'tts-1',
    input: fullResponse,
    voice: 'alloy',
    response_format: 'mp3',
  });

  // Step 4: return the audio stream
  return new Response(ttsResponse.body, {
    headers: { 'Content-Type': 'audio/mpeg' },
  });
});

export default app;
```
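The TTS step above sends the entire generated reply in a single call, which can fail if the speech endpoint caps input length. The sketch below shows one way to split long text into chunks before synthesis, preferring sentence boundaries. The function name `splitForTts` is hypothetical, and the default cap of 4096 characters is an assumption based on the limit OpenAI has documented for its speech endpoint; verify it against the current API reference.

```javascript
// Hypothetical helper: split text into chunks of at most maxChars.
// Length is measured in UTF-16 code units, which only approximates
// "characters" for text containing emoji or other astral symbols.
function splitForTts(text, maxChars = 4096) {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError('maxChars must be a positive integer');
  }
  const chunks = [];
  let remaining = text.trim();
  while (remaining.length > maxChars) {
    const head = remaining.slice(0, maxChars);
    // Find the last sentence terminator that fits inside the limit.
    const lastBreak = Math.max(
      head.lastIndexOf('。'), head.lastIndexOf('！'), head.lastIndexOf('？'),
      head.lastIndexOf('. '), head.lastIndexOf('! '), head.lastIndexOf('? ')
    );
    // Cut just after the terminator, or hard-cut at the limit if none exists.
    const cut = lastBreak >= 0 ? lastBreak + 1 : maxChars;
    chunks.push(remaining.slice(0, cut).trim());
    remaining = remaining.slice(cut).trim();
  }
  if (remaining) chunks.push(remaining);
  return chunks;
}

// Usage with a tiny limit so the splitting is visible.
console.log(splitForTts('一二三。四五六。七八', 5));
```

Each chunk could then be passed to the TTS call in turn, with the resulting audio written to the response in order.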
Frontend implementation
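```jsx
import { useState } from 'react';

export default function ChatToSpeech() {
  const [userMessage, setUserMessage] = useState('');

  const handlePlay = async () => {
    const response = await fetch('/api/chat-to-speech', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ userMessage }),
    });
    // Surface HTTP errors instead of trying to play an error body
    if (!response.ok) {
      console.error(`Request failed with status ${response.status}`);
      return;
    }

    // blob() waits for the full response, so playback starts after the download completes
    const audioBlob = await response.blob();
    const audioUrl = URL.createObjectURL(audioBlob);
    const audio = new Audio(audioUrl);
    // Release the object URL once playback finishes
    audio.addEventListener('ended', () => URL.revokeObjectURL(audioUrl));
    await audio.play();
  };

  return (
    <div>
      <textarea value={userMessage} onChange={(e) => setUserMessage(e.target.value)} />
      <button onClick={handlePlay}>Play audio</button>
    </div>
  );
}
```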