
Voice Design for Multimodal Interaction

Authors
  • Shelton Ma

Backend-generated audio stream (recommended)

The server generates the audio stream itself and pushes the audio data to the client in real time over a stream.

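1. Workflow

  1. The user sends a request containing the text
  2. The backend calls a TTS (Text-to-Speech) engine to generate an audio stream
  3. The backend returns the audio stream to the frontend (e.g. over WebSocket or an HTTP stream)
  4. The frontend plays the audio stream as it arrives, giving a real-time response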
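
Steps 3 and 4 can be sketched with the standard Web Streams API alone. This is a minimal sketch, not the implementation used later in this post: `fakeTts`, `toAudioResponse`, and `readChunks` are hypothetical names, and `fakeTts` merely stands in for a real TTS engine by emitting the input text as bytes in small chunks. It assumes a runtime where `Response` and `ReadableStream` are globals (modern browsers, Node 18+).

```typescript
// Stand-in for a TTS engine (assumption): emits "audio" bytes in 4-byte chunks.
async function* fakeTts(text: string): AsyncGenerator<Uint8Array> {
  const bytes = new TextEncoder().encode(text);
  for (let i = 0; i < bytes.length; i += 4) {
    yield bytes.slice(i, i + 4);
  }
}

// Step 3: wrap the chunk source in a ReadableStream and return it as a response body.
function toAudioResponse(chunks: AsyncIterable<Uint8Array>): Response {
  const iterator = chunks[Symbol.asyncIterator]();
  const body = new ReadableStream<Uint8Array>({
    // pull() runs whenever the consumer is ready for more data
    async pull(controller) {
      const result = await iterator.next();
      if (result.done) {
        controller.close();
      } else {
        controller.enqueue(result.value);
      }
    },
  });
  return new Response(body, { headers: { 'Content-Type': 'audio/mpeg' } });
}

// Step 4: consume the body chunk by chunk instead of waiting for the whole payload.
async function readChunks(response: Response): Promise<Uint8Array[]> {
  if (!response.body) throw new Error('Response has no body');
  const reader = response.body.getReader();
  const received: Uint8Array[] = [];
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    received.push(result.value);
  }
  return received;
}
```

In a real client, each chunk would be handed to an audio sink as it arrives rather than collected into an array; the array here only makes the flow easy to inspect.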

2. Using OpenAI TTS + Hono.js

  1. Backend

    import { Hono } from 'hono';
    import OpenAI from 'openai';
    
    const app = new Hono();
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    
    app.post('/api/chat-to-speech', async (c) => {
      const { userMessage } = await c.req.json();
    
      // Step 1: generate the reply text with GPT (streamed)
      const gptResponse = await openai.chat.completions.create({
        model: 'gpt-4-turbo',
        messages: [{ role: 'user', content: userMessage }],
        stream: true
      });
    
      // Step 2: concatenate the streamed GPT chunks into the full text
      let fullResponse = '';
      for await (const chunk of gptResponse) {
        fullResponse += chunk.choices[0]?.delta?.content || '';
      }
    
      // Step 3: convert the text to speech with TTS
      const ttsResponse = await openai.audio.speech.create({
        model: 'tts-1',
        input: fullResponse,
        voice: 'alloy',
        response_format: 'mp3'
      });
    
      // Step 4: return the audio stream
      return new Response(ttsResponse.body, {
        headers: { 'Content-Type': 'audio/mpeg' }
      });
    });
    
    export default app;
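
Step 2 in the handler above waits for the entire GPT response before calling TTS, so the first audio byte is produced only after the last text token. One possible way to shorten that wait, which the original handler does not do, is to release text sentence by sentence so that each sentence could be sent to TTS as soon as it completes. The following is a minimal sketch under that assumption; `SentenceBuffer` is a hypothetical helper, and its boundary rule is deliberately naive (it would also split at the dot in a decimal such as `3.14`).

```typescript
// Hypothetical helper: buffers streamed text deltas and releases complete sentences.
class SentenceBuffer {
  private pending = '';

  // Append a delta and return every sentence that is now complete.
  push(delta: string): string[] {
    this.pending += delta;
    const sentences: string[] = [];
    // A sentence ends at ., !, ? or their full-width CJK counterparts.
    const boundary = /[.!?。！？]+\s*/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence) sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next delta.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Return whatever text remains once the stream has ended.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = '';
    return rest ? [rest] : [];
  }
}
```

Inside the `for await` loop, each delta would be passed to `push()`, and every returned sentence could be handed to the TTS call immediately; `flush()` covers a final sentence that lacks closing punctuation.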
    
    
  2. Frontend implementation

    import { useState } from 'react';
    
    export default function ChatToSpeech() {
      const [userMessage, setUserMessage] = useState('');
    
      const handlePlay = async () => {
        const response = await fetch('/api/chat-to-speech', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ userMessage })
        });
    
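        if (!response.ok) {
          console.error(`Request failed with status ${response.status}`);
          return;
        }
    
        // blob() waits for the complete body, so playback starts after the full download
        const audioBlob = await response.blob();
        const audioUrl = URL.createObjectURL(audioBlob);
    
        const audio = new Audio(audioUrl);
        // Release the object URL once playback ends
        audio.onended = () => URL.revokeObjectURL(audioUrl);
        audio.play();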
      };
    
      return (
        <div>
          <textarea value={userMessage} onChange={(e) => setUserMessage(e.target.value)} />
          <button onClick={handlePlay}>Play audio</button>
        </div>
      );
    }