Realtime UI, Part 3: Voice Interface

DRAFT

WebRTC-based Voice AI

As setting up OpenAI's Realtime API with function calling is a bit tricky, I've borrowed most of the code from the repository created by Cameron King. In short, the code authorizes the WebRTC connection in a separate controller and then establishes a peer-to-peer connection with the OpenAI Realtime API, sending audio data from the microphone and receiving audio responses. The original hooks, such as useToolsFunctions and useWebRTCAudioSession, are left as is, with only a few minor changes.
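The point of that controller is to mint an ephemeral client secret so that the real API key never leaves the server. Here is a minimal sketch of that step as a plain Next.js route handler; the route path and model name are assumptions, and the actual controller in the repository may differ:

app/api/session/route.ts

// A minimal sketch of the authorization step, not the exact controller
// from the repository. Adjust the model name and path to your setup.
export async function POST() {
  const response = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2024-12-17",
      voice: "ash",
    }),
  });

  // The returned JSON contains client_secret.value — an ephemeral key the
  // browser uses to establish the WebRTC peer connection instead of the
  // real API key.
  return Response.json(await response.json());
}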

As the connection is peer-to-peer, the audio data is sent directly from the browser to OpenAI's servers, without passing through our back-end. The functions to be executed are the methods of the RPC modules. In other words, the AI makes HTTP requests to our back-end from the browser, reusing the authorization that's already implemented for regular requests. This makes function execution almost instant, with only a small delay caused by network latency.
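To make this concrete, a tool invocation ultimately boils down to an ordinary typed RPC call in the browser. The method name and endpoint below are hypothetical; the generated client simply performs a regular HTTP request with the same cookies and headers as any other request, so the existing authorization applies automatically:

import { TaskRPC } from "vovk-client";

// Hypothetical method name, shown for illustration only: the generated
// client issues a regular HTTP request (e.g. POST /api/tasks), so the
// back-end sees it like any other authorized request.
async function demo() {
  const task = await TaskRPC.createTask({
    body: { title: "Buy milk" },
  });
  console.log(task);
}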

Enabling function calling for the voice interface

As Cameron King provided a solid framework for Next.js + OpenAI Realtime API + function calling, the only thing left is to extend or replace the existing tools using the createLLMTools function from Vovk.ts, which turns RPC modules into tools that can be used by any LLM setup.
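Under the hood, each generated tool pairs a schema description of the handler's input with an execute function that calls the corresponding RPC method. Roughly (the shape and naming below are simplified for illustration and may differ from the actual output):

import { createLLMTools } from "vovk";
import { TaskRPC, UserRPC } from "vovk-client";

const { tools: llmTools } = createLLMTools({ modules: { TaskRPC, UserRPC } });

// Each entry is roughly (simplified; exact naming may differ):
// {
//   name: "TaskRPC_createTask",          // derived from module and handler
//   description: "...",                  // taken from the endpoint metadata
//   parameters: { /* JSON Schema */ },   // derived from validation models
//   execute: (args) => Promise<unknown>, // performs the actual HTTP request
// }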

src/components/RealTimeDemo.tsx
"use client"; import { useToolsFunctions } from "@/hooks/use-tools"; import useWebRTCAudioSession from "@/hooks/use-webrtc"; import { tools } from "@/lib/tools"; // default tools by Cameron King import { useEffect, useState } from "react"; import { createLLMTools } from "vovk"; import { TaskRPC, UserRPC } from "vovk-client"; import Floaty from "./Floaty"; const { tools: llmTools } = createLLMTools({ modules: { TaskRPC, UserRPC }, }); const RealTimeDemo = () => { // State for voice selection const [voice] = useState<"ash" | "ballad" | "coral" | "sage" | "verse">( "ash", ); // WebRTC Audio Session Hook const { isSessionActive, registerFunction, handleStartStopClick, currentVolume, } = useWebRTCAudioSession(voice, [...tools, ...llmTools]); // Get all tools functions const toolsFunctions = useToolsFunctions(); useEffect(() => { // Register all functions by iterating over the object Object.entries(toolsFunctions).forEach(([name, func]) => { const functionNames: Record<string, string> = { timeFunction: "getCurrentTime", partyFunction: "partyMode", scrapeWebsite: "scrapeWebsite", }; registerFunction(functionNames[name], func); }); llmTools.forEach(({ name, execute }) => { registerFunction(name, execute); }); }, [registerFunction, toolsFunctions]); return ( <div> <Floaty isActive={isSessionActive} volumeLevel={currentVolume} handleClick={handleStartStopClick} /> </div> ); }; export default RealTimeDemo;

For more details, check the full code of the component in the GitHub repository, as well as the Floaty component.
