Realtime UI, Part 3: Voice Interface

DRAFT

WebRTC-based Voice AI

As setting up OpenAI's Realtime API with function calling is a bit tricky, I've borrowed most of the code from the repository created by Cameron King. In short, the code authorizes the WebRTC connection in a separate controller and then establishes a peer-to-peer connection with the OpenAI Realtime API, sending audio data from the microphone and receiving audio responses. The original hooks, such as useToolsFunctions and useWebRTCAudioSession, are left as is, with only a few minor changes.
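The point of that controller is to mint an ephemeral client secret so that the real API key never leaves the server. Here is a minimal sketch of that step as a plain Next.js route handler; the route path and model name are assumptions, and the actual controller in the repository may differ:

app/api/session/route.ts

// A minimal sketch of the authorization step, not the exact controller
// from the repository. Adjust the model name and path to your setup.
export async function POST() {
  const response = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2024-12-17",
      voice: "ash",
    }),
  });

  // The returned JSON contains client_secret.value — an ephemeral key the
  // browser uses to establish the WebRTC peer connection instead of the
  // real API key.
  return Response.json(await response.json());
}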

As the connection is peer-to-peer, the audio data is sent directly from the browser to OpenAI's servers, without passing through our back-end. The functions to be executed are the methods of the RPC modules. In other words, the AI makes HTTP requests to our back-end from the browser, reusing the authorization that's already implemented for regular requests. This makes function execution almost instant, with only a small delay caused by network latency.
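To make this concrete, a tool invocation ultimately boils down to an ordinary typed RPC call in the browser. The method name and endpoint below are hypothetical; the generated client simply performs a regular HTTP request with the same cookies and headers as any other request, so the existing authorization applies automatically:

import { TaskRPC } from "vovk-client";

// Hypothetical method name, shown for illustration only: the generated
// client issues a regular HTTP request (e.g. POST /api/tasks), so the
// back-end sees it like any other authorized request.
async function demo() {
  const task = await TaskRPC.createTask({
    body: { title: "Buy milk" },
  });
  console.log(task);
}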

Enabling function calling for the voice interface

As Cameron King provided a solid framework for Next.js + OpenAI Realtime API + function calling, the only thing left is to extend or replace the existing tools using the createLLMTools function from Vovk.ts, which turns RPC modules into tools that can be used by any LLM setup.
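Under the hood, each generated tool pairs a schema description of the handler's input with an execute function that calls the corresponding RPC method. Roughly (the shape and naming below are simplified for illustration and may differ from the actual output):

import { createLLMTools } from "vovk";
import { TaskRPC, UserRPC } from "vovk-client";

const { tools: llmTools } = createLLMTools({ modules: { TaskRPC, UserRPC } });

// Each entry is roughly (simplified; exact naming may differ):
// {
//   name: "TaskRPC_createTask",          // derived from module and handler
//   description: "...",                  // taken from the endpoint metadata
//   parameters: { /* JSON Schema */ },   // derived from validation models
//   execute: (args) => Promise<unknown>, // performs the actual HTTP request
// }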

src/components/RealTimeDemo.tsx
"use client"; import { useToolsFunctions } from "@/hooks/use-tools"; import useWebRTCAudioSession from "@/hooks/use-webrtc"; import { tools } from "@/lib/tools"; // default tools by Cameron King import { useEffect, useState } from "react"; import { createLLMTools } from "vovk"; import { TaskRPC, UserRPC } from "vovk-client"; import Floaty from "./Floaty"; const { tools: llmTools } = createLLMTools({ modules: { TaskRPC, UserRPC }, }); const RealTimeDemo = () => { // State for voice selection const [voice] = useState<"ash" | "ballad" | "coral" | "sage" | "verse">( "ash", ); // WebRTC Audio Session Hook const { isSessionActive, registerFunction, handleStartStopClick, currentVolume, } = useWebRTCAudioSession(voice, [...tools, ...llmTools]); // Get all tools functions const toolsFunctions = useToolsFunctions(); useEffect(() => { // Register all functions by iterating over the object Object.entries(toolsFunctions).forEach(([name, func]) => { const functionNames: Record<string, string> = { timeFunction: "getCurrentTime", partyFunction: "partyMode", scrapeWebsite: "scrapeWebsite", }; registerFunction(functionNames[name], func); }); llmTools.forEach(({ name, execute }) => { registerFunction(name, execute); }); }, [registerFunction, toolsFunctions]); return ( <div> <Floaty isActive={isSessionActive} volumeLevel={currentVolume} handleClick={handleStartStopClick} /> </div> ); }; export default RealTimeDemo;

For more details, check the full code of the component in the GitHub repository, as well as the Floaty component.
