Smart speakers might not be as ubiquitous as they once were, but there’s a good chance that you, the discerning Tom’s Guide reader, have at least one in your home.
Whether you’re using Alexa, Google Assistant, or a HomePod, though, OpenAI may have just laid the groundwork for a huge upgrade for your chatty speaker of choice.
The ChatGPT maker’s new ‘Realtime API’ will act as a sort of connective tissue, helping developers ‘plug’ Advanced Voice-style features (and more) into other applications.
In OpenAI’s words, “Developers can now build fast speech-to-speech experiences into their applications”.
That’s a pretty good summation: like ChatGPT’s Advanced Voice Mode, the API offers speech-to-speech functionality, but now it’s readily available for developers to implement in their own applications.
Previously, developers would need to chain several models together: transcribing speech with a recognition model, running the text through a language model, then converting the reply back into audio with text-to-speech. That leads to a “stock”-sounding voice, lacking in nuance and a true sense of conversation, plus added latency. New audio support in the Chat Completions API makes it easier to handle the whole flow in a single API call, OpenAI explains.
As the name suggests, the Realtime API streams audio in and out directly, so developers can build voice assistants that can be interrupted naturally (as rude as that may sound).
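For the developers reading along, here’s a rough sketch of what ‘plugging in’ might look like, based on the WebSocket endpoint, event names and audio formats in OpenAI’s beta documentation. Treat it as an illustration rather than gospel: the file names are placeholders and the details may change as the API matures.

```python
# A minimal sketch of a Realtime API exchange over a WebSocket.
# Endpoint, headers and event names follow OpenAI's beta launch docs;
# check the current documentation before relying on any of this.
import asyncio
import base64
import json
import os

import websockets  # pip install websockets


URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}


async def main() -> None:
    # Note: newer releases of the websockets package call this argument
    # `additional_headers` instead of `extra_headers`.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Configure the session: 16-bit PCM audio in and out, with manual
        # turn-taking to keep the example simple. Swapping turn_detection
        # to {"type": "server_vad"} is what enables the natural
        # interruption behaviour described above.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "voice": "alloy",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "turn_detection": None,
            },
        }))

        # Stream a short clip of raw 24 kHz, 16-bit mono PCM into the input
        # buffer, then commit it. "question.pcm" is just an example file.
        with open("question.pcm", "rb") as f:
            audio = f.read()
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(audio).decode(),
        }))
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))

        # Ask for a spoken reply; the audio comes back as streamed deltas,
        # which we simply write out to a file here.
        await ws.send(json.dumps({"type": "response.create"}))
        with open("reply.pcm", "wb") as out:
            async for message in ws:
                event = json.loads(message)
                if event["type"] == "response.audio.delta":
                    out.write(base64.b64decode(event["delta"]))
                elif event["type"] == "response.done":
                    break


asyncio.run(main())
```

In a real voice assistant you’d be feeding microphone audio in and playing the deltas back as they arrive, rather than reading and writing files, but the shape of the conversation is the same.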
How it could be huge for smart speakers
That interruption element is key. How many times has your smart speaker misinterpreted your command, leaving you to wait for it to finish talking to itself before you can ask again?
It’s a nuisance, but with better interruption detection, things could get much better. Your smart speaker of choice could also get things right the first time more often with a better underlying model interpreting your commands, while the commands themselves could be much more complex.
If you’ve ever tried to ask your smart speaker to do multiple things in sequence, or refer to prior conversations, you’ll know that at times they’re actually not very smart at all. With the contextual awareness of OpenAI’s Realtime API, however, you could ask your speaker to recall something from a prior conversation, or add your own profile to it so it knows to address you differently to your partner or kids.
Naturally, these are all hypotheticals at this point, but that Echo Dot you picked up on Prime Day half a decade ago may be about to get supercharged.
What else could Realtime API do?
“Today at DevDay SF, we’re launching a bunch of new capabilities to the OpenAI platform.” pic.twitter.com/y4cqDGugju (October 1, 2024)
I’m never one to suggest AI should replace human jobs (in this field, that’s a very, very slippery slope that gets more well-worn by the day), but I do think there are possibilities here beyond your speaker knowing which version of a song you asked for.
An obvious fit would be call centers, which would still need humans for the actual service part of the job, but which could benefit from more accurate triaging of calls (begone, keypad menus in 2024!).
There’s also the potential for voice assistants in general to become more interchangeable as they tap into the same API, or that the technology becomes so democratized that we end up with more options than ever on the App Store.
Finally, OpenAI’s Realtime API could power robots. It sounds far-fetched, but having robots that can communicate in a more human way could be the next step in automation, or they could just diagnose errors themselves and tell you how to fix them.