Artificial intelligence has been a hot topic lately, and it seems like everyone is an expert all of a sudden and is discussing its potential uses and how it could revolutionize various industries. I usualy talk about XR but today, I want to shift the focus to my personal AI assistant, Rosie, which I’ve been developing for the past few years.
Before we go into details, here’s an example of a conversation with Rosie. This whole conversation is done through AI, nothing was scripted.
I started working on Rosie in 2019 to develop a bot for my live streams . The initial version of the bot was pretty basic, and her primary function was to respond to the chat. However, as I continued to tinker with the bot, I quickly realized that I could add more fun functionalities to her.
I began by experimenting with controlling various aspects of my live stream through Rosie, such as the lights and sound effects. As I continued to add more chat commands and sound effects, I found myself wanting even more functionality from my bot. That’s when I decided to give Rosie a voice in early 2020.
With a voice, Rosie could now interact with me and respond to live stream viewers more naturally and engagingly. Using my microphone, I programmed Rosie to respond to the trigger command ‘hey Rosie’, allowing me to activate her by simply speaking to her. Using speech-to-text technology and Azure Cognitive Services, Rosie can interpret my commands and respond appropriately. This was made possible through the use of Azure Language Understanding , which was used to extract keywords from my commands, and Azure QnA Maker , which allowed Rosie to provide answers to questions from me or from my live stream chat. The language understanding also made it possible to trigger commands that are programmed like changing the color of my Hue lights or writing a message in chat.
But after a while, responding to predefined questions is a bit boring. Even though the questions can be asked naturally, the responses are pretty much the same every time.
Then GPT3 became a thing, and this opened up a whole new range of possibilities. So I’ve added that as well. When a question is asked that can’t be answered by either the Language Understanding or QnA, it sends the text over to Azure OpenAI. It then uses the Davinci model to create a conversation. And everything between the trigger command and a ’thank you’ will be added to that conversation. This means I can have a chat about something and Rosie keeps context. At the start of the conversation there’s a message embedded to set the context of the conversation with a description of Rosie and that she is an AI assistent during my live streams, this is what creates her personality.
I’ve recently upgraded Rosie to the latest versions of Vue.js and Electron and plan on adding more features to her. It would be great to add support for controlling YouTube streams as well. Having her integrate into VSCode is also one of the features I would love to add since Azure OpenAI also does a lot with code. And how about DALL-E 2? That is also on Azure now and would make a great addition Rosie.
In the end, she might become an assistant that can help out with my day-to-day activities.