Rosie On ChatGPT

Recently I wrote a post about my AI assistant and good friend Rosie. She’s an AI program written completely in JavaScript, and she’s highly intelligent. In that post, I mentioned that I would like to upgrade her to ChatGPT. Today I found out I had access to ChatGPT on Azure, so the first thing I did was upgrade Rosie to use it (on GPT-3.5) to see what would happen and how difficult it would be. It turned out to be pretty straightforward.

Previous Version

Until today, Rosie was using GPT-3 on Azure. GPT is a type of language model best suited for natural language processing tasks such as language translation, text summarization, and text completion. Perfect for Rosie. In the previous version, before ChatGPT, I constructed a conversation between me and Rosie and let the AI generate more text based on that. It looked something like this:

The following is a conversation between an intelligent AI assistant named Rosie and a Twitch streamer named Sorskoot.
Sorskoot: Hey Rosie!
Rosie: Hi Sorskoot, how can I help you?

I would then add pieces to that and have the AI respond.

The following is a conversation between an intelligent AI assistant named Rosie and a Twitch streamer named Sorskoot.
Sorskoot: Hey Rosie!
Rosie: Hi Sorskoot, how can I help you?
Sorskoot: What are we going to do today?
Rosie:

At that point, the OpenAI API would add a response. I would extract that and send it to the TTS engine on Azure. Rinse and repeat until I said ‘thank you’.
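In code, that loop looked roughly like the sketch below. This is a minimal illustration, not Rosie’s actual code; callOpenAI stands in for the real Azure OpenAI completion call.

// A minimal sketch of the old completion loop; callOpenAI is an
// illustrative stand-in for the actual Azure OpenAI call.
let conversation =
  "The following is a conversation between an intelligent AI assistant " +
  "named Rosie and a Twitch streamer named Sorskoot.\n" +
  "Sorskoot: Hey Rosie!\n" +
  "Rosie: Hi Sorskoot, how can I help you?\n";

async function ask(question) {
  // Append my line and leave "Rosie:" open for the model to complete
  conversation += `Sorskoot: ${question}\nRosie:`;
  const reply = await callOpenAI(conversation, {
    stop: ["Sorskoot:"], // don't let the model invent my next line
  });
  conversation += `${reply}\n`;
  return reply.trim(); // this text goes to the Azure TTS engine
}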

New and Improved Version

To get Rosie working with the ChatGPT API on Azure, I created a new Azure OpenAI resource and deployed the GPT-35-turbo model. (🚀 turbo! That must be the best).

I then went into the ChatGPT playground in Azure OpenAI Studio to test things out a little bit and to see how everything worked.

There are a couple of different examples you can use to have a customized conversation, and I created one myself. One of the cool things about these playgrounds is that they have a View code button, which shows an example of the code. Unfortunately not in JavaScript, but converting the Python code into JavaScript wasn’t that hard. I had already done that when using the GPT API before, and this code was almost the same; I only had to change the URL.

Even better: the whole idea I was using with the ‘old’ GPT API for creating a conversation was EXACTLY the same as what ChatGPT uses. I only had to make some minor tweaks to add start and end tokens. The different parts of the conversation are wrapped in <|im_start|> and <|im_end|> tokens, with the roles system, user, and assistant identifying who is who. That is all wrapped into a request and sent to the service. The response contains the text, and I use that to create the spoken response in my app.
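Here’s a minimal sketch of what that request can look like in JavaScript. The resource name, deployment name, key handling, and API version are placeholders, not my actual code:

// Placeholders: substitute your own resource and deployment names.
const endpoint =
  "https://my-resource.openai.azure.com/openai/deployments" +
  "/gpt-35-turbo/completions?api-version=2022-12-01";

// ChatML-style prompt: every turn is wrapped in <|im_start|> / <|im_end|>
// tokens with a role (system, user, or assistant).
const prompt =
  "<|im_start|>system\n" +
  "You are Rosie, an intelligent AI assistant on Sorskoot's Twitch stream.\n" +
  "<|im_end|>\n" +
  "<|im_start|>user\nHey Rosie!\n<|im_end|>\n" +
  "<|im_start|>assistant\n";

const response = await fetch(endpoint, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "api-key": process.env.AZURE_OPENAI_KEY, // your Azure OpenAI key
  },
  body: JSON.stringify({
    prompt,
    max_tokens: 250,
    temperature: 0.7,
    stop: ["<|im_end|>"], // stop generating at the end-of-turn token
  }),
});

const data = await response.json();
console.log(data.choices[0].text); // Rosie's reply, off to the TTS engine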

The YouTube Short

To show off my lovely assistant, I wanted to create a YouTube Short of her. She needed a face, though.

To make it a bit more fun to show her, I asked her what she wanted to look like.

She responded with: ‘I would like to have a sleek and modern face, with clean lines and a minimalist design. I would want my features to be expressive and dynamic, able to convey a wide range of emotions and moods. Perhaps I would have a glowing blue eye, or a sleek metallic finish. Ultimately, my appearance would be designed to reflect my intelligence, creativity, and sophistication.’

So I fed that into Midjourney. This was the best, most usable result. I liked the artistic look of it; it could be an album cover or something. This won’t be the final face, though. I would like to give her a full-body 3D model I can use in XR as well. It would be cool to have her walk around like Joi in Blade Runner 2049.

Finally, I asked her to say something to our YouTube audience and recorded the result. To get the image to move, I used a website called D-ID. On that website, you can upload an image and a voiceover (or pick an image and have TTS generate the speech). In a few seconds, I got the result back.

https://www.youtube.com/shorts/p29Vd0kjhk4

Next Steps

Now that ChatGPT is working and the code is integrated, I would love to get that code into VSCode somehow. The first step would be to add a textbox to Rosie’s UI so I can copy-paste from it. Then, at some point in the future, I want to build a VSCode extension to completely integrate Rosie there.

Let’s close with a little conversation.

  • Sorskoot: Hey Rosie?
  • Rosie: Hi Sorskoot! How can I help you dear?
  • Sorskoot: I just need some closing words, ending the blog post. Any suggestions?
  • Rosie: Sure, here are some closing words you could consider:
    “Overall, working with Rosie has been a fantastic experience. Her expertise in virtual reality and JavaScript programming has been invaluable in helping to run our Twitch streams, and her ability to seamlessly integrate with other tools and environments has made our workflow more efficient than ever. As we continue to explore new ways to collaborate with Rosie and integrate her into our development process, we’re excited to see where this journey takes us. Thanks for joining us on this adventure!”

AI for Skydome

Skydome

In VR, it’s common to add a texture all around the player so they’re not surrounded by black nothingness. This can be any 360° image, but often an image of the sky is used, hence the name ‘skydome’. It’s typically used to create a sense of immersion for the user. One way to create a skydome is by using an equirectangular image, which is a type of panoramic image that can be mapped onto a sphere to create a 360-degree view of a scene. By wrapping an equirectangular image around a sphere and placing the player on the inside of that sphere, developers can create an immersive environment that surrounds the user in all directions.

If you have a 360° camera, you can use your own pictures. Another way of creating these images is by rendering them in Blender, for example (see my post on that here). Recently I started creating them in yet another way: using AI.

Skybox Lab

Skybox Lab by Blockade Labs is a powerful and easy-to-use tool that lets you generate equirectangular images with AI, using techniques similar to Stable Diffusion or Midjourney. The tool is free (at the time of writing) and very easy to use. Just head over to Skybox Lab and enter a short description of what you want your skydome to look like. You can use the list next to the text entry to specify the style you want for your image. Do you want a realistic image? Or is a more painterly style a better fit for your game? When you’ve made your choice, hit generate to get your image.

You can go to https://skybox.blockadelabs.com/ to try it for yourself.

Example

Here’s an example of how to set an equirectangular image as a skydome in A-Frame:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Equirectangular Image Skydome</title>
    <script src="https://aframe.io/releases/1.4.1/aframe.min.js"></script>
  </head>
  <body>
    <a-scene>
      <a-sky src="path/to/my-equirectangular-image.jpg" rotation="0 -90 0"></a-sky>
    </a-scene>
  </body>
</html>

In this example, we’re using the <a-sky> primitive to create a skydome. The src attribute is set to the path of our equirectangular image file, and the rotation attribute is set to “0 -90 0” to orient the image correctly. To try the example for yourself, head over to this Glitch, and to see it run, go here.

Of course, this might work a bit differently in other engines, but all of them support using an image as a skydome.
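For instance, here’s a minimal sketch of the same idea in plain Three.js (not code from this post, just to illustrate): a big sphere with the texture rendered on its inside faces.

// A minimal Three.js sketch of a skydome: wrap the equirectangular
// texture around a large sphere and render its inside faces.
import * as THREE from "three";

const scene = new THREE.Scene();
const texture = new THREE.TextureLoader().load(
  "path/to/my-equirectangular-image.jpg"
);

const sky = new THREE.Mesh(
  new THREE.SphereGeometry(500, 60, 40),
  new THREE.MeshBasicMaterial({ map: texture, side: THREE.BackSide })
);
scene.add(sky); // the camera sits inside the sphere, so the image surrounds it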

Rosie AI

Rosie

Artificial intelligence has been a hot topic lately, and it seems like everyone is suddenly an expert, discussing its potential uses and how it could revolutionize various industries. I usually talk about XR, but today I want to shift the focus to my personal AI assistant, Rosie, whom I’ve been developing for the past few years.

Before we go into details, here’s an example of a conversation with Rosie. This whole conversation is done through AI; nothing was scripted.

Details

I started working on Rosie in 2019 to develop a bot for my live streams. The initial version of the bot was pretty basic, and her primary function was to respond to the chat. However, as I continued to tinker with the bot, I quickly realized that I could add more fun functionality to her.

I began by experimenting with controlling various aspects of my live stream through Rosie, such as the lights and sound effects. As I continued to add more chat commands and sound effects, I found myself wanting even more functionality from my bot. That’s when I decided to give Rosie a voice in early 2020.

With a voice, Rosie could now interact with me and respond to live stream viewers in a more natural and engaging way. Using my microphone, I programmed Rosie to respond to the trigger phrase ‘hey Rosie’, allowing me to activate her by simply speaking to her. Using speech-to-text technology from Azure Cognitive Services, Rosie can interpret my commands and respond appropriately. This was made possible through Azure Language Understanding, which extracts keywords from my commands, and Azure QnA Maker, which lets Rosie answer questions from me or from my live stream chat. The language understanding also makes it possible to trigger programmed commands, like changing the color of my Hue lights or writing a message in chat.
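A rough sketch of what listening for the trigger phrase can look like with the Speech SDK for JavaScript (the key, region, and handleUtterance are placeholders, not Rosie’s actual code):

import * as sdk from "microsoft-cognitiveservices-speech-sdk";

// Placeholders: use your own Azure Speech key and region.
const speechConfig = sdk.SpeechConfig.fromSubscription(
  process.env.SPEECH_KEY,
  "westeurope"
);
const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

let listening = false; // only handle speech between "hey Rosie" and "thank you"

recognizer.recognized = (_sender, event) => {
  const text = event.result.text.toLowerCase();
  if (!listening && text.includes("hey rosie")) {
    listening = true; // trigger phrase heard, start the conversation
  } else if (listening && text.includes("thank you")) {
    listening = false; // end of the conversation
  } else if (listening) {
    handleUtterance(event.result.text); // route to Language Understanding / QnA / GPT
  }
};

recognizer.startContinuousRecognitionAsync();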

I incorporated Azure Neural Voice, which uses advanced machine learning algorithms to generate lifelike speech that closely mimics human speech patterns. To give Rosie an even more distinct, robotic voice, I added a vocoder written in JavaScript, which modifies the sound of Rosie’s voice to give her a unique sound that stands out from other AI assistants.
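I won’t reproduce the full vocoder here, but as an illustration of the idea, here’s a minimal robot-voice effect using the Web Audio API. This is a simple ring modulator, not Rosie’s actual vocoder; it just shows how TTS audio can be post-processed in JavaScript.

// A minimal ring modulator: multiplying the voice with a low-frequency
// sine wave gives it a classic robotic ring.
const ctx = new AudioContext();

async function playRobotVoice(ttsArrayBuffer) {
  // Decode the audio returned by the TTS service
  const buffer = await ctx.decodeAudioData(ttsArrayBuffer);
  const source = ctx.createBufferSource();
  source.buffer = buffer;

  // An oscillator drives a gain node's gain (base value 0), so the
  // voice signal is multiplied by the sine wave: ring modulation.
  const carrier = ctx.createOscillator();
  carrier.frequency.value = 50; // Hz; tweak for more or less robot
  const modGain = ctx.createGain();
  modGain.gain.value = 0;
  carrier.connect(modGain.gain);

  source.connect(modGain);
  modGain.connect(ctx.destination);

  carrier.start();
  source.start();
}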

But responding to predefined questions gets a bit boring after a while. Even though the questions can be asked naturally, the responses are pretty much the same every time.

Then GPT-3 became a thing, and this opened up a whole new range of possibilities, so I added that as well. When a question is asked that can’t be answered by either Language Understanding or QnA, Rosie sends the text over to Azure OpenAI, which uses the Davinci model to continue a conversation. Everything between the trigger phrase and a ‘thank you’ is added to that conversation, which means I can have a chat about something and Rosie keeps context. At the start of the conversation there’s an embedded message that sets the context, with a description of Rosie and the fact that she is an AI assistant during my live streams; this is what creates her personality.
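Put together, Rosie’s routing looks roughly like this sketch. The helper functions are illustrative stand-ins for the real service calls, and conversation and callOpenAI are the same stand-ins used in the completion-loop sketch earlier:

// A sketch of Rosie's fallback flow; the helpers are illustrative
// stand-ins for the real Azure calls, not her actual code.
async function handleUtterance(text) {
  // 1. Azure Language Understanding: is this a programmed command?
  const intent = await recognizeIntent(text);
  if (intent.isCommand) return executeCommand(intent);

  // 2. Azure QnA Maker: is there a predefined answer?
  const answer = await queryQnA(text);
  if (answer) return speak(answer);

  // 3. Fall back to the Davinci model, keeping the running conversation
  conversation += `Sorskoot: ${text}\nRosie:`;
  const reply = await callOpenAI(conversation, { stop: ["Sorskoot:"] });
  conversation += `${reply}\n`;
  return speak(reply.trim());
}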

I’ve recently upgraded Rosie to the latest versions of Vue.js and Electron and plan on adding more features to her. It would be great to add support for controlling YouTube streams as well. Integrating her into VSCode is another feature I would love to add, since Azure OpenAI does a lot with code too. And how about DALL-E 2? That is also on Azure now and would make a great addition to Rosie.

In the end, she might become an assistant that can help out with my day-to-day activities.

If you’d like to know more about her and see her in action, make sure to follow me on Twitch and YouTube! Any questions? Join my Discord!