Different types of LLM models

Hello Coders! 👾

In today’s post, I hope to demystify Large Language Models and break down their functionality a bit without any confusing terminology (I hope) since it got me confused for a while. And I still have some questions I want to get answered in this post.

Imagine that instead of spelling out words letter by letter, like “C-O-D-E” for “code,” LLMs use extensive lists of numbers to capture the essence of words. So, “code” could be represented by a sequence such as [0.0074, 0.0030, -0.0105, …]. These number sequences, known as word vectors, are particularly adept at conveying the nuanced relationships between words. And, because of these vectors they “know” what words are related, like king and queen for example.

The real powerhouse behind LLMs, such as ChatGPT, is a structure called the transformer. This framework sifts through the input text to produce outputs that carry meaning. Transformers are equipped with something called attention mechanisms which allow them to grasp the context and interplay between words in a sentence.

The training of these models is fascinating—they employ a method known as transfer learning. Here’s how it works: a model that’s already been trained, boasting billions of parameters, absorbs the intricate ties that bind words together from a wealth of text data. Then, this model is fine-tuned to carry out specific tasks, like engaging in conversation or summarizing texts.

One interesting thing about these big language models is they can guess what word comes next in a sentence by looking at the words before. But, even though they are very good at guessing, almost nobody really knows the exact way they do this — it’s still kind of a mystery.

The performance of LLMs is directly related to the amount of data they are trained on. They require vast amounts of text, billions of words, to function effectively. In essence, LLMs are sophisticated models that interpret text data and leverage both word vectors and transformers to craft responses that come across as remarkably human. While we may not fully understand the complexities of their inner workings, their contribution to the advancement of natural language understanding is nothing short of a revolution.

I personally don’t like to think of this as “intelligence” as it doesn’t possess consciousness or self-awareness. Maybe something called narrow intelligence at most. Since, in the end, it’s all mathematical. Magic mathematics, but still mathematics.

What’s in a name

When you encounter names like SorskootLLM-70b-instruct in the realm of AI, they might seem like cryptic code at first glance. Let’s decipher this and get a clearer picture.

The 70b here is shorthand for the colossal number of parameters the model is built upon — 70 billion, to be exact, with b denoting billion. Think of parameters as the knowledge nuggets the model picks up during training, which ultimately shape its abilities. The more parameters there are, the smarter and more knowledgeable the model becomes. However, it’s worth noting that this increase in brainpower comes with the need for more data to learn from and more computing muscle to process that information. Thus, a higher number of parameters doesn’t always have to be the better choice, they often cost more to run and are slower in giving responses.

Moving on to instruct, this part tells us that the model doesn’t just process data — it’s been specially fine-tuned to take commands, in this case. This means that it’s undergone additional training with a set of guidelines, making it adept at interpreting and executing tasks according to the instructions it receives.

In essence, SorskootLLM-70b-instruct is like a virtual maestro of a model, boasting a vast 70 billion parameters, all fine-tuned to act on your directions with precision. It’s an AI that doesn’t just seem to understand the data; it looks like it understands you.

Different fine-tuning objectives sculpt AI models for varied tasks. When a model is honed with an instruct objective, it becomes a specialist in following orders. On the conversational front, we have models trained with a chat objective. These models are fine-tuned to produce relevant and engaging replies in the give-and-take of human conversation. They’re the backbone of chatbots and social interaction tools that can handle everything from user queries to jokes and casual chitchat.

Diving into more technical territories, models with a python or code fine-tuning objective are the coding whizzes, specifically trained to work with code. Whether it’s writing new code, completing lines, or squashing bugs.

When to use what model?

When looking at places where you can see different models, like Azure AI Studio, or Huggingface you can come across different versions of the same model. For example, say I have 3 versions of the same model SorskootLLM-v2-13b, SorskootLLM-v2-13b-instruct and SorskootLLM-v2-13b-chat. What would be the use-cases for each of these?

Let’s explore the use cases for each of these fictional fine-tuned models:

SorskootLLM-v2-13b

This model is a general-purpose language model with a capacity of approximately 13 billion parameters.

Use Cases:

Summarization: It can summarize long texts, extract key information, and provide concise summaries.
Text Generation: Useful for creative writing, content generation, and storytelling.
Question Answering: Can answer factual questions based on context.
Language Understanding: Understands context and generates coherent responses.

SorskootLLM-v2-13b-instruct:

This model is optimized for classification, extraction, and summarization tasks.

Use Cases:

Information Extraction: Extract relevant details from documents, articles, or other text.
Sentiment Analysis: Detect sentiment (positive, negative, neutral) from text.
Named Entity Recognition (NER): Identify mentions of people, organizations, locations, etc.
Summarization: Generate concise summaries of longer inputs.

SorskootLLM-v2-13b-chat:

This model is designed for dialogue use cases and works well with virtual agents and chat applications.

Use Cases:

Chatbots and Virtual Agents: Engage in natural conversations with users.
Customer Support: Provide automated responses to user queries.
Interactive Conversations: Respond contextually in back-and-forth interactions.
Social Media Chat: Generate chat-like responses for social media platforms. Remember that the choice of model depends on your specific task and requirements. Each of these models has its strengths, and you can select the one that best aligns with your use case!

Wrap up

With this, I hope I shed some light on the confusing world LLMs. In a future blog post, I want to take a look at what it takes to fine-tune a model yourself. If you’ve got any thoughts or questions, let me know down below. And, as always, make sure to follow me and subscribe to my YouTube Channel for updates on that side.

Happy Coding! 🚀