Generative AI: Key Concepts


Beyond the hype the term receives today and the skepticism that still surrounds it, ChatGPT (and other GenAI LLMs) seems to have been a turning point in the history of technology. With very human-like capabilities and access through a simple chat website, it brought these functionalities closer to a large part of the population and gave rise to a great wave of innovation in the discipline.
That said, AI is not a panacea but one more tool in our toolkit as IT professionals, and it will only be as good as the algorithms and training we can give the models we use.
According to ISO/IEC 22989:2022, Artificial Intelligence (AI) is “… a technical and scientific field dedicated to producing systems that generate outputs such as content, forecasts, recommendations, or decisions for a set of human-defined objectives”.
We usually speak of AI when software shows capabilities similar or equal to those of humans; basically, it extends the ability of computers to simulate human intelligence. Some of the capabilities that AI can simulate include:
- Visual Perception
- Text and conversation analysis
- Ability to speak
- Decision making
AI is built on the concepts of Data Science and Machine Learning.
Data Science is a discipline that focuses on data processing and analysis. It seeks to find relationships and insights through the application of different statistical techniques, and for this purpose it uses predictive models.
Within Data Science, Machine Learning (ML) is the discipline responsible for training and validating predictive models: we have data, we train a model, and then we perform inference.
ML allows computers to learn from data, identify patterns, make decisions, and make predictions autonomously. This learning is achieved through different approaches and techniques such as supervised, unsupervised, or reinforcement learning.
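To make the train-then-infer flow concrete, here is a minimal supervised-learning sketch using scikit-learn. The tiny dataset (hours studied, hours slept → pass/fail) is invented purely for illustration; a real project would involve far more data, feature engineering, and validation.

```python
# Minimal supervised-learning sketch: train a model, then run inference.
# Assumes scikit-learn is installed; the toy data is made up for illustration.
from sklearn.linear_model import LogisticRegression

# Training data: feature vectors (hours studied, hours slept) and labels (0=fail, 1=pass).
X_train = [[1, 4], [2, 8], [6, 7], [8, 5], [9, 8]]
y_train = [0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X_train, y_train)      # training: the model learns from labeled data

print(model.predict([[7, 6]]))   # inference: predict the label for unseen data
```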
Generative Artificial Intelligence can be understood as an ML model designed to generate new data rather than make predictions about existing data. Basically, GenAI learns to generate data similar to the data it was trained on. In response to a user prompt, it can generate text, images, videos, or code. It is based on Deep Learning models. In general, Generative AI has three phases:
- Training
- Tuning (fine-tuning)
- Generation, evaluation, and re-tuning (a sketch of the generation loop follows this list)
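The generation phase is essentially an autoregressive loop: sample the next token, append it, repeat. The sketch below illustrates that idea with a crude bigram table standing in for a real trained network; the tiny corpus and the helper names are invented for this example.

```python
# Toy illustration of the "Generation" phase: repeatedly sample the next token.
# The bigram table below is a stand-in for a trained model, not a real network.
import random

corpus = "the cat sat on the mat the dog sat on the rug".split()

# "Training": count which word follows which (a crude stand-in for learning).
model: dict[str, list[str]] = {}
for current, nxt in zip(corpus, corpus[1:]):
    model.setdefault(current, []).append(nxt)

# "Generation": starting from a prompt, sample one token at a time.
token = "the"
output = [token]
for _ in range(6):
    token = random.choice(model.get(token, corpus))  # fall back to corpus if unseen
    output.append(token)

print(" ".join(output))  # e.g. "the cat sat on the rug the"
```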
Within ML we find the concept of Deep Learning, which uses multi-layered Deep Neural Networks (DNNs) to simulate the decision-making capacity of a human brain. These models are trained on large amounts of data processed through many layers of neurons; the more data and the larger the network (i.e., the more computation), the closer the model can get to human-like performance. The main difference from classic ML models is that they can process far more unstructured (unlabeled) data.
Neural Networks, also called Artificial or Simulated Neural Networks (ANN or SNN), are the foundation stone of Deep Learning. Their objective is to simulate the action of human neurons.
They are composed of nodes organized in layers. Nodes interact with each other, passing information along just as our neurons transmit electrical signals.
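The "layers of nodes" idea can be shown in a few lines of numpy: each layer takes the previous layer's signal, applies connection weights, and passes the result on. This is only a forward pass with random, untrained weights; the layer sizes are arbitrary.

```python
# Minimal sketch of a neural network: layers of nodes, each layer passing
# its transformed signal to the next. Weights are random, not trained.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One layer of nodes: weighted sum of inputs plus a non-linearity."""
    W = rng.normal(size=(x.shape[0], n_out))  # connection weights between nodes
    return np.maximum(0, W.T @ x)             # ReLU activation

x = np.array([0.5, -1.2, 3.0])   # input signal (3 features)
h = layer(x, 4)                  # hidden layer with 4 nodes
y = layer(h, 2)                  # output layer with 2 nodes
print(y)
```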
Many of today’s best-known solutions, such as the Large Language Models (LLMs) behind ChatGPT and Gemini, image-generation models such as DALL-E, and other predictive models, use Neural Networks in a large part of their architecture.
Transformers are a type of Neural Network architecture that has become a fundamental concept in several AI disciplines. A Transformer is a mechanism that receives an input sequence and transforms it into an output sequence, taking into account the context and the relationships between the elements of the sequence.
This concept was revolutionary in NLP, as it fundamentally changed the way context and the relationships between words are processed to predict the next one, and it made it possible to handle longer-range dependencies.
Traditionally, sequence-processing Neural Networks (NN) use an Encoder/Decoder architecture. The Encoder reads and processes the input data (such as a sentence in Spanish) and transforms it into a mathematical representation. The Decoder takes this mathematical representation and generates the output data step by step (for example, the same sentence translated into English). This processing happens sequentially.
Transformers modify this process by incorporating the Self-Attention mechanism, which changes the order in which data is processed: it allows the model to process different parts of the sequence in parallel and to determine which parts are the most important.
It was one of the most important breakthroughs that allowed the construction of GPT models.
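Here is a minimal numpy sketch of the scaled dot-product self-attention described above: every position attends to every other position in parallel, and the attention weights say which parts of the sequence matter most. Real Transformers add learned query/key/value projections and multiple attention heads, which are omitted here; the input values are random stand-ins for token representations.

```python
# Minimal self-attention sketch (single head, no learned projections).
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # similarity of each pair of positions
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X                              # each output mixes the whole sequence

X = np.random.default_rng(1).normal(size=(4, 8))    # sequence of 4 tokens, dimension 8
print(self_attention(X).shape)                      # (4, 8)
```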
In practice, implementations of the Transformer architecture vary. For example, the Bidirectional Encoder Representations from Transformers (BERT) model developed by Google, used in its search engine, only uses the Encoder block, while the Generative Pre-trained Transformer (GPT) models used by OpenAI only use the Decoder block.
Large Language Models (LLMs) are a specialized type of ML model that can perform NLP tasks such as:
- Determine sentiment or classify natural language text
- Summarize text
- Compare text
- Generate new natural language
Basically, they are very large Deep Learning models trained on a lot of data. Transformers are the underlying technology: their Encoder, Decoder, and Self-Attention components allow parallel processing, which in turn allows GPUs to be used for training. These language models can be trained in an unsupervised (more precisely, self-supervised) manner, i.e., without manually labeled data.
Generative AI models such as GPT are LLMs, and they rely on two fundamental factors:
- A lot of data (practically the entire internet)
- A lot of computing power
These large LLMs stopped representing each word with a single number and overcame this limitation by using multi-dimensional vectors (commonly referred to as “word embeddings”), in which words with similar contextual meanings sit close to each other in vector space.
Using word embeddings, Transformers can pre-process text as numerical representations through the Encoder and understand the context of words and phrases with similar meanings. LLMs can then apply this knowledge of the language through the Decoder to produce the output.
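Mechanically, an embedding is just a row in a learned matrix: each token ID indexes one row, turning discrete tokens into the multi-dimensional vectors the model works with. The sketch below uses invented toy sizes and random values; in a real LLM the embedding matrix is learned during training and is far larger.

```python
# Sketch of word embeddings: each token ID indexes a row in an embedding matrix.
import numpy as np

vocab_size, dim = 6, 4                    # toy sizes; real LLMs are far larger
embeddings = np.random.default_rng(2).normal(size=(vocab_size, dim))

token_ids = [1, 2, 3]                     # e.g. "the cat sat" after tokenization
vectors = embeddings[token_ids]           # look up one vector per token
print(vectors.shape)                      # (3, 4)
```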
Natural Language Processing (NLP) encompasses the creation of software that understands language in both spoken and written form.
Tokenization is a fundamental process in NLP and in the operation of the Transformer model in GenAI.
It is the process of dividing a continuous text into smaller units: tokens. These tokens can be words, subwords, characters, or even smaller units, depending on the tokenization scheme used.
Basically, this is how the model can better “understand” the input text: Transformer models like GPT can’t process text directly, so they need it converted into a sequence of numbers represented by these tokens. You can think of tokenization as a “bridge” between human language and the numerical representation the AI uses to learn and generate text.
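A toy word-level tokenizer makes the "bridge" idea concrete. The vocabulary and scheme here are invented for illustration; real models use subword schemes such as BPE with vocabularies of tens of thousands of tokens, and the resulting IDs are then mapped to embedding vectors.

```python
# Toy illustration of tokenization: text -> token IDs and back.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}
ids = {v: k for k, v in vocab.items()}

def encode(text: str) -> list[int]:
    """Text -> token IDs: the 'bridge' between language and numbers."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def decode(tokens: list[int]) -> str:
    """Token IDs -> text."""
    return " ".join(ids[t] for t in tokens)

print(encode("The cat sat on the mat"))   # [1, 2, 3, 4, 1, 5]
print(decode([1, 5]))                     # "the mat"
```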
These numerical representations into which the input is converted are Vectors called Embeddings.
Vectors are multi-valued numerical representations of information: they describe a direction and a magnitude in relation to multiple axes. They help build relationships between tokens; for language tokens, each value of the vector represents a semantic attribute of the token.
In the embedding space, each element of a token’s vector represents some semantic attribute of the token. In this sense, tokens that are semantically similar should result in vectors with a similar orientation.
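"Similar orientation" is usually measured with cosine similarity, which compares vector directions regardless of length. The 4-dimensional embeddings below are made up for illustration; real models use hundreds or thousands of dimensions.

```python
# Sketch of "similar orientation" in embedding space via cosine similarity.
import numpy as np

def cosine_similarity(a, b):
    """1.0 = same orientation, 0.0 = unrelated, -1.0 = opposite."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

dog = np.array([0.8, 0.1, 0.9, 0.2])      # invented toy embeddings
puppy = np.array([0.7, 0.2, 0.8, 0.3])
car = np.array([0.1, 0.9, 0.1, 0.8])

print(cosine_similarity(dog, puppy))  # high: semantically similar tokens
print(cosine_similarity(dog, car))    # lower: semantically distant tokens
```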
Some of the key players driving GenAI today:
- OpenAI: Leader in the GenAI race with advanced models such as GPT-4o and o1/o3 (reasoning models); Sora is its video generation model.
- Google DeepMind: Drives the development of multimodal models such as Gemini 2.0; it has also shared details of its Veo 2 video generation model.
- Microsoft: Integrates OpenAI models into Azure, offering generative AI services thanks to the partnership with OpenAI.
- Meta (Facebook): Promotes open innovation with models such as LLaMA and BlenderBot.
- Anthropic: Creator of models such as Claude 3, including the Sonnet models, which are available on AWS.
- Mistral AI: An emerging organization that launched Mistral 7B in 2023 as an open-source model that surpassed GPT-3.5.
- Stability AI: Democratizes access to generative AI through open-source tools such as Stable Diffusion.
Cloud providers offer managed services, such as Azure OpenAI Service, Amazon Bedrock, and Google Vertex AI, that allow us to implement GenAI solutions.
