Mental model for generative AI

I read a couple of interesting articles recently that helped me clarify, at a very high level, how the recent generative AI tools work. I think I have a good mental model now.

What are they good for?

In short: to help you communicate better.

I see a lot of people worried that this is going to be the end of software engineering. I don't think that's going to be the case, at least for a while. The article I was reading talks about how to use tools like ChatGPT to augment your cognition, particularly in the context of improving your communication.

In the article, the author uses an example I find useful. He describes a group of people discussing a complex problem for an hour and how, the next day, one of them asks the AI tool to generate a drawing capturing the essence of the discussion and shares it with the others.

The author then touches on what I think is the most important point of the article. He talks about how this new set of AI tools will help us articulate our thoughts more clearly, using better prose, better analogies and great technical drawings.

I think having a "sidekick" that helps you clarify your thoughts will make us better communicators. That, in turn, will help us share our ideas more effectively.

What is generative AI anyway?

The generative part comes from the fact that the output of these algorithms is something new, something that did not exist before. The other big category of algorithms is discriminative AI, which draws distinctions between different kinds of inputs. Discriminative algorithms answer questions like: is this a rabbit or a lion? Generative AI algorithms, on the other hand, respond to requests like: draw me a picture of a lion chasing a rabbit.
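To make the distinction concrete, here is a deliberately silly sketch. The functions, labels and data are all made up for illustration; a real model would learn these mappings from data instead of hard-coding them:

```python
import random

# Discriminative: maps an input to a label. Answers questions
# like "is this a rabbit or a lion?". The rule is hard-coded
# here purely for illustration.
def discriminate(animal: dict) -> str:
    return "lion" if animal["weight_kg"] > 50 else "rabbit"

# Generative: produces something new by sampling from a
# distribution (faked here with random.choice). Answers requests
# like "draw me a picture of a lion chasing a rabbit".
def generate(prompt: str) -> str:
    scenes = ["a lion chasing a rabbit", "a rabbit fleeing a lion"]
    return f"<new image of {random.choice(scenes)}>"

print(discriminate({"weight_kg": 190}))   # -> lion
print(generate("lion chasing a rabbit"))  # -> something newly sampled
```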

Another important term is "model". A model is an algorithm (a mathematical representation) that attempts to simulate (or model) some aspect of the real world. It does so using a subset of information about that aspect: the training data. These datasets can be huge.
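As a toy illustration of what "a mathematical representation that simulates some aspect of the real world" means, here is a minimal model fitted to made-up data (the numbers are invented for the example; real models do the same thing at a vastly larger scale):

```python
import numpy as np

# Made-up training data: house sizes and their prices.
sizes = np.array([50, 80, 100, 120, 150])     # m^2
prices = np.array([150, 230, 290, 350, 430])  # k$

# "Training": fit the model y = a*x + b by least squares.
a, b = np.polyfit(sizes, prices, deg=1)

# Use the model to simulate an aspect of the world it never saw.
print(f"predicted price for 110 m^2: {a * 110 + b:.0f}k$")
```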

How does it work?

At a very high level, humans feed huge amounts of data (visual or textual) to machine learning algorithms, which determine what things are likely to appear near other things. When you ask a question, the tools respond with something that falls within the realm of probability based on the data they were trained on. The auto-suggestions you get from Gmail are a less advanced form of this.
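Here is a stripped-down sketch of "what things are likely to appear near other things": a bigram model that counts which word follows which and samples accordingly. ChatGPT and Gmail's suggestions are enormously more sophisticated versions of this same predict-what-comes-next idea (the corpus here is a single toy sentence):

```python
import random
from collections import Counter, defaultdict

# Tiny "training data": count which word tends to follow which.
corpus = "the lion chases the rabbit and the rabbit runs from the lion".split()

follows = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    follows[word][nxt] += 1

def next_word(word: str) -> str:
    # Sample a continuation weighted by how often it was observed.
    counts = follows[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# "Generate" text that falls within the realm of probability.
text = ["the"]
for _ in range(6):
    text.append(next_word(text[-1]))
print(" ".join(text))
```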

I think our minds are not capable of comprehending how much one of these algorithms can "learn" from the huge amounts of data we feed them. This data is probably also crawled from the Internet to maximize the amount available to the algorithms.

There are a few techniques within the training step. The first one is the transformer (the T in ChatGPT). A transformer derives meaning from sequences of text, learning how words or semantic components relate to each other and how likely they are to appear close to each other. These transformers run unsupervised, meaning the algorithm evolves as it keeps running against the dataset.
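A rough numerical sketch of the transformer's core operation, self-attention, follows. Random vectors stand in for learned embeddings and projections, so all the numbers are placeholders; the point is the shape of the computation, where each word scores its relevance to every other word:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                  # toy embedding size
words = ["the", "lion", "chases", "the", "rabbit"]
X = rng.normal(size=(len(words), d))   # stand-in word embeddings

# Query/key/value projections are learned in a real model; random here.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# How much each word attends to every other word (rows sum to 1).
scores = softmax(Q @ K.T / np.sqrt(d))
output = scores @ V                    # meanings mixed by attention weights

print(scores.round(2))
```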

Another technique is to set two competing algorithms to work against each other: one is a generative AI (GAI) algorithm, the other a discriminative AI (DAI) one. The DAI algorithm is trained to determine whether the output of the GAI algorithm is AI-generated or not. The algorithms keep running, and the GAI algorithm adjusts its parameters until it defeats the other algorithm. At that point the model is further tuned by humans.
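This pairing is known as a generative adversarial network (GAN). Here is a heavily simplified sketch of the dynamic, with a one-parameter "generator" and a simple statistical test standing in for the discriminator (real GANs use neural networks trained with gradients; every number below is made up):

```python
import random
import statistics

# "Real" data the generator must learn to imitate.
real_data = [random.gauss(5.0, 1.0) for _ in range(1000)]
real_mu = statistics.mean(real_data)
real_sigma = statistics.stdev(real_data)

def discriminator(sample: float) -> bool:
    # Accept as "real" if the sample looks like the real data.
    return abs(sample - real_mu) < 2 * real_sigma

def fooled_rate(gen_mu: float) -> float:
    # Fraction of the generator's fakes the discriminator accepts.
    fakes = [random.gauss(gen_mu, 1.0) for _ in range(200)]
    return sum(discriminator(f) for f in fakes) / len(fakes)

gen_mu = 0.0  # the generator starts far from the real distribution
for step in range(100):
    if fooled_rate(gen_mu) > 0.95:
        break  # the generator has "defeated" the discriminator
    # Nudge the parameter in whichever direction fools the
    # discriminator more often (a crude stand-in for gradients).
    up, down = fooled_rate(gen_mu + 0.2), fooled_rate(gen_mu - 0.2)
    gen_mu += 0.2 if up >= down else -0.2

print(f"after {step} steps: generator mean = {gen_mu:.1f} (real mean is ~5)")
```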

I'll leave you with a comment from Chris Phipps, who, when asked whether ChatGPT is a thinking machine, said: "no, it is just a very good prediction machine".