
To understand why AI is transformative, you have to stop thinking about the use-cases you can see and recognize that underneath there is a fundamental way of representing a “problem domain” which opens up hundreds of specific use-cases with each incremental advance of the underlying model.
As a rough analogy, think about the smartphone. It is a computer with internet and telecommunication, an accelerometer, a camera, and GPS. These components can be combined for many use-cases: Pokémon GO, finding your friends, Snapchat, counting steps, reminding you to call mom on Sundays, and so on. All from the same basic components. AI is no different: the same fundamental architectures can serve many different use-cases, and as the field progresses it opens up more use-cases than we can dream of. Let me give some examples that might help this kind of thinking.
Bonus: if you read to the end, you’ll even get to understand the LEGO analogy.
LLMs
Most people have heard about ChatGPT, some about Claude, even fewer about LLaMA. To many, these are simply really useful chatbots, capable of answering all sorts of homework questions, crafting recipes from what you have left in the fridge, or suggesting really buggy code. From these examples it can be hard to see AI as a transformational tool. But it really is, once you start to understand what happens under the hood.
These chatbots are built on LLMs, large language models, which essentially work by predicting the next word in a sequence. Given the sentence “a cat in a ___”, the model has an internal representation of the most likely next word and would likely suggest “hat”. Essentially, this is predicting the next discrete event.
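To make “predicting the next discrete event” concrete, here is a toy sketch of my own (not how an LLM is actually trained): it predicts the next word purely from bigram counts over a tiny made-up corpus.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which in a toy corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_word):
    """Return the most frequently observed next word, or None."""
    followers = counts.get(prev_word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = [
    "the cat sat in a hat",
    "my cat slept in a hat",
    "the dog hid in a box",
]
model = train_bigram(corpus)
print(predict_next(model, "a"))  # hat
```

A real LLM replaces the count table with a neural network and conditions on the whole preceding sequence rather than a single word, but the interface is the same: sequence in, most likely next token out.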
Now start to think about all the problems that can be modelled as predicting the next discrete event. We have built a formulation AI using these principles, predicting the next likely ingredient in the ingredient list of a skincare product. In drug discovery, you can model small molecules as the next “character” in a SMILES string. In trading, you deal with time series data and likewise want to predict the next event. There are many more examples: music generation, protein structure prediction, energy consumption patterns, production scheduling, customer behavior modeling, and many many more.
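The only real requirement is that you can turn your data into a sequence of discrete tokens. As a sketch of the SMILES case, here is a deliberately simplified character-level tokenizer (real SMILES tokenizers handle ring closures, brackets, and more elements than this):

```python
def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, keeping two-letter
    elements like Cl and Br together as single tokens."""
    two_letter = {"Cl", "Br"}
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in two_letter:
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens

# Each token is one "discrete event" for the model to predict
print(tokenize_smiles("CCO"))       # ['C', 'C', 'O']  (ethanol)
print(tokenize_smiles("CC(Cl)=O"))  # ['C', 'C', '(', 'Cl', ')', '=', 'O']
```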
All follow the same basic recipe:
Input sequence
-> Embedding layer
-> Positional encoding
-> Self-attention layers
-> Query, Key, Value matrices
-> Attention scores
-> Weighted sum
-> Feed-forward networks
-> Layer normalization
-> Output predictions
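The recipe above can be sketched end-to-end in plain NumPy. This is a bare-bones, single-head illustration with random (untrained) weights and the residual connections omitted, so the predictions are meaningless, but every step in the list appears in order:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 10, 16, 5

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

# Input sequence of token ids
tokens = np.array([1, 4, 2, 7, 3])

# Embedding layer + positional encoding
embed = rng.normal(size=(vocab_size, d_model))
pos = rng.normal(size=(seq_len, d_model))
x = embed[tokens] + pos

# Self-attention: Query, Key, Value matrices
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Attention scores -> weighted sum over the values
scores = softmax(Q @ K.T / np.sqrt(d_model))
attended = scores @ V

# Feed-forward network + layer normalization
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
h = layer_norm(attended)
h = np.maximum(0, h @ W1) @ W2  # ReLU in between

# Output predictions: one next-token distribution per position
logits = h @ embed.T
probs = softmax(logits)
print(probs.shape)  # (5, 10)
```

Stack a few dozen of these blocks, train the weights on a lot of text, and you have an LLM; swap the tokenizer, and the same block predicts ingredients, SMILES characters, or market events.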
That is kind of amazing to think about. It all comes down to reducing your problem to tokens representing meaningful discrete events.
Image based systems
But what about images, things we see that are not easily described in words? As this field is a bit more diverse in its outputs, from generating images from text (Midjourney, Flux, etc.) to classification, segmentation, and drawing bounding boxes around objects, the architectures differ, but the premise holds. This technology is transformative because the number of use-cases explodes from just a few instances of basic modelling approaches. Take, for instance, U-Net, YOLO, and SAM. U-Net was first used for medical image segmentation, YOLO draws bounding boxes around objects in an image, and SAM, the Segment Anything Model, can segment most everyday objects based on user input. From these three models alone you can solve the following:
U-Net: medical image segmentation, satellite image analysis, defect detection in manufacturing, image restoration, depth estimation
YOLO: real-time object detection, traffic monitoring, security surveillance, sports analytics, robotics vision
SAM: interactive segmentation, autonomous driving, retail inventory analysis, quality control
And it doesn’t stop at images. Similar to LLMs, think of all the problems that can be efficiently or cleverly represented as a 2D grid. These are typically problems with both a time and a structure component: for instance, speech recognition, audio denoising, sound event detection, ECG/EEG analysis, financial market analysis, sensor data analysis, particle physics, and many many more.
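As a sketch of that reduction, here is how a 1D audio signal becomes a 2D grid (time on one axis, frequency on the other) that any image-style architecture can consume. This is a simplified short-time Fourier transform in plain NumPy; real pipelines usually add mel scaling and log compression:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Slice a 1D signal into overlapping windowed frames and take
    the magnitude of each frame's FFT: a time x frequency grid."""
    window = np.hanning(frame_len)
    frames = [
        signal[start:start + frame_len] * window
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    # rfft keeps only the non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=-1))

# One second of a 440 Hz tone sampled at 8 kHz
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(audio)
print(spec.shape)  # (frames, frequency bins): a 2D "image" of the sound
```

The pure tone shows up as a single bright horizontal band in the grid, which is exactly the kind of structure U-Net-style models are good at picking out.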
AI is LEGO
AI is a creative and transformative tool that you can think of as LEGO. You have a few basic building blocks from which you can build vastly different things using a small set of connection principles. They can be reconfigured for new purposes, and go from simple to complex structures. You can play out an endless number of narratives, and if you do it right you will:
- Learn through play
- Have iterative improvements
- Understand the fundamentals
- Solve problems creatively
- Explore endless possibilities from finite pieces