Anthropic's Claude Can Now Ingest All Six Star Wars Films At Once
TheTechOasis
This Week's AI Insight
We've grown accustomed to continuous breakthroughs in AI over the last few months.
But record-breaking announcements that set the new bar at 10 times the previous one are rarer, and that is precisely what Anthropic, OpenAI's biggest rival, has done with the newest version of Claude, its ChatGPT competitor.
Soon, you'll be turning hours of reading and searching through text… into seconds.
A Chatbot focused on harmlessness
Despite the countless benefits Generative AI is bringing to the world, as with anything in technology, it comes with a trade-off.
With GenAI, we've opened a window for this technology to generate things like text or images, which is awesome.
The problem is that GenAI models lack any awareness of what's "good" or "bad": they are trained on a humongous amount of raw data in almost every form you can imagine, data that in many cases carries debatable biases and dubious content.
Sadly, because these models get better as they get bigger, the incentive to feed them any text you can find, no matter the content, is particularly strong.
This has led to several cases where these models acted in sketchy, almost vile ways towards their users, as we saw with Bing, forcing Microsoft to act.
Robot in the style of Hebru Brantley, Diffusion model
To prevent this, these base models have been fine-tuned with human feedback, a technique dubbed Reinforcement Learning from Human Feedback (RLHF), to create instruction-tuned models that, almost every time, respond according to the guidelines those humans gave them.
Examples of such models include ChatGPT, or Bard.
But as we saw with Bing (based on ChatGPT), this solution isn't perfect.
For that reason, Anthropic decided to take it a step further with a concept it describes as Constitutional AI, a new training paradigm with a single objective: creating the first truly harmless chatbot.
And this takes us to Claude.
Allegedly harmless and now super powerful
The biggest difference between Claude and other chatbots is that it was trained against a Constitution.
But what does that mean?
Drawing on documents like the Universal Declaration of Human Rights, the model was not only taught to predict the next word in a sentence very well (like any other language model); in each and every response it gives, it also has to take into account a constitution that determines what it may and may not say.
But what could really make all the difference for Claude is this week's announcement from Anthropic that it has become 10 times more powerful.
Specifically, its context window has grown from 9k tokens to 100k, an unprecedented number with far-reaching implications.
Let me digress.
It's all about tokens
Despite what many people may tell you, LLMs don't predict the next word in a sequence… at least not literally.
They predict the next token, which usually represents 3 to 4 characters. A token may be a whole word, and a word may be composed of several tokens.
For reference, 100 tokens represent around 75 words.
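That rule of thumb makes it easy to convert between tokens, words, and reading time. A minimal sketch (the 250-words-per-minute reading speed is my assumption, not a figure from Anthropic):

```python
# Rough token/word arithmetic, using the ~0.75 words-per-token rule of thumb.

WORDS_PER_TOKEN = 0.75   # ~100 tokens per 75 words
READING_WPM = 250        # assumed average nonstop reading speed

def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words a token budget covers."""
    return round(tokens * WORDS_PER_TOKEN)

def reading_hours(words: int) -> float:
    """Estimate nonstop reading time in hours."""
    return words / READING_WPM / 60

words = tokens_to_words(100_000)
print(words)                 # -> 75000
print(reading_hours(words))  # -> 5.0
```

Those two numbers are exactly the figures quoted later in this article: 100k tokens is roughly 75,000 words, or about five hours of reading.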
To do so, the model breaks the text you give it into these tokens and performs a series of matrix calculations, a mechanism called self-attention, that combines every token in the text with every other to learn how each token influences all the rest.
That way, the model "learns" the meaning and context of the text and can then proceed to respond.
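A toy version of that calculation fits in a few lines. This is scaled dot-product self-attention in its most stripped-down form; random numbers stand in for learned embeddings, and a real Transformer adds learned query/key/value projections, multiple heads, and much more:

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Minimal scaled dot-product self-attention.

    x: (seq_len, d) matrix, one row per token embedding.
    Returns a matrix of the same shape where each output row
    is a weighted mix of ALL the input token embeddings.
    """
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)  # (seq_len, seq_len): every token vs. every token
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ x

tokens = np.random.randn(6, 8)  # 6 toy "tokens", 8-dim embeddings
out = self_attention(tokens)
print(out.shape)  # -> (6, 8)
```

The `(seq_len, seq_len)` score matrix is the key detail: it compares every token with every other token, which is exactly what makes the cost grow so fast with input length.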
The issue is that this process is computationally intensive.
To be precise, the compute requirements grow quadratically with the input length, so the longer the text you give the model (described as its context window), the more expensive it is to run, both during training and at inference time.
This forced researchers to considerably limit the input size of these models, typically to between 2k and 8k tokens, the latter of which is around 6,000 words.
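The quadratic scaling is easy to see with a back-of-the-envelope calculation. A sketch comparing a standard 8k window with Claude's new 100k one, counting attention compute only and ignoring every other part of the model:

```python
def relative_attention_cost(new_len: int, old_len: int) -> float:
    """Ratio of self-attention compute between two context lengths,
    assuming the O(n^2) quadratic scaling in sequence length."""
    return (new_len / old_len) ** 2

# Going from 8k to 100k tokens multiplies the attention compute ~156x.
print(relative_attention_cost(100_000, 8_000))  # -> 156.25
```

A 12.5x longer input costs roughly 156x more attention compute, which is why context windows stayed small for so long.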
This is okay for chatting, but what if you want to summarize an entire book?
Not a chance… until now.
"All the knowledge in the world", Diffusion model
The Great Gatsby in seconds
I'll get to the point.
The newest version of Claude can ingest, in one go, 100,000 tokens, or around 75,000 words.
I know that may not mean much on its own, so here are some references:
- That's around the length of Mary Shelley's Frankenstein
- A human would take around 5 hours of nonstop reading to get through that many words
- It's enough to hold the dialog from 8 Star Wars films… combined
Now, think about a chatbot that can, in a matter of seconds, give you the power to ask it anything you want about that text.
This is the ultimate tool for lawyers, research scientists, and basically anyone, or any company, that needs to go through lots of data at once.
If you want to see this Claude version live, you can check AssemblyAI's awesome video.
The technology we thought was decades away is now here… one token at a time.
Key AI concepts you've learned from reading this article:
- Large Language Model Token
- Constitutional AI
- Context Window of LLMs
Top AI news for the week
- Stability AI launches its first text-to-animation model
- Google expands TensorFlow's capabilities
- Deep dive into Claude's constitutional AI
- Google releases MusicLM, its text-to-music model