ChatGPT's advanced abilities, such as debugging code, writing an essay or cracking a joke, have led to its massive popularity. Despite its abilities, its assistance has been limited to text -- but that is going to change.
On Tuesday, OpenAI unveiled GPT-4, a large multimodal model that accepts both text and image inputs and outputs text.
According to OpenAI, the distinction between GPT-3.5 and GPT-4 will be "subtle" in casual conversation. However, the new model will be considerably more capable in terms of reliability, creativity, and even intelligence.
According to OpenAI, GPT-4 scored around the top 10% of test takers on a simulated bar exam, while GPT-3.5 scored around the bottom 10%. GPT-4 also outperformed GPT-3.5 in a series of benchmark tests, as shown in the graph below.
For context, ChatGPT runs on a language model fine-tuned from a model in the GPT-3.5 series, which limits the chatbot to text input and output.
OpenAI's GPT-4 announcement followed an address from Andreas Braun, CTO of Microsoft Germany, last week, in which he said GPT-4 would be coming soon and would allow for the possibility of text-to-video generation.
"We will introduce GPT-4 next week; there we will have multimodal models that will offer completely different possibilities -- for example, videos," Braun said at the event, according to Heise, a German news outlet.
Although GPT-4 is multimodal, the claims of a text-to-video generator were a bit off. The model can't produce video yet, but it can accept image inputs, which is a major change from the previous model.
One of the examples OpenAI provided to showcase this feature shows the model examining an image to figure out what is funny about it, as the user requested.
Other examples included uploading an image of a graph and asking GPT-4 to make calculations from it, or uploading a worksheet and asking it to solve the questions.
OpenAI says it will release GPT-4's text input capability through ChatGPT and, via a waitlist, through its API. You will have to wait a bit longer for the image input feature, since OpenAI is collaborating with a single partner to get that started.
If you are disappointed about not having a text-to-video generator, don't worry: it's not a completely new concept. Tech giants such as Meta and Google already have models in the works. Meta has Make-A-Video and Google has Imagen Video, both of which use AI to produce video from user input.