
How to use ChatGPT's file analysis capability (and what it can do for you)

From searching a text 'semantically' to creating outlines and captioning images, file uploads on ChatGPT expand the bot's usefulness in important ways.
Written by Tiernan Ray, Senior Contributing Writer
Digital files in rows
Yuichiro Chino/Getty Images

A week ago, OpenAI unveiled a new feature for ChatGPT called "memory," which stores things you explicitly ask the program to remember for later use.

Alongside memory, it's good to remember that ChatGPT can also use existing file-upload capabilities to analyze text and images. You just drag and drop a file into the chat window, such as a PDF or a JPEG, add a prompt if you like, and ChatGPT will start to produce some text output based on what you've uploaded. 

The capability is available to all paying users of the $20-per-month "Plus" version. Plus also lets you use OpenAI's latest underlying model, GPT-4, instead of GPT-3.5, and the quality of output can be noticeably better. Plus also allows the use of DALL-E, the image-generation program.

The most obvious uses for file upload are summarization, outlining, and semantic search, which goes beyond simple keyword matching. Uploading is easy: just drag the file into the chat window.

The file upload function shines when it is given a long document and asked to do something such as isolate particular kinds of content by theme. This is a form of semantic search: the search is based on meaning rather than strictly on an individual keyword.

For example, I uploaded a 4,500-word report on specialized semiconductors known as silicon carbide. Silicon carbide is widely used in Tesla and other electric vehicles to create what's called the traction inverter. But it has less-obvious applications. I asked ChatGPT, "In this report on silicon carbide, are there any references to non-automotive use cases?" 

ChatGPT responded with an excellent summary of six use cases that were identified in the report and did not pertain to cars. That's more powerful than having to use individual keywords. I've considered using ChatGPT as my go-to source for making a first pass at working with long documents. 

ChatGPT report summarization
Screenshot by Tiernan Ray/ZDNET
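
To make the difference between keyword search and searching by theme concrete, here is a minimal Python sketch. It is not how ChatGPT works internally, and it is only a toy: the passages, the list of car-related terms, and the function names are all invented for illustration and are not taken from the silicon carbide report. A real semantic search would rely on a language model rather than a hand-written word list.

```python
# Toy contrast between literal keyword search and a crude theme-based filter.
# ASSUMPTION: the passages and the term list below are invented examples,
# not text from the report discussed in the article.

PASSAGES = [
    "Silicon carbide traction inverters extend the range of electric vehicles.",
    "Solar farms use silicon carbide inverters to cut conversion losses.",
    "Rail operators are testing silicon carbide modules in locomotives.",
    "Automakers are signing long-term wafer supply deals.",
]

# Hand-written stand-in for "knowing" which words relate to cars.
AUTOMOTIVE_TERMS = {"vehicle", "vehicles", "automaker", "automakers", "car", "cars"}


def keyword_search(passages, keyword):
    """Return passages that literally contain the keyword."""
    needle = keyword.lower()
    return [p for p in passages if needle in p.lower()]


def non_automotive(passages):
    """Return passages that contain no word from the automotive term list."""
    results = []
    for p in passages:
        words = {w.strip(".,").lower() for w in p.split()}
        if not words & AUTOMOTIVE_TERMS:
            results.append(p)
    return results


# The literal phrase never appears in the text, so keyword search finds nothing.
literal_hits = keyword_search(PASSAGES, "non-automotive")

# The theme-based filter still surfaces the passages that are not about cars.
theme_hits = non_automotive(PASSAGES)

print(literal_hits)
for p in theme_hits:
    print(p)
```

The point of the sketch is that the phrase "non-automotive" appears nowhere in the sample text, so a literal search comes back empty, while a filter that has some notion of what "automotive" means can still pick out the relevant passages. ChatGPT gets that notion from its underlying model rather than from a fixed list.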

Textual summarization is also useful for long interview transcripts. I uploaded a 6,800-word transcript and got a usable summary of the most important topics, which could serve as the beginning of an outline for an article based on the transcript.

However, such summaries are not a replacement for editing and shaping a story. That kind of data compression requires identifying themes, re-phrasing them in useful ways, and, most critically, deciding which things to leave out. Those things, especially what to leave out, are currently beyond what ChatGPT can do, although more particular kinds of prompting can help.

ChatGPT's file analysis can handle picture files but not yet video. When various images are uploaded, the program does a satisfactory job of identifying the contents and even adding some descriptive copy. That can be useful for things such as captioning.

ChatGPT identified the New York City skyline and the Empire State Building, and commented on the mix of old and new styles.

ChatGPT skyline description
Tiernan Ray via ChatGPT/ZDNET

A street scene in midtown Manhattan also evoked a useful descriptive caption from the machine.

ChatGPT street scene description
Tiernan Ray via ChatGPT/ZDNET

I also submitted a work of art based on a public-domain image of Alan Turing. The program identified Turing and annotated the picture with commentary about its intent.

ChatGPT Turing image annotation
Tiernan Ray via ChatGPT/ZDNET

ChatGPT offered an appropriate, if bland, description of ZDNET's photo of OpenAI executives Sam Altman and Mira Murati, taken from a November article about the two, without actually identifying the individuals.

ChatGPT Mira and Sam photograph
Tiernan Ray via ChatGPT/ZDNET

AI's ability to analyze images and video is evolving rapidly. Alphabet's Google recently introduced its latest large language model, Gemini 1.5. The program is able to zero in on the moment in a 402-page transcript of the Apollo 11 mission to the moon when Neil Armstrong takes "one small step" on the moon's surface. It was also able to pick out time stamps in a Buster Keaton silent movie. Those kinds of abilities are still beyond the precision of ChatGPT's file upload.

It seems clear that document analysis will merge with the memory function in ChatGPT at some point. Typing in memories by hand at the prompt is not as efficient as providing a whole document containing everything one wants ChatGPT to draw on, such as references and background information. A year from now, the combination of memory and file analysis will probably be one of the main ways in which ChatGPT has evolved from its current incarnation.
