ChatGPT Vision: What It Can and Cannot Do Currently

ChatGPT can now process images: take a picture of a complex serial number, and it will read the text and output it for you.

The OpenAI team has been hard at work. They’ve not only integrated DALL·E into ChatGPT, but they’ve also added a new Vision feature to it.

ChatGPT Vision feature

Vision enables interaction with ChatGPT through images and photos. You can upload a photo from your phone, or via a browser if you’re using the desktop version, or you can take a new picture and upload it. After selecting the photo, click ‘Confirm,’ and then provide the question or instruction to ChatGPT.

ChatGPT will use your image as a reference, and you can ask it all sorts of things. I’ve tested it extensively, pushing it to its limits to discover what Vision can and cannot do. Read on to see what it handled well and how accurate it was.

✅ Recognizing Objects with Limited Info

First, I snapped a photo of a mobile game to see if ChatGPT could figure out what it was.

Results:

While it didn’t give the exact name of the game – since it wasn’t visible in the picture – it did correctly identify it as a Monopoly-like mobile game. To me, that’s a pretty accurate guess for an AI.

Prompt:

Mobile game resembling Monopoly

Output:

AI identified Monopoly-like game

✅ Extracting Text from an Image

Then, I snapped a photo of an article on hongkiat.com to see if ChatGPT could read the text within the image.

Result:

It managed to read and reproduce the website’s name, article title, and body text flawlessly.

Prompt:

Article photo for text extraction

Output:

Extracted text from article

✅ Extracting Selected Text from an Image

I also tested if ChatGPT could read just a part of an image by circling the text I was interested in.

Results:

It successfully followed the instruction and output the required text just as well.

Prompt:

Circled text for selective extraction

Output:

AI extracted circled text

✅ Interpreting a Real-World Photo

Later, I took a photo of a restaurant menu that included text and pictures and asked ChatGPT to itemize all the dishes along with their prices.

Result:

It did this perfectly.

Prompt:

Restaurant menu photo

Output:

Listed dishes with prices

✅ Analyzing Data from a Real-World Photo

I gave it another menu and this time asked for the total cost of certain items.

Results:

It calculated the total correctly.
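A total like this is easy to double-check by hand once the dishes and prices have been extracted. The sketch below shows one way to do that check in Python. The dish names and prices are hypothetical placeholders, not the ones from my menu photo, and `total_cost` is a helper name made up for this illustration.

```python
from decimal import Decimal

# Hypothetical items and prices, standing in for whatever was read off the menu photo.
menu = {
    "Fried rice": Decimal("8.50"),
    "Noodle soup": Decimal("7.00"),
    "Iced tea": Decimal("2.50"),
}


def total_cost(menu, selected):
    """Add up the prices of the selected dishes, failing loudly on unknown names."""
    missing = [name for name in selected if name not in menu]
    if missing:
        raise KeyError(f"not on the menu: {missing}")
    # Decimal keeps currency arithmetic exact, unlike binary floats.
    return sum((menu[name] for name in selected), Decimal("0"))


print(total_cost(menu, ["Fried rice", "Iced tea"]))  # 11.00
```

If the figure printed here differs from the one ChatGPT reports, either the extracted prices or its arithmetic deserves a second look.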

Prompt:

Menu photo for cost calculation

Output:

Calculated total cost

✅ More Complex Analysis of a Real-World Photo

To further test the vision feature, I took a picture of a bookshelf to see if it could estimate the number of books in the column.

Results:

It counted 42 book spines, which is close enough, considering I estimate the actual number to be between 40 and 50.

Prompt:

Bookshelf photo

Output:

Estimated book count

✅ Creating Content from a Product Photo

Then I snapped a photo of a mug to see if it could recognize the object and generate some content for it.

Results:

The output it gave was pretty good!

Prompt:

Mug photo

Output:

Generated content for mug

❎ Retrieving EXIF Info from a Photo

However, there were tasks ChatGPT’s Vision couldn’t handle. For instance, it was unable to extract the EXIF data from the uploaded image.
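EXIF metadata is stored in the image file itself rather than in the visible pixels, so it can be read locally without any help from ChatGPT. As a rough illustration, the Python sketch below walks the segments of a JPEG file and returns the raw EXIF payload if one is present. It assumes a JPEG input and only locates the EXIF block; it does not decode individual tags such as camera model or date. The function name `find_exif_segment` is made up for this example.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"  # identifier at the start of an EXIF APP1 payload


def find_exif_segment(jpeg_bytes):
    """Return the raw EXIF (TIFF) payload of a JPEG byte string, or None if absent."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            break  # not at a marker, so stop rather than guess
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):
            break  # start of scan or end of image: no more metadata segments
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(EXIF_HEADER):
            return payload[len(EXIF_HEADER):]
        pos += 2 + length
    return None
```

To try it on a real file, read the photo with `open(path, "rb").read()` and pass the bytes in. A result of `None` means the file carries no EXIF block, which can happen when metadata has been stripped from the file.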

Prompt:

Photo for EXIF data

Output:

Failed EXIF data retrieval

❎ Recognizing Objects in a Photo

Vision also can’t browse the internet to look up things it doesn’t recognize. For example, when I showed it a picture of a Pokémon and asked for its name, it guessed incorrectly, likely because it can’t reference the internet.

Prompt:

Pokémon photo

Output:

Incorrect Pokémon identification

❎ Recognizing Languages in a Photo

It struggled with non-English text too, at least in my test. I showed it Chinese text, and it recognized neither the characters nor their meaning.

Prompt:

Chinese text photo

Output:

Failed Chinese text recognition

So, those were my tests of ChatGPT’s vision feature. Overall, it’s quite a useful tool that can be employed creatively. It’s also worth mentioning that, at the time of writing this article, ChatGPT’s Vision is only available on desktop browser versions and the iOS app.
