ChatGPT Expands with Voice and Image Abilities - VOA Learning English

The artificial intelligence (AI) tool ChatGPT has added new abilities that include voice and image recognition.

The changes permit some users to directly ask ChatGPT questions and receive voice answers. In addition, the tool can recognize images and provide information about what is in them.

ChatGPT is a chatbot, a computer-powered tool designed to interact smoothly with humans and perform high-level writing. The technology is also known as “generative AI.”

The creator of ChatGPT, OpenAI, announced the tool’s latest additions, or upgrades, this week. Currently, the new voice and image upgrades are only available to users of ChatGPT’s Plus and Enterprise services.

ChatGPT’s main service, which runs on the GPT-3.5 model, is free for all users. ChatGPT Plus costs $20 per month. The Enterprise service is designed for individual companies, with costs tied to the services used by the business.

OpenAI explained that ChatGPT Plus and Enterprise users would be able to use the voice and image additions over the next two weeks. The upgraded tools would be made available to other groups of users, including developers, “soon after.” The voice and image upgrades will also be added to devices using the iOS and Android systems in the near future.

The company said ChatGPT’s new voice control is designed to let users communicate with the AI tool naturally, much as they would when speaking with a human. But it noted the chatbot can do more than answer questions. It can also tell a story to children or provide detailed instructions for making or building something.

Users can choose different voices they want the chatbot to use. The company said it worked closely with professional voice actors to make the interactions more realistic and personal.

The voice interaction ability in the ChatGPT upgrade already exists in many voice assistant systems. These include Amazon’s Alexa, Alphabet’s Google Assistant, Apple’s Siri, and others. American software maker Microsoft added voice controls to its new ChatGPT-powered Bing search engine earlier this year.

Another notable change to the ChatGPT tool is image recognition. This permits users to upload a photo to the system and then get information about what is contained in the picture.

For example, the company says a user could take a picture of what is currently available in their refrigerator. After entering the photo into ChatGPT, the tool could suggest dinner possibilities based on what the person has. The system could also provide step-by-step instructions for preparing the meal.

Another example given is a parent who might take a picture of a child’s math problem and then seek advice on how to explain to the child how to solve it. There is even a way for users to mark areas of the image – for example with a circle – to get more specific information or help with that element.

Along with its announcement, OpenAI issued another warning about how its ChatGPT tool can easily get things wrong. It noted that because the system is trained using massive amounts of publicly available information, it can return results that are false, outdated or discriminatory.

The company urged all its users to watch out for misinformation and to attempt to verify the information provided by chatbots.

OpenAI announced its AI technology is also being used by the digital music service Spotify. The technology powers a system designed to permit Spotify podcasters to translate their shows into different languages. The translations are completed in the podcasters’ own voices in an effort to make them sound more natural, OpenAI said.

Spotify says the first languages to be added in the coming weeks will be Spanish, French and German.

I’m Bryan Lynn.