OpenAI’s ChatGPT can now talk, hear and see: a historic multimodal upgrade

By adding voice and image capabilities to ChatGPT, OpenAI is once again pushing the boundaries of AI technology. These features are expected to change the way users interact with the AI model, providing a more intuitive and immersive experience.

Voice conversations with ChatGPT

One of the most notable aspects of this release is the ability to hold voice conversations with ChatGPT. Users can now talk with their AI assistant in real time, which opens up a range of everyday uses. ChatGPT’s voice skills are ready to help whether you’re on the go, looking for a bedtime story for your family, or settling a disagreement at the dinner table.

To start using voice, go to the Settings menu in the mobile app, select “New Features,” and opt in to voice conversations. Once it is enabled, tap the headphones icon in the top right corner of the home screen and choose one of the five voices. OpenAI worked with professional voice actors to create these voices, aiming for a human-like audio experience. In addition, Whisper, OpenAI’s open source speech recognition system, transcribes your spoken words into text so that ChatGPT can respond to them.

Image interaction with ChatGPT

The ability to share images with ChatGPT is another major change. Users can now show ChatGPT one or more images to troubleshoot a problem, explore what is in a picture, or analyze complex data. ChatGPT can help you figure out why your grill won’t start, plan a dinner based on the contents of your refrigerator, or analyze a graph of data for work.

To use this feature, tap the photo button to capture or select an image. On iOS or Android, tap the plus button first. You can also discuss multiple images in one conversation or use the drawing tool to point your assistant to a specific part of an image. Image understanding is powered by multimodal versions of GPT-3.5 and GPT-4, which apply their language reasoning skills to a wide range of visual inputs, such as photographs, screenshots, and documents containing both text and images.

Phased deployment for security and resilience

Voice and image capabilities will be rolled out gradually to Plus and Enterprise subscribers over the next two weeks. Voice will be available on iOS and Android, where users opt in through settings, while image input will be available on all platforms.

OpenAI recognizes the risks that come with these new capabilities. For now, it is restricting the voice technology to a single use case, voice chat, and the voices were created in partnership with voice actors to ensure authenticity and safety. Notably, Spotify is using this technology for its voice translation service, which lets podcasters grow their audience by translating content into multiple languages in their own voices.

To protect people’s privacy, OpenAI has limited ChatGPT’s ability to analyze people in images and make direct statements about them. Real-world use and user feedback will be critical to improving these safeguards further while keeping the tool useful.

Categories: Technology
Source: vtt.edu.vn