OpenAI has released highly anticipated upgrades to its ChatGPT chatbot, enabling it to interact with images and voice. The upgrades move OpenAI closer to its stated goal of building AI systems that go beyond reading and understanding text, models that can also see, hear, and reason over many kinds of input.
In their official blog post, OpenAI stated, “We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.”
For more details, see the official announcement: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
The enhanced ChatGPT Plus will incorporate voice chat powered by a new text-to-speech model that can generate strikingly human-like audio. It will also be able to discuss images, thanks to integration with OpenAI's multimodal GPT models. These advancements are part of what OpenAI refers to as GPT-4V (GPT-4 with vision, sometimes mistakenly conflated with a hypothetical GPT-5) and are fundamental components of the multimodal version of GPT-4 that OpenAI hinted at earlier this year.
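To give a sense of what "discussing images" looks like at the API level, here is a minimal sketch of the message shape OpenAI's Chat Completions API uses for image input. The helper function, model name, and image URL below are illustrative assumptions, not details from the announcement:

```python
# Sketch of a multimodal chat request: a single user message that pairs
# a text prompt with an image URL. Model name and URL are placeholders.

def build_vision_request(prompt: str, image_url: str,
                         model: str = "gpt-4-vision-preview") -> dict:
    """Assemble a chat request whose user message combines text and an image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Build (but do not send) a request asking about a photo.
request = build_vision_request(
    "What landmark is shown in this photo?",
    "https://example.com/landmark.jpg",
)
print(request["messages"][0]["content"][0]["text"])
```

The key design point is that the `content` field of a user message becomes a list, letting text and image parts sit side by side in one turn of conversation.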
This upgrade follows OpenAI’s introduction of DALL-E 3, their most advanced text-to-image generator to date. Early testers have lauded DALL-E 3 for its remarkable quality and accuracy. This innovative tool can generate high-fidelity images based on text prompts while comprehending complex context and concepts expressed in natural language. DALL-E 3 will be integrated into ChatGPT Plus, a subscription-based service offering ChatGPT powered by GPT-4.
The integration of DALL-E 3 and conversational voice chat underscores OpenAI's commitment to developing AI assistants that perceive the world much as humans do, through multiple senses.
According to OpenAI, “Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it.”