
AI Model that has Surpassed GPT-4: Meet Google DeepMind’s Gemini
The transformation brought about by AI has had a profound impact on how we live and work. AI is a future that has already arrived. Google recently announced its much-awaited AI model, Gemini, on its blog.
In a recent blog post, Mr. Sundar Pichai, CEO of Google, explained how this new technology will help everyone across the globe.
Pichai claims that “this AI is built for the new generation” and is inspired by how people understand and interact with the world.
Hands-on feedback on its performance is still limited, but early reports claim it to be outstanding. Many even suspect this AI model will surpass OpenAI’s GPT-4.
Gemini flaunts “multimodality,” allowing it to understand and work with different data types simultaneously, including text, code, audio, images, and video.
It comes in 3 different sizes:
- Ultra (for highly complex tasks)
- Pro (for scaling across a wide range of functions)
- Nano (for on-device tasks)
The launch of Gemini has sent shock waves through the industry with its claims. But is it as good as it claims to be?
Let’s find out.
Technical Reports
Google DeepMind states that Gemini surpasses GPT-4 on 30 out of 32 standard performance measures.
Google conducted several benchmark tests comparing Gemini with GPT-4, in which Gemini achieved an impressive score:
- 90% on the Massive Multitask Language Understanding (MMLU) test
- Surpassing human experts (89.8%)
- Outperforming GPT-4 (86.4%)
MMLU combines 57 subjects, including math, physics, history, law, medicine, and ethics, to test both world knowledge and problem-solving abilities.
Parameter of Comparison
Sure, the results are great. However, it is important to note that Google conducted these tests using an outdated version of GPT-4.
Google also used different prompting techniques for the two models: “5-shot” for GPT-4 and “chain-of-thought with 32 samples” for Gemini.
In machine learning, “shot” refers to the number of worked examples included in the prompt. A 5-shot prompt shows the model five examples of the task before asking the real question.
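To make that concrete, here is a minimal sketch of how a 5-shot prompt is assembled. The example questions and the `build_few_shot_prompt` helper are invented for illustration; real benchmarks such as MMLU draw the five shots from held-out questions of the same subject.

```python
# Build a 5-shot prompt: five worked examples, then the real question.
# The examples below are invented purely for illustration.
EXAMPLES = [
    ("What is 7 * 8?", "56"),
    ("What is 12 + 30?", "42"),
    ("What is 9 - 4?", "5"),
    ("What is 100 / 4?", "25"),
    ("What is 3 squared?", "9"),
]

def build_few_shot_prompt(question: str) -> str:
    # Each shot is formatted as a solved Q/A pair before the new question.
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    return f"{shots}\n\nQ: {question}\nA:"

print(build_few_shot_prompt("What is 15 * 4?"))
```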
Google used different prompting techniques for other benchmarks:
- HumanEval for Python coding tasks
- DROP for reading comprehension and arithmetic
- GSM8K (Grade School Math 8K) for grade school math reasoning
In September, OpenAI announced a model called GPT-4 Vision, which can work with images, audio, and text as well.
To produce images, GPT-4 generates text prompts that are passed to a separate deep-learning model called DALL-E 2, which converts the text descriptions into images.
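As an illustration of that two-step pipeline, here is a minimal sketch using the OpenAI Python SDK. The model names and the example prompt are assumptions for illustration; this shows the hand-off described above, not OpenAI’s internal wiring.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: the language model writes a detailed text prompt for an image.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Write a one-sentence image prompt for a cozy reading nook.",
    }],
)
image_prompt = chat.choices[0].message.content

# Step 2: the prompt is handed to a separate image model (DALL-E 2 here),
# which converts the text description into a picture.
image = client.images.generate(
    model="dall-e-2",
    prompt=image_prompt,
    n=1,
    size="1024x1024",
)
print(image.data[0].url)
```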
In contrast, Google claims Gemini is “natively multimodal,” meaning the core model directly handles various input types (audio, images, video, and text). This may provide faster, more fluid results.
GPT-4 Vision, in other words, is not fully multimodal in the way Gemini claims to be.
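For contrast, a natively multimodal model takes mixed inputs in a single call. Here is a minimal sketch using Google’s `google-generativeai` Python SDK as documented at Gemini’s launch; the API key and the image file are placeholders.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# One core model accepts text and an image together in a single call;
# no separate image model sits in the loop.
model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content(
    ["Describe what is happening in this picture.", Image.open("scene.png")]
)
print(response.text)
```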
Is Google Gemini the Best AI Model?
On December 6, 2023, Google presented a video titled “Hands-on with Gemini: Exploring Multimodal AI Interaction,” positioning Gemini as a competitor to OpenAI’s GPT-4.
The video highlighted the multimodal system’s ability to handle various inputs and created a buzz across the internet.
The video description did state, “For this demo, latency has been reduced, and Gemini outputs have been shortened for brevity.”
Still, the video misrepresented the model’s actual real-time performance, as Google DeepMind’s Mr. Oriol Vinyals later acknowledged.
Vinyals said in the post:
“All the user prompts and outputs in the video are real, shortened for brevity. The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”
Google mentioned that the Gemini Ultra model achieves its highest accuracy by using a chain-of-thought prompting approach.
This approach draws multiple samples from the model, producing a set of candidate responses. The model accounts for uncertainty by checking for a consensus among the samples: if the consensus rises above a threshold, it selects that answer; otherwise, it falls back to a greedy sample chosen by maximum likelihood. A greedy sample means choosing the most probable next word at each step.
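A rough sketch of that sample-and-vote loop is below. The `sample_answer` stand-in, the vote threshold, and the toy answers are all assumptions for illustration; Google has not published the exact procedure or threshold it uses.

```python
import random
from collections import Counter

def sample_answer(question: str, greedy: bool = False) -> str:
    """Stand-in for one model generation (hypothetical).

    greedy=False samples a chain-of-thought and returns its final answer;
    greedy=True picks the most probable next word at every step instead.
    Random choices fake the model here, purely for illustration.
    """
    if greedy:
        return "42"
    return random.choice(["42", "42", "42", "41"])

def self_consistency_answer(question: str, n_samples: int = 32,
                            threshold: float = 0.5) -> str:
    # Draw n chain-of-thought samples and tally their final answers.
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    # If the top answer clears the consensus threshold, select it.
    if count / n_samples > threshold:
        return answer
    # Otherwise fall back to a single greedy, maximum-likelihood decode.
    return sample_answer(question, greedy=True)

print(self_consistency_answer("What is 6 * 7?"))
```

Voting over many sampled reasoning chains tends to be more robust than trusting any single chain, which is the intuition behind the reported accuracy gains.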
In conclusion, it is difficult to judge Gemini’s true ability. Google has yet to release Gemini 1.0 Ultra, so the reported results cannot be validated until people can try the product in real time.
What Google has released is a misleading demo video that does not truly represent Gemini’s capabilities.
The video appeared to show the model commenting very smoothly on a live video stream, which it did not actually do. Still, the landscape of AI tools is changing for the better, and this is indeed an exciting step by Google toward a multimodal tool.
Did you know that GPT-4 was reportedly trained on 500 billion words? Even for a bot, that is a lot of data.
What do you think? Will Gemini undergo the same extensive training GPT did and keep improving?
Only time will tell.