I Tried Out Gemini’S New Native Image Gen Feature, And It’S Fricking Nuts

Google has introduced native image generation and editing with the new Gemini 2.0 Flash Experimental model.
It’s available on AI Studio for free right now, and you can generate a series of consistent images and edit them using simple prompts.
You can remove and add objects, insert text, colorize photos, generate a visual story, and do much more.

We have been hearing the term ‘ natively multimodal ‘ in the AI space for over a year, but companies were slow in unlocking full multimodal capabilities of their AI models until now. Google has finally released its latest “Gemini 2.0 Flash Experimental” model with the ability to generate and edit images natively .

Now, you might be wondering, what is the big deal with image generation? AI image generation has been available with all major AI chatbots like ChatGPT for quite some time. Well, when we generate AI images on ChatGPT or Gemini, the prompt is routed to a specialized Diffusion-based model like Dall-E 3 or Imagen 3. The said models are trained on images and designed only to generate images; they are like an extension to the main AI model and not part of it.

However, language-vision models like Gemini are natively multimodal, meaning they can inherently understand, generate, and modify both text and images. Until now, no tech company had made this capability available to users. OpenAI demonstrated its native image generation feature with GPT-4o in 2024, but again, it was never released.

With native image generation, you get better consistency as multimodal models are trained on a large dataset of different modalities. As a result, such models boast better understanding of concepts and exhibit broader world knowledge.

Beyond image generation, you can seamlessly edit images with simple prompts. For example, you can upload an image and ask the model to add sunglasses, insert legible text, remove objects, and more to the image. And unlike Diffusion models which regenerate the whole image with each new prompt, natively multimodal models maintain consistency across multiple modifications.

Native Image Generation with Gemini 2.0 Flash Experimental

Currently, the native image generation feature is not available to general users. The Gemini 2.0 Flash Experimental model with native image generation is only available on Google’s AI Studio ( visit ) for free.

After previewing the model on AI Studio, it will be released on Gemini for everyone to use in the near future. However, I tried out the new Gemini model with native image generation, and it was quite the exciting experience.

First, I started with a visual guide to showcase the consistency of Gemini’s native image generation capability. I asked Gemini to create a visual guide on how to make an omelet, generating an image for each step of the process.

As you can notice, the results are highly consistent across images with no glitches. Even the bowl is the same in the second image. Finally, you can download the images in 1024 x 680 resolution. This way, you can create a visual guide on anything you want.

Next, I asked Gemini to create an aesthetic table and then told it to show the table from the center camera angle. It did a perfect job. After that, I prompted Gemini to add a PlayStation to the table and give me a closer look. Again, Gemini nailed it. The AI model, as you see below, also included a reflection of the PS5 in the mirror behind it.

Native Image Editing with Gemini 2.0 Flash Experimental

To demonstrate native image editing, I uploaded an image from my gallery and asked Gemini 2.0 to remove the wine glass from the table. Following that, I told Gemini to add mushrooms to the pizza, and it did a wonderful job. Then, I prompted Gemini to add a croissant and there you have AI image editing in full glory, thanks to Gemini’s native multimodal capability.

Next, I uploaded an image of mine, and asked Gemini to add sunglasses and then add the “Beebom” text on my t-shirt. Both were done quite well.

Lastly, I asked Gemini to colorize an image, and it worked really well too. I mean, the image came out more beautiful than it was before, without any weird glitches, artifacting, or part of the image missing.

There are many such use cases that you can try with Gemini’s new multimodal capability. Google has done a commendable job with native image generation and editing, and I’m planning to use it more rigorously in the coming weeks to test its limits.

After the release of Veo 2 for video generation and Imagen 3 for specialized image generation, it appears Google has outclassed OpenAI in many areas; not just AI text generation. So, it would be interesting to see what OpenAI does next to reclaim the top spot with ChatGPT.

I Tried Out Gemini’s New Native Image Gen Feature, and It’s Fricking Nuts - 2

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.

Add new comment

Name

Email ID

openai releases gpt-4.5 ai model to chatgpt pro users - 10

OpenAI has finally unveiled the GPT-4.5 model and it’s rolling out to ChatGPT Pro subscribers. ChatGPT Plus users will get GPT-4.5 next week.
It’s not a frontier model, and doesn’t outperform o-series reasoning models, but delivers better performance than GPT-4o.
OpenAI says GPT-4.5 has a thoughtful personality and excels at creative writing. It also exhibits fewer hallucinations.

OpenAI introduced GPT-4o, a non-reasoning model to ChatGPT users back in May 2024. Finally, over 10 months later, the hot AI startup has unveiled its next-generation GPT-4.5 AI model, codenamed ‘Orion’ today. GPT-4.5 is the last non-reasoning model from OpenAI as the upcoming GPT-5 will merge the o3 reasoning model to create a unified AI system.

OpenAI says GPT-4.5 is the “largest and most knowledgeable language model” developed by the company so far, but it’s not a frontier model. It’s designed to be more general-purpose than STEM-focused o-series reasoning models.

It means that GPT-4.5 excels at creative writing, natural conversation, practical problem-solving, and offers a broader knowledge base. Note that it’s a multimodal model so it can process images and files too.

What is interesting is that GPT-4.5 exhibits fewer hallucinations than GPT-4o . Its hallucination rate dropped to 37.1% from GPT-4o’s 61.8%. And GPT-4.5’s accuracy improved to 62.5% from GPT-4o’s 38.2%. Apart from that, early testers say that GPT-4.5 is “warm, intuitive, and natural” during conversations.

Image Credit: OpenAI

As for benchmarks, GPT-4.5 outperforms GPT-4o in MMLU across 14 languages. Next, in SWE-bench Verified which evaluates the ability to solve real-world software issues, GPT-4.5 achieves 38% while GPT-4o gets 30.7%. That said, it performs worse than the o1, o3, and o3-mini reasoning models.

In the new SWE-Lancer benchmark developed by OpenAI which evaluates performance on real-world, economically valuable software engineering tasks, GPT-4.5 solved 32.6% of the tasks, compared to GPT-4o’s 23.3%. In GPQA (Science), GPT-4.5 scored 71.4% while GPT-4o got 53.6%.

Image Credit: OpenAI

About availability, GPT-4.5 is rolling out to ChatGPT Pro users starting today. And OpenAI says starting next week, GPT-4.5 will be available to ChatGPT Plus, Team, and Edu users.

All in all, it appears scaling LLMs via pre-training has hit a wall , and that’s why OpenAI says GPT-4.5 will be the last non-reasoning model. In the benchmark numbers, it’s clear that o-series reasoning models perform exceptionally well, even on older base models.

Nevertheless, in every aspect, GPT-4.5 performs better than GPT-4o while being 10x more efficient. It has a refined personality, produces superior writing, and has a broader world knowledge. Now, anticipation builds for the unified GPT-5 AI system which will integrate the o3 reasoning model. It’s likely to be released in May this year.

I Tried Out Gemini’s New Native Image Gen Feature, and It’s Fricking Nuts - 13