Google can turn text into realistic images using AI – is it a threat to photographers?

New AI can create any image from simple text
A robot couple fine dining with Eiffel Tower in the background (Image credit: Google Imagen)

Google's latest research effort is an AI text-to-image diffusion model that can produce strikingly realistic images from a simple phrase, prompt or text description. The model is called Imagen, and it could potentially end up doing our jobs as photographers for us.

A deep level of language understanding is paired with unprecedented photorealism to create fabricated images via what Google is calling a text-to-image diffusion model. If you can imagine it, chances are Imagen can create it. 

Ever wondered what an alien octopus floating through a portal reading a newspaper might look like? Imagen has already created the image so that you no longer have to. This industry-changing diffusion model could have limitless potential and applications in our lives when it is (eventually) made available to the public.

Brought to you by Google Research's Brain Team, the Imagen AI model can create virtually anything you can think of and put into words, producing a scarily accurate-looking image that could easily have been captured by someone using a smartphone camera, or even a DSLR with added post-production. So should we be worried about this new AI entering the business and stealing our jobs? Not just yet.

New AI can create any image from simple text (Image credit: Google Imagen)

While Google is usually proud of its latest developments and keen to push them, it has made clear that this text-to-image diffusion model is in no way ready to be accessed by the public just yet.

The team is weighing up the model's limitations and societal impact. Text-to-image research broadly faces several ethical challenges, including racial and gender bias, because researchers have had to rely heavily on large web-scraped datasets that are mostly uncurated.

A statement on Google's Imagen research site says: "Datasets of this nature often reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups. While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including racist slurs, and harmful social stereotypes."

A cute corgi lives in a house made out of sushi (Image credit: Google Imagen)

The site goes on to say that "Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes... We aim to make progress on several of these open challenges and limitations in future work."

Imagen is also said to exhibit serious limitations when generating images that depict people and human faces, which is why most of the sample images released so far are of animals or objects.

A chrome-plated duck with a golden beak arguing with an angry turtle in a forest (Image credit: Google Imagen)

The team working on Imagen has published a full research paper detailing the mathematical and technological workings behind this AI. In short, it uses a large frozen T5-XXL encoder to encode the input text into embeddings. A conditional diffusion model then maps the text embedding into a 64×64 image, which is upsampled by two text-conditional super-resolution diffusion models: first from 64×64 to 256×256, and then from 256×256 to 1024×1024.

They've even created a handy diagram to explain things a bit better (see below).

Diagram of how Imagen creates images from text (Image credit: Google Imagen)
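
For readers who prefer code to diagrams, the resolution cascade described above can be traced with a minimal Python sketch. It runs no AI model at all and only follows the image sizes from stage to stage. The names Stage, IMAGEN_CASCADE, trace_cascade and upscale_factors are hypothetical and chosen for illustration; they are not taken from Google's code or paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One step in the cascade (hypothetical helper, not Google's API)."""
    name: str
    in_res: Optional[int]  # None: the base stage starts from noise plus the text embedding
    out_res: int           # side length in pixels of the square image it outputs


# The three image-generating stages named in the article. The frozen T5-XXL
# text encoder feeds its embeddings to all of them, so it is not listed here.
IMAGEN_CASCADE = [
    Stage("base text-conditional diffusion", None, 64),
    Stage("super-resolution diffusion 1", 64, 256),
    Stage("super-resolution diffusion 2", 256, 1024),
]


def trace_cascade(stages: List[Stage]) -> List[int]:
    """Return the output resolution after each stage, checking that they chain."""
    resolutions = []
    current = None
    for stage in stages:
        if stage.in_res != current:
            raise ValueError(f"{stage.name} expects {stage.in_res}, got {current}")
        current = stage.out_res
        resolutions.append(current)
    return resolutions


def upscale_factors(stages: List[Stage]) -> List[int]:
    """Per-side upscaling factor of each super-resolution stage."""
    return [s.out_res // s.in_res for s in stages if s.in_res is not None]


if __name__ == "__main__":
    print(trace_cascade(IMAGEN_CASCADE))    # [64, 256, 1024]
    print(upscale_factors(IMAGEN_CASCADE))  # [4, 4]
```

Each super-resolution stage quadruples the side length, so the final 1024×1024 image has 256 times as many pixels as the 64×64 image the base model produces.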

While some images, like the octopus one, look a little cartoonish, as if the subject were made from clay, most of the example images created by Google Imagen do a fantastic job, from a photography perspective, of employing basic techniques such as depth of field, composition, and key focus points.

Having admittedly not read the entire research paper, we're still unclear on how, in simple terms, the AI actually creates these images, and whether it takes samples and snippets from an existing pool of license-free images to create the unusual ones prompted by the text provided.

A photo of a raccoon wearing an astronaut helmet, looking out of the window at night (Image credit: Google Imagen)

As for when we might see the AI become accessible to all, Google says on its research site: "There is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place. In future work we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access."

A giant cobra snake on a farm. The snake is made out of corn (Image credit: Google Imagen)

It's unclear exactly how Google intends to use the Imagen diffusion model, or when its potential might be realized worldwide. For now, it's a pretty cool research avenue, and it isn't a threat to photographers, at least not until its biased stereotyping and human face rendering have been fixed.

Beth Nicholls
Staff Writer

A staff writer for Digital Camera World, Beth has an extensive background in various elements of technology with five years of experience working as a tester and sales assistant for CeX. After completing a degree in Music Journalism, followed by obtaining a Master's degree in Photography awarded by the University of Brighton, she spends her time outside of DCW as a freelance photographer specialising in live music events and band press shots under the alias 'bethshootsbands'.