New machine learning models make AI artists even better

Machine learning models such as DALL-E and Midjourney are redefining art, and new models could take these AI artists into the classroom.

Video game designer Jason Allen made headlines this year with Théâtre D’opéra Spatial, his submission to the Colorado State Fair’s digital arts competition. Judges awarded him first place and a $300 prize, but the artwork also received a sudden flurry of global attention when it was discovered that Allen had used the AI-powered image generator Midjourney to create it.

Midjourney, DALL-E and DALL-E 2 have brought a wealth of weird and wonderful images to the world as users type in natural language descriptions and share the dream-like results.

DALL-E 2 uses a “diffusion model”, which attempts to take the input text in its entirety and generate an image from it. But the output becomes less accurate as the text becomes more complex: the existing model appears to struggle with the composition of concepts, and confuses the attributes of, and relations between, different objects.

Scientists from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) say they looked at the problem from a different angle: they combined several models so that they could cooperate, an approach they found produced more creative combinations in the final images.

“DALL-E 2 is good at generating natural images but has difficulty understanding object relations sometimes,” says MIT CSAIL PhD student and co-lead author Shuang Li. “Beyond art and creativity, perhaps we could use our model for teaching. If you want to tell a child to put a cube on top of a sphere, and if we say this in language, it might be hard for them to understand. But our model can generate the image and show them.”

Machine learning models help to learn about language

The team’s model, Composable Diffusion, uses diffusion models together with compositional operators to combine text descriptions without further training, which lets it capture the details of the text more accurately. In one example, a request for “a pink sky” and “a blue mountain on the horizon” and “cherry blossoms in front of the mountain” produced an accurate image from the team’s model, while the original model returned a blue sky but gave everything in front of the mountains a pink colour.
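To make the idea of combining descriptions without retraining more concrete, here is a minimal sketch in Python. It is not the MIT team's code. It assumes that each description nudges the model's unconditional noise estimate in its own direction and that the nudges are simply added together. The names `compose_noise` and `toy_model`, the weights, and the numbers inside `toy_model` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical interface: a noise predictor takes the noisy image (flattened),
# a timestep, and a prompt. A prompt of None means "no text condition".
NoiseModel = Callable[[Vector, int, Optional[str]], Vector]


def compose_noise(
    model: NoiseModel,
    x: Vector,
    t: int,
    prompts: Sequence[str],
    weights: Sequence[float],
) -> Vector:
    """Combine several prompts by summing their weighted guidance terms."""
    if len(prompts) != len(weights):
        raise ValueError("need exactly one weight per prompt")

    # Unconditional estimate: what the model predicts with no text at all.
    base = model(x, t, None)
    combined = list(base)

    for prompt, weight in zip(prompts, weights):
        conditional = model(x, t, prompt)
        for i in range(len(combined)):
            # Each prompt pushes the estimate away from the unconditional one.
            combined[i] += weight * (conditional[i] - base[i])
    return combined


def toy_model(x: Vector, t: int, prompt: Optional[str]) -> Vector:
    """Stand-in for a trained network: shifts every value by a fixed amount."""
    shift = {
        None: 0.0,
        "a pink sky": 1.0,
        "a blue mountain on the horizon": -0.5,
    }
    return [value + shift[prompt] for value in x]


if __name__ == "__main__":
    estimate = compose_noise(
        toy_model,
        [0.0, 2.0],
        0,
        ["a pink sky", "a blue mountain on the horizon"],
        [2.0, 2.0],
    )
    print(estimate)
```

In a real image generator the toy function would be replaced by a trained diffusion network, and the combined estimate would be used at every denoising step. The point of the sketch is only that the individual descriptions are evaluated separately and merged at generation time, so no additional training is required.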

Images generated using a new method developed by MIT researchers for the request “a train on a bridge and a river under the bridge”

“The fact that our model is composable means that you can learn different portions of the model, one at a time,” says co-lead author and MIT CSAIL PhD student Yilun Du. “You can first learn an object on top of another, then learn an object to the right of another, and then learn something left of another. Since we can compose these together, you can imagine that our system enables us to incrementally learn language, relations, or knowledge, which we think is a pretty interesting direction for future work.”

The research, supported by Raytheon BBN Technologies Corp., Mitsubishi Electric Research Laboratory, and DEVCOM Army Research Laboratory, has received the approval of DALL-E 2’s co-creator Mark Chen.

“This is a nice idea that leverages the energy-based interpretation of diffusion models so that old ideas around compositionality using energy-based models can be applied,” says Chen, who is a research scientist at OpenAI, the company behind DALL-E.
