ERNIE-Image is a text-to-image model from Baidu, released in April 2026. It uses a single-stream Diffusion Transformer architecture.
There are two main versions. ERNIE-Image is the standard release, with 8B parameters, and usually runs at about 50 inference steps. ERNIE-Image-Turbo is the faster distilled version, built for about 8 steps.
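The practical difference between the two versions is the step budget. As a minimal sketch, a caller could record those defaults in a small lookup table. The names `VARIANT_STEPS` and `default_steps` are hypothetical and not part of any Baidu API, and the step counts are the approximate figures quoted above.

```python
# Hypothetical preset table: approximate inference steps per variant,
# taken from the figures quoted in the text (not from an official API).
VARIANT_STEPS = {
    "ERNIE-Image": 50,        # standard 8B release
    "ERNIE-Image-Turbo": 8,   # distilled, faster version
}


def default_steps(variant: str) -> int:
    """Return the approximate step count for a variant name."""
    try:
        return VARIANT_STEPS[variant]
    except KeyError:
        raise ValueError(f"unknown variant: {variant!r}") from None
```

Keeping the numbers in one table means a script can switch between the standard and Turbo versions without hard-coding step counts in several places.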
Baidu presents ERNIE-Image as an open 8B text-to-image model built for precise image generation, not just nice-looking art. It is said to do better at rendering text inside images, following prompts, and controlling layout than many image tools. That makes it a good fit for posters, infographics, comics, and similar work where the words and their placement matter a lot.
It supports English, Chinese, and Japanese prompts and comes with a built-in lightweight Prompt Enhancer. That helper can turn short prompts into fuller descriptions before the image is made.
Its main selling point is control. Baidu says the model is meant to handle prompts with many objects, relationships, and text layout better than many art-first models, which still struggle with readable words or organized scenes.
That makes it more useful for real design work, not just casual AI art.
The model also seems to handle different aspect ratios, with suggested sizes such as 1024×1024, 848×1264, and 1264×848.
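To make the size list concrete, here is a minimal sketch of picking the suggested size closest to a desired aspect ratio. It assumes the sizes above are written as width×height, and the names `SUGGESTED_SIZES` and `closest_size` are hypothetical helpers, not part of any ERNIE-Image API.

```python
import math

# Suggested sizes quoted in the text, assumed to be (width, height) in pixels.
SUGGESTED_SIZES = [(1024, 1024), (848, 1264), (1264, 848)]


def closest_size(target_width: int, target_height: int) -> tuple[int, int]:
    """Pick the suggested size whose aspect ratio is nearest the target's.

    Ratios are compared in log space so that portrait and landscape
    deviations of the same proportion are treated symmetrically.
    """
    if target_width <= 0 or target_height <= 0:
        raise ValueError("dimensions must be positive")
    target = math.log(target_width / target_height)
    return min(
        SUGGESTED_SIZES,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )
```

For example, a 1920×1080 landscape request maps to 1264×848 and a 1080×1920 portrait request maps to 848×1264. Sizes outside this short list may also work, but the text does not confirm that.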
Baidu is known around the world for search, but it has also worked on the ERNIE model family for years. More recently, it has shared open ERNIE projects such as ERNIE 4.5 and related work. ERNIE-Image therefore looks like part of a bigger move to make more of that model family public instead of keeping it behind closed products.

If you'd like to access this model, you can explore the following possibilities: