Da Vinci spent 16 years painting the Mona Lisa. He painted her lips, according to some, after 12 years.
The rumors that a slow Internet connection was to blame are false.
Still, Da Vinci, a polymath who in addition to painting pursued botany, engineering, science, sculpture, and geology, would likely have appreciated a new text-to-image generative vision transformer developed by Google Research.
According to a June 1 paper on the arXiv preprint server, Google’s StyleDrop lets users specify artistic styles they want included in the generated output.
In about three minutes, StyleDrop returns images that meet the user’s requirements.
In its paper, “StyleDrop: Text-to-Image Generation in Any Style,” Google stated, “The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects.”
StyleDrop can also generate typography that faithfully carries over the stylistic characteristics of an image.
Users could, for instance, ask for a bridge or a letter and then specify a drawing style, such as a melting golden rendering, a wooden sculpture, a three-dimensional rendering, or a cartoon drawing. The only limitation is one’s imagination.
StyleDrop will then produce impressive renderings, such as a dripping bridge reminiscent of Dali or a cartoon-like version, as well as letters with the same characteristics.
StyleDrop is built on Google’s Muse, a generative vision transformer that was unveiled earlier this year and offers a remarkable degree of photorealism. Muse was trained with three billion parameters to generate high-quality images.
Researchers evaluated the accuracy and quality of StyleDrop’s output using user feedback and the industry-standard CLIP text and style scores. According to those evaluations, StyleDrop “convincingly outperforms” other leading text-to-image approaches, such as DreamBooth, Imagen, and Stable Diffusion.
The creators of this program, which has not yet been made available to the general public, believe that it will be of great assistance to art directors and graphic designers in the creation of photorealistic imagery of particular products or themes as well as text that adheres to the same colors, structure, and style.
For a new product campaign, say for a new soda brand, an artist could describe in a few words a sleek glass bottle surrounded by thousands of tulips in a Dutch field. The accompanying text would feature letters made of 3D-rendered glass in the style of the Impressionist Monet. With the right words, a new advertising campaign with a warm, colorful skyscape could be created in three minutes.
The well-known typographer Helmut Schmid once said, “Typography needs to be felt. Typography needs to be experienced.” With the assistance of StyleDrop, designers may be able to bring a greater sense of intimacy and connection to their work.
However, copyright protection is acknowledged in the report as a concern.
The report stated, “We urge the responsible use of our technology and recognize potential pitfalls such as the ability to copy individual artists’ styles without their consent.”
And what instructions might Da Vinci have given StyleDrop? Draw an image of an alluring aristocrat, sort of grinning but not to an extreme, sitting outside with mountains in the background. Draw in the manner of… Da Vinci. Leonardo, who enjoyed botany, would have had plenty of time to go outside and smell the roses if the task had taken three minutes instead of 16 years.
More information: Kihyuk Sohn et al, StyleDrop: Text-to-Image Generation in Any Style, arXiv (2023). DOI: 10.48550/arxiv.2306.00983