Steps to reproduce:
- Use any Adobe Firefly image generation tool that accepts text as part of the input.
- Observe that the text prompt influences the generated image.
- Observe that CLIP and T5 were both trained on LAION-2B, C4, or similar web-scraped corpora, along with other datasets.
- Observe that the conditioning for Sensei also leverages the text encoder, which means that, under any sense of the word "train," strict or relaxed, Sensei's "image model" was "trained" with data from LAION-2B, C4, or similar (a sketch of this conditioning path follows this list).
- Since LAION and the other datasets behind CLIP, and the text corpora behind T5, are not exclusively sourced from Adobe Stock, and since there is no evidence that Adobe has developed its own text encoder of either architecture, it appears that the generated output depends on unlicensed content.
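For context, here is a minimal sketch of the conditioning path described above, assuming a Stable-Diffusion-style latent diffusion pipeline. Adobe has not published Firefly's architecture, so the model names below are illustrative stand-ins, not Firefly components. The point is structural: the text encoder ships with pretrained weights, and the image model cross-attends to its output embeddings at every denoising step, so the encoder's training data is inseparable from the generation process.

```python
# Sketch only: assumes a Stable-Diffusion-style pipeline. The checkpoint
# below is an illustrative stand-in, NOT a Firefly component.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

# The text encoder arrives pretrained; its weights already encode whatever
# web-scraped corpus it was trained on. Retraining the image model on a
# "clean" dataset does not change this component.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a watercolor fox in a misty forest"
tokens = tokenizer(
    prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    # These embeddings are what the diffusion U-Net cross-attends to at
    # every denoising step; the "image model" cannot run without them.
    text_embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(text_embeddings.shape)  # torch.Size([1, 77, 768]) for ViT-L/14
```

In this kind of pipeline, `text_embeddings` is passed directly into the image model's cross-attention layers, which is why the list above treats the text encoder's training data as part of the overall system's training data.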
> The first commercially released Firefly model was trained on Adobe Stock images, openly licensed content, and public domain content where copyright has expired and is designed to generate images safe for commercial use.
https://helpx.adobe.com/stock/contributor/help/firefly-faq-for-adobe-stock-contributors.html
> Trained on Adobe Stock images, openly licensed content, and public domain content, Firefly is designed to be safe for commercial use. To ensure that creators can benefit from generative AI, we have developed a compensation model for Adobe Stock contributors whose content is used in the dataset to retrain Firefly models.
https://www.adobe.com/sensei/generative-ai/firefly.html#faqs
> As part of Adobe’s effort to design Firefly to be commercially safe, we are training our initial commercial Firefly model on licensed content, such as Adobe Stock, and public domain content where copyright has expired.
https://www.adobe.com/sensei/generative-ai/firefly.html#faqs