Hello @Jovial_HeroB82C,
Adobe states here what data it uses:
However, there can also be concerns around copyright. To help address these concerns, Adobe trained Firefly on licensed images in Adobe Stock along with openly licensed content and public domain content where the copyright has expired.
Source: What is generative AI and how does it work?
They have also said:
Trained on Adobe Stock images, openly licensed content, and public domain content, Firefly is designed to be safe for commercial use. To ensure that creators can benefit from generative AI, we’ve developed a compensation model for Adobe Stock contributors whose content is used in the dataset to retrain Firefly models.
Source: The new Firefly. Now smarter than ever.
In the FAQ section of that same document:
As an Adobe customer, will I have copies of my content included as part of the Firefly model?
No, copies of customer content are not included in the Firefly models.
As an Adobe customer, will I have my content used automatically to train Firefly?
No. We do not train on any Creative Cloud subscribers’ personal content. For Adobe Stock contributors, the content is part of the Firefly training dataset, in accordance with Stock Contributor license agreements.
The video "How we trained the Firefly model" by Ely Greenfield, Chief Technology Officer, on the page "Welcome to Generation AI." also explains the source of the training content.
This document, released around the time Firefly was launched, said:
Generative AI, as with any AI, is only as good as the data on which it’s trained. Mitigating harmful outputs starts with building and training on safe and inclusive datasets. For example, Adobe’s first model in our Firefly family of creative generative AI models is trained on Adobe Stock images, openly licensed content, and public domain content where copyright has expired. Training on curated, diverse datasets inherently gives your model a competitive edge when it comes to producing commercially safe and ethical results.
Source: Responsible innovation in the age of generative AI.
They have also said:
Commercial Safety: When we build our own proprietary models, we can control what goes into the model. For example, today’s Firefly Image Generation model is designed to be commercially safe, by being trained on licensed content, such as Adobe Stock, and public domain content where copyright has expired, and customers have shown that this is a critical feature when they are building content for commercial consumption.
Source: Adobe’s approach to generative AI models & customer choice
They have also said:
Among the key concerns we have heard from our creative community are control over how their data is used and ensuring ownership of their work in the digital age. At Adobe, we trained the first model of our generative AI tool Adobe Firefly only on licensed images from our Adobe Stock collection, openly licensed content, and public domain content where copyright has expired. Beyond our own model, we are also committed to helping creators protect their work across the entire AI ecosystem.
Source: Building safe, secure, and trustworthy AI: Adobe’s commitments to our customers and community
And they have said:
1. Respect for the creative community.
Adobe has established the AI ethics principles of accountability, responsibility, and transparency. Firefly was created in a way that respects existing customers and aligns with Adobe values.
That's why Firefly is trained on licensed content, such as Adobe Stock, and public domain content where the copyright has expired. Firefly is designed to be commercially safe, and it’s available for enterprise-wide access.
Source: Adobe Firefly vs. DALL·E 3: Express your vision with the right AI art generator for you.
I hope this helps clarify what data is used to train the models.
My best,
droopy