Firefly needs a context buffer that allows a prompt to be continued without injecting a modified full prompt every time. This is entirely doable with current model architectures; it is routine in text generators such as ChatGPT, Copilot, and Llama. It also ties in with the repeatability problem: we can't generate multiple images of the same house, car, or character, but are instead presented each time with a random variation on the theme.
Firefly would become genuinely useful if it could retain an image and apply minor edits to it via text input.
As an example:
"A cartoon mouse with black ears and nose wearing white gloves and red pants."
[Firefly generates an image]
"Make the ears rounder."
[Firefly modifies the selected image]
"How about blue pants?"
etc.
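
Under the hood, this only requires keeping a small conversation state (the original prompt, the follow-up instructions, and a handle to the latest image) instead of resending a rewritten full prompt on every turn. The Python below is a minimal sketch of that pattern, purely for illustration; generate_image, edit_image, and ImageSession are hypothetical stand-ins, not part of any actual Firefly API.

    from dataclasses import dataclass, field

    def generate_image(prompt):
        # Placeholder for a text-to-image call; returns an opaque image handle.
        return {"prompt": prompt, "edits": []}

    def edit_image(image, instruction):
        # Placeholder for an image-edit call that modifies the retained image
        # rather than generating a new one from scratch.
        image["edits"].append(instruction)
        return image

    @dataclass
    class ImageSession:
        # Hypothetical context buffer: base prompt, accumulated edit
        # instructions, and the latest generated image.
        prompt: str
        edits: list = field(default_factory=list)
        latest_image: dict = None

        def start(self):
            # First turn: generate from the base prompt.
            self.latest_image = generate_image(self.prompt)
            return self.latest_image

        def refine(self, instruction):
            # Later turns: append the instruction to the buffer and edit the
            # retained image instead of re-injecting a rewritten full prompt.
            self.edits.append(instruction)
            self.latest_image = edit_image(self.latest_image, instruction)
            return self.latest_image

    # Usage mirroring the example above:
    # session = ImageSession("A cartoon mouse with black ears and nose "
    #                        "wearing white gloves and red pants.")
    # session.start()
    # session.refine("Make the ears rounder.")
    # session.refine("How about blue pants?")

The point of the sketch is only that the state lives in one place and each new instruction is an incremental edit against the retained image, which is exactly what the example dialogue above asks for.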