mflux-debugger #275
base: main
0557b44 to 455a2d0
562b9f8 to 5e64c82
64b2ef2 to 09d6187
@anthonywu This PR is obviously huge and I'm not expecting a full review here. However, I was thinking about one thing you added before: the prompt file support. That feels like an especially natural way to work with this model. Some background in case you haven't played around with this model yet: the FIBO model works with structured JSON prompts that can be quite large. For example, one of our tests uses the following:

```json
{
  "short_description": "A hyper-detailed, ultra-fluffy owl sitting in the trees at night, looking directly at the camera with wide, adorable, expressive eyes. Its feathers are soft and voluminous, catching the cool moonlight with subtle silver highlights. The owl's gaze is curious and full of charm, giving it a whimsical, storybook-like personality.",
  "objects": [
    {
      "description": "An adorable, fluffy owl with large, expressive eyes and soft, voluminous feathers. Its plumage is a mix of warm browns, grays, and subtle silver highlights from the moonlight.",
      "location": "center",
      "relationship": "The owl is the sole subject, perched comfortably within its environment.",
      "relative_size": "large within frame",
      "shape_and_color": "Round head, large eyes, bulky body, predominantly brown and grey with silver accents.",
      "texture": "Extremely soft, fluffy, and detailed feathers, giving a plush toy-like appearance.",
      "appearance_details": "The eyes are wide, dark, and reflective, conveying a sense of wonder and curiosity. The beak is small and light-colored, almost hidden by the feathers. Subtle silver highlights catch the moonlight on its feathers.",
      "orientation": "upright, facing forward"
    }
  ],
  "background_setting": "A dark, nocturnal forest setting with blurred trees and foliage, illuminated by a soft, cool moonlight. The background is out of focus, emphasizing the owl.",
  "lighting": {
    "conditions": "moonlight",
    "direction": "backlit and side-lit from the left",
    "shadows": "soft, diffused shadows on the right side of the owl and within the background foliage, indicating a single light source."
  },
  "aesthetics": {
    "composition": "centered, portrait composition",
    "color_scheme": "cool blues and silvers from the moonlight contrasting with warm browns and grays of the owl and forest.",
    "mood_atmosphere": "mysterious, enchanting, whimsical, and serene.",
    "aesthetic_score": "very high",
    "preference_score": "very high"
  },
  "photographic_characteristics": {
    "depth_of_field": "shallow",
    "focus": "sharp focus on the owl's face and eyes, with a soft blur in the background.",
    "camera_angle": "eye-level",
    "lens_focal_length": "portrait lens (e.g., 50mm-85mm)"
  },
  "style_medium": "digital illustration",
  "text_render": [],
  "context": "A whimsical character illustration, possibly for a children's book, animated film, or fantasy art collection.",
  "artistic_style": "fantasy, illustrative, detailed"
}
```

When the user wants to generate an image, they currently have 3 options:

(see VLM tests for examples) I can imagine the typical workflow mostly starting with b), but then, once you are semi-happy with the image, making more minor tweaks using a) directly, or c), and iterating. In these cases, I think it is very nice to have the full JSON saved to disk. Maybe for this model, we should always output the JSON prompt that was used? This was not really a question; it just came to mind as I'm wrapping up the core implementation and starting to think about the UI/UX for this model... If you have other ideas/opinions, let me know :)
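Making "minor tweaks directly" means editing one field of the JSON between runs. A minimal sketch of such an edit, assuming the prompt is plain nested JSON like the owl example; `set_field` and its dotted-path syntax are hypothetical helpers, not part of mflux:

```python
import copy
import json
from typing import Any


def set_field(prompt: dict[str, Any], dotted_path: str, value: Any) -> dict[str, Any]:
    """Return a copy of `prompt` with the field at `dotted_path` replaced.

    Hypothetical helper: integer segments index into lists, e.g. "objects.0.location".
    """
    updated = copy.deepcopy(prompt)  # leave the caller's prompt untouched
    *parents, last = dotted_path.split(".")
    node: Any = updated
    for key in parents:
        node = node[int(key)] if isinstance(node, list) else node[key]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
    return updated


prompt = json.loads('{"lighting": {"conditions": "moonlight"}, "objects": [{"location": "center"}]}')
tweaked = set_field(prompt, "lighting.conditions", "golden hour")
tweaked = set_field(tweaked, "objects.0.location", "left third")
print(json.dumps(tweaked, indent=2))
```

Copying before editing keeps the previous prompt available, which fits an iterate-and-compare workflow.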
Current design proposal (always have

```shell
# Step 1: Generate initial image
mflux-generate-fibo --prompt "dragon" --output dragon.png
# Creates: dragon.png + dragon.prompt.json

# Step 2: Refine the JSON prompt (manually, or as here, with the VLM-based refine command)
mflux-refine-fibo \
  --prompt-file dragon.prompt.json \
  --instructions "make the dragon blue and add wings" \
  --output dragon_refined.prompt.json

# Step 3: Generate image with refined prompt
mflux-generate-fibo \
  --prompt-file dragon_refined.prompt.json \
  --output dragon_refined.png
```

In this proposal,
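The "Creates: dragon.png + dragon.prompt.json" step implies a sidecar naming rule. A minimal sketch of that rule with `pathlib`, assuming the prompt file always sits next to the image and replaces the image extension with `.prompt.json`; the helper names are hypothetical, not mflux functions:

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def prompt_path_for(image_path: Path) -> Path:
    """Map e.g. out/dragon.png to out/dragon.prompt.json (same directory)."""
    return image_path.with_name(image_path.stem + ".prompt.json")


def save_prompt_sidecar(image_path: Path, prompt: dict[str, Any]) -> Path:
    """Write the exact JSON prompt used for `image_path` next to the image."""
    path = prompt_path_for(image_path)
    path.write_text(json.dumps(prompt, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path


def load_prompt_file(path: Path) -> dict[str, Any]:
    """Read a prompt file such as the one passed to --prompt-file."""
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip in a throwaway directory (the image file itself is not needed here).
with tempfile.TemporaryDirectory() as tmp:
    written = save_prompt_sidecar(Path(tmp) / "dragon.png", {"short_description": "A dragon"})
    written_name = written.name
    round_trip = load_prompt_file(written)
```

Building the name from `stem` keeps the rule predictable: `dragon.v2.png` maps to `dragon.v2.prompt.json`.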
Lots of great new work. I'll take a look soon.
This is exactly what dspy (and the pydantic models inside it) can help with! Have you digested #264? I think it's super relevant.
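Pydantic is a third-party dependency, so here is a stdlib-only sketch of the same idea using `dataclasses`: a typed, partial view of the structured prompt. Field names come from the owl example in this thread; which keys FIBO actually requires is an assumption, and `StructuredPrompt` and `Lighting` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Assumed required keys; the real FIBO schema may differ.
REQUIRED_KEYS = {"short_description", "background_setting", "lighting", "style_medium"}
MODELED_KEYS = REQUIRED_KEYS | {"objects"}


@dataclass
class Lighting:
    conditions: str
    direction: str
    shadows: str


@dataclass
class StructuredPrompt:
    """Partial typed view of a FIBO-style JSON prompt (hypothetical schema)."""

    short_description: str
    background_setting: str
    lighting: Lighting
    style_medium: str
    objects: list[dict[str, Any]] = field(default_factory=list)
    extra: dict[str, Any] = field(default_factory=dict)  # keys this sketch does not model

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> StructuredPrompt:
        missing = REQUIRED_KEYS - set(data)
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        return cls(
            short_description=data["short_description"],
            background_setting=data["background_setting"],
            lighting=Lighting(**data["lighting"]),  # raises TypeError on unexpected lighting keys
            style_medium=data["style_medium"],
            objects=list(data.get("objects", [])),
            extra={k: v for k, v in data.items() if k not in MODELED_KEYS},
        )


prompt = StructuredPrompt.from_dict({
    "short_description": "A fluffy owl in the trees at night",
    "background_setting": "A dark, nocturnal forest",
    "lighting": {"conditions": "moonlight", "direction": "backlit", "shadows": "soft"},
    "style_medium": "digital illustration",
    "artistic_style": "fantasy, illustrative, detailed",
})
```

With pydantic, the same shape would be a `BaseModel` subclass that validates field types on construction; this version only checks for missing keys.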
This smells like a good fit for a fluent interface at the SDK level or a

```python
# Prototyped fluent usage (proposal; FiboPipeline is not an existing class)
(
    FiboPipeline.from_prompt("dragon")
    .generate("dragon.png")  # Step 1: mflux-generate-fibo --prompt "dragon" --output dragon.png
    .refine(instructions="make the dragon blue and add wings",
            output_prompt="dragon_refined.prompt.json")  # Step 2: mflux-refine-fibo ...
    .generate("dragon_refined.png")  # Step 3: mflux-generate-fibo --prompt-file dragon_refined.prompt.json ...
)

(
    FiboPipeline.from_prompt_file("existing.prompt.json")
    .refine(instructions="make the dragon blue and add wings", output_prompt="dragon_refined.prompt.json")
    .generate("dragon_refined.png")
)
```

This is going to be faster than a shell-based method where each invocation loads the model from disk. It's not prohibitive, but it feels wrong to load the same weights over and over.

```shell
echo "dragon" \
  | fibo-prompt \
  | tee dragon.prompt.json \
  | fibo-refine --instructions "make the dragon blue and add wings" \
  | tee dragon_refined.prompt.json \
  | fibo-render --output dragon_refined.png
```

I expect that showing these prototyped usages to Gemini 3, Composer, or Codex 5.1 can get you really far.
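To make the load-once argument concrete, here is a toy sketch of the proposed fluent interface in which every chained step reuses a single model object. `FiboPipeline` is a hypothetical stand-in for the prototyped API, `StubModel` replaces the real weights, and no images or files are actually written:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class StubModel:
    """Stand-in for the real weights; counts how many times it is constructed."""

    loads = 0

    def __init__(self) -> None:
        StubModel.loads += 1  # a real model would read weights from disk here


@dataclass
class FiboPipeline:
    """Hypothetical fluent wrapper: each step returns self and shares one model."""

    prompt: dict[str, Any]
    model: StubModel = field(default_factory=StubModel)
    outputs: list[str] = field(default_factory=list)

    @classmethod
    def from_prompt(cls, text: str) -> FiboPipeline:
        # Placeholder: the real step would expand `text` into a full JSON prompt.
        return cls(prompt={"short_description": text})

    def refine(self, instructions: str, output_prompt: str | None = None) -> FiboPipeline:
        # Placeholder for the VLM refine step: just record the instruction.
        history = [*self.prompt.get("refinements", []), instructions]
        self.prompt = {**self.prompt, "refinements": history}
        if output_prompt is not None:
            self.outputs.append(output_prompt)
        return self

    def generate(self, output: str) -> FiboPipeline:
        # Placeholder for rendering with self.model; just record the target path.
        self.outputs.append(output)
        return self


pipeline = (
    FiboPipeline.from_prompt("dragon")
    .generate("dragon.png")
    .refine(instructions="make the dragon blue and add wings",
            output_prompt="dragon_refined.prompt.json")
    .generate("dragon_refined.png")
)
```

Three chained steps construct the model once, whereas three separate CLI invocations would each pay the load cost.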
anthonywu
left a comment
Looks like a big change, but you were able to isolate it into submodules, so the risk to the pre-existing project appears minimal.
Trust the linter, trust the tests, and you should ship whenever you feel comfortable. No blocking concerns on my part!
```python
max_tokens=4096,
stop=["<|im_end|>", "<|end_of_text|>"],
task="refine",
structured_prompt='{"short_description":"A hyper-detailed, ultra-fluffy owl sitting in the trees at night, looking directly at the camera with wide, adorable, expressive eyes. Its feathers are soft and voluminous, catching the cool moonlight with subtle silver highlights. The owl\'s gaze is curious and full of charm, giving it a whimsical, storybook-like personality.","objects":[{"description":"An adorable, fluffy owl with large, expressive eyes and soft, voluminous feathers. Its plumage is a mix of warm browns, grays, and subtle silver highlights from the moonlight.","location":"center","relationship":"The owl is the sole subject, perched comfortably within its environment.","relative_size":"large within frame","shape_and_color":"Round head, large eyes, bulky body, predominantly brown and grey with silver accents.","texture":"Extremely soft, fluffy, and detailed feathers, giving a plush toy-like appearance.","appearance_details":"The eyes are wide, dark, and reflective, conveying a sense of wonder and curiosity. The beak is small and light-colored, almost hidden by the feathers. Subtle silver highlights catch the moonlight on its feathers.","orientation":"upright, facing forward"}],"background_setting":"A dark, nocturnal forest setting with blurred trees and foliage, illuminated by a soft, cool moonlight. The background is out of focus, emphasizing the owl.","lighting":{"conditions":"moonlight","direction":"backlit and side-lit from the left","shadows":"soft, diffused shadows on the right side of the owl and within the background foliage, indicating a single light source."},"aesthetics":{"composition":"centered, portrait composition","color_scheme":"cool blues and silvers from the moonlight contrasting with warm browns and grays of the owl and forest.","mood_atmosphere":"mysterious, enchanting, whimsical, and serene.","aesthetic_score":"very high","preference_score":"very high"},"photographic_characteristics":{"depth_of_field":"shallow","focus":"sharp focus on the owl\'s face and eyes, with a soft blur in the background.","camera_angle":"eye-level","lens_focal_length":"portrait lens (e.g., 50mm-85mm)"},"style_medium":"digital illustration","text_render":[],"context":"A whimsical character illustration, possibly for a children\'s book, animated film, or fantasy art collection.","artistic_style":"fantasy, illustrative, detailed"}',
```
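The `stop` list above presumably ends decoding at either chat-template token. A minimal sketch of the usual stop-string semantics (cut at the earliest match, drop the stop string itself), assuming that is what the generator does; `truncate_at_stop` is a hypothetical post-hoc helper, whereas a real generator would check for stop strings incrementally while decoding:

```python
def truncate_at_stop(text: str, stop: list[str]) -> str:
    """Return `text` cut at the earliest occurrence of any non-empty stop string."""
    cut = len(text)
    for token in stop:
        if not token:
            continue  # an empty stop string would otherwise match at index 0
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = '{"style_medium":"digital illustration"}<|im_end|>\n<|end_of_text|>'
cleaned = truncate_at_stop(raw, ["<|im_end|>", "<|end_of_text|>"])
```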
Style-wise, I'd put these long prompt structures in a test data file and read them in, so future diffs stay clean.
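Following that suggestion, a minimal sketch of reading the long prompt from a data file instead of inlining it. The file layout (`<name>.prompt.json` in a data directory) and `load_prompt_fixture` are assumptions; compact separators reproduce the single-line form the test passes as `structured_prompt`:

```python
import json
import tempfile
from pathlib import Path


def load_prompt_fixture(name: str, data_dir: Path) -> str:
    """Read `<data_dir>/<name>.prompt.json` and return it as compact single-line JSON."""
    prompt = json.loads((data_dir / f"{name}.prompt.json").read_text(encoding="utf-8"))
    return json.dumps(prompt, separators=(",", ":"), ensure_ascii=False)


# Demo: a pretty-printed fixture file becomes one compact string.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "owl.prompt.json").write_text(
        json.dumps({"style_medium": "digital illustration", "text_render": []}, indent=2),
        encoding="utf-8",
    )
    structured_prompt = load_prompt_fixture("owl", data_dir)

print(structured_prompt)
```

Because the fixture is pretty-printed on disk, a later prompt edit shows up as a small line-level diff instead of a change to one very long string.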
Agreed, that would be cleaner! I will do some slight refactoring of the tests once everything is in place.
Great reminder. I haven't yet, but will do it soon!
fbcd955 to 3c1bd03
3c1bd03 to c03ac60
cf86b32 to b01427d
7bc05f2 to 2ac3af2
2ac3af2 to de8c131

- feat: add mflux-debugger with rules, dependencies, and 0.14.2dev0 bump (c0ebf242)
- clean (203b3fbf)

ffeadfc to b311a8d
No description provided.