Conversation

@filipstrand filipstrand commented Nov 11, 2025

No description provided.

@filipstrand filipstrand self-assigned this Nov 11, 2025
@filipstrand filipstrand changed the title from Mflux debugger to mflux-debugger Nov 11, 2025
@filipstrand filipstrand force-pushed the mflux-debugger branch 4 times, most recently from 562b9f8 to 5e64c82 on November 13, 2025 00:12
@filipstrand filipstrand changed the title from mflux-debugger to mflux-debugger + Bria FIBO Nov 13, 2025
@filipstrand filipstrand marked this pull request as draft November 14, 2025 10:01
@filipstrand filipstrand force-pushed the mflux-debugger branch 2 times, most recently from 64b2ef2 to 09d6187 on November 18, 2025 17:27
filipstrand commented Nov 20, 2025

@anthonywu This PR is obviously huge and I'm not expecting a full review here. However, I was thinking about one thing you added before: the prompt file support. That feels like an especially natural way to work with this model.

Some background in case you haven't played around with this model yet:

The FIBO model works with structured JSON prompts that can be quite large; for example, one of our tests uses the following one:

{
  "short_description": "A hyper-detailed, ultra-fluffy owl sitting in the trees at night, looking directly at the camera with wide, adorable, expressive eyes. Its feathers are soft and voluminous, catching the cool moonlight with subtle silver highlights. The owl's gaze is curious and full of charm, giving it a whimsical, storybook-like personality.",
  "objects": [
    {
      "description": "An adorable, fluffy owl with large, expressive eyes and soft, voluminous feathers. Its plumage is a mix of warm browns, grays, and subtle silver highlights from the moonlight.",
      "location": "center",
      "relationship": "The owl is the sole subject, perched comfortably within its environment.",
      "relative_size": "large within frame",
      "shape_and_color": "Round head, large eyes, bulky body, predominantly brown and grey with silver accents.",
      "texture": "Extremely soft, fluffy, and detailed feathers, giving a plush toy-like appearance.",
      "appearance_details": "The eyes are wide, dark, and reflective, conveying a sense of wonder and curiosity. The beak is small and light-colored, almost hidden by the feathers. Subtle silver highlights catch the moonlight on its feathers.",
      "orientation": "upright, facing forward"
    }
  ],
  "background_setting": "A dark, nocturnal forest setting with blurred trees and foliage, illuminated by a soft, cool moonlight. The background is out of focus, emphasizing the owl.",
  "lighting": {
    "conditions": "moonlight",
    "direction": "backlit and side-lit from the left",
    "shadows": "soft, diffused shadows on the right side of the owl and within the background foliage, indicating a single light source."
  },
  "aesthetics": {
    "composition": "centered, portrait composition",
    "color_scheme": "cool blues and silvers from the moonlight contrasting with warm browns and grays of the owl and forest.",
    "mood_atmosphere": "mysterious, enchanting, whimsical, and serene.",
    "aesthetic_score": "very high",
    "preference_score": "very high"
  },
  "photographic_characteristics": {
    "depth_of_field": "shallow",
    "focus": "sharp focus on the owl's face and eyes, with a soft blur in the background.",
    "camera_angle": "eye-level",
    "lens_focal_length": "portrait lens (e.g., 50mm-85mm)"
  },
  "style_medium": "digital illustration",
  "text_render": [],
  "context": "A whimsical character illustration, possibly for a children's book, animated film, or fantasy art collection.",
  "artistic_style": "fantasy, illustrative, detailed"
}

When the user wants to generate an image, they currently have 3 options:

  • a) Fully manual edit: {complex user JSON prompt} ---> {JSON prompt for model}
  • b) Go through the fine-tuned FIBO VLM (GENERATE): {simple user prompt} ----> {VLM expands to structured JSON} --> {JSON prompt for model}
  • c) Go through the fine-tuned FIBO VLM for edits (REFINE): {complex JSON + simple user edit prompt} ----> {VLM expands to structured JSON} --> {JSON prompt for model}

(see VLM tests for examples)

I can imagine the typical workflow being mostly to start with b), but then, once you are semi-happy with the image, do more minor tweaks using a) directly, or c), and iterate. In these cases, I think it is very nice to have the full JSON saved to disk. Maybe for this model, we should always output the JSON prompt that was used?

This was not really a question; I just came to think of this as I'm wrapping up the core implementation and starting to think about the UI/UX for this model... If you have other ideas/opinions, let me know :)

filipstrand commented Nov 20, 2025

Current design proposal (always have mflux-generate-fibo save a prompt.json alongside the generated image): the JSON can then be refined later, either manually or "VLM-style", like so:

# Step 1: Generate initial image
mflux-generate-fibo --prompt "dragon" --output dragon.png
# Creates: dragon.png + dragon.prompt.json

# Step 2: Refine the JSON prompt  (manually, or as here, with the VLM-based refine command)
mflux-refine-fibo \
  --prompt-file dragon.prompt.json \
  --instructions "make the dragon blue and add wings" \
  --output dragon_refined.prompt.json

# Step 3: Generate image with refined prompt
mflux-generate-fibo \
  --prompt-file dragon_refined.prompt.json \
  --output dragon_refined.png

In this proposal, mflux-refine-fibo only works with text (text in -> text out). Of course, we could have it generate a new image, but maybe it is better to keep it more limited in scope and let the generate command be the only one that actually generates the image... It feels a bit more composable this way.
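
For illustration, the fully manual path (option a) against the saved sidecar file could be as small as a few lines of stdlib Python; the file names follow the proposal above and the edited field is purely illustrative:

# Minimal sketch of a manual (option a) tweak to the saved prompt JSON.
# File names follow the proposal above; the edited field is illustrative only.
import json
from pathlib import Path

prompt = json.loads(Path("dragon.prompt.json").read_text())
prompt["objects"][0]["shape_and_color"] = "blue scales with silver accents"
Path("dragon_manual.prompt.json").write_text(json.dumps(prompt, indent=2))
# Then: mflux-generate-fibo --prompt-file dragon_manual.prompt.json --output dragon_manual.png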

@anthonywu (Collaborator):

Lots of great new work. I'll take a look soon.

@anthonywu (Collaborator):

The FIBO model works with structured JSON prompts

This is exactly what dspy (and pydantic models inside) can help with! Have you digested #264? I think super relevant.
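
As a rough sketch of what that could look like, here are pydantic models mirroring the example prompt above (field names are copied from that example; this is not the actual schema in the PR or in Bria's repo):

# Rough sketch only: pydantic models mirroring the example FIBO prompt above.
# Field names are copied from that example; the real schema may differ.
from pydantic import BaseModel


class FiboObject(BaseModel):
    description: str
    location: str
    relationship: str
    relative_size: str
    shape_and_color: str
    texture: str
    appearance_details: str
    orientation: str


class FiboLighting(BaseModel):
    conditions: str
    direction: str
    shadows: str


class FiboPrompt(BaseModel):
    short_description: str
    objects: list[FiboObject]
    background_setting: str
    lighting: FiboLighting
    style_medium: str
    artistic_style: str
    # aesthetics, photographic_characteristics, text_render, context omitted for brevity


# A hand-edited prompt file could then be validated before generation, e.g.:
# FiboPrompt.model_validate_json(Path("dragon.prompt.json").read_text())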

@anthonywu (Collaborator):

# Step 1: Generate initial image
mflux-generate-fibo --prompt "dragon" --output dragon.png
# Creates: dragon.png + dragon.prompt.json

# Step 2: Refine the JSON prompt  (manually, or as here, with the VLM-based refine command)
mflux-refine-fibo \
  --prompt-file dragon.prompt.json \
  --instructions "make the dragon blue and add wings" \
  --output dragon_refined.prompt.json

# Step 3: Generate image with refined prompt
mflux-generate-fibo \
  --prompt-file dragon_refined.prompt.json \
  --output dragon_refined.png

This smells like a good fit for a fluent interface at the SDK level or a pipe (|) interface at the CLI level.

(
    FiboPipeline.from_prompt("dragon")
    .generate("dragon.png")  # Step 1: mflux-generate-fibo --prompt "dragon" --output dragon.png
    .refine(instructions="make the dragon blue and add wings",
            output_prompt="dragon_refined.prompt.json")  # Step 2: mflux-refine-fibo ...
    .generate("dragon_refined.png")  # Step 3: mflux-generate-fibo --prompt-file dragon_refined.prompt.json ...
)
(
    FiboPipeline.from_prompt_file("existing.prompt.json")
    .refine(instructions="make the dragon blue and add wings", output_prompt="dragon_refined.prompt.json")
    .generate("dragon_refined.png")
)

This is going to be faster than a shell-based method where each invocation loads the model from disk; it's not prohibitive, but it feels wrong to load the same weights over and over.

echo "dragon" \
  | fibo-prompt \
  | tee dragon.prompt.json \
  | fibo-refine --instructions "make the dragon blue and add wings" \
  | tee dragon_refined.prompt.json \
  | fibo-render --output dragon_refined.png

I expect that showing these prototyped usages to Gemini 3 or Composer or Codex 5.1 can get you really far.

@anthonywu anthonywu left a comment


Looks like a big change that you've been able to isolate into sub-modules, so the risk to the pre-existing project appears minimal.

Trust the linter, trust the tests, and you should ship whenever you feel comfortable. No blocking concerns on my part!

max_tokens=4096,
stop=["<|im_end|>", "<|end_of_text|>"],
task="refine",
structured_prompt='{"short_description":"A hyper-detailed, ultra-fluffy owl sitting in the trees at night, looking directly at the camera with wide, adorable, expressive eyes. Its feathers are soft and voluminous, catching the cool moonlight with subtle silver highlights. The owl\'s gaze is curious and full of charm, giving it a whimsical, storybook-like personality.","objects":[{"description":"An adorable, fluffy owl with large, expressive eyes and soft, voluminous feathers. Its plumage is a mix of warm browns, grays, and subtle silver highlights from the moonlight.","location":"center","relationship":"The owl is the sole subject, perched comfortably within its environment.","relative_size":"large within frame","shape_and_color":"Round head, large eyes, bulky body, predominantly brown and grey with silver accents.","texture":"Extremely soft, fluffy, and detailed feathers, giving a plush toy-like appearance.","appearance_details":"The eyes are wide, dark, and reflective, conveying a sense of wonder and curiosity. The beak is small and light-colored, almost hidden by the feathers. Subtle silver highlights catch the moonlight on its feathers.","orientation":"upright, facing forward"}],"background_setting":"A dark, nocturnal forest setting with blurred trees and foliage, illuminated by a soft, cool moonlight. The background is out of focus, emphasizing the owl.","lighting":{"conditions":"moonlight","direction":"backlit and side-lit from the left","shadows":"soft, diffused shadows on the right side of the owl and within the background foliage, indicating a single light source."},"aesthetics":{"composition":"centered, portrait composition","color_scheme":"cool blues and silvers from the moonlight contrasting with warm browns and grays of the owl and forest.","mood_atmosphere":"mysterious, enchanting, whimsical, and serene.","aesthetic_score":"very high","preference_score":"very high"},"photographic_characteristics":{"depth_of_field":"shallow","focus":"sharp focus on the owl\'s face and eyes, with a soft blur in the background.","camera_angle":"eye-level","lens_focal_length":"portrait lens (e.g., 50mm-85mm)"},"style_medium":"digital illustration","text_render":[],"context":"A whimsical character illustration, possibly for a children\'s book, animated film, or fantasy art collection.","artistic_style":"fantasy, illustrative, detailed"}',
@anthonywu anthonywu Nov 20, 2025


Style-wise, I'd put these long prompt structures in a test data file and read them in, so future edit diffs look clean.
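
Something along these lines, maybe (the directory layout and fixture name here are made up):

# Hypothetical test helper: keep the long structured prompt in a data file
# next to the tests instead of inlining it. Path and fixture name are made up.
from pathlib import Path

DATA_DIR = Path(__file__).parent / "data"


def load_structured_prompt(name: str) -> str:
    return (DATA_DIR / f"{name}.json").read_text()


# e.g. structured_prompt=load_structured_prompt("fluffy_owl") in the test above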

@filipstrand (Owner, Author):

Agree, that would be cleaner! I will do some slight refactoring of the tests once everything is in place

@filipstrand (Owner, Author):

This is exactly what dspy (and pydantic models inside) can help with! Have you digested #264? I think super relevant.

Great reminder, I haven't yet but will do it soon!

@filipstrand filipstrand changed the title from mflux-debugger + Bria FIBO to mflux-debugger Nov 21, 2025
@anthonywu anthonywu mentioned this pull request Nov 21, 2025
@filipstrand filipstrand force-pushed the mflux-debugger branch 3 times, most recently from cf86b32 to b01427d on November 28, 2025 11:44
@filipstrand filipstrand force-pushed the mflux-debugger branch 2 times, most recently from 7bc05f2 to 2ac3af2 on December 5, 2025 11:30
- feat: add mflux-debugger with rules, dependencies, and 0.14.2dev0 bump (c0ebf242)
- clean (203b3fbf)