So, is anyone working on Remix mode for SD? #4595
33 comments · 58 replies
-
Could you please explain what Remix is in the context of SD? Thanks!
-
Yeah, I've been thinking about this concept for months. On the other hand, you can already mix prompts with | or AND, even several at once. It's not exactly the same, but it works similarly, especially since MJ makes a fairly random mix, not too different from mixing two prompt descriptions (if that isn't exactly what it's doing). See the syntax example below.
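For anyone unfamiliar with it, the webui's composable-diffusion `AND` syntax looks like this (the prompts and weights here are just placeholders):

```
a renaissance portrait of a woman :1.2 AND a watercolor painting of a lion :0.8
```

As I understand it, each `AND` clause gets its own denoising prediction and the weighted predictions are combined at every sampling step, which mixes concepts more deeply than concatenating everything into one prompt.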
-
Now that I think about it, isn't it similar to "MagicMix"? See the feature request and some explanations here: #4538
-
I just made some progress on this. CLIP allows it, and there is one available SD finetune with the text CLIP encoder replaced by an image CLIP encoder: https://github.com/justinpinkney/stable-diffusion. The main reason I think it works the way I described is that MJ is able to preserve the exact facial features present in BOTH images. The people in those images never existed in the training dataset, so it would be impossible to accurately recreate them from a text prompt. It is possible with textual inversion, but they are obviously not training a TI for every single query.
UPD: just blending the initial latents might be enough; I should check this out. Apparently MagicMix works fine without image guidance.
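To make "blending the initial latents" concrete, here is a minimal diffusers sketch of one way to try it (the model ID, the 50/50 blend weight, and the img2img strength are all assumptions to experiment with, not MJ's recipe):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def to_latent(image):
    # Encode an image into the VAE latent space.
    x = pipe.image_processor.preprocess(image).to("cuda", torch.float16)
    return pipe.vae.encode(x).latent_dist.mean * pipe.vae.config.scaling_factor

img_a = load_image("face_a.png").resize((512, 512))
img_b = load_image("face_b.png").resize((512, 512))

# Naive 50/50 blend of the two image latents.
blended = 0.5 * to_latent(img_a) + 0.5 * to_latent(img_b)

with torch.no_grad():
    decoded = pipe.vae.decode(blended / pipe.vae.config.scaling_factor).sample
blended_image = pipe.image_processor.postprocess(decoded)[0]

# Let img2img re-denoise the blend; strength controls how much gets redrawn.
result = pipe(prompt="a portrait photo", image=blended_image,
              strength=0.6).images[0]
result.save("latent_blend.png")
```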
-
Hello again. I have only one normal-looking result from blending latents and embeddings (at the start, at the end, and throughout the process), and it is in the previous post. Using every single thing I listed above, the best I got is: So guiding the UNet with image embeddings proved to be roughly the same as guiding it with text embeddings.
How and why it may be TI: with TI we can capture precise concepts of both pictures and train on both of them, treating them as one subject. So I think using a higher learning rate and less training can still deliver the required result. At first I thought the maximum embedding size was preferable for this, and the images at the very bottom were made with a TI of size 75 tokens. It seems that no matter what, there's a sweet spot where it is more or less stable and the images are indeed merged: I believe MJ found this sweet spot, because what I got from this is infinitely better than anything produced by any other method mentioned before.
But there's more. MJ keeps the layout of the images and always provides 4 ALMOST identical pictures. There is no way you can get 4 pictures that similar from the same prompt/embedding/whatever with different seeds in SD. Plus, if you switch the image order in the prompt, the resulting image grid is transposed.
And guys, I tried. I really tried to shove it all into one new script: train a TI from 2 images without any manual preparation, then take the noise finder from img2img alt, find the noise, apply slerp, and run txt2img (a generic sketch of the slerp step is below). But I cannot understand anything anymore: multiple models supported in one codebase, endless dummy appendages I need to add, edge cases, situations I cannot resolve. To get your attention, my results with a trained TI (300 steps and 0.02 lr): MJ results: Other possibilities: https://github.com/cloneofsimo/lora
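For anyone who wants to reproduce the slerp step, this is the standard spherical interpolation used for blending noise/latent tensors (a generic version, not the exact code from img2img alt):

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor,
          eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two noise/latent tensors.

    Unlike a linear blend, slerp keeps the result at roughly the same
    magnitude as the inputs, which matters because SD expects its
    initial noise to look like a unit Gaussian sample.
    """
    v0_flat, v1_flat = v0.flatten().float(), v1.flatten().float()
    cos_theta = torch.dot(v0_flat, v1_flat) / (
        v0_flat.norm() * v1_flat.norm() + eps
    )
    # Fall back to lerp when the vectors are nearly parallel.
    if cos_theta.abs() > 0.9995:
        return (1 - t) * v0 + t * v1
    theta = torch.acos(cos_theta.clamp(-1, 1))
    return (torch.sin((1 - t) * theta) * v0
            + torch.sin(t * theta) * v1) / torch.sin(theta)

# e.g. blend the recovered noises of two images before running txt2img:
# noise = slerp(0.5, noise_a, noise_b)
```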
-
Hi, any news on ReMix mode for SD? All the work you've put into it shouldn't go to waste.
-
Hi, I am also interested in figuring out how we can emulate the Remix feature in SD! I spent a few hours yesterday trying various approaches, unfortunately with limited success.
Let me ask you this: how long does it take MJ to process a remix? Is it any longer than normal inference? If not, I think some sort of latent interpolation must be the key, not additional training. One cheap experiment along these lines is sketched below.
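For example, interpolating the CLIP text embeddings of two descriptions is training-free and adds essentially nothing to inference time (my own guess at what such interpolation could look like, not MJ's method; the prompts are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def embed(prompt: str) -> torch.Tensor:
    # Standard CLIP text encoding, padded to the usual 77 tokens.
    ids = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    ).input_ids.to("cuda")
    return pipe.text_encoder(ids)[0]

e_a = embed("the Mona Lisa by Leonardo da Vinci")
e_b = embed("a portrait photo of Shrek")

# Interpolate the conditioning instead of concatenating prompts;
# the 0.5 mix ratio is a free parameter worth sweeping.
mixed = 0.5 * e_a + 0.5 * e_b

image = pipe(prompt_embeds=mixed, num_inference_steps=30).images[0]
image.save("mixed.png")
```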
-
Okay, I've made a few interesting discoveries. Check this out: Other results with similar settings: Not bad, right? Here are some important takeaways:
Here's a different example (I'm sure it would look even better with careful prompting): While I'm almost certain there's more to MJ's approach, I think this is a step in the right direction.
-
You know, I'm actually getting some halfway decent results with interpolation: However, it doesn't work nearly as well if I reverse the images. It looks like MJ is prioritizing the composition of the Mona Lisa, which I haven't been able to do quite yet. Also, the quality of my output is definitely a bit worse, but that might have more to do with the SD model itself.
-
CLIP interrogation can pick up on blurriness to a degree, yes. Here's what I got from your modified Shrek: "a close up of a person with a smile on their face, a picture, inspired by Choi Buk, reddit, the duke shrek, blurry footage, squinting at high noon, emote" It's not highly emphasized, though. Results from SD interpolation: https://i.ibb.co/WpdPKfG/screencapture-localhost-7860-2022-12-19-21-12-14.png
-
Possibly some guidance is applied at every denoising step, maybe something like CFG. And it seems there is latent interpolation (maybe of the initial noise), but I have never gotten good results from latent interpolation. Maybe their UNet takes image embeddings as input, as opposed to SD's text-only embeddings? A sketch of what per-step guidance from two conditionings might look like is below.
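To make the CFG speculation concrete, here is roughly what classifier-free guidance toward two conditionings at once would look like (pure speculation expressed against a diffusers-style UNet; the weights are made up):

```python
import torch

@torch.no_grad()
def guided_eps(unet, x_t, t, cond_a, cond_b, uncond, w_a=3.5, w_b=3.5):
    """CFG-style noise prediction pushed toward two embeddings at once.

    Standard CFG uses a single (cond - uncond) direction; here we add
    one direction per source image, composable-diffusion style.
    """
    eps_uncond = unet(x_t, t, encoder_hidden_states=uncond).sample
    eps_a = unet(x_t, t, encoder_hidden_states=cond_a).sample
    eps_b = unet(x_t, t, encoder_hidden_states=cond_b).sample
    return eps_uncond + w_a * (eps_a - eps_uncond) + w_b * (eps_b - eps_uncond)
```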
-
Any update on this?
-
https://www.reddit.com/r/StableDiffusion/comments/10ent88/guy_who_made_the_image_variations_model_is_making/
-
You can try this: https://huggingface.co/spaces/lambdalabs/image-mixer-demo
-
Either I'm a dumdum or the provided demo images are cherry-picked. The new demo outputs pretty much the same results I saw in my own experiments with this model. Still nowhere near MJ.
-
On 15 January 2023, MJ announced its new "blend" feature, which allows merging the "concept and feel" of up to 5 images. I wonder how this relates to ReMix... Results in the related Discord show-and-tell are quite consistent (for the posts showing the image prompts together with the end result).
-
Is anyone working on improving sd-remix?
-
I can't wait for a real implementation in SD. MJ is dominating right now; their results are so good.
-
Another fact to keep in mind is that in Midjourney you can mix together far more than two images. I'm pretty sure v3 allowed at least five images to be mixed to generate a whole new one.
-
I tried to replicate Remix mode in SD using ideas from this discussion. I used SD 2.1 unCLIP to work with image embeddings, diffused the content image to use as the initial latent, and averaged the image embeddings of the content and style images to guide diffusion. In case anyone is interested: https://github.com/unishift/stable-diffusion-remix
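The embedding-averaging half of that recipe can be sketched with diffusers roughly like this (my reading of the approach, not code from the repo; note the stock pipeline accepts either an image or precomputed image_embeds, so only the blend is passed):

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

content = load_image("content.png")
style = load_image("style.png")

@torch.no_grad()
def clip_image_embed(image):
    # The pipeline's own CLIP vision tower produces the embedding
    # that unCLIP conditions on.
    pixels = pipe.feature_extractor(images=image, return_tensors="pt").pixel_values
    return pipe.image_encoder(pixels.to("cuda", torch.float16)).image_embeds

# Average the content and style embeddings to guide diffusion.
mixed = 0.5 * clip_image_embed(content) + 0.5 * clip_image_embed(style)

result = pipe(prompt="", image_embeds=mixed).images[0]
result.save("remix.png")
```

Starting from a diffused version of the content image, as described above, would additionally mean building the initial `latents` argument yourself (VAE-encode the content image and add scheduler noise), since this pipeline otherwise starts from pure noise.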
-
There is a tool that has a "remix" function; check out https://nvwa.design/. Upload any image under the "Image" mode, upload another image in the apply step, and click generate. Even though it looks like it's made for interior design, the behavior is quite like "remix".
-
Another remix tool: ClipDrop's ReimagineXL by Stability AI: https://clipdrop.co/fr/stable-diffusion-reimagine
-
Is there any news?
-
A few days ago another tool appeared: https://www.artbreeder.com/create/mixer
-
https://github.com/taabata/LCM_Inpaint_Outpaint_Comfy#mix-images
-
I am no expert and cannot write it myself, but I think interrogation + noise reconstruction from img2img alt + prompt switching on every even step should do the trick, at least in a basic way. A rough sketch of the prompt-switching part is below.
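The prompt-switching part could look roughly like this hand-rolled denoising loop (interrogation and noise reconstruction are omitted; in the full idea the two prompts would come from interrogating the source images, and the starting latents from the recovered noise):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def embed(prompt):
    ids = pipe.tokenizer(prompt, padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         truncation=True, return_tensors="pt").input_ids
    return pipe.text_encoder(ids.to("cuda"))[0]

@torch.no_grad()
def alternating_txt2img(prompt_a, prompt_b, steps=30, guidance=7.5, seed=0):
    cond_a, cond_b, uncond = embed(prompt_a), embed(prompt_b), embed("")
    pipe.scheduler.set_timesteps(steps, device="cuda")
    gen = torch.Generator("cuda").manual_seed(seed)
    latents = torch.randn((1, 4, 64, 64), generator=gen, device="cuda",
                          dtype=torch.float16) * pipe.scheduler.init_noise_sigma
    for i, t in enumerate(pipe.scheduler.timesteps):
        # Alternate the conditioning: prompt A on even steps, B on odd.
        cond = cond_a if i % 2 == 0 else cond_b
        inp = pipe.scheduler.scale_model_input(torch.cat([latents] * 2), t)
        eps = pipe.unet(inp, t,
                        encoder_hidden_states=torch.cat([uncond, cond])).sample
        eps_u, eps_c = eps.chunk(2)
        eps = eps_u + guidance * (eps_c - eps_u)  # classifier-free guidance
        latents = pipe.scheduler.step(eps, t, latents).prev_sample
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    return pipe.image_processor.postprocess(image)[0]

result = alternating_txt2img("the Mona Lisa by Leonardo da Vinci",
                             "a portrait photo of Shrek")
result.save("alternating.png")
```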