So, is anyone working on Remix mode for SD? #4595
33 comments · 58 replies
-
Could you please explain what Remix is in the context of SD? Thanks!
-
Yeah, I've been thinking about this concept for months. On the other hand, you can already mix prompts with | or AND, even several at once. It's not exactly the same, but it works similarly, especially since MJ makes a fairly random mix, not too different from mixing two prompt descriptions (if that isn't exactly what it's doing). See the syntax example below.
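For anyone unfamiliar with it, the webui's composable-diffusion `AND` syntax looks like this (the prompts and weights here are just placeholders):

```
a renaissance portrait of a woman :1.2 AND a watercolor painting of a lion :0.8
```

As I understand it, each `AND` clause gets its own denoising prediction and the weighted predictions are combined at every sampling step, which mixes concepts more deeply than concatenating everything into one prompt.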
-
Now that I think about it, isn't it similar to "MagicMix"? See the feature request and some explanations here: #4538
-
I just made some progress on this. CLIP allows it, and there is one available SD finetune with the text CLIP encoder replaced by an image CLIP encoder: https://github.com/justinpinkney/stable-diffusion. The main reason I think it works the way I described is that MJ is able to preserve the exact facial features present in BOTH images. The people in those images never existed in the training dataset, so it would be impossible to accurately recreate them from a text prompt. It is possible with textual inversion, but they are obviously not training a TI for every single query.
UPD: just blending the initial latents might be enough; I should check this out. Apparently MagicMix works fine without image guidance.
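To make "blending the initial latents" concrete, here is a minimal diffusers sketch of one way to try it (the model ID, the 50/50 blend weight, and the img2img strength are all assumptions to experiment with, not MJ's recipe):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def to_latent(image):
    # Encode an image into the VAE latent space.
    x = pipe.image_processor.preprocess(image).to("cuda", torch.float16)
    return pipe.vae.encode(x).latent_dist.mean * pipe.vae.config.scaling_factor

img_a = load_image("face_a.png").resize((512, 512))
img_b = load_image("face_b.png").resize((512, 512))

# Naive 50/50 blend of the two image latents.
blended = 0.5 * to_latent(img_a) + 0.5 * to_latent(img_b)

with torch.no_grad():
    decoded = pipe.vae.decode(blended / pipe.vae.config.scaling_factor).sample
blended_image = pipe.image_processor.postprocess(decoded)[0]

# Let img2img re-denoise the blend; strength controls how much gets redrawn.
result = pipe(prompt="a portrait photo", image=blended_image,
              strength=0.6).images[0]
result.save("latent_blend.png")
```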
-
Hello again. I have only one normal-looking result from blending latents and embeddings (at the start, at the end, and throughout the process), and it is in the previous post. Using every single thing I listed above, the best I got is: So guiding the UNet with image embeddings proved to be roughly the same as guiding it with text embeddings.
How and why it may be TI: with TI we can capture precise concepts of both pictures and train on both of them, treating them as one subject. So I think using a higher learning rate and less training can still deliver the required result. At first I thought the maximum embedding size was preferable for this, and the images at the very bottom were made with a TI of size 75 tokens. It seems that no matter what, there's a sweet spot where it is more or less stable and the images are indeed merged: I believe MJ found this sweet spot, because what I got from this is infinitely better than anything produced by any other method mentioned before.
But there's more. MJ keeps the layout of the images and always provides 4 ALMOST identical pictures. There is no way you can get 4 pictures that similar from the same prompt/embedding/whatever with different seeds in SD. Plus, if you switch the image order in the prompt, the resulting image grid is transposed.
And guys, I tried. I really tried to shove it all into one new script: train a TI from 2 images without any manual preparation, then take the noise finder from img2img alt, find the noise, apply slerp, and run txt2img (a generic sketch of the slerp step is below). But I cannot understand anything anymore: multiple models supported in one codebase, endless dummy appendages I need to add, edge cases, situations I cannot resolve. To get your attention, my results with a trained TI (300 steps and 0.02 lr): MJ results: Other possibilities: https://github.com/cloneofsimo/lora
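For anyone who wants to reproduce the slerp step, this is the standard spherical interpolation used for blending noise/latent tensors (a generic version, not the exact code from img2img alt):

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor,
          eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two noise/latent tensors.

    Unlike a linear blend, slerp keeps the result at roughly the same
    magnitude as the inputs, which matters because SD expects its
    initial noise to look like a unit Gaussian sample.
    """
    v0_flat, v1_flat = v0.flatten().float(), v1.flatten().float()
    cos_theta = torch.dot(v0_flat, v1_flat) / (
        v0_flat.norm() * v1_flat.norm() + eps
    )
    # Fall back to lerp when the vectors are nearly parallel.
    if cos_theta.abs() > 0.9995:
        return (1 - t) * v0 + t * v1
    theta = torch.acos(cos_theta.clamp(-1, 1))
    return (torch.sin((1 - t) * theta) * v0
            + torch.sin(t * theta) * v1) / torch.sin(theta)

# e.g. blend the recovered noises of two images before running txt2img:
# noise = slerp(0.5, noise_a, noise_b)
```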
-
Hi, any news on ReMix mode for SD? All the work you've put into it shouldn't go to waste.
-
Hi, I am also interested in figuring out how we can emulate the Remix feature in SD! I spent a few hours yesterday trying various approaches, unfortunately with limited success.
Let me ask you this: how long does it take MJ to process a remix? Is it any longer than normal inference? If not, I think some sort of latent interpolation must be the key, not additional training. One cheap experiment along these lines is sketched below.
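For example, interpolating the CLIP text embeddings of two descriptions is training-free and adds essentially nothing to inference time (my own guess at what such interpolation could look like, not MJ's method; the prompts are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def embed(prompt: str) -> torch.Tensor:
    # Standard CLIP text encoding, padded to the usual 77 tokens.
    ids = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    ).input_ids.to("cuda")
    return pipe.text_encoder(ids)[0]

e_a = embed("the Mona Lisa by Leonardo da Vinci")
e_b = embed("a portrait photo of Shrek")

# Interpolate the conditioning instead of concatenating prompts;
# the 0.5 mix ratio is a free parameter worth sweeping.
mixed = 0.5 * e_a + 0.5 * e_b

image = pipe(prompt_embeds=mixed, num_inference_steps=30).images[0]
image.save("mixed.png")
```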
-
Okay, I've made a few interesting discoveries. Check this out: Other results with similar settings: Not bad, right? Here are some important takeaways:
Here's a different example (I'm sure it would look even better with careful prompting): While I'm almost certain there's more to MJ's approach, I think this is a step in the right direction.
-
You know, I'm actually getting some halfway decent results with interpolation: However, it doesn't work nearly as well if I reverse the images. It looks like MJ is prioritizing the composition of the Mona Lisa, which I haven't been able to do quite yet. Also, the quality of my output is definitely a bit worse, but that might have more to do with the SD model itself.
-
CLIP interrogation can pick up on blurriness to a degree, yes. Here's what I got from your modified Shrek: "a close up of a person with a smile on their face, a picture, inspired by Choi Buk, reddit, the duke shrek, blurry footage, squinting at high noon, emote" It's not highly emphasized, though. Results from SD interpolation: https://i.ibb.co/WpdPKfG/screencapture-localhost-7860-2022-12-19-21-12-14.png
-
Possibly some guidance is applied at every denoising step, maybe something like CFG. And it seems there is latent interpolation (maybe of the initial noise), but I have never gotten good results from latent interpolation. Maybe their UNet takes image embeddings as input, as opposed to SD's text-only embeddings? A sketch of what per-step guidance from two conditionings might look like is below.
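To make the CFG speculation concrete, here is roughly what classifier-free guidance toward two conditionings at once would look like (pure speculation expressed against a diffusers-style UNet; the weights are made up):

```python
import torch

@torch.no_grad()
def guided_eps(unet, x_t, t, cond_a, cond_b, uncond, w_a=3.5, w_b=3.5):
    """CFG-style noise prediction pushed toward two embeddings at once.

    Standard CFG uses a single (cond - uncond) direction; here we add
    one direction per source image, composable-diffusion style.
    """
    eps_uncond = unet(x_t, t, encoder_hidden_states=uncond).sample
    eps_a = unet(x_t, t, encoder_hidden_states=cond_a).sample
    eps_b = unet(x_t, t, encoder_hidden_states=cond_b).sample
    return eps_uncond + w_a * (eps_a - eps_uncond) + w_b * (eps_b - eps_uncond)
```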
-
Any update on this?
-
https://www.reddit.com/r/StableDiffusion/comments/10ent88/guy_who_made_the_image_variations_model_is_making/
-
You can try this: https://huggingface.co/spaces/lambdalabs/image-mixer-demo
-
Either I'm a dumdum or the provided demo images are cherry-picked. The new demo outputs pretty much the same results I saw in my own experiments with this model. Still nowhere near MJ.
-
On 15 January 2023, MJ announced its new "blend" feature, which allows merging the "concept and feel" of up to 5 images. I wonder how this relates to ReMix... Results in the related Discord show-and-tell are quite consistent (for the posts showing the image prompts together with the end result).
-
Is anyone working on improving sd-remix?
-
I can't wait for a real implementation in SD. MJ is dominating right now; their results are so good.
-
Another fact to keep in mind is that in Midjourney you can mix together far more than two images. I'm pretty sure v3 allowed at least five images to be mixed to generate a whole new one.
-
I tried to replicate Remix mode in SD using ideas from this discussion. I used SD 2.1 unCLIP to work with image embeddings, diffused the content image to use as the initial latent, and averaged the image embeddings of the content and style images to guide diffusion. In case anyone is interested: https://github.com/unishift/stable-diffusion-remix
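The embedding-averaging half of that recipe can be sketched with diffusers roughly like this (my reading of the approach, not code from the repo; note the stock pipeline accepts either an image or precomputed image_embeds, so only the blend is passed):

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

content = load_image("content.png")
style = load_image("style.png")

@torch.no_grad()
def clip_image_embed(image):
    # The pipeline's own CLIP vision tower produces the embedding
    # that unCLIP conditions on.
    pixels = pipe.feature_extractor(images=image, return_tensors="pt").pixel_values
    return pipe.image_encoder(pixels.to("cuda", torch.float16)).image_embeds

# Average the content and style embeddings to guide diffusion.
mixed = 0.5 * clip_image_embed(content) + 0.5 * clip_image_embed(style)

result = pipe(prompt="", image_embeds=mixed).images[0]
result.save("remix.png")
```

Starting from a diffused version of the content image, as described above, would additionally mean building the initial `latents` argument yourself (VAE-encode the content image and add scheduler noise), since this pipeline otherwise starts from pure noise.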
-
There is a tool that has a "remix" function; check out https://nvwa.design/. Upload any image under the "Image" mode, upload another image in the apply step, and click generate. Even though it looks like it's made for interior design, the behavior is quite like "remix".
-
Another remix tool: ClipDrop's ReimagineXL by Stability AI: https://clipdrop.co/fr/stable-diffusion-reimagine
-
Is there any news?
-
A few days ago another tool appeared: https://www.artbreeder.com/create/mixer
-
https://github.com/taabata/LCM_Inpaint_Outpaint_Comfy#mix-images
-
I am no expert and cannot write it myself, but I think interrogation + noise reconstruction from img2img alt + prompt switching on every even step should do the trick, at least in a basic way. A rough sketch of the prompt-switching part is below.
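The prompt-switching part could look roughly like this hand-rolled denoising loop (interrogation and noise reconstruction are omitted; in the full idea the two prompts would come from interrogating the source images, and the starting latents from the recovered noise):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def embed(prompt):
    ids = pipe.tokenizer(prompt, padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         truncation=True, return_tensors="pt").input_ids
    return pipe.text_encoder(ids.to("cuda"))[0]

@torch.no_grad()
def alternating_txt2img(prompt_a, prompt_b, steps=30, guidance=7.5, seed=0):
    cond_a, cond_b, uncond = embed(prompt_a), embed(prompt_b), embed("")
    pipe.scheduler.set_timesteps(steps, device="cuda")
    gen = torch.Generator("cuda").manual_seed(seed)
    latents = torch.randn((1, 4, 64, 64), generator=gen, device="cuda",
                          dtype=torch.float16) * pipe.scheduler.init_noise_sigma
    for i, t in enumerate(pipe.scheduler.timesteps):
        # Alternate the conditioning: prompt A on even steps, B on odd.
        cond = cond_a if i % 2 == 0 else cond_b
        inp = pipe.scheduler.scale_model_input(torch.cat([latents] * 2), t)
        eps = pipe.unet(inp, t,
                        encoder_hidden_states=torch.cat([uncond, cond])).sample
        eps_u, eps_c = eps.chunk(2)
        eps = eps_u + guidance * (eps_c - eps_u)  # classifier-free guidance
        latents = pipe.scheduler.step(eps, t, latents).prev_sample
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    return pipe.image_processor.postprocess(image)[0]

result = alternating_txt2img("the Mona Lisa by Leonardo da Vinci",
                             "a portrait photo of Shrek")
result.save("alternating.png")
```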