Some insights on framepack lora training. #380
xianienie started this conversation in Success Stories
So far, I have published 6 FramePack LoRAs. This is my homepage:
https://civitai.com/user/xianienie
Personally, I prefer a discrete_flow_shift value of 17. Through numerous experiments, I observed a strong positive correlation between discrete_flow_shift and the initial training loss, and a higher initial loss indicates greater potential for reduction. Using a higher discrete_flow_shift therefore makes the FramePack LoRA converge more quickly.
A higher discrete_flow_shift also causes the timesteps to shift backward (toward the final stages of generation).

[Image: discrete_flow_shift 3]
It is known that diffusion models tend to generate more details as the generation process approaches completion.

I find FramePack's default timestep shift is set too late. Consequently, the critical stages of image generation and the main actions are mostly completed in the final steps, resulting in poor detail quality.
FramePack defaults to a shift value of 7, whereas Wan suggests using 3. Wan's timestep progression is more gradual, which makes the generation process more consistent: the main actions are established in the earlier stages, leaving more opportunity to render excellent details in the later stages of generation.
[Image: discrete_flow_shift 7]
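
To make the backward shift concrete, here is a minimal sketch of the flow-matching shift mapping that discrete_flow_shift is commonly understood to apply (sigma' = s * sigma / (1 + (s - 1) * sigma), as in SD3-style schedules). This is only an illustration under that assumption, not FramePack's or the trainer's actual code.

```python
# Minimal sketch (not FramePack's or the trainer's actual code) of the
# flow-matching shift mapping commonly applied by discrete_flow_shift:
#     sigma' = s * sigma / (1 + (s - 1) * sigma)
# Higher s pushes the schedule toward high noise, so the low-noise phase,
# where content and details get resolved, is squeezed into the last steps.

def shift_sigma(sigma: float, s: float) -> float:
    return s * sigma / (1.0 + (s - 1.0) * sigma)

raw = [i / 10 for i in range(11)]            # uniform raw timesteps 0.0 .. 1.0
for s in (3.0, 7.0, 17.0):
    shifted = [round(shift_sigma(t, s), 3) for t in raw]
    print(f"shift={s:4}: {shifted}")

# At raw t = 0.5: shift=3 -> 0.75, shift=7 -> 0.875, shift=17 -> ~0.944.
# With a high shift, most steps sit near pure noise and the drop to low
# noise (where details form) happens only at the very end of sampling.
```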
At the same time, I found that after applying a LoRA in FramePack, the prompt becomes almost ineffective, and using multiple LoRAs simultaneously also does not work very well. I think this is likewise caused by the timesteps shifting backward.
About network dim: a higher network dim can store more information, whether it is useful or not, which to some extent speeds up training. This shows up as a lower training loss at the end, and the results are also better. For each cache, I believe a network dim of at least 10 is needed.
As for network alpha, setting it to 1/3 of the network dim works well in my experience. It means the optimal LoRA weight at generation time ends up around 1.
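
To see why alpha = dim/3 lets the generation-time LoRA weight stay around 1, here is a minimal sketch of a standard LoRA linear layer with the usual alpha/dim scaling; the class and variable names are illustrative, not the trainer's actual implementation. The learned update B(A(x)) is multiplied by alpha/dim, so alpha = dim/3 bakes in a fixed 1/3 scale and the user-facing multiplier can stay near 1.0.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = W x + weight * (alpha / dim) * B(A(x))."""

    def __init__(self, base: nn.Linear, dim: int = 32, alpha: float = 32 / 3):
        super().__init__()
        self.base = base
        self.down = nn.Linear(base.in_features, dim, bias=False)   # A
        self.up = nn.Linear(dim, base.out_features, bias=False)    # B
        nn.init.zeros_(self.up.weight)          # start as a no-op
        self.scale = alpha / dim                # e.g. (dim / 3) / dim = 1/3

    def forward(self, x: torch.Tensor, weight: float = 1.0) -> torch.Tensor:
        # With alpha = dim/3 the baked-in scale is 1/3, so the
        # generation-time multiplier `weight` can stay around 1.0.
        return self.base(x) + weight * self.scale * self.up(self.down(x))
```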
I prefer to use a learning rate of 4e-4. For shorter videos, such as those with only 1 cache, use 6e-4; for longer ones, use 2e-4.
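
As a compact summary, the sketch below packs these heuristics into one helper. The function name, the num_caches argument, the 10-dim-per-cache scaling, and the exact cache-count cutoffs for the learning rate are illustrative choices, not settings from any trainer; the numbers themselves (shift 17, dim of at least 10 per cache, alpha = dim/3, learning rates 6e-4 / 4e-4 / 2e-4) are the ones recommended above.

```python
def suggested_framepack_lora_config(num_caches: int) -> dict:
    """Heuristics from this post; not an official preset.

    `num_caches` is the number of cached latent sections in the training
    videos (illustrative name); cutoffs below are illustrative too.
    """
    network_dim = max(10 * num_caches, 10)      # one reading of "at least 10 dim per cache"
    if num_caches <= 1:
        lr = 6e-4                               # short videos (only 1 cache)
    elif num_caches <= 3:
        lr = 4e-4                               # usual default
    else:
        lr = 2e-4                               # longer videos
    return {
        "discrete_flow_shift": 17,
        "network_dim": network_dim,
        "network_alpha": network_dim / 3,       # keeps generation weight ~1; round as needed
        "learning_rate": lr,
    }

print(suggested_framepack_lora_config(1))
```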
Replies: 1 comment

Thank you, very valuable insights.