Chenlei Hu
4d55f16ae8
Use enum list for --fast options ( #7024 )
2025-03-01 02:37:35 -05:00
comfyanonymous
cf0b549d48
--fast now takes a number as an argument to indicate how fast you want it.
...
The idea is that you can indicate how much quality vs speed you want.
At the moment:
--fast 2 enables fp16 accumulation if your pytorch supports it.
--fast 5 enables fp8 matrix mult on fp8 models and the optimization above.
--fast without a number enables all optimizations.
2025-02-28 02:48:20 -05:00
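A minimal sketch of how tiered --fast levels like these can be parsed and gated, assuming argparse; the thresholds mirror the levels described in the commit message, but the variable names are illustrative rather than ComfyUI's actual code (and the enum-based commit above later replaces the numeric levels).

```python
# Illustrative only: tiered --fast parsing with argparse.
import argparse

parser = argparse.ArgumentParser()
# nargs="?" with a const makes a bare "--fast" mean "all optimizations",
# while "--fast 2" selects only the cheaper ones; omitting it disables all.
parser.add_argument("--fast", type=int, nargs="?", const=10, default=0)
args = parser.parse_args()

fp16_accumulation = args.fast >= 2  # small quality cost
fp8_matrix_mult = args.fast >= 5    # bigger speedup on fp8 models
```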
comfyanonymous
eb4543474b
Use fp16 as the intermediate dtype for fp8 weights with --fast, if supported.
2025-02-28 02:17:50 -05:00
comfyanonymous
1804397952
Use fp16 if checkpoint weights are fp16 and the model supports it.
2025-02-27 16:39:57 -05:00
comfyanonymous
f4dac8ab6f
Wan code small cleanup.
2025-02-27 07:22:42 -05:00
BiologicalExplosion
89253e9fe5
Support Cambricon MLU ( #6964 )
...
Co-authored-by: huzhan <huzhan@cambricon.com>
2025-02-26 20:45:13 -05:00
comfyanonymous
3ea3bc8546
Fix wan issues when prompt length is long.
2025-02-26 20:34:02 -05:00
comfyanonymous
0270a0b41c
Reduce artifacts on Wan by doing the patch embedding in fp32.
2025-02-26 16:59:26 -05:00
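The pattern behind this fix, as a hedged sketch: keep the patch embedding's weights in fp32 and cast its output back to the working dtype. The layer shape and names below are illustrative, not Wan's actual definition.

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Illustrative patch embedding pinned to fp32 to reduce artifacts."""
    def __init__(self, in_channels=16, dim=1536, patch_size=(1, 2, 2)):
        super().__init__()
        # Keep this layer in float32 regardless of the model's compute dtype.
        self.proj = nn.Conv3d(in_channels, dim, kernel_size=patch_size,
                              stride=patch_size, dtype=torch.float32)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out_dtype = x.dtype
        # Upcast the input, convolve in fp32, then return to the model dtype.
        return self.proj(x.float()).to(out_dtype)
```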
comfyanonymous
c37f15f98e
Add fast preview support for Wan models.
2025-02-26 08:56:23 -05:00
comfyanonymous
4bca7367f3
Don't try to use clip_fea on t2v model.
2025-02-26 08:38:09 -05:00
comfyanonymous
b6fefe686b
Better wan memory estimation.
2025-02-26 07:51:22 -05:00
comfyanonymous
fa62287f1f
More code reuse in wan.
...
Fix bug when changing the compute dtype on wan.
2025-02-26 05:22:29 -05:00
comfyanonymous
0844998db3
Slightly better wan i2v mask implementation.
2025-02-26 03:49:50 -05:00
comfyanonymous
4ced06b879
WIP support for Wan I2V model.
2025-02-26 01:49:43 -05:00
comfyanonymous
cb06e9669b
Wan seems to work with fp16.
2025-02-25 21:37:12 -05:00
comfyanonymous
9a66bb972d
Make wan work with all latent resolutions.
...
Cleanup some code.
2025-02-25 19:56:04 -05:00
comfyanonymous
ea0f939df3
Fix issue with wan and other attention implementations.
2025-02-25 19:13:39 -05:00
comfyanonymous
f37551c1d2
Change wan rope implementation to the flux one.
...
Should be more compatible.
2025-02-25 19:11:14 -05:00
comfyanonymous
63023011b9
WIP support for Wan t2v model.
2025-02-25 17:20:35 -05:00
comfyanonymous
f40076096e
Cleanup some lumina te code.
2025-02-25 04:10:26 -05:00
Jedrzej Kosinski
605893d3cf
Merge branch 'master' into worksplit-multigpu
2025-02-24 19:23:16 -06:00
comfyanonymous
96d891cb94
Speedup on some models by not upcasting bfloat16 to float32 on mac.
2025-02-24 05:41:32 -05:00
comfyanonymous
ace899e71a
Prioritize fp16 compute when using allow_fp16_accumulation
2025-02-23 04:45:54 -05:00
comfyanonymous
aff16532d4
Remove some useless code.
2025-02-22 04:45:14 -05:00
comfyanonymous
072db3bea6
Assume the mac black image bug won't be fixed before v16.
2025-02-21 20:24:07 -05:00
comfyanonymous
a6deca6d9a
Latest mac still has the black image bug.
2025-02-21 20:14:30 -05:00
comfyanonymous
41c30e92e7
Let all model memory be offloaded on nvidia.
2025-02-21 06:32:21 -05:00
comfyanonymous
12da6ef581
Apparently directml supports fp16.
2025-02-20 09:30:24 -05:00
Silver
c5be423d6b
Fix link pointing to non-existing docs ( #6891 )
...
* Fix link pointing to non-existing docs
The current link is pointing to a path that no longer exists.
I changed it to point to the correct path for custom node datatypes.
* Update node_typing.py
2025-02-20 07:07:07 -05:00
maedtb
5715be2ca9
Fix Hunyuan unet config detection for some models. ( #6877 )
...
The change to support 32 channel hunyuan models is missing the `key_prefix` on the key.
This addresses a complaint in the comments of acc152b674.
2025-02-19 07:14:45 -05:00
bymyself
afc85cdeb6
Add Load Image Output node ( #6790 )
...
* add LoadImageOutput node
* add route for input/output/temp files
* update node_typing.py
* use literal type for image_folder field
* mark node as beta
2025-02-18 17:53:01 -05:00
Jukka Seppänen
acc152b674
Support loading and using SkyReels-V1-Hunyuan-I2V ( #6862 )
...
* Support SkyReels-V1-Hunyuan-I2V
* VAE scaling
* Fix T2V (oops)
* Proper latent scaling
2025-02-18 17:06:54 -05:00
comfyanonymous
b07258cef2
Fix typo.
...
Let me know if this slows things down on 2000 series and below.
2025-02-18 07:28:33 -05:00
Jedrzej Kosinski
048f4f0b3a
Merge branch 'master' into worksplit-multigpu
2025-02-17 19:35:58 -06:00
comfyanonymous
31e54b7052
Improve AMD arch detection.
2025-02-17 04:53:40 -05:00
comfyanonymous
8c0bae50c3
bf16 manual cast works on old AMD.
2025-02-17 04:42:40 -05:00
comfyanonymous
530412cb9d
Refactor torch version checks to be more future proof.
2025-02-17 04:36:45 -05:00
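One future-proof way to do such checks, assuming the packaging module is available; raw string comparisons on torch.__version__ break on suffixes like "+cu128" or ".dev" builds.

```python
from packaging import version
import torch

def torch_version_at_least(minimum: str) -> bool:
    # Compare parsed release tuples, e.g. "2.7.0+cu128" -> (2, 7, 0),
    # instead of raw strings, which misorder dev/local builds.
    return version.parse(torch.__version__).release >= version.parse(minimum).release
```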
comfyanonymous
e2919d38b4
Disable bf16 on AMD GPUs that don't support it.
2025-02-16 05:46:10 -05:00
comfyanonymous
1cd6cd6080
Disable pytorch attention in VAE for AMD.
2025-02-14 05:42:14 -05:00
comfyanonymous
d7b4bf21a2
Auto enable mem efficient attention on gfx1100 on pytorch nightly 2.7
...
I'm not sure which arches are supported yet. If you see improvements in
memory usage while using --use-pytorch-cross-attention on your AMD GPU, let
me know and I will add it to the list.
2025-02-14 04:18:14 -05:00
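A hedged sketch of the arch gating described here; gcnArchName is only present on ROCm builds of PyTorch, and the whitelist contains just the one arch the commit names.

```python
import torch

def mem_efficient_attention_supported(device: int = 0) -> bool:
    arch_whitelist = ["gfx1100"]  # extend as more arches are confirmed
    try:
        arch = torch.cuda.get_device_properties(device).gcnArchName
    except AttributeError:
        return False  # not a ROCm build of PyTorch
    return any(arch.startswith(a) for a in arch_whitelist)

if torch.cuda.is_available() and mem_efficient_attention_supported():
    torch.backends.cuda.enable_mem_efficient_sdp(True)
```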
comfyanonymous
019c7029ea
Add a way to set a different compute dtype for the model at runtime.
...
Currently only works for diffusion models.
2025-02-13 20:34:03 -05:00
comfyanonymous
8773ccf74d
Better memory estimation for ROCm devices that support mem efficient attention.
...
There is no way to check whether the card actually supports it, so it is
assumed to be supported if you use --use-pytorch-cross-attention.
2025-02-13 08:32:36 -05:00
comfyanonymous
1d5d6586f3
Fix ruff.
2025-02-12 06:49:16 -05:00
zhoufan2956
35740259de
mix_ascend_bf16_infer_err ( #6794 )
2025-02-12 06:48:11 -05:00
comfyanonymous
ab888e1e0b
Add add_weight_wrapper function to model patcher.
...
Functions can now easily be added to wrap/modify model weights.
2025-02-12 05:55:35 -05:00
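Roughly the shape such a wrapper might take; the signature and registration call below are assumptions for illustration, not the model patcher's confirmed API.

```python
import torch

def scale_weight_wrapper(weight: torch.Tensor) -> torch.Tensor:
    # Example transform applied as the weight is fetched for compute.
    return weight * 0.5

# Hypothetical registration on a ModelPatcher instance:
# patcher.add_weight_wrapper("diffusion_model.some_layer.weight", scale_weight_wrapper)
```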
Jedrzej Kosinski
d2504fb701
Merge branch 'master' into worksplit-multigpu
2025-02-11 22:34:51 -06:00
comfyanonymous
d9f0fcdb0c
Cleanup.
2025-02-11 17:17:03 -05:00
HishamC
b124256817
Fix for running via DirectML ( #6542 )
...
* Fix for running via DirectML
Fix the DirectML empty image generation issue with Flux1; add a CPU fallback for unsupported paths. Verified the model works on AMD GPUs.
* fix formatting
* update causal mask calculation
2025-02-11 17:11:32 -05:00
comfyanonymous
af4b7c91be
Make --force-fp16 actually force the diffusion model to be fp16.
2025-02-11 08:33:09 -05:00
comfyanonymous
4027466c80
Make lumina model work with any latent resolution.
2025-02-10 00:24:20 -05:00
comfyanonymous
095d867147
Remove useless function.
2025-02-09 07:02:57 -05:00
Pam
caeb27c3a5
res_multistep: Fix cfgpp and add ancestral samplers ( #6731 )
2025-02-08 19:39:58 -05:00
comfyanonymous
3d06e1c555
Make error more clear to user.
2025-02-08 18:57:24 -05:00
catboxanon
43a74c0de1
Allow FP16 accumulation with --fast ( #6453 )
...
Currently only applies to PyTorch nightly releases. (>=20250208)
2025-02-08 17:00:56 -05:00
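A guarded opt-in along the lines this PR describes; the attribute only exists on new enough PyTorch builds, so a hasattr check keeps older versions working.

```python
import torch

if hasattr(torch.backends.cuda.matmul, "allow_fp16_accumulation"):
    # Trade a little accumulation precision for faster fp16 matmuls.
    torch.backends.cuda.matmul.allow_fp16_accumulation = True
else:
    print("PyTorch build too old for fp16 accumulation; leaving it off.")
```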
Jedrzej Kosinski
b03763bca6
Merge branch 'multigpu_support' into worksplit-multigpu
2025-02-07 13:27:49 -06:00
comfyanonymous
079eccc92a
Don't compress http response by default.
...
Remove argument to disable it.
Add new --enable-compress-response-body argument to enable it.
2025-02-07 03:29:21 -05:00
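A minimal sketch of an opt-in aiohttp compression middleware in the spirit of this change; the flag wiring in the comment is illustrative.

```python
from aiohttp import web

@web.middleware
async def compress_body(request: web.Request, handler):
    response = await handler(request)
    # enable_compression() negotiates gzip/deflate from Accept-Encoding.
    if isinstance(response, web.Response):
        response.enable_compression()
    return response

# Hypothetical wiring: only install the middleware when the flag is set.
# app = web.Application(middlewares=[compress_body] if args.enable_compress_response_body else [])
```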
Jedrzej Kosinski
476aa79b64
Let --cuda-device take in a string to allow multiple devices (or device order) to be chosen, print available devices on startup, and potentially support MultiGPU Intel and Ascend setups
2025-02-06 08:44:07 -06:00
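The standard mechanism for honoring a device string like "0,2" is CUDA_VISIBLE_DEVICES, exported before torch initializes CUDA; a sketch, with the argument name taken from the commit.

```python
import os
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--cuda-device", type=str, default=None)
args = parser.parse_args()

if args.cuda_device is not None:
    # Must run before the first CUDA call; "0,2" exposes two devices
    # to torch in that order.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.cuda_device
```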
Jedrzej Kosinski
441cfd1a7a
Merge branch 'master' into multigpu_support
2025-02-06 08:10:48 -06:00
comfyanonymous
14880e6dba
Remove some useless code.
2025-02-06 05:00:37 -05:00
comfyanonymous
37cd448529
Set the shift for Lumina back to 6.
2025-02-05 14:49:52 -05:00
comfyanonymous
94f21f9301
Upcasting rope to fp32 seems to make no difference in this model.
2025-02-05 04:32:47 -05:00
comfyanonymous
60653004e5
Use regular numbers for rope in lumina model.
2025-02-05 04:17:25 -05:00
comfyanonymous
a57d635c5f
Fix lumina 2 batches.
2025-02-04 21:48:11 -05:00
comfyanonymous
8ac2dddeed
Lower the default shift of lumina to reduce artifacts.
2025-02-04 06:50:37 -05:00
comfyanonymous
3e880ac709
Fix on python 3.9
2025-02-04 04:20:56 -05:00
comfyanonymous
e5ea112a90
Support Lumina 2 model.
2025-02-04 04:16:30 -05:00
comfyanonymous
44e19a28d3
Use maximum negative value instead of -inf for masks in text encoders.
...
This is probably more correct.
2025-02-02 09:46:00 -05:00
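The change in a nutshell, as a sketch: mask with the dtype's most negative finite value rather than -inf. One common rationale is that a fully masked row then softmaxes to a uniform distribution instead of NaN.

```python
import torch

def build_attention_mask(pad_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # pad_mask: (batch, seq_len) bool, True where a token is real.
    mask = torch.zeros_like(pad_mask, dtype=dtype)
    # torch.finfo(dtype).min instead of float("-inf"): finite, so exp()
    # underflows to 0 without producing NaNs on fully masked rows.
    mask[~pad_mask] = torch.finfo(dtype).min
    return mask  # added to attention scores before softmax
```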
Dr.Lt.Data
0a0df5f136
better guide message for sageattention ( #6634 )
2025-02-02 09:26:47 -05:00
KarryCharon
24d6871e47
add disable-compres-response-body cli args; add compress middleware; ( #6672 )
2025-02-02 09:24:55 -05:00
Jedrzej Kosinski
99a5c1068a
Merge branch 'master' into multigpu_support
2025-02-02 03:19:18 -06:00
comfyanonymous
9e1d301129
Only use stable cascade lora format with cascade model.
2025-02-01 06:35:22 -05:00
comfyanonymous
8d8dc9a262
Allow batch of different sigmas when noise scaling.
2025-01-30 06:49:52 -05:00
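What per-batch sigmas amount to, as a sketch: reshape a (B,) sigma tensor so it broadcasts against the latent; the function name and scaling form are illustrative.

```python
import torch

def scale_noise(latent: torch.Tensor, noise: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    # Reshape (B,) -> (B, 1, 1, ...) so each batch element gets its own sigma.
    sigma = sigma.reshape(sigma.shape[0], *([1] * (latent.ndim - 1)))
    return latent + noise * sigma
```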
Jedrzej Kosinski
02747cde7d
Carry over change from _calc_cond_batch into _calc_cond_batch_multigpu
2025-01-29 11:10:23 -06:00
filtered
222f48c0f2
Allow changing folder_paths.base_path via command line argument. ( #6600 )
...
* Reimpl. CLI arg directly inside folder_paths.
* Update tests to use CLI arg mocking.
* Revert last-minute refactor.
* Fix test state pollution.
2025-01-29 08:06:28 -05:00
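A hedged sketch of wiring such an argument into folder_paths; the flag name and directory layout here are illustrative.

```python
import os
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--base-path", type=str, default=None)
args = parser.parse_args()

# Fall back to the install directory when the flag is absent.
base_path = args.base_path or os.path.dirname(os.path.abspath(__file__))
models_dir = os.path.join(base_path, "models")
output_dir = os.path.join(base_path, "output")
```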
comfyanonymous
13fd4d6e45
More friendly error messages for corrupted safetensors files.
2025-01-28 09:41:09 -05:00
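A sketch of the friendlier-error pattern, assuming safetensors.torch.load_file; the message wording is illustrative.

```python
import safetensors.torch

def load_checkpoint(path: str):
    try:
        return safetensors.torch.load_file(path)
    except Exception as e:
        # Surface an actionable message instead of a raw parser error.
        raise RuntimeError(
            f"Could not read '{path}'. The file may be corrupted or an "
            f"incomplete download; try re-downloading it. ({e})"
        ) from e
```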
Jedrzej Kosinski
0b3233b4e2
Merge remote-tracking branch 'origin/master' into multigpu_support
2025-01-28 06:11:07 -06:00
Jedrzej Kosinski
eda866bf51
Extracted multigpu core code into multigpu.py; added load_balance_devices to get a subdivision of work based on available devices and splittable work item count; added MultiGPU Options nodes to set the relative_speed of specific devices. Does not change behavior yet.
2025-01-27 06:25:48 -06:00
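A hedged sketch of what load_balance_devices could look like given the description above (subdivide splittable work items by per-device relative_speed); the names and rounding strategy are guesses, not the branch's actual code.

```python
def load_balance_devices(work_items: int, relative_speeds: dict[str, float]) -> dict[str, int]:
    total = sum(relative_speeds.values())
    shares = {d: int(work_items * s / total) for d, s in relative_speeds.items()}
    # Hand any rounding remainder to the fastest device.
    fastest = max(relative_speeds, key=relative_speeds.get)
    shares[fastest] += work_items - sum(shares.values())
    return shares

# e.g. load_balance_devices(10, {"cuda:0": 1.0, "cuda:1": 0.5})
# -> {"cuda:0": 7, "cuda:1": 3}
```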
comfyanonymous
255edf2246
Lower minimum ratio of loaded weights on Nvidia.
2025-01-27 05:26:51 -05:00
Jedrzej Kosinski
c7feef9060
Cast transformer_options for multigpu
2025-01-26 05:29:27 -06:00
comfyanonymous
67feb05299
Remove redundant code.
2025-01-25 19:04:53 -05:00
Jedrzej Kosinski
51af7fa1b4
Fix multigpu ControlBase get_models and cleanup calls to avoid calling functions multiple times on multigpu_clones versions of controlnets
2025-01-25 06:05:01 -06:00
Jedrzej Kosinski
46969c380a
Initial MultiGPU support for controlnets
2025-01-24 05:39:38 -06:00
comfyanonymous
14ca5f5a10
Remove useless code.
2025-01-24 06:15:54 -05:00
Jedrzej Kosinski
5db4277449
Make sure additional_models are unloaded as well when perform
2025-01-23 19:06:05 -06:00
comfyanonymous
96e2a45193
Remove useless code.
2025-01-23 05:56:23 -05:00
Chenlei Hu
dfa2b6d129
Remove unused function lcm in conds.py ( #6572 )
2025-01-23 05:54:09 -05:00
Jedrzej Kosinski
02a4d0ad7d
Added unload_model_and_clones to model_management.py to allow unloading only relevant models
2025-01-23 01:20:00 -06:00
comfyanonymous
d6bbe8c40f
Remove support for python 3.8.
2025-01-22 17:04:30 -05:00
chaObserv
e857dd48b8
Add gradient estimation sampler ( #6554 )
2025-01-22 05:29:40 -05:00
comfyanonymous
fb2ad645a3
Add FluxDisableGuidance node to disable using the guidance embed.
2025-01-20 14:50:24 -05:00
Jedrzej Kosinski
ef137ac0b6
Merge branch 'multigpu_support' of https://github.com/kosinkadink/ComfyUI into multigpu_support
2025-01-20 04:34:39 -06:00
Jedrzej Kosinski
328d4f16a9
Make WeightHooks compatible with MultiGPU, clean up some code
2025-01-20 04:34:26 -06:00
comfyanonymous
d8a7a32779
Cleanup old TODO.
2025-01-20 03:44:13 -05:00
Jedrzej Kosinski
bdbcb85b8d
Merge branch 'multigpu_support' of https://github.com/Kosinkadink/ComfyUI into multigpu_support
2025-01-20 00:51:42 -06:00
Jedrzej Kosinski
6c9e94bae7
Merge branch 'master' into multigpu_support
2025-01-20 00:51:37 -06:00
Sergii Dymchenko
ebf038d4fa
Use torch.special.expm1 ( #6388 )
...
* Use `torch.special.expm1`
This function provides greater precision than `exp(x) - 1` for small values of `x`.
Found with TorchFix https://github.com/pytorch-labs/torchfix/
* Use non-alias
2025-01-19 04:54:32 -05:00
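The precision claim is easy to demonstrate: in float32, exp(1e-10) rounds to exactly 1, so the subtraction cancels to zero, while expm1 keeps the value.

```python
import torch

x = torch.tensor(1e-10, dtype=torch.float32)
print(torch.exp(x) - 1)        # tensor(0.)         -- catastrophic cancellation
print(torch.special.expm1(x))  # tensor(1.0000e-10) -- precise
```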
catboxanon
b1a02131c9
Remove comfy.samplers self-import ( #6506 )
2025-01-18 17:49:51 -05:00
comfyanonymous
507199d9a8
Uni pc sampler now works with audio and video models.
2025-01-18 05:27:58 -05:00
comfyanonymous
2f3ab40b62
Add warning when using old pytorch versions.
2025-01-17 18:47:27 -05:00
Jedrzej Kosinski
bfce723311
Initial work on multigpu_clone function, which will account for additional_models getting cloned
2025-01-17 03:31:28 -06:00