Minor - Fix CUDA OOM for 16GB VRAM GPUs #14
Open
changhaowuwu wants to merge 2 commits into Tele-AI:main from
- Enable VAE tiling to reduce peak memory during encode/decode
- Keep only one large model (VAE or transformer) on GPU at a time, swapping between CPU and GPU at each inference stage
- Return raw latents from the pipeline and decode separately after offloading the transformer, avoiding both models on GPU simultaneously
- Auto-kill stale GPU processes from previous interrupted runs at startup
- Add atexit/signal handlers for graceful GPU memory cleanup
- Set PYTORCH_ALLOC_CONF=expandable_segments:True by default
…rting PIL frames to numpy arrays
Fix CUDA OOM for 16GB GPUs
Problem
`telestylevideo_inference.py` consistently hit `torch.OutOfMemoryError` on GPUs with 16GB VRAM (e.g. RTX 4080). Two issues:
1. **Both models on GPU simultaneously.** The VAE (3.5 GB) and transformer (3.5 GB) were both loaded onto the GPU at init, leaving insufficient headroom for activation tensors during the transformer forward pass (340 MiB for `patch_embedding2`) and VAE encoding (8.7 GiB for 129-frame 3D convolutions).
2. **Stale GPU processes.** Interrupted runs (Ctrl+C, OOM kills → exit code 137) left zombie processes holding 11+ GiB of VRAM, starving subsequent runs.
Solution
**Model offloading:** only one large model resides on the GPU at any time. The VAE is moved to the GPU for encoding, swapped to the CPU while the transformer runs the diffusion steps, and brought back to the GPU afterwards to decode the latents.
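The swap can be sketched roughly as below. The helper name and the `pipe.vae` / `pipe.transformer` attribute names are assumptions for illustration; the actual pipeline attributes may differ:

```python
import torch


def swap_to_gpu(active: torch.nn.Module, inactive: torch.nn.Module,
                device: str = "cuda") -> None:
    """Keep only `active` on the accelerator; park `inactive` on the CPU."""
    inactive.to("cpu")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return the freed blocks to the driver
    active.to(device)


# Hypothetical usage at each inference stage:
# swap_to_gpu(pipe.vae, pipe.transformer)   # encode inputs to latents
# swap_to_gpu(pipe.transformer, pipe.vae)   # run the 25 diffusion steps
# swap_to_gpu(pipe.vae, pipe.transformer)   # decode latents to frames
```

Calling `empty_cache()` between swaps matters: without it, the cached blocks from the offloaded model still count against the 16 GB budget.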
Additional optimizations:
- VAE tiling to cap peak memory during encode/decode
- The pipeline returns raw latents, which are decoded only after the transformer has been offloaded
- Stale GPU processes from previously interrupted runs are killed at startup
- atexit/signal handlers release GPU memory on exit
- `PYTORCH_ALLOC_CONF=expandable_segments:True` is set by default to reduce allocator fragmentation
Testing
Verified end-to-end on a 16GB GPU with the default 129-frame, 720×1248 configuration. All 25 diffusion steps complete at ~77s/step with peak VRAM usage under 12 GiB.