- VDC offers efficient, fast condition optimization with zero inference overhead
VDC requires only a one-time optimization per task; once learned, the condition is applied instantly to any new image. While we reported 30 mins on a single GPU for condition optimization at peak fidelity, our ablation below shows that VDC outperforms OmniGen in just 10 steps (∼2 mins). Unlike prior works that require millions of samples, VDC defines a task from a single pair, yielding a significant efficiency gain.
Inference incurs zero overhead: VDC replaces classifier-free guidance (CFG), so latency is determined by the underlying diffusion model.
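To make the "optimize once, then reuse" cost structure concrete, here is a minimal toy sketch. It is not the VDC implementation: it assumes a hypothetical frozen model (`frozen_model`) that is just a gain-and-bias map, and a hypothetical helper (`optimize_condition`) that fits the condition on a single source/target pair by plain gradient descent. In VDC the frozen model would be a diffusion model and the condition a learned embedding; every name and hyperparameter below is an illustrative assumption.

```python
# Toy illustration only; all names and values are hypothetical, not VDC's code.

def frozen_model(x, cond):
    """Stand-in for a frozen editor: its weights never change, only `cond` does."""
    gain, bias = cond
    return [gain * v + bias for v in x]


def optimize_condition(src, tgt, steps=2000, lr=0.3):
    """Fit cond = (gain, bias) on ONE (src, tgt) pair by gradient descent on MSE."""
    gain, bias = 1.0, 0.0  # start from the identity transformation
    n = len(src)
    for _ in range(steps):
        pred = frozen_model(src, (gain, bias))
        err = [p - t for p, t in zip(pred, tgt)]
        # Analytic gradients of the mean squared error w.r.t. gain and bias.
        grad_gain = 2.0 * sum(e * s for e, s in zip(err, src)) / n
        grad_bias = 2.0 * sum(err) / n
        gain -= lr * grad_gain
        bias -= lr * grad_bias
    return gain, bias


# One-time task optimization from a single example pair.
source_example = [0.0, 0.5, 1.0]
target_example = [0.2, 0.45, 0.7]  # produced by the "true" edit 0.5 * x + 0.2
cond = optimize_condition(source_example, target_example)

# Inference: a single forward call with the stored condition, no further optimization.
new_input = [0.2, 0.8]
edited = frozen_model(new_input, cond)
```

In this sketch the optimization loop runs once per task, while each new input costs exactly one call to `frozen_model`, which mirrors the claim that inference latency is set by the underlying model alone.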
- VDC enables one-shot editing adaptation
With just one visual example, VDC yields clean results, surpassing other methods, including task-specific restoration methods.
Text- and example-based methods struggle with complex edits due to misalignment or degradation priors.
- VDC generalizes from synthetic examples to real data
Our method optimizes the condition to access visual attributes that the model has already learned from real data, in particular the semantics of degradations such as “rain” or “noise”.
This makes it easier for our method to extend to out-of-distribution data, allowing it to work on real images.
We compare our method to state-of-the-art All-in-One Image Restoration (IR) methods on real-image De-Rain. Our method generalizes to real data, while prior works fail.
- VDC can extend to a general editing method
We center our benchmark on fine-detail edits, global adjustments, and image restoration tasks, categories where existing methods often struggle due to visual–text misalignment.
Nonetheless, our approach is a general editing framework: it extracts the transformation from a given example and applies it to a new input.
However, VDC prioritizes structural fidelity over non-rigid flexibility to prevent hallucinations, which limits large changes.
VDC addresses this by supporting textual control: as shown below, VDC handles visual patterns (DeRain) while text drives semantic shifts (e.g., bears→cats) and non-rigid edits (e.g., closing eyes).