GateControl: Efficient and Flexible Controllable Generation for Linear-Attention Diffusion Models
Accepted to CVPR 2026
This repository contains the official PyTorch implementation of GateControl, an efficient and flexible controllable generation framework tailored for linear-attention diffusion backbones (such as SANA).
Beyond providing a lightweight tool for on-device deployment, the core contribution of this work is a deeper insight. Prior work commonly holds that naïve additive fusion of conditional features breaks down on non-spatially aligned tasks (such as subject-driven generation). Our findings challenge this assumption: with our proposed token-wise gated modulation, simple additive fusion remains highly robust for subject-driven control while also dramatically accelerating convergence on spatial tasks.
Recent advances in diffusion-based controllable visual generation have led to remarkable improvements in image quality. However, these powerful models are typically deployed on cloud servers due to their large computational demands, raising serious concerns about user data privacy. To enable secure and efficient on-device generation, in this paper we explore controllable diffusion models built upon linear-attention architectures, which offer superior scalability and efficiency even on edge devices. Yet our experiments reveal that existing controllable generation frameworks, such as ControlNet and OminiControl, either lack the flexibility to support multiple heterogeneous condition types or suffer from slow convergence on such linear-attention models.
To address these limitations, we propose a novel controllable diffusion framework tailored for linear-attention backbones like SANA. The core of our method is a unified gated conditioning module operating in a dual-path pipeline, which effectively integrates multi-type conditional inputs, covering both spatially aligned and non-aligned cues. Extensive experiments on multiple tasks and benchmarks demonstrate that our approach achieves state-of-the-art controllable generation performance among linear-attention models, surpassing existing methods in fidelity and controllability.
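For intuition, here is a minimal PyTorch sketch of token-wise gated additive fusion. It illustrates the idea only and is not the repository's implementation; the module and variable names are our own:

```python
import torch
import torch.nn as nn

class TokenWiseGate(nn.Module):
    """Minimal sketch (not the repository code): condition features are added
    to the hidden states, scaled by a learned token-wise gate."""

    def __init__(self, dim: int):
        super().__init__()
        # Predict one scalar gate per token from the condition feature.
        self.to_gate = nn.Linear(dim, 1)
        # Zero-init so the gate starts at tanh(0) = 0 and the module is an
        # identity mapping at the beginning of training.
        nn.init.zeros_(self.to_gate.weight)
        nn.init.zeros_(self.to_gate.bias)

    def forward(self, hidden: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # hidden, cond: (batch, num_tokens, dim)
        gate = torch.tanh(self.to_gate(cond))  # (batch, num_tokens, 1)
        return hidden + gate * cond            # gated additive fusion
```

Because the fusion is purely additive, the same module applies whether or not the condition tokens are spatially aligned with the image tokens.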
- Unified Control Mechanism: Challenges the common assumption that additive fusion breaks down on non-spatial tasks. Our token-wise gated modulation unifies robust control across both spatially aligned (Canny, Coloring, Deblurring, Depth, HED) and spatially unaligned (subject-driven) conditions.
- Dramatic Convergence Acceleration: Radically accelerates optimization on spatial tasks, e.g., reaching convergence in ~1k steps instead of 10k under strict apples-to-apples comparisons.
- Extreme Efficiency (~0.09M Params): The design yields an exceptionally lightweight control module requiring only ~0.09M additional parameters, incurring negligible computational overhead (see the parameter-count sketch after this list).
- Tailored for Linear Attention (e.g., SANA): By operating as a learnable gate, the module effectively mitigates the information compression inherent in linear-attention interactions, making it well suited to robust on-device generation. Notably, the insight is not a backbone-specific engineering fix: it yields consistent benefits even under standard softmax attention.
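To make the parameter footprint concrete, here is a quick count reusing the `TokenWiseGate` sketch above. The width and depth below are illustrative placeholders, not SANA's actual configuration:

```python
# Illustrative width/depth only; not SANA's actual configuration.
dim, num_blocks = 2240, 40
gates = [TokenWiseGate(dim) for _ in range(num_blocks)]
total = sum(p.numel() for g in gates for p in g.parameters())
print(f"{total / 1e6:.2f}M trainable parameters")  # ~0.09M at these sizes
```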
To train GateControl, adjust the parameters in `train_scripts/train_sana_gatecontrol.sh`:
```bash
export HF_HOME='./cache'
export XDG_CACHE_HOME='./cache'

CONDITION_TYPE="COLORING"  # Choices: SUBJECT, CANNY, COLORING, DEBLURRING, DEPTH, HED
RESOLUTION=512             # e.g., 512 or 1024
MODEL_NAME="/path/to/SANA_ckpt"
DATASET_PATH="/path/to/dataset"

accelerate launch --config_file ./accelerate_config.json train_sana_gatecontrol.py ...
```

Use our decoupled inference module to test image generation locally:
```python
import torch
from generate.pipeline import sana_pipeline_gatecontrol
from model.sana_gatecontrol import SanaTransformer2DModelGateControl

# Implement your conditioned mappings via the provided pipeline abstraction.
```
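The pipeline's full signature is not shown above, so the following end-to-end call is a hypothetical sketch; argument names such as `gatecontrol_path` and `condition_image` are illustrative and may differ from the actual API:

```python
import torch
from PIL import Image

from generate.pipeline import sana_pipeline_gatecontrol

# Hypothetical sketch: every argument name below is illustrative and may
# differ from the actual pipeline signature.
pipe = sana_pipeline_gatecontrol(
    model_path="/path/to/SANA_ckpt",
    gatecontrol_path="/path/to/gatecontrol_ckpt",
    condition_type="COLORING",
    device="cuda",
    dtype=torch.float16,
)
image = pipe(
    prompt="a red vintage car on a coastal road",
    condition_image=Image.open("grayscale_input.png"),
    num_inference_steps=20,
    guidance_scale=4.5,
)
image.save("output.png")
```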
If you find our work useful in your research or project, please consider citing:

```bibtex
@article{liu2026gated,
  title={Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers},
  author={Liu, Yuhe and Tan, Zhenxiong and Hu, Yujia and Liu, Songhua and Wang, Xinchao},
  journal={arXiv preprint arXiv:2603.27666},
  year={2026}
}
```