GitHub - alibaba/mm-diff: MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration

👉 MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration

Zhichao Wei, Qingkun Su, Long Qin, Weizhi Wang

🔥 Examples

🎇 Pipeline

We propose MM-Diff, a unified, tuning-free image personalization framework that generates high-fidelity images of both single and multiple subjects in seconds. On the left, vision-augmented text embeddings and a small set of detail-rich subject embeddings are injected into the diffusion model through a multi-modal cross-attention module. On the right, we show how this cross-attention is implemented with LoRAs, together with the attention constraints that enable multi-subject generation.
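The multi-modal cross-attention described above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the MM-Diff implementation. Image tokens provide the queries. The text embeddings and subject embeddings are concatenated into one key/value sequence, and the key and value projections carry a LoRA update: a frozen weight W plus a low-rank product B·A. An optional boolean mask stands in for the attention constraints. All names (`multimodal_cross_attention`, `lora_linear`, `Wq`, `Ak`, `Bk`, ...) and the half-and-half region split in the demo are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lora_linear(x, W, A, B, scale=1.0):
    # Frozen projection W plus a trainable low-rank update B @ A (rank = A.shape[0])
    return x @ W.T + scale * (x @ A.T) @ B.T

def multimodal_cross_attention(h, text_emb, subj_emb, params, mask=None):
    """h: (N, d) image tokens; text_emb: (T, c); subj_emb: (S, c).

    Queries come from the image tokens; keys and values come from the
    concatenated text + subject tokens through LoRA-adapted projections.
    mask: optional (N, T + S) boolean array, False = attention blocked.
    """
    ctx = np.concatenate([text_emb, subj_emb], axis=0)               # (T + S, c)
    q = h @ params["Wq"].T                                             # (N, d)
    k = lora_linear(ctx, params["Wk"], params["Ak"], params["Bk"])     # (T + S, d)
    v = lora_linear(ctx, params["Wv"], params["Av"], params["Bv"])     # (T + S, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])                            # (N, T + S)
    if mask is not None:
        # Hypothetical attention constraint: blocked pairs get a large negative score
        scores = np.where(mask, scores, -1e9)
    return softmax(scores, axis=-1) @ v                                # (N, d)

# Toy demo with random weights (all sizes are arbitrary)
rng = np.random.default_rng(0)
N, T, S, d, c, r = 16, 8, 4, 32, 24, 4
h = rng.standard_normal((N, d))
text_emb = rng.standard_normal((T, c))
subj_emb = rng.standard_normal((S, c))
params = {
    "Wq": rng.standard_normal((d, d)) / np.sqrt(d),
    "Wk": rng.standard_normal((d, c)) / np.sqrt(c),
    "Wv": rng.standard_normal((d, c)) / np.sqrt(c),
    # LoRA factors: A random, B zero-initialized, so the adapted layer
    # initially matches the frozen one
    "Ak": rng.standard_normal((r, c)) / np.sqrt(c), "Bk": np.zeros((d, r)),
    "Av": rng.standard_normal((r, c)) / np.sqrt(c), "Bv": np.zeros((d, r)),
}

# Two subjects with two tokens each: the first half of the image tokens may only
# see subject A's tokens, the second half only subject B's (text is always visible)
mask = np.ones((N, T + S), dtype=bool)
mask[: N // 2, T + 2 :] = False
mask[N // 2 :, T : T + 2] = False
out = multimodal_cross_attention(h, text_emb, subj_emb, params, mask=mask)
print(out.shape)  # (16, 32)
```

Restricting each subject's tokens to a region of the image is one plausible way to keep multiple subjects from blending. The actual constraint used by MM-Diff may differ.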

🔧 Preparations

Environment Setup

conda create -n mmdiff python=3.9
conda activate mmdiff
pip install -r requirements.txt

Download Models

We provide pretrained checkpoints. Download them and place them in the root directory of this project. To run the demo, you also need to download the following models:

Training Data Annotation (Optional)

Demo code for annotating training data is provided in data_annotation. To avoid package conflicts, we recommend setting up a separate conda or Docker environment for it.

python data_labeling_imagenet.py --data_path="path_to_data"

✨ Customized Generation

We currently provide two ways to generate customized images, described below. Some reference images are included in demo_data.

Use Jupyter Notebook

mmdiff_demo: image generation with a single reference image.
mmdiff_multiple_reference_demo: image generation with multiple reference images.
mmdiff_id_mixing_demo: image generation with identity mixing.

Start a Gradio Demo

python mmdiff_gradio_demo.py

🚩 Updates

[2024/05/30] Fused the LoRA weights into the original weights to improve inference speed.
[2024/05/29] Released an enhanced version of MM-Diff for portrait generation that uses face embeddings to improve subject fidelity.
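The speed-up from the 2024/05/30 update follows from a simple identity. A LoRA layer computes x·Wᵀ + s·(x·Aᵀ)·Bᵀ, which equals x·(W + s·B·A)ᵀ. Merging the low-rank product into the base weight once therefore removes the extra matmuls from every forward pass. The NumPy sketch below checks this equivalence on random matrices. `fuse_lora`, the shapes, and the scale `s` are illustrative assumptions, not code from this repository.

```python
import numpy as np

def fuse_lora(W, A, B, scale=1.0):
    # Fold the low-rank update into the base weight once: W' = W + scale * B @ A
    return W + scale * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r, scale = 64, 48, 4, 0.5
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # LoRA down-projection
B = rng.standard_normal((d_out, r))     # LoRA up-projection
x = rng.standard_normal((10, d_in))     # a batch of inputs

# Unfused: base matmul plus two extra low-rank matmuls per forward pass
y_unfused = x @ W.T + scale * (x @ A.T) @ B.T
# Fused: a single matmul with the merged weight
W_fused = fuse_lora(W, A, B, scale)
y_fused = x @ W_fused.T

print(np.allclose(y_fused, y_unfused))  # True
```

The trade-off is that a fused layer can no longer switch or rescale its LoRA at runtime without first subtracting the update again.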

Citation

If you find MM-Diff useful for your research, please cite our paper:

@article{wei2024mm,
  title={MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration},
  author={Wei, Zhichao and Su, Qingkun and Qin, Long and Wang, Weizhi},
  journal={arXiv preprint arXiv:2403.15059},
  year={2024}
}

Acknowledgements

This code builds on several excellent repositories, including diffusers, FastComposer, PhotoMaker and IP-Adapter. Thanks for their great work!

