🔗 📄 Paper (CVPR 2025) | 🌐 🎨 Project Website | 🤗 Dataset (Hugging Face)
🎉 The code and analysis data are now released!
Explore our implementation and start your own analysis right away.
You can use this codebase in two main ways:
- Leverage the config injection mechanism to gain more insight into model behavior during runtime.
- Perform deeper analysis of attention scores using the comprehensive data we provide on the Hugging Face dataset.
- Location: `llava/config/strategy.py`
- We introduce a `Strategy` class that acts as a singleton configuration manager.
- This design allows you to inject and access configuration at any point during runtime, even deep inside model internals.
- The singleton pattern ensures consistent config usage and easy modification, making it ideal for dynamic experimentation and runtime control.
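The singleton idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in `llava/config/strategy.py`: the `set` and `get` methods, the `"mode"` key, and the `attention_layer_hook` function are hypothetical names chosen for the example.

```python
import threading
from typing import Any


class Strategy:
    """Illustrative singleton config manager (hypothetical, not the repository's class)."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Create the shared instance on first use and reuse it afterwards.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._values = {}
                    cls._instance = instance
        return cls._instance

    def set(self, key: str, value: Any) -> None:
        # Hypothetical setter: inject a config value at runtime.
        self._values[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Hypothetical getter: read a config value from anywhere in the code.
        return self._values.get(key, default)


def attention_layer_hook() -> str:
    # Stand-in for code deep inside the model: no config argument is passed
    # down the call stack, the value is read from the shared instance.
    return Strategy().get("mode", "analyze")


# Inject a setting in one place ...
Strategy().set("mode", "maskout")
# ... and read it somewhere else.
print(attention_layer_hook())
```

Because every call to `Strategy()` returns the same object, a value set in an experiment script is visible inside the model code without changing any function signatures.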
- Location: `llava/model/forward.py`
- Three core functions are provided:
  - `analyze`: Records attention scores during runtime for later analysis.
  - `maskout`: Masks out the attention of specific heads on the fly.
  - `modify`: Dynamically modifies attention scores during runtime.
- The `Strategy` class auto-loads the relevant configuration and strategies as soon as the code starts running, enabling seamless integration and control.
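To make the three operations concrete, here is a minimal sketch of what recording, masking, and modifying per-head attention could look like. It is not the implementation in `llava/model/forward.py`: the function signatures are assumptions, nested Python lists shaped `[head][query][key]` stand in for tensors so the sketch runs without PyTorch, and it assumes the values are post-softmax weights that should be renormalized after modification.

```python
import copy
from typing import Iterable, List

# Attention weights as nested lists: [head][query][key] (stand-in for a tensor).
Attention = List[List[List[float]]]

# Hypothetical in-memory store for recorded attention.
recorded: List[Attention] = []


def analyze(attn: Attention) -> Attention:
    # Record a copy for later analysis and pass the input through unchanged.
    recorded.append(copy.deepcopy(attn))
    return attn


def maskout(attn: Attention, heads: Iterable[int]) -> Attention:
    # Zero the attention of the selected heads; copy the other heads as-is.
    masked = set(heads)
    return [
        [[0.0] * len(row) for row in head] if h in masked
        else [list(row) for row in head]
        for h, head in enumerate(attn)
    ]


def modify(attn: Attention, key_positions: Iterable[int], scale: float) -> Attention:
    # Scale the weight on the chosen key positions, then renormalize each row
    # so that it sums to 1 again.
    boosted = set(key_positions)
    out: Attention = []
    for head in attn:
        new_head = []
        for row in head:
            scaled = [w * scale if k in boosted else w for k, w in enumerate(row)]
            total = sum(scaled)
            new_head.append([w / total for w in scaled] if total > 0 else scaled)
        out.append(new_head)
    return out


# Two heads, one query position, two key positions.
attn = [[[0.5, 0.5]], [[0.25, 0.75]]]

analyze(attn)
print(maskout(attn, heads=[1]))
print(modify(attn, key_positions=[0], scale=3.0))
```

In the actual codebase the choice among these behaviors is driven by the `Strategy` configuration; the sketch calls the functions directly to keep each one easy to inspect.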
- Recorded attention scores can take up a large amount of storage.
- To facilitate large-scale analysis, we will upload the complete attention score data to a Hugging Face dataset.
- This allows you to perform your own analysis without running the full model locally.
If you use this work in your research, please cite our paper:
@inproceedings{visual-head-2025,
title={Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach},
author={Jing Bi and Lianggong Bruce Wen and Zhang Liu and JunJia Guo and Yunlong Tang and Chenliang Xu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}

⭐ Star this repository to get notified about future updates and data releases.