Unveiling Visual Perception in Language Models:
An Attention Head Analysis Approach

🌐 Links

🔗 📄 Paper (CVPR 2025) | 🌐 🎨 Project Website | 🤗 Dataset (Hugging Face)

🚀 News

🎉 The code and analysis data are now released!
Explore our implementation and start your own analysis right away.

🏗️ Key Architecture Highlights

You can use this codebase in two main ways:

- Use the config injection mechanism to gain more insight into model behavior at runtime.
- Perform deeper analysis of attention scores using the data we provide in the Hugging Face dataset.

1. Singleton Strategy Class for Config Injection

Location: llava/config/strategy.py
We introduce a Strategy class that acts as a singleton configuration manager.
This design lets you inject and access configuration at any point during runtime, even deep inside model internals.
Because every caller shares the same instance, config usage stays consistent and a change made in one place is visible everywhere, which suits dynamic experimentation and runtime control.
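
The singleton idea described above can be sketched as follows. This is a minimal illustration, not the code in llava/config/strategy.py: the class name `StrategySketch`, the `set`/`get` methods, the `"mode"` key, and the `attention_forward` helper are all hypothetical names chosen for the example.

```python
import threading
from typing import Any, Optional


class StrategySketch:
    """Hypothetical singleton config holder (illustrative; not the repository's class)."""

    _instance: Optional["StrategySketch"] = None
    _lock = threading.Lock()

    def __new__(cls) -> "StrategySketch":
        # Create the single shared instance on first use and reuse it afterwards.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._values = {}
                    cls._instance = instance
        return cls._instance

    def set(self, key: str, value: Any) -> None:
        # Inject (or overwrite) a config value at any point during runtime.
        self._values[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Read a config value, falling back to a default if it was never set.
        return self._values.get(key, default)


def attention_forward(layer_idx: int) -> str:
    # Stand-in for code deep inside the model: no config argument is passed in,
    # the function simply asks the shared instance for the current setting.
    mode = StrategySketch().get("mode", "off")
    return f"layer {layer_idx}: mode={mode}"


# Set the config once at the top level, then read it from anywhere.
StrategySketch().set("mode", "analyze")
print(attention_forward(0))
```

Because `StrategySketch()` always returns the same object, the value set at the top level is the one `attention_forward` sees, without threading a config object through every function signature.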

2. Attention Head Analysis & Manipulation Functions

Location: llava/model/forward.py
Three core functions are provided:

- analyze: Records attention scores during runtime for later analysis.
- maskout: Masks out the attention of specific heads on-the-fly.
- modify: Dynamically modifies attention scores during runtime.

The Strategy class auto-loads the relevant configuration and strategies as soon as the code starts running, so these functions can be enabled and controlled without extra setup.
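
To make the three operations concrete, here is a toy sketch on plain Python lists. It is not the implementation in llava/model/forward.py, which runs inside the model's forward pass. The signatures, the `recorded` dictionary, the choice to zero out masked heads, and the scale-then-renormalise rule in `modify` are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# attention[h][q][k]: weight that query position q of head h puts on key position k.
Attention = List[List[List[float]]]

# Hypothetical store: layer index -> copy of that layer's attention weights.
recorded: Dict[int, Attention] = {}


def analyze(layer_idx: int, attention: Attention) -> Attention:
    """Record a copy of the attention weights and return the input unchanged."""
    recorded[layer_idx] = [[row[:] for row in head] for head in attention]
    return attention


def maskout(attention: Attention, heads: Set[int]) -> Attention:
    """Return a copy in which the selected heads attend to nothing (all zeros)."""
    return [
        [[0.0] * len(row) for row in head] if h in heads else [row[:] for row in head]
        for h, head in enumerate(attention)
    ]


def modify(attention: Attention, head: int, key_positions: Set[int], scale: float) -> Attention:
    """Scale one head's weights on the chosen key positions, then renormalise each row."""
    out = [[row[:] for row in hd] for hd in attention]
    for row in out[head]:
        for k in key_positions:
            row[k] *= scale
        total = sum(row)
        if total > 0:
            row[:] = [w / total for w in row]
    return out


# Two heads, one query position, two key positions.
attn = [[[0.5, 0.5]], [[0.25, 0.75]]]
analyze(0, attn)                      # stores a copy under layer 0
masked = maskout(attn, {1})           # head 1 is zeroed, head 0 is untouched
boosted = modify(attn, 0, {0}, 3.0)   # head 0 now favours key position 0
print(masked, boosted)
```

All three functions return new lists rather than editing the input in place, so the original scores stay available for comparison after masking or modification.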

📊 Attention Score Data

The recorded attention score data can be very large.
To facilitate large-scale analysis, we will upload the complete attention score data to a Hugging Face dataset.
This lets you run your own analysis on the recorded scores without running the full model locally.

If you use this work in your research, please cite our paper:

@inproceedings{visual-head-2025,
  title={Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach},
  author={Jing Bi and Lianggong Bruce Wen and Zhang Liu and JunJia Guo and Yunlong Tang and Chenliang Xu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

📢 Stay Tuned

⭐ Star this repository to get notified about future updates and data releases.
