Skip to content

jing-bi/visual-head

Repository files navigation

Unveiling Visual Perception in Language Models:
An Attention Head Analysis Approach

Project

🌐 Links

🔗 📄 Paper (CVPR 2025) | 🌐 🎨 Project Website | 🤗 Dataset (Hugging Face)

🚀 News

🎉 The code and analysis data are now released!
Explore our implementation and start your own analysis right away.

🏗️ Key Architecture Highlights

You can use this codebase in two main ways:

  1. Leverage the config injection mechanism to gain more insight into model behavior during runtime.
  2. Perform deeper analysis of attention scores using the comprehensive data we provide on the Hugging Face dataset.

1. Singleton Strategy Class for Config Injection

  • Location: llava/config/strategy.py
  • We introduce a Strategy class that acts as a singleton configuration manager.
  • This design allows you to inject and access configuration at any point during runtime—even deep inside model internals.
  • The singleton pattern ensures consistent config usage and easy modification, making it ideal for dynamic experimentation and runtime control.

2. Attention Head Analysis & Manipulation Functions

  • Location: llava/model/forward.py
  • Three core functions are provided:
    • analyze: Records attention scores during runtime for later analysis.
    • maskout: Masks out the attention of specific heads on-the-fly.
    • modify: Dynamically modifies attention scores during runtime.
  • The Strategy class auto-loads the relevant configuration and strategies as soon as the code starts running, enabling seamless integration and control.

📊 Attention Score Data

  • Attention scores can be very large.
  • To facilitate large-scale analysis, we will upload the complete attention score data to a Hugging Face dataset.
  • This allows you to perform your own analysis without running the full model locally.

If you use this work in your research, please cite our paper:

@inproceedings{visual-head-2025,
  title={Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach},
  author={Jing Bi and Lianggong Bruce Wen and Zhang Liu and JunJia Guo and Yunlong Tang and Chenliang Xu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

📢 Stay Tuned

Star this repository to get notified about future updates and data releases.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors