
Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanatory Visual Question Answering



🔍 About

This repository contains the official implementation of two papers:

  1. VCIN (ICCV 2023): Variational Causal Inference Network for Explanatory Visual Question Answering
  2. Pro-VCIN (TPAMI 2024): Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanatory Visual Question Answering

Authors

Dizhan Xue, Shengsheng Qian, and Changsheng Xu

Affiliation

State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences


📰 News

  • 2024: Pro-VCIN accepted to TPAMI 2024
  • 2023: VCIN accepted to ICCV 2023

💻 Installation

Clone this repository and set up the environment:

git clone https://github.com/LivXue/VCIN.git
cd VCIN

# Create conda environment
conda env create -f environment.yaml
conda activate vcin

📦 Data Preparation

Follow these steps to prepare the datasets:

1. Download Datasets

2. Download Features

Download the bottom-up features and unzip them.

3. Extract Features

Important: This step must be run on Linux:

python ./preprocessing/extract_tsv.py --input $TSV_FILE --output $FEATURE_DIR
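The bottom-up features ship as TSV rows whose boxes and features columns are base64-encoded float32 buffers, which extract_tsv.py unpacks into per-image files. The sketch below shows only the decoding step; the field names follow the common bottom-up-attention TSV convention and are an assumption here, and the demo row is synthetic so the snippet runs without the download:

```python
import base64
import numpy as np

def decode_row(row):
    """Decode one bottom-up-attention-style TSV row into arrays.

    Assumes the common convention: 'boxes' and 'features' are
    base64-encoded float32 buffers shaped by 'num_boxes'.
    """
    num_boxes = int(row["num_boxes"])
    boxes = np.frombuffer(base64.b64decode(row["boxes"]),
                          dtype=np.float32).reshape(num_boxes, 4)
    features = np.frombuffer(base64.b64decode(row["features"]),
                             dtype=np.float32).reshape(num_boxes, -1)
    return boxes, features

# Round-trip demo with synthetic data (no TSV file needed).
rng = np.random.default_rng(0)
demo = {
    "num_boxes": 3,
    "boxes": base64.b64encode(rng.random((3, 4), dtype=np.float32).tobytes()),
    "features": base64.b64encode(rng.random((3, 2048), dtype=np.float32).tobytes()),
}
boxes, features = decode_row(demo)
print(boxes.shape, features.shape)  # (3, 4) (3, 2048)
```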

4. GQA-REX Annotations

We provide the annotations of the GQA-REX Dataset in:

  • model/processed_data/converted_explanation_train_balanced.json
  • model/processed_data/converted_explanation_val_balanced.json

(Optional) You can construct the GQA-REX Dataset yourself by following the instructions from its authors.
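The converted explanation files are plain JSON, so a quick sanity check after download is just a load-and-count. The per-question layout below is only an assumption (the real schema is defined by the GQA-REX annotations), and the snippet writes a tiny stand-in file so it runs without the repository data:

```python
import json
import os
import tempfile

# Stand-in for model/processed_data/converted_explanation_val_balanced.json,
# created here only so the snippet runs without the real file.
# The question-id key and "explanation" field are assumed, not documented.
sample = {"201307251": {"explanation": "the dog is to the left of the tree"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "converted_explanation_val_balanced.json")
    with open(path, "w") as f:
        json.dump(sample, f)

    # This is the actual load you would run against the provided file.
    with open(path) as f:
        annotations = json.load(f)

print(len(annotations), "annotated questions")
```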

5. Generated Programs

Download our generated programs of the GQA dataset from Google Drive.

(Optional) You can generate the programs yourself by following this project.


🤖 Models

We provide four models in model/model/model.py:

Baselines

| Model | Description | Backbone |
|---|---|---|
| REX-VisualBert | From the REX project | VisualBert |
| REX-LXMERT | REX-VisualBert with an LXMERT backbone | LXMERT |

Our Methods

| Model | Paper | Backbone |
|---|---|---|
| VCIN | ICCV 2023 | LXMERT |
| Pro-VCIN | TPAMI 2024 | LXMERT |

🚀 Training & Evaluation

Step 1: Generate Dictionary

Before training, generate the dictionary for questions, answers, explanations, and program modules:

cd ./model
python generate_dictionary.py --question $GQA_ROOT/question --exp $EXP_DIR --pro $PRO_DIR --save ./processed_data

Step 2: Training

python main.py --mode train \
    --anno_dir $GQA_ROOT/question \
    --ood_dir $OOD_ROOT/data \
    --sg_dir $GQA_ROOT/scene_graph \
    --lang_dir ./processed_data \
    --img_dir $FEATURE_DIR/features \
    --bbox_dir $FEATURE_DIR/box \
    --checkpoint_dir $CHECKPOINT \
    --explainable True

Step 3: Evaluation

To evaluate on the GQA-testdev set or generate a submission file:

python main.py --mode $MODE \
    --anno_dir $GQA_ROOT/question \
    --ood_dir $OOD_ROOT/data \
    --lang_dir ./processed_data \
    --img_dir $FEATURE_DIR/features \
    --weights $CHECKPOINT/model_best.pth \
    --explainable True

Set $MODE to eval or submission accordingly.
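Since eval and submission share every flag except --mode, the two invocations can be assembled from one template, which also catches mode typos before a long run. The helper below is a sketch, not part of this repository; the paths in the demo call are placeholders for the environment variables above:

```python
import shlex

VALID_MODES = ("eval", "submission")

def build_cmd(mode, anno_dir, lang_dir, img_dir, weights):
    """Assemble the main.py command line for a given evaluation mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r} (expected one of {VALID_MODES})")
    args = ["python", "main.py", "--mode", mode,
            "--anno_dir", anno_dir,
            "--lang_dir", lang_dir,
            "--img_dir", img_dir,
            "--weights", weights,
            "--explainable", "True"]
    return shlex.join(args)  # safely quotes any argument that needs it

# Placeholder paths stand in for $GQA_ROOT, $FEATURE_DIR, and $CHECKPOINT.
print(build_cmd("eval", "gqa/question", "./processed_data",
                "features", "checkpoints/model_best.pth"))
```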


📝 Citation

If you find our papers or code helpful, please cite:

@inproceedings{xue2023variational,
  title={Variational Causal Inference Network for Explanatory Visual Question Answering},
  author={Xue, Dizhan and Qian, Shengsheng and Xu, Changsheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2515--2525},
  year={2023}
}

@article{xue2024integrating,
  title={Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering},
  author={Xue, Dizhan and Qian, Shengsheng and Xu, Changsheng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}

📬 Contact

For questions, please open an issue or contact the authors.


Made with ❤️ by the VCIN Team
