This repository contains the official implementation of two papers:
- VCIN (ICCV 2023): Variational Causal Inference Network for Explanatory Visual Question Answering
- Pro-VCIN (TPAMI 2024): Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanatory Visual Question Answering
Dizhan Xue, Shengsheng Qian, and Changsheng Xu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
- 2024: Pro-VCIN accepted to TPAMI 2024
- 2023: VCIN accepted to ICCV 2023
Clone this repository and set up the environment:
git clone https://github.com/LivXue/VCIN.git
cd VCIN
# Create conda environment
conda env create -f environment.yaml
conda activate vcin
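
You can quickly check that the environment is working (a minimal sketch; it assumes environment.yaml installs PyTorch, which the PyTorch checkpoints used later suggest):

```python
# Minimal environment check (assumes the conda environment installs PyTorch).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```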
Follow these steps to prepare the datasets:
- GQA Dataset: Download here
- GQA-OOD Dataset: Download here
Download the bottom-up features and unzip them.
Important: the following script must be run on Linux:
python ./preprocessing/extract_tsv.py --input $TSV_FILE --output $FEATURE_DIR
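
After extraction, you can sanity-check the output. The snippet below is only a sketch: it assumes extract_tsv.py writes per-image NumPy files under $FEATURE_DIR/features and $FEATURE_DIR/box (the directories the training command below points to); adjust paths and file extensions to what the script actually produces.

```python
# Hypothetical sanity check of the extracted bottom-up features.
# Assumes per-image .npy files under $FEATURE_DIR/features and $FEATURE_DIR/box.
import os
import numpy as np

feature_dir = "/path/to/FEATURE_DIR"  # the --output directory used above
feat_files = sorted(os.listdir(os.path.join(feature_dir, "features")))
box_files = sorted(os.listdir(os.path.join(feature_dir, "box")))
print(len(feat_files), "feature files;", len(box_files), "box files")

sample = np.load(os.path.join(feature_dir, "features", feat_files[0]))
print("sample feature shape:", sample.shape)  # e.g. (num_regions, feature_dim)
```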
We provide the annotations of the GQA-REX Dataset in:
- model/processed_data/converted_explanation_train_balanced.json
- model/processed_data/converted_explanation_val_balanced.json
(Optional) You can construct the GQA-REX Dataset yourself by following the instructions from its authors.
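
To take a quick look at the provided annotations, here is a minimal sketch (generic JSON inspection only; the per-entry fields are defined by the GQA-REX format):

```python
# Peek at the provided GQA-REX explanation annotations.
# Field names inside each entry depend on the GQA-REX format.
import json

with open("model/processed_data/converted_explanation_train_balanced.json") as f:
    explanations = json.load(f)

print(type(explanations).__name__, "with", len(explanations), "entries")
sample_key = next(iter(explanations)) if isinstance(explanations, dict) else 0
print(sample_key, "->", explanations[sample_key])
```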
Download our generated programs of the GQA dataset from Google Drive.
(Optional) You can generate programs by yourself following this project.
We provide four models in model/model/model.py:
| Model | Description | Backbone |
|---|---|---|
| REX-VisualBert | From the REX project | VisualBert |
| REX-LXMERT | REX-VisualBert with an LXMERT backbone | LXMERT |
| VCIN | Our ICCV 2023 model | LXMERT |
| Pro-VCIN | Our TPAMI 2024 model | LXMERT |
Before training, generate the dictionary for questions, answers, explanations, and program modules:
cd ./model
python generate_dictionary.py --question $GQA_ROOT/question --exp $EXP_DIR --pro $PRO_DIR --save ./processed_data
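
After the script finishes, you can confirm the dictionaries were written to ./processed_data (a trivial check; the exact file names are determined by generate_dictionary.py):

```python
# List ./processed_data to confirm the generated dictionary files are present
# alongside the provided explanation annotations.
import os

for name in sorted(os.listdir("./processed_data")):
    print(name)
```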
To train a model, run:
python main.py --mode train \
--anno_dir $GQA_ROOT/question \
--ood_dir $OOD_ROOT/data \
--sg_dir $GQA_ROOT/scene_graph \
--lang_dir ./processed_data \
--img_dir $FEATURE_DIR/features \
--bbox_dir $FEATURE_DIR/box \
--checkpoint_dir $CHECKPOINT \
--explainable True
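
Training writes checkpoints to $CHECKPOINT, and the evaluation command below loads $CHECKPOINT/model_best.pth. Before evaluating, you can verify that the checkpoint deserializes (a sketch assuming a standard torch-saved file; the stored object may be a raw state_dict or a wrapper dict, depending on how main.py saves it):

```python
# Quick check that the best checkpoint exists and loads.
# Assumes a standard torch.save() file; the stored object may be a state_dict
# or a wrapper dict, depending on how main.py writes checkpoints.
import torch

ckpt = torch.load("/path/to/CHECKPOINT/model_best.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys())[:10])
else:
    print("loaded object of type:", type(ckpt).__name__)
```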
To evaluate on the GQA-testdev set or generate a submission file, run:
python main.py --mode $MODE \
--anno_dir $GQA_ROOT/question \
--ood_dir $OOD_ROOT/data \
--lang_dir ./processed_data \
--img_dir $FEATURE_DIR/features \
--weights $CHECKPOINT/model_best.pth \
--explainable True

Set $MODE to eval or submission accordingly.
If you find our papers or code helpful, please cite:
@inproceedings{xue2023variational,
title={Variational Causal Inference Network for Explanatory Visual Question Answering},
author={Xue, Dizhan and Qian, Shengsheng and Xu, Changsheng},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={2515--2525},
year={2023}
}
@article{xue2024integrating,
title={Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering},
author={Xue, Dizhan and Qian, Shengsheng and Xu, Changsheng},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2024},
publisher={IEEE}
}

For questions, please open an issue or contact:
- Dizhan Xue: xuedizhan17@mails.ucas.ac.cn
Made with ❤️ by the VCIN Team