SoftCorrect

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition, by Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin and Tie-Yan Liu in AAAI 2023, is a novel non-autoregressive error correction method for automatic speech recognition with soft error detection. It uses an encoder trained with a language modeling loss to detect error tokens and a constrained decoder to correct errors.

Dependencies

SoftCorrect is currently implemented on fairseq 0.10.1. Please refer to the fairseq installation instructions. The main dependencies are:

Python 3
NumPy
PyTorch==1.6.0
fairseq==0.10.1
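
Because two of the dependencies are pinned to exact versions, it can help to check the active environment before training. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, and it assumes the packages are installed under the distribution names `torch` and `fairseq`. Local build suffixes such as `+cu101` are ignored when comparing.

```python
from importlib import metadata

# Pinned versions taken from the dependency list above; NumPy is unpinned there.
PINNED = {"torch": "1.6.0", "fairseq": "0.10.1"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is not installed. A local build
    suffix such as "+cu101" is stripped before comparing.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        base = found.split("+")[0] if found is not None else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means the pinned packages match; anything else lists the expected and the installed version side by side.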

SoftCorrect Modules

SoftCorrect consists of an encoder as the error detector and a decoder as the error corrector, which can be trained separately.

Since our model is a Chinese character-based model, an arbitrary SentencePiece model can be used during inference. We provide a SentencePiece model here, which is trained on Chinese wiki data.
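
To illustrate what "character-based" means for the model input, here is a minimal sketch that splits a sentence into one token per character. It is an illustration only: the function name `to_char_tokens` is hypothetical, and the repository's actual preprocessing (and the provided SentencePiece model) may treat punctuation, digits, or non-Chinese text differently.

```python
def to_char_tokens(text):
    """Split a sentence into single-character tokens, dropping whitespace.

    Assumption: every non-whitespace Unicode character is one token.
    """
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    # Tokens are joined with spaces, the usual layout for tokenized text files.
    print(" ".join(to_char_tokens("今天天气很好")))
```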

Error Detector (Encoder)

The error detector can be trained on pseudo data alone, which requires only unpaired text data. We release the model trained on our internal unpaired text dataset (400M sentences). Unpaired text data can be easily obtained from wiki or other corpora.

Generate the databin

After collecting the unpaired data, we binarize it with runs/data_gen_unpaired.sh.

Train the BERT generator

We use a BERT generator to construct the pseudo data for ASR correction (sentences with errors simulated by BERT). The BERT generator can be trained with runs/train_bert_generator.sh. The BERT model pretrained on the internal data can be downloaded here.

Train the error detector

Since SoftCorrect makes use of multiple candidates, we use the BERT generator to construct pseudo ASR correction data with multiple candidates. The data is generated online during training. To stabilize training, the error detector can first be trained with runs/pretrain_detector.sh for 1 epoch (BERT-style loss with multiple candidates as input) and then trained with runs/finetune_detector.sh (anti-copy loss). The models from the two stages can be downloaded here: first stage and second stage.

Test the error detector

The test data can be downloaded and unzipped from here. To test the error detector after the second-stage training, we score each token in each candidate using runs/test_detector.sh and combine that score with the acoustic score using runs/detect_with_lmscore.sh. The results will be used by the error corrector.
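
The README does not spell out how the detector score and the acoustic score are combined. As a purely illustrative sketch, one simple option is a weighted sum of log-domain scores. Everything here is an assumption: the function name `combine_scores`, the `weight` parameter, and the choice of a linear combination are hypothetical, and runs/detect_with_lmscore.sh may do something different.

```python
def combine_scores(token_scores, acoustic_score, weight=0.5):
    """Weighted sum of a candidate's acoustic score and its summed token scores.

    Assumption: both inputs are log-domain scores, so adding them
    corresponds to multiplying the underlying probabilities.
    """
    return acoustic_score + weight * sum(token_scores)


if __name__ == "__main__":
    # Hypothetical values for one candidate: two token scores, one acoustic score.
    print(combine_scores([-0.5, -0.25], acoustic_score=-1.0, weight=0.5))
```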

Note that the model after the second-stage training is the final error detector. We release the pretrained BERT generator and the first-stage model to support future work based on SoftCorrect. The first-stage training (1 epoch) is not required: the detector can also be trained directly with the anti-copy loss (second stage).
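
The detector steps above can be summarized as an ordered list of scripts. The sketch below is a hypothetical driver, not part of the repository: the script paths come from this README, but the helper names are made up, and it assumes each script can be launched with `bash` and no arguments, which may not hold. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Script paths as named in this README; any arguments they need are not shown there.
DETECTOR_STEPS = [
    "runs/data_gen_unpaired.sh",
    "runs/train_bert_generator.sh",
    "runs/pretrain_detector.sh",   # optional first stage (BERT-style loss)
    "runs/finetune_detector.sh",   # second stage (anti-copy loss)
    "runs/test_detector.sh",
    "runs/detect_with_lmscore.sh",
]


def plan_detector_steps(skip_first_stage=False):
    """Return the scripts in order, optionally without the first training stage."""
    steps = list(DETECTOR_STEPS)
    if skip_first_stage:
        steps.remove("runs/pretrain_detector.sh")
    return steps


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for script in steps:
        print(("[dry-run] " if dry_run else "") + "bash " + script)
        if not dry_run:
            subprocess.run(["bash", script], check=True)


if __name__ == "__main__":
    run_steps(plan_detector_steps(skip_first_stage=True))
```

Calling `plan_detector_steps(skip_first_stage=True)` reflects the note that the detector can be trained directly with the second stage.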

Error Corrector (Decoder)

Pretrained model

The error corrector is pretrained on unpaired text data with runs/pretrain_corrector.sh. The pretrained model can be downloaded from here.

Finetuning data generation

The pretrained model can be finetuned on AISHELL-1 data. The finetuning databin can be generated with runs/data_gen_corrector_finetune.sh. The raw data required by the script can be downloaded and unzipped from here. For the preparation of finetuning data, please refer to Step 2 of FastCorrect, since the procedure is similar.

Finetuning model

We can finetune the pretrained corrector model with runs/finetune_corrector.sh (for about 10 epochs).

Test the error corrector

The test data can be downloaded and unzipped from here (if not already downloaded in the previous section). We test the error corrector with runs/test_corrector.sh.

After installing sctk (./install_sctk.sh), we can calculate WER with cal_wer_aishell.sh.
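
For reference, WER is the edit distance between hypothesis and reference divided by the reference length. The sketch below shows only that metric and is not a replacement for sctk, which also produces alignments and detailed reports. The function names are hypothetical. With character tokens, as is typical for Chinese data such as AISHELL-1, the value is effectively a character error rate.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


if __name__ == "__main__":
    refs = [list("今天天气很好")]
    hyps = [list("今天天汽很好")]
    print(f"error rate: {error_rate(refs, hyps):.4f}")
```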

Reference

If you find SoftCorrect useful in your work, you can cite the paper as below:

@inproceedings{leng2023softcorrect,
    title={SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition},
    author={Leng, Yichong and Tan, Xu and Liu, Wenjie and Song, Kaitao and Wang, Rui and Li, Xiang-Yang and Qin, Tao and Lin, Edward and Liu, Tie-Yan},
    booktitle={AAAI},
    year={2023}
}

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct, trademark notice, and security reporting instructions.
