Google Summer of Code, Week 9
This is my 7th blog post in this category (Google Summer of Code 2018), and a short report on my work during Week 9. This week we made a big breakthrough: we successfully ran the whole DeepSpeech2 workflow on the CWRU server. I consider it an important milestone for the second evaluation.
Run Aishell on CWRU HPC
1. Log in and open a screen session
ssh sxx186@redhen1.case.edu
screen
ssh sxx186@rider.case.edu
srun -p gpu -C gpup100 --mem=180gb --gres=gpu:1 --pty bash
module load singularity/2.5.1
- Note: remember to request a larger memory allocation; otherwise the job fails with an "srun out of memory" error.
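For reference, the screen session opened above keeps the job alive even if the SSH connection drops. The standard detach/reattach flow (plain GNU screen, nothing specific to this setup) is:
# press Ctrl-a then d to detach while the job keeps running
screen -ls    # after logging back in, list the sessions
screen -r     # reattach to the detached session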
2. Enter the Singularity image
cd /mnt/rds/redhen/gallina/Singularity/DeepSpeech2/DeepSpeech/
singularity shell --nv -e -H `pwd` deepspeech2_suwi_singularity.simg
- Note 1: remember to add "--nv"; otherwise CUDA errors appear, because the GPU is not bound into the container.
- Note 2: there is no longer any need to unset HOME by hand, since I created a Singularity recipe that sets up the environment (sketched below).
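For context, a Singularity 2.x recipe can export such variables in its %environment section, so they are set automatically every time the image is entered. A minimal sketch of the idea follows; the base image and the variable shown are my assumptions, not the actual recipe contents:
Bootstrap: docker
From: paddlepaddle/paddle:latest-gpu    # assumed base image, not necessarily the real one
%environment
    # exported on every "singularity shell/exec", replacing the manual setup used before
    export LC_ALL=C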
3. Run the code
cd examples/aishell/
sh run_data.sh
sh run_test_golden.sh
sh run_infer_golden.sh
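For reference, run_infer_golden.sh ultimately calls infer.py. Below is a sketch of that invocation, reconstructed from the configuration dump in the results; the real script may differ in details such as boolean formatting or extra flags:
python -u infer.py \
    --num_samples=10 \
    --trainer_count=2 \
    --beam_size=300 \
    --num_proc_bsearch=2 \
    --alpha=2.6 \
    --beta=5.0 \
    --cutoff_prob=0.99 \
    --cutoff_top_n=40 \
    --use_gru=True \
    --use_gpu=True \
    --share_rnn_weights=False \
    --specgram_type='linear' \
    --decoding_method='ctc_beam_search' \
    --error_rate_type='cer' \
    --infer_manifest='data/aishell/manifest.test' \
    --mean_std_path='models/aishell/mean_std.npz' \
    --model_path='models/aishell/params.tar.gz' \
    --lang_model_path='models/lm/zhidao_giga.klm' \
    --vocab_path='models/aishell/vocab.txt'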
4. Results
Singularity deepspeech2_suwi_singularity.simg:~/examples/aishell> sh run_infer_golden.sh
----------- Configuration Arguments -----------
alpha: 2.6
beam_size: 300
beta: 5.0
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
error_rate_type: cer
infer_manifest: data/aishell/manifest.test
lang_model_path: models/lm/zhidao_giga.klm
mean_std_path: models/aishell/mean_std.npz
model_path: models/aishell/params.tar.gz
num_conv_layers: 2
num_proc_bsearch: 2
num_rnn_layers: 3
num_samples: 10
rnn_layer_size: 1024
share_rnn_weights: 0
specgram_type: linear
trainer_count: 2
use_gpu: 1
use_gru: 1
vocab_path: models/aishell/vocab.txt
------------------------------------------------
I0713 04:53:39.325980 112135 Util.cpp:166] commandline: --use_gpu=1 --rnn_use_batch=True --trainer_count=2
[INFO 2018-07-13 04:53:41,068 layers.py:2606] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-07-13 04:53:41,069 layers.py:3133] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-07-13 04:53:41,069 layers.py:7224] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-07-13 04:53:41,070 layers.py:2606] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-07-13 04:53:41,070 layers.py:3133] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-07-13 04:53:41,071 layers.py:7224] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-07-13 04:53:45,501 model.py:243] begin to initialize the external scorer for decoding
[INFO 2018-07-13 04:53:49,109 model.py:253] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2018-07-13 04:53:49,109 model.py:254] end initializing scorer
[INFO 2018-07-13 04:53:49,109 infer.py:104] start inference ...
I0713 04:53:49.117202 112135 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=2 numDevices=2
Target Transcription: 核武器并不能征服类似美国这样的国家
Output Transcription: 和武器并不能征服类似美国这样的国家
Current error rate [cer] = 0.058824
Target Transcription: 由于不可能从根本上改变供求关系
Output Transcription: 由于不可能从根本上改变供求关系
Current error rate [cer] = 0.000000
Target Transcription: 个人寄快递必须登记有效的身份证件
Output Transcription: 个人既快递必须登记有效的身份证件
Current error rate [cer] = 0.062500
Target Transcription: 在这场亚洲国家锁定胜局的申办博弈中
Output Transcription: 在这场亚洲国家所定胜局的申办博弈中
Current error rate [cer] = 0.058824
Target Transcription: 可以有效的抵消年龄所带来的速度劣势
Output Transcription: 可以有效地抵消年龄所带来的速度劣势
Current error rate [cer] = 0.058824
Target Transcription: 要加大保障性安居工程建设资计划落实力度
Output Transcription: 要加大保障性安居工程建设投资计划落实力度
Current error rate [cer] = 0.052632
Target Transcription: 财政能力和硬件设施的优势是我们最终取胜的关键原因
Output Transcription: 财政能力和硬件设施的优势是我们最终取胜的关键原因
Current error rate [cer] = 0.000000
Target Transcription: 因而痛斩情丝她除了拥有模特儿火辣身材
Output Transcription: 因而痛感清斯他除了拥有模特火辣身材
Current error rate [cer] = 0.277778
Target Transcription: 他们会拥有较快的速度
Output Transcription: 他们会拥有较快的速度
Current error rate [cer] = 0.000000
Target Transcription: 可以实现在敌国网络中的长期潜伏
Output Transcription: 可以实现在中国网络中的长期潜伏
Current error rate [cer] = 0.066667
[INFO 2018-07-13 04:53:50,868 infer.py:125] finish inference
- Note 1: I modified the run_infer_golden.sh file to switch the language model to the larger 70 GB model, and to skip the repeated download step for quicker execution (a sketch of this change follows these notes).
- Note 2: I modified the infer.py file to make the target transcription support UTF-8 Chinese by adding .encode('utf-8').
- Note 3: I changed batch_size from 128 to 64 in the run_test_golden.sh file to meet the memory limits of the CWRU server (also sketched below).
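To make Notes 1 and 3 concrete, here is a rough sketch of the edits. The download-helper name and paths follow the upstream DeepSpeech2 repository layout, so treat the exact lines as an illustration rather than the actual diff:
# Note 1, in run_infer_golden.sh: skip the language-model download when the file already exists
if [ ! -f models/lm/zhidao_giga.klm ]; then
    sh models/lm/download_lm_ch.sh    # assumed helper script name
fi
# ...and point the decoder at the larger model:
#     --lang_model_path='models/lm/zhidao_giga.klm'
# Note 3, in run_test_golden.sh: halve the batch size to fit GPU memory
#     --batch_size=128  ->  --batch_size=64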
Conclusion
As of now, we can successfully run the whole DeepSpeech2 workflow on the CWRU server. Our ultimate goal is for my code to go into permanent production in the Red Hen pipelines. The next step is to produce transcripts in batch from the data in Red Hen Lab. Zhaoqing has written some shell scripts that automatically find all the Chinese videos in the datasets and transform them into a form our DeepSpeech2 model can take as input (see the reference); a sketch of that kind of conversion follows.
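To illustrate, here is a minimal sketch of the conversion step, assuming the model takes 16 kHz mono WAV input; the search path and file pattern are hypothetical, and the actual scripts may differ:
# find Chinese videos (hypothetical path and pattern) and convert them for DeepSpeech2
for video in /mnt/rds/redhen/gallina/tv/*CCTV*.mp4; do
    wav="$(basename "${video%.mp4}").wav"
    ffmpeg -i "$video" -vn -ar 16000 -ac 1 "$wav"    # audio only, 16 kHz, mono
done
Lastly, congratulations on passing the second evaluation, and many thanks for the help and guidance from Red Hen Lab. In the final period, I will do my best to deliver an excellent outcome. Thank you for reading. 🙂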