Google Summer of Code Week 9
This is my 7th blog post in this category (Google Summer of Code 2018), a short report on my work during Week 9. This week we made a big breakthrough: we successfully ran the whole DeepSpeech2 workflow on the CWRU server. I think it is an important milestone for the second evaluation.
Run Aishell on CWRU HPC
1. Log in and open a screen session
- Note: remember to request more memory; otherwise you will get the error “srun out of memory”.
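A minimal sketch of step 1, written as a small Python helper that assembles the Slurm srun command for an interactive GPU shell. The memory size (32G), the partition name (gpu), the GPU count and the function name are placeholders chosen for illustration, not the values actually used on the CWRU cluster; the point is the explicit --mem request.

```python
import shlex


def interactive_gpu_session(mem="32G", gpus=1, partition="gpu"):
    """Return argv for an interactive Slurm shell on a GPU node.

    All default values are hypothetical placeholders. The explicit --mem
    request is what matters: without it the job only gets the partition's
    default memory, which may be too small for DeepSpeech2.
    """
    return [
        "srun",
        "-p", partition,           # assumed partition name
        "--gres=gpu:%d" % gpus,    # number of GPUs to allocate
        "--mem=%s" % mem,          # memory request for the job
        "--pty", "bash",           # attach an interactive shell
    ]


# prints: srun -p gpu --gres=gpu:1 --mem=32G --pty bash
print(shlex.join(interactive_gpu_session()))
```

Start screen on the login node first (for example screen -S deepspeech) and run the printed command inside it, so the session survives an SSH disconnect.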
2. Enter the Singularity image
- Note 1: remember to add “--nv”; otherwise you will get a CUDA error.
- Note 2: there’s no need to run unset HOME anymore, since I created a Singularity recipe that sets up the environment.
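A similar sketch for step 2 (and for launching step 3 non-interactively): a hypothetical Python helper that builds the singularity command with --nv. The image name deepspeech2.simg is made up, and running sh run_infer_golden.sh assumes you are already in the DeepSpeech2 examples/aishell directory; adjust both to your own setup.

```python
import shlex


def singularity_cmd(image, command=None, gpu=True):
    """Return argv for `singularity shell`, or `singularity exec` if a command is given.

    --nv makes the host's NVIDIA driver libraries available inside the
    container; leaving it out leads to CUDA errors in the image.
    """
    argv = ["singularity", "exec" if command else "shell"]
    if gpu:
        argv.append("--nv")
    argv.append(image)
    if command:
        argv.extend(command)
    return argv


# Interactive shell, as in step 2 (hypothetical image name):
print(shlex.join(singularity_cmd("deepspeech2.simg")))
# Non-interactive run of one of the example scripts:
print(shlex.join(singularity_cmd("deepspeech2.simg", ["sh", "run_infer_golden.sh"])))
```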
3. Run the code
4. Results
- Note 1: I modified the run_infer_golden.sh file to switch the language model to the larger 70 GB model, and to skip the repeated download step so that it runs faster.
- Note 2: I modified the infer.py file so that the target transcription supports UTF-8 Chinese, by adding .encode('utf-8').
- Note 3: I reduced batch_size from 128 to 64 in the run_test_golden.sh file to fit within the memory available on the CWRU server.
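A minimal Python 3 sketch of the idea behind the UTF-8 fix: encode the transcription to UTF-8 bytes explicitly and write those bytes to a binary stream, instead of relying on the server's default output encoding. The function names and the exact format string are hypothetical; this is not the actual patch to infer.py.

```python
def format_transcripts(target, result):
    """Return a target/output transcription pair as UTF-8 bytes.

    Encoding explicitly means Chinese characters no longer depend on the
    server's default (often ASCII) output encoding.
    """
    text = "\nTarget Transcription: %s\nOutput Transcription: %s\n" % (target, result)
    return text.encode("utf-8")


def write_transcripts(pairs, stream):
    """Write (target, result) pairs to a *binary* stream such as sys.stdout.buffer."""
    for target, result in pairs:
        stream.write(format_transcripts(target, result))
    stream.flush()
```

For example, write_transcripts(pairs, sys.stdout.buffer) writes the Chinese transcriptions without raising UnicodeEncodeError, even when the shell locale is not UTF-8.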
Conclusion
So far, we can successfully run the whole DeepSpeech2 workflow on the CWRU server. Our ultimate goal is to get my code into permanent production in the Red Hen pipelines. The next step is to produce transcripts in batch from Red Hen Lab data. Zhaoqing has written some shell scripts that automatically find all the Chinese videos in the datasets and convert them into a form that our DeepSpeech2 model can take as input (see reference).
Lastly, congratulations on passing the second evaluation! Many thanks to Red Hen Lab for their help and guidance. In the final period, I will do my best to achieve an excellent outcome! Thank you for reading. 🙂