GSoC Week 1: Coding begins
Before I start, I wish to highlight that from now onward, I will be continuing my #10WeeksChallenge in the form of GSoC weekly blogs.
So, the coding period for GSoC 2018 began on May 14. This period will last for a total of 12 weeks (May 14 to August 14). This is my second blog post about Google Summer of Code. If you like, you can read the first one here. Let’s start 😎
1. Deep Speech 2 on PaddlePaddle
This week’s most important breakthrough is that we found an open-source project, DeepSpeech2 on PaddlePaddle, which suits the Chinese pipeline better than Mozilla’s DeepSpeech. DeepSpeech2 on PaddlePaddle is an open-source implementation of an end-to-end Automatic Speech Recognition (ASR) engine, based on Baidu’s DeepSpeech2 paper and built on the PaddlePaddle platform. The paper shows that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Accordingly, I decided to practice with the Deep Speech 2 project on PaddlePaddle this week. Since we are still unable to run PaddlePaddle on the remote server, I made my first attempt on my local Mac.
1.1 PaddlePaddle Installation on Local Mac
You can use pip to install PaddlePaddle with a single command. However, I ran into several small problems during installation that cost me a lot of time to fix (see the notes below for details).
- Note 1: Make sure that your default Python version is in the Python 2.7 series.
- Note 2: The pip package only supports the manylinux1 standard, so you’ll need to upgrade pip to a version newer than 9.0.0.
- Note 3: Use sudo pip, or you’ll get a permission denied error.
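The three notes above can be sketched as a small shell helper. This is only a sketch under assumptions: `version_gt` and `install_paddle` are hypothetical names of my own, the pip package name is assumed to be `paddlepaddle`, and the version comparison assumes purely numeric, dot-separated versions.

```shell
# Succeeds when dotted numeric version $1 is strictly greater than $2.
# Hypothetical helper; assumes components are plain integers (e.g. 9.0.1).
version_gt() {
  left=$1
  right=$2
  while [ -n "$left" ] || [ -n "$right" ]; do
    l=${left%%.*}     # first remaining component of each version
    r=${right%%.*}
    [ "${l:-0}" -gt "${r:-0}" ] && return 0
    [ "${l:-0}" -lt "${r:-0}" ] && return 1
    # Drop the component just compared; an exhausted version becomes empty.
    case $left in *.*) left=${left#*.} ;; *) left= ;; esac
    case $right in *.*) right=${right#*.} ;; *) right= ;; esac
  done
  return 1            # versions are equal, so not strictly greater
}

# Hypothetical wrapper around Notes 1-3; nothing runs until it is called.
install_paddle() {
  python --version                                  # Note 1: expect Python 2.7.x
  pip_version=$(pip --version | awk '{print $2}')   # "pip 9.0.1 from ..." -> 9.0.1
  if ! version_gt "$pip_version" 9.0.0; then
    sudo pip install --upgrade pip                  # Note 2: need pip newer than 9.0.0
  fi
  sudo pip install paddlepaddle                     # Note 3: sudo avoids permission errors
}
```

Calling `install_paddle` runs the checks and then the install; defining the functions alone changes nothing on the machine.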
1.2 PaddlePaddle Use on Local Mac
Create a new file called housing.py, and paste this Python code:
Run python housing.py and voila! It should print out a list of predictions for the test housing data.
1.3 Deep Speech 2 Setup on Local Mac
- Make sure these libraries and tools are installed: pkg-config, flac, ogg, vorbis, boost and swig (I installed them via Homebrew through a proxy).
- Run the setup script for the remaining dependencies.
Note: Remember to use “sudo”, and run “brew install gcc” to install the Fortran compiler.
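The dependency steps above might look roughly like this in a shell. It is a sketch under assumptions: `missing_commands` and `install_ds2_deps` are hypothetical helper names, `libogg` and `libvorbis` are the Homebrew formula names I assume for ogg and vorbis, and `setup.sh` is an assumed name for the setup script.

```shell
# Print every command from the argument list that is not found on PATH.
missing_commands() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical wrapper; nothing is installed until it is called.
install_ds2_deps() {
  # gcc is included because it provides the Fortran compiler (gfortran).
  brew install pkg-config flac libogg libvorbis boost swig gcc

  # Only the tools that ship a command-line binary can be checked this way.
  missing=$(missing_commands pkg-config flac swig gfortran)
  if [ -n "$missing" ]; then
    echo "still missing: $missing" >&2
    return 1
  fi

  # Assumed script name; run with sudo as the note above recommends.
  sudo sh setup.sh
}
```

Running `missing_commands pkg-config flac swig gfortran` on its own is a quick way to see what still needs installing before calling the wrapper.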
1.4 PaddlePaddle Installation on CWRU HPC
Our project is based on the PaddlePaddle framework. However, we ran into trouble installing PaddlePaddle on the HPC. The reason is that the makers of the Docker image put files in /root, which is not accessible in Singularity unless you have root rights on the host machine (see error message below). That defeats the main purpose of using Singularity with Docker images, i.e. letting students do the work without our intervention and without sudo rights directly on the Case HPC.
Red Hen is still working on this problem. I hope we can work it out ASAP so that we can run PaddlePaddle on the server 😊
2. Reading Chinese data on gallina
2.1 Modify the path
Since the server path changed to /mnt/rds/redhen/gallina, I updated my function day() in ~/.bashrc accordingly. The demo is shown as follows.
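My actual day() function is not reproduced in this post, so here is only a guess at the shape such a helper could take. It assumes that day() takes a date and changes into a per-day directory under the new root. The `tv/YYYY/YYYY-MM/YYYY-MM-DD` layout, the `GALLINA_ROOT` variable and the `day_dir` helper are all assumptions made for illustration.

```shell
# New server root from the path change described above.
GALLINA_ROOT=/mnt/rds/redhen/gallina

# Hypothetical helper: print the directory for a date given as YYYY-MM-DD.
# The tv/YYYY/YYYY-MM/YYYY-MM-DD layout is an assumption, not a confirmed fact.
day_dir() {
  d=$1
  # ${d%%-*} keeps the year, ${d%-*} keeps the year and month.
  printf '%s/tv/%s/%s/%s\n' "$GALLINA_ROOT" "${d%%-*}" "${d%-*}" "$d"
}

# Hypothetical day(): jump to that directory, failing quietly if it is absent.
day() {
  cd "$(day_dir "$1")" || return 1
}
```

With this in ~/.bashrc, `day 2018-05-14` would change into the directory printed by `day_dir 2018-05-14`, and only the root variable needs editing if the server path moves again.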
2.2 Basic commands
3. Submit Jobs on CWRU HPC
3.1 First Try Using Batch mode
I tried a small “hello world!” Python demo to test my setup.
- First, I wrote a SLURM script.
- Then I submitted this batch script.
- Run!
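The batch-mode steps above can be sketched as follows. This is not my exact script: the file name `hello.slurm`, the job name and the resource values are placeholders chosen for illustration, and a real cluster may also need a partition or a module load line.

```shell
# Write a minimal SLURM batch script (all values below are placeholders).
cat > hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello.%j.out
#SBATCH --ntasks=1
#SBATCH --time=00:01:00

python -c 'print("hello world!")'
EOF

# Submit only where SLURM is available, then list my own queued jobs.
if command -v sbatch >/dev/null 2>&1; then
  sbatch hello.slurm
  squeue -u "$USER"
fi
```

In the `--output` pattern, `%j` is replaced by the job ID, so once the job finishes the greeting should end up in a file named like `hello.<jobid>.out`.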
Reference
Tutorial from Peking University
4. Conclusion
It was a pleasant first week of Google Summer of Code. If all goes well, I will be able to run the Deep Speech 2 model on the HPC next week. I am strictly following my timeline. The 2nd week ends on May 27. Till then, happy coding and cheers. 😀