Tutorial for Cluster Distributed Training using Slurm+Singularity
This tutorial covers how to set up a cluster of GPU instances on AWS and use Slurm to train neural networks with distributed data paralleli...
Importance-Aware Learning for Neural Headline Editing
Many social media news writers are not professionally trained. Therefore, social media platforms have to hire professional editors to adjust amateur headlines to attract more readers. We aim to automate the headline editing process to ...
Notes on CVPR 2019
This note collects thoughts on and summaries of what was seen and heard at CVPR 2019. It focuses mainly on papers related to NLG and Language+Vision.
Explore Gradient-Checkpointing in PyTorch
This is a practical analysis of how gradient checkpointing is implemented in PyTorch and how to use it in Transformer models like BERT and GPT-2.