This paper nicely summarizes various classical sequence-level loss functions previously used in the structured prediction literature and shows that these methods work well for neural sequence prediction tasks. The authors demonstrate their results on two machine translation datasets (IWSLT14 and WMT14 En-Fr) and on the Gigaword abstractive summarization dataset.