Multilingual Code Snippets Training for Program Translation – Sanghani Center for Artificial Intelligence and Data Analytics

Ming Zhu, Chandan Reddy

Abstract

Program translation aims to translate source code from one programming language to another. It is particularly useful in applications such as multiple-platform adaptation and legacy code migration. Traditional rule-based program translation methods usually rely on meticulous manual rule-crafting, which is costly in both time and effort. Recently, neural-network-based methods have been developed to address this problem. However, the absence of high-quality parallel code data is one of the main bottlenecks that impede the development of program translation models. In this paper, we introduce CoST, a new multilingual Code Snippet Translation dataset that contains parallel data from 7 commonly used programming languages. The dataset is parallel at the level of code snippets, which provides much more fine-grained alignments between different languages than existing translation datasets. We also propose a new program translation model that leverages multilingual snippet denoising auto-encoding and Multilingual Snippet Translation (MuST) pre-training. Extensive experiments show that multilingual snippet training is effective in improving program translation performance, especially for low-resource languages. Moreover, our training method shows good generalizability and consistently improves the translation performance of a number of baseline models. The proposed model outperforms the baselines on both snippet-level and program-level translation, and achieves state-of-the-art performance on the CodeXGLUE translation task. The code, data, and appendix for this paper can be found at https://github.com/reddy-lab-code-research/MuST-CoST.
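To illustrate why snippet-level alignment yields many more training examples than program-level alignment, the following minimal Python sketch turns one program that is implemented in several languages into ordered (source, target) snippet pairs for every language pair. It also includes a simple token-masking noise function of the kind a denoising auto-encoding objective could reconstruct from. This is not the authors' implementation: the data layout (aligned snippets stored by index), the names `snippet_pairs` and `mask_tokens`, the `<mask>` token, and the three-language toy program are all hypothetical assumptions. The noise scheme used in the paper may differ; see the linked repository for the actual code.

```python
import random
from itertools import permutations
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: one program implemented in several languages, already
# split into snippets so that the i-th snippet of each language is aligned.
AlignedProgram = Dict[str, List[str]]


def snippet_pairs(program: AlignedProgram) -> List[Tuple[str, str, str, str]]:
    """Return (src_lang, tgt_lang, src_snippet, tgt_snippet) for every ordered
    language pair and every snippet position available in both languages."""
    pairs = []
    for src, tgt in permutations(sorted(program), 2):
        # Assumption: alignment is by index, so only shared positions are paired.
        n = min(len(program[src]), len(program[tgt]))
        for i in range(n):
            pairs.append((src, tgt, program[src][i], program[tgt][i]))
    return pairs


def mask_tokens(snippet: str, rate: float, seed: Optional[int] = None) -> str:
    """Hypothetical noise for denoising auto-encoding: replace each
    whitespace-separated token with <mask> with probability `rate`."""
    rng = random.Random(seed)
    return " ".join("<mask>" if rng.random() < rate else tok for tok in snippet.split())


# Toy example (hypothetical snippets, not taken from CoST).
program = {
    "python": ["n = int(input())", "print(n * 2)"],
    "java": ["int n = sc.nextInt();", "System.out.println(n * 2);"],
    "cpp": ["int n; std::cin >> n;", "std::cout << n * 2;"],
}
pairs = snippet_pairs(program)
print(len(pairs))  # 3 languages -> 6 ordered pairs x 2 snippets = 12
print(mask_tokens("print(n * 2)", rate=0.5, seed=0))
```

With k languages and m aligned snippets per program, this produces k(k-1)m directed training pairs, which is one way to see how multilingual snippet training can give low-resource languages more supervision than single-pair, program-level data.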

Ming Zhu, Karthik Suresh, Chandan K. Reddy: Multilingual Code Snippets Training for Program Translation. AAAI 2022: 11783-11790

People

Chandan Reddy

Associate Professor of Computer Science

Ming Zhu

Alumni

Publication Details

Date of publication: June 28, 2022
Conference: AAAI Conference on Artificial Intelligence
Page number(s): 11783-11790
Volume: 36
Issue Number: 10