Microsoft reckons machine-generated code should be treated with a "mixture of optimism and caution": large language models can automate programming, but the code they produce can't always be trusted.
These large pre-trained language models include OpenAI's Codex, Google's BERT natural language model and DeepMind's work on code generation. OpenAI's Codex, unveiled in August, powers the Microsoft-owned GitHub's Copilot tool.
To address the question of code quality from these language models, Microsoft researchers have created Jigsaw, a tool that can improve the performance of these models using "post-processing techniques that understand the programs' syntax and semantics and then leverages user feedback to improve future performance."
It's currently designed to synthesize code for the Python Pandas API using multi-modal inputs, says Microsoft. Pandas is a popular data manipulation and analysis library for data scientists who use the Python programming language.
"With Project Jigsaw, we aim to automate some of this vetting to boost the productivity of developers who are using large language models like Codex for code synthesis," explains the Jigsaw team at Microsoft Research.
Microsoft reckons Jigsaw can "completely automate" the process of checking whether code compiles, addressing error messages, and testing whether the code produces the output the developer intended.
"Jigsaw takes as input an English description of the intended code, as well as an I/O example. In this way, it pairs an input with the associated output, and provides the quality assurance that the output Python code will compile and generate the intended output on the provided input," they note.
The paper, Jigsaw: Large Language Models meet Program Synthesis, looks at the approach in Python Pandas.
Using Jigsaw, a data scientist or developer provides a description of the intended transformation in English, an input dataframe, and the corresponding output dataframe. Jigsaw then synthesizes the intended code.
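The paper doesn't prescribe a single input format, but conceptually the multi-modal input looks something like the following sketch. The English description, dataframes and synthesized snippet here are hypothetical examples, not taken from Microsoft's paper; the point is that the I/O pair lets the system check a candidate automatically.

```python
import pandas as pd

# English description (hypothetical): "Keep rows where sales exceed 100,
# then add a column giving each row's share of total remaining sales."
input_df = pd.DataFrame({"region": ["N", "S", "E"], "sales": [50, 120, 230]})

# The corresponding output dataframe the user expects for that input:
expected_df = pd.DataFrame(
    {"region": ["S", "E"], "sales": [120, 230], "share": [120 / 350, 230 / 350]},
    index=[1, 2],
)

# Code a system like Jigsaw might synthesize from the description and I/O pair:
result = input_df[input_df["sales"] > 100].copy()
result["share"] = result["sales"] / result["sales"].sum()

# The provided I/O example serves as an automatic acceptance test:
print(result.equals(expected_df))
```

Because the check is just dataframe equality against the user-supplied example, it can be rerun automatically on every candidate the model produces.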
Microsoft found that, without Jigsaw's post-processing, the language models create the correct output only about 30% of the time. In this system, natural language and other parameters are pre-processed, fed into Codex and GPT-3, and the post-processed output is returned to the human for verification and editing. That final human check is fed back into the pre- and post-processing mechanisms to improve them. If the code fails, Jigsaw repeats the repair process during the post-processing stage.
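The verify-then-repair loop can be illustrated with a simplified stand-in: run each candidate snippet against the user's I/O example, discard candidates that fail to run or produce the wrong output, and keep the first that passes. This is a toy illustration under assumed conventions (candidates assign their result to `out`), not Jigsaw's actual repair machinery, which applies learned program transformations rather than simply trying the next candidate.

```python
import pandas as pd

def verify_candidate(code: str, input_df: pd.DataFrame,
                     expected_df: pd.DataFrame) -> bool:
    """Check one synthesized snippet against the user's I/O example."""
    env = {"df": input_df.copy(), "pd": pd}
    try:
        exec(code, env)  # does the candidate even run?
    except Exception:
        return False     # runtime failure -> candidate rejected, repair/retry
    out = env.get("out")
    return isinstance(out, pd.DataFrame) and out.equals(expected_df)

# Hypothetical candidates, as a model such as Codex might return them:
candidates = [
    "out = df.drop('a', axis=1)",  # runs, but wrong transformation
    "out = df[df['a'] > 1]",       # matches the I/O example
]

inp = pd.DataFrame({"a": [1, 2, 3]})
exp = pd.DataFrame({"a": [2, 3]}, index=[1, 2])

# Keep the first candidate that satisfies the I/O example, if any:
accepted = next((c for c in candidates if verify_candidate(c, inp, exp)), None)
print(accepted)
```

In Jigsaw proper, the human's final edit is also logged, so the pre- and post-processing stages learn which repairs to apply next time.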
Jigsaw improves the accuracy of output to greater than 60% and, through user feedback, the accuracy improves to greater than 80%, according to Microsoft Research.
Microsoft notes that several challenges need to be overcome before Jigsaw becomes a true "pair programmer". For example, the researchers only evaluated the synthesized code against input/output examples. In practice, code quality also depends on whether the code performs well, is free of security flaws, and respects licensing attribution.