Top programming language for data science: Python still rules, followed by SQL

But while data scientists might be creating clever data visualizations, many decision makers still don't understand stories told through data.
Written by Liam Tung, Contributing Writer

Data science and machine learning professionals have driven adoption of the Python programming language, but data science and machine learning still lack key tools in business and have room to grow before becoming essential for decision-making, according to Anaconda, the maker of a data science distribution of Python.

Python could soon be the most popular programming language, battling it out for top spot with JavaScript, Java and C, depending on which language ranking you look at. But while Python adoption is booming, the fields that are driving it — data science and machine learning — are still in their infancy. 

Most respondents to Anaconda's survey (63%) said they used Python frequently or always, while 71% of educators said they're teaching machine learning and data science with Python, which has become popular because of its ease of use and gentle learning curve. An impressive 88% of students said they were being taught Python in preparation to enter the data science/machine learning field.

Given Anaconda's audience, it's not surprising Python was by far the most popular language used. It was followed by SQL, R, JavaScript, HTML/CSS, Java, Bash/Shell, C/C++, C#, TypeScript, PHP, Rust, Julia, and Go.

Over a third (37%) of 4,299 data science professionals, students and academics who responded to Anaconda's online survey this April to May said their organizations decreased investments in data science, while 26% increased their investment and 24% said investments were flat. It's not clear what impact the pandemic has had on investments in data science tools and technology. 

Still, some 39% reported that "many" of their business decisions rely on data science, while 35% said only some business decisions were based on insights from their team.

A quarter of respondents said they lacked the resources for effective analysis, while another quarter said decision-makers at their organization struggle with data literacy, and 11% said they or their team couldn't demonstrate a business impact. 

Only 36% described their organization's decision-makers as "very data literate", meaning they actually understood data visualizations and models. Just over half (52%) said decision-makers were "mostly data literate".

Anaconda also asked respondents to nominate all the skills they believed their organization was currently lacking. The top missing skill was "big data management" at 38%, while 26% said their organization was lacking advanced mathematics, and a quarter cited "business knowledge" as lacking.

Other commonly cited skills in short supply were deep learning (27%), communication skills (22%), data visualization (22%), machine learning (21%), Python (20%), and probability and statistics (19%). 

The top problem that most data science folks felt needed to be tackled in artificial intelligence and machine learning was "social impacts from bias in data and models" (31%), followed by "impacts to individual privacy". Both of these issues have been highlighted by the adoption of AI and facial recognition in public surveillance systems. Microsoft president Brad Smith recently called for the government to regulate facial recognition due to racial bias.   

Other top concerns included job losses from automation (19%), advanced information warfare (15%), and lack of diversity and inclusion in the profession (10%).

Just 10% of respondents said their organization had implemented a solution to ensure fairness and mitigate bias, but Anaconda found 30% were planning to implement a step in the next year.

Explainability and interpretability of ML models was another large gap. Some 31% said their organization lacked plans to ensure explainability and interpretability, but 41% said their organization either planned to implement some steps in the next 12 months or already had one step in place.

Most respondents (65%) said their employers encouraged them to contribute to open-source projects, but 18% of respondents said employer support for open source decreased due to COVID-19 or other factors. 

Some 41% said security bugs in open source software were the main obstacle preventing their organization from using open source software. Python and many of its popular data science and machine learning packages/libraries, such as NumPy and TensorFlow, are open source projects.

Interestingly, a quarter of respondents said they were not securing their open-source pipeline, while 20% didn't know what steps their organization was taking to ensure vulnerabilities are managed. Anaconda provides an enterprise service to help organizations block or include packages according to the enterprise's standards. It also has a managed library of 7,500 open-source packages for Python.
