Programming languages: Julia touts its speed edge over Python and R

Benchmarks suggest programming language Julia may be the best choice for big-data analysis using CSV format files.
Written by Liam Tung, Contributing Writer

If you're a data scientist and need to analyze loads of CSV files for insights into, say, stock-price and market movements, the Julia programming language trumps machine-learning rivals Python and R, according to Julia supporters.  

Machine learning has helped propel Python into the top tier of programming languages by popularity among developers, alongside Java and JavaScript.

However, Julia, a young language with roots in MIT's Computer Science and Artificial Intelligence Lab (CSAIL), has also become one to watch, having found a core audience among data scientists. 

Julia is not among the top 10 programming languages that developers use but it is in the top 10 most-loved programming languages in this year's survey from Stack Overflow, putting it up there with Rust, TypeScript, Python, Kotlin, Go, Dart, C#, Swift, JavaScript and SQL. 

Some languages such as Rust aren't widely used by developers but they are appreciated by programmers for qualities that excel in systems programming, versus application programming. For example, Microsoft is looking to Rust for the memory-safety features lacking in C and C++, which are extensively employed in Windows and other Microsoft projects.

Julia, on the other hand, has been adopted by some programmers for its C-like speed, but it has a much smaller ecosystem of packages than Python.

A recent update to Julia has improved multi-threading to offer more speed enhancements, and that's what Julia developers argue is giving it a sizable edge over Python and statistical programming language R at the task of parsing CSV files for data analysis. 

According to Deepak Suresh, a machine-learning engineer at Julia Computing, multithreading gives Julia's libraries an advantage over both rivals across a range of datasets stored as CSV (comma-separated values) text files.
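
To make the idea of multithreaded CSV parsing concrete, here is a minimal Python sketch of one common approach: cut the file body into chunks that end on line boundaries, parse each chunk in its own worker, then stitch the rows back together in order. This is an illustration only, not CSV.jl's actual implementation. The function names are hypothetical, the splitter assumes no quoted field contains a newline, and CPython's global interpreter lock means these threads show the structure rather than a real speedup.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def split_on_line_boundaries(text: str, n_chunks: int) -> list[str]:
    """Cut text into roughly n_chunks pieces, each ending at a newline.

    Assumption: no quoted field contains a newline character.
    """
    chunks = []
    start = 0
    target = max(1, len(text) // n_chunks)
    while start < len(text):
        end = min(len(text), start + target)
        # Extend the cut to the next newline so no row is split in two.
        newline = text.find("\n", end)
        end = len(text) if newline == -1 else newline + 1
        chunks.append(text[start:end])
        start = end
    return chunks


def parse_chunk(chunk: str) -> list[list[str]]:
    """Parse one chunk of header-less CSV rows."""
    return list(csv.reader(io.StringIO(chunk)))


def parse_csv_parallel(text: str, n_workers: int = 4) -> tuple[list[str], list[list[str]]]:
    """Parse CSV text by handing line-aligned chunks to a pool of workers."""
    header_line, _, body = text.partition("\n")
    header = next(csv.reader([header_line]))
    chunks = split_on_line_boundaries(body, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in chunk order, so row order is preserved.
        parsed = list(pool.map(parse_chunk, chunks))
    rows = [row for part in parsed for row in part]
    return header, rows
```

Calling `parse_csv_parallel(text, n_workers=3)` on a small CSV string should return the same header and rows as a plain sequential `csv.reader` pass; only the way the work is divided differs.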

Suresh benchmarked three CSV parsers: fread from R's data.table package, read_csv from Python's Pandas library, and Julia's CSV.jl. He reckons that Julia comes out on top.

"Julia's CSV.jl is 1.5 to 5 times faster than Pandas even on a single core; with multithreading enabled, it is as fast or faster than R's read_csv," he notes. 

The benchmarks were carried out on a machine with Ubuntu 18.04 powered by an Intel Xeon Silver 4114 processor running at 2.20GHz.    

As he explains, Julia's CSV.jl is the only tool that is "fully implemented in its higher-level language rather than being implemented in C and wrapped from R/Python". 

The benchmarks are meant to demonstrate the speed of loading data in Julia and also indicate the performance of Julia code during data analysis. 
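
For readers who want to run a comparable measurement themselves, the following Python sketch shows the general shape of a load-time benchmark: generate a file, read it several times, and report the median wall-clock time. It is not Julia Computing's harness. The helper names, the synthetic five-column price data and the use of the standard-library `csv` reader are all assumptions made for illustration; any other reader function that takes a path can be passed in instead.

```python
import csv
import os
import statistics
import tempfile
import time


def write_sample_csv(path: str, n_rows: int) -> None:
    """Write a synthetic five-column price file (made-up data, not the article's dataset)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "open", "high", "low", "close"])
        for i in range(n_rows):
            price = 100.0 + (i % 50) * 0.1
            writer.writerow([i, price, price + 1.0, price - 1.0, price + 0.5])


def read_rows(path: str) -> int:
    """Parse the file with the standard-library reader and return the data-row count."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return sum(1 for _ in reader)


def time_reader(reader, path: str, repeats: int = 5) -> float:
    """Return the median wall-clock seconds over several runs of reader(path)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prices.csv")
        write_sample_csv(path, 100_000)
        print(f"median read time: {time_reader(read_rows, path):.3f}s")
```

Taking the median of repeated runs reduces the effect of one-off delays such as a cold disk cache. Absolute numbers will depend on the hardware and file size, so they are not comparable with the figures quoted in this article.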

One of the example benchmarks looks at Apple stock price states – open, high, low and close – using a 2.5GB dataset with 50 million rows and five columns. 

"The single threaded CSV.jl is about 1.5 times faster than R's fread from data.table. With multithreading CSV.jl is about 22 times faster. Pandas' read_csv takes 34s to read, this is slower than both R and Julia," Suresh declares. 

Another looks at performance with a mortgage risk dataset from Google-owned data-science platform Kaggle, a mixed-type dataset with 356,000 rows and 2,190 columns.

"Pandas takes 119s to read in this dataset. Single-threaded fread is about twice faster than CSV.jl. However, with more threads Julia is either as fast or slightly faster than R," says Suresh. 

Another is the acquisition dataset from US mortgage lender Fannie Mae, which has four million rows and 25 columns.

"Single-threaded data.table is 1.25 times faster than CSV.jl. But, the performance of CSV.jl keeps increasing with more threads. CSV.jl gets about 4 times faster with multi-threading," he says. 

Julia Computing says that across all eight datasets, Julia's CSV.jl is always faster than Pandas, and with multi-threading it is competitive with R's data.table.

Image: Julia Computing
