Programming languages: Julia touts its speed edge over Python and R

Benchmarks suggest programming language Julia may be the best choice for big-data analysis using CSV format files.

Written by Liam Tung, Contributing Writer June 22, 2020 at 6:38 a.m. PT

If you're a data scientist and need to analyze loads of CSV files for insights into, say, stock-price and market movements, the Julia programming language trumps machine-learning rivals Python and R, according to Julia supporters.

Machine learning has propelled Python upwards to make it probably the most popular programming language among developers these days, along with Java and JavaScript.

However, Julia, a young language with roots in MIT's Computer Science and Artificial Intelligence Lab (CSAIL), has also become one to watch, having found a core audience among data scientists.

Julia is not among the top 10 programming languages that developers use but it is in the top 10 most-loved programming languages in this year's survey from Stack Overflow, putting it up there with Rust, TypeScript, Python, Kotlin, Go, Dart, C#, Swift, JavaScript and SQL.

Some languages, such as Rust, aren't widely used by developers but are appreciated by programmers for qualities that suit systems programming rather than application programming. For example, Microsoft is looking to Rust for the memory-safety features lacking in C and C++, which are extensively employed in Windows and other Microsoft projects.

Julia, on the other hand, has been adopted by some programmers for its C-like speed, but it has a much smaller ecosystem of packages than Python.

A recent update to Julia has improved multi-threading to offer more speed enhancements, and that's what Julia developers argue is giving it a sizable edge over Python and statistical programming language R at the task of parsing CSV files for data analysis.

According to Deepak Suresh, a machine-learning engineer at Julia Computing, multithreading capabilities give Julia libraries an advantage over both machine-learning rivals across a range of datasets read from CSV (comma-separated values) text files.

Suresh has benchmarked statistical programming language R's fread, Pandas' read_csv for Python, and Julia's CSV.jl CSV parsers and reckons that Julia comes out on top.

"Julia's CSV.jl is 1.5 to 5 times faster than Pandas even on a single core; with multithreading enabled, it is as fast or faster than R's read_csv," he notes.

The benchmarks were carried out on a machine with Ubuntu 18.04 powered by an Intel Xeon Silver 4114 processor running at 2.20GHz.

As he explains, Julia's CSV.jl is the only tool that is "fully implemented in its higher-level language rather than being implemented in C and wrapped from R/Python".

The benchmarks are meant to demonstrate the speed of loading data in Julia and also indicate the performance of Julia code during data analysis.

One of the example benchmarks looks at Apple stock prices – open, high, low and close – using a 2.5GB dataset with 50 million rows and five columns.

"The single threaded CSV.jl is about 1.5 times faster than R's fread from data.table. With multithreading CSV.jl is about 22 times faster. Pandas' read_csv takes 34s to read, this is slower than both R and Julia," Suresh declares.
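For readers who want a feel for how a CSV-loading measurement like this is set up, here is a minimal sketch in Python. It is not the code behind Julia Computing's benchmarks: it uses only Python's standard-library csv module, a small synthetic five-column dataset rather than the 2.5GB Apple file, and hypothetical helper names (make_ohlc_csv, parse_ohlc, best_time) chosen for illustration.

```python
import csv
import io
import random
import time


def make_ohlc_csv(n_rows, seed=0):
    """Build synthetic CSV text with five columns: row, open, high, low, close.

    The values are random and purely illustrative; they are not real stock data.
    """
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["row", "open", "high", "low", "close"])
    for i in range(n_rows):
        open_ = round(rng.uniform(100.0, 200.0), 2)
        close = round(rng.uniform(100.0, 200.0), 2)
        high = round(max(open_, close) + rng.uniform(0.0, 5.0), 2)
        low = round(min(open_, close) - rng.uniform(0.0, 5.0), 2)
        writer.writerow([i, open_, high, low, close])
    return buf.getvalue()


def parse_ohlc(text):
    """Parse the CSV text into a list of (open, high, low, close) float tuples."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [tuple(float(x) for x in row[1:]) for row in reader]


def best_time(func, *args, repeats=3):
    """Return the fastest wall-clock time, in seconds, over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    data = make_ohlc_csv(100_000)
    seconds = best_time(parse_ohlc, data)
    print(f"parsed 100,000 rows in {seconds:.3f}s")
```

Taking the best of several runs is one common way to reduce noise from other processes. Results from a sketch like this depend heavily on the machine and on the dataset's size and column types, so they are not comparable to the figures quoted above.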

Another looks at performance on a mortgage-risk dataset from Google-owned data-science platform Kaggle, a mixed-type dataset with 356,000 rows and 2,190 columns.

"Pandas takes 119s to read in this dataset. Single-threaded fread is about twice faster than CSV.jl. However, with more threads Julia is either as fast or slightly faster than R," says Suresh.

Another is the acquisition dataset from US mortgage financier Fannie Mae, which has four million rows and 25 columns.

"Single-threaded data.table is 1.25 times faster than CSV.jl. But, the performance of CSV.jl keeps increasing with more threads. CSV.jl gets about 4 times faster with multi-threading," he says.

Julia Computing says that across all eight datasets, Julia's CSV.jl is always faster than Pandas, and with multi-threading it is competitive with R's data.table.
