If you're a data scientist and need to analyze loads of CSV files for insights into, say, stock-price and market movements, the Julia programming language trumps machine-learning rivals Python and R, according to Julia supporters.
Julia has been adopted by some programmers for its C-like speed, but it has a much smaller ecosystem of packages than Python.
A recent update to Julia improved its multithreading support, bringing further speed gains, and Julia developers argue this gives it a sizable edge over Python and the statistical programming language R at parsing CSV files for data analysis.
According to Deepak Suresh, a machine-learning engineer at Julia Computing, multithreading gives Julia's libraries an advantage over both rivals across a range of datasets stored as CSV (comma-separated values) text files.
Suresh benchmarked three CSV parsers: fread from R's data.table package, read_csv from Python's Pandas, and Julia's CSV.jl. He reckons Julia comes out on top.
"Julia's CSV.jl is 1.5 to 5 times faster than Pandas even on a single core; with multithreading enabled, it is as fast or faster than R's read_csv," he notes.
The benchmarks were carried out on a machine with Ubuntu 18.04 powered by an Intel Xeon Silver 4114 processor running at 2.20GHz.
As he explains, Julia's CSV.jl is the only tool that is "fully implemented in its higher-level language rather than being implemented in C and wrapped from R/Python".
The benchmarks are meant to demonstrate the speed of loading data in Julia and also indicate the performance of Julia code during data analysis.
One of the example benchmarks looks at Apple stock price states – open, high, low and close – using a 2.5GB dataset with 50 million rows and five columns.
"The single threaded CSV.jl is about 1.5 times faster than R's fread from data.table. With multithreading CSV.jl is about 22 times faster. Pandas' read_csv takes 34s to read, this is slower than both R and Julia," Suresh declares.
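For readers who want a feel for what such a measurement involves, here is a minimal Python sketch of a CSV-parsing timing harness. It is not the benchmark Suresh ran: it uses Python's standard-library csv module rather than Pandas, R or CSV.jl, the data is synthetic, the row count is scaled far down from 50 million, and the column names (id, open, high, low, close) and all function names are assumptions made up for illustration.

```python
import csv
import io
import random
import time


def make_ohlc_csv(n_rows, seed=0):
    """Build an in-memory CSV with five columns: a row id plus open, high, low, close.

    The values are random stand-ins, not real stock prices.
    """
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "open", "high", "low", "close"])
    for i in range(n_rows):
        open_ = round(rng.uniform(100.0, 200.0), 2)
        close = round(rng.uniform(100.0, 200.0), 2)
        # Keep the row internally consistent: high >= both, low <= both.
        high = round(max(open_, close) + rng.uniform(0.0, 5.0), 2)
        low = round(min(open_, close) - rng.uniform(0.0, 5.0), 2)
        writer.writerow([i, open_, high, low, close])
    return buf.getvalue()


def parse_ohlc(text):
    """Parse the CSV text into a list of (open, high, low, close) float tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (float(r["open"]), float(r["high"]), float(r["low"]), float(r["close"]))
        for r in reader
    ]


def time_parse(text, repeats=3):
    """Return the best wall-clock time in seconds over several parse runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse_ohlc(text)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    data = make_ohlc_csv(100_000)
    print(f"best of 3: {time_parse(data):.3f}s for 100,000 rows")
```

Taking the best of several runs reduces noise from other activity on the machine. Timings from a sketch like this say nothing about how the three parsers in Suresh's comparison rank against each other.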