Korea’s Tajo data warehouse to strut its stuff at Hadoop Summit

Open source data warehouse engine Tajo aims to attract new contributors at the June 3 Hadoop Summit in San Jose, California.

Choi Hyun-sik, vice president of the Apache Tajo project and a research engineer at Korean big data infrastructure start-up Gruter, told ZDNet Korea: “The focus of my presentation will be on how to advance Tajo’s local processing engine.”

Tajo logo: elephant-free big data imagery

“A vectorized engine will be discussed as a solution: how bottlenecks are formed, what causes performance to slow down, and what we did to solve these using the engine,” he said.

Tajo is one of many emerging SQL-on-Hadoop options designed for low-latency analysis of data stored on the Hadoop Distributed File System (HDFS) and other data sources. Choi and his team of developers are confident their engine is equal to or better than competitors like Hive and Cloudera Impala. Tajo provides an extract, transform and load (ETL) feature set and an extensible query rewrite system.

Tajo development started in 2010 and was picked up by the Apache Software Foundation in March last year. After a year of “incubating,” it was named an Apache top-level project in April, a first for a Korean analytics project.

“There are more contributors now since being named as an Apache top-level project. One who was very active was added as a committer recently,” says Choi. He says that once Tajo 0.8.0, released in May, reaches the stability and capabilities of a commercial database, the team will release version 1.0.

Korea’s largest telco, SK Telecom, started using Tajo for big data analytics last year. The company claims Tajo was 3.7 times faster at processing than Hive, while workload decreased 70 percent.

“I think the most important thing in open source is people,” says Choi. “People start being attracted to open source [products] by seeing how technologically cool they are at communities and conferences.” He is headed for Los Angeles afterwards to give the lowdown on Tajo at Big Data Camp 2014, starting June 14.

“If someone does something progressive and forward-looking in a project, you are drawn to it,” says Choi. “I hope to show during the Hadoop Summit presentation that ‘we’ve gone this far,’ so if you want, you should join us. I’d love to have great contributors join our project.”