Generally speaking, the Linux kernel works just fine for most tasks. But, to get the most from it for a particular job, you must fine-tune it by setting its parameters for the best possible results. There's only one little problem with this approach. There are thousands of parameters. Even for a Linux expert, tuning them for optimal performance is a long, hard job. And, of course, different workloads require different tunings for different sets of Linux kernel parameters. Thus, as Wang said, "In large-scale data centers like ByteDance's, it has become nearly impossible to tune Linux kernel parameters manually for hundreds of different workloads."
Tools such as System Management Interface Tool (SMIT), Sysctl, and TuneD can help, but they just enable you to make manual kernel tuning changes more easily. There are also "smart" programs, such as Red Hat's BayOp, which uses ML to optimize network application efficiency specifically. But BayOp isn't a general-purpose AI/ML program; it's meant for one specific kind of Linux tuning.
What ByteDance is working on is a first attempt to automate the entire Linux kernel parameter tuning process with minimal engineering effort. Specifically, ByteDance is working on tuning Linux memory management. ByteDance has found that with machine learning algorithms, such as Bayesian optimization, automated tuning could even beat most Linux kernel engineers.
Why? Well, the idea, as Wang wryly put it, "is not to put Linux kernel engineers out of business." No, the goal is to "liberate human engineers from tuning performance for each individual workload"; to make better decisions with historical data, which humans often struggle with; and, last but never least, to find better solutions than those that come from today's trial-and-error, heuristic methods.
How? The autotuning system is designed to automatically adjust the Linux kernel's internal settings based on the specific workload and hardware configuration. This dynamic adjustment ensures optimal performance, addressing a long-standing challenge in the Linux community: manually tuning the kernel for specific scenarios. To do this, the AI/ML framework uses multiple algorithms, such as Bayesian optimization, genetic algorithms, and simulated annealing/evolutionary algorithms.
Dynamic Optimization: The system continuously monitors the kernel's performance, making real-time adjustments to settings such as CPU frequency scaling and memory management.
Enhanced Efficiency: By optimizing resource usage, the autotuning system significantly improves the efficiency of Linux systems, particularly in environments with varying workloads.
User-Friendly Interface: The system includes a user-friendly interface, allowing even those with limited technical knowledge to benefit from enhanced kernel performance.
Customizable Settings: Advanced users can customize the autotuning parameters, tailoring the system to their specific needs.
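To make the search concrete, here is a minimal sketch of one of the algorithms the framework reportedly uses, simulated annealing, applied to two hypothetical kernel tunables. The parameter names, ranges, and the `benchmark` function are all illustrative stand-ins: a real tuner would apply candidate values to a live system and measure the workload, not score a synthetic function.

```python
import math
import random

# Hypothetical tunables and value ranges; a real framework would target
# actual sysctl knobs and many more of them.
PARAM_RANGES = {
    "vm.swappiness": (0, 100),
    "vm.dirty_ratio": (1, 60),
}

def benchmark(params):
    # Stub cost function (lower is better) with a made-up optimum.
    # In practice this would run the workload and measure latency or RSS.
    return sum((params[k] - lo) ** 2 for k, (lo, hi) in PARAM_RANGES.items())

def neighbor(params):
    # Perturb one randomly chosen parameter within its allowed range.
    key = random.choice(list(params))
    lo, hi = PARAM_RANGES[key]
    step = max(1, (hi - lo) // 10)
    new = dict(params)
    new[key] = min(hi, max(lo, params[key] + random.randint(-step, step)))
    return new

def anneal(iters=500, t0=10.0):
    # Classic simulated annealing: always accept improvements, sometimes
    # accept regressions, with the acceptance odds shrinking over time.
    cur = {k: random.randint(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    cur_cost = benchmark(cur)
    best, best_cost = dict(cur), cur_cost
    for i in range(iters):
        temp = t0 * (1 - i / iters) + 1e-9
        cand = neighbor(cur)
        cost = benchmark(cand)
        if cost < cur_cost or random.random() < math.exp((cur_cost - cost) / temp):
            cur, cur_cost = cand, cost
        if cur_cost < best_cost:
            best, best_cost = dict(cur), cur_cost
    return best, best_cost
```

Swapping the stub `benchmark` for a real workload run, and the toy neighbor step for a smarter proposal (Bayesian optimization's acquisition function, say), is where the engineering effort goes.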
It's still early days, but ByteDance is already seeing some success. For example, by using DAMON, a Linux kernel subsystem for memory access monitoring and optimization, with the framework, ByteDance was able to find the best scheme for a MySQL application. The framework did this by running different DAMON schemes and comparing their performance. The result: the application's memory usage dropped by 30%. For massive applications, that's a real savings.
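The comparison step in that DAMON experiment amounts to running the workload under each candidate scheme and keeping the cheapest one. The sketch below illustrates just that selection logic; the scheme names and memory figures are invented placeholders (chosen to mirror the reported 30% reduction), not ByteDance's data.

```python
# Hypothetical peak memory use (MB) of a MySQL workload under several
# candidate DAMON schemes. All names and numbers are illustrative.
def pick_best_scheme(results):
    # Lower memory usage wins; ties broken by name for determinism.
    return min(results, key=lambda name: (results[name], name))

runs = {"baseline": 1000, "scheme-a": 820, "scheme-b": 700, "scheme-c": 760}
best = pick_best_scheme(runs)
saving = 1 - runs[best] / runs["baseline"]  # fraction of memory saved
```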
In another case, ByteDance was able to optimize HTTP network latency on an NGINX server by tuning 16 kernel sysctl parameters. In its best scenario, the ML tuning gave the NGINX network performance a 12% boost over expert manual tuning. Again, that's a significant improvement.
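For anyone unfamiliar with how a tuner would actually apply such sysctl candidates: each dotted sysctl name maps onto a file under /proc/sys, and writing a value to that file (as root) changes the setting. Here's a small sketch of that mechanism, with a dry-run mode so nothing is written; the specific parameter names used in the usage example are common networking knobs, not necessarily the 16 ByteDance tuned.

```python
from pathlib import Path

def sysctl_path(name):
    # Dotted sysctl names map onto files under /proc/sys,
    # e.g. net.core.somaxconn -> /proc/sys/net/core/somaxconn
    return Path("/proc/sys") / name.replace(".", "/")

def apply_sysctls(settings, dry_run=True):
    # Writing requires root; with dry_run the planned writes are
    # collected and returned instead of being applied.
    planned = []
    for name, value in settings.items():
        path = sysctl_path(name)
        planned.append((str(path), str(value)))
        if not dry_run:
            path.write_text(str(value) + "\n")
    return planned

# Usage: preview what a candidate configuration would change.
preview = apply_sysctls({"net.core.somaxconn": 4096,
                         "net.ipv4.tcp_fin_timeout": 15})
```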
ByteDance isn't claiming its AI/ML approach will work for every Linux tuning job, but Wang did say, "Although there are limitations, we believe that kernel machine learning is not only possible but also necessary."
Me? I think this is a potential game-changer for Linux applications. By simplifying kernel optimization, it will make Linux more accessible and efficient for a broader range of users and applications. In particular, I see the autotuning system boosting performance on almost all server, cloud computing, and data center applications.