From the cloud to an intelligent era: Data centres are transforming

Over the past decade, data centre services have shifted from predominantly web-centric to cloud-centric. Today, they are shifting once again, this time from the era of cloud computing to the intelligent era.

The big difference lies in the massive amounts of data generated during digitalisation, and in how that data is turned into valuable, actionable information. The data has to be filtered and automatically reorganised before it can be mined for insight. This is where artificial intelligence (AI) comes in.

Some 97% of large enterprises around the world intend to use AI by 2025, according to Huawei's Global Industry Vision (GIV). Indeed, more enterprises regard AI as the primary strategy for digital transformation. The ability to leverage AI — in decision making, for reshaping business models and ecosystems, and rebuilding positive customer experiences — will be the key to driving successful digital transformation.

One factor that is driving the leap into the intelligent era is the sheer amount of data produced through digitisation. The amount of global data produced annually will reach 180 ZB in 2025, according to Huawei's GIV. The report also predicts that the proportion of unstructured data (such as raw voice, video, and image data) will continue to increase, reaching more than 95% in the near future.

Manual big data analysis and processing methods cannot handle such large volumes of data. Deep learning algorithms backed by machine computing power can instead be used to filter out invalid data and automatically reorganise useful information. This results in more efficient decision-making suggestions and smarter behavioural guidance. In the intelligent era, the focus of enterprise data centres will shift from quick service provisioning to efficient data processing.

As AI continues to advance, deep learning server clusters will emerge, along with high-performance storage media, such as Solid-State Drives (SSDs). This means more demanding requirements (μs-level) on communication latency.

In a performance-sensitive High-Frequency Trading (HFT) environment in the financial industry, low latency is needed to process large trading volumes. The fastest transaction speed of an order is approximately 100 microseconds on Nasdaq, for example.

Too slow and a trade might not go through as intended. Such an AI-oriented data operation requires a lossless network with zero packet loss, low latency, and high throughput. For a modern setup, latency has to be cut in two ways:

1. Change the protocol stack

A server's internal communication protocol stack needs to be changed. In AI data computing and SSD distributed storage systems, data processing using the traditional TCP/IP protocol stack has a latency of tens of microseconds. It has therefore become industry practice to replace TCP/IP with Remote Direct Memory Access (RDMA). RDMA can improve computing efficiency six- to eight-fold, and its server transmission latency of about 1 μs makes it possible to reduce the latency of SSD distributed storage systems from milliseconds to microseconds. For the latest Non-Volatile Memory Express (NVMe) interface protocol, RDMA has become the mainstream default network communication protocol stack.

2. Cut the latency in optical fibre transmission

To reduce the latency involved in optical fibre transmission, data centres need to be deployed near the physical locations of latency-sensitive applications. As a result, distributed data centres have become the norm. However, they come with important requirements that have to be met, especially when it comes to deploying services rapidly.
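To make the distance argument concrete, the sketch below estimates one-way propagation delay over fibre. It assumes a refractive index of about 1.468, a typical value for silica single-mode fibre that is not taken from the article, and it ignores equipment and queuing delay. The function name is illustrative only.

```python
# Rough estimate of one-way propagation delay in optical fibre.
# Assumption (not from the article): refractive index ~1.468 for silica fibre.

C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
REFRACTIVE_INDEX = 1.468          # assumed typical value for single-mode fibre


def fibre_one_way_latency_us(distance_km: float,
                             refractive_index: float = REFRACTIVE_INDEX) -> float:
    """Return the one-way propagation delay in microseconds."""
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    return distance_km / speed_km_per_s * 1e6


if __name__ == "__main__":
    for km in (1, 10, 100, 1000):
        print(f"{km:>5} km -> {fibre_one_way_latency_us(km):8.1f} us one way")
```

Under these assumptions each kilometre of fibre adds roughly 5 μs, so 100 km of distance alone costs several times the 100-microsecond order speed mentioned earlier. That is the case for placing data centres close to latency-sensitive applications.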

Here, Data Communication Network (DCN) and Data Center Interconnect (DCI) solutions are increasingly popular because they boost DCN/DCI bandwidth and ensure the zero packet loss, low latency, and high throughput that lossless networks require. Moore's Law supports the increase in data centre bandwidth, and the capacity of a single DCN interface for DCI will exceed 100G. The DCI network connecting data centres has evolved into a 10 Tbit/s Wavelength Division Multiplexing (WDM) interconnection network.
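As a back-of-the-envelope illustration of how the 100G and 10 Tbit/s figures relate, the sketch below counts the wavelengths a WDM link would need to reach a target capacity. The per-wavelength rates are example inputs chosen for illustration, and the function name is hypothetical.

```python
import math


def wavelengths_needed(target_tbps: float, per_wavelength_gbps: float) -> int:
    """Number of WDM wavelengths needed to carry target_tbps in total."""
    return math.ceil(target_tbps * 1000 / per_wavelength_gbps)


if __name__ == "__main__":
    # Example per-wavelength line rates (assumed for illustration).
    for rate_gbps in (100, 200):
        n = wavelengths_needed(10, rate_gbps)
        print(f"10 Tbit/s at {rate_gbps}G per wavelength -> {n} wavelengths")
```

At 100G per wavelength, a 10 Tbit/s interconnect corresponds to 100 multiplexed wavelengths on the link.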

Supporting elastic operation and expansion of DCNs

Networks are increasingly important to the success of high-performance services, such as AI and High-Performance Computing (HPC). A lossless network's congestion control algorithm requires collaboration between network adapters and the network itself. The network therefore has to be designed from the start so that, during Operations and Maintenance (O&M), the real-time status of network-wide devices and links can be learned quickly and accurately to support stable service operation and expansion.

Optical fibre transmission systems with multi-wavelength multiplexing are widely used in DCI today. The service provisioning and maintenance modes of optical systems differ from those of digital networks, and operators usually have large teams of skilled personnel ensuring optical network maintenance.

Conversely, in the Internet Service Provider (ISP) and finance industries, the experience and skills required of IT personnel who construct and maintain data centre networks are much lower. For these companies, delivering rapid service provisioning and accurate troubleshooting becomes a challenge. As more data centres are built to support increasingly demanding tasks, DCI has to scale up to match. Otherwise, it will become a key bottleneck in a data centre's development.

1. Simplify DCI system O&M

As cloud services are quickly developed and rolled out, network reconstruction and expansion have become more frequent. Traditional WDM device installation, fibre connection, configuration, and commissioning require professional planning and configuration.

An automatic planning and configuration system frees O&M personnel from complex, specialised site deployment work, ensures automatic and efficient deployment, and supports quick service cloudification as well as frequent capacity expansion.

Compared with manual setup, automatic configuration greatly improves both rollout efficiency and configuration accuracy. The probability of errors in traditional manual fibre connections can often be as high as 5%. Moreover, troubleshooting, cross-checking, and verification are time-consuming and labour-intensive tasks.
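To see why a per-connection error rate of around 5% matters, the sketch below computes the chance that a site ends up with at least one faulty connection. It assumes errors are independent across connections, which is a simplification and not a claim from the article. The function name is illustrative.

```python
def prob_at_least_one_error(connections: int, error_rate: float = 0.05) -> float:
    """Probability that at least one of `connections` fibre links is miswired,
    assuming each link fails independently with probability `error_rate`."""
    return 1 - (1 - error_rate) ** connections


if __name__ == "__main__":
    for n in (1, 10, 20, 50):
        p = prob_at_least_one_error(n)
        print(f"{n:>3} connections -> {p:.1%} chance of at least one error")
```

Under this assumption, a site with 20 manual connections has roughly a 64% chance of needing at least one round of troubleshooting, which is where the verification effort comes from.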

2. A promise of proactive O&M for data centres

More applications are being run in the cloud and in data centres, making digitalisation ever more dependent on this key infrastructure. Any fault that occurs in DCI often has a severe impact. DCI will therefore need to support efficient and intelligent O&M, transforming and optimising O&M from manual to automatic, and from passive to active.

Unlike traditional network monitoring systems, intelligent O&M systems use built-in optical sensors to deliver global visualisation of the optical network (including optical fibres and optical transmission devices). In addition, intelligent O&M systems provide warnings about changes in an optical network's health, especially in physical parameters such as optical power attenuation and optical wavelength drift. The system automatically analyses these changes against previous cases. These features reduce the network failure rate and greatly improve network availability.
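The kind of health warning described above can be sketched as a simple comparison of sensor readings against a baseline. Everything in this example is hypothetical: the `OpticalReading` type, the `health_warnings` function, and the threshold values are assumptions for illustration, not part of any Huawei product or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpticalReading:
    rx_power_dbm: float    # received optical power, in dBm
    wavelength_nm: float   # measured centre wavelength, in nm


def health_warnings(baseline: OpticalReading,
                    reading: OpticalReading,
                    max_extra_loss_db: float = 2.0,    # assumed threshold
                    max_drift_nm: float = 0.05) -> List[str]:  # assumed threshold
    """Return warnings when a reading has degraded relative to the baseline."""
    warnings = []

    # Power attenuation: how far received power has dropped below the baseline.
    extra_loss_db = baseline.rx_power_dbm - reading.rx_power_dbm
    if extra_loss_db > max_extra_loss_db:
        warnings.append(f"power attenuation: {extra_loss_db:.1f} dB below baseline")

    # Wavelength drift: absolute shift of the centre wavelength.
    drift_nm = abs(reading.wavelength_nm - baseline.wavelength_nm)
    if drift_nm > max_drift_nm:
        warnings.append(f"wavelength drift: {drift_nm:.2f} nm from baseline")

    return warnings


if __name__ == "__main__":
    baseline = OpticalReading(rx_power_dbm=-5.0, wavelength_nm=1550.12)
    degraded = OpticalReading(rx_power_dbm=-8.0, wavelength_nm=1550.20)
    for message in health_warnings(baseline, degraded):
        print(message)
```

A production system would track trends over time and compare them with historical fault cases. This sketch only shows the threshold check that turns raw sensor data into a proactive warning.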
