
Network management: Smart enough?

There is more autonomy and self-diagnosis in network management products today. More importantly, network intelligence is now more attuned to business goals. We look at the latest round of products.

Written by Stephen Withers, Contributor July 29, 2004 at 4:17 a.m. PT

With everything from intelligent switched routers to self-diagnosing servers, networks are getting smarter and better at ferrying your packets around the place. How are vendors putting more smarts into networks?
According to Cisco Systems distinguished engineer Michael Boland, intelligence comes into the basic task of routing when the behaviour of a router changes according to the packets it is forwarding, for example based on where they originated or their age.
With policy-based routing, anything can trigger a policy. An unpaid bill might relegate traffic to a low-performance link, a network failure would lead to rerouting around the fault, or traffic from certain classes of user might always be sent over encrypted links.
This can even extend to the core network, for example to put more bandwidth at the disposal of a particular part of the network.
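To make the idea concrete, here is a minimal sketch of policy-based link selection using the three triggers mentioned above. The class names, link names, and the order in which policies are checked are assumptions made for illustration; they are not taken from any Cisco product.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    user_class: str        # hypothetical label, e.g. "finance" or "staff"
    account_overdue: bool  # hypothetical billing flag attached to the sender


def choose_link(packet: Packet, primary_up: bool = True) -> str:
    """Pick a link name for a packet; the first matching policy wins."""
    if packet.user_class == "finance":
        return "encrypted"        # some user classes always use encrypted links
    if not primary_up:
        return "backup"           # route around a failed primary link
    if packet.account_overdue:
        return "low-performance"  # unpaid bill: relegate to a slower link
    return "primary"
```

Because the first matching rule wins, the ordering itself is a policy decision. A real deployment would need to settle, for instance, whether encryption requirements outrank rerouting around a fault.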
Network Access and Security
Traditionally, network security has been concerned with preventing break-ins by using firewalls, intrusion detection systems, and so on, says Scott Atkinson, networking solutions director at NetForce. Today, problems are coming from inside, whether as the result of internal hacking with the help of readily obtained and easy-to-use tools, or when notebook users unintentionally bring viruses and other malware inside the firewall.
"The network is taking on more responsibility about who is using it," says his colleague, network solutions specialist Daniel Baldry, and that includes checking devices are up to date with patches and antivirus protection. "Once you're inside, the network's your oyster," he adds, as packet sniffing software can sift out unencrypted usernames and passwords, and software can be designed to sporadically bring down a network, giving the impression of an intermittent hardware fault.
Switches can work with RADIUS and other authentication servers to force users to identify themselves before they can connect to the network, says Baldry. Dick Bussiere, chief technology officer at Enterasys Networks, says his company embeds security into the switching and routing infrastructure so every switch port becomes a dynamically self-configuring firewall, with policies based on the user or device type. This improves uptime and availability as well as security, he says, and while it does come at a price, it is cheap compared with the cost of outages.
This technology can be deployed at the edge of the network (ie, in the switches to which PCs and other devices are connected), or only at the distribution layer. The latter is more cost effective but not optimal, he says, because using simple edge switches means a worm on one device can spread to others connected to the same switch.
Similarly, HP has been adding support for Access Control Lists (ACLs) and open standards like IEEE 802.1X to its ProCurve edge switches. They also provide features such as MAC address lockdown and source port filtering to provide access only to appropriate users and protect open ports from inappropriate use.
When a user authenticates via 802.1X, ProCurve switches can place the user on the appropriate virtual LAN (VLAN) based on information from the RADIUS server so they can only access the relevant network resources. Where appropriate, the switch can also be set up to put a user onto a guest VLAN if authentication fails.
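The decision described here can be sketched in a few lines. This is an illustration of the logic only, not ProCurve configuration: the VLAN numbers, the function name, and the fallback to a default VLAN when the RADIUS server names none are all assumptions.

```python
GUEST_VLAN = 999    # hypothetical guest VLAN ID
DEFAULT_VLAN = 1    # assumed fallback when RADIUS names no VLAN


def assign_vlan(authenticated, radius_vlan=None, guest_allowed=True):
    """Return the VLAN ID for a port, or None to keep the port closed."""
    if authenticated:
        # Use the VLAN supplied by the RADIUS server when there is one.
        return radius_vlan if radius_vlan is not None else DEFAULT_VLAN
    # Failed authentication: guest VLAN if the switch is set up for it.
    return GUEST_VLAN if guest_allowed else None
```

For example, `assign_vlan(True, radius_vlan=20)` places the user on VLAN 20, while `assign_vlan(False, guest_allowed=False)` leaves the port closed.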
Fotios Kotsiopoulos, pre-sales technical -- South Pacific at HP, says it is important to make these decisions at the edge. "An analogy would be a front security door at someone's home. You don't let strangers into your house and then ask who they are and what they want access to," he says.
"The use of intelligent edge switches ensures data security, [and] increases network availability by preventing the impact of unauthorised network access and denial of service attacks," he adds.
According to Kotsiopoulos, HP ProCurve switches have the ability to authenticate non-802.1X capable clients using a standard Web interface, avoiding the need to install or configure any additional 802.1X client software.
Echoing Atkinson, Bussiere says mobile computing is a serious security threat thanks to the prevalence of desktop replacement systems, so network infrastructure must play a role in protecting the organisation. Since non-PC devices such as PDAs, cameras, and IP phones are being attached to the network, you can't rely on PC-based security measures. "It's about time the network stood up and played an active role in security," says Bussiere.

According to Enterasys managing director Gary Mitchell, network management used to be about capacity and connectivity, but now continuity, context, and control are the watchwords.
Continuity is not just about raw reliability; it also concerns ensuring bandwidth is available for real business data rather than being consumed by worms, viruses, and other junk traffic. Other quality of service issues include the ability to throttle some classes of traffic (eg, e-mail) to ensure good performance for other more time-critical applications such as an ERP system.
Context means identifying who is sending the information, and from what type of device. While the number of users isn't likely to change much, the number and variety of devices will grow. It will be increasingly important to ensure that devices do not send inappropriate types of data -- for example, a printer shouldn't send e-mail. This may require fine-grained control -- if a printer was able to generate e-mail service alerts you might want to let those through, but you wouldn't let it send hundreds of e-mails per minute. One possibility is to set overall policies which are then modified for specific user classes.
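One common way to let occasional messages through while stopping a flood is a token bucket. The sketch below is a generic illustration of that technique, not a description of how Enterasys implements its policies; the class name and the example rate are assumptions.

```python
class TokenBucket:
    """Allow short bursts but cap the long-run message rate."""

    def __init__(self, rate_per_min, burst):
        self.rate = rate_per_min / 60.0  # tokens added per second
        self.capacity = burst            # maximum tokens that can be saved up
        self.tokens = float(burst)
        self.last = 0.0                  # time of the previous check, in seconds

    def allow(self, now):
        """Return True if a message arriving at time `now` may pass."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policy: a printer may send about six e-mail alerts a minute.
printer_email = TokenBucket(rate_per_min=6, burst=2)
```

A printer sending the odd service alert never notices the limit, while one that starts sending hundreds of messages a minute has almost all of them refused.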
Control requires a quick response to emerging threats and anomalies. An organisation is better placed if it detects and acts on them quickly.
"The network itself has a part to play in the overall security posture of any organisation," says Mitchell.
Enterasys customers are mainly large corporations and educational institutions. The latter are "a breeding ground for lots of different types of network abuse," says Bussiere, such as improper DHCP or DNS servers, and they often need to rate-limit certain types of traffic (eg, peer-to-peer file sharing).
The dynamic distributed intrusion response can shut down traffic from a port in seconds, he says. In a test Bussiere carried out, the Blaster worm generated 175 packets per second from a PC he deliberately infected, showing the need to respond quickly. The technology can block unwanted protocols and services completely. Devices must authenticate before they can join the network, and then will only be allowed to use protocols authorised for that user. For example, only a mail server should generate certain types of SMTP traffic, so if it starts coming from an ordinary PC it is safe to block it, as it is most likely to indicate an infection by a worm containing a spambot.
When a problem is detected, the system will generate an alert to the management console and may put the user into quarantine. That can range from rate-limiting that type of traffic through blocking a specific protocol to taking the user off the network temporarily or indefinitely.
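The graduated response can be pictured as a small decision function. The following is a sketch under stated assumptions: the device roles, the protocol table, and both packet-rate thresholds are invented for illustration and do not come from Enterasys.

```python
from enum import Enum


class Response(Enum):
    ALERT_ONLY = "alert only"
    RATE_LIMIT = "rate-limit traffic"
    BLOCK_PROTOCOL = "block protocol"
    DISCONNECT = "disconnect user"


# Hypothetical policy table: protocols each device role is authorised to use.
ALLOWED_PROTOCOLS = {"mail-server": {"smtp", "http"}, "pc": {"http"}}
RATE_LIMIT_PPS = 100    # assumed threshold, packets per second
DISCONNECT_PPS = 1000   # assumed threshold, packets per second


def choose_response(role, protocol, packets_per_sec):
    """Pick a quarantine action; an alert is raised in every case."""
    if protocol not in ALLOWED_PROTOCOLS.get(role, set()):
        return Response.BLOCK_PROTOCOL   # eg, SMTP from an ordinary PC
    if packets_per_sec > DISCONNECT_PPS:
        return Response.DISCONNECT
    if packets_per_sec > RATE_LIMIT_PPS:
        return Response.RATE_LIMIT
    return Response.ALERT_ONLY
```

With these made-up thresholds, SMTP traffic from an ordinary PC is blocked outright, while the same traffic from the mail server only raises an alert.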
This approach permits a real-time response without human intervention while maintaining human oversight. "We optimise 'time to find'," says Bussiere, identifying the switch port originating the suspect traffic in less than one minute.
Another security issue is the installation of rogue access points. Since there is no guarantee they have been securely configured, it's important to locate them quickly and take them off the network, says Baldry. Systems are available that can triangulate the positions of access points and plot their locations on a floor plan. Atkinson says some switches can be configured to disallow the connection of unauthorised access points.
It is even possible to restrict the location of wireless clients by using Newbury Networks' WiFi Watchdog, which uses a network of sensors to locate wireless clients, and when they are outside a predefined boundary their connections are denied or broken.
"We believe in the evolution of intelligence into the network itself," says Boland. "The network's coming out of the transmission function, and starting to play an integral part in the system function."
For example, a network can supplement antivirus and other protective measures by isolating devices that don't conform to a security policy. The Cisco Security Agent can work with products from vendors such as Symantec and McAfee and a RADIUS server to check devices for up-to-date patches and virus signature files, and if appropriate either deny access completely or put the device into a "walled garden" where the user can do nothing other than update the software. Other vendors such as Fortinet and Trend Micro offer similar capabilities.
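A rough sketch of that admission decision follows. It is not the Cisco Security Agent's actual logic: the function name, the seven-day limit on signature age, and the flag that chooses between quarantine and outright denial are assumptions.

```python
from datetime import date


def admission_decision(patches_current, signature_date, today,
                       max_signature_age_days=7, allow_remediation=True):
    """Return "allow", "walled-garden" or "deny" for a connecting device."""
    age_days = (today - signature_date).days
    if patches_current and age_days <= max_signature_age_days:
        return "allow"
    # Non-compliant device: either let it reach the update servers only,
    # or refuse it entirely, depending on local policy.
    return "walled-garden" if allow_remediation else "deny"
```

For instance, a fully patched notebook whose virus signatures are four weeks old would be sent to the walled garden until it updates.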
Similarly, an IDS can detect abnormal activity and then isolate either an individual server or a section of the network. It may not be possible to identify a brand new threat if it enters a network before antivirus and other vendors have updated their products to identify it, says Boland, but it should be possible to recognise the abnormal traffic it generates and quickly shut down network access to prevent its spread.

Rate control and QoS
Vendors such as Packeteer and Sitara (now Converged Access) are providing appliances for traffic management that mainly work at the network edge, because network vendors have not adequately addressed these issues, says Bjarne Munch, senior analyst, META Group. "I see this as being more of a short term value proposition."
These edge devices can be good if you have congested links that would be costly to expand (eg, communications between Australia and Fiji) or that are already high bandwidth (eg, Melbourne to Sydney) and you want to avoid buying more capacity just to cope with peak loads, he says. Packet shaping will prioritise the most important traffic and let the rest wait.
Such appliances are also good for monitoring and reporting, as gaining an understanding of the applications being run is a non-trivial task for a large organisation where not all purchases are centralised. Once profiles are created that identify applications' network requirements, the location of users, etc, it is possible to manage the traffic properly.
"First and foremost, you have to understand your network, then you can control it," says Steve House, Packeteer's senior manager -- product management. Packet shaping requires collection of application-specific information, so Packeteer's products now report that information, identifying response times and causes of problems for individual applications. This information can already be fed into BMC's management software, HP's OpenView and IBM's autonomic computing model.
This understanding of the data flowing through a network can also be used to support compression. Traffic awareness means the appliance only attempts to compress the compressible traffic, and applies different algorithms according to the traffic type. This approach can "effectively double or triple the bandwidth of an existing pipe," says House.
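The principle of compressing only what is worth compressing can be shown with Python's standard zlib module. This is a simplified stand-in, not Packeteer's method: the traffic-type table is invented, and different zlib levels take the place of genuinely different algorithms.

```python
import zlib

# Hypothetical table: zlib level per traffic type, None meaning "leave alone".
COMPRESSION_LEVEL = {"text": 9, "database": 6, "jpeg": None, "encrypted": None}


def maybe_compress(traffic_type, payload):
    """Return (bytes to send, whether they were compressed)."""
    level = COMPRESSION_LEVEL.get(traffic_type)
    if level is None:
        return payload, False         # already compressed, encrypted or unknown
    packed = zlib.compress(payload, level)
    if len(packed) >= len(payload):
        return payload, False         # compression did not help; send as is
    return packed, True
```

Skipping traffic such as JPEG images or encrypted sessions avoids spending processor time on data that would not shrink anyway.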
Deep packet inspection tools "are evolving very quickly," according to Boland, and are becoming more application aware. These tools permit the network fabric to take action according to the contents of data packets, for example applying rate shaping or changing the VPN. They might apply policies to determine access rights (eg, access to certain servers might be prohibited if the request comes from an outside IP address), or they could redirect URL requests to a cache.
Policy servers can redirect traffic as necessary to maintain the required quality of service, he says, basing these decisions on data collected by service assurance agents. By using open interfaces, Cisco's policy servers are able to operate with other vendors' products.
"We are beyond transmission... the real intelligence is in the areas of measuring, control, and dynamic function," he says.
Time-critical applications such as VoIP require end-to-end traffic management, says Munch, so it will need to be implemented throughout the network, for example with quality of service support in each router. In such cases, the router must be able to handle all traffic types. Some applications are very "chatty" and require low latency for good results, which means proper management is essential. VoIP and similar traffic requires all switches and routers to queue all traffic types correctly, and network equipment vendors are adding this capability to their products, he says.
House has a different opinion. He suggests that if too much high priority traffic (eg, too many voice calls) is directed across one link, QoS mechanisms will attempt to handle them all, overloading the link and giving poor performance for each stream. A Packeteer appliance will instead guarantee the required bandwidth for each call and completely deny attempted calls that would cause overload.
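The behaviour House describes is a form of call admission control, which can be sketched as follows. The class, the link capacity, and the per-call bandwidth are assumed figures for illustration, not Packeteer specifications.

```python
class Link:
    """Track bandwidth reserved for calls on one link."""

    def __init__(self, capacity_kbps):
        self.capacity = capacity_kbps
        self.reserved = 0

    def admit_call(self, kbps):
        """Reserve bandwidth for a new call, or refuse it if the link is full."""
        if self.reserved + kbps > self.capacity:
            return False              # deny rather than degrade existing calls
        self.reserved += kbps
        return True

    def end_call(self, kbps):
        """Release the bandwidth held by a finished call."""
        self.reserved = max(0, self.reserved - kbps)
```

On a hypothetical 256 kbps link with calls needing 80 kbps each, three calls are admitted and a fourth is refused until one of the others ends.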
Another issue with QoS is that dropping lesser queues to give priority to voice or video packets can cause problems with other applications. Packet shaping can guarantee minimum bandwidth allocation for those other functions, he says.
The objective is to give "good" applications the highest priority and let "bad" ones have what's left -- in this context, bad could mean something as innocuous as a large, once-off FTP transfer. "The general network will always survive with Packeteer," claims House.
Boland points out that some applications may need intelligent degradation. For example, MPEG streams consist of syncing frames followed by a series of difference frames. If you have to drop packets, you don't want to drop them from syncing frames. "We can degrade gracefully," he says.
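A simplified sketch of that idea: when a queue holds more frames than can be sent, discard difference frames before touching syncing frames. Real MPEG handling is considerably more involved; the two-way frame labelling and the function name are assumptions.

```python
def shed_frames(frames, budget):
    """Keep at most `budget` frames, dropping difference frames first.

    `frames` is a list of (kind, frame_id) tuples, kind being "sync" or "diff".
    """
    excess = len(frames) - budget
    if excess <= 0:
        return list(frames)
    keep = [True] * len(frames)
    for kind in ("diff", "sync"):     # sync frames go only as a last resort
        for i in range(len(frames) - 1, -1, -1):
            if excess == 0:
                break
            if keep[i] and frames[i][0] == kind:
                keep[i] = False
                excess -= 1
    return [frame for frame, kept in zip(frames, keep) if kept]
```

The surviving frames stay in their original order, so the receiver still sees every syncing frame unless the shortfall is so severe that difference frames alone cannot cover it.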
Jonathan Spellman, IP network specialist with Damovo, agrees with Munch about the need for QoS support in the network infrastructure but warns that it is easy to make inconsistent settings on different switches, which can lead to dropped VoIP calls.
Furthermore, such switches aren't very smart in that any changes to the overall system (even something as simple as an application upgrade) may disturb its equilibrium and prevent proper data flows. This means ongoing monitoring -- whether in-house or on a managed service basis -- is required, using tools such as those from NetIQ.
According to Boland, another attribute of VoIP is that the digital signal processor (DSP) chips used in VoIP gateways and other devices are able to measure the quality of a call, and any deviation below the norm may indicate a network problem. This information should be accessible to the network management system.

Storage Area Networks
SAN functionality beyond the basic routing of data is beginning to move into storage routers. Graham Schultz, Brocade's strategic alliance manager -- Australia and New Zealand, says that while the first application for the company's Fabric Application Platform was Brocade's own multiprotocol router (which is designed to link multiple SANs via fibre-to-fibre routing, iSCSI-to-fibre bridging or fibre-IP extension), it is also available to third party developers. They are able to use the platform as the basis for a variety of services, such as virtual tape (storage that presents itself to the system as a tape library, but which actually stores the data on disk; in this context, that disk is virtualised through the SAN).
"This is a major step forward," says Schultz, but one that can be implemented progressively rather than in an immediate switchover. Organisations need to understand vendors' directions so they can plan for the next two or three years, he says. "There's a lot of work being done in the virtualisation space," says Boland. Requests to access a storage array can be handled in the network rather than by either the host or the array, and that avoids duplicating traffic on the network.
Cisco is combining with Veritas to put storage virtualisation functions into switches in order to create virtual SANs.
Automatic Correction
The previous focus of the automatic management of servers and other devices has been on event handling (eg, by OpenView and Tivoli), but according to Munch the emphasis is now moving towards end-to-end performance monitoring. This functionality is not being embedded in the network itself, but is being achieved by devices providing the hooks for management systems.
"People are starting to correlate alarms to give network actions," says Boland.
What's needed, says Munch, is a combination of load management, application level response monitoring (with different metrics for different applications), root cause analysis (as manual processes are too slow), and business level reporting (eg, in terms of a service level agreement).
In any case, change management requires proper processes, he says. Reconfiguring a network should not be an ad hoc process -- it's important to simulate and evaluate first, and doing this for a large network requires the right tools. However, there is a need for automation in order to improve the response time when problems arise.
Intrusion detection or prevention systems are capable of automated responses, but it should be a business decision to allow such automation, says Nick Day, a technical specialist with NetStar. He suggests automatic actions should perhaps be restricted to times when an appropriately skilled engineer is present to undo the mess if things go wrong, or that they should be disabled during business-critical times such as end-of-month processing.
House says that at some stage Packeteer will start to offer automatic actions based on correlated information. For example, if connections to a device suddenly spike, the reaction might be to cap bandwidth to that device and alert the appropriate administrator. If you believe such a spike could only be the result of some sort of attack, the cap could be set at a very low value. Even if you don't want to go as far as automating the response, automated diagnosis can be a big timesaver. Felix Marks, technical services manager, Australia and New Zealand at Micromuse says it can be very expensive to have sufficient staff to handle the problems that can arise in a large network with multiple routers, but the cost of an outage can be extreme.
The company's Netcool/Visionary product automates the diagnosis of problems concerning routers, switches, and associated equipment. Unless multiple aspects of devices' behaviour are considered, it can be difficult to detect misconfigured routers and other types of problems, says Marks. Visionary "is like having many of these [skilled] engineers constantly looking at those devices," he says.
Visionary ships with around 1000 pre-defined rules. For example, it knows that border gateway protocol update problems are manifested by three different symptoms occurring simultaneously. While it can take a skilled engineer up to two weeks to diagnose this type of problem, Visionary can detect it before services are affected, Marks says.
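A rule of this kind, one that fires only when several symptoms coincide on the same device, might look something like the sketch below. It does not reproduce Visionary's rule language. The article does not name the three symptoms, so the usage example uses the placeholder labels "a", "b" and "c"; the time window is also an assumption.

```python
def devices_matching(events, required, window):
    """Return devices showing every required symptom within `window` seconds.

    `events` is a list of (timestamp, device, symptom) tuples and
    `required` is a set of symptom names.
    """
    by_device = {}
    for timestamp, device, symptom in events:
        by_device.setdefault(device, []).append((timestamp, symptom))

    matched = set()
    for device, items in by_device.items():
        for start, _ in items:
            # Symptoms seen in the window that opens at this event.
            seen = {s for t, s in items if start <= t <= start + window}
            if required <= seen:
                matched.add(device)
                break
    return matched
```

Given events for two routers, one showing all three placeholder symptoms within ten seconds and the other showing them spread over several minutes, `devices_matching(events, {"a", "b", "c"}, 10)` returns only the first.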
Ericsson is using Netcool products including Realtime Active Dashboards (RAD) in the management of its customers' networks. General manager network operations Michael Pease says Netcool was originally used to manage alarms, but Ericsson is moving to proactive management and the tools help maintain customer service levels and provide those customers with the information to make business and investment decisions. For example, RAD helps the company to prioritise different alarms, while customers can detect trends that will require high-level action rather than merely ongoing network management.
Computer Associates' neugent (neural agent) technology is implemented in the Unicenter suite to predict events, says principal consultant Robert Cruchley. The software finds clusters of activity, and determines patterns of movement from one cluster to another. Predictions can then be made about the likely future state of the system based on real-time information about its current state. This works well for the behaviour of servers, but "network usage tends to be unpredictable," says Cruchley, as it relates to real-world conditions -- even the weather (people are more likely to stay in the office at lunchtime on wet days and surf the Web instead of going for a walk in the park).
IBM is probably the company most associated with the move towards self-managing systems. The company suggests there are five steps along this path:

Basic. Everything monitored and managed by people.
Managed. Information from multiple subsystems is correlated into a small number of consoles.
Predictive. Management software suggests actions for human approval.
Adaptive. The system takes those actions without intervention.
Autonomic. Systems and components are dynamically managed by business rules and policies.

Most medium to large organisations would be at the managed or predictive level. Some operations are at the adaptive level. For example, IBM used its ThinkDynamics Intelligent Orchestrator earlier this year to automatically allocate infrastructure between the Australian Open Web site, grid-based credit analysis, and protein folding experiments. According to IBM officials, server resources were shifted as needed to manage the unpredictable spikes in demand for the Web site, allowing visitors to the site to access continuous, uninterrupted live scores and results. Some IBM server products already include features for self-healing, self-optimising, and other characteristics of autonomic computing.

Business Orientation
It's important to remember that all this activity takes place to support business or other organisational goals, so network intelligence should take those goals and policies into account. For example, the technical solution to a sudden wave of network activity might be to move some of the traffic to alternative links, but what if that activity is unofficial Web surfing in response to breaking news?
CA's Sonar technology puts network activity in a business context. It relates traffic to applications, and applications to system elements (such as routers and servers), giving insight into operational issues and costings.
Cruchley points out that autonomic, adaptive, and similar strategies promoted by various vendors mean systems will change dynamically, so such insights become essential. But like Munch, he calls for good change management processes. "The easy part is switching a new blade on... the hard part is managing the change around it," he says; you wouldn't start up a new server manually without following the established change process, so why would you do that in an autonomic environment?
CA's distributed intelligent architecture (DIA) combines facilities for policy-based self-deployment with the automatic discovery of servers and their sub-elements. Business rules are also incorporated, so the payroll system can be treated differently during pay and non-pay weeks, for example. When resources are scarce, it's important they are allocated according to business priorities.
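The payroll example can be expressed as a small business rule feeding a resource allocator. This is an illustrative sketch only and says nothing about how DIA works internally; the priority values, the "crm" application, and the unit of capacity are assumptions.

```python
def priority(app, pay_week):
    """Lower number means higher priority; the values are illustrative."""
    if app == "payroll":
        return 0 if pay_week else 2   # payroll comes first only in pay weeks
    return 1


def allocate(demands, capacity, pay_week):
    """Grant capacity to applications in priority order until it runs out.

    `demands` maps an application name to the units it is asking for.
    """
    grants = {}
    for app in sorted(demands, key=lambda a: priority(a, pay_week)):
        grants[app] = min(demands[app], capacity)
        capacity -= grants[app]
    return grants
```

With eight units available and both payroll and a hypothetical CRM system asking for six, payroll gets its full six in a pay week and only the leftover two in other weeks.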
Sonar can operate passively by watching for particular traffic and identifying the touchpoints, or in an active configuration, generating synthetic transactions. For example, it might identify the SAP-related traffic, and then monitor the service level agreements and sources of cost for those processes. Similarly, Micromuse's Netcool/RAD identifies the key components in the service path between applications and users (eg, a mainframe, Web servers, load balancers, network links), and visualises the result. It takes the logical topology of the network and maps switches, servers, etc into applications. This has several advantages, according to Marks. Firstly, it allows a green/yellow/red light indication of a service's status. Secondly, monitoring can relate to service level agreements. Thirdly, the use of a service topology permits identification of the source of any problem.
With this tool, an IT department "can really prove its worth and compete against the threat of an outsourcing arrangement," he says.
The status of individual components is determined from Netcool agents running on servers, system logs, SNMP traps, and information collected by other software such as Netcool/Visionary and synthetic users at key locations around the network in order to get a true picture of response time. RAD assembles this information, determines if it is potentially service affecting, and then performs root cause analysis to display the nature and location of the problem and recommend a fix.
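The traffic-light view of a service can be approximated by rolling component statuses up to the worst one found. The sketch below assumes exactly that simple rule and ignores redundancy between components; it is not a description of Netcool/RAD, and the component names in the usage note are made up.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def service_status(components):
    """Roll component statuses up into one service status.

    `components` maps a component name to "green", "yellow" or "red".
    Returns the worst status and the components responsible for it.
    """
    worst = max(components.values(), key=SEVERITY.__getitem__, default="green")
    suspects = sorted(
        name for name, status in components.items()
        if status == worst and worst != "green"
    )
    return worst, suspects
```

If a service path contains a healthy mainframe, a degraded Web server, and a failed network link, the service shows red and the link is named as the place to start looking.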

This article was first published in Technology & Business magazine.