For many organizations, the intranet is one of the most valuable corporate resources. Even so, they have not developed a comprehensive strategy for managing intranet performance now and into the future. As a result of unplanned network growth, companies are running mission-critical applications on intranets without monitoring and managing the health of the network. When network problems occur, no one knows why, how or where. Fixing even trivial problems becomes a headache.
Why? Intranets often are unreliable and inefficient because they evolved piecemeal. There was no strategy driving technical choices. Instead, the need for applications like file sharing and e-mail encouraged intranets to grow in a random, decentralized fashion. Technical quick fixes, like using a bridge to tie two networks together, complicated the problem. With such a hodgepodge of hardware and software and no blueprint for intranet growth, getting the most out of the network--or even keeping it up and running--poses a thorny challenge.
The term "intranet" has many different meanings. Some people would say that intranets aren't new at all--that companies have been using them for years and calling them internetworks. What's new with intranets is the use of IP software and standards on private networks. Some intranets comprise only a few Web servers and IP client software, but more typically intranets encompass a range of new IP tools, including firewalls, virtual private network (VPN) devices, load balancers, and the like. Increasingly the trend is for companies to place all their internal computing resources on an intranet. In addition, many intranets span multiple locations, moving IP traffic over several different types of WAN links.
When intranets first appeared a couple of years back, it didn't matter if they were unreliable because the applications they supported were nonessential. Today, in contrast, intranets support core business applications. In this environment, network problems can translate into significant problems for the business. Perhaps the company's network slows every Thursday afternoon but by Friday afternoon it's working fine, without intervention. The company knows there's a problem but not what it is or how it resolves itself. The implications are staggering for a business whose most valuable applications depend on the intranet.
To avoid endangering business operations because of intranet problems, organizations need to define a strategy for managing intranet performance. Tools like network monitoring devices and network management software are important to ensure network health, but only within the framework of an overall network management strategy.
An effective strategy boils down to just four steps. First, obtain a clear understanding of the existing environment. Second, understand the business drivers behind intranet growth. Third, define the data collection approach. Finally, define data analysis techniques. With these steps in place, virtually any aspect of intranet growth can be managed.
Four Easy Pieces
The first task is to understand the existing environment. This doesn't mean collecting a bunch of measurements; in fact, network monitoring tools aren't needed at this stage. The key here is to interview users and review existing information sources like network diagrams.
The user interviews provide information on applications, work patterns, user perceptions and perspectives on the network, and data flows for the applications (vis-a-vis server-to-user location). It's important to discover which applications are most critical, who uses them, and how often (see "Four-Step Crash Course" sidebar for sample questions). This qualitative information eventually can be coupled with quantitative data to gain greater insight into the network's strengths and weaknesses.
Although studying network diagrams may seem simplistic, it's all too often overlooked; the majority of companies cannot produce accurate basic diagrams. Ideally, network diagrams should not only show topological information but also identify key servers and the protocols in use.
Understanding the Business Case
Understanding an organization's business drivers is a prerequisite to any study of intranet performance requirements. What shapes the way an organization operates? Business factors like acquisitions and consolidations play a role, as do technology considerations. For example, a company may currently have an all-IP environment; through acquisitions, however, it may bring in organizations that run all-SNA or all-IPX networks. Companies need to plan for such contingencies.
Defining a strategic direction for network management also is crucial. As part of the strategy, companies typically formulate guidelines for recommended technologies, preferred vendors, and selective outsourcing. For all of these guidelines, technology decisions must be informed by business drivers, not the other way around. For example, a company might examine what business drivers would justify use of an ATM backbone for its intranet. If the company is looking to cut travel costs, it may be able to justify ATM to support a videoconferencing system.
With intranets, business case issues are more important than ever before. Network managers today have more new tools at their disposal, and the pace of product rollouts will only accelerate in the future. With the explosion in intranet-related products, network managers can't possibly deploy every new technology that comes along. The only way to control the chaos is to implement guidelines for evaluating how each new product or service will serve the organization's core business.
As a case in point, consider a large organization whose firewall segment suffered severe congestion because more than 5,000 intranet users had downloaded push-based screen savers from the public Internet. The company's network managers, in a predictable but justifiable step, closed down access to the push application by disabling its TCP port on the firewall. In this instance, there was no business justification for the screen savers, which displayed stock quotes. But a company in financial services might very well want employees (or at least some employees) to use the push-based application.
Collecting the Data
In developing a data collection strategy, the operative words are "when," "where," and "what." In other words, when and where is it desirable to gather information from the network? What information should net managers collect?
Many companies think the correct answer to "when?" is to gather data only during peak periods of network traffic. This is a way of planning for the worst case: the company engineers its network to withstand the most traumatic possibility, a serious network problem during a peak period. The result is an over-engineered network. A more cost-effective approach is to look at both peak and non-peak, or representative, periods of network use. This provides a fuller picture of network patterns, showing both the norm and the delta between the norm and the peak. Ultimately, it provides a better understanding of how the network performs.
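The peak-versus-representative comparison can be sketched in a few lines of Python. The hourly utilization figures and the function name below are invented for illustration; any real study would pull these samples from the organization's own monitoring data.

```python
# Hypothetical sketch: comparing representative (norm) and peak load.

def summarize_load(samples):
    """Return (norm, peak, delta) for a list of per-interval
    utilization percentages."""
    norm = sum(samples) / len(samples)  # representative level
    peak = max(samples)                 # worst interval observed
    delta = peak - norm                 # headroom needed beyond the norm
    return norm, peak, delta

# Invented hourly segment-utilization samples (percent) for one workday
hourly_util = [12, 15, 14, 18, 35, 62, 58, 80, 44, 30, 22, 16]

norm, peak, delta = summarize_load(hourly_util)
print(f"norm={norm:.1f}%  peak={peak}%  delta={delta:.1f}%")
```

Engineering to the norm plus a measured delta, rather than to the peak alone, is what keeps the network from being over-built.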
While peak and non-peak issues apply to virtually any type of network, some "when" questions apply specifically to intranets. For example, what's the work pattern for a particular Web server? When will a Web server be down for maintenance? What kinds of access patterns will a new Web server encourage? Will access be random, or will it conform to patterns that can be quantified over time?
Oftentimes Web servers encourage access from totally unexpected places. An accounting department, for example, might put up a Web server, drawing requests to post information on the server from other departments at the same location. This in turn may lead to requests from users at other locations to post data--leading to access attempts over slow WAN links. The result may be congestion in the WAN--a totally unforeseen consequence of a local Web server.
Another data collection question is where to place monitoring agents. Data can be collected by standalone monitors or by agents embedded in network devices (e.g., switches and hubs) physically attached to the network. To remedy network bottlenecks, the IT staff must be able to find the problem; instrumentation makes that possible. It can be cost-prohibitive to put a monitor on every component of the network, but careful placement in high-profile areas can still yield all the necessary data.
When it comes to intranets, monitoring performance of Web servers and proxy servers is critical. Given the explosive growth of intranet devices, it's also important to be sure the monitors are placed where they can offer a true picture of network traffic. Fortunately, technology can help here. The use of LAN switches may allow the use of fewer, more intelligently placed monitoring agents.
Instrumentation is straightforward if the organization has server farms; outfitting the server farms with monitoring agents delivers all server information. Server farm monitors don't reveal client-to-client communications, but that's not necessarily a problem. Client-to-server and server-to-server communication accounts for nearly all intranet problems.
The last, and probably most important, consideration in data collection is determining what to measure. It pays to be choosy about what to measure. It's possible to collect hundreds of metrics on the network and on particular servers. But not all of those metrics are useful in improving network performance. When choosing metrics, focus on business needs rather than trying to collect every piece of data that can possibly be collected. Ideally, metrics relate closely to business priorities.
For example, what would an organization need to measure to make sure Web server transactions are processed quickly enough? Application response time would be the best metric: it is a direct measurement of transaction processing speed and a primary indicator of network performance.
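Conceptually, application response time is just the elapsed time of one transaction as seen from the client. The sketch below shows the idea; `probe()` and the dummy `lookup_order` transaction are illustrative stand-ins, not a real product API.

```python
# Hypothetical sketch: measuring application response time by timing
# one transaction from the client side.
import time

def probe(transaction, *args):
    """Run one application transaction and return (result, seconds)."""
    start = time.perf_counter()
    result = transaction(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Invented stand-in for a real Web server transaction
def lookup_order(order_id):
    return {"order": order_id, "status": "shipped"}

result, seconds = probe(lookup_order, 1042)
print(f"response time: {seconds * 1000:.2f} ms")
```

The hard part in practice, as the next paragraph explains, is getting this timing hook into real applications at all.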
However, given the current state of monitoring technology, measuring application response time is not usually feasible, though software and net management vendors are beginning to try. There are two approaches. First, companies can monitor application response time by modifying key applications to add reporting hooks. This approach is costly and time-consuming--so much so that it's typically used on only one or two mission-critical apps. Second, companies can use bolt-on monitoring tools now appearing from net management vendors. Here again, these tools focus heavily on one or two applications--for example, monitoring SAP R/3 from SAP AG (Walldorf, Germany) or Oracle8 from Oracle Corp. (Redwood Shores, Calif.)--to the exclusion of all other apps. No single response time monitor covers the entire universe of applications.
Since primary metrics might not be available, companies can turn to secondary indicators. The clearest of these is network response time--the amount of delay incurred as packets travel end to end through the intranet's routers and switches.
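One crude way to sample network response time is to clock how long a TCP connection takes to set up against a server of interest. In this self-contained sketch a loopback listener stands in for a remote intranet server; against a real server across the WAN, the same measurement would reflect actual network delay.

```python
# Hypothetical sketch: estimating network response time by timing
# TCP connection setup. A local listener substitutes for a remote server.
import socket
import time

def connect_time(host, port):
    """Return the seconds taken to complete a TCP handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass
    return time.perf_counter() - start

# Invented stand-in "server" on an ephemeral loopback port
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

rtt = connect_time("127.0.0.1", port)
listener.close()
print(f"connect time: {rtt * 1000:.3f} ms")
```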
Tertiary indicators are one more step removed; these include metrics like segment utilization, number of users on the system, network throughput, latency, and the number of errors or collisions. Primary indicators are most closely related to the actual activities of the business, and tertiary indicators are the most removed. Metrics beyond the tertiary level, like low-level disk I/O measurements or the jabber rate on a network, are distant from the core business and may confuse the performance analysis.
Analyzing the Data
After all the information is collected, data analysis should drill down from primary to secondary to tertiary data or, if there are no primary indicators, from secondary to tertiary. For example, if the secondary indicator is Web traffic, the analyst could look at patterns of Web traffic over time. From there, he or she could select a period--a typical bottleneck time, perhaps--and identify the users who surf the Web at that time. Comparing the Web traffic with overall network traffic over the same period reveals the Web application's impact on the network. The analyst can then decide whether the network infrastructure is sufficient to support the Web traffic and, if not, plan improvements (such as moving to a switched environment or higher-speed media) to ensure that overall network quality is not affected.
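The drill-down described above can be sketched as a simple comparison of Web traffic against overall segment traffic, flagging the intervals where the Web dominates. All traffic figures below are invented sample data.

```python
# Hypothetical sketch: finding intervals where Web traffic accounts
# for more than half of overall segment traffic.

def web_share(web_kbps, total_kbps):
    """Per-interval fraction of overall traffic attributable to the Web."""
    return [w / t for w, t in zip(web_kbps, total_kbps)]

def heavy_intervals(shares, threshold=0.5):
    """Indices of intervals where the Web share exceeds the threshold."""
    return [i for i, s in enumerate(shares) if s > threshold]

web   = [120, 300, 910, 450]   # invented Web traffic per interval (kbit/s)
total = [800, 820, 1400, 900]  # invented overall traffic (kbit/s)

shares = web_share(web, total)
print(heavy_intervals(shares))  # → [2]: only the third interval is Web-heavy
```

Intervals flagged this way are the candidates for a closer look at who is surfing and whether the infrastructure needs upgrading.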
The tools available to gather information on network performance include standalone monitoring devices as well as embedded agents in hubs, routers, and switches. Standalone or embedded tools can be further divided into hardware and software categories, each with a distinct purpose.
Hardware tools reveal information such as which parts of the network are up and which are down, who is using the network and for what applications, network traffic patterns, who and where the top users are and to whom they are "talking." Software tools provide automated network performance management, network security monitoring, fault management and availability, capacity planning and more.
In both standalone and embedded monitoring tools, the underlying technology is based on standards like the management information base II (MIB II), remote monitoring MIB version 1 (RMON1), and RMON2.
MIB II offers the most fundamental remote monitoring capability, and RMON2 offers the most sophisticated. Typically, some level of RMON capability is embedded in hubs and switches; stand-alone instruments can be used to complement embedded functionality.
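MIB II's fundamental capability amounts to reading interface counters and turning deltas into rates. The sketch below derives segment utilization from two polls of the MIB II ifInOctets counter; ifInOctets is a real MIB II object, but the poll values and interval are invented, and the wrap handling follows the 32-bit counter semantics used by SNMP.

```python
# Hypothetical sketch: utilization from two polls of MIB II ifInOctets.

WRAP = 2 ** 32  # MIB II interface counters are 32-bit and wrap to zero

def utilization(octets_t1, octets_t2, interval_s, if_speed_bps):
    """Percent utilization between two counter polls."""
    delta = (octets_t2 - octets_t1) % WRAP  # tolerates one counter wrap
    bits = delta * 8
    return 100.0 * bits / (interval_s * if_speed_bps)

# Invented polls: counter read 60 s apart on a 10 Mbit/s segment
u = utilization(1_000_000, 4_750_000, 60, 10_000_000)
print(f"{u:.1f}% utilized")  # → 5.0% utilized
```

Counter arithmetic like this is all MIB II can support; anything about who is talking to whom requires RMON-class instrumentation, as described next.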
RMON1 focuses on the bottom two layers of the network stack. That makes RMON1 economical for vendors to embed in hubs and switches, because the device collects MAC-layer data only. However, RMON1 does not provide protocol information or application-layer data, nor does it show client/server communications across the network.
RMON2 takes up where RMON1 leaves off. RMON2 offers application- and network-layer information. In addition, it can reveal IP conversations and show which clients are talking to which servers across the network, which RMON1 cannot. RMON2 also addresses concerns about security. On the downside, RMON2 requires much more processing power than RMON1, which makes it more costly to embed in hubs and switches. Many vendors have consequently delayed implementing RMON2 or shipped limited implementations in their products. Thus, network managers may not get the full value of RMON2, even if their hubs and switches supposedly support the standard.
Still, for organizations looking to get a complete picture of intranet performance, standalone RMON2 probes may well be the way to go. With RMON2's application-layer monitoring capabilities, it's possible to keep track of who's requesting Web pages and which pages they request. While Web servers offer similar capabilities, RMON2 goes a step further: Its built-in alarm facility tips off network managers to unauthorized access attempts as they happen. Other RMON2 data also offers a much more complete picture of performance than does any one server in the intranet.
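The alarm behavior described above can be modeled on the RMON alarm group's rising/falling thresholds with hysteresis: an event fires when a metric crosses the rising threshold, and not again until the metric has fallen back below the falling threshold. The metric, thresholds, and sample values below are invented for illustration.

```python
# Hypothetical sketch of an RMON-style rising/falling threshold alarm.

def alarm_events(samples, rising, falling):
    """Return indices of samples that trigger a rising alarm."""
    events, armed = [], True
    for i, value in enumerate(samples):
        if armed and value >= rising:
            events.append(i)
            armed = False   # suppress repeat alarms while still high
        elif not armed and value <= falling:
            armed = True    # metric recovered; re-arm the alarm
    return events

# Invented per-minute counts of failed access attempts
print(alarm_events([1, 4, 12, 15, 9, 2, 11], rising=10, falling=3))  # → [2, 6]
```

The hysteresis between the two thresholds is what prevents a metric hovering near the limit from flooding the network manager with duplicate alerts.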
---------------------------------------------------------------------------------------------------------------------------------
SIDEBAR
A Four-Step Crash Course
Measuring intranet performance involves hundreds or thousands of variables, but the overall process can be summed up in just four steps. Here are some key questions to ask at each step.
1. Understand the Existing Environment
2. Understand the Business Case
3. Determine When, Where, and What to Measure
4. Analyze the Data
--G.V.F. and C.W.
Reprinted with permission by Data Communications Magazine, a McGraw-Hill Company.