Scott H. Davis
Technical Director, Clusters for Windows NT
February 1995
Version 2.0
Momentum is growing for the client-server local-area network (LAN) computing model, in which desktop PCs and workstations access and rely on services provided by specialized server systems. Many firms are upsizing their existing PC LAN environments to take advantage of the higher capacity and increased performance and reliability of new-generation server systems. Similarly, many corporate IT departments are downsizing their operations to exploit client-server's flexibility and access to low-cost commodity hardware and software products.
As organizations deploy client-server solutions, each confronts issues of system reliability and cost-effective growth paths that are critical to supporting the user community and running the business. Digital's cluster technology on Microsoft® Windows NT addresses these concerns by increasing the availability, scalability, and management of data and key services in client-server LAN environments.
A cluster is a loosely coupled set of systems that behaves like a single system (it is addressed and managed as one), but provides high levels of availability through redundant CPUs, storage, and data paths. Clusters are also highly scalable, meaning that CPU, I/O, storage, and application resources can be added incrementally to expand capacity efficiently. For customers, this translates to reliable access to system resources and data, and investment protection of both hardware and software. Digital's Clusters for Windows NT provides these valuable attributes in a manner compatible with the Windows NT operating system and consistent with the requirements of client-server LANs.
In a typical client-server LAN, a single server system provides file, print, and/or application services to a group of desktop clients. In a cluster client-server configuration, the notion of a single server serving clients is extended to include multiple server systems. The collection of servers, called a cluster, works as, and is viewed by clients as, a single server system. This is accomplished by way of cluster software, which performs the management, integration, and synchronization of the cluster members.
Like the single-server environment, a cluster provides a single security and management environment. Clients can also view resources and services in the cluster as if they were local. A major advantage of a cluster LAN is the ability to add system components incrementally, both to expand server capacity and to build in component redundancy for higher availability.
Windows NT is an excellent LAN server platform, delivering the capabilities and performance customers need to run client-server applications using standard, low-cost hardware and software components. Digital's Clusters for Windows NT is a perfect complement to Windows NT, taking advantage of its power, ease of operation, manageability, broad application base, and easy integration with existing LAN infrastructures.
Digital has designed Clusters for Windows NT software from the ground up to bring the benefits of distributed availability and scalability to client-server workloads and usage models. It is not simply a port of existing VMS cluster technology. The Cluster design is also architecturally compatible with the Windows NT operating system. It uses standard, extensible Windows NT interfaces and mechanisms, such as Windows NT's layered driver and network architecture. As a layered implementation, cluster software is isolated from core Windows NT components, yet it integrates seamlessly with the Windows NT environment. For example, management utilities work unchanged with clusters, and the Windows NT network browsers transparently show cluster "shares" or resources.
The design for Digital's Clusters for Windows NT focuses on providing a server solution in which a clustered set of servers exports resources and services to heterogeneous clients, including PCs, Macintoshes®, and workstations. The cluster does this through support of industry standard communications interconnections and protocols. The cluster design also enables clients to view the cluster as a single server environment, and to access resources transparently, as if the resources were local to the client desktop.
The cluster itself is made up of Windows NT Server systems, including Intel® 486/Pentium® and/or Alpha AXP processors, running cluster software. Within the cluster, there is component redundancy in various dimensions (CPU, storage, communications, and so forth) that serves to increase availability of the clustered server environment. For example, Clusters for Windows NT supports multiple hosts for SCSI storage, which enables customers to increase storage availability by configuring primary and backup paths to SCSI storage resources. If, for example, the primary server or its SCSI controller fails, cluster software will automatically fail over access to the requested resource by way of the backup path. Similarly, clusters support redundancy in network controllers or other single points of failure that can affect the availability of resources served by the cluster. [Editor's note: "Failover" (in the noun form) refers to the automatic transfer of a service or resource from a failed component to a redundant one; in the verb form, "fail over," it refers to performing that transfer, for example by redirecting access to a failed resource along a backup path.]
A major advantage of clusters over fault-tolerant solutions is that fault-tolerant systems, while providing extremely high levels of availability, typically employ passive standby components that remain idle until a failure occurs. This approach is very expensive, especially considering that the duplicate components go virtually unused. In a cluster, high availability is achieved using active backup subsystems. These backup subsystems perform normal, routine functions, and are themselves primary servers for a given set of cluster resources. The cluster approach enables customers to get the most out of their computing resource investments, and still achieve high levels of availability.
With Clusters for Windows NT, customers can protect their current and future investments in hardware and software. Clusters for Windows NT integrates well into existing LAN infrastructures by supporting standard LAN protocols and standard desktop platforms. For example, customers can augment their existing Novell® NetWare® LANs by integrating a cluster of Windows NT servers. Windows NT customers can also build clusters using their existing Windows NT server platforms. Over time, they can easily expand their server capacity by simply adding CPU, storage, I/O, and software as needed. The scalability of clusters eliminates the need for expensive platform migrations, which often involve the replacement of the current operating system and all related application software. Finally, Clusters supports all Windows®-based applications unchanged and out-of-the-box, protecting existing and future investments in software.
Clusters for Windows NT offers a cost-effective software solution for delivering highly available, scalable resources in a client-server LAN. It is fully compatible with Windows NT, and it supports a wide range of industry standard components. For customers who choose to run their enterprise on Windows NT, Digital's Clusters for Windows NT offers the reliability and investment protection they require.
Clusters for Windows NT is designed from the ground up to bring the benefits of distributed availability, scalability, investment protection, and manageability to client-server workloads and usage models. Client-server applications are typified by a functional decomposition of the application between machines. This model is inherently asymmetric: the workload is divided into separate, discrete units that execute on different machines. Digital's Clusters for Windows NT technology both assumes this paradigm as its application model and exploits this design internally.
This model can be contrasted with the original OpenVMS VAXcluster model, which was designed for symmetric, time-sharing workloads and applications. Because it is built around the asymmetric model, the Clusters for Windows NT design excels at providing these cluster benefits to commodity PC-LAN environments and their typically asymmetric configurations.
The Clusters for Windows NT design is architecturally compatible with Windows NT; standard, extensible Windows NT interfaces and mechanisms are used. Windows NT has a layered driver and network architecture that readily lends itself to extending functionality through additional drivers inserted into the I/O stacks. Management utilities work unchanged with clusters. For example, the Windows NT Network Browsers transparently show Cluster shares or resources. There are minimal dependencies on Windows NT kernel changes. This is a truly layered cluster architecture, designed to be isolated from core Microsoft components.
The cluster product uses fully designed APIs in Windows NT and exports its own similarly designed APIs. This minimizes dependencies on operating-system-specific features, and provides better support of cluster-aware application development.
Clusters for Windows NT is designed to provide a server solution. A clustered server exports capabilities to heterogeneous clients.
The Clusters for Windows NT design uses industry standards. The processors can be Intel or RISC designs, the interconnections are popular LAN technologies, and the transport protocols are industry standards such as TCP/IP and IPX/SPX. The design is modular and extensible, using a building-block approach. It is Digital's intention to add functionality over time without disturbing the base cluster product.
The Clusters for Windows NT hardware design center is driven by the goal of providing cluster technology on commodity, industry standard hardware. This design goal is exhibited across all aspects of the hardware strategy, including processors, interconnections, and storage.
There is nothing in the Clusters for Windows NT design that ties it to a given processor architecture. Support for both the Intel and Alpha processor families within the same cluster is expected to be present in version 1.0. Other RISC processor architectures will be considered in future versions; supporting them is largely a qualification exercise. Of course, all processors in a Windows NT cluster must be running the Windows NT operating system.
Clusters for Windows NT takes advantage of optional hardware assistance where it is present and useful, and uses such hardware transparently. An example of hardware in this category is the Memory Channel, or Reflective Memory, interconnect technology.
Cluster technology involves two types of interconnection: a processor-to-processor interconnection and a processor-to-storage interconnection. The original VAXcluster model used the CI (Computer Interconnect) for both purposes.
For processor-to-processor communication, Clusters for Windows NT uses native Windows NT transports. These include the Internet protocol suite (TCP/IP), the Novell NetWare transports (IPX/SPX), and Microsoft LAN Manager technology. These transports run over industry-standard hardware such as Ethernet, FDDI, ATM, Token Ring, and others.
Clusters for Windows NT can also make use of a high-speed, low-latency network interconnection, known as Reflective Memory or Memory Channel. This interconnection presents a shared memory style interface over a PCI adapter. It is transparently utilized like any other network connection by the cluster software.
The Clusters for Windows NT storage strategy is open and flexible. It begins with dual-host, parallel SCSI storage and will evolve as storage technologies evolve. We expect future clustered storage to use serial SCSI architectures, such as SSA and Fibre Channel, or even some type of LAN-based storage. Clusters for Windows NT will continue to use commodity storage.
Client-server technology is about functionally decomposing an application or solution between systems. It typically entails a client user interface initiating an operation and utilizing services provided by one or more server systems. Clusters for Windows NT is fundamentally about exporting the same service or resource from the multiple systems that constitute the cluster.
The software design center for Clusters for Windows NT is the partitioned data model. The partitioned data model means that the workload is split up into segments, and that each segment is locally controlled on a member of the cluster. A contrasting model is the shared everything model, in which a monolithic workload is executed across multiple systems with a distributed control scheme. Note that Clusters for Windows NT is capable of supporting both models, particularly in the database arena. With either model, the cluster presents a coherent set of resources from what appears to be a single system.
Clusters for Windows NT presents a single, coherent name space for its served resources. The architecture supports one or more servers exporting the same resource(s) at a given time.
Like Windows NT itself, the cluster APIs are open and extensible. Cluster-aware applications can be built in either kernel mode or user mode.
Clusters for Windows NT is a server-oriented solution. Clients are not considered to be members of the cluster, even though they benefit from its capabilities. A unique aspect of the Clusters for Windows NT design is that clients always talk directly to the best server for a particular resource. Data never travels multiple hops through intermediate systems with this design. The client talks directly to a controlling server for the resource.
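To make the idea of a client talking directly to the controlling server concrete, here is a minimal Python sketch of how a client-side redirector might resolve a path in the cluster's single name space to the member that currently controls the share. The UNC path form, the `route` function, and the `owner_of` table are hypothetical illustrations; the paper does not describe the product's actual resolution mechanism.

```python
def route(unc_path, cluster_name, owner_of):
    r"""Resolve a path such as \\CLUSTER\SHARE\dir\file to its controlling member.

    owner_of: hypothetical table mapping upper-case share name -> member
              currently controlling that share. Share names are compared
              case-insensitively here (an assumption).
    Returns (member, share, path_within_share).
    """
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path")
    parts = unc_path[2:].split("\\")
    if len(parts) < 2 or parts[0].upper() != cluster_name.upper():
        raise ValueError("path does not name this cluster")
    share = parts[1].upper()
    if share not in owner_of:
        raise LookupError(f"share {share!r} is not exported by the cluster")
    # The client then opens a session with this member directly;
    # no intermediate system relays the data.
    return owner_of[share], share, "\\".join(parts[2:])
```

For example, `route(r"\\CLUSTER1\sales\reports\q1.xls", "CLUSTER1", {"SALES": "node1"})` returns `("node1", "SALES", "reports\\q1.xls")`, so the client would contact `node1` for that file.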
Scalability is achieved in a partitioned data model by dividing the workload among the server systems at a fine enough granularity to achieve a balanced workload. In version 1.0, that granularity is one share.
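Because version 1.0 balances load at one-share granularity, placing shares on members is essentially a load-balancing assignment problem. The sketch below uses a simple greedy heuristic (heaviest share first, onto the currently least-loaded member) purely as an illustration; the per-share load estimates, the `assign_shares` name, and the heuristic itself are assumptions, not the product's documented placement policy.

```python
import heapq

def assign_shares(share_load, members):
    """Place each share on exactly one member, roughly balancing total load.

    share_load: dict share name -> estimated load (hypothetical metric)
    members:    list of member names
    Greedy: take shares heaviest first and give each to the member with
    the smallest load so far.
    """
    heap = [(0, m) for m in members]  # (current load, member)
    heapq.heapify(heap)
    placement = {}
    for share, load in sorted(share_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, member = heapq.heappop(heap)  # least-loaded member
        placement[share] = member
        heapq.heappush(heap, (total + load, member))
    return placement
```

With shares of load 5, 4, 3, and 2 on two members, this places SALES and WEB on `node1` and ENG and HR on `node2`, giving each member a total of 7. A finer granularity than one share would allow closer balance when a single share dominates the load.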
Availability is achieved in this model by way of failover to a replicated path. Context replication can be utilized to varying degrees to provide different levels of seamlessness in failover scenarios.
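Failover in the partitioned data model amounts to moving control of each orphaned partition to a member that has a physical path to it. This hypothetical Python sketch assumes each share has an ordered list of candidate members (for example, the two hosts on a dual-host SCSI bus) and that the set of surviving members has already been agreed; it shows only the reassignment step, not failure detection or context replication.

```python
def fail_over(candidates, placement, alive):
    """Reassign shares whose controlling member is down.

    candidates: dict share -> ordered list of members able to serve it
                (primary first, then backups); hypothetical configuration
    placement:  dict share -> current controlling member
    alive:      set of members currently in the cluster
    Returns a new placement; shares on surviving members stay put.
    """
    new_placement = {}
    for share, owner in placement.items():
        if owner in alive:
            new_placement[share] = owner
            continue
        for member in candidates[share]:
            if member in alive:
                new_placement[share] = member  # first surviving path wins
                break
        else:
            raise RuntimeError(f"share {share!r} has no surviving path")
    return new_placement
```

If `node1` fails while serving SALES (candidates `node1`, `node2`) and HR (candidates `node1`, `node3`), SALES moves to `node2` and HR to `node3`, while shares already on survivors are untouched.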
Clusters are typically associated with a common management and security domain. These capabilities are present in other products in the Windows NT product set. Common account management and security are provided by the Windows NT Server domain capability. Common software distribution is achieved by way of the Systems Management Server (SMS) product. Clusters for Windows NT management uses these capabilities and adds to them a set of tools, extensions, and wrappers that enable the cluster to be managed as a single system.
Cluster management concentrates on graphical user interface (GUI)-based management tools and techniques. Cluster management is not intended to solve all system administration issues, but is designed to extend single-system management functions to the cluster, and to provide tools to manage the clustered resources and services. To provide local management of cluster resources within the cluster, we will build a centralized Windows-based management solution.
In parallel, .DLL extensions will be added to those Windows NT utilities whose functions have cluster ramifications. For example, cluster shares are manipulated from File Manager. Management functions that affect all nodes in the cluster are automatically propagated to the other nodes by remote procedure call.
Lastly, basic Windows NT utilities may receive cosmetic modifications or cluster-aware wrappers. An example of this would be a cluster-wide event viewer.
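The automatic propagation of cluster-wide management functions can be sketched as "apply locally, then forward to every other member." In this hypothetical Python sketch, `rpc_call` stands in for whatever remote procedure call transport the product actually uses, and `operation` for a management function such as creating a share; failures are collected rather than hidden so that an administration tool could report or retry them.

```python
def apply_cluster_wide(operation, local_member, members, rpc_call):
    """Run a management operation on this member, then on every other member.

    operation: callable taking a member name (hypothetical management function)
    rpc_call:  callable(member, operation) standing in for a remote
               procedure call to another member (hypothetical transport)
    Returns a dict member -> result, or the exception raised for that member.
    """
    results = {local_member: operation(local_member)}
    for member in members:
        if member == local_member:
            continue
        try:
            results[member] = rpc_call(member, operation)
        except Exception as exc:  # record the failure; a real tool might retry or roll back
            results[member] = exc
    return results
```

Returning per-member results, rather than stopping at the first error, lets the caller see exactly which nodes did not receive the change.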
Two cluster data access models prevail in the industry today: the shared disk model and the partitioned data model. Conceptually, the shared disk model can be thought of as symmetric: the same workload is synchronized across multiple systems executing in parallel. The partitioned data model is asymmetric: the workload is decomposed into functionally separate units of work, which different systems perform independently.
With respect to physical hardware, the shared disk model is characterized by symmetric access: All processors have equal access to the storage elements. Contrast this with the asymmetric access paths associated with the partitioned data access scheme.
From the software perspective, the shared disk model involves distributed control algorithms, typically built on synchronization primitives such as a distributed lock manager, operating over a shared disk. The partitioned data model involves partitioning of the workload and failover primitives.
The Clusters for Windows NT File System supports the native Windows NT file systems transparently and seamlessly. Utilizing the partitioned data model, each file system partition is exported and controlled by a single cluster member at any instant in time. The partitions are divided among the systems in the cluster.
For high-end configurations, additional I/O scaling can be achieved through distributed striping technology. That is, a file system partition controlled by a single node can actually reside on a stripe set whose physical disks span multiple cluster members.
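The address arithmetic behind such a distributed stripe set is the usual striping (RAID 0) mapping. The following Python sketch assumes a fixed stripe-unit size and an ordered list of (member, disk) columns; `locate_block` and both parameters are illustrative, not values or interfaces from the product.

```python
def locate_block(logical_block, columns, stripe_unit_blocks):
    """Map a logical block of a striped partition to (member, disk, disk_block).

    columns:            ordered list of (member, disk) pairs in the stripe set
    stripe_unit_blocks: blocks per stripe unit (chunk); an assumed parameter
    """
    if logical_block < 0 or stripe_unit_blocks <= 0 or not columns:
        raise ValueError("invalid stripe geometry or block number")
    chunk, offset = divmod(logical_block, stripe_unit_blocks)
    column = chunk % len(columns)  # which disk holds this chunk
    row = chunk // len(columns)    # how many full stripes precede it
    member, disk = columns[column]
    return member, disk, row * stripe_unit_blocks + offset
```

With three columns and 16-block stripe units, logical block 40 falls in chunk 2 and maps to block 8 of the third column's disk, while block 50 wraps to the first column at block 18. Consecutive chunks land on different members, which is what spreads the I/O load.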
Clusters for Windows NT supports both prevalent parallel database models in order to achieve both compute and I/O scaling.
The partitioned data database model is characterized by SQL queries decomposed into sub-functions, which are then distributed across the proper nodes in the cluster. This allows for autonomous control of the database segments by the individual systems.
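As a sketch of this decomposition, consider an AVG query over a table whose rows are partitioned across members. Each member computes a partial (sum, count) over only its own segment, and the coordinator combines the partials; simply averaging the per-member averages would give the wrong answer when segments differ in size. Threads stand in for shipping the sub-query to remote members, and the data layout and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def local_partial_avg(rows, column):
    # Runs on the member that controls this segment; sees only local rows.
    values = [row[column] for row in rows]
    return sum(values), len(values)

def distributed_avg(segments, column):
    """Decompose AVG into per-segment (sum, count) sub-queries and combine them.

    segments: dict member -> list of row dicts held by that member (hypothetical)
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: local_partial_avg(rows, column),
                                 segments.values()))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None
```

With `node1` holding amounts 10, 5, and 3 and `node2` holding 12, the combined result is 30 / 4 = 7.5, whereas averaging the two local averages (6 and 12) would incorrectly give 9.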
The shared disk database model views the database as a monolithic entity that is directly accessible from multiple systems. It requires a real or emulated shared disk capability and a distributed lock manager for cross-system synchronization of colliding operations.
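The cross-system synchronization that a shared disk database needs can be illustrated with a toy lock manager supporting shared (SH) and exclusive (EX) modes. This is a single-process Python sketch under stated assumptions: a real distributed lock manager spreads lock mastership and waiting requests across members, whereas here a hash only indicates which member would master each resource, and conflicting requests are refused rather than queued. The class and mode names are hypothetical.

```python
import zlib

class LockManager:
    """Toy lock manager with shared (SH) and exclusive (EX) modes."""

    MODES = ("SH", "EX")

    def __init__(self, members):
        self.members = sorted(members)
        self.holders = {}  # resource name -> {owner: mode}

    def master_of(self, resource):
        # Deterministic hash so each resource has exactly one mastering member.
        return self.members[zlib.crc32(resource.encode()) % len(self.members)]

    def acquire(self, resource, owner, mode):
        """Grant the lock and return True, or return False on conflict."""
        if mode not in self.MODES:
            raise ValueError(f"unknown lock mode {mode!r}")
        held = self.holders.setdefault(resource, {})
        others = [m for o, m in held.items() if o != owner]
        if mode == "EX" and others:
            return False  # exclusive conflicts with any other holder
        if mode == "SH" and "EX" in others:
            return False  # shared conflicts only with an exclusive holder
        held[owner] = mode
        return True

    def release(self, resource, owner):
        held = self.holders.get(resource, {})
        held.pop(owner, None)
        if not held:
            self.holders.pop(resource, None)
```

Two members can read the same database page concurrently under SH locks, but a third member's EX request for an update is refused until both readers release, which is the "colliding operations" case the lock manager exists to serialize.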
The following illustration presents a high-level view of the Clusters for Windows NT architecture.
Elements include the following:
All applications benefit from clustering. More specifically, there are two types of benefits:
Databases gain availability benefits as well. The cluster can be used to fail over access to the database or its components, so that service continues in the presence of failures.
Clusters provide a robust suite of APIs for building available, scalable applications. Broadly, these APIs include:
The following sections contrast Clusters for Windows NT with various availability and scalability solutions such as fault tolerance, RAID, and Symmetric Multi-Processing. A key point of these comparisons is to show that Clusters is a solution that provides high system-level availability and scalability.
RAID is very popular in the marketplace today as a technology for storage availability. It is primarily a disk availability solution that protects against disk failures. However, RAID solutions are limited in the availability they provide, because the controller or server for a RAID set represents a single point of failure.
The Clusters for Windows NT product complements subsystem-level solutions such as RAID, in that it adds system-level availability by eliminating all single points of failure.
Fault-tolerant computing refers to systems that provide non-stop, 24-hour-a-day, 7-day-a-week availability. This is usually achieved by configuring a complete, mirrored backup of the primary system. This backup is usually in "hot standby" mode, meaning that it is not adding any capacity to the system until a failure occurs and its capacity is needed. Because the backup system is not active and there is usually significant processing needed to mirror state to the backup system, fault-tolerant solutions do not typically scale; two systems provide one system's capacity.
Although clustering does not provide non-stop availability, it does provide very high levels of availability at a very low cost, using industry-standard, commodity components. In a cluster, component redundancy is important for availability, but this does not mean that a complete mirrored set of systems is required. For example, in Digital's Clusters for Windows NT design, storage is shared and access is balanced between systems. This is a major advantage of clustering over fault-tolerant systems.
Another major advantage of clusters over fault-tolerant solutions is that clusters scale, because all functional systems are used for application work. Also, the amount of system synchronization effort is significantly less than required for fault-tolerant solutions, leading to a more scalable solution. Two-node clusters typically approach two systems' worth of capacity in steady state situations. The cluster approach enables customers to maximize their computing resource investments, and still achieve high levels of availability.
Symmetric Multi-Processing is a processor scalability solution that allows for the extension of processing capacity through the addition of processors. This is a tightly coupled arrangement in which the processors share memory. This tight coupling with shared memory, along with other factors, such as the type of operating system and applications, imposes limits on the scalability of SMP systems.
Clusters is a system scalability solution composed of loosely coupled systems, each with its own dedicated memory. Clusters allows for broader scaling in multiple dimensions, including CPU, storage, and I/O capacity.
It should be noted that SMP and Clusters also differ in the application programming model. SMP can be characterized as multi-threading utilizing shared memory programming constructs. Clusters can be characterized as a message-passing programming paradigm.
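The difference in programming model can be seen in a small Python sketch that computes the same sum two ways. Threads simulate both cases here: in the SMP style, workers update one shared variable under a lock; in the cluster style, each worker owns its data and communicates only by sending a reply message through a queue. In a real cluster those messages would cross the interconnect between separate systems; the function names are illustrative.

```python
import queue
import threading

def smp_style_sum(chunks):
    """Shared-memory style: all workers update one shared total."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # shared state needs explicit synchronization
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def cluster_style_sum(chunks):
    """Message-passing style: workers share nothing and reply with messages."""
    replies = queue.Queue()

    def worker(chunk):
        replies.put(sum(chunk))  # the only communication is this message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(replies.get() for _ in chunks)
```

Both functions return the same answer, but only the first depends on memory that every worker can see; the second would work unchanged if each worker ran on a different cluster member.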
This section outlines specific plans for Digital Clusters for Windows NT version 1.0.
Our plan for version 1.0 of Digital's Clusters for Windows NT is to deliver highly available, highly scalable file server capabilities. Specifically, the version 1.0 product will offer the following attributes:
The SDK will include a variety of development tools, documentation, sample code, and build procedures intended to aid developers in their efforts.
Follow-on versions of the cluster product will add additional capabilities, extend the Clusters architecture, and incorporate new technologies such as "Cairo" (the next generation of the Windows NT operating system).
A major milestone for the Clusters for Windows NT project was Digital's demonstration of the technology at the Fall 1994 COMDEX event in Las Vegas. In fact, Digital Clusters for Windows NT won the Byte Magazine award for Most Significant Technology at this show. The Most Significant Technology award is for the technology predicted to have the greatest impact on the industry in the coming years.
The COMDEX cluster configuration consisted of a three-node, mixed-processor cluster. Shared SCSI buses provided redundant storage paths, and both major Windows NT file systems were in use. PC clients accessed services from the cluster in the demonstrations.
The COMDEX demo showed three main areas of clusters: