Thursday, January 19, 2006

Clustering (setup and configuring)


Cluster Configuration

Objectives
After completing this lab, we will be able to:
· Configure the cluster service.
· Verify Cluster service installation.
· Do the most important maintenance tasks.

Estimated time to complete this lab: 240 minutes




Exercise 1: Installing Cluster Service

Goal
In this exercise, the student will install Cluster service on nodes DCSggs1, DCSggs2, DCSggs3 and DCSggs4.

Tasks

Detailed Steps
1. Configuring the first node.

Note
During installation of Cluster service on the first node, all other nodes must either be turned off or stopped prior to Windows 2000 booting. All shared storage devices should be powered up.

a. Click Start, click Settings, and click Control Panel.

b. Double-click Add/Remove Programs.

c. Click Add/Remove Windows Components.

d. Select Cluster Service. Click OK.

e. Ask your instructor for the location of the Cluster service files. Click OK.

f. Click Next.

g. The Cluster Service Configuration Wizard dialog box appears. Click I Understand to accept the condition that Cluster service is supported on hardware from the Hardware Compatibility List only.

h. Because this is the first node in the cluster, you must create the cluster itself. Select The first node in the cluster, and then click Next.

i. Enter CLUSTERgg for the cluster name, and click Next.

j. Type DCSUSERgg for the user name of the cluster service account. Specify the correct password. Type DCS for the domain name, and click Next.

Note
You would normally provide a secure password for this user account.

k. The Cluster Service Configuration Wizard will validate the user account and password. Click Next.

2. Configuring cluster disks.

Note

By default, all SCSI disks not residing on the same bus as the system disk will appear in the Managed Disks list. Therefore, if the node has multiple SCSI buses, some disks may be listed that are not to be used as shared storage (for example, an internal SCSI drive). Such disks should be removed from the Managed Disks list.

l. The Add or Remove Managed Disks dialog box specifies which disks on the shared SCSI bus will be used by Cluster service. Add or remove disks as necessary and then click Next.

m. The first partition of the first disk is selected as the quorum resource by default. Change this to denote the quorum disk (in this lab, drive F). Click Next.

Note
The order in which the Cluster Service Configuration wizard presents networks varies.

n. Click Next in the Configuring Cluster Networks dialog box.

o. Make sure that the network name and IP address correspond to the network interface for the public network. The IP address for the public network is 192.168.x.gg0.

p. Check the box Enable this network for cluster use.

q. Select the option All communications (mixed network).

r. Click Next.

s. The next dialog box configures the private network. Make sure that the network name and IP address correspond to the network interface used for the private network. The IP address for the private network is 10.0.x.gg0.

t. Check the box Enable this network for cluster use.

u. Select the option Internal cluster communications only.

v. Click Next.

w. In this lab, both networks are configured so that either can be used for internal cluster communication. The next dialog box offers an option to modify the order in which the networks are used. Because Private Cluster Connection represents a direct connection between nodes, it is left at the top of the list. In normal operation this connection will be used for cluster communication. If the Private Cluster Connection fails, Cluster service will automatically switch to the next network on the list, in this case Public Cluster Connection. Make sure the first connection in the list is the Private Cluster Connection and click Next.

Note
Always set the order of the connections so that the Private Cluster Connection is first in the list.

x. Enter the unique cluster IP address (192.168.x.gg4) and Subnet mask (255.255.255.0), and click Next. The Cluster Service Configuration Wizard automatically associates the cluster IP address with one of the public or mixed networks. It uses the subnet mask to select the correct network.

y. Click Finish to complete the cluster configuration on the first node. The Cluster Service Setup Wizard completes the setup process for the first node by copying the files needed to complete the installation of Cluster service. After the files are copied, the Cluster service registry entries are created, the log files on the quorum resource are created, and the Cluster service is started on the first node.

z. A dialog box appears telling you that Cluster service has started successfully. Click OK. When the Windows Components complete installation and when prompted, click Finish.
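
The two selection rules described in steps w and x can be sketched in a few lines of Python. This is only an illustration and says nothing about how Cluster service is implemented internally. The network names come from the lab; the concrete addresses (x = 1, gg = 01), the Network class, and both function names are assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    node_ip: str   # address of this node's adapter on the network
    role: str      # "internal", "mixed" or "client" (labels chosen for this sketch)
    up: bool = True


def internal_network(networks):
    """Step w: use the first network in the ordered list that is up
    and enabled for internal cluster communication."""
    for net in networks:
        if net.up and net.role in ("internal", "mixed"):
            return net.name
    return None


def network_for_cluster_ip(cluster_ip, mask, networks):
    """Step x: associate the cluster IP address with a public or mixed
    network whose subnet (derived from the mask) contains the address."""
    addr = ipaddress.ip_address(cluster_ip)
    for net in networks:
        if net.role not in ("mixed", "client"):
            continue
        subnet = ipaddress.ip_network(f"{net.node_ip}/{mask}", strict=False)
        if addr in subnet:
            return net.name
    return None


# Hypothetical lab values: x = 1, gg = 01, private network listed first.
networks = [
    Network("Private Cluster Connection", "10.0.1.10", "internal"),
    Network("Public Cluster Connection", "192.168.1.10", "mixed"),
]
```

Calling internal_network(networks) returns the private connection while it is up, and network_for_cluster_ip("192.168.1.14", "255.255.255.0", networks) picks the public connection because only that subnet contains the address.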

3. Configuring the other nodes.

Note
For this section, leave the first node and all shared disks powered on. Power up the other nodes. Follow the same procedure used for installing Cluster service on the first node, except for the differences noted in the following steps.

aa. In the Create or Join a Cluster dialog box, select The second or next node in the cluster, and click Next.

bb. Enter the cluster name that was previously created and click Next.

cc. Leave Connect to cluster as unchecked. The Cluster Service Configuration Wizard will automatically supply the name of the user account selected during the installation of the first node. Always use the same account used when setting up the first cluster node.

dd. Enter the password for the account (if there is one) and click Next.

ee. At the next dialog box, click Finish to complete configuration.

ff. The Cluster service will start. Click OK. Click Finish.

gg. Close Add/Remove Programs.

Exercise 2: To Verify Cluster Service Installation

Goal
In this exercise, the student will verify that Cluster service has been installed by checking to see if the Cluster service installation process wrote the correct entries to the registry.

Tasks
Detailed Steps
1. To verify the registry entries.

a. On both nodes, from Administrative Tools, start the Cluster Administrator utility.

b. After Cluster Administrator connects to the cluster, the state of both nodes should be up.

c. Open Regedt32. Look for HKEY_LOCAL_MACHINE\Cluster to confirm that the cluster hive has been added to the registry. Also look for the Clusdb file in the Cluster directory under the WinNT folder.

2. Verify disk failover works.

d. In Cluster Administrator, right-click a disk group and select Move Group.


Exercise 3: The Cluster Registry

Goal
In this exercise, the student will examine the cluster registry on each node to identify how cluster configuration is stored locally on each node.

Tasks

Detailed Steps
1. To view the cluster registry

a. Run Regedt32.exe.

b. View the HKEY_LOCAL_MACHINE\Cluster key.

c. Write down the names of the subkeys stored here.

d. Select the Quorum subkey.

e. What is the Path set to?

f. What does the Resource value in the Quorum subkey represent?

g. Note the first eight characters of the Resource value in the Quorum subkey.

h. Select the resource in HKEY_LOCAL_MACHINE\Cluster\Resources that begins with these eight characters.

i. View the values stored in this key.

j. Select the Parameters subkey.

k. Which values are stored here?

l. Examine the Parameters subkey for each resource. Which resource has no subkeys under the Parameters subkey?

m. Look at the Resource Types key. What default information is stored here for all resources of a specific type?

n. Where in the cluster registry are the node IP addresses stored?

o. Where are the subnet IDs stored?

p. Which network has the lowest priority (highest number)?

q. What type of cluster communication will this network be used for? How can this be told from the registry?

r. What do the Address and ClusnetEndpoint values identify in the subkeys for the network interfaces?

s. According to the data in the Nodes key, which node has NodeID 1? Which has NodeID 2?
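
Steps g and h can also be done with a short script instead of by eye. The sketch below uses Python's standard winreg module, which is only available on Windows. Python is not part of this lab, and the function names are invented for illustration. It assumes, as the steps above describe, that the Quorum subkey holds a Resource value and that each resource is a subkey of Cluster\Resources.

```python
def match_by_prefix(names, prefix):
    """Return the names that start with prefix, ignoring case."""
    prefix = prefix.lower()
    return [n for n in names if n.lower().startswith(prefix)]


def subkey_names(path):
    """List the subkeys of a key below HKEY_LOCAL_MACHINE (Windows only)."""
    import winreg  # imported here so the file still loads on other platforms
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        count = winreg.QueryInfoKey(key)[0]  # first item is the subkey count
        return [winreg.EnumKey(key, i) for i in range(count)]


def quorum_resource_keys():
    """Find the Resources subkeys matching the first eight characters
    of the Resource value stored in the Quorum subkey."""
    import winreg
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"Cluster\Quorum") as key:
        resource_id, _value_type = winreg.QueryValueEx(key, "Resource")
    return match_by_prefix(subkey_names(r"Cluster\Resources"), resource_id[:8])
```

Only match_by_prefix can be tried away from a cluster node; the other two functions need the cluster hive to be present in the registry.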





Exercise 4: Creating a Group

Goal
In this exercise, the student will create a new group in the cluster. This group will be used to create a cluster file share.

Tasks
Detailed Steps

1. To connect to the cluster, and then create a group.

a. Start Cluster Administrator.

b. To create a new group, on the File menu, click New Group. The New Group dialog box appears.

c. In the Name box, type File Share Group, and then in the Description box, type Group for file share resources.

d. Click Next. The Preferred Owners dialog box appears.

e. Click Finish. A Cluster Administrator message appears, indicating that the group was successfully created.

f. Click OK.

g. Right-click the new group and select Bring Online.


Exercise 5: Transferring a Resource

Goal
In this exercise, the student transfers a resource to the new group. The resource is being transferred so that resources can be created that are dependent upon the disk resource.

Tasks

Detailed Steps

1. To move the disk resource to the new group.

a. In Cluster Administrator, click the Resources folder.

b. Right-click the non-quorum disk and then select Change Group. A pop-up menu appears that lists the groups available in the cluster.

c. Click the File Share Group. A dialog box appears to verify moving the resource.

d. Click Yes.

e. The disk resource is now shown as part of the File Share Group in the Group column. Having the disk resource as part of the group allows resources that depend on a disk resource to be added to the File Share Group.



Exercise 6: Creating an IP Address Resource

Goal

In this exercise, the student will create another IP address resource in the cluster. This IP address resource will be used when creating a file share resource.

Tasks
Detailed Steps
1. To create an IP address resource.

a. In Cluster Administrator, click the Groups folder.

b. Under Groups, right-click File Share and then click New. A pop-up menu appears, listing Group and Resource.

c. Click Resource. The New Resource dialog box appears.

d. In the New Resource dialog box, enter the following information:

   Option          Use…
   Name            File Share IP address
   Description     IP address for the File Share
   Resource type   IP address
   Group           File Share

e. Click Next. The Possible Owners dialog box appears.

f. By default, both nodes should be in the Possible Owners list. If not, add them and click Next. The Dependencies dialog box appears.

g. Verify that the Resource dependencies box is blank, and then click Next. The TCP/IP Address Parameters dialog box appears.

h. In the TCP/IP Address Parameters dialog box, enter the following information:

   Option          Use…
   Network to use  Public network
   Address         192.168.x.ggy, where y is a number not yet used in the final octet of any other IP address on the cluster.
   Subnet mask     255.255.255.0

Caution
Only use the address and subnet mask shown in the preceding step when using the simulations provided with the course materials. If configuring the Cluster service software on a real cluster, ask the instructor or network administrator what values to use.

i. Click Finish. A message appears, indicating that the File Share IP address resource was successfully created.

j. Click OK. Now that an IP address resource has been created, it is possible to add a network name resource to the group.

k. Bring the IP address online.
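
Choosing y in step h amounts to finding a final octet of the form ggy that no other cluster address already uses. A minimal sketch, assuming x = 1 and gg = 01, so that the candidate octets are 10 to 19; the function name and the list of used addresses are made up for the example.

```python
def free_final_octets(group, used_addresses, prefix="192.168.1."):
    """Return final octets of the form <group><y>, y = 0..9, that are
    not yet used by any address on the given network prefix."""
    used = {
        int(addr.rsplit(".", 1)[1])
        for addr in used_addresses
        if addr.startswith(prefix)
    }
    candidates = (int(f"{group}{y}") for y in range(10))
    # Keep only values that are valid host octets and still free.
    return [octet for octet in candidates if 1 <= octet <= 254 and octet not in used]


# Hypothetical addresses already in use: the node's public adapter,
# the cluster IP address, and the node's private adapter.
in_use = ["192.168.1.10", "192.168.1.14", "10.0.1.10"]
available = free_final_octets(1, in_use)
```

Any value in available can serve as the final octet for the new IP address resource. Addresses on other networks, such as the private one, are ignored.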



Exercise 7: Creating a Network Name Resource

Goal
In this exercise, the student will create another network name resource in the cluster. This new network name will also be used when creating a cluster file share.

Tasks
Detailed Steps

1. To create a network name resource

a. In Cluster Administrator, click the Groups folder.

b. Under Groups, right-click File Share and then click New. A pop-up menu appears listing Group and Resource.

c. Click Resource. The New Resource dialog box appears.

d. In the New Resource dialog box, enter the following information:

   Option          Use…
   Name            File Share name
   Description     Network name for the file share
   Resource type   Network name
   Group           File Share Group

e. Click Next. The Possible Owners dialog box appears.

f. By default, both nodes should be in the list of Possible Owners. If not, add them and then click Next.

g. In the Resource dependencies box, add the File Share IP address resource and then click Next. The Network Name Parameters dialog box appears.

h. Type a name for the virtual server, e.g. VSRVggFILE, and then click Finish. A message will appear, indicating that the File Share name resource was successfully created.

i. Click OK. The group now contains all of the resources needed to create a cluster file share.

j. Bring the network name online.

j. Bring the network name online.
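
A dependency such as the one added in step g means the IP address resource from Exercise 6 has to be online before the network name can come online. The sketch below works out such an order with Python's standard graphlib module (Python 3.9 or later). It is an illustration of the idea only; the entry for the disk and the function name are assumptions, not something Cluster Administrator exposes.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
dependencies = {
    "Disk G:": set(),                               # hypothetical name for the non-quorum disk
    "File Share IP address": set(),
    "File Share name": {"File Share IP address"},
}


def online_order(deps):
    """Return an order in which every resource appears after
    all of the resources it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = online_order(dependencies)
```

In order, File Share IP address always precedes File Share name. Resources without dependencies, such as the disk, may appear anywhere.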




Exercise 8: Creating a File Share Resource

Goal
In this exercise, the student will create a file share resource in the cluster.

Tasks
Detailed Steps

1. To create a file share resource

a. Use Windows Explorer to create a \Test folder on the root of the non-quorum drive G:. After the file share resource has been created and brought online in the steps below, it should be possible to connect to the testshare share over the network.

b. In Cluster Administrator, click the Groups folder.

c. Under Groups, right-click File Share and then click New. A pop-up menu appears, listing Group and Resource.

d. Click Resource. The New Resource dialog box appears.

e. In the New Resource dialog box, enter the following information:

   Option          Use…
   Name            Test Share
   Description     Test file share
   Resource type   File Share
   Group           File Share

f. Click Next. The Possible Owners dialog box appears.

g. By default, both nodes should be in the list of Possible Owners. If not, add them and then click Next.

h. In the Resource dependencies box, add the non-quorum disk resource, and then click Next. The File Share Parameters dialog box appears.

i. In the File Share Parameters dialog box, enter the following information:

   Option          Use…
   Share Name      TestShare
   Path            :\Test
   Comment         Test share for the cluster

j. Click Finish. A message appears, indicating that the resource was successfully created.

k. Click OK. A file share has now been created on the cluster.

l. Bring the file share online.

m. Close Cluster Administrator.

Exercise 9: Creating a Generic Application Resource

Goal
In this exercise, the student will create a generic application resource in the cluster.

Tasks

Detailed Steps

1. To create a generic application resource

a. Copy Notepad.exe from the Windows 2000 installation directory to the root of the non-quorum drive G:.

b. In Cluster Administrator, click the Groups folder.

c. Right-click the group containing the non-quorum disk resource, and then click New. A pop-up menu appears, listing Group and Resource.

d. Click Resource. The New Resource dialog box appears.

e. In the New Resource dialog box, enter the following information:

   Option          Use…
   Name            Generic application
   Description     Generic application resource for the cluster
   Resource type   Generic application
   Group           File Share

f. Click Next. The Possible Owners dialog box appears.

g. By default, both nodes should be in the list of Possible Owners. If not, add them and then click Next.

h. In the Resource Dependencies box, add the non-quorum disk, and then click Next. The Generic Application Parameters dialog box appears.

i. In the Generic Application Parameters dialog box, enter the following information:

   Option             Use…
   Command Line       G:\notepad.exe
   Current Directory  G:

j. Click Allow application to interact with desktop, and then click Next. A dialog box appears, which can be used to configure registry replication.

k. Click Finish. A message appears indicating that the resource was successfully created.

l. Click OK. Right-click the new resource and choose Bring Online.

m. Close Cluster Administrator.



Exercise 10: Configuring Failover Properties

Goal

In this exercise, the student will configure and test the failover properties for the generic application resource created in the previous exercise.

Tasks

Detailed Steps

1. To configure failover properties.

a. In Cluster Administrator, view the properties of the generic application resource created previously.

b. Make sure that both nodes are possible owners of the resource. (If the group containing the resource does not fail over correctly to the other node, repeat this for all other resources in the group.)

c. From the Dependencies tab, note the resources upon which this resource depends. These resources must be online before this resource can be brought online.

d. View the Advanced tab. Will this resource be restarted automatically?

e. Does failure of this resource affect the group? If so, when will the group be affected and a failover initiated?

f. What are the default values for the LooksAlive and IsAlive polling intervals? Where do these values come from?

g. Close the Resource Properties dialog.

h. Take the resource offline. This can be done from Cluster Administrator on either node.

i. Does Notepad restart automatically?

j. Bring the resource online in Cluster Administrator.

k. Right-click the File Share group. Select Properties.

l. Select the Failover tab. Change the threshold to two. Click OK.

m. Close Notepad on the node on which it is currently running but not by taking it offline in Cluster Administrator. Does Notepad restart automatically?

n. Repeat step m until Notepad appears on the other node. Did the whole group fail over successfully to this node?

o. To which node did the resource fail over, and why?

p. Close Notepad again on the node on which it is currently running but not by taking it offline in Cluster Administrator. What happens?

q. Repeat step p until the Notepad resource fails. Can it be brought online? Explain.

r. How can the counter be reset for resource failures that have exceeded their threshold?

s. Repeat step m again until the restart threshold has been exceeded. Does the resource fail over or does it go back offline? Explain.
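
Before running the steps above, it can help to write down what is expected to happen. The sketch below is a deliberately simplified model of restart and failover thresholds, to be compared against what is actually observed in the lab. It ignores the time periods over which failures are counted, both thresholds are plain parameters rather than Cluster service defaults, and the function and node names are invented for the example.

```python
def simulate_failures(n_failures, restart_threshold, failover_threshold, nodes):
    """Return one (failure_number, action, node) tuple per failure.

    Simplified rules assumed by this sketch:
    - the resource is restarted on the same node until restart_threshold
      restarts have been used up;
    - the next failure moves the group to the next node and resets the
      restart counter;
    - once failover_threshold failovers have been used up, the next
      failure leaves the resource failed and the simulation stops.
    """
    events = []
    node_index = 0
    restarts = 0
    failovers = 0
    for number in range(1, n_failures + 1):
        if restarts < restart_threshold:
            restarts += 1
            events.append((number, "restart", nodes[node_index]))
        elif failovers < failover_threshold:
            failovers += 1
            restarts = 0
            node_index = (node_index + 1) % len(nodes)
            events.append((number, "failover", nodes[node_index]))
        else:
            events.append((number, "failed", nodes[node_index]))
            break
    return events


events = simulate_failures(12, restart_threshold=3, failover_threshold=2,
                           nodes=["node1", "node2"])
```

With the failover threshold of two set in step l and an assumed restart threshold of three, the model predicts a failover on the fourth and eighth failures and a failed resource on the twelfth. Where the lab behaves differently, the difference is usually the interesting part.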






Exercise 11: Configuring Failback Properties

Goal

In this exercise, the student will configure the group containing the non-quorum disk to fail back to one of the nodes. The student will trace the communication between nodes using Network Monitor.

Tasks

Detailed Steps

1. To configure failback properties.

a. Make sure that the group containing the non-quorum disk is online on the node that will be its preferred owner and that both nodes are running.

b. In Cluster Administrator, view the properties of the group containing the non-quorum disk.

c. In the General tab, click the Modify button and add this node to the list of preferred owners.

d. In the Failback tab, allow failback immediately.

e. Click OK to close the group properties dialog.

f. Start Network Monitor and begin to capture a trace of traffic between the nodes.

g. Initiate failover by shutting down the preferred owner node. Either shut down gracefully by stopping the Cluster service or shutting down the server, or simulate a server failure by turning off power to the node or by using Emergency Shutdown (press Ctrl+Alt+Del, hold down the Ctrl key, and click Shutdown). Alternatively, remove the network connections between the nodes if the other node currently owns the quorum resource.

h. Does the group come online on the remaining nodes?

i. Restart the other node. Does the group move back to this node?

j. Stop and view the capture in Network Monitor.
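
The interplay of preferred owners and the failback setting tested in steps g to i can be written down as a small decision function. This is a simplified sketch for reasoning about the expected outcome, not a description of the Cluster service algorithm; the function name and the node names are assumptions.

```python
def group_owner(current_owner, preferred_owners, up_nodes, allow_failback):
    """Decide which node should own the group.

    - If the current owner is down, move to the first preferred owner
      that is up, or else to any other node that is up (failover).
    - If the current owner is up and failback is allowed, move to the
      first preferred owner that is up (failback).
    - Otherwise the group stays where it is.
    """
    if current_owner not in up_nodes:
        for node in list(preferred_owners) + sorted(up_nodes):
            if node in up_nodes:
                return node
        return None  # no node is available
    if allow_failback:
        for node in preferred_owners:
            if node in up_nodes:
                return node
    return current_owner


# Step g: the preferred owner node1 goes down.
after_failure = group_owner("node1", ["node1"], {"node2"}, allow_failback=True)
# Step i: node1 is restarted while node2 owns the group.
after_restart = group_owner(after_failure, ["node1"], {"node1", "node2"}, allow_failback=True)
```

Repeating the last call with allow_failback=False leaves the group on node2, which is the behaviour to expect when failback is prevented.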



Exercise 12: Registry Checkpointing

Goal

In this exercise, the student will create a new generic application resource. The student will add a registry key for the resource and configure the key to be replicated between nodes in the cluster. The student will examine the file structures used for registry checkpointing on the quorum resource and load a checkpoint file into Regedt32.

Tasks

Detailed Steps

1. To investigate registry checkpointing

a. Following the steps outlined previously, create a new generic application resource, using e.g. Calc.exe as the executable file placed on a shared SCSI disk.

b. Bring the resource online.

c. Open Regedt32 on the node on which the new resource is running. In HKEY_LOCAL_MACHINE\Software, add a key MyCorp and then a subkey MyApp in MyCorp. Add a value. Choose a name and data type, e.g. Watched, REG_SZ. Set the value to “true.”

d. In Cluster Administrator, in the properties dialog for the resource, click the Registry Replication tab then click the Add… button.

e. Type the root path below HKEY_LOCAL_MACHINE for the registry key that will be replicated between nodes. Following the example in step c, enter software\MyCorp\MyApp.

f. Click OK to close the property dialog then bring the resource online.

g. Check HKEY_LOCAL_MACHINE\Software on both nodes. Do they both have the MyCorp subkey?

h. In Cluster Administrator, move the group containing the generic application resource to the other node.

i. Check HKEY_LOCAL_MACHINE\Software on both nodes. Do they both have the MyCorp subkey?

j. Using Regedt32 on the node on which the generic application resource is running, make a change to the Watched value. Does this change get replicated immediately to the other node?

k. Move the group to the other node. Has anything happened to either registry? What can be deduced from this about when the registry keys are replicated?

l. On the node which owns the quorum resource, look in the \MSCS directory. There should be a subdirectory that has recently been created. Where does the directory name come from?

m. Which files are present in the subdirectory?

n. In Regedt32, select HKEY_LOCAL_MACHINE then choose Load Hive from the Registry menu. Open the .CPT file in the subdirectory under \MSCS. Give it the name “checkpoint.”

o. What is stored in the .CPT file in the subdirectory of \MSCS on the quorum resource?



Exercise 13: Viewing Cluster Heartbeats

Goal
In this exercise, the student will use Network Monitor to view the cluster heartbeat traffic between the nodes of a cluster.

Tasks

Detailed Steps

1. To view cluster heartbeats

a. Launch Network Monitor.

b. Start a capture over the private network.

c. Wait for thirty seconds and then stop and view the capture.

d. How many heartbeats are in the capture?

e. How many heartbeats did each node send?

f. What is the time interval between successive heartbeats sent by each node?

g. Which transport were the heartbeats sent on? To which destination port number?
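
Questions d to f come down to counting packets per sender and subtracting successive timestamps. If the capture is written down as (time, source address) pairs, the arithmetic can be done as sketched below. The sample timestamps and addresses are made up and are not what the capture should show; the function name is also an assumption.

```python
from collections import defaultdict


def heartbeat_intervals(packets):
    """Group (timestamp_seconds, source_address) pairs by source and
    return the gaps between successive packets from each source."""
    times = defaultdict(list)
    for timestamp, source in sorted(packets):
        times[source].append(timestamp)
    return {
        source: [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]
        for source, stamps in times.items()
    }


# Made-up sample: three packets from one node, two from another.
sample = [
    (0.0, "10.0.1.10"), (1.0, "10.0.1.11"),
    (2.5, "10.0.1.10"), (3.5, "10.0.1.11"),
    (5.0, "10.0.1.10"),
]
intervals = heartbeat_intervals(sample)
```

The number of heartbeats per node is one more than the number of intervals listed for that node.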




Exercise 14: Viewing the Quorum Log

Goal

In this exercise, the student will view the cluster configuration changes stored in the quorum log while one node is offline. The student will then shut down the remaining node and reboot the other to see if the changes it missed are applied from the quorum log. The student must shut down the remaining node so that the changes are read from the log and not via RPC to an existing node.

Tasks

Detailed Steps

1. To activate quorum logging

a. On the node that currently owns the quorum, open Explorer and view the contents of the \MSCS folder on the quorum drive. Note the name and modified timestamp of the checkpoint and log files.

b. Shut down the other nodes in the cluster.

c. Did anything happen in the \MSCS folder on the quorum drive?

d. On the remaining node, make a change to the cluster configuration, e.g. change a resource name or description, add a resource or group, etc.

e. Did anything happen in the \MSCS folder on the quorum drive?

2. To view the contents of the quorum log file

f. Open the file quolog.log in Notepad.

g. Can any data be read here? If so, what does it represent?

h. Are the changes just made present in the log file? If not, wait a few minutes and check again.

i. Make a note of the final meaningful entries seen in the file.

3. To apply the changes from the log to a node forming a cluster

j. Shut down the remaining node.

k. Restart the other node.

l. When the Cluster service has started, open Cluster Administrator.

m. Are the changes made while this node was offline visible?

4. To view the contents of the quorum log file

n. Open the file :\MSCS\quolog.log in Notepad.

o. Has anything been added to the file since the final entries were noted above? Can this observation be explained?

p. Do not reboot the other nodes at this time.
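
The repeated question "Did anything happen in the \MSCS folder?" is easier to answer with a before and after listing than from memory. A minimal sketch using only the Python standard library; the function names are invented, and the folder passed in is whatever path the \MSCS folder has on the quorum drive (F: at the start of this lab).

```python
from pathlib import Path


def snapshot(folder):
    """Map each file name in folder to its (size, modified time)."""
    return {
        entry.name: (entry.stat().st_size, entry.stat().st_mtime)
        for entry in Path(folder).iterdir()
        if entry.is_file()
    }


def changes(before, after):
    """Compare two snapshots and return (added, removed, modified) names."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    modified = sorted(
        name for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return added, removed, modified
```

Take one snapshot in step a, another after steps b and d, and pass both to changes to see which checkpoint and log files were created or updated.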





Exercise 15: Viewing the Cluster Database Checkpoint File in Regedt32

Goal

In this exercise, the student will use the Load Hive command in Regedt32 to examine the contents of the checkpoint file.

Tasks

Detailed Steps

1. To view the current sequence number in the cluster configuration database

a. In Explorer, view the contents of :\MSCS.

b. What is the name and modified time of the latest checkpoint file? Are these the same as the ones recorded in the previous exercise?

c. Open Regedt32 and select HKEY_LOCAL_MACHINE\Cluster. What is the RegistrySequence value?

d. Compare this with the number in the latest checkpoint file name, chk____.tmp. What can be deduced about quorum logging since the checkpoint file was taken last?

2. To open the checkpoint file in Regedt32

e. Select the root of HKEY_LOCAL_MACHINE.

f. From the Registry menu, choose Load Hive.

g. Open the latest checkpoint file in :\MSCS, choosing a suitable name for the key when asked, e.g. Checkpointed.

h. Compare the contents of the Checkpointed key with the current Cluster key. Are there many differences?

i. What is the value of RegistrySequence in the checkpointed hive?

j. With the checkpointed hive selected, choose Unload Hive from the Registry menu. Click Yes to confirm the operation.
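
Steps a, b and g all need the latest checkpoint file. When the folder holds several candidates, the newest one can be picked by modified time, as sketched here with the Python standard library. The chk*.tmp pattern follows the file name mentioned in step d; the function name is an assumption.

```python
from pathlib import Path


def latest_checkpoint(folder, pattern="chk*.tmp"):
    """Return the most recently modified file in folder that matches
    pattern, or None when nothing matches."""
    files = list(Path(folder).glob(pattern))
    return max(files, key=lambda path: path.stat().st_mtime, default=None)
```

The returned path has .name and .stat().st_mtime attributes, which give the two facts asked for in step b.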


Exercise 16: Backing Up and Restoring the Cluster Configuration with Backup and ClusRest

Goal

In this exercise, the student will back up the cluster configuration using Backup and ClusRest.

Tasks

Detailed Steps

1. To back up the cluster configuration

a. Start Backup.

b. Check the System State box so that System State will be backed up.

c. Run the backup operation.


2. To delete the quorum log

d. Stop the Cluster service on a node.

e. On the remaining node, stop the Cluster service and then restart the service with the -fixquorum parameter.

f. Bring the quorum resource online in Cluster Administrator.

g. Delete the files located in the MSCS folder using Explorer.

3. To restore the cluster configuration

h. Using Backup, run a restore operation.

i. When prompted, reboot your computer.

j. Did the Cluster service start? Why?

k. At the command prompt, run ClusRest.exe (ClusRest.exe is located in the Resource Kit directory in the Program Files folder). When prompted, click Yes to continue.

l. Attempt to start the Cluster service.

m. Stop the Cluster service.

n. Delete the cluster configuration files from the quorum device.

o. Using Backup, run a restore operation.

p. At a command prompt, run ClusRest.exe.

q. Verify that the configuration files have been restored to the quorum device.


Exercise 17: Recovering from a Corrupted Quorum Log

Goal
In this exercise, the student will use the -fixquorum option to start the Cluster service without requiring access to the quorum yet still allowing the shared disks to be accessed. This permits the student to see the quorum disk and troubleshoot problems accessing the quorum log file that is required to start the Cluster service.

Tasks
Detailed Steps

1. To recover from a corrupted quorum log

a. Stop the Cluster service on three nodes.

b. On the remaining node, stop the service. Then restart it with the -fixquorum parameter.

c. Bring the quorum resource online in Cluster Administrator.

d. Browse to the quorum drive, and locate the quolog.log file.

e. Open the file in Notepad. Make some random modifications. Select File, Save.

f. Stop the Cluster service.

g. Attempt to restart the Cluster service. What error messages did you receive?

h. In the Services applet, start the Cluster service with the -resetquorumlog parameter.
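
The lab sets the startup parameter in the Services applet. Assuming the service's short name is ClusSvc (the source name seen in the Event Log) and that net start passes a slash-prefixed switch through to the service, the same start could be scripted as sketched below. Both points are assumptions to verify on a lab machine, and the function names are invented; the first function only builds the command so that it can be checked before anything is run.

```python
import subprocess

# Startup switches used in this lab.
ALLOWED_SWITCHES = {"fixquorum", "resetquorumlog"}


def clussvc_start_command(switch=None):
    """Build the net start command line for the Cluster service."""
    command = ["net", "start", "clussvc"]  # assumed short service name
    if switch is not None:
        if switch not in ALLOWED_SWITCHES:
            raise ValueError(f"unexpected switch: {switch}")
        command.append(f"/{switch}")
    return command


def start_cluster_service(switch=None):
    """Run the command (Windows only, administrative rights needed)."""
    return subprocess.run(clussvc_start_command(switch), check=True)
```

For step h the command would be built with clussvc_start_command("resetquorumlog"); calling it without an argument gives a normal start.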



Exercise 18: Changing the Quorum Resource

Goal

In this exercise, the student will simulate a failure in the quorum resource, which will prevent the Cluster service from starting. The student will then use the -fixquorum parameter to start the Cluster service without access to a quorum resource. To fix the problem, the student will use Cluster Administrator to move the quorum resource to the other disk on the shared SCSI bus.

Tasks

Detailed Steps

1. To simulate a quorum resource failure

a. Make sure you know which physical disk contains the current quorum resource. (Hint: the quorum drive letter can be found in the registry or on the cluster property sheet in Cluster Administrator. Then copy a reasonably large directory, e.g. %systemroot%, to that drive and watch the disk activity lights to see which disk is accessed.)

b. Turn off the disk that contains the quorum resource.

c. Wait a short while. Do any messages appear indicating a problem with the cluster? (Hint: try using Cluster Administrator.)

d. Look in the Event Viewer for recent ClusSvc entries (there should also be plenty of entries with a source of Disk).

e. Is the Cluster service still running?

f. Check the entries in the Event Log. How often does Cluster service attempt to join or form a cluster before failing altogether (i.e. without retrying after a back-off interval)?

2. To assign another disk as the quorum resource

g. If the Cluster service is running on either node, stop it using the Services snap-in.

h. On the node that is still running, use the Services snap-in to specify "-fixquorum" as the startup parameter for the Cluster service and start the service.

i. Did the Cluster service start successfully, albeit after a delay?

j. Check for ClusSvc entries in the Event Log.

k. Open Cluster Administrator, connecting to the node name if the cluster name is not available.

l. Right-click the Cluster icon, and then click Properties.

m. Select the Quorum tab, select the other disk as the quorum resource, and then click OK.

n. In Services, stop the Cluster service, and then start the Cluster service without startup parameters. After the service starts, start the service on the other node.

o. Bring the other node online.

p. Make a note of the new quorum resource drive letter:

q. Shut down both nodes. Start the shared disk shut down earlier. Restart both nodes.


That is all folks !!