Remove a Node

How to remove a cluster node

In this example we will remove a host from the cluster. We can always use the “scconf” command to see what still needs to be done before we can finally remove the node. This example uses a three-node cluster; the steps are similar in a two-node cluster.

# scconf -r -h <nodename>
scconf:  Failed to remove node (nodename) - node is still cabled or otherwise in use.
scconf:    The node is still cabled.
scconf:    The node is still in use by resource group "nfs-rg".
scconf:    The node is still in use by resource group "pt1-db-rg".
scconf:    The node is still in use by resource group "pt1-ci-rg".
scconf:    The node is still in use by resource group "pt1-ai30-rg".
scconf:    The node is still in use by resource group "pi1-db-rg".
scconf:    The node is still in use by device group "ora1-ds".
scconf:    The node is still in use by device group "dsk/d31".
scconf:    The node is still in use by device group "dsk/d30".
scconf:    The node is still in use by device group "dsk/d29".
scconf:    The node is still in use by device group "dsk/d28".
scconf:    The node is still in use by device group "dsk/d27".
scconf:    The node is still in use by device group "dsk/d26".
scconf:    The node is still in use by device group "dsk/d25".
scconf:    The node is still in use by device group "dsk/d24".
scconf:    The node is still in use by device group "dsk/d23".
scconf:    The node is still in use by device group "dsk/d22".
scconf:    The node is still in use by device group "dsk/d21".
scconf:    The node is still in use by device group "dsk/d20".
scconf:    The node is still in use by device group "dsk/d19".
scconf:    The node is still in use by device group "dsk/d18".
scconf:    The node is still in use by device group "dsk/d17".
scconf:    The node is still in use by device group "dsk/d16".
scconf:    The node is still in use by device group "dsk/d15".
scconf:    The node is still in use by device group "dsk/d14".
scconf:    The node is still in use by device group "ora-ds".
scconf:    The node is still in use by device group "sap-ds".
scconf:    The node is still in use by device group "dsk/d8".
scconf:    The node is still in use by device group "dsk/d7".
scconf:    The node is still in use by device group "dsk/d6".
scconf:    The node is still in use by quorum device "d28".

1) To remove all services running on the host (the node to be removed), let’s evacuate the services from this host:

# scswitch -S -h <nodename>

2) To see if the switch worked, check the cluster’s web GUI or issue the command:

# scstat -g

3) Remove the node from the configured cluster services. To list the resource groups that still include it:

# scrgadm -pv | grep -i nodelist.*<nodename>
  (nfs-rg) Res Group Nodelist:                     node1 node2 node3
  (pt1-db-rg) Res Group Nodelist:                  node1 node2 node3
  (pt1-ci-rg) Res Group Nodelist:                  node1 node2 node3
  (pt1-ai30-rg) Res Group Nodelist:                node1 node2 node3
  (pi1-db-rg) Res Group Nodelist:                  node1 node2 node3
#

Redefine the possible primaries for each group:

# scrgadm -c -g nfs-rg -h node1,node2
# scrgadm -c -g pt1-db-rg -h node1,node2
# scrgadm -c -g pt1-ci-rg -h node1,node2
# scrgadm -c -g pt1-ai30-rg -h node1,node2
# scrgadm -c -g pi1-db-rg -h node1,node2
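
To confirm that node3 is no longer a possible primary, you can rerun the nodelist query from above:

# scrgadm -pv | grep -i nodelist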

4) If metasets (SVM disksets) are used, we need to remove the host from the disksets. This step is not necessary with ZFS pools.

# metaset -s sap-ds -d -h node3
# metaset -s ora-ds -d -h node3

Let’s remove the now unnecessary mediator hosts (mediators are only needed in a dual-storage configuration):

# metaset -s sap-ds -d -m node1 node2
# metaset -s ora-ds -d -m node1 node2 
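
To double-check, print the disksets again; node3 and the mediator hosts should be gone from the listings:

# metaset -s sap-ds
# metaset -s ora-ds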

5) Remove the node from the remaining IPMP configurations of each logical host resource (LH-res):

# scrgadm -pvv | grep ":NetIfList) Res property value"
      (nfs-rg:lh6:NetIfList) Res property value: ipmpA@1 ipmpB@2
      (pt1-db-rg:lh1:NetIfList) Res property value: ipmpC@1 ipmpD@2 ipmpE@3
      (pt1-ci-rg:lh2:NetIfList) Res property value: ipmpA@1 ipmpB@2 ipmpF@3
      (pt1-ci-rg:lh3:NetIfList) Res property value: ipmpC@1 ipmpD@2 ipmpE@3
      (pt1-ai30-rg:lh4:NetIfList) Res property value: ipmpA@1 ipmpB@2 ipmpF@3


# scrgadm -c -j lh3 -x netiflist=ipmpC@1,ipmpD@2
# scrgadm -c -j lh2 -x netiflist=ipmpA@1,ipmpB@2
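
Rerunning the query from above should confirm that the node’s IPMP groups (the @3 entries) are gone from the netiflist of the adjusted resources:

# scrgadm -pvv | grep ":NetIfList) Res property value"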

6) Change the localonly flag on the Local_Disk device groups that show up in the output of both of the following commands:

# scconf -pvv | grep -i local_disk
  (dsk/d21) Device group type:                     Local_Disk
  (dsk/d20) Device group type:                     Local_Disk
# scconf -r -h node3
scconf:  Failed to remove node (node3) - node is still cabled or otherwise in use.
scconf:    The node is still cabled.
scconf:    The node is still in use by device group "ora1-ds".
scconf:    The node is still in use by device group "dsk/d31".
[...]
scconf:    The node is still in use by device group "dsk/d6".
scconf:    The node is still in use by quorum device "d28".

# scconf -c -D name=dsk/d20,localonly=false
# scconf -c -D name=dsk/d21,localonly=false
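
To double-check the change, you can re-inspect the properties of the two device groups, for example:

# scconf -pvv | grep "dsk/d20"
# scconf -pvv | grep "dsk/d21"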

7) Remove the node from all the device groups displayed in this output:

# scconf -r -h node3
scconf:  Failed to remove node (node3) - node is still cabled or otherwise in use.
scconf:    The node is still cabled.
scconf:    The node is still in use by device group "ora-ds".
scconf:    The node is still in use by device group "dsk/d31".
[...]
scconf:    The node is still in use by device group "dsk/d6".
scconf:    The node is still in use by quorum device "d28".

# for x in dsk/d30 dsk/d29 dsk/d28 dsk/d27 dsk/d26 dsk/d25 dsk/d24 dsk/d23 dsk/d22 dsk/d21 dsk/d20 dsk/d19 dsk/d18 dsk/d17 dsk/d16 dsk/d15 dsk/d14 dsk/d8 dsk/d7 dsk/d6; \
> do scconf -r -D name=$x,nodelist=node3;
> done
# metaset -s ora1-ds

Set name = ora1-ds, Set number = 1

Host                Owner
  node1
  node2
Drive Dbase

d29   Yes

d30   Yes
# metaset -s ora1-ds -a -h node3
Proxy command to: node1
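
Note: the ora1-ds set itself no longer lists node3 as a host, while the cluster device group still does. Adding the host back (proxied to the current owner, node1) is assumed here to bring set and device group back in sync, so that node3 can afterwards be removed from it the regular way:

# metaset -s ora1-ds -d -h node3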

8) Shut down the node which is to be removed:

# init 0 
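
Alternatively, a regular Solaris shutdown brings the node down to the same run level:

# shutdown -y -g0 -i0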

9) Put the node into maintenance state:

# scconf -c -q node=node3,maintstate
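
The node’s quorum vote count should now be zero; you can verify it with:

# scstat -q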

10) Display the current transport configuration:

# scstat -W

-- Cluster Transport Paths --

                    Endpoint               Endpoint               Status
                    --------               --------               ------
  Transport path:   node1:ce5           node2:ce5           Path online
  Transport path:   node1:ce0           node2:ce0           Path online
  Transport path:   node1:ce5           node3:bge1          faulted
  Transport path:   node1:ce0           node3:bge0          faulted
  Transport path:   node2:ce0           node3:bge0          faulted
  Transport path:   node2:ce5           node3:bge1          faulted

11) Remove the remaining interconnects for the node:

# scsetup
4) Cluster interconnect
  -> 4) Remove a transport cable
    -> name + adapter (for each interconnect)
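
If you prefer the command line over the scsetup menus, the same can be done with scconf; this is only a sketch, the adapter names are taken from the scstat -W output above, and on some releases the cables may have to be disabled first:

# scconf -r -m endpoint=node3:bge0
# scconf -r -m endpoint=node3:bge1
# scconf -r -A name=bge0,node=node3
# scconf -r -A name=bge1,node=node3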

12) Now we have to remove the quorum device (only in a two-node cluster):

# scconf -c -q installmode
# scstat -q 
# scconf -r -q globaldev=d# 

13) Now we can remove the node from the cluster:

# scconf -r -h node3

14) In a three-node cluster you run into a “chicken and egg” problem: in order to remove the third node while it is still attached to a quorum device, you have to remove that quorum device, but at the same time the cluster requires you to have one.

To solve the problem, remove the quorum device:

# scconf -r -q globaldev=d28

Add a bogus (dummy) node:

# scconf -a -h dummy

Remove the third node:

# scconf -r -h node3

Scrub the SCSI reservations:

# /usr/cluster/lib/sc/scsi -c scrub -d /dev/rdsk/c#t#d#s2

Now you can add the quorum device back:

# scconf -a -q globaldev=d28

And remove the dummy node:

# scconf -r -h dummy
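
As a final sanity check, make sure the node and the dummy entry are gone and the quorum device is back:

# scstat -n
# scstat -q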