Hi all,
I'm running Veritas Storage Foundation Standard HA 5.0 MP3 on SUSE Linux Enterprise Server 11, on two Oracle X6270 servers.
There was a power outage that caused an immediate hard shutdown of both servers. After power was restored, the server on which the Oracle service group had been active ("db1-hasc") could not boot at all (mainboard failure). The other server ("db2-hasc") booted, but reported during startup that the cluster could not start and that manual seeding might be needed. After some googling, I seeded the cluster manually from the working server db2-hasc with gabconfig -x.
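For the record, this is roughly what I did on db2-hasc (as I understand it, gabconfig -x seeds the cluster manually when the other node is unavailable, and gabconfig -a afterwards just confirms that the GAB ports came up):

db2-hasc# gabconfig -x    # manually seed the cluster despite the missing node
db2-hasc# gabconfig -a    # verify GAB membership (ports a and h should be listed)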
In the meantime, the failed server db1-hasc has been repaired, and the cluster is now working: all service groups are online, but on db2-hasc, the node that came up successfully after the power outage. No attempt has been made yet to switch any of the service groups over to db1-hasc (except the network service groups, which are online).
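In case it helps, these are the commands I used to check the cluster state (outputs omitted for brevity; apart from everything running on db2-hasc, nothing looked unusual to me at the VCS level):

db2-hasc# hastatus -sum    # summary of systems and service group states
db2-hasc# lltstat -nvv     # LLT heartbeat links between the two nodes
db2-hasc# gabconfig -a     # GAB port membership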
However, I have noticed some problems with several volumes in disk group "oracledg":
db2-hasc# vxprint -g oracledg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
dg oracledg oracledg - - - - - -
dm oracled01 - - - - NODEVICE - -
dm oracled02 sdb - 335462144 - - - -
v archvol fsgen ENABLED 62914560 - ACTIVE - -
pl archvol-01 archvol DISABLED 62914560 - NODEVICE - -
sd oracled01-02 archvol-01 DISABLED 62914560 0 NODEVICE - -
pl archvol-02 archvol ENABLED 62914560 - ACTIVE - -
sd oracled02-02 archvol-02 ENABLED 62914560 0 - - -
v backupvol fsgen ENABLED 167772160 - ACTIVE - -
pl backupvol-01 backupvol DISABLED 167772160 - NODEVICE - -
sd oracled01-03 backupvol-01 DISABLED 167772160 0 NODEVICE - -
pl backupvol-02 backupvol ENABLED 167772160 - ACTIVE - -
sd oracled02-03 backupvol-02 ENABLED 167772160 0 - - -
v dbovol fsgen ENABLED 62914560 - ACTIVE - -
pl dbovol-01 dbovol DISABLED 62914560 - NODEVICE - -
sd oracled01-01 dbovol-01 DISABLED 62914560 0 NODEVICE - -
pl dbovol-02 dbovol ENABLED 62914560 - ACTIVE - -
sd oracled02-01 dbovol-02 ENABLED 62914560 0 - - -
db2-hasc# vxprint -htg oracledg
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg oracledg default default 0 1265259474.12.db1-HASc
dm oracled01 - - - - NODEVICE
dm oracled02 sdb auto 65536 335462144 -
v archvol - ENABLED ACTIVE 62914560 SELECT - fsgen
pl archvol-01 archvol DISABLED NODEVICE 62914560 CONCAT - WO
sd oracled01-02 archvol-01 oracled01 62914560 62914560 0 - NDEV
pl archvol-02 archvol ENABLED ACTIVE 62914560 CONCAT - RW
sd oracled02-02 archvol-02 oracled02 62914560 62914560 0 sdb ENA
v backupvol - ENABLED ACTIVE 167772160 SELECT - fsgen
pl backupvol-01 backupvol DISABLED NODEVICE 167772160 CONCAT - WO
sd oracled01-03 backupvol-01 oracled01 125829120 167772160 0 - NDEV
pl backupvol-02 backupvol ENABLED ACTIVE 167772160 CONCAT - RW
sd oracled02-03 backupvol-02 oracled02 125829120 167772160 0 sdb ENA
v dbovol - ENABLED ACTIVE 62914560 SELECT - fsgen
pl dbovol-01 dbovol DISABLED NODEVICE 62914560 CONCAT - WO
sd oracled01-01 dbovol-01 oracled01 0 62914560 0 - NDEV
pl dbovol-02 dbovol ENABLED ACTIVE 62914560 CONCAT - RW
sd oracled02-01 dbovol-02 oracled02 0 62914560 0 sdb ENA
As the output shows, every plex and subdisk on disk oracled01 is DISABLED with state NODEVICE, while the mirrors on oracled02 are healthy. The physical disks themselves seem fine (no errors reported in ILOM diagnostics).
Does anyone have ideas on how to recover the disabled plexes/subdisks, or which other commands to run to ascertain the current state of the cluster, so I can get a clear(er) picture of what is wrong and which steps to take to remedy it? I would appreciate any tips/suggestions.
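From my reading of the VxVM admin guide, I was considering something along the following lines once the returned disk is visible to db2-hasc again; please correct me if any of these steps is wrong or unsafe in this situation:

db2-hasc# vxdctl enable               # make VxVM rescan and pick up the returned device
db2-hasc# vxdisk -o alldgs list       # confirm the device behind oracled01 is back
db2-hasc# vxreattach -c               # check whether the disk(s) can simply be reattached
db2-hasc# vxreattach -rb              # reattach and resynchronize in the background
db2-hasc# vxrecover -g oracledg -sb   # recover any remaining disabled plexes
db2-hasc# vxtask list                 # monitor resync progress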
Thanks,
/Hrvoje