Sunday, 3 April 2011

RECOVERY

The following scenarios are implemented:

1-system-tablespace-loss                  1b-recover-system-tablespace
2-user-datafile-loss                           2b-recover-users-tablespace
3-online-redo-loss                             3b-recover-from-redo-loss
4-controlfile-loss                               4b-recover-from-controlfile-loss
5-database-loss                                5b-recover-from-total-loss

1-system-tablespace-loss

This script generates a crash that removes the system tablespace datafile, in order to be able to remove the datafile a crash of the ASM instance is produced.

Once the system tablespace datafile is removed the database is brought up
and the error messages

ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
and
ORA-01110: data file 1: '+DATADG/redpanda/datafile/system.268.654614821'

are returned by Oracle.

#!/usr/bin/tcsh
source ./set-environment
echo Generating database crash ...
echo
set v_rmf=`echo 'select file_name from dba_data_files where file_id=1;' | sqlplus -s / as
sysdba | grep system`
setenv ORACLE_SID +ASM
sqlplus -s $dbauser/$dbapwd@+ASM as sysdba <<eof
shutdown abort;
eof
sqlplus -s $dbauser/$dbapwd@+ASM as sysdba <<eof
startup
eof
echo
echo Generating system tablespace loss ...
echo
asmcmd lsdg
echo
asmcmd ls +datadg/redpanda/datafile
echo
asmcmd rm -rf $v_rmf
echo
asmcmd lsdg
echo
asmcmd ls +datadg/redpanda/datafile
echo
echo Trying to restart the database after the crash ...
echo
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
startup
eof
exit
Script Output:
avargas-pc:~/scripts/RECOVERY> ./1-system-tablespace-loss
Generating database crash ...
ASM instance shutdown
……
……
….

Trying to restart the database after the crash ...
ORACLE instance started.
Total System Global Area 318046208 bytes
Fixed Size 1299652 bytes
Variable Size 285215548 bytes
Database Buffers 25165824 bytes
Redo Buffers 6365184 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '+DATADG/redpanda/datafile/system.268.654614821'


1b-recover-system-tablespace

This script starts the catalog database, that was down as result of the crash on the previous scenario, then mounts the production database, restores and recover datafile 1 and opens the database.

#!/usr/bin/tcsh
source ./set-environment
echo Restarting RMAN Catalog Database
echo
setenv ORACLE_SID rmancat
sqlplus -s $dbauser/$dbapwd@$rmandb as sysdba<<eof
startup
exit
eof
echo
echo Executing Command : RESTORE and RECOVER SYSTEM DATAFILE
echo
rman TARGET $dbauser/$dbapwd@$datadb CATALOG $rmanuser/$rmanpwd@$rmandb <<eof
startup mount;
restore datafile 1;
recover datafile 1;
alter database open;
eof
sqlplus $dbauser/$dbapwd@$datadb as sysdba <<eof
select file_name from dba_data_files;
exit
eof
exit


2-user-datafile-loss
This script creates a apps.customers table on tablespace users, then the datafile is removed and the reference table is accessed the following errors are then returned:

select username from apps.customers
*
ERROR at line 1:
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: '+DATADG/redpanda/datafile/users.270.654614823'

#!/usr/bin/tcsh
source ./set-environment
setenv ORACLE_SID redpanda
clear
echo
echo Preparing User Application ...
echo
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
drop user apps cascade;
create user apps identified by apps default tablespace users temporary tablespace temp;
grant dba to apps;
connect apps/apps@$datadb
create table customers as select * from dba_users;
update customers set username='CUSTOMER_'||USERNAME;
commit;
select username from customers;
exit
eof
echo
echo Generating user datafile remove ...
echo
set v_rmf=`echo "select file_name from dba_data_files where tablespace_name='USERS';" |
sqlplus -s / as sysdba | grep users`
echo
echo Setting tablespace users offline ...
echo
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
alter tablespace users offline;
eof
echo
echo Removing tablespace users datafile ...
echo
setenv ORACLE_SID +ASM
asmcmd lsdg
echo
asmcmd ls +datadg/redpanda/datafile
echo
asmcmd rm -rf $v_rmf
echo
asmcmd ls +datadg/redpanda/datafile
echo
asmcmd lsdg
echo
echo Checking application ...
echo

sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
select username from apps.customers;
exit
eof

Script Output:
avargas-pc:~/scripts/RECOVERY> ./2-user-datafile-loss
Preparing User Application ...
……………
…………………

Checking application ...
select username from apps.customers
*
ERROR at line 1:
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: '+DATADG/redpanda/datafile/users.270.654614823'

2b-recover-users-tablespace
This script restore the missing datafile of tablespace users, in this case the database did not crash when the datafile was removed, the restore and recover can be done online.

#!/usr/bin/tcsh
source ./set-environment
echo Restoring and Recovering Tablespace Users ...
echo
rman TARGET $dbauser/$dbapwd@$datadb CATALOG $rmanuser/$rmanpwd@$rmandb <<eof
restore datafile 4;
recover datafile 4;
sql 'alter tablespace users online' ;
eof
echo
echo Checking application ...
echo
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
select username from apps.customers;
exit
eof
exit

3-online-redo-loss
This script simulates a database crash that lead to a missing online redo log, when the database is restarted the following errors are displayed:

ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1:
'+DATADG/redpanda/onlinelog/group_1.271.654616725'
ORA-00312: online log 1 thread 1:
'+FRADG/redpanda/onlinelog/group_1.285.654616725'

#!/usr/bin/tcsh
source ./set-environment
echo Generating database crash ...
echo
set v_logf=v\$logfile
set v_logs=v\$log
set v_logh=v\$log_history
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
set pages 50000 lines 120 echo off head off veri off flush off ti off
spool rmonlnlog.sh
select 'asmcmd rm '||member||'' from $v_logf where group#=1;
spool off
set echo on head on veri on
set pages 50000 lines 120
spool redolog_miss_status-before_crash.log
archive log list;
select SEQUENCE# from $v_logh where FIRST_TIME=(select max(FIRST_TIME) from $v_logh);
select * from $v_logs;
spool off
SHUTDOWN ABORT;
eof
sqlplus -s $dbauser/$dbapwd@$rmandb as sysdba <<eof
SHUTDOWN ABORT;
eof
echo
echo Removing online redo log group ...
echo
setenv ORACLE_SID +ASM
sqlplus -s sys/oracle as sysdba <<eof
SHUTDOWN ABORT;
STARTUP;
eof
chmod 700 ./rmonlnlog.sh
./rmonlnlog.sh
rm ./rmonlnlog.sh
echo
echo Starting databases after the crash ...
echo
sqlplus -s $dbauser/$dbapwd@$rmandb as sysdba <<eof
STARTUP
eof
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
STARTUP
Eof
3b-recover-from-redo-loss

This script performs the recovery of the database on the online redolog miss scenario.
When a redolog is missing a full database restore must be done.
The script request the sequence # to recover using the until sequence clause. This sequence can be obtained from the database alert.log

#!/usr/bin/tcsh
source ./set-environment
set v_logf=v\$logfile
set v_log=v\$log
echo Executing Full Database Restore ...
echo
echo Please check the last archived sequence of the database.
echo please enter sequence number to restore to ...
set v_seq = $<
echo
rman TARGET $dbauser/$dbapwd@$datadb CATALOG $rmanuser/$rmanpwd@$rmandb <<eof
STARTUP MOUNT;
RESTORE DATABASE;
RECOVER DATABASE UNTIL SEQUENCE $v_seq THREAD 1;
ALTER DATABASE OPEN RESETLOGS;
eof
echo Checking Database after online redolog loss and database restore and recover
echo
sqlplus $dbauser/$dbapwd@$datadb <<eof
set pages 50000 lines 200
col member for a55
select * from $v_log;
select member from $v_logf;
exit
eof


4-controlfile-loss

This script simulates a database crash that lead to a missing controlfile, when the database is restarted the following error is displayed:

ORA-00205: error in identifying control file, check alert log for more info

#!/usr/bin/tcsh
source ./set-environment
echo Generating database crash ...
echo
set v_ctf=v\$controlfile
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
set pages 50000 lines 120 echo off head off veri off flush off ti off
spool rmctlfl.sh
select 'asmcmd rm '||name||'' from $v_ctf ;
spool off
SHUTDOWN ABORT
eof
sqlplus -s $dbauser/$dbapwd@$rmandb as sysdba <<eof
SHUTDOWN ABORT
eof
echo Checking databases up ...
ps -efa | grep smon | grep -v grep
echo
echo Removing controlfiles ...
echo
setenv ORACLE_SID +ASM
sqlplus -s / as sysdba <<eof
SHUTDOWN ABORT
STARTUP
eof
chmod 700 ./rmctlfl.sh
./rmctlfl.sh
rm rmctlfl.sh
echo
echo Starting databases after the crash ...
echo
sqlplus -s $dbauser/$dbapwd@$rmandb as sysdba <<eof
STARTUP
eof
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
STARTUP
Eof

4b-recover-from-controlfile-loss

This script restore the missing controlfile from an autobackup, then mounts and recover the database.
After a controlfile restore the database must be opened using the resetlogs option.

#!/usr/bin/tcsh
source ./set-environment
echo Executing Controlfile Restore
echo
echo Please check the DBID from any backup file, i.e.:
echo
echo in this controlfile backup dbid=3603176431
echo
echo cf_D-REDPANDA_id-3603176431_0bjg00fq
echo -----------------^^^^^^^^^^---------
echo
echo please enter DBID number of the database to restore the controlfile
set v_dbid = $<
echo
rman TARGET $dbauser/$dbapwd@$datadb CATALOG $rmanuser/$rmanpwd@$rmandb <<eof
SET DBID $v_dbid;
STARTUP NOMOUNT;
RESTORE CONTROLFILE FROM AUTOBACKUP;
ALTER DATABASE MOUNT;
RECOVER DATABASE;
ALTER DATABASE OPEN RESETLOGS;
eof
set v_ctlf=v\$controlfile
echo Checking Database after controlfile loss and restore
echo
sqlplus $dbauser/$dbapwd@$datadb as sysdba<<eof
set pages 50000 lines 120
select name from $v_ctlf
;
exit
eof

5-database-loss

This script simulates a crash that lead to the complete loss of the database, the whole database directory is wiped off the ASM diskgroup DATADG, all online logs, controlfiles, spfile and datafiles are lost.

When trying to open the database the following errors are returned:

ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATADG/redpanda/spfileredpanda.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATADG/redpanda/spfileredpanda.ora
ORA-15056: additional error message
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +DATADG/redpanda/spfileredpanda.ora
ORA-17503: ksfdopn:2 Failed to open file +DATADG/redpanda/spfileredpanda.ora
ORA-15173: entry 'redpanda' does not exist in directory '/'
ORA-06512: at line 4

#!/usr/bin/tcsh
source ./set-environment
echo Generating database crash ...
echo
set v_par=v\$parameter
set v_dba=v\$database
set v_logs=v\$log
set v_logh=v\$log_history
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
set echo on head on veri on pages 50000 lines 120
spool redolog_miss_status-before_crash.log
archive log list;
select SEQUENCE# from $v_logh where FIRST_TIME=(select max(FIRST_TIME) from $v_logh);
select * from $v_logs;
spool off
set pages 50000 lines 120 echo off head off veri off flush off ti off
spool rmdbs.sh
select 'asmcmd rm -rf '||a.value||'/'||b.name
from $v_par a, $v_dba b
where a.name='db_create_file_dest';
spool off
SHUTDOWN ABORT
eof
sqlplus -s $dbauser/$dbapwd@$rmandb as sysdba <<eof
SHUTDOWN ABORT
eof
echo
echo Preparing to crash and burn database ...
echo
setenv ORACLE_SID +ASM
sqlplus -s / as sysdba <<eof
SHUTDOWN ABORT
STARTUP
eof
chmod 700 ./rmdbs.sh
./rmdbs.sh
rm rmdbs.sh
echo
echo Starting databases after the crash ...
echo
sqlplus -s $dbauser/$dbapwd@$rmandb as sysdba <<eof
STARTUP
Eof
sqlplus -s $dbauser/$dbapwd@$datadb as sysdba <<eof
STARTUP
eof
echo
echo Evaluating damage ...
echo
echo Listing directories on ASM Data diskgroup
echo
asmcmd ls +datadg
echo
echo Listing directories on ASM Flash Recovery Area diskgroup
echo
asmcmd ls +fradg
echo

5b-recover-from-total-loss
This script executes the restore after a total database loss.
In first term it creates the missing database directory on the ASM diskgroup DATADG, this is necessary to be able to restore the spfile and controlfile on the first stages of the restore procedure.

The instance is initially started by rman with dummy parameters, once the spfile is restored the instance is restarted nomount using the spfile.

Once the controlfile is successfully restored from autobackup the database can be restored, recovered and open with the restlogs option.

#!/usr/bin/tcsh
source ./set-environment
echo Executing Controlfile Restore
echo Please check the DBID from any backup file, i.e.:
echo
echo in this controlfile backup dbid=3603176431
echo
echo cf_D-REDPANDA_id-3603176431_0bjg00fq
echo -----------------^^^^^^^^^^---------
echo
echo please enter DBID number of the database to restore the controlfile
set v_dbid = $<
echo
echo Please check the last archived sequence of the database.
echo please enter sequence number to restore to ...
set v_seq = $<
echo
echo Rebuilding Database Directory on Data Diskgroup
echo
setenv ORACLE_SID +ASM
asmcmd mkdir +DATADG/REDPANDA
asmcmd ls +DATADG
echo
echo Executing Rman Restore and Recovery Steps
echo
rman TARGET $dbauser/$dbapwd@$datadb CATALOG $rmanuser/$rmanpwd@$rmandb <<eof
SET DBID $v_dbid;
STARTUP NOMOUNT;
RESTORE SPFILE FROM AUTOBACKUP ;
STARTUP FORCE NOMOUNT;
RESTORE CONTROLFILE FROM AUTOBACKUP;
ALTER DATABASE MOUNT;
run {
set until sequence $v_seq thread 1;
restore database;
recover database;
}
ALTER DATABASE OPEN RESETLOGS;
eof
set v_log=v\$log
set v_logf=v\$logfile
set v_dbs=v\$database
echo Checking Database after total database loss, restore and recover
echo
sqlplus $dbauser/$dbapwd@$datadb as sysdba <<eof
set pages 50000 lines 120
select * from $v_log;
select member from $v_logf;
select name from $v_dbs;
exit
eof


No comments:

Post a Comment