EXPERT ORACLE DBA: Redo log corruption

Recovering After the Loss of Online Redo Log Files: Scenarios

If a media failure has affected the online redo logs of a database, then the appropriate recovery procedure depends on the following:

The configuration of the online redo log: mirrored or non-mirrored
The type of media failure: temporary or permanent
The types of online redo log files affected by the media failure: current, active, unarchived, or inactive

Table 19-1 displays V$LOG status information that can be crucial in a recovery situation involving online redo logs.

Table 19-1 STATUS Column of V$LOG

Status	Description
UNUSED	The online redo log has never been written to.
CURRENT	The online redo log is active, that is, needed for instance recovery, and it is the log to which the database is currently writing. The redo log can be open or closed.
ACTIVE	The online redo log is active, that is, needed for instance recovery, but is not the log to which the database is currently writing. It may be in use for block recovery, and may or may not be archived.
CLEARING	The log is being re-created as an empty log after an ALTER DATABASE CLEAR LOGFILE statement. After the log is cleared, then the status changes to UNUSED.
CLEARING_CURRENT	The current log is being cleared of a closed thread. The log can stay in this status if there is some failure in the switch such as an I/O error writing the new log header.
INACTIVE	The log is no longer needed for instance recovery. It may be in use for media recovery, and may or may not be archived.

Recovering After Losing a Member of a Multiplexed Online Redo Log Group

If the online redo log of a database is multiplexed, and if at least one member of each online redo log group is not affected by the media failure, then the database continues functioning as normal, but error messages are written to the log writer trace file and the alert_SID.log of the database.

Solve the problem by taking one of the following actions:

If the hardware problem is temporary, then correct it. The log writer process accesses the previously unavailable online redo log files as if the problem never existed.
If the hardware problem is permanent, then drop the damaged member and add a new member by using the following procedure.

Note:

The newly added member provides no redundancy until the log group is reused.

Locate the filename of the damaged member in V$LOGFILE. The status is INVALID if the file is inaccessible:

SELECT GROUP#, STATUS, MEMBER
FROM V$LOGFILE
WHERE STATUS='INVALID';

GROUP#  STATUS      MEMBER
------- ----------- ---------------------
0002    INVALID     /oracle/oradata/trgt/redo02.log

Drop the damaged member. For example, to drop member redo02.log from group 2, issue:

ALTER DATABASE DROP LOGFILE MEMBER '/oracle/oradata/trgt/redo02.log';

Add a new member to the group. For example, to add redo02b.log to group 2, issue:

ALTER DATABASE ADD LOGFILE MEMBER '/oracle/oradata/trgt/redo02b.log'
TO GROUP 2;

If the file you want to add already exists, then it must be the same size as the other group members, and you must specify REUSE. For example:

ALTER DATABASE ADD LOGFILE MEMBER '/oracle/oradata/trgt/redo02b.log'
REUSE TO GROUP 2;
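The drop-and-add sequence above can be sketched as a small helper that only builds the statement text. This is a minimal illustration, not an Oracle utility: the function name `replace_member_statements` and its parameters are hypothetical, and nothing here connects to a database.

```python
def replace_member_statements(damaged_member, new_member, group, reuse=False):
    """Build the DROP and ADD statements for swapping a damaged log member.

    Hypothetical helper: it returns SQL text only and runs nothing.
    """
    # Double any single quotes so the paths stay valid SQL string literals.
    old_path = damaged_member.replace("'", "''")
    new_path = new_member.replace("'", "''")
    # REUSE is needed only when the new file already exists on disk.
    reuse_clause = " REUSE" if reuse else ""
    return [
        f"ALTER DATABASE DROP LOGFILE MEMBER '{old_path}';",
        f"ALTER DATABASE ADD LOGFILE MEMBER '{new_path}'{reuse_clause} "
        f"TO GROUP {int(group)};",
    ]


if __name__ == "__main__":
    for statement in replace_member_statements(
        "/oracle/oradata/trgt/redo02.log",
        "/oracle/oradata/trgt/redo02b.log",
        2,
    ):
        print(statement)
```

Reviewing the generated text before running it in SQL*Plus keeps the damaged path from V$LOGFILE and the path in the DROP statement in agreement.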

Recovering After the Loss of All Members of an Online Redo Log Group

If a media failure damages all members of an online redo log group, then different scenarios can occur depending on the type of online redo log group affected by the failure and the archiving mode of the database.

If the damaged log group is active, then it is needed for crash recovery; otherwise, it is not.

If the group is . . .	Then . . .	And you should . . .
Inactive	It is not needed for crash recovery	Clear the archived or unarchived group.
Active	It is needed for crash recovery	Attempt to issue a checkpoint and clear the log; if impossible, then you must restore a backup and perform incomplete recovery up to the most recent available redo log.
Current	It is the log that the database is currently writing to	Attempt to clear the log; if impossible, then you must restore a backup and perform incomplete recovery up to the most recent available redo log.
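The table above amounts to a lookup from group status to a recommended action. The sketch below paraphrases it in Python. The names `RECOVERY_GUIDE` and `recovery_guidance` are hypothetical, and treating CURRENT as needed for crash recovery follows the Table 19-1 description of that status.

```python
# Paraphrase of the table: status -> (needed for crash recovery, action).
RECOVERY_GUIDE = {
    "INACTIVE": (False, "Clear the archived or unarchived group."),
    "ACTIVE": (
        True,
        "Attempt a checkpoint and clear the log; if impossible, restore a "
        "backup and perform incomplete recovery.",
    ),
    "CURRENT": (
        True,
        "Attempt to clear the log; if impossible, restore a backup and "
        "perform incomplete recovery.",
    ),
}


def recovery_guidance(status):
    """Return (needed_for_crash_recovery, action) for a V$LOG status value."""
    key = status.strip().upper()
    if key not in RECOVERY_GUIDE:
        raise ValueError(f"no guidance in the table for status {status!r}")
    return RECOVERY_GUIDE[key]


if __name__ == "__main__":
    needed, action = recovery_guidance("active")
    print(needed, action)
```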

Your first task is to determine whether the damaged group is active or inactive.

Locate the filename of the lost redo log in V$LOGFILE and then look for the group number corresponding to it. For example, enter:

SELECT GROUP#, STATUS, MEMBER FROM V$LOGFILE;

GROUP#  STATUS      MEMBER
------- ----------- ---------------------
0001                /oracle/dbs/log1a.f
0001                /oracle/dbs/log1b.f
0002    INVALID     /oracle/dbs/log2a.f
0002    INVALID     /oracle/dbs/log2b.f
0003                /oracle/dbs/log3a.f
0003                /oracle/dbs/log3b.f

Determine which groups are active. For example, enter:

SELECT GROUP#, MEMBERS, STATUS, ARCHIVED
FROM V$LOG;

GROUP# MEMBERS STATUS    ARCHIVED
------ ------- --------- -----------
0001   2       INACTIVE  YES
0002   2       ACTIVE    NO
0003   2       CURRENT   NO

If the affected group is inactive, then follow the procedure in "Losing an Inactive Online Redo Log Group". If the affected group is active (as in the preceding example), then follow the procedure in "Losing an Active Online Redo Log Group".
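Combining the two query results can be illustrated in Python. The sketch below assumes the rows have already been fetched into tuples shaped like the example output above. The function name `fully_lost_groups` is hypothetical, and a blank STATUS in V$LOGFILE is represented as an empty string.

```python
from collections import defaultdict


def fully_lost_groups(logfile_rows, log_rows):
    """Map each group whose members are all INVALID to its V$LOG status.

    logfile_rows: (group, status, member) tuples, as in V$LOGFILE.
    log_rows: (group, members, status, archived) tuples, as in V$LOG.
    """
    invalid_flags = defaultdict(list)
    for group, status, _member in logfile_rows:
        invalid_flags[group].append(status == "INVALID")
    group_status = {group: status for group, _count, status, _arch in log_rows}
    # A group counts as lost only when every one of its members is INVALID.
    return {
        group: group_status.get(group)
        for group, flags in invalid_flags.items()
        if all(flags)
    }


if __name__ == "__main__":
    logfiles = [
        (1, "", "/oracle/dbs/log1a.f"), (1, "", "/oracle/dbs/log1b.f"),
        (2, "INVALID", "/oracle/dbs/log2a.f"), (2, "INVALID", "/oracle/dbs/log2b.f"),
        (3, "", "/oracle/dbs/log3a.f"), (3, "", "/oracle/dbs/log3b.f"),
    ]
    logs = [(1, 2, "INACTIVE", "YES"), (2, 2, "ACTIVE", "NO"), (3, 2, "CURRENT", "NO")]
    print(fully_lost_groups(logfiles, logs))
```

A group with only some INVALID members is left out on purpose, because that case is covered by the procedure for losing a single member of a multiplexed group.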

Losing an Inactive Online Redo Log Group

If all members of an online redo log group with INACTIVE status are damaged, then the procedure depends on whether you can fix the media problem that damaged the inactive redo log group.

If the failure is . . .	Then . . .
Temporary	Fix the problem. LGWR can reuse the redo log group when required.
Permanent	The damaged inactive online redo log group eventually halts normal database operation. Reinitialize the damaged group manually by issuing the ALTER DATABASE CLEAR LOGFILE statement as described in this section.

Clearing Inactive, Archived Redo

You can clear an inactive redo log group when the database is open or closed. The procedure depends on whether the damaged group has been archived.

To clear an inactive, online redo log group that has been archived, use the following procedure:

If the database is shut down, then start a new instance and mount the database:

STARTUP MOUNT

Reinitialize the damaged log group. For example, to clear redo log group 2, issue the following statement:

ALTER DATABASE CLEAR LOGFILE GROUP 2;

Clearing Inactive, Not-Yet-Archived Redo

Clearing a not-yet-archived redo log allows it to be reused without archiving it. This action makes backups unusable if they were started before the last change in the log, unless the file was taken offline prior to the first change in the log. Hence, if you need the cleared log file for recovery of a backup, then you cannot recover that backup. Also, it prevents complete recovery from backups due to the missing log.

To clear an inactive, online redo log group that has not been archived, use the following procedure:

If the database is shut down, then start a new instance and mount the database:

STARTUP MOUNT

Clear the log using the UNARCHIVED keyword. For example, to clear log group 2, issue:

ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2;

If there is an offline datafile that requires the cleared log to bring it online, then the keywords UNRECOVERABLE DATAFILE are required. The datafile and its entire tablespace have to be dropped because the redo necessary to bring it online is being cleared, and there is no copy of it. For example, enter:

ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2 UNRECOVERABLE DATAFILE;

Immediately back up the whole database with an operating system utility, so that you have a backup you can use for complete recovery without relying on the cleared log group. For example, enter:

% cp /disk1/oracle/dbs/*.f /disk2/backup

Back up the database's control file with the ALTER DATABASE statement. For example, enter:

ALTER DATABASE BACKUP CONTROLFILE TO '/oracle/dbs/cf_backup.f';
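The choice between the archived and not-yet-archived forms of the clear statement can be sketched as a small builder. This is an illustration under stated assumptions: the function name `clear_logfile_statement` and its parameters are hypothetical, the keyword order follows the `CLEAR [UNARCHIVED] LOGFILE` form of the ALTER DATABASE syntax, and rejecting UNRECOVERABLE DATAFILE for an archived group is a conservative choice of this sketch, since the text describes that keyword only for unarchived logs.

```python
def clear_logfile_statement(group, archived, offline_datafile_needs_log=False):
    """Build the CLEAR LOGFILE statement for an inactive redo log group.

    Hypothetical helper: it returns SQL text only and runs nothing.
    """
    if archived and offline_datafile_needs_log:
        # Assumption of this sketch: the text covers UNRECOVERABLE DATAFILE
        # only when the log being cleared has not been archived.
        raise ValueError("UNRECOVERABLE DATAFILE is described for unarchived logs only")
    parts = ["ALTER DATABASE CLEAR"]
    if not archived:
        parts.append("UNARCHIVED")
    parts.append(f"LOGFILE GROUP {int(group)}")
    if offline_datafile_needs_log:
        parts.append("UNRECOVERABLE DATAFILE")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    print(clear_logfile_statement(2, archived=True))
    print(clear_logfile_statement(2, archived=False))
    print(clear_logfile_statement(2, archived=False, offline_datafile_needs_log=True))
```

Whichever form is generated, the whole database backup and control file backup described above still apply after an unarchived log is cleared.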

Failure of CLEAR LOGFILE Operation

The ALTER DATABASE CLEAR LOGFILE statement can fail with an I/O error due to media failure when it is not possible to:

Relocate the redo log file onto alternative media by re-creating it under the currently configured redo log filename
Reuse the currently configured log filename to re-create the redo log file because the name itself is invalid or unusable (for example, due to media failure)

In these cases, the ALTER DATABASE CLEAR LOGFILE statement (before receiving the I/O error) would have successfully informed the control file that the log was being cleared and did not require archiving. The I/O error occurred at the step in which the CLEAR LOGFILE statement attempts to create the new redo log file and write zeros to it. This fact is reflected in the STATUS column of V$LOG, which shows CLEARING_CURRENT.

Losing an Active Online Redo Log Group

If the database is still running and the lost active redo log is not the current log, then issue the ALTER SYSTEM CHECKPOINT statement. If successful, then the active redo log becomes inactive, and you can follow the procedure in "Losing an Inactive Online Redo Log Group". If unsuccessful, or if your database has halted, then perform one of the procedures in this section, depending on the archiving mode.

The current log is the one LGWR is currently writing to. If a LGWR I/O fails, then LGWR terminates and the instance crashes. In this case, you must restore a backup, perform incomplete recovery, and open the database with the RESETLOGS option.

To recover from loss of an active online log group in NOARCHIVELOG mode:

If the media failure is temporary, then correct the problem so that the database can reuse the group when required.
Restore the database from a consistent, whole database backup (datafiles and control files) as described in "Restoring Datafiles Before Performing Incomplete Recovery". For example, enter:

% cp /disk2/backup/*.dbf $ORACLE_HOME/oradata/trgt/

Mount the database:

STARTUP MOUNT

Because online redo logs are not backed up, you cannot restore them with the datafiles and control files. In order to allow the database to reset the online redo logs, you must first mimic incomplete recovery:

RECOVER DATABASE UNTIL CANCEL
CANCEL

Open the database using the RESETLOGS option:

ALTER DATABASE OPEN RESETLOGS;

Shut down the database consistently. For example, enter:

SHUTDOWN IMMEDIATE

Make a whole database backup.

To recover from loss of an active online redo log group in ARCHIVELOG mode:

If the media failure is temporary, then correct the problem so that the database can reuse the group when required. If the media failure is not temporary, then use the following procedure.

Begin incomplete media recovery, recovering up through the log before the damaged log.
Ensure that the current name of the lost redo log can be used for a newly created file. If not, then rename the members of the damaged online redo log group to a new location. For example, enter:

ALTER DATABASE RENAME FILE '?/oradata/trgt/redo01.log' TO '/tmp/redo01.log';
ALTER DATABASE RENAME FILE '?/oradata/trgt/redo02.log' TO '/tmp/redo02.log';

Open the database using the RESETLOGS option:

ALTER DATABASE OPEN RESETLOGS;

Note:

All updates executed from the endpoint of the incomplete recovery to the present must be re-executed.

Loss of Multiple Redo Log Groups

If you have lost multiple groups of the online redo log, then use the recovery method for the most difficult log to recover. The order of difficulty, from most difficult to least difficult, follows:

The current online redo log
An active online redo log
An unarchived online redo log
An inactive online redo log
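Picking the recovery method for several lost groups reduces to ranking each group and taking the hardest case. The sketch below assumes each lost group is given as a (status, archived) pair taken from V$LOG. The names `DIFFICULTY`, `classify`, and `hardest_case` are hypothetical.

```python
# Order of difficulty from the list above, most difficult first.
DIFFICULTY = ["current", "active", "unarchived", "inactive"]


def classify(status, archived):
    """Place one lost group into a category from the difficulty list."""
    key = status.strip().upper()
    if key == "CURRENT":
        return "current"
    if key == "ACTIVE":
        return "active"
    if key == "INACTIVE":
        # An inactive group that was never archived ranks as harder.
        return "inactive" if archived else "unarchived"
    raise ValueError(f"unexpected status {status!r}")


def hardest_case(lost_groups):
    """Return the most difficult category among (status, archived) pairs."""
    categories = [classify(status, archived) for status, archived in lost_groups]
    if not categories:
        raise ValueError("no lost groups given")
    return min(categories, key=DIFFICULTY.index)


if __name__ == "__main__":
    print(hardest_case([("INACTIVE", True), ("ACTIVE", False)]))
```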

Wednesday, October 19, 2016
