Aug 22, 2022
Archived Redo Log- Files

The Oracle database can run in one of two modes: ARCHIVELOG mode and NOARCHIVELOG mode. The difference between these two modes is simply what happens to a redo log file when Oracle goes to reuse it. “Will we keep a copy of that redo or should Oracle just overwrite it, losing it forever?” is an important question to answer. Unless you keep this file, you can’t recover data from a backup to that point in time.

Say you take a backup once a week on Saturday. Now, on Friday afternoon, after you have generated hundreds of redo logs over the week, your hard disk fails. If you have not been running in ARCHIVELOG mode, the only choices you have right now are as follows:

•\ Drop the tablespace(s) associated with the failed disk. Any tablespace that had a file on that disk must be dropped, including the contents of that tablespace. If the SYSTEM tablespace (Oracle’s data dictionary) or some other important system-related tablespace like your UNDO tablespace is affected, you can’t do this. You will have to use the next option instead.

•\ Restore last Saturday’s data and lose all of the work you did that week.

Neither option is very appealing. Both imply that you lose data. If you had been executing in ARCHIVELOG mode, on the other hand, you simply would have found another disk and restored the affected files from Saturday’s backup onto it. Then, you would have applied the archived redo logs and, ultimately, the online redo logs to them (in effect replaying the week’s worth of transactions in fast-forward mode). You lose nothing. The data is restored to the point of the failure.

People frequently tell me they don’t need ARCHIVELOG mode for their production systems. I have yet to meet anyone who was correct in that statement. I believe that a system is not a production system unless it is in ARCHIVELOG mode. A database that is not in ARCHIVELOG mode will, someday, lose data. It is inevitable; you will lose data (not might, but will) if your database is not in ARCHIVELOG mode.

“We are using RAID-5, so we are totally protected” is a common excuse. I’ve seen cases where, due to a manufacturing error, all disks in a RAID set froze, all at about the same time. I’ve seen cases where the hardware controller introduced corruption into the datafiles, so people safely protected corrupt data with their RAID devices. RAID also does not do anything to protect you from operator error, one of the most common causes of data loss. RAID does not mean the data is safe, it might be more available, it might be safer, but data solely on a RAID device will be lost someday; it is a matter of time.

“If we had the backups from before the hardware or operator error and the archives were not affected, we could have recovered.” The bottom line is that there is no excuse for not being in ARCHIVELOG mode on a system where the data is of any value. Performance is no excuse; properly configured archiving adds little to no overhead. This, and the fact that a fast system that loses data is useless, means that even if archiving added 100 percent overhead, you still need to do it. A feature is overhead if you can remove it and lose nothing important; overhead is like icing on the cake. Preserving your data and making sure you don’t lose your data isn’t overhead—it’s the DBA’s primary job!

Only a test or maybe a development system should execute in NOARCHIVELOG mode. Most development systems should be run in ARCHIVELOG mode for two reasons:

•\ This is how you will process the data in production; you want development to act and react as your production system would.
•\ In many cases, the developers pull their code out of the data dictionary, modify it, and compile it back into the database. The development database holds the current version of the code. If the development database suffers a disk failure in the afternoon, what happens to all of the code you compiled and recompiled all morning? It’s lost.

Don’t let anyone talk you out of being in ARCHIVELOG mode. You spent a long time developing your application, so you want people to trust it. Losing their data will not instill confidence in your system.

Note There are some cases in which a large DW could justify being in NOARCHIVELOG mode—if it made judicious use of READ ONLY tablespaces and was willing to fully rebuild any READ WRITE tablespace that suffered a failure by reloading the data.

More Details
Aug 22, 2022
Online Redo Log- Files-1

Every Oracle database has at least two online redo log file groups. Each redo log group consists of one or more redo log members (redo is managed in groups of members). The individual redo log file members of these groups are true mirror images of each other.

These online redo log files are fixed in size and are used in a circular fashion. Oracle will write to log file group 1, and when it gets to the end of this set of files, it will switch to log file group 2 and overwrite the contents of those files from start to end.

When it has filled log file group 2, it will switch back to log file group 1 (assuming we have only two redo log file groups; if we have three, it would, of course, proceed to the third group). This is shown in Figure 3-4.

Figure 3-4.  Writing to log file groups

The act of switching from one log file group to another is called a log switch. It is important to note that a log switch may cause a temporary “pause” in a poorly configured database. Since the redo logs are used to recover transactions in the event of a failure, we must be certain we won’t need the contents of a redo log file before we are able to use it. If Oracle isn’t sure that it won’t need the contents of a log file, it will suspend operations in the database momentarily and make sure that the data in the cache that this redo “protects” is safely written (checkpointed) onto disk. Once Oracle is sure of that, processing will resume and the redo file will be reused.

We’ve just started to talk about a key database concept: checkpointing. To understand how online redo logs are used, you’ll need to know something about checkpointing, how the database buffer cache works, and what a process called databaseblock writer (DBWn) does. Thedatabase buffer cache and DBWn are covered in more detail later on, but we’llskip ahead a little anyway and touch on them now. 

Thedatabase buffer cache is where database blocks are stored temporarily. This isa structure in Oracle’s SGA. As blocks are read, they are stored in this cache, hopefully so we won’t have to physically reread them later.

The buffer cache is first and foremost a performance tuning device. It exists solely to make the very slow process of physical I/O appear to be much faster than it is.

When we modify a block by updating a row on it, these modifications are done in memory to the blocks in the buffer cache. Enough information to redo or to replay this modification is stored in the redo log buffer, another SGA data structure.

When we COMMIT our modifications, making them permanent, Oracle does not go to all of the blocks we modified in the SGA and write them to disk. Rather, it just writes the contents of the redo log buffer out to the online redo logs.

As long as that modified block is in the buffer cache and not on disk, we need the contents of that online redo log in case the database fails. If, at the instant after we committed, the power was turned off, the database buffer cache would be wiped out. If this happens, the only record of our change is in that redo log file.

Upon restart of the database, Oracle will actually replay our transaction, modifying the block again in the same way we did and committing it for us. So, as long as that modified block is cached and not written to disk, we can’t reuse (overwrite) that redo log file. 

This is where DBWn comes into play. This Oracle background process is responsible for making space in the buffer cache when it fills up and, more important, for performing checkpoints. A checkpoint is the writing of dirty (modified) blocks from the buffer cache to disk.

Oracle does this in the background for us. Many things can cause a checkpoint to occur, the most common being a redo log switch. As we filled up log file 1 and switched to log file 2, Oracle initiated a checkpoint.

At this point, DBWn started writing to disk all of the dirty blocks that are protected by log file group 1. Until DBWn flushes all of these blocks protected by that log file, Oracle can’t reuse (overwrite) it. If we attempt to use it before DBWn has finished its checkpoint, we’ll get a message like this in our database’s ALERT log: … 

Thread 1 cannot allocate new log, sequence 16 Checkpoint not complete Current log# 3 seq# 15 mem# 0: /opt/oracle/oradata/CDB/redo03.log…

More Details