Wednesday, November 11, 2015

[EN] FAL[client]: Failed to request gap sequence ... FAL[client]: All defined FAL servers have been attempted.

Problem:

Because of a network problem on one of our physical standby databases, there were some archived log gaps.

...
      9597          2 +ADATA/astb/archivelog/2015_11_09/thread_2_seq_9597.1827.895359343
      9600          2 +ADATA/astb/archivelog/2015_11_09/thread_2_seq_9600.1828.895360629
      9601          2 +ADATA/astb/archivelog/2015_11_09/thread_2_seq_9601.1831.895361495
      9604          2 +ADATA/astb/archivelog/2015_11_09/thread_2_seq_9604.1825.895363191
      9607          2 +ADATA/astb/archivelog/2015_11_10/thread_2_seq_9607.1824.895364061
      9608          2 +ADATA/astb/archivelog/2015_11_10/thread_2_seq_9608.1823.895364505
      9609          2 +ADATA/astb/archivelog/2015_11_10/thread_2_seq_9609.1826.895365409
...

After the network problem was solved by our system team, the standby RFS processes could successfully receive the recent archived logs, but they could not fetch the earlier gap archived logs; they did not even attempt to. We monitored this via the V$MANAGED_STANDBY view.
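The gap and the transport processes can be watched on the standby with queries like the following (these are standard Data Guard V$ views; a sketch, not the exact queries we ran):

--standby db: which sequences are missing, per thread
select thread#, low_sequence#, high_sequence#
from   v$archive_gap;

--standby db: what the RFS/MRP processes are currently doing
select process, status, thread#, sequence#
from   v$managed_standby;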

When we tried to restart media recovery on the standby site, we still saw the following alert log warnings.




Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION nodelay
Media Recovery Waiting for thread 2 sequence 9484
Fetching gap sequence in thread 2, gap sequence 9484-9560
Tue Nov 10 17:12:27 2015
FAL[client]: Failed to request gap sequence
 GAP - thread 2 sequence 9484-9560
 DBID 304187465 branch 656388943
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that's sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
------------------------------------------------------------
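As the message suggests, it is worth verifying how long the controlfile keeps log switch history (a simple check from SQL*Plus; the value shown by your system may of course differ):

--check controlfile record retention (in days)
show parameter control_file_record_keep_time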

Solution:

We tried:
Restarting media recovery on the standby site. DID NOT WORK.
Deferring and then re-enabling the relevant log_archive_dest parameter on the primary site. DID NOT WORK.
Restarting the primary database. DID NOT WORK.
Restarting the standby listener. DID NOT WORK.
Manually transferring archived logs to the standby site and registering them. IT WORKED :). BUT THERE WAS TOO MUCH MANUAL WORK :(
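For reference, the manual workaround boils down to copying each missing archived log to the standby and registering it so media recovery can use it (the path below is purely illustrative, not one from our environment):

--standby db: register a manually copied archived log (illustrative path)
alter database register logfile '/u01/manual_gap/thread_2_seq_9484.arc';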

Then we finally found the real solution:

--primary db
alter system set log_archive_max_processes = 15 scope=BOTH sid='*';

The current value of this parameter was 4. We increased it to 15 (the maximum is 30).
Then we saw new RFS processes spawn on the standby site (via V$MANAGED_STANDBY),
and these new processes started fetching the missing archived log files from the very beginning of the gap. That's it. IT WORKED :)
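To confirm the change took effect on the primary, the current setting can be read back from v$parameter (a sketch using the standard view):

--primary db: verify the new value
select value from v$parameter where name = 'log_archive_max_processes';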

But do not forget to reduce this parameter back to a reasonable value afterwards, and do not leave it at 30: if you face this gap situation again, you will not be able to increase it any further, and you will have no option other than manually transferring the archived log files.
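Once the gap is resolved, the parameter can be set back, for example to its previous value (4 in our case):

--primary db: revert after the gap is resolved
alter system set log_archive_max_processes = 4 scope=BOTH sid='*';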




