author ： Tang Xiaodan （ Zhejiang Mobile Database Administrator ）
Zhou kay （ Shanghai xinju database engineer ）
Believe that a lot of Oracle DBA In my career, I have encountered this situation more or less ： The data file has been deleted by mistake , The storage is broken and the data file cannot be recognized , The worst part is , actually rman Backup is also bad …… It's no use being in a mess when it comes to problems , And it's very dangerous to do it rashly , When there is an urgent problem , The most important thing is calm analysis , There is no disorder in the face of danger .
Now I'll go through several actual combat cases , Let's introduce some unconventional recovery methods for abnormal data files .
One 、 Recovery of deleted data files
Experimental scenario ： Due to the misoperation of maintenance personnel , Result in the deletion of some data files in the database , Database error .
>>>> fault simulation
10 spot 59 branch , Delete file by mistake .
11 spot 20 Sub database alert The log shows ora－01116 Such mistakes , According to the background log, at this time ts_test01.dbf The file cannot be opened normally .
But the database didn't shut down , Is still in the read write state .
>>>> Problem analysis
The data file was deleted by mistake , The database is still in open state . This problem can be solved by linux Systematic “ File descriptor ” Retrieve lost data files . stay Linux Everything in the system can be seen as a file , File descriptor （file descriptor） It is the index created by the kernel to efficiently manage the opened files , All execution I/O Operating system calls are made through file descriptors . File descriptor, open file handle, and i-node As shown in the figure ：
stay linux In the system , After the data file is deleted , Its file handle is also opened and used by related database processes , It can be recovered by copying the handle information directly . It should be noted that , The database cannot be shut down during this period . Otherwise, the relevant handle will be released , The file can't be retrieved .
>>>> Recovery steps
Try to pass through oracle dbwr The process found a file handle that was mistakenly deleted .
Current oracle dbwr Process spid yes 3293 The missing... Can be found through this process ts_test01.dbf File handle .
A directory named with some numbers , They are process directories , Below the fd The subdirectory contains all the process related file descriptors . In this case oracle Of dbwr Process fd There are deleted items in the directory ts_test01.dbf The descriptor of the file （ notes ： The file descriptor is 25, The current state is deleted）.
adopt copy Restore deleted data files in the same way , And set the right group permissions .
By way of offline The file redirection is completed by resetting the path of the file .
Because the previous data file cannot open The problem of , Some changed data cannot be written to the data file , Lead to datafile header Upper checkpoint＃ and controlfile Of documents checkpoint_change# atypism , Media recovery of data files is required .
After media recovery , The table space is OK online, Troubleshooting is also complete .
>>>> To sum up
As a system maintainer rm,mv They are all high-risk operations , Think twice before you do it , Determine the impact , Achieve “ Ning Ting 3 branch , No robbing 1 second ”. When you have a database problem , The fault status shall be maintained , Without a clear understanding of the cause of the problem and the solution , Hasty action will complicate the problem , Cause incalculable losses . For this case , If you shut down the database rashly , Only use rman Backup for recovery , If the backup fails , Data loss will be inevitable . In short, we can do , There is no disorder in the face of danger ！ think thrice before acting ！
Two 、 Use bbed Skip full recovery of archive files
Experimental scenario ： Some data files are damaged due to storage corruption , Need to use backup to restore , The missing part of the archive is found in the database recovery phase , The database cannot be recovered , Normal start .
>>>> Experimental environment preparation
Use rman Make a complete database .
Yes test Table execution insert operation , Every three times insert Once after switch logfile, Guaranteed to be generated 34,35,36 Each of the three archives contains 3 strip insert Operation log of .
>>>> fault simulation
adopt abort After the mode stops , Delete ts_test01.dbf File simulation storage failure .
Delete... Artificially sequence 35 The archive log of . thus , The fault has recurred .
When used again startup Command startup , The database is mount And then because I couldn't recognize datafile 6(ts_test01.dbf), In the end, we can only stay in mount Stage .
adopt rman Restore data files in the same way . In the media recovery phase rman Report errors ：no backup of archived log for thread 1 with sequence 35 and startingscn of…….. It's because it's missing 35 The Restore cannot be completed due to No （35 File number has been deleted ）.
The archive log records all kinds of operations on the database in chronological order （ Include insert,delete,update,create wait ）. The loss of archives means that some operations are missing ,oracle You will not be able to continue the subsequent recovery of the archive .
In this case, the use of conventional means is obviously not normal open database . Need to pass through bbed Skip the missing archive to continue with media recovery .
>>>> Recovery steps
adopt rman Of crosscheck archivelog all Command check archive log found , The lack of 35 No .
Skipping the missing archive requires 6 Document No scn Push forward at least equal to 36 No first change＃1243371
Data file scn Recorded in the file 1 Number block Offset 484 In the first four bytes of a byte . At present 6 Document No scn After the big and small end conversion, the decimal value is 1243327（dump The original value of is bff81200 The hexadecimal after the big and small end conversion is 0012f8bf）. The value is exactly 35 No first change＃
Use bbed Change the header of the data file scn Number , Turn it into 1243381（ Pay attention to the changed scn Need greater than 36 No first change＃, In this experiment 36 No first change＃10 As new scn Number , After hexadecimal and size side conversion, the data is f5f812）, And use sum apply Command to recalculate the checksums .
To skip archiving, you need the header of the data file rba. It consists of seq#、log block#、 Offset ( Fixed for 16) form , It determines from which location of the archive log the data file should be archived .Rba Offset in the header block of the data file 500 From the beginning 12 Bytes ( As shown in the picture from 23 Start to 0000ffff end , front 4 A byte is the serial number of the log , middle 4 One byte is the log block number , Last 4 Two bytes is the offset ).
take rba Change to the next archive log .log block#.offset#( This experiment rba Changed to 24000000.02000000.10000000 namely 36.2.16)
Execute the data file again 6 The database can be opened normally after media recovery . Due to skipping part of the log , Inevitably, there is the problem of data loss or inconsistency . For the database restored by this method, it is recommended to stop and rebuild at an appropriate time .
>>>> To sum up
Backup is the last line of defense for data security , The validity of backup files should be fully verified . Check found backup set exception , When it comes to problems such as loss , New database backup should be initiated in time .
Oracle A database is like a huge and sophisticated machine , Key documents are like gears in a machine , Any missing will lead to the stagnation and collapse of the whole system . As a qualified DBA, We need to be able to use machines , You should also be able to repair machines , Be bold and careful , There is no disorder in the face of danger , Guarantee the lasting and stable operation of the database .