CRSD cannot be started on node 2: the database and listener cannot start automatically, and tools such as ocrconfig, ocrcheck and srvctl are unavailable

Keywords: Database Oracle



Changes of CRSD Process in 11g


In 11.2, the CRSD process is no longer one of the most critical processes in RAC.

If you are familiar with 10g RAC, you will be aware of how important the CRSD process was. After the operating system boots, Oracle brings up the whole cluster and the database by starting this process.

In RAC 11.2, Oracle changed ASM so that the OCR and the voting disk can be stored in an ASM disk group. ASM is a component managed by the cluster, while the OCR and voting disk that the cluster needs in order to start are placed in ASM, which is really a chicken-and-egg problem. Oracle ultimately solved it with the OHASD process, and the architecture of the cluster and ASM changed considerably as a result. The OHASD process replaced the CRSD process and became the most critical process in a RAC environment.

The importance of the CRSD process has dropped a great deal. Two days ago, in a customer's 11.2 RAC environment, it turned out that even though one node's CRSD process had not started, the database on that node could still be started manually and accessed normally.

The likely cause is an error on node 2 when accessing the disk group that holds the OCR and voting disk. CRSD exits on its own after several failed attempts to read the information stored in the OCR, so the cluster stack on node 2 cannot start normally. At this point, however, every cluster process on node 2 other than CRSD is fully started, and the ASM instance can be started as well, so the database on node 2 can be started manually.

  

The ASM alert log on node 2 contains the following messages:



    Tue Jun 13 10:59:17 2017
    Reconfiguration started (old inc 10, new inc 12)
    List of instances:
     1 2 (myinst: 1) 
     Global Resource Directory frozen
     Communication channels reestablished
     Master broadcasted resource hash value bitmaps
     Non-local Process blocks cleaned out
    Tue Jun 13 10:59:17 2017
     LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
     Set master node info 
     Submitted all remote-enqueue requests
     Dwn-cvts replayed, VALBLKs dubious
     All grantable enqueues granted
     Submitted all GCS remote-cache requests
     Fix write in gcs resources
    Reconfiguration complete
    Tue Jun 13 11:03:01 2017
    IPC Send timeout detected. Sender: ospid 3173 [oracle@rac1 (PING)]
    Receiver: inst 2 binc 429480538 ospid 3190
    Tue Jun 13 12:12:38 2017
    NOTE: [ocrcheck.bin@rac1 (TNS V1-V3) 21461] opening OCR file
    Tue Jun 13 12:12:38 2017
    NOTE: [ocrcheck.bin@rac1 (TNS V1-V3) 21461] opening OCR file
    Tue Jun 13 13:38:34 2017
    MEMORY_TARGET defaulting to 1128267776.
    * instance_number obtained from CSS = 1, checking for the existence of node 0... 
    * node 0 does not exist. instance_number = 1 
    Starting ORACLE instance (normal)
    Tue Jun 13 13:42:20 2017
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
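
To see at a glance which disk groups are affected, the PST write warnings in a log like the one above can be tallied per group. The following is only a minimal sketch in Python: the function name count_pst_warnings and the sample list are illustrative assumptions, and the pattern matches only the exact warning text shown in this log.

```python
import re
from collections import Counter

# Matches the PST write-IO warnings exactly as they appear in the alert log above.
PST_WARNING = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)"
)


def count_pst_warnings(lines):
    """Return a Counter mapping disk group number to the number of PST warnings."""
    counts = Counter()
    for line in lines:
        match = PST_WARNING.search(line)
        if match:
            counts[int(match.group(3))] += 1
    return counts


# A few lines copied from the log above, used here as sample input.
sample = [
    "Tue Jun 13 13:42:20 2017",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.",
]

for group, hits in sorted(count_pst_warnings(sample).items()):
    print(f"disk group {group}: {hits} delayed PST write(s)")
```

Run against the full log above, a tally like this would show that all six disk groups report delayed PST writes, not only the one holding the OCR, which suggests the I/O problem on this node is not limited to a single disk group.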

       

         


 

This is probably why the CRSD process hits an error and exits. The database can be opened normally, but the database and listener on node 2 cannot start automatically, and the VIP also has problems. In addition, tools on node 2 that need OCR information, such as ocrconfig, ocrcheck and srvctl, are unavailable.

At present, there is still no solution.

Posted by jackwh on Sat, 22 Jun 2019 12:46:14 -0700