ORA-00600 [4400]: Causes and Solutions

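Oracle error case study: the operations DBA reported that an Oracle database raised ORA-00600: internal error code, arguments: [4400], [48]. Analysis showed that the cause is related to distributed transactions.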

To analyze this ORA-600 error, I opened the trace file in UE and saw the following:

Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /install1/oracle/bill1/product/9.2.0
System name: AIX
Node name: billing1
Release: 3
Version: 5
Machine: 00CB104D4C00
Instance name: bill1
Redo thread mounted by this instance: 1
Oracle process number: 148
Unix process pid: 1363982, image: oracle@billing1 (TNS V1-V3)

*** SESSION ID:(578.13852) 2012-06-04 12:08:44.492
*** 2012-06-04 12:08:44.492
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []

As the header shows, the system is AIX 5.3 and the database version is 9.2.0.8, running as a single instance.
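
When many trace files need to be triaged, the same header fields can be pulled out with a short script. The following is only a minimal sketch: `parse_trace_header` is a hypothetical helper (not an Oracle tool), and it assumes the header lines follow the `Name: value` layout seen in the excerpt above.

```python
import re

# Excerpt of the trace header shown above (copied verbatim).
TRACE_HEADER = """\
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
System name: AIX
Node name: billing1
Release: 3
Version: 5
Instance name: bill1
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
"""

def parse_trace_header(text):
    """Return (fields, ora600_args) parsed from a trace header excerpt."""
    fields = {}
    args = []
    for line in text.splitlines():
        m = re.match(r"ORA-00600: internal error code, arguments: (.*)", line)
        if m:
            # Each argument is wrapped in [...]; drop the empty placeholders.
            args = [a for a in re.findall(r"\[([^\]]*)\]", m.group(1)) if a]
            continue
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields, args

fields, args = parse_trace_header(TRACE_HEADER)
# On AIX the trace prints the major number as "Version" and the minor as "Release".
os_label = f"{fields['System name']} {fields['Version']}.{fields['Release']}"
print(os_label, args)
```

For this excerpt the script reports AIX 5.3 and the two non-empty arguments, 4400 and 48.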
Next, look at the call stack further down in the trace:

----- Call Stack Trace -----
calling              call    entry                argument values in hex     
location            type    point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+0148          bl      ksedst              1029555CC 
ksfdmp+0018          bl      01FD46A8           
kgeriv+0118          bl      _ptrgl             
kgeasi+00cc          bl      kgeriv              000000002  1100A2128 
                                                  102954750  7000001B2B78F30 
                                                  000000000 
ktcddt+013c          bl      kgeasi              110006838  1103923A8 
                                                  113000001130  200000002 
                                                  100000001  000000004 
                                                  000000030  7000001B1B74EE8 
ktcsod+01f8          bl      ktcddt              110006838  110006978 
                                                  000000000 
kssdch_stage+02b8    bl      _ptrgl             
kssdch+0014          bl      kssdch_stage        000000000  700000000007D98 
                                                  110002F50 
ktcbod+030c          bl      kssdch              11000D618  000000008 
kssdch_stage+02b8    bl      _ptrgl             
kssdch+0014          bl      kssdch_stage        110002F50  110061758 
                                                  000000009 
ksuxds+1118          bl      kssdch              1000E82E4  000000000 
ksudel+006c          bl      ksuxds              7000001AB716A20  100000001 
opilof+03dc          bl      01FD4914           
opiodr+08cc          bl      _ptrgl             
ttcpip+0cc4          bl      _ptrgl             
opitsk+0d60          bl      ttcpip              11000D4C0  000000000 
                                                  000000000  000000000 
                                                  000000000  000000000 
                                                  000000000  000000000 
opiino+0758          bl      opitsk              000000000  000000000 
opiodr+08cc          bl      _ptrgl             
opidrv+032c          bl      opiodr              3C00000018  4101FAA40 
                                                  FFFFFFFFFFFF8F0  0A0012010 
sou2o+0028          bl      opidrv              3C0C000000  4A0059B20 
                                                  FFFFFFFFFFFF8F0 
main+0138            bl      01FD40E8           
__start+0098        bl      main                000000000  000000000 
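
To compare a stack like this with the ones quoted in support notes, it helps to reduce it to the list of calling functions. The sketch below assumes the layout shown above (`function+offset`, then `bl`, then the entry point) and treats any line that does not start with `function+offset` as an argument continuation line. `calling_functions` is a hypothetical helper name.

```python
import re

# A few lines copied from the call stack above; the indented line holds only arguments.
CALL_STACK = """\
ksedmp+0148          bl      ksedst              1029555CC 
ksfdmp+0018          bl      01FD46A8           
kgeriv+0118          bl      _ptrgl             
kgeasi+00cc          bl      kgeriv              000000002  1100A2128 
                                                  102954750  7000001B2B78F30 
ktcddt+013c          bl      kgeasi              110006838  1103923A8 
"""

# A frame line starts with "<function>+<hex offset>" followed by the "bl" call type.
FRAME_RE = re.compile(r"^(\w+)\+[0-9a-fA-F]+\s+bl\s")

def calling_functions(stack_text):
    """Return the calling function of every frame, top of stack first."""
    frames = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(m.group(1))
    return frames

print(" ".join(calling_functions(CALL_STACK)))
```

The result is a space-separated list of function names, in the same form in which support notes usually quote a stack.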

MOS (My Oracle Support) describes this error as follows:
PURPOSE:           
  This article discusses the internal error "ORA-600 [4400]", what it means and possible actions. The information here is only applicable to the versions listed and is provided only for guidance.

ERROR:             
  ORA-600 [4400] [a] [b] [c] [d] [e]

VERSIONS:
  versions 6.0 to 11

DESCRIPTION:

  Internal error 4400 means that we are trying to delete a transaction (for example at logoff time) but the transaction has not yet been marked completed.

  This can happen at the remote site in a distributed transaction if the first part of the first stage of a two phase commit gets an error before it really starts the protocol.

FUNCTIONALITY:     
  TRANSACTION CONTROL

IMPACT:           
  PROCESS FAILURE - but only at logoff so minimal impact
  NON CORRUPTIVE - No underlying data corruption.

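The document says that error 4400 is related to distributed transactions. I have run into quite a few distributed-transaction problems before, and wrote an earlier post about one of them: ORA-01591: lock held by in-doubt distributed transaction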

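Comparing the call stack from our trace with the one in the MOS note below shows that they are essentially identical (the frames from ksedmp through ksudel match). The note says the error can be safely ignored if it is a one-time occurrence: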

ORA-00600 [4400], [48], [], [], [], [] From a Distributed Transaction [ID 464861.1]

Symptoms
The following error is reported on 9.2.0.5:

ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []

The call stack is:

ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod
kssdch_stage kssdch PGOSF40__ksuxds ksudel kxfprdp opirip opidrv sou2o

Cause
The error is encountered due to Bug 3840810 which was fixed in version 10.1.0.3.

The error is encountered when there is a dblink between 8i and 9i/10g databases. This error is only raised
in the log-off of the local session while trying to delete a transaction but the transaction has not yet
been marked completed. This lack of information is caused by the bug and if there is no process failure due
to this error, it can be ignored since there is no SQL statement/session affected.

This bug has been fixed by architectural changes in 10g and unfortunately is not backportable to 9.2.

If this is an one time occurrence then it can be safely ignored.
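
The stack comparison can be made explicit with a few lines of code. This is a sketch under two assumptions: both stacks are already reduced to function names, and a prefix ending in `__` (as in `PGOSF40__ksuxds`) is treated as a compiler-generated decoration that can be stripped. `normalize` and `common_prefix` are hypothetical helper names.

```python
# Function names from the call stack in our trace, top of stack first.
TRACE_STACK = ("ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod "
               "kssdch_stage kssdch ksuxds ksudel opilof opiodr ttcpip opitsk opiino "
               "opiodr opidrv sou2o main __start").split()

# Call stack quoted in the MOS note.
NOTE_STACK = ("ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod "
              "kssdch_stage kssdch PGOSF40__ksuxds ksudel kxfprdp opirip opidrv "
              "sou2o").split()

def normalize(name):
    # Assumption: in "PGOSF40__ksuxds" the part before "__" is a decoration
    # and the real function name follows it. Names such as "__start", which
    # have nothing before the "__", are left unchanged.
    head, sep, tail = name.rpartition("__")
    return tail if sep and head and tail else name

def common_prefix(a, b):
    """Return the frames shared by both stacks, starting from the top."""
    shared = []
    for x, y in zip(a, b):
        if normalize(x) != normalize(y):
            break
        shared.append(normalize(x))
    return shared

shared = common_prefix(TRACE_STACK, NOTE_STACK)
print(len(shared), "matching frames; last shared frame:", shared[-1])
```

The two stacks share their first 13 frames, ending at ksudel. They differ only below that point, where the trace shows a foreground session logging off (opilof) and the note shows kxfprdp.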

We can also find the following information in the trace:

BH (0x700000134fdd900) file#: 405 rdba: 0x65403498 (405/13464) class 1 ba: 0x700000134764000
  set: 84 dbwrid: 3 obj: 343488 objn: 343488
  hash: [700000070fead00,700000163feec00] lru: [7000000e8fe6d68,70000013ffe8468]
  LRU flags: hot_buffer
  ckptq: [7000000aefe93d8,700000088fe2dd8] fileq: [7000001b1174040,7000000eefc73e8]
  st: XCURRENT md: NULL rsop: 0x0 tch: 5
  flags: buffer_dirty gotten_in_current_mode block_written_once
          redo_since_read
  LRBA: [0x3e7cf.396e5.0] HSCN: [0x0bac.2f283f13] HSUB: [1] RRBA: [0x0.0.0]
  buffer tsn: 32 rdba: 0x65403498 (405/13464)
  scn: 0x0bac.2f283f13 seq: 0x01 flg: 0x02 tail: 0x3f130601
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x65403498

Itl          Xid                  Uba        Flag  Lck        Scn/Fsc
0x01  0x0040.026.00167771  0x7203e25c.0f91.02  C---    0  scn 0x0bac.2f26d569
0x02  0x008c.04b.002ae01f  0x7242e607.0089.07  C---    0  scn 0x0bac.2f26d709
0x03  0x0043.04f.00184d78  0x720020a7.e5ff.2f  C---    0  scn 0x0bac.2f28047f
.......
0x30  0x0005.041.001d254f  0x228560f5.c941.0e  --U-    1  fsc 0x0075.2f282d74
........
0x3f  0x008e.012.0029e870  0x22801e5b.0444.06  C---    0  scn 0x0bac.2f26d4de
0x40  0x0005.000.001d261c  0x228560f5.c941.0a  --U-    1  fsc 0x007b.2f282d25
.........
0x61  0x0020.004.000bc447  0x7203ce74.e107.12  C---    0  scn 0x0bac.2f26d66a
0x62  0x0049.04c.001e453a  0x2a42eb96.ffe3.06  C---    0  scn 0x0bac.2f26d647

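LRBA (the low redo block address) is the starting point for recovery. It belongs to the checkpoint mechanism; for details, see: 详解oracle checkpoint (A Detailed Explanation of Oracle Checkpoints)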
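
The addresses in the buffer header can be decoded by hand. The sketch below assumes the usual 32-bit rdba layout (upper 10 bits for the relative file number, lower 22 bits for the block number), which matches the `(405/13464)` printed in the dump. It also assumes that an RBA printed as `[0xSEQ.BLOCK.OFFSET]` holds the log sequence, the block within the log, and the byte offset, all in hex. The function names are hypothetical.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative data block address into (file#, block#)."""
    # Upper 10 bits: relative file number; lower 22 bits: block number.
    return rdba >> 22, rdba & 0x3FFFFF

def parse_rba(text):
    """Parse an RBA printed as [0xSEQ.BLOCK.OFFSET] (hex) into three integers."""
    seq, block, offset = text.strip("[]").split(".")
    return int(seq, 16), int(block, 16), int(offset, 16)

# Values copied from the buffer header above.
file_no, block_no = decode_rdba(0x65403498)
lrba = parse_rba("[0x3e7cf.396e5.0]")
print(file_no, block_no)
print(lrba)
```

Decoding 0x65403498 gives file 405, block 13464, which agrees with the dump.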

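From the ITL list above, every transaction flag is either C (committed) or U (committed, with an upper-bound commit SCN), so all of these transactions have been committed, which indicates that this error really had no impact.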
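
The same flag check can be automated when a block dump has many ITL slots. This is only a sketch: it assumes the column layout of the dump above (Itl, Xid, Uba, a four-character Flag, Lck, Scn/Fsc), and `uncommitted_slots` is a hypothetical helper name.

```python
import re

# A few ITL lines copied from the block dump above.
ITL_LINES = """\
0x01  0x0040.026.00167771  0x7203e25c.0f91.02  C---    0  scn 0x0bac.2f26d569
.......
0x30  0x0005.041.001d254f  0x228560f5.c941.0e  --U-    1  fsc 0x0075.2f282d74
0x40  0x0005.000.001d261c  0x228560f5.c941.0a  --U-    1  fsc 0x007b.2f282d25
0x62  0x0049.04c.001e453a  0x2a42eb96.ffe3.06  C---    0  scn 0x0bac.2f26d647
"""

# Columns: slot, xid, uba, four-character flag, lock count, scn/fsc marker, value.
ITL_RE = re.compile(
    r"^(0x[0-9a-f]{2})\s+(\S+)\s+(\S+)\s+([-CBUT]{4})\s+(\d+)\s+(scn|fsc)\s+(\S+)"
)

def uncommitted_slots(text):
    """Return the ITL slot ids whose flag contains neither C nor U."""
    open_slots = []
    for line in text.splitlines():
        m = ITL_RE.match(line)
        if not m:
            continue  # skips elision markers such as "......."
        slot, flag = m.group(1), m.group(4)
        if "C" not in flag and "U" not in flag:
            open_slots.append(slot)
    return open_slots

print(uncommitted_slots(ITL_LINES))
```

An empty result means that every listed slot carries a C or U flag, which is what the dump above shows.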
