ORA-24798: cannot resume the distributed transaction branch on another instance (singleton services)

The database has a single service defined with two preferred RAC instances. The service has the DTP flag set and load balancing enabled. Because only one database service is defined, only one Tuxedo group is defined. This Tuxedo group contains two servers that perform a nested service call. The initiating Tuxedo parent service starts an XA transaction, connects to the database, and suspends the transaction. The parent service then calls the Tuxedo child service, which connects to the database and tries, but fails, to start an XA transaction with the join flag set. The failure occurs when the database service request sent by the Tuxedo child service is routed to a different RAC instance than the request sent by the Tuxedo parent service.

SOLUTION

The solution is to use the recommended configuration for Oracle 10gR2 RAC. Define multiple versions of the database service, each with a single preferred RAC instance. Change the Tuxedo configuration accordingly so that one Tuxedo group is defined for each database service. Tuxedo then does the load balancing across these database services.

1. Create two versions of the "myservice" database service, myserviceA and myserviceB, each with one preferred and one available RAC instance:

srvctl add service -d dbrac -s myserviceA -r dbrac1 -a dbrac2
srvctl add service -d dbrac -s myserviceB -r dbrac2 -a dbrac1



2. Set the DTP flag on both services (NOTE: the 11.1 documentation says this is not required, but that was not tested):

srvctl modify service -d dbrac -s myserviceA -x TRUE
srvctl modify service -d dbrac -s myserviceB -x TRUE


3. Change the tnsnames.ora to have two copies of myservice with SERVICE_NAME set to "myserviceA" and "myserviceB".
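
As a rough illustration of step 3, the Python sketch below renders two tnsnames.ora entries. The host names and port are hypothetical placeholders, and using the alias names myserviceA and myserviceB is an assumption inferred from the Sqlnet= values in the OPENINFO strings of step 4. Adapt the address list to the real listener addresses before using the output.

```python
# Hypothetical listener addresses; replace with the real VIP or SCAN hosts.
HOSTS = ["rac1-vip.example.com", "rac2-vip.example.com"]
PORT = 1521


def tns_entry(alias, service_name, hosts=HOSTS, port=PORT):
    """Render one tnsnames.ora entry for the given alias and SERVICE_NAME."""
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))"
        for h in hosts
    )
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS_LIST =\n"
        f"{addresses}\n"
        f"    )\n"
        f"    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service_name})\n"
        f"    )\n"
        f"  )\n"
    )


def render_tnsnames(services=("myserviceA", "myserviceB")):
    """Render one entry per database service, alias equal to service name."""
    return "\n".join(tns_entry(name, name) for name in services)


if __name__ == "__main__":
    print(render_tnsnames())
```

Both entries share the same address list; only SERVICE_NAME differs, so the choice of instance is left to the service definition created in step 1.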

4. Create two Tuxedo groups, one for each database service:

*GROUPS
"DBGROUP1"     LMID=SIMPTEST       GRPNO=1
OPENINFO="Oracle_XA:Oracle_XA+Sqlnet=myserviceA+Acc=P/xxx01/mymach+SesTm=30"
"DBGROUP2"     LMID=SIMPTEST       GRPNO=2
OPENINFO="Oracle_XA:Oracle_XA+Sqlnet=myserviceB+Acc=P/xxx01/mymach+SesTm=30"


NOTE: The number of application servers duplicated across the Tuxedo groups was kept to a minimum by applying the group duplication only to the Tuxedo application servers involved in the nested service call.

SYMPTOMS

In line with the documentation, GLOBAL_TXN_PROCESSES is set to a value higher than 0 (the default is 1) on all instances of the RAC database, and your XA application connects to the database through a service that has not been defined as a DTP service.

The service used by the XA application is not a singleton service, so its connections are load balanced across instances.

Your XA application has created a transaction branch while connected to the database and has then detached from this branch. When a different resource manager or XA process tries to attach to this transaction branch, it may fail with the error:

ORA-24798: cannot resume the distributed transaction branch on another instance

CHANGES

No errors occur when detaching from the transaction branch.

The error appears if the resource manager trying to resume or join the transaction branch is connected to a different instance than the resource manager that created the branch.

CAUSE

Releases 11g and 12.1 maintain a restriction on resuming or joining transaction branches in RAC environments, as the documentation states:

Oracle Database Development Guide 12c Release 1 (12.1)
 Chapter 19 - Developing Applications with Oracle XA
   Section - Oracle XA Issues and Restrictions
       Subsection - Using Oracle XA with Oracle Real Application Clusters (Oracle RAC)

" A different case is when multiple instances operate on a single transaction
branch. For example, assume that a single transaction lands on Node 1 and
Node 2 as follows:

Node 1

1. xa_start
2. SQL operations
3. xa_end (SUSPEND)

Node 2

1. xa_start (RESUME)
2. xa_prepare
3. xa_commit
4. xa_end

In the immediately preceding sequence, Oracle Database returns an error
because Node 2 must not resume a branch that is physically located on a different node (Node 1)."
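
The quoted sequence can be pictured with a small toy model in Python. This is only a sketch of the rule, not Oracle's implementation: the ToyRac class, its method names, and the exception are all hypothetical, and the model records nothing beyond the node on which a branch was created.

```python
class BranchOnAnotherInstance(Exception):
    """Toy stand-in for ORA-24798; not an Oracle client exception."""


class ToyRac:
    """Tracks which node hosts each transaction branch."""

    def __init__(self):
        self.home = {}  # xid -> node that created the branch

    def xa_start(self, node, xid, resume=False):
        if not resume:
            self.home[xid] = node  # a new branch lives where it was started
            return "XA_OK"
        if self.home[xid] != node:
            # The branch is physically located on another node.
            raise BranchOnAnotherInstance(
                f"ORA-24798: branch {xid!r} is on {self.home[xid]}, not {node}"
            )
        return "XA_OK"

    def xa_end(self, node, xid):
        # Suspending leaves the branch on its home node.
        return "XA_OK"


def run_sequence(start_node, resume_node):
    """Start and suspend on one node, then try to resume on another."""
    rac = ToyRac()
    rac.xa_start(start_node, "xid-1")
    rac.xa_end(start_node, "xid-1")  # xa_end (SUSPEND)
    try:
        return rac.xa_start(resume_node, "xid-1", resume=True)
    except BranchOnAnotherInstance as exc:
        return str(exc)


if __name__ == "__main__":
    print(run_sequence("node1", "node1"))
    print(run_sequence("node1", "node2"))
```

In this model the resume succeeds only when start_node and resume_node are the same, which is the affinity that a singleton service enforces by construction.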

Resuming or joining transaction branches in a RAC environment requires that the XA application connect to the database through a singleton service, such as a DTP service, as described in:

Oracle Real Application Clusters Administration and Deployment Guide 12c Release 1 (12.1)
 Chapter 5 - Workload Management with Dynamic Database Services
   Section - Distributed Transaction Processing in Oracle RAC
      Subsection - Overview of XA Transactions and DTP Services


"XA affinity (placing all branches of the same XA transaction at the same Oracle RAC
instance) is a requirement when suspending and resuming the same XA branch or if
using savepoints across branches."

SOLUTION

There are several ways to avoid this restriction:

1) Change your application so that it does not resume or join transaction branches.

2) Modify your XA application to track the instance on which each transaction branch was created, so that the branch is resumed only by a resource manager connected to that instance.

3) Use a singleton service, such as a DTP service, so that all connections go to the same instance and the problem cannot occur.
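
Option 2 could be approached along the lines of the Python sketch below. It is only an illustration under stated assumptions: the BranchRegistry class and its methods are hypothetical, the instance names and connect aliases are borrowed from the srvctl example earlier in this note, and the mapping is valid only while each service is running on its preferred instance. A process can find out which instance it is connected to with SELECT SYS_CONTEXT('USERENV', 'INSTANCE_NAME') FROM DUAL.

```python
class BranchRegistry:
    """Remembers which instance created each transaction branch."""

    def __init__(self, alias_by_instance):
        # Hypothetical mapping: instance name -> connect alias that reaches it.
        self.alias_by_instance = dict(alias_by_instance)
        self._home = {}  # xid -> instance name

    def record(self, xid, instance):
        """Call after xa_start has created the branch on `instance`."""
        if instance not in self.alias_by_instance:
            raise KeyError(f"no connect alias known for instance {instance!r}")
        self._home[xid] = instance

    def alias_for_resume(self, xid):
        """Return the alias a process must use to resume or join `xid`."""
        return self.alias_by_instance[self._home[xid]]

    def forget(self, xid):
        """Drop the entry once the transaction has committed or rolled back."""
        self._home.pop(xid, None)


# Assumed layout: each alias reaches exactly one instance.
registry = BranchRegistry({"dbrac1": "myserviceA", "dbrac2": "myserviceB"})
registry.record("xid-42", "dbrac2")
print(registry.alias_for_resume("xid-42"))
```

The registry would have to be shared between the process that creates the branch and the process that resumes it, for example by passing the instance name along with the service request. How that is done depends on the application and is not covered here.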
