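After the Oracle BAM project went live, the system never ran very stably. The server kept logging errors in the background, and each error was followed by all sorts of failures: ODI would hang, and BAM Reports, DOs (Data Objects), and everything else would refuse to open. This state recurred every so often. At the time I was the only person left on the project. I had to solve the ODI asynchronous execution monitoring problem, fix the system instability, and field constant questioning from the customer, while the project manager phoned every now and then to ask about progress. I was stretched impossibly thin. After two evenings spent studying the low-level execution code of the ODI Agent and the way the Agent runs, I suddenly found a way to execute asynchronously, which was a pleasant surprise. With asynchronous execution solved, the pressure eased a lot, and I could finally concentrate on why the system was unstable. Here is error 1 first: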
Exception Description: The method [getAddressString] on the object [oracle.sdpinternal.messaging.AddressImpl] triggered an exception.
Internal Exception: java.lang.reflect.InvocationTargetException
Target Invocation Exception: java.lang.NullPointerException
Mapping: oracle.toplink.mappings.DirectToFieldMapping[addressString-->ADDRESS.ADDRESS_STRING]
Descriptor: RelationalDescriptor(oracle.sdpinternal.messaging.AddressImpl --> [DatabaseTable(ADDRESS)])
at weblogic.ejb.container.internal.EJBRuntimeUtils.throwEJBException(EJBRuntimeUtils.java:154)
at weblogic.ejb.container.internal.BaseLocalObject.handleSystemException(BaseLocalObject.java:887)
at weblogic.ejb.container.internal.BaseLocalObject.handleSystemException(BaseLocalObject.java:818)
at weblogic.ejb.container.internal.BaseLocalObject.postInvoke1(BaseLocalObject.java:517)
at weblogic.ejb.container.internal.BaseLocalObject.__WL_postInvokeTxRetry(BaseLocalObject.java:455)
at weblogic.ejb.container.internal.SessionLocalMethodInvoker.invoke(SessionLocalMethodInvoker.java:52)
at oracle.sdpinternal.messaging.storage.MessagingStore_urkbp2_ELOImpl.recordStatusUpdate(Unknown Source)
at oracle.sdpinternal.messaging.EngineReceivingCoreBean.processStatus(EngineReceivingCoreBean.java:533)
at sun.reflect.GeneratedMethodAccessor928.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
Error 2:
Caused by: java.lang.NullPointerException
at oracle.sdpinternal.messaging.util.Normalizer.normalizeEmailAddress(Normalizer.java:90)
at oracle.sdpinternal.messaging.AddressImpl.getNormalizedValue(AddressImpl.java:196)
at oracle.sdpinternal.messaging.AddressImpl.toCanonicalString(AddressImpl.java:446)
at oracle.sdpinternal.messaging.AddressImpl.getAddressString(AddressImpl.java:491)
at sun.reflect.GeneratedMethodAccessor839.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at oracle.toplink.internal.descriptors.MethodAttributeAccessor.getAttributeValueFromObject(MethodAttributeAccessor.java:56)
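Read together, the two traces appear to describe one failure: TopLink calls getAddressString reflectively (error 2 ends in Method.invoke and MethodAttributeAccessor), the getter hits a NullPointerException, and the reflective caller reports it wrapped in an InvocationTargetException (error 1). The following is a minimal Java sketch of that mechanism only. FakeAddress and ReflectiveGetterDemo are hypothetical stand-ins written for this post; they are not the internal AddressImpl or TopLink classes, whose source is not shown here.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-in for an address object; NOT the real AddressImpl.
class FakeAddress {
    private final String value;

    FakeAddress(String value) {
        this.value = value;
    }

    // Mimics a getter that normalizes the stored value without a null check.
    public String getAddressString() {
        return "EMAIL:" + value.trim().toLowerCase();
    }
}

class ReflectiveGetterDemo {
    // Calls the getter reflectively, roughly the way an ORM attribute accessor
    // would, and returns the class name of the root cause (or "none").
    static String rootCauseOfGetter(Object target) {
        try {
            Method getter = target.getClass().getMethod("getAddressString");
            getter.setAccessible(true);
            getter.invoke(target);
            return "none";
        } catch (InvocationTargetException e) {
            // The exception thrown inside the getter is carried as the cause.
            return e.getCause().getClass().getName();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootCauseOfGetter(new FakeAddress("User@Example.com")));
        System.out.println(rootCauseOfGetter(new FakeAddress(null)));
    }
}
```

With a non-null value the getter returns normally; with a null value the reflective call surfaces a NullPointerException as the cause, which matches the shape of the two traces above.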
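Troubleshooting process and reasoning:
Looking at the errors, most of them relate to EmailAddress, and the only email-related feature in BAM is Alert, which sends email. So sending an alert is a good starting point for the investigation.
Judging from the log, the exception is caused by an empty ADDRESS.ADDRESS_STRING field. The ADDRESS table is a system table of User Messaging Service (UMS) that records sender and recipient addresses, and it belongs to the ORASDPM schema. After sending one alert, the ADDRESS table contained the data shown in the screenshot below:
This was my first contact with UMS. At first glance at these two records, I could only be sure that the second one held the recipient's email information. What the first, abnormal record was for, I had no idea. There is very little material online about the ADDRESS table, and after repeated testing I guessed that this record should be the sender's email information.
Assuming the abnormal record was the sender's address, I searched further and found the following on Metalink: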
Steps to reproduce:
1) Login to FMW EM console
2) On the left pane, click on User Messaging Service
3) Click on usermessagingserver at the top of the page
4) Go to Message status and choose Successful in Overall status filter
5) Click Search.
Note: the same exception can be seen when selecting soa-infra in Enterprise Manager.
Cause
This issue is due to the presence of NULL values in the VALUE column of the DEV_ORASDPM.ADDRESS table.
This was identified in unpublished Bug 13259585 NPE WHEN GETTING MESSAGE STATUS FROM EM
Solution
1. Identify the records in the "DEV_ORASDPM.ADDRESS" table in which the 'VALUE' (i.e. the email address) is blank. Execute the following SQL command:
SELECT ADDR_ID, TYPE, ADDRESS_STRING, VALUE FROM "DEV_ORASDPM"."ADDRESS" WHERE VALUE IS NULL;
2. Either manually delete these records in the DB in a 'non-production' environment or use the purge script as detailed in Note 1287036.1 - Purging ORASDPM Schema On SOA 11g.
Ultimately please identify and fix the BPEL process that is generating a NULL email address. In most cases this is the email address of the user receiving the notification or email. For example, in an 'Email Activity' the 'To' address is the one populated in the 'VALUE' column.
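The check in step 1 can also be run from code, for example as part of a health check. The sketch below is only an illustration: NullAddressFinder and its methods are names made up for this post, the schema name is passed in because the prefix (DEV_ here) differs per installation, and obtaining the JDBC Connection is left to the caller. Because the query quotes the schema name, it is assumed to be given in the exact case used in the database (normally upper case).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of UMS or BAM.
class NullAddressFinder {
    // Builds the same query as in the note for a given schema, e.g. "DEV_ORASDPM".
    // The name is validated as a plain identifier because it is concatenated
    // into the SQL text.
    static String buildQuery(String schema) {
        if (schema == null || !schema.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected schema name: " + schema);
        }
        return "SELECT ADDR_ID, TYPE, ADDRESS_STRING, VALUE FROM \"" + schema
                + "\".\"ADDRESS\" WHERE VALUE IS NULL";
    }

    // Returns the ADDR_ID of every row whose VALUE is NULL.
    static List<String> findNullAddressIds(Connection conn, String schema) throws SQLException {
        List<String> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(buildQuery(schema));
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getString("ADDR_ID"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Prints the query text only; running it needs a real connection.
        System.out.println(buildQuery("DEV_ORASDPM"));
    }
}
```

An empty result means there are no offending rows; a non-empty one lists the records that step 2 refers to.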
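This information supports the guess: the exception is caused by ADDRESS, and since ADDRESS records both recipient and sender addresses, the blank record is the missing sender information, which is what breaks the system.
Another test confirms this as well. I downloaded the UMS sample usermessagingsample-src.zip from the Oracle website, deployed it to WebLogic on the BAM server, and sent test messages. The result confirmed that the blank row is the sender's data (for the test details, see http://docs.oracle.com/cd/E21764_01/integration.1111/e10224/ns_java_api.htm).
Sender and recipient addresses:
Changes in the ADDRESS table data:
Once the purpose of ADDRESS was clear, the remaining tests could be more targeted: find where UMS sets the sender. I tried every place in OFM Control where a Sender can be set, but the abnormal data was still there. Then I studied the sample code more closely and found this part: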
if (senders != null) {
    message.addAllSenders(MessagingFactory.createAddress(senders));
    // also register as access points, so reply messages will be received.
    for (Address senderAddr : message.getSenders()) {
        synchronized(this) {
            mClient.registerAccessPoint(MessagingFactory.createAccessPoint(senderAddr));
        }
    }
}
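In other words, whenever the sender is not null, an Access Point is registered for the sender, and the corresponding Access Point is visible in the Message Application Client. For Oracle BAM, however, it was blank, so I guessed that Oracle BAM's sender was empty. Checking the Oracle BAM Server Properties, I found that the Outbound Email Account property was indeed empty. I set it, restarted bamServer, and tested the alert again. Hooray, the abnormal data was gone and the Sender record was written correctly. The data was finally normal. Thank heavens.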
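One way to keep this kind of misconfiguration from going unnoticed is to check the sender setting before any alert is sent. The sketch below is a hypothetical pre-flight check, not BAM's or UMS's API: SenderConfigCheck is a name invented for this post, and how the configured value is read from the server is deliberately left out.

```java
// Hypothetical pre-flight check; not part of BAM or UMS.
class SenderConfigCheck {
    // Returns true when a sender value is present and not just whitespace.
    static boolean hasSender(String outboundEmailAccount) {
        return outboundEmailAccount != null && !outboundEmailAccount.trim().isEmpty();
    }

    // Fails fast with a readable message instead of letting an empty sender
    // reach the messaging layer and end up as a blank row.
    static String requireSender(String outboundEmailAccount) {
        if (!hasSender(outboundEmailAccount)) {
            throw new IllegalStateException(
                    "Outbound Email Account is empty; set it before sending alerts");
        }
        return outboundEmailAccount.trim();
    }

    public static void main(String[] args) {
        System.out.println(hasSender(null));
        System.out.println(hasSender("   "));
        System.out.println(requireSender(" bam@example.com "));
    }
}
```

The point of the design is simply that a missing sender produces one clear error at configuration time rather than a NullPointerException deep inside the messaging store later on.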
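With that, the problem was solved.
Summary:
The problem was caused by a configuration mistake, but the consequences were severe: the system was unstable, the customer was unhappy, my boss was anxious, and I was stressed. The problem itself was simple, but getting from locating it to solving it took many twists and turns: documentation was scarce, it was my first time working with UMS, and time was tight. Having solved it, though, I now have a rough idea of where the data in some of the UMS tables is configured, and the integration with BAM is clearer too. The setup is actually quite simple, but the Oracle BAM Outbound Email Account must be set, and the outgoingmail-related properties of the UMS email driver must be set as well. SenderAddress can be left empty.
For other detailed configuration, see Configuring Oracle User Messaging Service in the documentation: http://docs.oracle.com/cd/E21764_01/integration.1111/e10226/bam_config.htm#CEGEFIBH
The project can finally wind down. I am exhausted. Oracle BAM is not complicated, but few people use it, material is scarce, and there is a lot to research and implement, so a single problem usually eats up a lot of time. Worn out in body and mind, I can finally rest. While it is all still fresh, I am writing down some of these problems for my own reference and for others. If you repost this, please remember to link to this article, considering that I stayed up until the early hours writing it :)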