History | Log In     View a printable version of the current page. Get help!  
Issue Details (XML | Word)

Key: JBPAPP-402
Type: Bug Bug
Status: Resolved Resolved
Resolution: Done
Priority: Blocker Blocker
Assignee: Ron Sigal
Reporter: Carlo de Wolf
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
JBoss Enterprise Platform App Edition

In certain situations createQueueConnection can hang

Created: 05/Nov/07 05:56 AM   Updated: 19/Dec/07 10:36 AM
Component/s: App Server
Affects Version/s: None
Fix Version/s: 4.3.0.GA
Security Level: Public (Everyone can see)

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: 1. Java Source File BisocketClientInvoker.java (20 kb)
2. Text File dump.log (115 kb)
3. Java Archive File jboss-remoting.jar (838 kb)
4. Java Archive File jboss-remoting.jar (839 kb)
5. Java Archive File jboss-remoting.jar (838 kb)
6. Text File thread-dump.log (131 kb)

Issue Links:
Dependency
This issue depends:
JBREM-873 Have ServerInvokerCallbackHandler reg... Major Closed
JBREM-845 Infinite loop in BisocketClientInvoke... Blocker Closed
This issue is a dependency of:
JBPAPP-416 QE blockers for EAP4.3 Blocker Closed

Affects: Release Notes


 Description  « Hide
Under certain conditions in JBoss Remoting an infinite loop can occur which causes createQueueConnection to hang.

    [junit] "main" prio=1 tid=0x000000004011a5e0 nid=0x33b9 runnable [0x00007fff3c82b000..0x00007fff3c82d280]
    [junit] at java.net.SocketInputStream.socketRead0(Native Method)
    [junit] at java.net.SocketInputStream.read(SocketInputStream.java:129)
    [junit] at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    [junit] at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
    [junit] - locked <0x00002b467be64b60> (a java.io.BufferedInputStream)
    [junit] at java.io.FilterInputStream.read(FilterInputStream.java:66)
    [junit] at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.readVersion(MicroSocketClientInvoker.java:1000)
    [junit] at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:574)
    [junit] at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.transport(BisocketClientInvoker.java:413)
    [junit] at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:122)
    [junit] at org.jboss.remoting.Client.invoke(Client.java:1634)
    [junit] at org.jboss.remoting.Client.invoke(Client.java:548)
    [junit] at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.org$jboss$jms$client$delegate$ClientConnectionFactoryDelegate$createConnectionDelegate$aop(ClientConnectionFactoryDelegate.java:167)
    [junit] at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate$createConnectionDelegate_N3019492359065420858.invokeNext(ClientConnectionFactoryDelegate$createConnectionDelegate_N3019492359065420858.java)
    [junit] at org.jboss.jms.client.container.StateCreationAspect.handleCreateConnectionDelegate(StateCreationAspect.java:83)
    [junit] at org.jboss.aop.advice.org.jboss.jms.client.container.StateCreationAspect0.invoke(StateCreationAspect0.java)
    [junit] at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate$createConnectionDelegate_N3019492359065420858.invokeNext(ClientConnectionFactoryDelegate$createConnectionDelegate_N3019492359065420858.java)
    [junit] at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.createConnectionDelegate(ClientConnectionFactoryDelegate.java)
    [junit] at org.jboss.jms.client.JBossConnectionFactory.createConnectionInternal(JBossConnectionFactory.java:205)
    [junit] at org.jboss.jms.client.JBossConnectionFactory.createQueueConnection(JBossConnectionFactory.java:101)
    [junit] at org.jboss.jms.client.JBossConnectionFactory.createQueueConnection(JBossConnectionFactory.java:95)
    [junit] at org.jboss.ejb3.test.strictpool.unit.StrictPoolUnitTestCase.testMessageDriven(StrictPoolUnitTestCase.java:87)

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order:
Carlo de Wolf [05/Nov/07 06:01 AM]
Server side:
"Thread-41" prio=1 tid=0x00002aaaac3cd0b0 nid=0x2e87 in Object.wait() [0x0000000048fe3000..0x0000000048fe3d80]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002b7156e2f410> (a java.util.HashSet)
        at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.createSocket(BisocketClientInvoker.java:458)
        - locked <0x00002b7156e2f410> (a java.util.HashSet)
        at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.getConnection(MicroSocketClientInvoker.java:815)
        at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:525)
        at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.transport(BisocketClientInvoker.java:413)
        at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:122)
        at org.jboss.remoting.Client.invoke(Client.java:1634)
        at org.jboss.remoting.Client.invoke(Client.java:548)
        at org.jboss.remoting.Client.invokeOneway(Client.java:598)
        at org.jboss.remoting.callback.ServerInvokerCallbackHandler.handleCallback(ServerInvokerCallbackHandler.java:815)
        at org.jboss.remoting.callback.ServerInvokerCallbackHandler.handleCallbackOneway(ServerInvokerCallbackHandler.java:686)
        at org.jboss.jms.server.endpoint.ServerSessionEndpoint.performDelivery(ServerSessionEndpoint.java:1490)
        - locked <0x00002b71569e25e8> (a org.jboss.remoting.transport.bisocket.BisocketClientInvoker)
        at org.jboss.jms.server.endpoint.ServerSessionEndpoint.handleDelivery(ServerSessionEndpoint.java:1375)
        - locked <0x00002b71569dfd40> (a org.jboss.jms.server.endpoint.ServerSessionEndpoint)
        at org.jboss.jms.server.endpoint.ServerConsumerEndpoint.handle(ServerConsumerEndpoint.java:307)
        - locked <0x00002b71569ed6b0> (a java.lang.Object)
        at org.jboss.messaging.core.impl.RoundRobinDistributor.handle(RoundRobinDistributor.java:119)
        at org.jboss.messaging.core.impl.MessagingQueue$DistributorWrapper.handle(MessagingQueue.java:582)
        at org.jboss.messaging.core.impl.ClusterRoundRobinDistributor.handle(ClusterRoundRobinDistributor.java:79)
        at org.jboss.messaging.core.impl.ChannelSupport.deliverInternal(ChannelSupport.java:476)
        at org.jboss.messaging.core.impl.MessagingQueue.deliverInternal(MessagingQueue.java:505)
        at org.jboss.messaging.core.impl.ChannelSupport.deliver(ChannelSupport.java:226)
        - locked <0x00002b71565d3d50> (a java.lang.Object)
        at org.jboss.jms.server.endpoint.ServerSessionEndpoint$2.run(ServerSessionEndpoint.java:1598)
        at EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:89)
        at java.lang.Thread.run(Thread.java:595)

Note that this also blocks a server shutdown.

Alex Pinkin [27/Nov/07 10:30 AM]
problem reproduced. fix being worked on. ETA 11-28

Tim Fox [27/Nov/07 10:58 AM]
The underlying problem here is that when using a client pool size > 1 with JBoss Remoting invocations can arrive in a different order to which they were sent.

We cannot just set max pool size to 1, since the current remoting pool implementation sleeps for 1 second if a socket is not available -p this would cripple performance.

The current workaround for this was flawed and causing the deadlock.

I have implemented a better workaround, and am now running the tests.

Tim Fox [29/Nov/07 11:45 AM]
As per the AS list thread, I have partially re-implemented the remoting socket transport client pool code to use a fair semaphore.

This needs to be reviewed and implemented and tested properly by the remoting team.

Note that this is not a problem only for JBM, this would affect any other users of the remoting socket family of transports, giving random pauses of around 1 second between invocations when number of concurrent invocations is approx equal to max client pool size.

Ron Sigal [06/Dec/07 05:46 PM]
I've applied Tim's fix and Remoting 2.2.2.SP3 is available.

Tim is about to release JBM 1.4.0.SP2.

Carlo, if you verify that org.jboss.ejb3.test.strictpool.unit.StrictPoolUnitTestCase passes with JBM 1.4.0.SP2 no longer hangs, I'll close this issue. Or you could close it.

Ron Sigal [08/Dec/07 01:06 AM]
I'm attaching a jboss-remoting.jar in which org.jboss.remoting.transport.bisocket.BisocketClientInvoker.createSocket() will throw an exception if PingTImerTask fails.

Clebert Suconic [09/Dec/07 01:39 AM]
The lock still happening with the remoting attached. See dump.log

Ron Sigal [09/Dec/07 02:30 AM]
Attaching the version of BisocketClientInvoker in the attached jboss-remoting.jar.

Ron Sigal [10/Dec/07 01:54 AM]
I've attached a new jboss-remoting.jar. In this version, a BisocketClientInvoker registers itself as a ConnectionListener, and when it is notified of the failure of the connection to its client, it will set the flag that breaks the BisocketClientInvoker.createSocket() loop. That should break the deadlock, and the shutdown should proceed normally.

Clebert Suconic [10/Dec/07 09:48 AM]
As far as I remember, the ConnectionListener doesn't guarantee ordering. doest it?
What if the JBM listener is called first?

The Remote call to the ConnectionListener is synchronous, right?

I believe you have to make sure the Bisocket listener is caleld first, right?

Ron Sigal [11/Dec/07 02:12 AM]
"I believe you have to make sure the Bisocket listener is caleld first, right?"

Clebert, great minds think alike. In this case, I mean yours and Tim's.

Ron Sigal [11/Dec/07 02:17 AM]
Tim supplied a "rough" patch to Remoting that was very nearly perfect as is. I've polished it and tested it, and am attaching the resulting jboss-remoting.jar.

As soon as I hear that the problem is solved, I'll check in the changes and tag a Remoting 2.2.2.SP4.

Carlo de Wolf [11/Dec/07 04:43 AM]
I see a hang for about 30 seconds and then cleanup continues. Like I said earlier this is acceptable to me. Please confirm this behavior.

I'm now doing an EJB 3 test suite run with it. If we have no regression there, I vote commit.

Carlo de Wolf [11/Dec/07 05:47 AM]
No regression, I vote commit.

Ron Sigal [11/Dec/07 05:06 PM]
The delay occurs while the Remoting lease expires.

Fernando Nasser [17/Dec/07 12:33 PM]
Remoting and Messaging have been upgraded to releases that are supposed to fix this.