
Block when executing websocket's function #4329

@michaelchao

Description


A blocking stall occurs when I use a WebSocket. I will describe the cause directly, as follows:

This function is called whenever the WebSocket is used:
...
int getNewMode(SocketImpl* sockImpl, int mode)
{
	ScopedLock lock(_mutex); // contends with the _mutex held by PollSet::poll()
	auto it = _socketMap.find(sockImpl);
	if (it != _socketMap.end())
		mode |= it->second.second;
	return mode;
}

Note the ScopedLock on _mutex. At the same time, another thread is executing the reactor's poll loop:

void SocketReactor::run()
{
	_pThread = Thread::current();
	while (!_stop)
	{
		try
		{
			if (!hasSocketHandlers())
			{
				onIdle();
				Thread::trySleep(static_cast<long>(_timeout.totalMilliseconds()));
			}
			else
			{
				bool readable = false;
				PollSet::SocketModeMap sm = _pollSet.poll(_timeout);
...................

Inside _pollSet.poll():

PollSet::SocketModeMap poll(const Poco::Timespan& timeout)
	{
		PollSet::SocketModeMap result;
		Poco::Timespan remainingTime(timeout);
		int rc;

		ScopedLock lock(_mutex);
		do
		{
			Poco::Timestamp start;
			rc = epoll_wait(_epollfd, &_events[0],
				static_cast<int>(_events.size()), static_cast<int>(remainingTime.totalMilliseconds()));
.......

The same _mutex is taken here, and it is then held across epoll_wait(), which in the worst case blocks for remainingTime.totalMilliseconds(). So the lock is held for the entire poll timeout.
On my machine the result is: the poll thread holds the lock almost continuously and reacquires it immediately after releasing it, so the getNewMode() thread can never acquire _mutex, even though the poll thread does unlock between iterations.

To fix this issue, here is my suggestion:

.......
                // ScopedLock lock(_mutex); -- moved down to the $$$$1$$$$ position below
		do
		{
			Poco::Timestamp start;
			rc = epoll_wait(_epollfd, &_events[0],
				static_cast<int>(_events.size()), static_cast<int>(remainingTime.totalMilliseconds()));
			if (rc == 0) {
				return result;
			}

			// if we are hitting the events limit, resize it; even without resizing, the subsequent
			// calls would round-robin through the remaining ready sockets, but it's better to give
			// the call enough room once we start hitting the boundary
			if (rc >= _events.size()) _events.resize(_events.size()*2);
			else if (rc < 0)
			{
				// if interrupted and there's still time left, keep waiting
				if (SocketImpl::lastError() == POCO_EINTR)
				{
					Poco::Timestamp end;
					Poco::Timespan waited = end - start;
					if (waited < remainingTime)
					{
						remainingTime -= waited;
						continue;
					}
				}
				else SocketImpl::error();
			}
		}
		while (false);
		ScopedLock lock(_mutex); // $$$$1$$$$ -- lock is now taken only after epoll_wait() returns
		for (int i = 0; i < rc; i++)
......

I checked the code and found that _mutex only protects _socketMap, so I think this approach should be safe.
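The narrowed critical section can be sketched in isolation (hypothetical names, not the actual POCO sources): the blocking wait runs with the mutex released, and the lock is taken only for the short map processing afterwards, so other threads can interleave.

```cpp
// Sketch of the proposed pattern (hypothetical helpers, not POCO code):
// perform the blocking wait without the mutex, then lock only while
// touching the shared map, mirroring the move of ScopedLock past epoll_wait().
#include <map>
#include <mutex>
#include <vector>

std::mutex g_mutex;             // protects g_socketMap only
std::map<int, int> g_socketMap; // fd -> mode, stands in for _socketMap

// Pretend blocking call; in PollSet::poll() this role is played by epoll_wait().
std::vector<int> blockingWait() { return {1, 2}; }

std::map<int, int> pollOnce()
{
    // 1) Block with the mutex *released*: getNewMode() can run meanwhile.
    std::vector<int> ready = blockingWait();

    // 2) Lock only for the short map lookup, i.e. the $$$$1$$$$ position.
    std::map<int, int> result;
    std::lock_guard<std::mutex> lock(g_mutex);
    for (int fd : ready)
    {
        auto it = g_socketMap.find(fd);
        if (it != g_socketMap.end()) result[fd] = it->second;
    }
    return result;
}
```

This is safe only as long as _mutex really guards nothing but the map, which is what the issue claims after inspecting the code.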

To Reproduce
I tested on multiple platforms; the results are:

  • Android: OK
  • Ubuntu 22.04: reproduces every time

Expected behavior
getNewMode() should acquire _mutex promptly instead of blocking for up to the full poll timeout.

Logs
None

Screenshots
None

Please add relevant environment information:

  • OS Type and Version: Ubuntu 22.04
  • POCO Version: 1.12.4
