
[BUG] - parallel_bulk does not work in AWS lambda #94

Open
Aarif1430 opened this issue Nov 25, 2021 · 12 comments
Labels: bug, good first issue

Comments

@Aarif1430

Aarif1430 commented Nov 25, 2021

OSError: [Errno 38] Function not implemented. I started seeing this error after upgrading to Python 3.9. The cause is that the opensearch-py bulk helper uses the multiprocessing module internally, and Python's multiprocessing.pool.ThreadPool fails as soon as it is created:

Traceback (most recent call last):
  ...
  File "/var/lang/lib/python3.9/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.9/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.9/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
OSError: [Errno 38] Function not implemented
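To check ahead of time whether a given runtime is affected, one option is to try constructing a ThreadPool and catch the failure. This is a minimal sketch; thread_pool_available is a hypothetical helper, not part of opensearch-py or the standard library:

```python
from multiprocessing.pool import ThreadPool


def thread_pool_available() -> bool:
    """Hypothetical probe: True if multiprocessing.pool.ThreadPool can be created here."""
    try:
        # ThreadPool.__init__ builds a multiprocessing SimpleQueue whose Lock is a
        # POSIX semaphore (SemLock); this is the call that fails with [Errno 38].
        pool = ThreadPool(1)
    except (OSError, ImportError):
        # ImportError covers platforms where multiprocessing.synchronize refuses
        # to import at all because sem_open is missing.
        return False
    pool.close()
    pool.join()
    return True
```

On an affected Lambda runtime this would be expected to return False; on a typical Linux host with /dev/shm it normally returns True.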


To Reproduce

Steps to reproduce the behavior:

  1. Deploy an application using opensearch-py==1.0.0 to AWS Lambda
  2. Call opensearch-py's parallel_bulk helper
  3. See the error

Expected behavior
The OpenSearch client should work as it did under Python 3.6.

Plugins
opensearch-py==1.0.0


Host/Environment (please complete the following information):

  • OS: AWS Lambda


dblock transferred this issue from opensearch-project/OpenSearch Nov 25, 2021
Aarif1430 changed the title from "[BUG]" to "[BUG] - OSError: [Errno 38] Function not implemented. I started seeing this error after upgrading to python3.9" Nov 25, 2021
Aarif1430 changed the title from "[BUG] - OSError: [Errno 38] Function not implemented. I started seeing this error after upgrading to python3.9" to "[BUG] - synchronize._semlock in python3.9 breaking multiprocessing.pool in AWS lambda" Nov 26, 2021
Aarif1430 changed the title from "[BUG] - synchronize._semlock in python3.9 breaking multiprocessing.pool in AWS lambda" to "[BUG] - synchronize._semlock in python3.9 breaking multiprocessing.pool in AWS lambda which breaks OpenSearch" Nov 26, 2021
@jasongilman

jasongilman commented Dec 2, 2021

I'm also seeing this error with Python 3.8:

[ERROR] OSError: [Errno 38] Function not implemented
Traceback (most recent call last):
....
  File "/var/task/opensearchpy/helpers/actions.py", line 469, in parallel_bulk
    pool = BlockingPool(thread_count)
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 925, in __init__
    Pool.__init__(self, processes, initializer, initargs)
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/queues.py", line 336, in __init__
    self._rlock = ctx.Lock()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(

axeoman added the bug and needs-triage labels Dec 20, 2021
VijayanB self-assigned this Dec 28, 2021
@wbeckler
Contributor

wbeckler commented Aug 5, 2022

@jasongilman Did you get this error in a lambda or elsewhere?

@jasongilman

@wbeckler It was in a lambda.

@Aarif1430
Author

@jasongilman Yes, it was in AWS Lambda.

@wbeckler
Contributor

Is anyone up for contributing a patch that addresses this issue when /dev/shm isn't available? There's a potential drop-in replacement for the multiprocessing library: https://pypi.org/project/lambda-multiprocessing/

dblock changed the title from "[BUG] - synchronize._semlock in python3.9 breaking multiprocessing.pool in AWS lambda which breaks OpenSearch" to "[BUG] - synchronize._semlock in python3.9 breaking multiprocessing.pool in AWS lambda" Jul 26, 2023
@dblock
Member

dblock commented Jul 26, 2023

At a high level, is this issue about adding Python 3.9 support (starting with CI)?

@saimedhi
Collaborator

saimedhi commented Sep 8, 2023

@Aarif1430 @jasongilman Does the bug still persist?

wbeckler added the good first issue label Sep 19, 2023
@dblock
Member

dblock commented Nov 10, 2023

CI with Python 3.9 was added in #336 and it currently passes. We need a test that reproduces this problem.
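A test along these lines could reproduce the Lambda failure in ordinary CI by making semaphore creation fail. It is a sketch under assumptions: it patches CPython's private _multiprocessing.SemLock (the call in the last frame of the tracebacks in this issue), so it depends on interpreter internals, and simulate_missing_semaphores is a hypothetical helper name. It targets ThreadPool directly so it needs no cluster; in opensearch-py's own suite the same patch could wrap code that consumes parallel_bulk.

```python
import errno
# Import synchronize before patching so its module-level setup runs against
# the real _multiprocessing.SemLock rather than the mock.
import multiprocessing.synchronize  # noqa: F401
from multiprocessing.pool import ThreadPool
from unittest import mock


def simulate_missing_semaphores():
    """Make every multiprocessing lock creation fail with
    [Errno 38] Function not implemented, as it does on AWS Lambda."""
    return mock.patch(
        "_multiprocessing.SemLock",
        side_effect=OSError(errno.ENOSYS, "Function not implemented"),
    )


def test_thread_pool_fails_without_semaphores():
    with simulate_missing_semaphores():
        try:
            ThreadPool(2)
        except OSError as exc:
            assert exc.errno == errno.ENOSYS
        else:
            raise AssertionError("ThreadPool() unexpectedly succeeded")
```

Once a Lambda-safe pool exists, the assertion could be flipped to check that the helper still works under the patch.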

@samuelc-tm

I'm able to reproduce the issue.

Create a Lambda function with the python3.9 runtime and this handler:

import json
from multiprocessing.pool import ThreadPool

def lambda_handler(event, context):
    print("Hello")
    pool = ThreadPool()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Invoking it fails with:

{
  "errorMessage": "[Errno 38] Function not implemented",
  "errorType": "OSError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 6, in lambda_handler\n    pool = ThreadPool()\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/pool.py\", line 927, in __init__\n    Pool.__init__(self, processes, initializer, initargs)\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/pool.py\", line 196, in __init__\n    self._change_notifier = self._ctx.SimpleQueue()\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/context.py\", line 113, in SimpleQueue\n    return SimpleQueue(ctx=self.get_context())\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/queues.py\", line 341, in __init__\n    self._rlock = ctx.Lock()\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/context.py\", line 68, in Lock\n    return Lock(ctx=self.get_context())\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/synchronize.py\", line 162, in __init__\n    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/synchronize.py\", line 57, in __init__\n    sl = self._semlock = _multiprocessing.SemLock(\n"
  ]
}

@dblock
Member

dblock commented Nov 13, 2023

Looking at https://pypi.org/project/lambda-thread-pool/

You cannot use "multiprocessing.Queue" or "multiprocessing.Pool" within a Python Lambda environment because the Python Lambda execution environment does not support shared memory for processes.

This means we need to either get rid of ThreadPool or be able to swap it for LambdaThreadPool where opensearch-py imports it:

from multiprocessing.pool import ThreadPool

For an immediate workaround, you can copy-paste the parallel_bulk implementation, replace BlockingPool with LambdaThreadPool, and see if that works. For something maintainable, I would extract BlockingPool from this implementation by adding an abstract thread pool interface, implement another one for LambdaThreadPool, and add a configuration parameter to specify which thread pool to use. Does anyone want to give either a try?
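The maintainable option could look roughly like the sketch below. It is not opensearch-py code: WorkerPool, run_in_pool and the pool_type values are hypothetical names. To avoid depending on the lambda-thread-pool API, the Lambda-safe implementation swaps in the standard library's concurrent.futures.ThreadPoolExecutor, which uses only threads and never creates a multiprocessing SemLock.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, Dict, Iterable, Iterator, Protocol


class WorkerPool(Protocol):
    """Hypothetical minimal interface a bulk helper would need from a pool."""

    def imap(self, func: Callable[[Any], Any], items: Iterable[Any]) -> Iterator[Any]: ...

    def shutdown(self) -> None: ...


class MultiprocessingWorkerPool:
    """Current behaviour: multiprocessing.pool.ThreadPool (needs POSIX semaphores)."""

    def __init__(self, thread_count: int) -> None:
        self._pool = ThreadPool(thread_count)

    def imap(self, func: Callable[[Any], Any], items: Iterable[Any]) -> Iterator[Any]:
        return self._pool.imap(func, items)

    def shutdown(self) -> None:
        self._pool.close()
        self._pool.join()


class ExecutorWorkerPool:
    """Threads only: ThreadPoolExecutor does not need /dev/shm."""

    def __init__(self, thread_count: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=thread_count)

    def imap(self, func: Callable[[Any], Any], items: Iterable[Any]) -> Iterator[Any]:
        # Executor.map yields results in input order, like ThreadPool.imap.
        return self._executor.map(func, items)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


# Hypothetical configuration parameter: name -> pool implementation.
POOL_TYPES: Dict[str, Callable[[int], WorkerPool]] = {
    "multiprocessing": MultiprocessingWorkerPool,
    "threads": ExecutorWorkerPool,
}


def run_in_pool(
    func: Callable[[Any], Any],
    items: Iterable[Any],
    thread_count: int = 4,
    pool_type: str = "multiprocessing",
) -> Iterator[Any]:
    """Yield func(item) for each item, using the pool selected by pool_type."""
    pool = POOL_TYPES[pool_type](thread_count)
    try:
        yield from pool.imap(func, items)
    finally:
        pool.shutdown()
```

For example, list(run_in_pool(process_chunk, chunks, pool_type="threads")) would run on Lambda, where process_chunk and chunks stand in for whatever the bulk helper sends. One design caveat: Executor.map collects its input immediately and submits every item up front, so this variant applies no backpressure; a real replacement for BlockingPool would probably need to bound the number of in-flight chunks.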

dblock changed the title from "[BUG] - synchronize._semlock in python3.9 breaking multiprocessing.pool in AWS lambda" to "[BUG] - parallel_bulk does not work in AWS lambda" Nov 13, 2023
@dblock
Member

dblock commented Nov 13, 2023

I renamed this to "parallel_bulk doesn't work in AWS lambda". Is there anything else that doesn't work?

@samuelc-tm

Thank you. In my case, ThreadPool is used by an SDK, and changing that wouldn't be ideal. We started seeing the issue when upgrading from Python 3.7 to 3.9. We might just find an alternative to using the SDK.

Development

No branches or pull requests

8 participants