
Local processing job fails when wait=True (default) #5549

@moose-in-australia

Description


PySDK Version

  • PySDK V2 (2.x)
  • PySDK V3 (3.x)

Describe the bug
When running a FrameworkProcessor in local mode with wait=True (the default), the processing job completes successfully. processor.run() then calls wait() on the job, and wait() refreshes the job status through the SageMaker DescribeProcessingJob API. Because the job only exists locally, that call fails with a ClientError ("Could not find requested job").

To reproduce

```python
import os

from sagemaker.core import FrameworkProcessor
from sagemaker.core.helper.session_helper import Session
from sagemaker.core.image_uris import retrieve
from sagemaker.core.shapes import ProcessingInput, ProcessingOutput
from sagemaker.core.shapes import ProcessingS3Input, ProcessingS3Output

region = Session().boto_region_name

# Resolve the scikit-learn processing image for this region
processor_image_uri = retrieve(
    framework="sklearn",
    version="1.4-2",
    region=region
)

processor = FrameworkProcessor(
    image_uri=processor_image_uri,
    role="arn:aws:iam::123456789012:role/DummyRole",
    instance_type="local",  # Local mode
    instance_count=1,
    base_job_name="test-local-processing",
    command=["python3"]
)

# Create a simple test script
os.makedirs("./test_processing", exist_ok=True)
with open("./test_processing/test.py", "w") as f:
    f.write("print('Processing complete')")

# Make sure the local input directory referenced by the file:// URI exists
os.makedirs("/tmp/input", exist_ok=True)

# Run with the default wait=True: the container exits 0, then wait() fails
processor.run(
    code="test.py",
    source_dir="./test_processing",
    inputs=[
        ProcessingInput(
            input_name="data",
            s3_input=ProcessingS3Input(
                s3_uri="file:///tmp/input",
                local_path="/opt/ml/processing/input"
            )
        )
    ],
    outputs=[
        ProcessingOutput(
            output_name="output",
            s3_output=ProcessingS3Output(
                s3_uri="file:///tmp/output",
                local_path="/opt/ml/processing/output"
            )
        )
    ]
)
```

Expected behavior
The processing job should complete successfully in local mode without calling the AWS DescribeProcessingJob API. The wait() method should recognize that the job is local and skip refreshing its status from the service.
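A minimal sketch of that expected behavior, using hypothetical stand-in classes rather than the SDK's own `ProcessingJob` (defined in `sagemaker/core/resources.py`). It assumes the job object can carry an `is_local` flag, which is an illustrative assumption and not a documented field: `wait()` returns immediately for local jobs and only polls the describe call for jobs that exist in the service. The fake client mirrors the reported failure by raising when asked about a job it does not know.

```python
import time
from dataclasses import dataclass
from typing import Dict


class FakeSageMakerClient:
    """Hypothetical stand-in for the DescribeProcessingJob call.

    It only knows about jobs registered in `remote_jobs`, so asking about a
    local-only job raises, like the ValidationException in this report.
    """

    def __init__(self) -> None:
        self.remote_jobs: Dict[str, str] = {}
        self.describe_calls = 0

    def describe_processing_job(self, ProcessingJobName: str) -> dict:
        self.describe_calls += 1
        if ProcessingJobName not in self.remote_jobs:
            raise LookupError(
                f"Could not find requested job with name: {ProcessingJobName}"
            )
        return {"ProcessingJobStatus": self.remote_jobs[ProcessingJobName]}


TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


@dataclass
class SketchProcessingJob:
    """Hypothetical job object; `is_local` is an assumed flag, not an SDK field."""

    name: str
    client: FakeSageMakerClient
    is_local: bool = False
    status: str = "InProgress"

    def refresh(self) -> None:
        # Remote path: ask the service for the current status
        response = self.client.describe_processing_job(ProcessingJobName=self.name)
        self.status = response["ProcessingJobStatus"]

    def wait(self, poll: float = 0.0) -> str:
        if self.is_local:
            # Local job: there is nothing in the service to describe. In this
            # sketch a local job is assumed to have finished during submission.
            return self.status
        # Remote job: poll until a terminal state is reached
        while True:
            self.refresh()
            if self.status in TERMINAL_STATES:
                return self.status
            time.sleep(poll)
```

Without the local check, calling `wait()` on a job that only exists locally reaches `describe_processing_job` and raises, which is the same control flow as the traceback below.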

Screenshots or logs

REDACTED-sagemaker-local exited with code 0
 Compose Stopping Aborting on container exit...
 Container REDACTED-sagemaker-local Stopping
 Container REDACTED-sagemaker-local Stopped

[02/11/26 16:18:37] INFO     ===== Job Complete =====                                                  image.py:248

                    WARNING  No region provided. Using default region.                                 utils.py:340

                    INFO     Runs on sagemaker prod, region:us-east-1                                  utils.py:354

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:30                                                                                   │
│                                                                                                  │
│   27 │   }                                                                                       │
│   28 )                                                                                           │
│   29                                                                                             │
│ ❱ 30 processor.run(                                                                              │
│   31 │   code="preprocessing.py",                                                                │
│   32 │   source_dir="./processing",                                                              │
│   33 │   requirements="requirements.txt",                                                        │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/workflow/pipeline_context.py:346 in       │
│ wrapper                                                                                          │
│                                                                                                  │
│   343 │   │   │                                                                                  │
│   344 │   │   │   return _StepArguments(retrieve_caller_name(self_instance), run_func, *args,    │
│   345 │   │                                                                                      │
│ ❱ 346 │   │   return run_func(*args, **kwargs)                                                   │
│   347 │                                                                                          │
│   348 │   return wrapper                                                                         │
│   349                                                                                            │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/processing.py:1226 in run                 │
│                                                                                                  │
│   1223 │   │   )                                                                                 │
│   1224 │   │                                                                                     │
│   1225 │   │   # Submit a processing job.                                                        │
│ ❱ 1226 │   │   return super().run(                                                               │
│   1227 │   │   │   code=s3_runproc_sh,                                                           │
│   1228 │   │   │   inputs=inputs,                                                                │
│   1229 │   │   │   outputs=outputs,                                                              │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/workflow/pipeline_context.py:346 in       │
│ wrapper                                                                                          │
│                                                                                                  │
│   343 │   │   │                                                                                  │
│   344 │   │   │   return _StepArguments(retrieve_caller_name(self_instance), run_func, *args,    │
│   345 │   │                                                                                      │
│ ❱ 346 │   │   return run_func(*args, **kwargs)                                                   │
│   347 │                                                                                          │
│   348 │   return wrapper                                                                         │
│   349                                                                                            │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/processing.py:843 in run                  │
│                                                                                                  │
│    840 │   │   if not isinstance(self.sagemaker_session, PipelineSession):                       │
│    841 │   │   │   self.jobs.append(self.latest_job)                                             │
│    842 │   │   │   if wait:                                                                      │
│ ❱  843 │   │   │   │   self.latest_job.wait(logs=logs)                                           │
│    844 │                                                                                         │
│    845 │   def _include_code_in_inputs(self, inputs, code, kms_key=None):                        │
│    846 │   │   """Converts code to appropriate input and includes in input list.                 │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/resources.py:143 in wrapper               │
│                                                                                                  │
│     140 │   │   @functools.wraps(func)                                                           │
│     141 │   │   def wrapper(*args, **kwargs):                                                    │
│     142 │   │   │   config = dict(arbitrary_types_allowed=True)                                  │
│ ❱   143 │   │   │   return validate_call(config=config)(func)(*args, **kwargs)                   │
│     144 │   │                                                                                    │
│     145 │   │   return wrapper                                                                   │
│     146                                                                                          │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:39 in               │
│ wrapper_function                                                                                 │
│                                                                                                  │
│    36 │   │                                                                                      │
│    37 │   │   @functools.wraps(wrapped)                                                          │
│    38 │   │   def wrapper_function(*args, **kwargs):                                             │
│ ❱  39 │   │   │   return wrapper(*args, **kwargs)                                                │
│    40 │                                                                                          │
│    41 │   # We need to manually update this because `partial` object has no `__name__` and `__   │
│    42 │   wrapper_function.__name__ = extract_function_name(wrapped)                             │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:136 in __call__     │
│                                                                                                  │
│   133 │   │   if not self.__pydantic_complete__:                                                 │
│   134 │   │   │   self._create_validators()                                                      │
│   135 │   │                                                                                      │
│ ❱ 136 │   │   res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args,   │
│   137 │   │   if self.__return_pydantic_validator__:                                             │
│   138 │   │   │   return self.__return_pydantic_validator__(res)                                 │
│   139 │   │   else:                                                                              │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/resources.py:32145 in wait                │
│                                                                                                  │
│   32142 │   │   │   transient=True,                                                              │
│   32143 │   │   ):                                                                               │
│   32144 │   │   │   while True:                                                                  │
│ ❱ 32145 │   │   │   │   self.refresh()                                                           │
│   32146 │   │   │   │   current_status = self.processing_job_status                              │
│   32147 │   │   │   │   status.update(f"Current status: [bold]{current_status}")                 │
│   32148                                                                                          │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/resources.py:143 in wrapper               │
│                                                                                                  │
│     140 │   │   @functools.wraps(func)                                                           │
│     141 │   │   def wrapper(*args, **kwargs):                                                    │
│     142 │   │   │   config = dict(arbitrary_types_allowed=True)                                  │
│ ❱   143 │   │   │   return validate_call(config=config)(func)(*args, **kwargs)                   │
│     144 │   │                                                                                    │
│     145 │   │   return wrapper                                                                   │
│     146                                                                                          │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:39 in               │
│ wrapper_function                                                                                 │
│                                                                                                  │
│    36 │   │                                                                                      │
│    37 │   │   @functools.wraps(wrapped)                                                          │
│    38 │   │   def wrapper_function(*args, **kwargs):                                             │
│ ❱  39 │   │   │   return wrapper(*args, **kwargs)                                                │
│    40 │                                                                                          │
│    41 │   # We need to manually update this because `partial` object has no `__name__` and `__   │
│    42 │   wrapper_function.__name__ = extract_function_name(wrapped)                             │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:136 in __call__     │
│                                                                                                  │
│   133 │   │   if not self.__pydantic_complete__:                                                 │
│   134 │   │   │   self._create_validators()                                                      │
│   135 │   │                                                                                      │
│ ❱ 136 │   │   res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args,   │
│   137 │   │   if self.__return_pydantic_validator__:                                             │
│   138 │   │   │   return self.__return_pydantic_validator__(res)                                 │
│   139 │   │   else:                                                                              │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/resources.py:32025 in refresh             │
│                                                                                                  │
│   32022 │   │   logger.debug(f"Serialized input request: {operation_input_args}")                │
│   32023 │   │                                                                                    │
│   32024 │   │   client = Base.get_sagemaker_client()                                             │
│ ❱ 32025 │   │   response = client.describe_processing_job(**operation_input_args)                │
│   32026 │   │                                                                                    │
│   32027 │   │   # deserialize response and update self                                           │
│   32028 │   │   transform(response, "DescribeProcessingJobResponse", self)                       │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/botocore/client.py:602 in _api_call                      │
│                                                                                                  │
│    599 │   │   │   │   │   f"{py_operation_name}() only accepts keyword arguments."              │
│    600 │   │   │   │   )                                                                         │
│    601 │   │   │   # The "self" in this scope is referring to the BaseClient.                    │
│ ❱  602 │   │   │   return self._make_api_call(operation_name, kwargs)                            │
│    603 │   │                                                                                     │
│    604 │   │   _api_call.__name__ = str(py_operation_name)                                       │
│    605                                                                                           │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/botocore/context.py:123 in wrapper                       │
│                                                                                                  │
│   120 │   │   │   with start_as_current_context():                                               │
│   121 │   │   │   │   if hook:                                                                   │
│   122 │   │   │   │   │   hook()                                                                 │
│ ❱ 123 │   │   │   │   return func(*args, **kwargs)                                               │
│   124 │   │                                                                                      │
│   125 │   │   return wrapper                                                                     │
│   126                                                                                            │
│                                                                                                  │
│ /opt/conda/lib/python3.12/site-packages/botocore/client.py:1078 in _make_api_call                │
│                                                                                                  │
│   1075 │   │   │   │   'error_code_override'                                                     │
│   1076 │   │   │   ) or error_info.get("Code")                                                   │
│   1077 │   │   │   error_class = self.exceptions.from_code(error_code)                           │
│ ❱ 1078 │   │   │   raise error_class(parsed_response, operation_name)                            │
│   1079 │   │   else:                                                                             │
│   1080 │   │   │   return parsed_response                                                        │
│   1081                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ClientError: An error occurred (ValidationException) when calling the DescribeProcessingJob operation: Could not find requested job with name: from-idea-to-prod-processing-2026-02-11-16-14-54-339
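Until wait() handles local jobs, one interim option is to tolerate exactly this error after a local run. The sketch below assumes the exception carries a botocore-style `response` dict shaped like `{"Error": {"Code": ..., "Message": ...}}`, as `botocore.exceptions.ClientError` does; `is_missing_local_job_error` and `run_tolerating_local_describe` are hypothetical helpers, not SDK functions. Matching on the message text is fragile, and suppressing the error does not by itself confirm that the container succeeded, so still check the container exit code in the logs.

```python
from typing import Callable


def is_missing_local_job_error(exc: BaseException) -> bool:
    """Return True if `exc` looks like the DescribeProcessingJob error in this report.

    Assumes a botocore-style ClientError whose `response` attribute is a dict
    shaped like {"Error": {"Code": ..., "Message": ...}}.
    """
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    error = response.get("Error", {})
    return (
        error.get("Code") == "ValidationException"
        and "Could not find requested job" in error.get("Message", "")
    )


def run_tolerating_local_describe(run: Callable[[], object]) -> None:
    """Call `run()` and swallow only the missing-local-job error (hypothetical helper)."""
    try:
        run()
    except Exception as exc:
        # Re-raise anything that is not the specific describe failure
        if not is_missing_local_job_error(exc):
            raise
```

For example, wrap the call from the reproduction as `run_tolerating_local_describe(lambda: processor.run(code="test.py", source_dir="./test_processing"))`, passing the same `inputs` and `outputs` as in the reproduction.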

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 3.4.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): sklearn (FrameworkProcessor)
  • Framework version: 1.4-2
  • Python version: 3.12
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context
The workaround is to pass wait=False to processor.run() in local mode, which skips the failing DescribeProcessingJob call. However, this is not obvious to users, since wait=True is the default.
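To apply that workaround without maintaining separate call sites for local and cloud runs, a small helper can force wait=False only for local instance types. This is a hypothetical helper, not part of the SDK. It assumes local mode is selected by instance_type "local" (as in this report) or "local_gpu" (the GPU variant in SDK v2 local mode, assumed to carry over to v3).

```python
from typing import Any, Dict

# Instance types assumed to select local mode ("local_gpu" is the SDK v2 GPU variant)
LOCAL_INSTANCE_TYPES = {"local", "local_gpu"}


def run_kwargs_for(instance_type: str, **kwargs: Any) -> Dict[str, Any]:
    """Return keyword arguments for processor.run(), forcing wait=False in local mode.

    Hypothetical helper for the workaround; not part of the SageMaker SDK.
    Non-local instance types keep whatever `wait` the caller passed (or the
    SDK default when none is passed).
    """
    merged = dict(kwargs)
    if instance_type in LOCAL_INSTANCE_TYPES:
        merged["wait"] = False
    return merged
```

Usage: `processor.run(**run_kwargs_for("local", code="test.py", source_dir="./test_processing"))`, adding the same `inputs` and `outputs` as in the reproduction. Keying on the instance type keeps cloud runs on the default wait=True, so the helper can be removed once wait() handles local jobs.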
