Description
PySDK Version
- PySDK V2 (2.x)
- PySDK V3 (3.x)
Describe the bug
When using LocalSession with FrameworkProcessor, the SDK ignores file:// URIs specified in ProcessingOutput.s3_output.s3_uri and replaces them with S3 URIs. This prevents local processing jobs from saving outputs to local directories as intended.
The issue is in Processor._normalize_outputs() (line 489 in sagemaker-core/src/sagemaker/core/processing.py), which replaces any non-S3 URI with an S3 URI without checking if the session is a LocalSession that should preserve file:// URIs.
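A minimal sketch of the check that seems to be missing, written as a standalone function rather than as a patch to the SDK. The name `resolve_output_uri` and its parameters (`is_local_session`, `default_s3_uri`) are hypothetical and only illustrate the decision; they are not taken from `processing.py`.

```python
from urllib.parse import urlparse


def resolve_output_uri(uri: str, is_local_session: bool, default_s3_uri: str) -> str:
    """Pick the destination URI for a processing output.

    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    scheme = urlparse(uri).scheme
    if scheme == "s3":
        # Already an S3 URI: keep it unchanged.
        return uri
    if scheme == "file" and is_local_session:
        # Local mode: preserve the file:// destination the user asked for.
        return uri
    # Anything else falls back to the generated S3 location.
    return default_s3_uri
```

With a check of this kind, a `file://` URI would only be replaced when the session is not local, which is the behavior this report expects.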
To reproduce
from sagemaker.core import FrameworkProcessor
from sagemaker.core.local import LocalSession
from sagemaker.core.shapes import ProcessingOutput, ProcessingS3Output
from sagemaker.core.image_uris import retrieve
import os
# Create local session
local_session = LocalSession()
# Get processor image
processor_image_uri = retrieve(
    framework="sklearn",
    version="1.4-2",
    region=local_session.boto_region_name
)
# Define outputs with file:// URIs
local_processing_dir = os.path.abspath("processing")
os.makedirs(f"{local_processing_dir}/train", exist_ok=True)
local_processing_outputs = [
    ProcessingOutput(
        output_name="train",
        s3_output=ProcessingS3Output(
            s3_uri=f"file://{local_processing_dir}/train",
            local_path="/opt/ml/processing/output/train",
            s3_upload_mode="EndOfJob")
    )
]
# Create processor with LocalSession
processor = FrameworkProcessor(
    image_uri=processor_image_uri,
    role="arn:aws:iam::123456789012:role/DummyRole",
    instance_type="local",
    instance_count=1,
    sagemaker_session=local_session,
    base_job_name='test-local-processing',
    command=["python3"]
)
# Create a simple processing script
os.makedirs("processing_code", exist_ok=True)
with open("processing_code/test.py", "w") as f:
    f.write("""
import os
with open('/opt/ml/processing/output/train/output.txt', 'w') as f:
    f.write('test output')
print('Processing complete')
""")
# Run processor
processor.run(
    code="test.py",
    source_dir="./processing_code",
    outputs=local_processing_outputs,
    wait=False,
    logs=True
)
# Check where outputs went
print(f"Expected output location: {local_processing_dir}/train/output.txt")
print(f"File exists locally: {os.path.exists(f'{local_processing_dir}/train/output.txt')}")
Expected behavior
When using LocalSession with file:// URIs in ProcessingOutput, the outputs should be saved to the specified local directories, not uploaded to S3.
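To make the expected result easy to verify, here is a small sketch that maps a `file://` URI back to a filesystem path so the output directory can be inspected after the job. The helper name `file_uri_to_path` is hypothetical, and it assumes POSIX-style absolute paths like the ones built in the reproduction script.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def file_uri_to_path(uri: str) -> Path:
    """Convert a file:// URI into a local Path (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URI: {uri}")
    # Decode percent-escapes such as %20 before building the path.
    return Path(unquote(parsed.path))
```

For example, `file_uri_to_path(f"file://{local_processing_dir}/train") / "output.txt"` gives the file that should exist once the fix is in place.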
Screenshots or logs
N/A
System information
- SageMaker Python SDK version: 3.4.0 (sagemaker-core 2.4.0)
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): scikit-learn
- Framework version: 1.4-2
- Python version: 3.10
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Additional context
N/A