Use std::align_alloc in file_data_loader #10660

Open
wants to merge 1 commit into
base: main

@lucylq lucylq commented May 2, 2025

Summary:
Issue with aligned buffers: P1800967583

The requested alignment is 16, and alignof(std::max_align_t) is also 16, so we do not pad the allocation size to meet the alignment.

However, the buffer we get back from malloc is aligned to 8, not 16. When we then try to align the buffer to 16, we run past the end of the original buffer (as it wasn't padded) and error out.

Seems like malloc is not guaranteed to return 8- or 16-byte-aligned buffers for small allocations, so this is also a bit hard to test definitively. So far we've only seen it when the buffer size is small (sizes 2 and 4). From the malloc(3) man page:

The malloc(), calloc(), realloc(), and reallocarray() functions
return a pointer to the allocated memory, which is suitably
aligned for any type that fits into the requested size or less.
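
To make the failure mode concrete, here is a minimal sketch. It assumes the pointer is aligned with something like std::align, which may not match what file_data_loader.cpp actually does, and `try_align`, `storage` and `fits_after_alignment` are hypothetical names. A static array stands in for the malloc result so that the "aligned to 8, not 16" case is deterministic.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: tries to align `ptr` to `alignment` within `space`
// bytes while leaving room for `size` bytes. Returns nullptr if it cannot fit.
void* try_align(void* ptr, std::size_t space, std::size_t size,
                std::size_t alignment) {
  return std::align(alignment, size, ptr, space);
}

// Backing storage standing in for a malloc'd block (assumption for the demo).
alignas(16) unsigned char storage[64];

// Returns true if a 4-byte payload still fits after aligning to 16, given
// `allocated` bytes that start at an address aligned to 8 but not to 16.
bool fits_after_alignment(std::size_t allocated) {
  void* misaligned = storage + 8;  // 8-aligned, not 16-aligned
  return try_align(misaligned, allocated, /*size=*/4, /*alignment=*/16) !=
      nullptr;
}
```

With only the 4 requested bytes available, aligning forward by 8 leaves no room for the payload, so `fits_after_alignment(4)` is false. Padding the allocation by `alignment - 1` bytes, as in `fits_after_alignment(4 + 15)`, makes it fit.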

Use std::aligned_alloc (C++17) to ensure buffer is aligned.
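
For reference, a minimal sketch of an allocation through std::aligned_alloc. The helper name `alloc_aligned` is hypothetical and this is not the code from the PR. It rounds the size up to a multiple of the alignment, which older revisions of the C standard required, and it assumes a toolchain that provides std::aligned_alloc (MSVC, for example, does not).

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: returns at least `size` bytes aligned to `alignment`
// (assumed to be a power of two). Release the result with std::free.
void* alloc_aligned(std::size_t size, std::size_t alignment) {
  // Round the size up to a multiple of the alignment.
  std::size_t padded = (size + alignment - 1) / alignment * alignment;
  if (padded == 0) {
    padded = alignment;  // avoid a zero-sized request
  }
  return std::aligned_alloc(alignment, padded);
}
```

A call such as `alloc_aligned(2, 16)` covers the small-buffer case described above; the returned pointer is released with `std::free`.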

Differential Revision: D74041198

pytorch-bot bot commented May 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10660

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit f4391a7 with merge base be2dda7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label May 2, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D74041198

@lucylq lucylq changed the title from "Fix alignment in file_data_loader" to "Use std::align_alloc in file_data_loader" May 2, 2025
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 2, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from e3b1227 to 96bf3d8 May 2, 2025 17:54

@larryliu0820 larryliu0820 left a comment

Is it possible to add some tests?

lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10660

Reviewed By: larryliu0820

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from 96bf3d8 to 89b0d90 May 2, 2025 17:58

lucylq commented May 2, 2025

> Is it possible to add some tests?

Alignment is already covered by the file_data_loader tests:

EXPECT_ALIGNED(fb->data(), alignment());

I think we shouldn't hit this error now that we've moved to aligned_alloc (hopefully). The main thing is probably to make sure OSS CI passes on macOS.

Are you thinking of a different test though?
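
For context, a minimal sketch of what an alignment assertion like the one above boils down to. `is_aligned` is a hypothetical stand-in, not the actual definition behind EXPECT_ALIGNED.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an alignment check: the address must be a
// multiple of `alignment`, which is assumed to be a power of two.
bool is_aligned(const void* ptr, std::size_t alignment) {
  const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  return (addr & (alignment - 1)) == 0;
}
```

A pointer that is aligned to 8 but not to 16 passes `is_aligned(ptr, 8)` and fails `is_aligned(ptr, 16)`, which is the case reported in the summary.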

lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 2, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from 89b0d90 to 2654ddc May 2, 2025 18:20
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
Summary:

Reviewed By: larryliu0820, mcr229

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from 2654ddc to bcab1f2 May 3, 2025 00:04
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from bcab1f2 to eb2794b May 3, 2025 00:19
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from eb2794b to 3ca3e07 May 3, 2025 00:23
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
Summary:
For systems that do not have aligned_alloc (or posix_memalign), fall back to malloc.

The malloc fallback is similar to what file_data_loader.cpp does, except that we do not have a custom free function from FreeableBuffer. Instead, we store an offset just before the aligned pointer so that we can recover and free the original buffer.

1. Allocate via malloc: buffer = malloc(size + sizeof(uint16_t) + alignment - 1)
- size: the size requested.
- sizeof(uint16_t): a place to store the offset between the aligned buffer and the original pointer.
- alignment - 1: extra padding to allow for alignment.
2. Align (buffer + sizeof(uint16_t)) to alignment. This (usually) pushes the buffer start forward by `alignment` bytes.
3. Store the difference (aligned_ptr - buffer) in the two bytes just before aligned_ptr.

The memory will look like this:
| buffer start | maybe padding | offset (aligned_buffer - buffer) | aligned_buffer start | maybe padding | buffer end |

The aligned buffer should be preceded by at least offset_size (2) bytes and at most one alignment-sized block.

https://embeddedartistry.com/blog/2017/02/22/generating-aligned-memory/
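
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in this PR: the names `fallback_aligned_alloc` and `fallback_aligned_free` are hypothetical, `alignment` is assumed to be a power of two, and it is assumed to be small enough that the stored offset fits in a uint16_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical name; a sketch of the malloc-based fallback described above.
void* fallback_aligned_alloc(std::size_t alignment, std::size_t size) {
  constexpr std::size_t kOffsetSize = sizeof(std::uint16_t);
  // Step 1: over-allocate for the offset slot and worst-case padding.
  auto* buffer = static_cast<unsigned char*>(
      std::malloc(size + kOffsetSize + alignment - 1));
  if (buffer == nullptr) {
    return nullptr;
  }
  // Step 2: align (buffer + offset slot) up to `alignment`.
  auto addr = reinterpret_cast<std::uintptr_t>(buffer) + kOffsetSize;
  addr = (addr + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
  auto* aligned = reinterpret_cast<unsigned char*>(addr);
  // Step 3: store the distance back to `buffer` just before `aligned`.
  // memcpy avoids an unaligned uint16_t store.
  const auto offset = static_cast<std::uint16_t>(aligned - buffer);
  std::memcpy(aligned - kOffsetSize, &offset, kOffsetSize);
  return aligned;
}

// Hypothetical name; reads the stored offset and frees the original block.
void fallback_aligned_free(void* ptr) {
  if (ptr == nullptr) {
    return;
  }
  auto* aligned = static_cast<unsigned char*>(ptr);
  std::uint16_t offset = 0;
  std::memcpy(&offset, aligned - sizeof(offset), sizeof(offset));
  std::free(aligned - offset);
}
```

Because the offset lives inside the allocation itself, the caller only needs the aligned pointer to free the block, which is why no custom free function has to be carried around.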

Reviewed By: larryliu0820, mcr229

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from 3ca3e07 to 049a032 May 3, 2025 01:32
@lucylq lucylq force-pushed the export-D74041198 branch from 049a032 to 00fea9e May 5, 2025 19:40
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 5, 2025
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 5, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from 00fea9e to acb9e6d May 5, 2025 23:40
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 5, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from acb9e6d to f6779ca May 5, 2025 23:46

Summary:

NOTE: this increases the binary size (linux, clang) by 8 bytes, so the size limit there is raised as well.

Reviewed By: larryliu0820, mcr229

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from f6779ca to f4391a7 May 6, 2025 00:31
Labels
CLA Signed, fb-exported, topic: not user facing
3 participants