Draft
54 commits
7aa699e
Optimize std::vector allocations with reserve()
mateusznowakTT Feb 12, 2026
7b0807f
Add reserve() to matmul, all_gather, and groupnorm operations
mateusznowakTT Feb 12, 2026
eaf6b55
Replace list initialization with reserve+emplace for better performance
mateusznowakTT Feb 12, 2026
bea8552
Add reserve() to transpose and CCL operations
mateusznowakTT Feb 12, 2026
830af57
Fix: Correct reserve() call for std::set in groupnorm_mcast
mateusznowakTT Feb 12, 2026
238726c
Optimize vector allocations: permute, reshard, llama matmul, groupnorm
mateusznowakTT Feb 12, 2026
d07cc3a
Optimize vector allocations: rms_allgather and groupnorm_sharded
mateusznowakTT Feb 12, 2026
ad657df
Optimize vector allocations: sdpa, sort, reduce, adam, sliding_window
mateusznowakTT Feb 12, 2026
3b932fc
Optimize vector allocations across 20 files: ccl, moreh, concat, mesh…
mateusznowakTT Feb 12, 2026
c74e7f5
Optimize vector allocations: unary, chunk, fold, ccl, flatbuffer, gro…
mateusznowakTT Feb 13, 2026
c94891b
Optimize vector allocation in typecast program factory
mateusznowakTT Feb 13, 2026
a539e96
Optimize vector allocations: device, allocator, kernel, profiler, pro…
mateusznowakTT Feb 13, 2026
482794a
Further std::vector optimizations
mateusznowakTT Feb 13, 2026
317b830
Optimize vector allocations: transpose_hc_sharded, binary_backward
mateusznowakTT Feb 13, 2026
f9fe5bf
Optimize vector allocations: unary_backward reserve(1) for 66 grad_te…
mateusznowakTT Feb 13, 2026
5e3e596
Use emplace_back instead of push_back in case of inline object creation
mateusznowakTT Feb 13, 2026
4147da3
Squash intermediate variables into direct emplace_back calls
mateusznowakTT Feb 13, 2026
e7bd491
Add vector_init helper and example use in binary_backward
mateusznowakTT Feb 13, 2026
554df65
Convert unrolled initializer lists to vector_init across 4 files
mateusznowakTT Feb 13, 2026
3e0968e
Optimize vector allocations: reserve and emplace_back across 17 files
mateusznowakTT Feb 13, 2026
f96a179
Add reserve() to mesh_device_view: devices_in_region and neighbors
mateusznowakTT Feb 13, 2026
313128d
Fix vector_init
mateusznowakTT Feb 13, 2026
93c9cc1
fix
mateusznowakTT Feb 13, 2026
6c8d1e6
Merge remote-tracking branch 'origin/main' into mateusznowakTT/optimi…
mateusznowakTT Feb 13, 2026
5e5a2ab
Add reserve() to vectors in recently merged code
mateusznowakTT Feb 13, 2026
2b5b60a
fix
mateusznowakTT Feb 14, 2026
898dc27
Allow vector_init to preallocate more if needed
mateusznowakTT Feb 14, 2026
1bd9f99
Replace reserve+emplace patterns with vector_init
mateusznowakTT Feb 14, 2026
74bf0ab
Replace push_back series with vector_init in groupnorm and rms_allgather
mateusznowakTT Feb 14, 2026
3a8a8ca
improve
mateusznowakTT Feb 15, 2026
a45d7d0
Use vector_init<T, N> with compile-time overallocation in sdpa_decode
mateusznowakTT Feb 15, 2026
4527bb8
Move vector_init to tt_stl with ttsl namespace and runtime reserve ov…
mateusznowakTT Feb 15, 2026
c232b59
Apply vector_init across tt_metal/ and ttnn/ runtime patterns
mateusznowakTT Feb 15, 2026
6cc709f
Use vector_init in sdpa, sdpa_decode, and conv2d width-sharded factories
mateusznowakTT Feb 15, 2026
0fc44c6
Optimize vector in ring_joint_sdpa_factory
mateusznowakTT Feb 15, 2026
f30bb23
Apply vector_init to layernorm sharded factory helper functions
mateusznowakTT Feb 15, 2026
5ab1ade
Add reserve() to fused-op signaler append-by-reference functions
mateusznowakTT Feb 15, 2026
7878cfe
Apply vector_init to CCL and matmul program factories
mateusznowakTT Feb 15, 2026
be88677
Apply vector_init to slice_rm, all_reduce_async, and llama_sharded fa…
mateusznowakTT Feb 15, 2026
d8b1296
Add reserve() to small fused-op signalers and optimize dram_sharded s…
mateusznowakTT Feb 15, 2026
fee09e7
Improve
mateusznowakTT Feb 15, 2026
0ea887d
Add generic container_init backbone to vector_init.hpp
mateusznowakTT Feb 15, 2026
6cbab50
move requires to single place
mateusznowakTT Feb 15, 2026
d8bbaa8
Add small_vector_init convenience wrappers to vector_init.hpp
mateusznowakTT Feb 16, 2026
ceb2008
Apply small_vector_init to CCL SubDeviceId single-element patterns
mateusznowakTT Feb 16, 2026
ac5355e
Convert dimension arrays to SmallVector in slice/transpose operations
mateusznowakTT Feb 16, 2026
1aaa56f
Convert additional local vectors to SmallVector in slice_write and CC…
mateusznowakTT Feb 16, 2026
3436e8b
fix
mateusznowakTT Feb 16, 2026
b32cc34
Merge remote-tracking branch 'origin/main' into mateusznowakTT/optimi…
mateusznowakTT Feb 16, 2026
2e8e234
Apply vector_init optimizations to softmax program factories
mateusznowakTT Feb 16, 2026
13a0129
Apply vector_init optimizations to 9 normalization files
mateusznowakTT Feb 16, 2026
1a0459c
Apply vector_init optimizations to CCL program factories
mateusznowakTT Feb 16, 2026
003544f
Revert empty vector_init() to std::vector for clarity
mateusznowakTT Feb 16, 2026
441b4bf
optimize
mateusznowakTT Feb 16, 2026
28 changes: 15 additions & 13 deletions tt_metal/common/core_coord.cpp
@@ -3,6 +3,7 @@
// SPDX-License-Identifier: Apache-2.0

#include <tt_stl/assert.hpp>
#include <tt_stl/vector_init.hpp>
#include <core_coord.hpp>
#include <nlohmann/json.hpp>
#include <tt_stl/reflection.hpp>
@@ -220,6 +221,7 @@ CoreRangeSet CoreRangeSet::merge(const T& other) const {
for (unsigned y = min_y; y <= max_y; y++) {
std::set<CoreRange> filter_set, tmp, new_crs;
std::vector<CoreRange> ranges;
ranges.reserve(max_x - min_x + 2);
for (unsigned x = min_x; x <= max_x + 1; x++) {
if (grid[y][x]) {
unsigned x_start = x;
@@ -286,6 +288,7 @@ bool CoreRangeSet::intersects(const CoreRangeSet& other) const {

CoreRangeSet CoreRangeSet::intersection(const CoreRangeSet& other) const {
std::vector<CoreRange> intersection;
intersection.reserve(std::min(this->ranges_.size(), other.ranges().size()));
for (const auto& local_cr : this->ranges_) {
for (const auto& other_cr : other.ranges()) {
if (auto intersect = local_cr.intersection(other_cr); intersect.has_value()) {
@@ -420,12 +423,14 @@ CoreRangeSet CoreRangeSet::subtract(const CoreRangeSet& other) const {
}

std::vector<CoreRange> result_ranges;
result_ranges.reserve(this_merged.ranges_.size() * 2);

for (const auto& current_range : this_merged.ranges_) {
std::vector<CoreRange> current_remaining = {current_range};
auto current_remaining = ttsl::vector_init<CoreRange>(current_range);

for (const auto& subtract_range : other_merged.ranges_) {
std::vector<CoreRange> new_remaining;
new_remaining.reserve(current_remaining.size() * 4);

for (const auto& remaining : current_remaining) {
auto intersection_opt = remaining.intersection(subtract_range);
@@ -437,33 +442,29 @@ CoreRangeSet CoreRangeSet::subtract(const CoreRangeSet& other) const {
const CoreRange& intersection = intersection_opt.value();

if (remaining.start_coord.x < intersection.start_coord.x) {
CoreRange left{
remaining.start_coord, CoreCoord{intersection.start_coord.x - 1, remaining.end_coord.y}};
new_remaining.push_back(left);
new_remaining.emplace_back(
remaining.start_coord, CoreCoord{intersection.start_coord.x - 1, remaining.end_coord.y});
}

if (remaining.end_coord.x > intersection.end_coord.x) {
CoreRange right{
CoreCoord{intersection.end_coord.x + 1, remaining.start_coord.y}, remaining.end_coord};
new_remaining.push_back(right);
new_remaining.emplace_back(
CoreCoord{intersection.end_coord.x + 1, remaining.start_coord.y}, remaining.end_coord);
}

if (remaining.start_coord.y < intersection.start_coord.y) {
CoreRange bottom{
new_remaining.emplace_back(
CoreCoord{
std::max(remaining.start_coord.x, intersection.start_coord.x), remaining.start_coord.y},
CoreCoord{
std::min(remaining.end_coord.x, intersection.end_coord.x), intersection.start_coord.y - 1}};
new_remaining.push_back(bottom);
std::min(remaining.end_coord.x, intersection.end_coord.x), intersection.start_coord.y - 1});
}

if (remaining.end_coord.y > intersection.end_coord.y) {
CoreRange top{
new_remaining.emplace_back(
CoreCoord{
std::max(remaining.start_coord.x, intersection.start_coord.x),
intersection.end_coord.y + 1},
CoreCoord{std::min(remaining.end_coord.x, intersection.end_coord.x), remaining.end_coord.y}};
new_remaining.push_back(top);
CoreCoord{std::min(remaining.end_coord.x, intersection.end_coord.x), remaining.end_coord.y});
}
}
current_remaining = new_remaining;
@@ -624,6 +625,7 @@ CoreRangeSet select_from_corerangeset(
const CoreRangeSet& crs, uint32_t start_index, uint32_t end_index, bool row_wise) {
auto all_cores = corerange_to_cores(crs, end_index + 1, row_wise);
std::vector<CoreRange> selected_cores;
selected_cores.reserve(end_index - start_index + 1);
for (uint32_t i = start_index; i <= end_index; i++) {
selected_cores.push_back(CoreRange(all_cores[i], all_cores[i]));
}
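Most hunks in this PR apply one pattern: call reserve() with the known (or upper-bound) element count before a push_back loop. A minimal sketch of the effect, using a hypothetical helper `count_reallocations` that is not part of tt-metal and simply counts capacity changes while appending:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): counts how many times the vector's
// capacity changes while n ints are appended. If reserve_first is true,
// capacity for n elements is requested up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With reserve_first set, the count stays at zero for any n. Without it, the exact count depends on the standard library's growth factor, which is why the sketch reports a count instead of assuming one.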
16 changes: 16 additions & 0 deletions tt_metal/common/mesh_coord.cpp
@@ -27,6 +27,7 @@ namespace {
// Returns the last valid coordinate for the provided `shape`.
MeshCoordinate shape_back(const MeshShape& shape) {
tt::stl::SmallVector<uint32_t> coords;
coords.reserve(shape.dims());
for (int i = 0; i < shape.dims(); i++) {
coords.push_back(shape[i] - 1);
}
@@ -38,6 +39,7 @@ std::vector<size_t> find_diff_dimensions(const MeshCoordinateRange& a, const Mes
TT_ASSERT(a.dims() == b.dims(), "Cannot compare ranges with different dimensions: {} != {}", a.dims(), b.dims());

std::vector<size_t> diff_dims;
diff_dims.reserve(a.dims());
for (size_t i = 0; i < a.dims(); ++i) {
if (a.start_coord()[i] != b.start_coord()[i] || a.end_coord()[i] != b.end_coord()[i]) {
diff_dims.push_back(i);
@@ -263,6 +265,7 @@ MeshCoordinate::BoundaryMode MeshCoordinateRange::get_boundary_mode() const {

MeshShape MeshCoordinateRange::shape() const {
tt::stl::SmallVector<uint32_t> shape_dims;
shape_dims.reserve(dims());
if (!wraparound_shape_.has_value()) {
for (size_t i = 0; i < dims(); ++i) {
shape_dims.push_back(end_[i] - start_[i] + 1);
@@ -485,6 +488,8 @@ void MeshCoordinateRangeSet::merge(const MeshCoordinateRange& to_merge) {
// Can replace `it` + `merged` with a single new range.
tt::stl::SmallVector<uint32_t> new_start;
tt::stl::SmallVector<uint32_t> new_end;
new_start.reserve(merged.dims());
new_end.reserve(merged.dims());
for (size_t i = 0; i < merged.dims(); ++i) {
new_start.push_back(std::min(merged.start_coord()[i], it->start_coord()[i]));
new_end.push_back(std::max(merged.end_coord()[i], it->end_coord()[i]));
@@ -530,6 +535,8 @@ void MeshCoordinateRangeSet::merge(const MeshCoordinateRange& to_merge) {
// Calculate the bounding box of all ranges
tt::stl::SmallVector<uint32_t> bb_start;
tt::stl::SmallVector<uint32_t> bb_end;
bb_start.reserve(ranges_[0].dims());
bb_end.reserve(ranges_[0].dims());

for (size_t dim = 0; dim < ranges_[0].dims(); ++dim) {
uint32_t min_start = ranges_[0].start_coord()[dim];
@@ -592,6 +599,8 @@ MeshCoordinateRangeSet subtract(const MeshCoordinateRange& parent, const MeshCoo
if (parent.start_coord()[diff_dim] < intersection.start_coord()[diff_dim]) {
tt::stl::SmallVector<uint32_t> left_start;
tt::stl::SmallVector<uint32_t> left_end;
left_start.reserve(parent.dims());
left_end.reserve(parent.dims());
for (size_t i = 0; i < parent.dims(); ++i) {
if (i == diff_dim) {
left_start.push_back(parent.start_coord()[i]);
@@ -608,6 +617,8 @@ MeshCoordinateRangeSet subtract(const MeshCoordinateRange& parent, const MeshCoo
if (intersection.end_coord()[diff_dim] < parent.end_coord()[diff_dim]) {
tt::stl::SmallVector<uint32_t> right_start;
tt::stl::SmallVector<uint32_t> right_end;
right_start.reserve(parent.dims());
right_end.reserve(parent.dims());
for (size_t i = 0; i < parent.dims(); ++i) {
if (i == diff_dim) {
right_start.push_back(intersection.end_coord()[i] + 1);
Expand All @@ -632,6 +643,11 @@ MeshCoordinateRangeSet subtract(const MeshCoordinateRange& parent, const MeshCoo

std::vector<MeshCoordinate> MeshCoordinateRangeSet::coords() const {
std::vector<MeshCoordinate> coords;
size_t total_size = 0;
for (const auto& range : ranges_) {
total_size += range.shape().mesh_size();
}
coords.reserve(total_size);
for (const auto& range : ranges_) {
for (const auto& coord : range) {
coords.push_back(coord);
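The coords() hunk above uses a two-pass variant of the pattern: sum the sizes of all ranges first, reserve once, then append. A self-contained sketch of the same idea, where the hypothetical `flatten` function and the nested vectors are stand-ins for MeshCoordinateRangeSet and its ranges:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: each inner vector plays the role of one range.
std::vector<int> flatten(const std::vector<std::vector<int>>& ranges) {
    // Pass 1: sum the sizes so a single allocation can hold every element.
    std::size_t total_size = 0;
    for (const auto& range : ranges) {
        total_size += range.size();
    }

    std::vector<int> out;
    out.reserve(total_size);

    // Pass 2: append. Capacity is already sufficient, so nothing reallocates.
    for (const auto& range : ranges) {
        out.insert(out.end(), range.begin(), range.end());
    }
    assert(out.size() == total_size);
    return out;
}
```

The extra pass only pays off when computing each size is cheap relative to a reallocation. Whether range.shape().mesh_size() meets that bar cannot be judged from the lines shown here.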
1 change: 1 addition & 0 deletions tt_metal/distributed/mesh_command_queue_base.cpp
@@ -283,6 +283,7 @@ void MeshCommandQueueBase::enqueue_write(
auto lock = lock_api_function_();
// Iterate over global coordinates; skip host-remote coordinates, as per `host_buffer` configuration.
std::vector<distributed::ShardDataTransfer> shard_data_transfers;
shard_data_transfers.reserve(host_buffer.shard_coords().size());
for (const auto& host_buffer_coord : host_buffer.shard_coords()) {
auto buf = host_buffer.get_shard(host_buffer_coord);
if (buf.has_value()) {
6 changes: 6 additions & 0 deletions tt_metal/distributed/mesh_device.cpp
@@ -302,6 +302,7 @@ std::shared_ptr<MeshDevice> MeshDeviceImpl::create(
std::vector<tt::tt_fabric::FabricNodeId> fabric_node_ids;
TT_FATAL(config.mesh_shape().has_value(), "Mesh shape must be provided when physical device ids are supplied");
const auto& supplied_ids = config.physical_device_ids();
fabric_node_ids.reserve(supplied_ids.size());
for (int supplied_id : supplied_ids) {
auto fabric_node_id =
MetalContext::instance().get_control_plane().get_fabric_node_id_from_physical_chip_id(supplied_id);
@@ -525,6 +526,9 @@ std::shared_ptr<MeshDevice> MeshDeviceImpl::create_submesh(
std::vector<MaybeRemote<IDevice*>> submesh_devices;
std::vector<tt::tt_fabric::FabricNodeId> submesh_fabric_node_ids;
const MeshCoordinateRange submesh_range(offset_coord, end_coordinate);
const auto submesh_size = submesh_range.shape().mesh_size();
submesh_devices.reserve(submesh_size);
submesh_fabric_node_ids.reserve(submesh_size);
for (const auto& coord : submesh_range) {
if (view_->impl().is_local(coord)) {
submesh_devices.push_back(MaybeRemote<IDevice*>::local(view_->impl().get_device(coord)));
@@ -586,6 +590,7 @@ std::vector<std::shared_ptr<MeshDevice>> MeshDeviceImpl::create_submeshes(

// Stamp `submesh_shape` along each dimension, `steps` number of times.
std::vector<std::shared_ptr<MeshDevice>> submeshes;
submeshes.reserve(MeshCoordinateRange(MeshShape(steps)).shape().mesh_size());
for (const auto& step_position : MeshCoordinateRange(MeshShape(steps))) {
tt::stl::SmallVector<uint32_t> offset_coords;
for (size_t dim = 0; dim < submesh_shape.dims(); dim++) {
@@ -636,6 +641,7 @@ MeshCommandQueue& MeshDeviceImpl::mesh_command_queue(std::optional<uint8_t> cq_i

DeviceIds MeshDeviceImpl::get_device_ids() const {
DeviceIds device_ids;
device_ids.reserve(this->num_devices());
for (auto* device : this->get_devices()) {
device_ids.push_back(device->id());
}
3 changes: 3 additions & 0 deletions tt_metal/distributed/mesh_device_view.cpp
@@ -31,6 +31,7 @@ namespace {
std::vector<IDevice*> get_devices_from_coordinates(
const MeshDeviceViewImpl& mesh, const std::vector<MeshCoordinate>& coords) {
std::vector<IDevice*> devices;
devices.reserve(coords.size());
for (const auto& coord : coords) {
if (auto* device = mesh.get_device(coord)) {
devices.push_back(device);
@@ -91,6 +92,7 @@ MeshDeviceViewImpl::MeshDeviceViewImpl(

std::vector<IDevice*> MeshDeviceViewImpl::get_devices(const MeshCoordinateRange& range) const {
std::vector<IDevice*> devices_in_region;
devices_in_region.reserve(range.shape().mesh_size());
for (const auto& coord : range) {
devices_.at(coord).if_local([&devices_in_region](const auto& device) { devices_in_region.push_back(device); });
}
@@ -256,6 +258,7 @@ std::vector<MeshCoordinate> MeshDeviceViewImpl::get_line_coordinates(
// Lambda to get valid neighbors (not checking visited - that's done in DFS)
auto get_neighbors = [&](const MeshCoordinate& coord) -> std::vector<MeshCoordinate> {
std::vector<MeshCoordinate> neighbors;
neighbors.reserve(4);
const size_t row = coord[0];
const size_t col = coord[1];

2 changes: 1 addition & 1 deletion tt_metal/distributed/system_mesh.cpp
@@ -38,7 +38,7 @@ MeshContainer<MappedDevice> initialize_mapped_devices(const tt::tt_fabric::MeshI
std::vector<MappedDevice> system_mesh_devices;
system_mesh_devices.reserve(shape.mesh_size());
for (int linear_index = 0; linear_index < shape.mesh_size(); ++linear_index) {
system_mesh_devices.push_back(MappedDevice{
system_mesh_devices.emplace_back(MappedDevice{
.device_id = MaybeRemote<int>::remote(),
.fabric_node_id = tt::tt_fabric::FabricNodeId(mesh_id, linear_index)});
}
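A note on the emplace_back change above: emplace_back only constructs in place when it receives constructor arguments. Passing an already-built temporary, as in emplace_back(MappedDevice{...}), still move-constructs the element, exactly as push_back would, and designated initializers cannot be forwarded as arguments. The sketch below illustrates the difference with a hypothetical `Tracked` type (not from the PR) that counts its move constructions:

```cpp
#include <cassert>
#include <vector>

// Hypothetical type that counts how often it is move-constructed.
struct Tracked {
    static inline int moves = 0;
    int a;
    int b;
    Tracked(int a_, int b_) : a(a_), b(b_) {}
    Tracked(const Tracked&) = default;
    Tracked(Tracked&& other) noexcept : a(other.a), b(other.b) { ++moves; }
};

// Returns the number of move constructions caused by emplace_back(Tracked{...}).
int moves_with_temporary() {
    Tracked::moves = 0;
    std::vector<Tracked> v;
    v.reserve(1);                   // rule out moves caused by reallocation
    v.emplace_back(Tracked{1, 2});  // builds a temporary, then moves it into the vector
    assert(v.size() == 1);
    return Tracked::moves;
}

// Returns the number of move constructions caused by emplace_back(args...).
int moves_with_args() {
    Tracked::moves = 0;
    std::vector<Tracked> v;
    v.reserve(1);
    v.emplace_back(1, 2);  // forwards the arguments; the element is built in place
    assert(v.size() == 1);
    return Tracked::moves;
}
```

By this reasoning the CoreRange hunks in core_coord.cpp, which pass the two coordinates directly, are the ones that can actually skip a temporary. The hunks that wrap a braced temporary in emplace_back appear to be behavior-neutral.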
12 changes: 7 additions & 5 deletions tt_metal/fabric/control_plane.cpp
@@ -27,6 +27,7 @@
#include <vector>
#include <yaml-cpp/yaml.h>
#include <tt_stl/assert.hpp>
#include <tt_stl/vector_init.hpp>

#include <tt-metalium/experimental/fabric/control_plane.hpp>
#include "core_coord.hpp"
@@ -87,13 +88,14 @@ std::vector<std::pair<FabricNodeId, std::vector<AsicPosition>>> get_galaxy_fixed
std::vector<std::pair<FabricNodeId, std::vector<AsicPosition>>> fixed_asic_position_pinnings;

// Get all 4 possible corners ASIC positions
std::vector<AsicPosition> corner_asic_positions;
corner_asic_positions.emplace_back(AsicPosition{1, 1}); // Top left corner
corner_asic_positions.emplace_back(AsicPosition{2, 1}); // Top right corner
corner_asic_positions.emplace_back(AsicPosition{3, 1}); // Bottom left corner
corner_asic_positions.emplace_back(AsicPosition{4, 1}); // Bottom right corner
auto corner_asic_positions = ttsl::vector_init<AsicPosition>(
AsicPosition{1, 1}, // Top left corner
AsicPosition{2, 1}, // Top right corner
AsicPosition{3, 1}, // Bottom left corner
AsicPosition{4, 1}); // Bottom right corner

std::vector<FabricNodeId> corner_fabric_node_ids;
corner_fabric_node_ids.reserve(4 * mesh_graph.get_all_mesh_ids().size());
for (const auto& mesh_id : mesh_graph.get_all_mesh_ids()) {
const auto& mesh_shape = mesh_graph.get_mesh_shape(mesh_id);
corner_fabric_node_ids.emplace_back(FabricNodeId{mesh_id, 0});
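The definition of ttsl::vector_init is not part of the hunks shown here. Going by the commit messages ("Replace reserve+emplace patterns with vector_init"), one plausible shape is a variadic helper that reserves once and then emplaces each argument. The sketch below is an assumption along those lines, deliberately named `vector_init_sketch` to avoid suggesting it matches the real signature:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for ttsl::vector_init. The real helper lives in
// tt_stl/vector_init.hpp, which this diff does not show, and may differ.
template <typename T, typename... Args>
std::vector<T> vector_init_sketch(Args&&... args) {
    std::vector<T> result;
    result.reserve(sizeof...(Args));  // at most one allocation for all elements
    // C++17 fold expression: append each argument in order, moving rvalues.
    (result.emplace_back(std::forward<Args>(args)), ...);
    return result;
}

// Move-only elements work here. std::vector<T>{a, b} would not compile for them,
// because std::initializer_list only exposes const elements and forces copies.
std::vector<std::unique_ptr<int>> make_boxed_pair(int first, int second) {
    auto boxed = vector_init_sketch<std::unique_ptr<int>>(
        std::make_unique<int>(first), std::make_unique<int>(second));
    assert(boxed.size() == 2);
    return boxed;
}
```

The design point is that a braced initializer list copies every element out of a const array, while a helper of this kind can move rvalue arguments into storage that was sized up front.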
1 change: 1 addition & 0 deletions tt_metal/fabric/topology_mapper.cpp
@@ -693,6 +693,7 @@ void TopologyMapper::populate_fabric_node_id_to_asic_id_mappings(

// Log pinnings
std::vector<std::string> pinning_strs;
pinning_strs.reserve(mesh_pinnings.size());
for (const auto& [fabric_node, positions] : mesh_pinnings) {
if (positions.size() == 1) {
pinning_strs.push_back(fmt::format(
2 changes: 2 additions & 0 deletions tt_metal/impl/allocator/l1_banking_allocator.cpp
@@ -66,6 +66,7 @@ void AllocatorImpl::init_compute_and_storage_l1_bank_manager() {

// Define the bank assignment here.
std::vector<uint32_t> shuffled_bank_id = {};
shuffled_bank_id.reserve(num_l1_banks);
if (not config_->l1_bank_remap.empty()) {
TT_ASSERT(
num_l1_banks == config_->l1_bank_remap.size(),
@@ -214,6 +215,7 @@ AllocatorConfig L1BankingAllocator::generate_config(
"Reserved size must be aligned to L1 allocator alignment {}",
config.l1_alignment);
// Initialize dram_offsets from soc_descriptor
config.dram_bank_offsets.reserve(soc_desc.get_num_dram_views());
for (auto channel = 0; channel < soc_desc.get_num_dram_views(); channel++) {
config.dram_bank_offsets.push_back(soc_desc.get_address_offset(channel));
}
7 changes: 4 additions & 3 deletions tt_metal/impl/buffers/buffer_page_mapping.cpp
@@ -12,14 +12,15 @@ namespace CMAKE_UNIQUE_NAMESPACE {
std::vector<BufferCorePageMapping::ContiguousHostPages> to_host_page_ranges(
tt::stl::Span<const uint32_t> host_page_indices) {
std::vector<BufferCorePageMapping::ContiguousHostPages> result;
result.reserve(host_page_indices.size());

uint32_t start_host_page_idx = 0;
bool is_processing_range = false;

auto add_page = [&](uint32_t end_host_page_idx) {
uint32_t start_host_page = host_page_indices[start_host_page_idx];
uint32_t end_host_page = host_page_indices[end_host_page_idx - 1];
result.push_back({
result.emplace_back(BufferCorePageMapping::ContiguousHostPages{
.device_page_offset = start_host_page_idx,
.host_page_start = start_host_page,
.num_pages = end_host_page - start_host_page + 1,
@@ -70,7 +71,7 @@ BufferPageMapping::BufferPageMapping(const UncompressedBufferPageMapping& page_m
if (host_ranges.empty()) {
continue;
}
core_page_mapping.push_back(BufferCorePageMapping{
core_page_mapping.emplace_back(BufferCorePageMapping{
.device_start_page = 0,
.num_pages = static_cast<uint32_t>(core_host_page_indices.size()),
.host_ranges = std::move(host_ranges),
@@ -123,7 +124,7 @@ BufferPageMapping BufferPageMapping::filter_by_host_range(uint32_t start_host_pa
continue;
}

result_core_mapping.host_ranges.push_back({
result_core_mapping.host_ranges.emplace_back(BufferCorePageMapping::ContiguousHostPages{
.device_page_offset = host_range.device_page_offset + host_range_start - host_range.host_page_start,
.host_page_start = host_range_start - start_host_page,
.num_pages = host_range_end - host_range_start,
1 change: 1 addition & 0 deletions tt_metal/impl/context/metal_context.cpp
@@ -1729,6 +1729,7 @@ dev_msgs::core_info_msg_t MetalContext::populate_core_info_msg(

// Determine which noc-coords are harvested
std::vector<uint32_t> harvested_axis_coord;
harvested_axis_coord.reserve(core_info.harvested_coords().size());
CoreCoord logical_grid_size = cluster_->get_soc_desc(device_id).get_grid_size(CoreType::TENSIX);
uint32_t harvested_noc_coords = umd::CoordinateManager::shuffle_tensix_harvesting_mask_to_noc0_coords(
cluster_->get_soc_desc(device_id).arch, cluster_->get_harvesting_mask(device_id));
1 change: 1 addition & 0 deletions tt_metal/impl/data_format/bfloat16.cpp
@@ -168,6 +168,7 @@ bfloat16 bfloat16_identity_transform(const bfloat16& input) { return input; }
std::vector<bfloat16> unpack_uint32_vec_into_bfloat16_vec(
const std::vector<std::uint32_t>& data, const std::function<bfloat16(const bfloat16&)>& transform) {
std::vector<bfloat16> result;
result.reserve(data.size() * 2);
for (unsigned int packed : data) {
auto unpacked = unpack_two_bfloat16_from_uint32(packed);
result.push_back(transform(unpacked.first));
4 changes: 3 additions & 1 deletion tt_metal/impl/device/device.cpp
@@ -794,6 +794,7 @@ std::vector<CoreCoord> Device::get_optimal_dram_bank_to_logical_worker_assignmen
}
// Get all logical cores in the worker grid
std::vector<CoreCoord> all_worker_cores_logical;
all_worker_cores_logical.reserve(num_cores_x * num_cores_y);
for (int i = 0; i < num_cores_x; ++i) {
for (int j = 0; j < num_cores_y; ++j) {
all_worker_cores_logical.push_back(CoreCoord(i, j));
@@ -805,7 +806,8 @@ std::vector<CoreCoord> Device::get_optimal_dram_bank_to_logical_worker_assignmen
auto core_phy = this->physical_worker_core_from_logical_core(CoreCoord(0, i));
worker_phy_y.at(i) = core_phy.y;
}
std::vector<uint32_t> worker_phy_x = std::vector<uint32_t>(num_cores_x);
std::vector<uint32_t> worker_phy_x;
worker_phy_x.reserve(num_cores_x);
for (int i = 0; i < num_cores_x; ++i) {
auto core_phy = this->physical_worker_core_from_logical_core(CoreCoord(i, 0));
worker_phy_x.push_back(core_phy.x);
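The worker_phy_x hunk above swaps std::vector<uint32_t>(num_cores_x) for a default-constructed vector plus reserve(). The two are not equivalent. The sized constructor creates num_cores_x zero-valued elements, so the push_back loop that follows appends after them. reserve() leaves the vector empty, so the same loop fills it from index 0. A small sketch of the difference, with hypothetical function names and the values 1..n standing in for the physical x coordinates:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sized construction: starts with n zero elements, then appends n more.
std::vector<std::uint32_t> fill_after_sized_ctor(std::size_t n) {
    std::vector<std::uint32_t> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<std::uint32_t>(i + 1));
    }
    assert(v.size() == 2 * n);
    return v;
}

// reserve(): starts empty with room for n elements, then appends n.
std::vector<std::uint32_t> fill_after_reserve(std::size_t n) {
    std::vector<std::uint32_t> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<std::uint32_t>(i + 1));
    }
    assert(v.size() == n);
    return v;
}
```

Judging only from the lines shown, this hunk changes the contents of worker_phy_x and not just its allocation behavior. The rest of the function is not visible here, so whether later code depended on the old layout is an open question for review.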