Commit ed10380
feat(Tensor): add operations for getting min/max values, finding element positions and setting values at specific positions
- Added methods to the Tensor class to retrieve the minimum and maximum values of the tensor.
- Implemented functionality to find the positions of specific elements within the tensor.
- Added the ability to set values at specific positions in the tensor.
- Conducted basic testing to ensure the correctness of these new operations.
1 parent bf8dac8 commit ed10380

3 files changed

Lines changed: 363 additions & 24 deletions

File tree

include/NeuZephyr/Tensor.cuh

Lines changed: 53 additions & 20 deletions
@@ -823,38 +823,59 @@ namespace nz::data {
         void transpose();

         /**
-         * @brief Sets a specific element of the tensor's data to a given value.
+         * @brief Sets the value of an element in the tensor or its gradient at a specified position.
          *
-         * This function modifies a specific element of the tensor's data stored in GPU memory.
-         * The element to be modified is specified by its position in the tensor's shape (given as a 2D index).
-         * The function first copies the tensor's data from GPU memory to host memory, modifies the specified element,
-         * and then copies the updated data back to the GPU memory.
+         * This member function allows you to set the value of a specific element in the tensor or its gradient.
+         * It first validates the position and the gradient setting based on the tensor's requirements.
          *
-         * @param position A `shape_type` (alias for `std::vector<int>`) representing the 2D index (row, column)
-         * of the element to modify.
-         * @param value The value to which the specified element will be set.
+         * @param position The position in the tensor where the value will be set. Memory location: host-to-device.
+         * @param value The value to be set at the specified position. Memory location: host-to-device.
+         * @param isGrad A boolean indicating whether to set the value in the gradient or in the tensor data. Memory location: host-to-device.
          *
-         * This function performs the following steps:
-         * 1. It checks if the provided position is valid within the tensor's shape. If not, an exception is thrown.
-         * 2. It copies the tensor's data from GPU memory to host memory using `cudaMemcpy`.
-         * 3. It modifies the specified element at the given position in the tensor's data.
-         * 4. It copies the updated data back to the GPU memory.
+         * @return None
+         *
+         * **Memory Management Strategy**:
+         * - A temporary array `data` of size `_size` is allocated on the host using `malloc`.
+         * - The data from the device (either tensor data or gradient) is copied to the host using `cuStrm::StreamManager<value_type>::Instance().memcpy`.
+         * - After the value is set at the specified position in the host-side data, the updated data is copied back to the device.
+         * - The temporary array `data` is freed using `free` to avoid memory leaks.
+         *
+         * **Exception Handling Mechanism**:
+         * - Throws `std::invalid_argument` if the `position` is out of bounds of the tensor's shape.
+         * - Throws `std::invalid_argument` if `isGrad` is `true` but the tensor does not require gradients.
+         * - If any of the `cuStrm::StreamManager` operations fail, it may lead to undefined behavior, as error-checking is not explicitly done in this function.
          *
-         * @throws std::invalid_argument If the provided position is out of bounds.
+         * **Relationship with Other Components**:
+         * - Depends on `cuStrm::StreamManager<value_type>::Instance()` for memory copying and data synchronization operations.
+         * - Relies on the `_shape` member variable to validate the position and calculate the index in the data array.
+         * - Uses the `_data` and `_grad` member variables to access the tensor data and its gradient.
+         *
+         * @throws std::invalid_argument When the position is out of bounds or when trying to set the gradient of a tensor that does not require gradients.
          *
          * @note
-         * - This function uses memory copying between host and device, which can introduce performance overhead.
-         * - The tensor's data is modified on the host first and then copied back to the GPU. This approach may not be
-         * the most efficient for large tensors or frequent updates.
+         * - The time complexity of this function is O(n) due to the memory copying operations, where n is the number of elements in the tensor (`_size`).
+         * - Ensure that the CUDA runtime environment is properly initialized and the device memory is valid before calling this function.
+         * - Ensure that the `position` is within the valid range of the tensor's shape to avoid exceptions.
+         * - If setting the gradient, ensure that the tensor requires gradients.
+         *
+         * @warning
+         * - If any of the `cuStrm::StreamManager` operations fail, the behavior of this function is undefined.
          *
          * @code
         * ```cpp
-         * Tensor tensor({2, 3}); // Create a tensor with shape 2x3
-         * tensor.setData(std::vector<int>({1, 2}), 7.5f); // Set the element at position (1, 2) to 7.5f
+         * Tensor tensor;
+         * Tensor::shape_type position = {0, 0, 0, 0};
+         * Tensor::value_type value = 1.0;
+         * bool isGrad = false;
+         * try {
+         *     tensor.setData(position, value, isGrad);
+         * } catch (const std::invalid_argument& e) {
+         *     std::cerr << e.what() << std::endl;
+         * }
         * ```
          * @endcode
          */
-        void setData(const shape_type& position, value_type value) const;
+        void setData(const shape_type& position, value_type value, bool isGrad = false) const;

         /// @}

@@ -1190,6 +1211,18 @@ namespace nz::data {
          */
         [[nodiscard]] value_type sum(size_type batch, size_type channel) const;

+        [[nodiscard]] value_type max() const;
+
+        [[nodiscard]] value_type max(size_type batch, size_type channel) const;
+
+        [[nodiscard]] value_type min() const;
+
+        [[nodiscard]] value_type min(size_type batch, size_type channel) const;
+
+        [[nodiscard]] shape_type find(value_type value) const;
+
+        [[nodiscard]] shape_type find(value_type value, size_type batch, size_type channel) const;
+
         /**
          * @brief Compute the sum of the exponential values of all elements in the Tensor.
          *

src/Tensor.cu

Lines changed: 96 additions & 3 deletions
@@ -407,18 +407,24 @@ namespace nz::data {
         _shape.updateStride();
     }

-    void Tensor::setData(const shape_type& position, const value_type value) const {
+    void Tensor::setData(const shape_type& position, const value_type value, const bool isGrad) const {
         if (position[0] >= _shape[0] || position[1] >= _shape[1] || position[2] >= _shape[2] || position[3] >= _shape[
             3]) {
             throw std::invalid_argument("Invalid position");
         }
+        if (isGrad && !_requires_grad) {
+            throw std::invalid_argument(
+                "Gradient setting is not allowed for tensors that do not require gradients.");
+        }
         auto* data = static_cast<value_type*>(malloc(_size * sizeof(value_type)));
-        cuStrm::StreamManager<value_type>::Instance().memcpy(data, _data, _size * sizeof(value_type),
+        cuStrm::StreamManager<value_type>::Instance().memcpy(data, isGrad ? _grad : _data, _size * sizeof(value_type),
                                                              cudaMemcpyDeviceToHost);
+        cuStrm::StreamManager<value_type>::Instance().syncData(data);
         data[position[0] * _shape.getStride(0) + position[1] * _shape.getStride(1) + position[2] * _shape.getStride(2) +
             position[3] * _shape.getStride(3)] = value;
-        cuStrm::StreamManager<value_type>::Instance().memcpy(_data, data, _size * sizeof(value_type),
+        cuStrm::StreamManager<value_type>::Instance().memcpy(isGrad ? _grad : _data, data, _size * sizeof(value_type),
                                                              cudaMemcpyHostToDevice);
+        cuStrm::StreamManager<value_type>::Instance().syncData(isGrad ? _grad : _data);
         free(data);
     }

@@ -565,6 +571,7 @@ namespace nz::data {
         krnl::Summation(grid, block, block.x / WARP_SIZE * sizeof(float), dData, _data, _size);
         cuStrm::StreamManager<value_type>::Instance().memcpy(hData, dData, grid.x * sizeof(value_type),
                                                              cudaMemcpyDeviceToHost);
+        cuStrm::StreamManager<value_type>::Instance().syncData(hData);
         value_type result = 0;
         for (auto i = 0; i < grid.x; ++i) {
             result += hData[i];

@@ -588,6 +595,7 @@ namespace nz::data {
         krnl::Summation(grid, block, block.x / WARP_SIZE * sizeof(float), dData, _data, size, offset);
         cuStrm::StreamManager<value_type>::Instance().memcpy(hData, dData, grid.x * sizeof(value_type),
                                                              cudaMemcpyDeviceToHost);
+        cuStrm::StreamManager<value_type>::Instance().syncData(hData);
         value_type result = 0;
         for (auto i = 0; i < grid.x; ++i) {
             result += hData[i];
@@ -597,6 +605,89 @@ namespace nz::data {
         return result;
     }

+    Tensor::value_type Tensor::max() const {
+        auto hData = hostData();
+        value_type result = std::numeric_limits<value_type>::min();
+        for (auto i = 0; i < _size; ++i) {
+            if (hData[i] > result) {
+                result = hData[i];
+            }
+        }
+        return result;
+    }
+
+    Tensor::value_type Tensor::max(const size_type batch, const size_type channel) const {
+        if (batch >= _shape[0] || channel >= _shape[1]) {
+            throw std::invalid_argument("Invalid position");
+        }
+        const auto offset = batch * _shape.getStride(0) + channel * _shape.getStride(1);
+        auto hData = hostData();
+        value_type result = std::numeric_limits<value_type>::min();
+        for (auto i = 0; i < _shape[2] * _shape[3]; ++i) {
+            if (hData[offset + i] > result) {
+                result = hData[offset + i];
+            }
+        }
+        return result;
+    }
+
+    Tensor::value_type Tensor::min() const {
+        auto hData = hostData();
+        value_type result = std::numeric_limits<value_type>::max();
+        for (auto i = 0; i < _size; ++i) {
+            if (hData[i] < result) {
+                result = hData[i];
+            }
+        }
+        return result;
+    }
+
+    Tensor::value_type Tensor::min(const size_type batch, const size_type channel) const {
+        if (batch >= _shape[0] || channel >= _shape[1]) {
+            throw std::invalid_argument("Invalid position");
+        }
+        const auto offset = batch * _shape.getStride(0) + channel * _shape.getStride(1);
+        auto hData = hostData();
+        value_type result = std::numeric_limits<value_type>::max();
+        for (auto i = 0; i < _shape[2] * _shape[3]; ++i) {
+            if (hData[offset + i] < result) {
+                result = hData[offset + i];
+            }
+        }
+        return result;
+    }
+
+    Tensor::shape_type Tensor::find(const value_type value) const {
+        auto hData = hostData();
+        auto index = 0;
+        for (auto i = 0; i < _size; ++i) {
+            if (hData[i] == value) {
+                index = i;
+                break;
+            }
+        }
+        auto n = index / (_shape[1] * _shape[2] * _shape[3]);
+        auto c = (index % (_shape[1] * _shape[2] * _shape[3])) / (_shape[2] * _shape[3]);
+        auto h = (index % (_shape[2] * _shape[3])) / _shape[3];
+        auto w = index % _shape[3];
+        return {n, c, h, w};
+    }
+
+    Tensor::shape_type Tensor::find(value_type value, size_type batch, size_type channel) const {
+        auto hData = hostData();
+        auto index = 0;
+        auto offset = batch * _shape.getStride(0) + channel * _shape.getStride(1);
+        for (auto i = 0; i < _shape[2] * _shape[3]; ++i) {
+            if (hData[offset + i] == value) {
+                index = i;
+                break;
+            }
+        }
+        auto h = index / _shape[3];
+        auto w = index % _shape[3];
+        return {batch, channel, h, w};
+    }
+
     Tensor::value_type Tensor::expSum() const {
         const dim3 block(256);
         const dim3 grid((_size + block.x - 1) / block.x);
@@ -606,6 +697,7 @@ namespace nz::data {
         krnl::SummationExp(grid, block, block.x / WARP_SIZE * sizeof(float), dData, _data, _size);
         cuStrm::StreamManager<value_type>::Instance().memcpy(hData, dData, grid.x * sizeof(value_type),
                                                              cudaMemcpyDeviceToHost);
+        cuStrm::StreamManager<value_type>::Instance().syncData(hData);
         value_type result = 0;
         for (auto i = 0; i < grid.x; ++i) {
             result += hData[i];

@@ -629,6 +721,7 @@ namespace nz::data {
         krnl::SummationExp(grid, block, block.x / WARP_SIZE * sizeof(float), dData, _data, size, offset);
         cuStrm::StreamManager<value_type>::Instance().memcpy(hData, dData, grid.x * sizeof(value_type),
                                                              cudaMemcpyDeviceToHost);
+        cuStrm::StreamManager<value_type>::Instance().syncData(hData);
         value_type result = 0;
         for (auto i = 0; i < grid.x; ++i) {
             result += hData[i];
