Potential bug in RelativePositionEmbedding

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [x] I checked to make sure that this issue has not been filed already.

## 1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/position_embedding.py

## 2. Describe the bug

https://github.com/tensorflow/models/blob/db50116d3f2502a96a86cb1ee34f75008bfb467c/official/nlp/modeling/layers/position_embedding.py#L161-L174

Why do we have this in line 167?
```bash
    inv_timescales = min_timescale * tf.exp(
        tf.cast(tf.range(num_timescales), tf.float32) *
        -log_timescale_increment)
```

The problem is that we multiply `min_timescale` instead of `1 / min_timescale` to obtain `inv_timescales`.
In this way, the final values before sin/cos function we have are:
```
pos * T_min * [T_r ** 0, T_r ** ( -1/dim_range), T_r ** ( -2/dim_range), ..., T_r ** ( -dim_range/dim_range)]
```
where `T_r = T_max / T_min` and `dim_range = num_timescales - 1`.
Notably, the largest timestep corresponds to an effective inverse timescale `T_max / (T_min * T_min)` and the smallest timestep uses `1 / T_min`.





## 3. Steps to reproduce

This is more like a math issue. No code execution is needed.

## 4. Expected behavior

I think the correct thing to have is:
```
pos / T_min * [T_r ** 0, T_r ** ( -1/dim_range), T_r ** ( -2/dim_range), ..., T_r ** ( -dim_range/dim_range)]
```
where the largest timestep corresponds to an inverse timescale `T_max` and the smallest timestep uses `T_min`.

These two implementations have no difference when `T_min=1.0`. The test function at https://github.com/tensorflow/models/blob/db50116d3f2502a96a86cb1ee34f75008bfb467c/official/nlp/modeling/layers/position_embedding_test.py#L164 is probabily not very helpful then.

## 5. Additional context

This is more like a math issue. No code execution is needed.

## 6. System information

This is more like a math issue. No code execution is needed.

- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Mobile device name if the issue happens on a mobile device:
- TensorFlow installed from (source or binary):
- TensorFlow version (use command below):
- Python version:
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:

	position = tf.cast(tf.range(length), tf.float32)
	num_timescales = self._hidden_size // 2
	min_timescale, max_timescale = self._min_timescale, self._max_timescale
	log_timescale_increment = (
	math.log(float(max_timescale) / float(min_timescale)) /
	(tf.cast(num_timescales, tf.float32) - 1))
	inv_timescales = min_timescale * tf.exp(
	tf.cast(tf.range(num_timescales), tf.float32) *
	-log_timescale_increment)
	scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(
	inv_timescales, 0)
	position_embeddings = tf.concat(
	[tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)
	return position_embeddings

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Potential bug in RelativePositionEmbedding #10945

Prerequisites

1. The entire URL of the file you are using

2. Describe the bug

3. Steps to reproduce

4. Expected behavior

5. Additional context

6. System information

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Potential bug in RelativePositionEmbedding #10945

Description

Prerequisites

1. The entire URL of the file you are using

2. Describe the bug

3. Steps to reproduce

4. Expected behavior

5. Additional context

6. System information

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions