What's the meaning of the naming convention for TFRecord files? #86

Hi, long time no see. I got a bit confused while studying your cool work.

While working through the [Train a SegCLR embedding model](https://colab.research.google.com/gist/chinasaur/d7139adf9fbb8df3f55a00afde2c32fe/segclr-embedding-training.ipynb) Colab notebook linked from the [SegCLR wiki](https://github.com/google-research/connectomics/wiki/SegCLR), I ran the following code:
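```python
# data_input_util is imported earlier in the Colab notebook.
# The "@1000" suffix appears to denote a set of 1000 sharded TFRecord files
# (named ...tfrecord-00000-of-01000 through ...tfrecord-00999-of-01000 in the bucket).
tfrecord_pathspec = 'gs://h01-release/data/20230118/training_data/c3_positive_pairs/goog14c3_max200000_skip50.tfrecord@1000'
# Expand the pathspec into the list of individual shard paths.
tfrecord_files = data_input_util.expand_pathspec(tfrecord_pathspec)
```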
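For readers unfamiliar with the `@N` notation, here is a minimal sketch of how a pathspec like the one above could expand into individual shard paths. It assumes the `base-XXXXX-of-NNNNN` naming (zero-based, five-digit zero-padded) visible in the bucket; `expand_sharded_pathspec` is a hypothetical name, and this is not the actual implementation of `data_input_util.expand_pathspec`.

```python
from typing import List


def expand_sharded_pathspec(pathspec: str) -> List[str]:
    """Expand 'base@N' into N shard paths 'base-00000-of-NNNNN', ... (assumed convention)."""
    base, sep, count = pathspec.rpartition("@")
    if not sep or not count.isdigit():
        # No "@N" suffix: treat the pathspec as a single plain file.
        return [pathspec]
    num_shards = int(count)
    # Assumption: shard indices are zero-based and zero-padded to five digits.
    return [f"{base}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]


files = expand_sharded_pathspec(
    "gs://h01-release/data/20230118/training_data/c3_positive_pairs/"
    "goog14c3_max200000_skip50.tfrecord@1000"
)
print(len(files))                    # prints: 1000
print(files[0].rsplit("/", 1)[-1])   # prints: goog14c3_max200000_skip50.tfrecord-00000-of-01000
print(files[-1].rsplit("/", 1)[-1])  # prints: goog14c3_max200000_skip50.tfrecord-00999-of-01000
```

Under this convention the last shard is index 999, not 1000, because numbering starts at 0.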
This code loads the training samples from the [h01-release](https://console.cloud.google.com/storage/browser/h01-release/data/20230118/training_data/c3_positive_pairs?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) bucket on Google Cloud Storage.

As a TensorFlow 2 beginner, I don't quite understand the naming of the sample files in Google Cloud Storage, such as `goog14c3_max200000_skip50.tfrecord-00000-of-01000`. For example, what does `max200000` mean, and what does `skip50` signify? The suffix `00000-of-01000` seems to indicate that this is file 0 out of 1000 files, since the directory contains 1000 TFRecord files in total.

I was surprised to find that they are all around 1.7 GB in size. Does that mean each TFRecord file holds randomly sampled pairs from a single segment? If so, SegCLR would have collected 1000 segments from H01 and sampled the same number of pairs from each one, which would explain why the TFRecord files are all roughly the same size.
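To make the shard suffix concrete, a small hypothetical helper can split a name like `goog14c3_max200000_skip50.tfrecord-00000-of-01000` into its base name, zero-based shard index, and shard count, and check whether a file listing contains every shard. This is only a sketch: it decodes the `-XXXXX-of-NNNNN` suffix and says nothing about what `max200000` or `skip50` mean or how pairs were distributed across the shards. The names `ShardName`, `parse_shard_name`, and `missing_shards` are illustrative, not part of the SegCLR code.

```python
import re
from typing import Iterable, List, NamedTuple


class ShardName(NamedTuple):
    base: str   # e.g. "goog14c3_max200000_skip50.tfrecord"
    index: int  # zero-based shard index
    total: int  # total number of shards in the set


# Hypothetical pattern for the "<base>-<index>-of-<total>" suffix.
_SHARD_RE = re.compile(r"^(?P<base>.+)-(?P<index>\d+)-of-(?P<total>\d+)$")


def parse_shard_name(filename: str) -> ShardName:
    """Split a sharded file name into base name, shard index, and shard count."""
    match = _SHARD_RE.match(filename)
    if match is None:
        raise ValueError(f"not a sharded file name: {filename!r}")
    return ShardName(match["base"], int(match["index"]), int(match["total"]))


def missing_shards(filenames: Iterable[str]) -> List[int]:
    """Return the shard indices absent from a listing of one sharded set."""
    parsed = [parse_shard_name(name) for name in filenames]
    totals = {p.total for p in parsed}
    if len(totals) != 1:
        raise ValueError(f"expected one shard count, found {sorted(totals)}")
    (total,) = totals
    present = {p.index for p in parsed}
    return [i for i in range(total) if i not in present]


name = parse_shard_name("goog14c3_max200000_skip50.tfrecord-00000-of-01000")
print(name.index, name.total)  # prints: 0 1000
```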


I may have misunderstood some of this. Please point out anything I got wrong.