I would kindly like to ask for clarification on a few points in Chapter 2:
- Why do we need to introduce a random seed? If it is to keep the train/test sets consistent across multiple runs, why do we need multiple runs at all?
- If the hash function keeps the test set consistent, can new instances be added to the test set when the hash of their id satisfies the condition `crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32`?
- What is the point of using stratified sampling in the first place?
- Why can't we just use the regular `train_test_split` function instead of `StratifiedShuffleSplit`?
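A minimal sketch of the three mechanisms these questions refer to, using only the Python standard library. It assumes integer ids that are unique and never change. `in_test_set`, `split_by_id`, `random_split` and `stratified_split` are hypothetical helper names, `struct.pack("<q", ...)` stands in for `np.int64(...)` (same bytes on little-endian machines), and `stratified_split` is a hand-rolled stand-in for `StratifiedShuffleSplit`, not scikit-learn's implementation. The labels are made-up data.

```python
import random
import struct
import zlib
from collections import Counter


def in_test_set(identifier, test_ratio):
    # Same rule as crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32.
    # "<q" packs the id as a little-endian signed 64-bit integer, which matches
    # np.int64's bytes on little-endian machines (assumption).
    checksum = zlib.crc32(struct.pack("<q", identifier)) & 0xFFFFFFFF
    return checksum < test_ratio * 2**32


def split_by_id(ids, test_ratio):
    # Each id is assigned by its own hash, independently of every other row.
    train, test = [], []
    for identifier in ids:
        (test if in_test_set(identifier, test_ratio) else train).append(identifier)
    return train, test


def random_split(n, test_ratio, seed):
    # Plain shuffled split: only a fixed seed makes it pick the same test set again.
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return sorted(indices[: round(n * test_ratio)])


def stratified_split(labels, test_ratio, seed):
    # Hand-rolled stratified split: sample test_ratio of each category
    # separately so the test set keeps the category proportions of the data.
    rng = random.Random(seed)
    by_label = {}
    for index, label in enumerate(labels):
        by_label.setdefault(label, []).append(index)
    test = []
    for indices in by_label.values():
        rng.shuffle(indices)
        test.extend(indices[: round(len(indices) * test_ratio)])
    return sorted(test)


# Hypothetical, imbalanced categories (70% / 25% / 5%).
labels = ["A"] * 700 + ["B"] * 250 + ["C"] * 50

# 1. A fixed seed reproduces the same random split on every run.
print(random_split(1000, 0.2, seed=42) == random_split(1000, 0.2, seed=42))

# 2. Appending 200 new ids never moves an old id between train and test.
_, test_before = split_by_id(range(1000), 0.2)
_, test_after = split_by_id(range(1200), 0.2)
print(set(test_before) <= set(test_after), len(test_before), len(test_after))

# 3. Category counts in a random vs. a stratified 20% test set.
print(Counter(labels[i] for i in random_split(1000, 0.2, seed=42)))
print(Counter(labels[i] for i in stratified_split(labels, 0.2, seed=42)))
```

In this sketch, the hash rule decides each new id on its own, so new instances can join the test set while old ones never switch sides, and the stratified split keeps the 70/25/5 mix exactly while a purely random split only approximates it. For comparison, scikit-learn's `train_test_split` also accepts a `stratify=` argument for the same purpose.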
 
Thank you for your kindness and your time.