-
-
Notifications
You must be signed in to change notification settings - Fork 403
Add new practice exercise baffling-birthdays
#1575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for contributing to Based on the files changed in this PR, it would be good to pay attention to the following details when reviewing the PR:
Automated comment created by PR Commenter 🤖. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with the testing approach. It's good that you left comments for students that tell them how different sample sizes behave in terms of flaky tests.
The tests with your implementation are very fast, but I think some students might go too far with sample sizes and have problems with timeouts 🤔
~D[2019-02-12] | ||
] | ||
|
||
output = BafflingBirthdays.shared_birthday?(birthdates) == true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and on LOC 50 are unused variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiegillet sorry, I should have been clearer - please make those expressions into assertions 😅 now it's:
warning: use of operator == has no effect
│
63 │ BafflingBirthdays.shared_birthday?(birthdates) == true
│ ~
│
└─ test/baffling_birthdays_test.exs:63:54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, don't apologize for me being an idiot 😆
exercises/practice/baffling-birthdays/test/baffling_birthdays_test.exs
Outdated
Show resolved
Hide resolved
|
||
# for a sample size of 100 and this delta, the assertion is expected to fail once in 100 runs | ||
# for a sample size of 600 and this delta, the assertion is expected to fail once in a billion runs | ||
delta = 0.83 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you choose the deltas in the last few tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this wolfram alpha app
- enter the sample proportion (
expected
) - enter the confidence level (0.99, which means one failure in 100)
- enter the sample size (100)
That gives you the 99% interval, meaning a random pick will fall in there 99% of the time. This interval goes from 0.03417 to 0.1997, which is 2 times the delta, that's how I got the value.
After that, I entered a confidence level of 0.999999999 (one failure in a billion) and tweaked the sample size until I found an interval that's fairly close to the confidence level of 0.99, that came out to being around 600 every time. It's not an exact calculation, but it should be close enough.
I used mix test --repeat-until-failure 100
with a sample of 100 to check if the solution would actually fail, and it does about half the time, which is expected, so it seems to work. I did not try a billion times with a sample of 600, but I ran it for a bit and never managed to make it fail 😆
I have one dark secret for this particular test though. If you try with a sample of 100, you will almost always fail the test, because with 100 sample, most of the time you will estimate 100 or 99, which is outside of the theoretical range (99.08 to 1). But I chose to ignore it, it's complicated to explain and it's unlikely that people will only pick a 100 sample size, and if they do, they'll see the comment and pick something bigger.
expected_count_standard_deviation = fn | ||
day when day <= 28 -> :math.sqrt(group_size * 12 / 365 * (1 - 12 / 365)) | ||
day when day <= 30 -> :math.sqrt(group_size * 11 / 365 * (1 - 11 / 365)) | ||
day when day == 31 -> :math.sqrt(group_size * 7 / 365 * (1 - 7 / 365)) | ||
end | ||
|
||
counts_outside_95_percent_confidence_interval = | ||
day_frequencies | ||
|> Enum.filter(fn {day, count} -> | ||
abs(count - expected_count.(day)) > 1.96 * expected_count_standard_deviation.(day) | ||
end) | ||
|> length() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there maybe a website that you could link in a comment in this code that could explain what's going on? My statistics knowledge is very rusty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a log going on, that's what I meant when I said that maybe I got carried away lol
There are many concepts involved:
- calculating the probabilities of events happening (how many times each day is found in a year), although that's basically counting
- calculating the mean and variance for a discrete uniform distribution, that's what those functions above are doing
- calculating the 95% confidence interval, using a normal distribution for simplicity
- calculating the probabilities that x out of y events will be outside that range, using binomial distributions
I was wondering if I should explain more each step, but it seems like too much, when solving the exercise is actually pretty easy.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... yes, that is way too much. I won't even pretend I understand 🙈
I see a few potential problems with this testing code:
- students might get confused by it (but in reality they should just read the test description and know what to do)
- confused students might ask mentors about this test code, and mentors won't know how to explain it (but so what? it's important to know how to say "I don't know" 😬)
- if this test code ever needs to be changed and there's nobody with a physics or maths phd around... I won't know how to fix the tests 😅 (but then I would just throw it all away and do what Erik did: https://github.com/exercism/csharp/pull/2402/files#diff-a31c6c904e805077bb6ab5a333d6e0f154d63488697388f75ce77e4ea40a4eba)
Still, I'm pretty impressed with the effort you went through and if I understood it, I'm sure I would agree it's the only reasonable way to write a reliable test for random values 🤓 I'm fine just accepting that I don't get it and let it be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not have done this if it wasn't fun :)
I actually agree with your list of problems, I share your concerns. I looked at Erik's test, my tests include his (assert map_size(month_frequencies) == number_of_months
), but yes, I am fine with future maintainers simplifying the tests if I get hit by a bus. Or if this causes more confusion than it's worth.
@spec random_birthdates(group_size :: integer()) :: [Date.t()] | ||
def random_birthdates(group_size) do | ||
for _ <- 1..group_size do | ||
year = generate_non_leap_year_january_first(0, 3000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting year range 😬 just so that we're clear: it's completely unnecessary to make the year random, right? To pass the tests, you might just as well choose the same non-leap year for all birthdates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as far as the tests are concerned you could pick year = 2025
and be done with it, that's true.
In terms of generating random Date.t()
, this felt more natural to me. I guess it doesn't really matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right - It does not matter! I just wanted to make sure 🙂
config.json
Outdated
"randomness", | ||
"dates-and-time", | ||
"lists", | ||
"enum" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have also used: list-comprehensions, ranges, tuples. I think list-comprehensions might be optional (you're not using cartesian products, right?), but ranges might be necessary for generating a list of a given length.
Additionally a MapSet, but we don't have a concept for that 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I added them all, it could help to know the concepts.
Additionally a MapSet, but we don't have a concept for that 🤷
I made a set concept for Elm a while ago. Want me to fork it? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, if you have the time, it would be very nice to have that concept in Elixir too
I did some test by changing my sample size:
The cutoff is 10 seconds right? That's a limit of 250k samples. That doesn't seem too bad, especially considering that the tests are kind of hinting that a sample of 600 would be fine. |
I somehow remember it's 2x or 3x the average test runtime set in our config file (
Agreed, I'm no longer worried about test speed. Thanks! |
New exercise.
This one is a bit of a special one, and I might have gotten carried away with the math :D
It involves randomness, and therefore the tests have the potential to be flaky. I did the math to ensure that the flakiness is kept to a minimum (like one fail per 100k runs), but the safer I make the tests, the less strict they become.