Skip to content

Add new practice exercise baffling-birthdays #1575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 15, 2025

Conversation

jiegillet
Copy link
Contributor

New exercise.

This one is a bit of a special one, and I might have gotten carried away with the math :D

It involves randomness, and therefore the tests have the potential to be flaky. I did the math to ensure that the flakiness is kept to a minimum (like one fail per 100k runs), but the safer I make the tests, the less strict they become.

Copy link
Contributor

github-actions bot commented Jun 7, 2025

Thank you for contributing to exercism/elixir 💜 🎉. This is an automated PR comment 🤖 for the maintainers of this repository that helps with the PR review process. You can safely ignore it and wait for a maintainer to review your changes.

Based on the files changed in this PR, it would be good to pay attention to the following details when reviewing the PR:

  • General steps

    • 🏆 Does this PR need to receive a label with a reputation modifier (x:size/{tiny,small,medium,large,massive})? (A medium reputation amount is awarded by default, see docs)
  • Any exercise changed

    • 👤 Does the author of the PR need to be added as an author or contributor in <exercise>/.meta/config.json (see docs)?
    • 🔬 Do the analyzer and the analyzer comments exist for this exercise? Do they need to be changed?
    • 📜 Does the design file (<exercise>/.meta/design.md) need to be updated to document new implementation decisions?
  • Practice exercise changed

    • 🌲 Do prerequisites, practices, and difficulty in config.json need to be updated?
    • 🧑‍🏫 Are the changes in accordance with the community-wide problem specifiations?
  • Practice exercise tests changed

    • ⚪️ Are all tests except the first one skipped?
    • 📜 Does <exercise>/.meta/tests.toml need updating?

Automated comment created by PR Commenter 🤖.

Copy link
Member

@angelikatyborska angelikatyborska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with the testing approach. It's good that you left comments for students that tell them how different sample sizes behave in terms of flaky tests.

The tests with your implementation are very fast, but I think some students might go too far with sample sizes and have problems with timeouts 🤔

~D[2019-02-12]
]

output = BafflingBirthdays.shared_birthday?(birthdates) == true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and on LOC 50 are unused variables

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiegillet sorry, I should have been clearer - please make those expressions into assertions 😅 now it's:

    warning: use of operator == has no effect
    │
 63 │       BafflingBirthdays.shared_birthday?(birthdates) == true
    │                                                      ~
    │
    └─ test/baffling_birthdays_test.exs:63:54

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, don't apologize for me being an idiot 😆


# for a sample size of 100 and this delta, the assertion is expected to fail once in 100 runs
# for a sample size of 600 and this delta, the assertion is expected to fail once in a billion runs
delta = 0.83
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you choose the deltas in the last few tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this wolfram alpha app

  • enter the sample proportion (expected)
  • enter the confidence level (0.99, which means one failure in 100)
  • enter the sample size (100)

That gives you the 99% interval, meaning a random pick will fall in there 99% of the time. This interval goes from 0.03417 to 0.1997, which is 2 times the delta, that's how I got the value.

After that, I entered a confidence level of 0.999999999 (one failure in a billion) and tweaked the sample size until I found an interval that's fairly close to the confidence level of 0.99, that came out to being around 600 every time. It's not an exact calculation, but it should be close enough.

I used mix test --repeat-until-failure 100 with a sample of 100 to check if the solution would actually fail, and it does about half the time, which is expected, so it seems to work. I did not try a billion times with a sample of 600, but I ran it for a bit and never managed to make it fail 😆

I have one dark secret for this particular test though. If you try with a sample of 100, you will almost always fail the test, because with 100 sample, most of the time you will estimate 100 or 99, which is outside of the theoretical range (99.08 to 1). But I chose to ignore it, it's complicated to explain and it's unlikely that people will only pick a 100 sample size, and if they do, they'll see the comment and pick something bigger.

Comment on lines +130 to +141
expected_count_standard_deviation = fn
day when day <= 28 -> :math.sqrt(group_size * 12 / 365 * (1 - 12 / 365))
day when day <= 30 -> :math.sqrt(group_size * 11 / 365 * (1 - 11 / 365))
day when day == 31 -> :math.sqrt(group_size * 7 / 365 * (1 - 7 / 365))
end

counts_outside_95_percent_confidence_interval =
day_frequencies
|> Enum.filter(fn {day, count} ->
abs(count - expected_count.(day)) > 1.96 * expected_count_standard_deviation.(day)
end)
|> length()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there maybe a website that you could link in a comment in this code that could explain what's going on? My statistics knowledge is very rusty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a log going on, that's what I meant when I said that maybe I got carried away lol

There are many concepts involved:

I was wondering if I should explain more each step, but it seems like too much, when solving the exercise is actually pretty easy.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... yes, that is way too much. I won't even pretend I understand 🙈

I see a few potential problems with this testing code:

  • students might get confused by it (but in reality they should just read the test description and know what to do)
  • confused students might ask mentors about this test code, and mentors won't know how to explain it (but so what? it's important to know how to say "I don't know" 😬)
  • if this test code ever needs to be changed and there's nobody with a physics or maths phd around... I won't know how to fix the tests 😅 (but then I would just throw it all away and do what Erik did: https://github.com/exercism/csharp/pull/2402/files#diff-a31c6c904e805077bb6ab5a333d6e0f154d63488697388f75ce77e4ea40a4eba)

Still, I'm pretty impressed with the effort you went through and if I understood it, I'm sure I would agree it's the only reasonable way to write a reliable test for random values 🤓 I'm fine just accepting that I don't get it and let it be merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not have done this if it wasn't fun :)

I actually agree with your list of problems, I share your concerns. I looked at Erik's test, my tests include his (assert map_size(month_frequencies) == number_of_months), but yes, I am fine with future maintainers simplifying the tests if I get hit by a bus. Or if this causes more confusion than it's worth.

@spec random_birthdates(group_size :: integer()) :: [Date.t()]
def random_birthdates(group_size) do
for _ <- 1..group_size do
year = generate_non_leap_year_january_first(0, 3000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting year range 😬 just so that we're clear: it's completely unnecessary to make the year random, right? To pass the tests, you might just as well choose the same non-leap year for all birthdates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as far as the tests are concerned you could pick year = 2025 and be done with it, that's true.
In terms of generating random Date.t(), this felt more natural to me. I guess it doesn't really matter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right - It does not matter! I just wanted to make sure 🙂

config.json Outdated
Comment on lines 2349 to 2352
"randomness",
"dates-and-time",
"lists",
"enum"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have also used: list-comprehensions, ranges, tuples. I think list-comprehensions might be optional (you're not using cartesian products, right?), but ranges might be necessary for generating a list of a given length.

Additionally a MapSet, but we don't have a concept for that 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I added them all, it could help to know the concepts.

Additionally a MapSet, but we don't have a concept for that 🤷

I made a set concept for Elm a while ago. Want me to fork it? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, if you have the time, it would be very nice to have that concept in Elixir too

@jiegillet
Copy link
Contributor Author

jiegillet commented Jun 13, 2025

The tests with your implementation are very fast, but I think some students might go too far with sample sizes and have problems with timeouts 🤔

I did some test by changing my sample size:

  • 1M: 40s
  • 100k: 4s
  • 10k: 0.4s

The cutoff is 10 seconds right? That's a limit of 250k samples. That doesn't seem too bad, especially considering that the tests are kind of hinting that a sample of 600 would be fine.

@angelikatyborska
Copy link
Member

The cutoff is 10 seconds right?

I somehow remember it's 2x or 3x the average test runtime set in our config file ("average_run_time": 4) but maybe it's just simple 10s. Anyway similar values (8s, 10s, or 12s)

especially considering that the tests are kind of hinting that a sample of 600 would be fine.

Agreed, I'm no longer worried about test speed. Thanks!

@angelikatyborska angelikatyborska merged commit df0b279 into main Jun 15, 2025
9 checks passed
@angelikatyborska angelikatyborska deleted the jie-baffling-birthdays branch June 15, 2025 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants