This repository was archived by the owner on Nov 22, 2022. It is now read-only.
First of all, I want to say that this is a brilliant idea! Not only does it increase the number of open source Elixir projects, it also helps people improve their skills.
What I'm going to suggest here might not be at the top of your priority list, as the project has just started and needs more contributions. However, I think it's an interesting problem to solve, and at some point it will be useful for avoiding duplicates.
The basic idea: if someone tries to post code that has already been posted, the system shows them the similar existing posts. It is then the user's decision whether to post the tip or discard it.
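To decide whether a piece of code is similar to existing code, we can use simhash, which reduces each snippet to a short fingerprint so that similar snippets end up with fingerprints differing in only a few bits.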
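To make the simhash idea concrete, here is a minimal sketch in Python (chosen only for illustration; the real thing would live in the Elixir app). The tokenizer, the 3-token shingles, the 64-bit width, and the names `simhash` and `hamming` are all my assumptions, not something the project defines.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumption; matches the 8-byte digest below)


def tokens(code):
    # Identifiers, numbers, and single punctuation characters; whitespace is dropped.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", code)


def simhash(code):
    toks = tokens(code)
    counts = [0] * BITS
    # Features are overlapping 3-token shingles (one shingle if the snippet is tiny).
    for i in range(max(1, len(toks) - 2)):
        shingle = " ".join(toks[i:i + 3]).encode("utf-8")
        h = int.from_bytes(hashlib.blake2b(shingle, digest_size=8).digest(), "big")
        # Each feature votes +1 or -1 on every bit position.
        for bit in range(BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A bit is set in the fingerprint when its votes are net positive.
    return sum(1 << bit for bit in range(BITS) if counts[bit] > 0)


def hamming(a, b):
    # Number of bit positions in which two fingerprints differ.
    return bin(a ^ b).count("1")
```

Two snippets would count as "similar" when `hamming(simhash(a), simhash(b))` is below some threshold that we would have to tune on real tips. Because whitespace is dropped during tokenizing, a reformatted copy of the same code gets the same fingerprint.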
One of the main challenges, which I'm not sure how to address, is making this scalable. As the data grows, going through all the posts and checking each one for similarity sounds inefficient. However, we can:
- Only check similarities against posts from the past X months.
- First find candidate posts by their title (a DB query), then validate the candidates by running simhash on the code.
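The two ideas above can be combined into one narrowing pipeline. The sketch below is Python again and works on an in-memory list as a stand-in for the database; the `Post` fields, the word-overlap title match, and the `max_age_days` and `max_distance` defaults are hypothetical placeholders rather than anything the project has today.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:  # hypothetical shape of a stored tip
    title: str
    code: str
    created_at: datetime


def fingerprint(code):
    # Compact 64-bit simhash over 3-token shingles (same assumptions as any simhash sketch).
    toks = re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", code)
    counts = [0] * 64
    for i in range(max(1, len(toks) - 2)):
        shingle = " ".join(toks[i:i + 3]).encode("utf-8")
        h = int.from_bytes(hashlib.blake2b(shingle, digest_size=8).digest(), "big")
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def title_words(title):
    return set(re.findall(r"\w+", title.lower()))


def similar_posts(new_post, posts, now, max_age_days=180, max_distance=3):
    cutoff = now - timedelta(days=max_age_days)
    new_words = title_words(new_post.title)
    new_fp = fingerprint(new_post.code)
    matches = []
    for post in posts:
        # Step 1: skip anything older than the time window.
        if post.created_at < cutoff:
            continue
        # Step 2: cheap title filter; in the real app this would be the DB query.
        if not new_words & title_words(post.title):
            continue
        # Step 3: confirm the surviving candidates by comparing fingerprints.
        if bin(new_fp ^ fingerprint(post.code)).count("1") <= max_distance:
            matches.append(post)
    return matches
```

In practice the fingerprint would probably be computed once and stored with each post, so only the new tip needs hashing at submit time. The word-overlap title match is deliberately crude (common words like "a" will match too much), so a proper text search in the database would likely replace it.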
If you're open to this feature, I'd be happy to get into the details and create a PR!