Skip to content

Delete dead code in dict #2014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: unstable
Choose a base branch
from

Conversation

zuiderkwast
Copy link
Contributor

@zuiderkwast zuiderkwast commented Apr 27, 2025

After transitioning much usage of dict to the new hashtable, parts of the dict implementation have become dead code. The following is deleted:

  • Dict entry without value
  • Dict entry with embedded key
  • Pointer bit tricks and other complexity for different kinds of dictEntry
  • Dict two-phase unlink
  • Dict type flag to disable incremental rehashing
  • Dict stats
  • Defrag callbacks defragEntryStartCb and defragEntryFinishCb

@zuiderkwast zuiderkwast requested a review from hpatro April 27, 2025 09:33
Deletes the following features that are no longer used:

* Dict entry with no value
* Key as dict entry, when key is odd
* Embed key in dict entry

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Copy link

codecov bot commented Apr 27, 2025

Codecov Report

Attention: Patch coverage is 71.05263% with 11 lines in your changes missing coverage. Please review.

Project coverage is 71.17%. Comparing base (249495a) to head (b172888).

Files with missing lines Patch % Lines
src/dict.c 71.05% 11 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2014      +/-   ##
============================================
+ Coverage     71.00%   71.17%   +0.16%     
============================================
  Files           123      123              
  Lines         66103    65906     -197     
============================================
- Hits          46939    46909      -30     
+ Misses        19164    18997     -167     
Files with missing lines Coverage Δ
src/dict.h 100.00% <ø> (ø)
src/dict.c 79.29% <71.05%> (+14.60%) ⬆️

... and 14 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can reach to a point where we can completely remove dict by hashtable? I would prefer us leaving the dict.c/dict.h as is till we're ready to remove it completely.

@zuiderkwast
Copy link
Contributor Author

Do you think we can reach to a point where we can completely remove dict by hashtable?

It would take a long time and a lot of work and will not bring a huge benefit. Counting calls to dictCreate(), it's still used in 54 places (blocked keys, watched keys, ready keys, replica keys with expire, cluster nodes, configs, script cache, scripting engines, function libraries, functions, latency events, rdb aux fields, sentinel instances, ...).

Most or all of them are small and many are short-lived. It's not trivial to replace them with hashtable since they need an key-value entry object. I'm considering replacing the internals of dict so it uses hashtable internally with a dictEntry being just a key and a value (no next pointer). But to do that, I'd like to shrink the dict API as much as possible first.

What do you think of this plan?

@hpatro
Copy link
Collaborator

hpatro commented Apr 29, 2025

Do you think we can reach to a point where we can completely remove dict by hashtable?

It would take a long time and a lot of work and will not bring a huge benefit. Counting calls to dictCreate(), it's still used in 54 places (blocked keys, watched keys, ready keys, replica keys with expire, cluster nodes, configs, script cache, scripting engines, function libraries, functions, latency events, rdb aux fields, sentinel instances, ...).

Most or all of them are small and many are short-lived. It's not trivial to replace them with hashtable since they need an key-value entry object. I'm considering replacing the internals of dict so it uses hashtable internally with a dictEntry being just a key and a value (no next pointer). But to do that, I'd like to shrink the dict API as much as possible first.

What do you think of this plan?

Sounds like a plan. I was trying to look internally if this will have implications. Shall we do the changes you're suggesting in a single PR with multiple commits?

@zuiderkwast
Copy link
Contributor Author

I was trying to look internally if this will have implications.

Yes, please do. There is more unused stuff that I kept because it's very small and we (or you) might want to use it still.

Shall we do the changes you're suggesting in a single PR with multiple commits?

This PR has 3 separate commits. We can merge without squashing if you want.

Also, when you backport this to AWS, you can just skip this PR?

@hpatro
Copy link
Collaborator

hpatro commented Apr 29, 2025

Shall we do the changes you're suggesting in a single PR with multiple commits?

This PR has 3 separate commits. We can merge without squashing if you want.

Also, when you backport this to AWS, you can just skip this PR?

I'm worried about the next set of PRs. It becomes difficult to track which changes were related to each other. Hence, I was asking if it can be done in a single go in this PR.

@zuiderkwast
Copy link
Contributor Author

I'm worried about the next set of PRs. It becomes difficult to track which changes were related to each other.

@hpatro Which next set of PRs? I don't have anything in the near future. Are you referring using hashtable as the backend for dict? I don't know if it is as easy as it seems, but to do that, it seems good to shrink the unused dict APIs as much as possible before. Doesn't it?

Hence, I was asking if it can be done in a single go in this PR.

I did get rid of the usage of no_value in a previous PR just to be able to get rid of all the dictEntry pointer-bit tricks in one go, along with the other stuff deleted in this PR, which was obsoleted earlier when I switched over kvstore to using hashtable. This PR deletes almost all of the unused code in dict and I don't foresee obsoleting the rest of dict in the near future.

TBH, I don't understand why you're so concerned about deleting dead code. The hashtable is basically a dict with a user-defined dictEntry, which should obsolete the hacks to manipulate the dictEntry, a struct that should be internal to dict.

Can you check if you're using any of this in AWS? Then we can perhaps keep it for now, but we can delete the rest of the dead code?

@hpatro
Copy link
Collaborator

hpatro commented Apr 30, 2025

We use the dictionary code internally in few places, so quite possible those benefit from the optimisation which we had introduced and suddenly it could see regression in memory utilisation by reverting these enhancements.

I agree with all of your above points. I think it makes sense for Valkey to cleanup unused code. Just trying to see if we can do it with minimal impact to downstream repositories.

Can you check if you're using any of this in AWS? Then we can perhaps keep it for now, but we can delete the rest of the dead code?

I'm doing this as we speak. Let me get back soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants