KeyError on GridSearch resume: _ordered_ids and _populate_next aren't persisted #1055

**Describe the bug**
Constructing kt.GridSearch(..., overwrite=False) in a fresh process (kernel restart, cell re-run, new script) reloads the prior trials fine, but the search then dies on the first completed trial in the resumed process with KeyError: '<trial_id>', raised from LinkedList.next inside GridSearchOracle.populate_space.

Version: keras_tuner 1.4.8, Python 3.12, installed via pip

**To Reproduce**
No Colab link, sorry. I've been working off a private notebook with a long (unrelated) image download sequence that I doubt you'd want to repeat.
The bug is purely in GridSearchOracle's internal state, and reproducing it cleanly needs an interrupted trial (the process killed while a trial is running).

To reproduce:
1. Run any kt.GridSearch(...) to completion of at least one trial.
2. Restart the Python process (or re-run the cell that builds kt.GridSearch(..., overwrite=False)).
3. Call .search(). After the first completed trial in the new process you get:

```
File ".../keras_tuner/src/tuners/gridsearch.py", line 197, in populate_space
    next_id = self._ordered_ids.next(old_trial_id)
File ".../keras_tuner/src/tuners/gridsearch.py", line 80, in next
    index = self._data_to_index[data]
KeyError: '0001'
```

**Expected behavior**
Resumed search picks up at the next un-tried grid combo, same as if the original process had kept going.

**Additional context**
Traced through the source. GridSearchOracle.__init__ makes two in-memory fields:

- _ordered_ids: LinkedList — trial IDs in hp-combo order
- _populate_next: list — queue of trial IDs ready to spawn the next combo

Neither is included in Oracle.get_state / set_state, and GridSearchOracle doesn't override either method. So on resume both fields come back empty, while start_order rehydrates fine. As soon as end_trial fires (e.g. for an interrupted trial retried via _retry_queue), it pushes a trial_id onto _populate_next.
The next populate_space call pops that id and looks it up in the empty _ordered_ids._data_to_index, which raises the KeyError.
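
To make the failure mode concrete, here's a toy model of it. None of this is keras_tuner code: ToyLinkedList and ToyOracle are made-up stand-ins that only mimic the field and method names mentioned above, and the real classes do more than this. It's a sketch of the pattern (state saved without the two private fields, then a lookup against the empty index), nothing else.

```python
class ToyLinkedList:
    """Hypothetical stand-in for the LinkedList used by the grid oracle."""

    def __init__(self):
        self._memory = []         # ids in the order they were inserted
        self._data_to_index = {}  # id -> position in _memory
        self._next_index = {}     # position -> successor position (None key = head)

    def insert(self, data, data_pre=None):
        # Insert `data` right after `data_pre`, or at the head if data_pre is None.
        index_pre = None if data_pre is None else self._data_to_index[data_pre]
        new_index = len(self._memory)
        self._memory.append(data)
        self._data_to_index[data] = new_index
        self._next_index[new_index] = self._next_index.get(index_pre)
        self._next_index[index_pre] = new_index

    def next(self, data):
        index = self._data_to_index[data]  # KeyError if `data` was never inserted
        next_index = self._next_index.get(index)
        return None if next_index is None else self._memory[next_index]


class ToyOracle:
    """Hypothetical oracle: persists start_order/end_order, nothing else."""

    def __init__(self):
        self.start_order = []
        self.end_order = []
        self._ordered_ids = ToyLinkedList()
        self._populate_next = []

    def start_trial(self, trial_id, prev_id=None):
        self._ordered_ids.insert(trial_id, prev_id)
        self.start_order.append(trial_id)

    def end_trial(self, trial_id):
        self.end_order.append(trial_id)
        self._populate_next.append(trial_id)

    def populate_space(self):
        old_trial_id = self._populate_next.pop(0)
        return self._ordered_ids.next(old_trial_id)

    def get_state(self):
        # Mirrors the gap described above: the two private fields are left out.
        return {"start_order": list(self.start_order),
                "end_order": list(self.end_order)}

    def set_state(self, state):
        self.start_order = list(state["start_order"])
        self.end_order = list(state["end_order"])


def reproduce():
    """Return the id that triggers KeyError after a simulated resume, or None."""
    first = ToyOracle()
    first.start_trial("0000")
    first.end_trial("0000")
    first.start_trial("0001", "0000")  # "process killed" while 0001 is running

    resumed = ToyOracle()              # fresh process: private fields start empty
    resumed.set_state(first.get_state())
    resumed.end_trial("0001")          # the retried trial finishes
    try:
        resumed.populate_space()
    except KeyError as err:
        return err.args[0]
    return None
```

In this toy, reproduce() hits the KeyError on the retried trial's id, which is the same shape as the traceback above.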

gridsearch.py is byte-identical at v1.4.7, v1.4.8 and current master, so the bug is still present.

Two possible fixes:

a) Override get_state / set_state on GridSearchOracle to persist both fields, rebuilding the LinkedList in set_state.
or
b) Lazily reconstruct on the first populate_space after reload: walk start_order to fill _ordered_ids in insertion order via _ordered_ids.insert(tid, prev_tid), then seed _populate_next with end_order[-1]. This is the smaller change.
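
For (a), something along these lines is what I have in mind. Again this is a sketch on made-up stand-ins, not a patch against gridsearch.py: ToyLinkedList, BaseToyOracle, PersistentToyGridOracle and the state keys "ordered_ids" / "populate_next" are all hypothetical. It assumes the state dict has to stay JSON-friendly, so the linked list is flattened to a plain list (in list order) on save and rebuilt on load.

```python
import json


class ToyLinkedList:
    """Hypothetical stand-in for the LinkedList used by the grid oracle."""

    def __init__(self):
        self._memory = []         # ids in the order they were inserted
        self._data_to_index = {}  # id -> position in _memory
        self._next_index = {}     # position -> successor position (None key = head)

    def insert(self, data, data_pre=None):
        index_pre = None if data_pre is None else self._data_to_index[data_pre]
        new_index = len(self._memory)
        self._memory.append(data)
        self._data_to_index[data] = new_index
        self._next_index[new_index] = self._next_index.get(index_pre)
        self._next_index[index_pre] = new_index

    def next(self, data):
        next_index = self._next_index.get(self._data_to_index[data])
        return None if next_index is None else self._memory[next_index]

    def to_list(self):
        # Walk from the head so the result is in list order, not insertion order.
        out, index = [], self._next_index.get(None)
        while index is not None:
            out.append(self._memory[index])
            index = self._next_index.get(index)
        return out

    @classmethod
    def from_list(cls, ids):
        linked, prev = cls(), None
        for data in ids:
            linked.insert(data, prev)
            prev = data
        return linked


class BaseToyOracle:
    """Hypothetical base class that only persists start_order/end_order."""

    def __init__(self):
        self.start_order = []
        self.end_order = []

    def get_state(self):
        return {"start_order": list(self.start_order),
                "end_order": list(self.end_order)}

    def set_state(self, state):
        self.start_order = list(state["start_order"])
        self.end_order = list(state["end_order"])


class PersistentToyGridOracle(BaseToyOracle):
    def __init__(self):
        super().__init__()
        self._ordered_ids = ToyLinkedList()
        self._populate_next = []

    def get_state(self):
        state = super().get_state()
        state["ordered_ids"] = self._ordered_ids.to_list()
        state["populate_next"] = list(self._populate_next)
        return state

    def set_state(self, state):
        super().set_state(state)
        # .get(..., []) tolerates state written before these keys existed.
        self._ordered_ids = ToyLinkedList.from_list(state.get("ordered_ids", []))
        self._populate_next = list(state.get("populate_next", []))


def round_trip(oracle):
    """Save to a JSON string and load into a fresh oracle, as a resume would."""
    restored = PersistentToyGridOracle()
    restored.set_state(json.loads(json.dumps(oracle.get_state())))
    return restored
```

One thing the sketch glosses over: state files written by an older version wouldn't have the two new keys, so a real fix would probably still want the lazy rebuild from (b) as a fallback.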

Workaround I've got in my notebook:

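```python
def reseed_grid_picker(tuner):
    """Rebuild GridSearchOracle's unpersisted in-memory state after a reload."""
    oracle = tuner.oracle
    # Already populated (same process, or already reseeded): nothing to do.
    if oracle._ordered_ids._memory:
        return
    # Re-insert every started trial, each one after the previous.
    prev_id = None
    for trial_id in oracle.start_order:
        oracle._ordered_ids.insert(trial_id, prev_id)
        prev_id = trial_id
    # Queue the last finished trial so the next combo is generated from it.
    if oracle.end_order:
        oracle._populate_next.append(oracle.end_order[-1])
```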

**Would you like to help us fix it?**
Happy to open a PR.
