RWI reading after instance crash takes ages #751

@okybaca

After an instance crash (power outage, ...), a fresh start takes a very long
time because HeapReader reads the RWI database and rebuilds the indexes (about
half a day for a 300 GB RWI database), which results in prolonged instance
downtime.

I'm not sure how the RWI database works, so please correct me if I'm wrong,
but from what I observe:

  • all segment indices are kept in memory while the instance is running
    (which can lead to memory exhaustion, as described in "RWIs fill out the whole memory space for YaCy" #731)

  • during a clean shutdown, the indices are written to files (.idx and .gap?)
    and the restart is relatively fast

  • if the instance crashes, no .idx & .gap files are written
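
To illustrate the difference between the two start-up paths described above, here is a minimal, self-contained sketch. It is not YaCy's code and not its on-disk format: the record layout, the class name `HeapStartupSketch` and all method names are assumptions made up for this example. It only shows why a missing index dump forces a scan whose cost grows with the size of the heap file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (hypothetical, not YaCy's format): a heap is a sequence of
 * [int keyLength][int payloadLength][key bytes][payload bytes] records.
 */
final class HeapStartupSketch {

    /** Returns a new heap with one record appended (helper for the example). */
    static byte[] append(byte[] heap, String key, byte[] payload) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(heap.length + 8 + k.length + payload.length);
        out.put(heap).putInt(k.length).putInt(payload.length).put(k).put(payload);
        return out.array();
    }

    /** Slow path: visit every record to recover key -> offset. */
    static Map<String, Integer> rebuildByScan(byte[] heap) {
        Map<String, Integer> index = new LinkedHashMap<>();
        ByteBuffer in = ByteBuffer.wrap(heap);
        while (in.remaining() >= 8) {
            int offset = in.position();
            int keyLength = in.getInt();
            int payloadLength = in.getInt();
            byte[] k = new byte[keyLength];
            in.get(k);
            in.position(in.position() + payloadLength); // skip the payload
            index.put(new String(k, StandardCharsets.UTF_8), offset);
        }
        return index;
    }

    /** Fast path if a dump from a clean shutdown is available, else full scan. */
    static Map<String, Integer> open(byte[] heap, Map<String, Integer> dumpOrNull) {
        return dumpOrNull != null ? dumpOrNull : rebuildByScan(heap);
    }
}
```

After a crash `dumpOrNull` would be `null` for every segment, so every start pays for the full scan.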

Possible solutions:

  • keep the .idx & .gap files until the segment is changed

  • maybe write the .idx & .gap files from time to time, when the segment
    hasn't changed for a long time

  • when running into a low-memory condition, unmount (?) the oldest segments.
    Check this on start-up as well, because with an RWI database bigger than
    the available RAM the instance won't start and fails with a memory error

  • speed up the RWI transfer to other instances, as suggested in "RWI: indexDistribution.minChunkSize does nothing" #724.
    The RWI database on a host only grows, even with moderate crawling
    (~200 ppm, DHT-IN switched off), and never actually shrinks, which causes
    many instances to jam.
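
The "write the index files when a segment hasn't changed for a long time" idea from the list above could look roughly like the following sketch. Everything in it is hypothetical: `IdleCheckpointSketch`, its methods and the idle threshold are names invented for illustration, and nothing here is taken from YaCy's code. It only tracks which segments are both idle and lacking an up-to-date dump; the actual writing of the files is left to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bookkeeping for "dump the index of idle, changed segments". */
final class IdleCheckpointSketch {

    private static final class SegmentState {
        long lastModifiedMillis;
        long version;            // incremented on every write
        long dumpedVersion = -1; // version covered by the dump on disk, -1 = none
    }

    private final Map<String, SegmentState> segments = new LinkedHashMap<>();
    private final long idleThresholdMillis;

    IdleCheckpointSketch(long idleThresholdMillis) {
        this.idleThresholdMillis = idleThresholdMillis;
    }

    /** Call whenever a segment is modified; its existing dump becomes stale. */
    void recordWrite(String segment, long nowMillis) {
        SegmentState s = segments.computeIfAbsent(segment, k -> new SegmentState());
        s.lastModifiedMillis = nowMillis;
        s.version++;
    }

    /** Call after the index files for a segment were written successfully. */
    void recordDump(String segment) {
        SegmentState s = segments.get(segment);
        if (s != null) {
            s.dumpedVersion = s.version;
        }
    }

    /** Segments that are idle long enough and whose dump is missing or stale. */
    List<String> dueForDump(long nowMillis) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, SegmentState> e : segments.entrySet()) {
            SegmentState s = e.getValue();
            boolean idle = nowMillis - s.lastModifiedMillis >= idleThresholdMillis;
            boolean stale = s.dumpedVersion != s.version;
            if (idle && stale) {
                due.add(e.getKey());
            }
        }
        return due;
    }
}
```

A background task could call `dueForDump(System.currentTimeMillis())` periodically and write the files for the returned segments. To survive a crash in the middle of such a write, the dump would presumably be written to a temporary file first and then renamed into place, so that a half-written dump is never mistaken for a valid one.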

Labels: enhancement, index