CinC/Physionet PCG/ECG challenge 2016 #29

@breznak

CinC challenge

https://physionet.org/challenge/2016/

A prestigious challenge/conference with nice data!

🔥 UPDATE: game's still ON! 🎸

Looking for hackers to help me set something up, if it's feasible. Then there will be the whole summer to tune the app.

Blocked by: Add encoders #22

Plan of attack

  • audio
    • for now use wav2vect from Matlab
    • implement wavEncoder (IN PROGRESS: Wav encoder #26)
    • evaluate whether the WAVEncoder (which uses scipy internally) behaves the same as Matlab's
    • try Cochlea encoder
    • implement sound encoders for nupic.audio (Create sound encoders #22)
  • training
    • records are Normal/Anomaly/Unknown
    • aggregate all NORMAL records into a 2-column file (reset, PCG)
      • how aggressively to subsample? nupic is too slow to process the whole dataset, but the Sampling Theorem (Fs >= 2*F) limits us: go only down to 1000 Hz (from 2000 Hz)
    • commit the training data files (because the preprocessing takes a long time)
    • train an HTM model + serialize it
    • try param swarming
  • evaluation
    • load the model, disable learning
    • two tasks in description.py? Or some other way to train/load/evaluate a model on the datasets
    • compute average anomaly score for all datapoints of a record
    • implement the anomaly metric in nupic
    • create a model (for nupic?) that does this classification based on avg. anomaly scores?
    • threshold to Normal/Anomaly/Unknown
  • submission
    • modify examples sample2016*
    • nupic is installed, so setup will just source a virtualenv
    • each evaluation in next will call Matlab (wav2csv), then Python (writes anomaly scores to CSV), then Matlab again (loads the anomaly scores and decides the classification)
    • this is problematic; better to go full Python if possible!
  • improvements:
    • try bag (multi model) voting
      • model trained on full normal data
    • model on FHS parts
    • model on anomalous data
    • model pretrained on ECG data from other sources! https://github.com/breznak/nupic.biodat
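
The training-data step above (subsample from 2000 Hz to 1000 Hz, then aggregate the Normal records into a 2-column file) could be sketched roughly as follows. This is only an illustration under assumptions: the function names `downsample_by_2` and `write_training_rows`, the single header row, and the convention that reset is 1 on the first sample of each record are all hypothetical, and averaging adjacent pairs is just a crude stand-in for a proper anti-aliasing filter (something like `scipy.signal.decimate` would be the more careful choice).

```python
import csv
import io


def downsample_by_2(samples):
    """Halve the sampling rate (e.g. 2000 Hz -> 1000 Hz).

    Averaging each adjacent pair acts as a very crude low-pass filter
    before dropping every second sample. A trailing odd sample is discarded.
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]


def write_training_rows(records, out):
    """Write all records as (reset, PCG) rows to the file-like object `out`.

    Assumption: reset is 1 on the first sample of each record and 0
    otherwise, so the model can tell where one recording ends and the
    next begins.
    """
    writer = csv.writer(out)
    writer.writerow(["reset", "PCG"])  # hypothetical header layout
    for samples in records:
        for i, value in enumerate(downsample_by_2(samples)):
            writer.writerow([1 if i == 0 else 0, value])


# Toy usage with two made-up "records" instead of real PCG data.
records = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0]]
buf = io.StringIO()
write_training_rows(records, buf)
print(buf.getvalue())
```

Writing to a file-like object keeps the sketch testable; in practice `out` would be an opened CSV file that gets committed with the training data.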

Working plan to get some validation results ASAP:

  • training data
    • will train only on Normal data and select (FHS) subsequences of it
    • data extracted from Matlab (@breznak will do that)
  • train HTM model
    • on the provided data
    • just one HTM model (with an RDSE encoder? which settings are best? probably no time to swarm)
    • must be able to serialize the model and load it to run on the evaluation data (learning off)
      • the OPF approach is not working reliably; can someone post code to do that? (@rhyolight or someone..?)
  • write a simple classification function: classify(anScores[])
    • should decide the classification from the anomaly scores for the whole sequence/sample
    • can be something like the average score: Normal iff avg < 0.4; Unknown iff avg in [0.4, 0.7]; Anomaly iff avg > 0.7; ETA ~10 mins
  • score
    • process validation data (@breznak will commit a file)
    • classify & compute score -> submit! 🙏
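
A minimal sketch of the classify(anScores[]) idea from the plan above, assuming the anomaly scores for one record arrive as a plain list of floats. The thresholds 0.4 and 0.7 are the tentative values from the plan; the parameter names `low` and `high` and the choice to return "Unknown" for an empty record are assumptions made for this illustration.

```python
def classify(an_scores, low=0.4, high=0.7):
    """Classify one record from its per-datapoint anomaly scores.

    Returns "Normal" if the average score is below `low`, "Anomaly" if it
    is above `high`, and "Unknown" for anything in between (boundaries
    included). An empty record is treated as "Unknown" (an assumption).
    """
    if not an_scores:
        return "Unknown"
    avg = sum(an_scores) / len(an_scores)
    if avg < low:
        return "Normal"
    if avg > high:
        return "Anomaly"
    return "Unknown"


# Toy usage with made-up scores.
print(classify([0.1, 0.2, 0.3]))  # average below 0.4
print(classify([0.5, 0.6]))       # average between the thresholds
print(classify([0.8, 0.9]))       # average above 0.7
```

Keeping the thresholds as parameters makes it easy to tune them later against the validation data.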
