
How can we have 9450 ECG data in total when UCR provides only 5000? + 2 other questions #2

@hellojinwoo


Hello tejaslodaya, I have 3 questions about the code and would appreciate it if you could answer them.

Q1. How did you train the VRAE with 9450 data points (8500 train + 950 test), when the UCR archive has only 5000 data points in total?

According to the UCR time-series archive website, the ECG5000 dataset consists of 500 training samples and 4500 test samples. But your readme.md says the following:

The above network is trained on a dataset of 8500 ECG's and tested on 950 ECG's Named ECG5000 on the UCR archive, this dataset has 5 classes....

Where do the extra 4450 data points (9450 in your dataset vs. 5000 in the UCR archive) come from?

Q2. For clustering purposes, what is a good way to slice the daily_return time series of stocks to train the VRAE?

  • Task: replace the covariance between the daily_return time series of 2 stocks with a distance calculated from the VRAE.
  • Question: should I divide the time-series data into non-overlapping sequences or into sequences that overlap with each other?

For reference, the daily return (R_i) is calculated as shown in the picture below.

[Image: daily_stock (formula for the daily return R_i)]

Let's say I have the daily_return time series of Apple stock (AAPL) from 2010/1/1 to 2018/12/31. It is a 1D vector with roughly 2000 elements. What would be the best way to slice this vector?
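To make the setup concrete, here is a minimal sketch of how such a vector could be built. It assumes the usual simple-return definition R_i = (P_i - P_{i-1}) / P_{i-1}, which may differ from the convention in the picture above; `daily_returns` is a hypothetical helper name and the prices are made up.

```python
import numpy as np

def daily_returns(prices):
    """Simple daily returns, assuming R_i = (P_i - P_{i-1}) / P_{i-1}."""
    prices = np.asarray(prices, dtype=float)
    # np.diff gives P_i - P_{i-1}; divide by the previous day's price.
    return np.diff(prices) / prices[:-1]

# Made-up closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0]
returns = daily_returns(closes)
print(returns.shape)  # one fewer element than the price series
```

With N daily closing prices this yields a vector of N - 1 returns.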

The research paper Variational Recurrent Auto-Encoders tried both ways: dividing with overlapping parts and without overlapping parts.

  1. dividing without overlapping parts

...The song were divided into non-overlapping sequences of 50 time steps each....

  2. dividing with overlapping parts

...For this model, we used sequences of 40 time steps with overlap, such that the start of each data point is halfway through the previous data point.

Since I am not interested in generation, which is the usual purpose of a VAE, I am wondering which approach I should follow. My goal is to cluster stocks based on their daily_return time series. Any advice on how to slice the daily return data would be very much appreciated!
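To illustrate the two options, here is a minimal sketch of both slicing schemes applied to a vector of about 2000 returns. `slice_windows` is a hypothetical helper (it is not from the paper or from this repo), the returns are random placeholders, and the window lengths 50 and 40 are simply borrowed from the paper's quotes above.

```python
import numpy as np

def slice_windows(series, length, stride):
    """Cut a 1D series into windows of `length`, starting every `stride` steps.

    stride == length -> non-overlapping windows
    stride <  length -> overlapping windows
    Trailing samples that do not fill a whole window are dropped.
    """
    series = np.asarray(series, dtype=float)
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    n_windows = (len(series) - length) // stride + 1
    if n_windows <= 0:
        return np.empty((0, length))
    starts = np.arange(n_windows) * stride
    return np.stack([series[s:s + length] for s in starts])

# Placeholder data standing in for ~2000 daily returns.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=2000)

# Option 1: non-overlapping sequences of 50 time steps.
non_overlapping = slice_windows(returns, length=50, stride=50)

# Option 2: sequences of 40 time steps, each starting halfway
# through the previous one (stride = length / 2).
overlapping = slice_windows(returns, length=40, stride=20)

print(non_overlapping.shape, overlapping.shape)
```

Under these assumptions the overlapping scheme produces roughly twice as many training sequences from the same series, at the cost of the sequences no longer being independent of each other.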

Q3. Why didn't you follow the authors' way of slicing the time-series data?

As far as I know, the ECG5000 samples do not overlap with each other, yet you used them to train the VRAE model. Does this imply that a VRAE can be trained with non-overlapping data?

Thank you for reading the questions.
