How can we have 9450 ECG data in total when UCR provides only 5000? + 2 other questions

Hello tejaslodaya, I have 3 questions about the code, and I would appreciate it if you could answer them.

### Q1. How did you train the VRAE with 9450 data points (8500 train + 950 test) when the UCR archive has only 5000 data points in total?
According to the <a href='https://www.cs.ucr.edu/~eamonn/time_series_data_2018/'>UCR time-series archive website</a>, the ECG5000 dataset consists of 500 training samples and 4500 test samples. However, your readme.md says:

> The above network is trained on a dataset of 8500 ECG's and tested on 950 ECG's Named ECG5000 on the UCR archive, this dataset has 5 classes....

Where does the difference of 4450 data points between your dataset and the UCR archive come from?


### Q2. For clustering purposes, what is a good way to slice the daily_return time-series vector of stocks to train the VRAE?
- Task: Replace the covariance between the daily_return time-series vectors of 2 stocks with a distance calculated from the VRAE.
- Question: Should I divide the time-series data into non-overlapping sequences or into overlapping sequences?

For reference, the daily return (Ri) is calculated as shown in the picture below.
 
![daily_stock](https://user-images.githubusercontent.com/34431729/62817389-a4600c00-bb70-11e9-8734-08db2dc43e3f.png)
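
To make the input concrete, here is a minimal sketch of how I compute the daily_return vector. It assumes the simple-return definition, Ri = (Pi - Pi-1) / Pi-1, where Pi is the closing price on day i; if the picture above is read as a log return instead, only the one marked line changes. The function name `daily_returns` and the sample prices are illustrative placeholders, not part of this repository.

```python
import numpy as np

def daily_returns(prices):
    """Simple daily returns from a 1D sequence of closing prices.

    Assumption: R_i = (P_i - P_{i-1}) / P_{i-1}. The result is one element
    shorter than the input because the first day has no previous price.
    """
    prices = np.asarray(prices, dtype=float)
    return np.diff(prices) / prices[:-1]  # swap for np.diff(np.log(prices)) for log returns

# Made-up closing prices, only to show the shapes involved.
closing_prices = [100.0, 102.0, 99.96, 101.0]
returns = daily_returns(closing_prices)
print(returns.shape)  # (3,)
```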

Let's say I have the daily_return time series of Apple stock (AAPL) from 2010/1/1 to 2018/12/31. It is roughly a 2000-dimensional 1D vector. What would be the best way to slice this 2000-dimensional vector?

The research paper <a href="https://arxiv.org/pdf/1412.6581.pdf">_Variational Recurrent Auto-encoder_</a> tried both ways: **dividing with overlapping parts and without overlapping parts**

1. dividing without overlapping parts
> ...The song were divided into non-overlapping sequences of 50 time steps each....

2. dividing with overlapping parts
>...For this model, we used sequences of 40 time steps with overlap, such that the start of each data point is halfway through the previous data point.
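
The two slicing schemes quoted above can be sketched as follows. This is only my reading of the paper's description, not code from the paper or from this repository: `slice_series` is a hypothetical helper, the input is a placeholder array standing in for about 2000 daily returns, and any leftover tail shorter than one window is dropped.

```python
import numpy as np

def slice_series(series, seq_len, stride):
    """Cut a 1D series into windows of length seq_len, one starting every `stride` steps.

    stride == seq_len      -> non-overlapping windows
    stride == seq_len // 2 -> each window starts halfway through the previous one
    """
    series = np.asarray(series, dtype=float)
    if series.ndim != 1:
        raise ValueError("expected a 1D series")
    if len(series) < seq_len:
        return np.empty((0, seq_len))
    starts = range(0, len(series) - seq_len + 1, stride)
    return np.stack([series[s:s + seq_len] for s in starts])

# Placeholder data standing in for roughly 2000 daily returns.
series = np.arange(2000, dtype=float)

non_overlapping = slice_series(series, seq_len=50, stride=50)
half_overlapping = slice_series(series, seq_len=40, stride=20)

print(non_overlapping.shape)   # (40, 50)
print(half_overlapping.shape)  # (99, 40)
```

With the same series, the half-overlapping scheme yields more training sequences, but neighbouring sequences share half of their time steps and are therefore not independent samples.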

Since I am not interested in generation, which is the usual purpose of a VAE, I am wondering which approach I should follow. **My goal is to cluster stocks based on their daily_return time-series vectors**. Any advice on how to slice the daily return time series would be very much appreciated!

### Q3. Why didn't you follow the authors' way of slicing the time-series data?
As far as I know, the ECG5000 samples do not overlap with each other, yet you used this ECG data to train the VRAE model. Does that imply that the VRAE can be trained on non-overlapping data?

Thank you for reading the questions. 
