Description
Hi! I just cloned your project and am messing around with it. Though I am an experienced software engineer, I am new to machine learning, so feel free to tell me if my insights are incorrect!
After reading the code, I noticed that the prediction modeling relies heavily on the KeyStats data, but that data is extremely limited. Would it not be SUPER beneficial to backfill this data with one record per quarter? The provided data is very erratic, yet most 'feature' data points are reported by the company every quarter.
In addition, a cron job or a simple get_missing_quartly_keystats.py script that can be invoked on demand to fill in new stats would keep the project current over time. It would help the modeling become more accurate (more data), and also bring the project closer to being a practical tool for live use.
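To make the backfill idea concrete, here is a minimal sketch of how such a script could work out which quarters still need a record. The function names (`quarter_of`, `missing_quarters`) and the idea of passing in a list of snapshot dates are my own assumptions for illustration, not something taken from the project's code.

```python
from datetime import date


def quarter_of(d):
    """Map a date to a (year, quarter) tuple, with quarter in 1..4."""
    return (d.year, (d.month - 1) // 3 + 1)


def missing_quarters(snapshot_dates, start, end):
    """List the (year, quarter) pairs between start and end (inclusive)
    that have no snapshot at all.

    snapshot_dates: dates for which a KeyStats record already exists
    (hypothetical input; how these are read from disk is left out).
    """
    have = {quarter_of(d) for d in snapshot_dates}
    missing = []
    year, q = quarter_of(start)
    last = quarter_of(end)
    while (year, q) <= last:
        if (year, q) not in have:
            missing.append((year, q))
        # Step to the next quarter, rolling over into the next year.
        q += 1
        if q > 4:
            year, q = year + 1, 1
    return missing


# Placeholder dates, only to show the call shape.
snapshots = [date(2012, 1, 15), date(2012, 2, 3), date(2012, 11, 20)]
gaps = missing_quarters(snapshots, date(2012, 1, 1), date(2013, 3, 31))
print(gaps)
```

A cron job could run this per ticker and only fetch data for the quarters it returns, so repeated runs stay cheap.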
Most of the historical quarterly feature data points can be found directly, or derived through calculations, on https://www.macrotrends.net/. Example: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/financial-statements
There are many categories with subcategories that can most likely be scraped and parsed. For example, the full historical market cap chart served here: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/market-cap
can be parsed out, because the HTML contains a <script> tag that defines var chartData with all the values by date.
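As a rough sketch of the parsing step, something like the following could pull the array out of the page source. It assumes the value assigned to `var chartData` is a JSON-compatible array literal terminated by a semicolon; the sample HTML, the field names (`date`, `v1`), and the values are made-up placeholders, and the real page may be laid out differently. Fetching the page is left out on purpose.

```python
import json
import re

# Placeholder HTML for illustration only; field names and values are invented.
SAMPLE_HTML = """
<html><body>
<script>
var chartData = [{"date":"2019-03-29","v1":1.0},{"date":"2019-06-28","v1":2.0}];
</script>
</body></html>
"""


def extract_chart_data(html):
    """Return the list assigned to `var chartData`, or [] if it is not found.

    Assumes the assigned value is valid JSON (an assumption about the page).
    """
    match = re.search(r"var\s+chartData\s*=\s*(\[.*?\])\s*;", html, re.DOTALL)
    if match is None:
        return []
    return json.loads(match.group(1))


rows = extract_chart_data(SAMPLE_HTML)
for row in rows:
    print(row["date"], row["v1"])
```

If the array turns out not to be strict JSON (for example, unquoted keys), `json.loads` will raise an error and a more tolerant parser would be needed.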
Between the balance sheets and financial records they provide, you may even find other influential data points to add to the ML portion of this script.
Let me know what you think, or if my logic is simply way off. If you think it is a good idea, I can help out with refactoring!