We need to identify features of interest and extract them from each article. This may necessitate bringing in additional data. For example, the journal Impact Factor may be useful. I have access to Web of Science and can download the impact factors for each year, but to match them to articles we first need to identify the publication date of each article.
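To make the matching step concrete, here is a minimal sketch of how a per-year impact factor could be attached to an article once its journal and publication date are known. Everything in it is an assumption for illustration: the field names (`journal`, `published`), the helper `attach_impact_factor`, and the values in the table are made up and do not reflect the actual Web of Science export format.

```python
from datetime import date

# Hypothetical impact-factor table keyed by (journal, year).
# The numbers are placeholders, not real impact factors.
impact_factors = {
    ("Journal A", 2014): 2.5,
    ("Journal A", 2015): 3.0,
    ("Journal B", 2015): 1.2,
}

# Hypothetical article records; the field names are assumptions.
articles = [
    {"id": "a1", "journal": "Journal A", "published": date(2015, 3, 14)},
    {"id": "a2", "journal": "Journal B", "published": date(2014, 11, 2)},
]


def attach_impact_factor(article, table):
    """Return a copy of the article with an 'impact_factor' field added.

    The lookup uses the journal name and the publication year.
    If no entry exists for that pair, the field is set to None.
    """
    key = (article["journal"], article["published"].year)
    return {**article, "impact_factor": table.get(key)}


enriched = [attach_impact_factor(a, impact_factors) for a in articles]
```

Articles with no matching journal/year entry come back with `impact_factor` set to `None`, which makes it easy to see where publication dates or journal names still need cleaning up.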
What other information / features are people interested in?