Dump the API responses and semi-structured crawled data in a dynamodb Place the script in a lambda function and run it on schedule using cloudwatch.