A portfolio manager is using a multi-armed bandit (MAB) model to choose each day among several stocks for short-term investments. Each stock has a different historical return profile. In this scenario, what would be the "reward" in MAB terminology?
A financial institution uses a neural network model with thousands of parameters to predict loan defaults. On the training dataset, the model achieves a near-zero residual sum of squares (RSS). However, it performs poorly on new data. What does this indicate?