A company is deploying a new machine learning (ML) model in a production environment. The company is concerned that the ML model will drift over time, so the company creates a script to aggregate all inputs and predictions into a single file at the end of each day. The company stores the file as an object in an Amazon S3 bucket. The total size of the daily file is 100 GB. The daily file size will increase over time. Four times a year, the company samples the data from the previous 90 days to check the ML model for drift. After the 90-day period, the company must keep the files for compliance reasons. The company needs to use S3 storage classes to minimize costs. The company wants to maintain the same storage durability of the data. Which solution will meet these requirements?
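The scenario above maps onto an S3 Lifecycle rule. The following is a minimal sketch, not a definitive answer. It assumes the daily files are uploaded directly to an infrequent-access class and moved to an archive class after 90 days. The function name `build_lifecycle_config`, the rule ID, and the choice of `DEEP_ARCHIVE` are illustrative assumptions, not part of the question.

```python
# Sketch: build an S3 Lifecycle configuration that archives objects after
# a fixed number of days. Names and storage-class choices are assumptions.

def build_lifecycle_config(days=90, storage_class="DEEP_ARCHIVE"):
    """Return a lifecycle configuration dict in the shape accepted by
    boto3's put_bucket_lifecycle_configuration."""
    return {
        "Rules": [
            {
                "ID": "archive-drift-files",   # hypothetical rule name
                "Filter": {"Prefix": ""},      # empty prefix: all objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                ],
            }
        ]
    }

config = build_lifecycle_config()

# Applying it would look roughly like this (requires boto3 and credentials;
# the bucket name is a placeholder):
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=config,
#   )
```

Under this assumption, the files stay readily readable during the 90-day sampling window and move to cheaper archival storage once they are kept only for compliance.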
A data scientist has explored and sanitized a dataset in preparation for the modeling phase of a supervised learning task. The statistical dispersion can vary widely between features, sometimes by several orders of magnitude. Before moving on to the modeling phase, the data scientist wants to ensure that the prediction performance on the production data is as accurate as possible. Which sequence of steps should the data scientist take to meet these requirements?
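One common way to handle features whose spread differs by orders of magnitude is to shuffle and split the data first, then fit the scaling statistics on the training split only and reuse them on the held-out split. The sketch below illustrates that ordering with standard-library Python. The synthetic data, the 80/20 split, and the helper names `fit_scaler` and `transform` are assumptions made for illustration.

```python
import random
import statistics

def fit_scaler(rows):
    """Compute per-feature mean and population std from the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def transform(rows, means, stds):
    """Standardize rows using previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Synthetic data: two features on very different scales (assumption).
random.seed(0)
data = [[random.uniform(0, 1), random.uniform(0, 100_000)] for _ in range(100)]

# 1. Shuffle, then split into training and test sets.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# 2. Fit scaling statistics on the training set only.
means, stds = fit_scaler(train)

# 3. Apply the same statistics to both splits.
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
```

Fitting the statistics on the training split alone keeps information from the test set out of preprocessing, so the measured performance is closer to what unseen production data would produce.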