A data engineer needs to build an enterprise data catalog based on the company's Amazon S3 buckets and Amazon RDS databases. The data catalog must include storage format metadata for the data in the catalog. Which solution will meet these requirements with the LEAST effort?
A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. The ETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, and load data into Amazon Redshift. The ETL jobs need to handle failures and retries automatically. The data engineer needs to use Python to orchestrate the jobs. Which service will meet these requirements?
A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy. Which solution will meet these requirements with the LEAST management overhead?
A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs. The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time. Which solution will MOST reduce the data processing time?
A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers. A data engineer notices that one of the fields in the source data includes values that are in JSON format. How should the data engineer load the JSON data into the data warehouse with the LEAST effort?