A manufacturer of delivery drones is implementing a new data analysis pipeline to detect part failures before they occur. The drones have multiple sensors that send performance and environment data to an analytics pipeline. Currently, data is sent to a REST API endpoint. The endpoint cannot always keep up with the rate at which data arrives, and when it falls behind, data is lost. Machine learning engineers have asked you to change the ingestion process to reduce this data loss. What would you do?
You are responsible for writing your company’s ETL pipelines to run on an Apache Hadoop cluster. The pipelines will require some checkpointing and pipeline splitting. Which method should you use to write the pipelines?
Each analytics team in your organization is running BigQuery jobs in their own projects. You want to enable each team to monitor slot usage within their projects. What should you do?
Messages are unexpectedly accumulating in a service that uses Cloud Pub/Sub. A developer unfamiliar with Cloud Pub/Sub has asked for your help in diagnosing the problem. What would you point out with respect to how messages are removed from Cloud Pub/Sub topics?
You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?