Professional-Data-Engineer Exam Free Practice Questions: "Google Certified Professional Data Engineer" Certification
Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and has asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?
Correct answer: D
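Note: the answer options are not reproduced in this dump, but the scenario points at parallelizing the upload path. Below is a minimal sketch, assuming a Python client on the transfer server and the `transfer_manager` helper from `google-cloud-storage` (available in version 2.7+); the bucket name, file list, and worker count are illustrative assumptions, not part of the original question.

```python
# Sketch: concurrent uploads to Cloud Storage via the transfer_manager
# helper in google-cloud-storage. Names below are illustrative assumptions.
from google.cloud.storage import Client, transfer_manager

client = Client()
bucket = client.bucket("example-analytics-landing")  # hypothetical bucket

filenames = ["export_001.csv", "export_002.csv", "export_003.csv"]

# Upload many files concurrently; each worker handles one file at a time.
results = transfer_manager.upload_many_from_filenames(
    bucket,
    filenames,
    source_directory="/data/exports",
    max_workers=8,
)

for name, result in zip(filenames, results):
    # result is None on success, or the raised exception on failure.
    status = "ok" if result is None else f"failed: {result}"
    print(f"{name}: {status}")
```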
Your team is building a data lake platform on Google Cloud. As part of the data foundation design, you are planning to store all the raw data in Cloud Storage. You are expecting to ingest approximately 25 GB of data a day, and your billing department is worried about the increasing cost of storing old data. The current business requirements are:
* The old data can be deleted anytime
* You plan to use the visualization layer for current and historical reporting
* The old data should be available instantly when accessed
* There should not be any charges for data retrieval.
What should you do to optimize for cost?
Correct answer: B
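Note: the requirements (instant access, no retrieval charges, deletable old data) match keeping objects in Standard storage and letting a lifecycle rule delete them when they age out. A minimal sketch with `google-cloud-storage`; the bucket name and the 90-day threshold are illustrative assumptions.

```python
# Sketch: age-based lifecycle delete rule on a Cloud Storage bucket.
# Bucket name and age threshold are illustrative assumptions.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-data-lake-raw")  # hypothetical bucket

# Objects stay in Standard storage (instant access, no retrieval fee)
# and are simply deleted once they are old enough to be disposable.
bucket.add_lifecycle_delete_rule(age=90)
bucket.patch()

for rule in bucket.lifecycle_rules:
    print(rule)
```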
Your financial services company is moving to cloud technology and wants to store 50 TB of financial time-series data in the cloud. This data is updated frequently, and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.
Which product should they use to store the data?
Correct answer: C
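Note: heavy write volume, constant streaming updates, and existing Hadoop/HBase-style jobs are the classic fit for Cloud Bigtable, whichever letter that maps to in the original options. A minimal write sketch with `google-cloud-bigtable`; the instance, table, column family, and row-key scheme are illustrative assumptions.

```python
# Sketch: writing one time-series point to Cloud Bigtable.
# Instance, table, and column family names are illustrative assumptions.
import time
from google.cloud import bigtable

client = bigtable.Client(project="example-project")
table = client.instance("ts-instance").table("market-data")

# Row key: instrument id plus timestamp keeps each series contiguous.
row_key = f"EURUSD#{int(time.time())}".encode()

row = table.direct_row(row_key)
row.set_cell("quotes", "price", b"1.0842")
row.commit()
```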
A web server sends click events to a Pub/Sub topic as messages. The web server includes an eventTimestamp attribute in the messages, which is the time when the click occurred. You have a Dataflow streaming job that reads from this Pub/Sub topic through a subscription, applies some transformations, and writes the result to another Pub/Sub topic for use by the advertising department. The advertising department needs to receive each message within 30 seconds of the corresponding click, but they report receiving the messages late. Your Dataflow job's system lag is about 5 seconds, and the data freshness is about 40 seconds. Inspecting a few messages shows no more than 1 second of lag between their eventTimestamp and publishTime. What is the problem, and what should you do?
Correct answer: C
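Note: data freshness tracks the age of the oldest event not yet fully processed, so a 40-second freshness against only 5 seconds of system lag and a sub-second publish delay suggests the time is accumulating in the subscription backlog rather than inside the transforms. For context, this is how a Beam (Python SDK) job binds the eventTimestamp attribute to event time; the subscription and topic names below are illustrative assumptions.

```python
# Sketch: streaming Beam pipeline using the eventTimestamp attribute as
# the element's event time. Subscription/topic names are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadClicks" >> beam.io.ReadFromPubSub(
            subscription="projects/example/subscriptions/clicks-sub",
            timestamp_attribute="eventTimestamp",  # click time drives the watermark
        )
        | "Transform" >> beam.Map(lambda msg: msg)  # placeholder transform
        | "WriteForAds" >> beam.io.WriteToPubSub(
            topic="projects/example/topics/clicks-for-ads"
        )
    )
```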
Your company is selecting a system to centralize data ingestion and delivery. You are considering messaging and data integration systems to address the requirements. The key requirements are:
* The ability to seek to a particular offset in a topic, possibly back to the start of all data ever captured
* Support for publish/subscribe semantics on hundreds of topics
* Retain per-key ordering
Which system should you choose?
Correct answer: B
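Note: arbitrary offset seeking combined with per-key (per-partition) ordering across hundreds of topics describes Apache Kafka rather than Pub/Sub. A sketch of the seek semantics the requirements call out, using the kafka-python client; the broker address, topic, and group id are illustrative assumptions.

```python
# Sketch: replaying a Kafka topic partition from the beginning, the kind
# of offset seek the requirements describe. Names are assumptions.
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="broker:9092",
    group_id="replay-demo",
    enable_auto_commit=False,
)

partition = TopicPartition("clickstream", 0)
consumer.assign([partition])

# Seek back to the start of all data ever captured for this partition.
consumer.seek_to_beginning(partition)

for message in consumer:
    print(message.offset, message.key, message.value)
```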
You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?
Correct answer: A
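Note: programmatic masking before load is typically done with the Cloud DLP (Sensitive Data Protection) API. A minimal sketch of `deidentify_content` with character masking; the project id, sample text, and info types are illustrative assumptions.

```python
# Sketch: masking sensitive values with the Cloud DLP API before the rows
# are loaded into BigQuery. Project id and info types are assumptions.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/example-project"  # hypothetical project

item = {"value": "Contact jane.doe@example.com about account 1234"}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "character_mask_config": {"masking_character": "#"}
                        }
                    }
                ]
            }
        },
        "item": item,
    }
)

print(response.item.value)  # email address replaced by '#' characters
```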
You are running a Dataflow streaming pipeline with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipeline is Pub/Sub messages with notifications from Cloud Storage. One of the pipeline transforms reads CSV files and emits an element for every CSV line. The job performance is low: the pipeline is using only 10 workers, and you notice that the autoscaler is not spinning up additional workers. What should you do to improve performance?
Correct answer: D
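Note: a transform that fans one file name out into many CSV-line elements is prone to fusion: the expansion stays glued to the few bundles that carried the file names, so the autoscaler sees no parallelizable backlog. One standard remedy is inserting `beam.Reshuffle()` after the expansion step. In the sketch below the DoFn body and all names are illustrative assumptions.

```python
# Sketch: breaking operator fusion with beam.Reshuffle() so elements
# emitted per CSV line can be rebalanced across many workers.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class ReadCsvLines(beam.DoFn):
    def process(self, message):
        # Placeholder: the real job would open the notified GCS object;
        # here we just split the message payload into lines.
        for line in message.decode("utf-8").splitlines():
            yield line


options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadNotifications" >> beam.io.ReadFromPubSub(
            subscription="projects/example/subscriptions/gcs-notifications"
        )
        | "ExpandCsv" >> beam.ParDo(ReadCsvLines())
        # Without this, downstream steps stay fused to the few bundles
        # that carried the file names, so extra workers get no work.
        | "BreakFusion" >> beam.Reshuffle()
        | "Process" >> beam.Map(lambda line: line.upper())  # placeholder
    )
```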
You work for a large fast food restaurant chain with over 400,000 employees. You store employee information in Google BigQuery in a Users table consisting of a FirstName field and a LastName field. A member of IT is building an application and asks you to modify the schema and data in BigQuery so the application can query a FullName field consisting of the value of the FirstName field concatenated with a space, followed by the value of the LastName field for each employee. How can you make that data available while minimizing cost?
Correct answer: A
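Note: the cheapest way to expose a derived FullName without rewriting or duplicating 400,000 rows is a logical view with CONCAT. A sketch with `google-cloud-bigquery`; the project and dataset names are illustrative assumptions.

```python
# Sketch: exposing FullName as a view instead of materializing a new
# column. Project/dataset names are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

view = bigquery.Table("example-project.hr.users_fullname")  # hypothetical
view.view_query = """
    SELECT
      FirstName,
      LastName,
      CONCAT(FirstName, ' ', LastName) AS FullName
    FROM `example-project.hr.Users`
"""
client.create_table(view)
```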
You are building a report-only data warehouse where the data is streamed into BigQuery via the streaming API. Following Google's best practices, you have both a staging and a production table for the data. How should you design your data loading to ensure that there is only one master dataset without affecting performance on either the ingestion or reporting pieces?
Correct answer: D
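Note: the graded option text is not reproduced here, but one common pattern for this setup is to promote staged rows into the production table with a copy job, which neither re-streams the data nor incurs query charges. A sketch under that assumption, with hypothetical table ids.

```python
# Sketch: appending staged rows to the production table with a BigQuery
# copy job (copy jobs carry no query cost). Table ids and the append
# disposition are illustrative assumptions, not the graded answer.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.CopyJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND
)

job = client.copy_table(
    "example-project.warehouse.events_staging",
    "example-project.warehouse.events",
    job_config=job_config,
)
job.result()  # wait for completion
```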
You have two projects where you run BigQuery jobs:
* One project runs production jobs that have strict completion-time SLAs. These are high-priority jobs that must have the required compute resources available when needed. These jobs generally never go below 300 slots of utilization, but occasionally spike up by an additional 500 slots.
* The other project is for users to run ad-hoc analytical queries. This project generally never uses more than 200 slots at a time. You want these ad-hoc queries to be billed based on how much data users scan rather than by slot capacity.
You need to ensure that both projects have the appropriate compute resources available. What should you do?
Correct answer: B
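Note: the production profile (300-slot baseline with 500-slot bursts) maps to a slot reservation with autoscaling assigned to the production project, while the ad-hoc project gets per-bytes-scanned billing simply by not being assigned to any reservation. A sketch with `google-cloud-bigquery-reservation`; the admin project, region, edition, and ids are illustrative assumptions.

```python
# Sketch: 300 baseline slots plus 500 autoscaled slots, assigned to the
# production project. Admin project, region, and ids are assumptions.
from google.cloud import bigquery_reservation_v1 as reservation

client = reservation.ReservationServiceClient()
parent = "projects/example-admin-project/locations/US"

res = reservation.Reservation(
    slot_capacity=300,                               # baseline for SLA jobs
    autoscale=reservation.Autoscale(max_slots=500),  # burst headroom
    edition=reservation.Edition.ENTERPRISE,
)

created = client.create_reservation(
    parent=parent,
    reservation_id="prod-pipelines",
    reservation=res,
)

# Attach only the production project; the ad-hoc project stays on
# on-demand (per-bytes-scanned) billing because it has no assignment.
client.create_assignment(
    parent=created.name,
    assignment=reservation.Assignment(
        assignee="projects/example-prod-project",
        job_type=reservation.Assignment.JobType.QUERY,
    ),
)
```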
You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?
Correct answer: B
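Note: per-second samples for millions of machines, queried ad hoc without per-query billing, is the canonical Cloud Bigtable tall-and-narrow design: lead the row key with the machine id so each machine's samples sort together and ranged scans stay cheap. A sketch of the key design and a ranged read; the instance, table, and key scheme are illustrative assumptions.

```python
# Sketch: tall-and-narrow Bigtable layout for per-second machine metrics,
# plus a key-range scan for one machine. Names are assumptions.
import time
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="example-project")
table = client.instance("metrics").table("machine-usage")

now = int(time.time())

# One row per machine per second; machine id leads the key.
row = table.direct_row(f"machine-0042#{now}".encode())
row.set_cell("usage", "cpu", b"0.83")
row.set_cell("usage", "mem", b"0.61")
row.commit()

# Scan the last five minutes for one machine via a key range.
row_set = RowSet()
row_set.add_row_range_from_keys(
    start_key=f"machine-0042#{now - 300}".encode(),
    end_key=f"machine-0042#{now}".encode(),
)
for r in table.read_rows(row_set=row_set):
    print(r.row_key)
```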