Databricks-Certified-Professional-Data-Engineer Free Exam Questions: "Databricks Certified Professional Data Engineer Certification"

When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?

Explanation: (visible to GoShiken members only)
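The configuration usually recommended for this scenario is to run the query as a Databricks Job on a new job cluster with retries, rather than on an always-on all-purpose cluster. Below is a minimal sketch of such a job definition submitted through the Jobs API 2.1; the workspace URL, token, notebook path, and cluster sizing are placeholders, not values from the question.

import requests

HOST = "https://<workspace-url>"       # placeholder workspace URL
TOKEN = "<personal-access-token>"      # placeholder token

job_spec = {
    "name": "streaming-ingest",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/prod/streaming_ingest"},
            "new_cluster": {                       # ephemeral job cluster, billed at the jobs rate
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
            "max_retries": -1,                     # retry indefinitely so query failures auto-recover
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
print(resp.json())   # {"job_id": ...}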
Review the following error traceback:

Which statement describes the error being raised?

Explanation: (visible to GoShiken members only)
A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Incremental state information should be maintained for 10 minutes for late-arriving data.
Streaming DataFrame df has the following schema:
"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"
Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

Explanation: (visible to GoShiken members only)
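For reference, the blank in the preceding question is typically completed with a 10-minute watermark on event_time and a non-overlapping 5-minute tumbling window. The sketch below shows that shape; the alias names are chosen for illustration only.

from pyspark.sql.functions import window, avg

result = (
    df.withWatermark("event_time", "10 minutes")       # keep state for late data up to 10 minutes
    .groupBy(window("event_time", "5 minutes"))        # non-overlapping 5-minute tumbling windows
    .agg(
        avg("temp").alias("avg_temp"),
        avg("humidity").alias("avg_humidity"),
    )
)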
Which statement characterizes the general programming model used by Spark Structured Streaming?

Explanation: (visible to GoShiken members only)
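Structured Streaming's programming model treats a live data stream as an unbounded input table to which new rows are continuously appended, so the same DataFrame operations used in batch apply incrementally. A minimal sketch, with a hypothetical source path and schema:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = (
    spark.readStream                                     # streaming DataFrame: an unbounded table
    .schema("device_id INT, event_time TIMESTAMP")       # hypothetical schema
    .json("/data/events")                                # hypothetical source path
)

counts = events.groupBy("device_id").count()             # written exactly like a batch aggregation

query = (
    counts.writeStream
    .outputMode("complete")                              # full result table emitted each trigger
    .format("memory")
    .queryName("device_counts")
    .start()
)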
A user wants to use DLT expectations to validate that a derived table, report, contains all records from the source, which is captured in the table validation_copy.
The user attempts and fails to accomplish this by adding an expectation to the report table definition.
Which approach would allow using DLT expectations to validate that all expected records are present in this table?

Explanation: (visible to GoShiken members only)
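Because an expectation can only evaluate the rows of the dataset it is attached to, a common way to check completeness is to define a separate validation dataset that left-joins validation_copy against report and asserts that every source key found a match. The sketch below assumes both tables live in the same DLT pipeline and share a customer_id key; the key and dataset names are illustrative.

import dlt
from pyspark.sql.functions import col

@dlt.view(name="report_validation")
@dlt.expect_or_fail("all_records_present", "report_key IS NOT NULL")
def report_validation():
    # Every row of validation_copy should find a matching key in report;
    # unmatched rows surface as NULL report_key and fail the expectation.
    source = dlt.read("validation_copy").select(col("customer_id"))
    target = dlt.read("report").select(col("customer_id").alias("report_key"))
    return source.join(target, source["customer_id"] == target["report_key"], "left")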
A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings.
The source data contains 100 unique fields in a highly nested JSON structure.
The silver_device_recordings table will be used downstream to power several production monitoring dashboards and a production model. At present, 45 of the 100 fields are being used in at least one of these applications.
The data engineer is trying to determine the best approach for dealing with schema declaration given the highly nested structure of the data and the numerous fields.
Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?

Explanation: (visible to GoShiken members only)
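One pragmatic pattern for this kind of table is to keep the raw nested record in silver (the schema is recorded in the Delta transaction log and can be evolved later) while also exposing the heavily used fields as typed columns. A minimal sketch, with hypothetical table and field names:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("bronze_device_recordings")      # hypothetical bronze table

silver = bronze.select(
    col("value"),                                           # keep the full nested struct
    col("value.device_id").cast("int").alias("device_id"),  # promote commonly used fields
    col("value.temp").cast("float").alias("temp"),
    col("value.humidity").cast("float").alias("humidity"),
)

(silver.write
    .format("delta")
    .option("mergeSchema", "true")      # allow additional columns to be added later
    .mode("append")
    .saveAsTable("silver_device_recordings"))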
The Databricks CLI is used to trigger a run of an existing job by passing the job_id parameter. The response indicating that the job run request has been submitted successfully includes a field named run_id.
Which statement describes what the number alongside this field represents?

Explanation: (visible to GoShiken members only)
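For context, the same run can be triggered through the Jobs REST API that the CLI wraps; the run_id in the response identifies that specific run of the job, as opposed to the job_id of the job definition. A minimal sketch with placeholder host, token, and job_id:

import requests

HOST = "https://<workspace-url>"       # placeholder workspace URL
TOKEN = "<personal-access-token>"      # placeholder token

resp = requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": 123},              # placeholder job_id of the existing job
)
print(resp.json()["run_id"])           # identifier of this newly triggered run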
Spill occurs as a result of executing various wide transformations. However, diagnosing spill requires proactively looking for key indicators.
Where in the Spark UI are two of the primary indicators that a partition is spilling to disk?

The data architect has mandated that all tables in the Lakehouse should be configured as external Delta Lake tables.
Which approach will ensure that this requirement is met?

Explanation: (visible to GoShiken members only)
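What makes a Delta Lake table external is supplying an explicit storage path (a LOCATION) when the table is created, so the data lives outside the metastore-managed directory. A minimal PySpark sketch, with a hypothetical source table and storage path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("source_events")     # hypothetical source table

(df.write
    .format("delta")
    .option("path", "abfss://lake@account.dfs.core.windows.net/tables/events")  # hypothetical external path
    .saveAsTable("events"))                # registered in the metastore; data stays at the external path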
The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.
The following logic is used to process these records.
MERGE INTO customers
USING (
  SELECT updates.customer_id AS merge_key, updates.*
  FROM updates
  UNION ALL
  SELECT NULL AS merge_key, updates.*
  FROM updates
  JOIN customers ON updates.customer_id = customers.customer_id
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customer_id = merge_key
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, end_date = staged_updates.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, address, current, effective_date, end_date)
  VALUES (staged_updates.customer_id, staged_updates.address, true, staged_updates.effective_date, null)

Which statement describes this implementation?
* The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current, and new values are inserted.

Explanation: (visible to GoShiken members only)