Associate-Developer-Apache-Spark Exam: Free Question Set (179 questions), "Databricks Certified Associate Developer for Apache Spark 3.0" certification

Question 1

Which of the following describes a narrow transformation?

A. A narrow transformation is an operation in which data is exchanged across the cluster.

B. A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

C. A narrow transformation is a process in which data from multiple RDDs is used.

D. A narrow transformation is an operation in which no data is exchanged across the cluster.

E. A narrow transformation is an operation in which data is exchanged across partitions.

Correct answer: D

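The distinction behind answer D can be sketched with a toy model in plain Python. This is not Spark code: a dataset is represented as a list of partitions, and the function names and the partitioner are hypothetical. A narrow operation builds each output partition from exactly one input partition, while a grouping operation has to bring rows with the same key together, which moves rows between partitions.

```python
from collections import defaultdict

# Toy model: a "dataset" is a list of partitions, each partition a list of rows.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def narrow_map(parts, fn):
    """Apply fn row by row; each output partition depends on one input partition."""
    return [[fn(row) for row in part] for part in parts]

def wide_group_by(parts, key_fn, num_out):
    """Group rows by key; rows sharing a key must end up in the same output
    partition, so rows may have to leave the partition they started in."""
    out = [defaultdict(list) for _ in range(num_out)]
    moved = 0
    for src, part in enumerate(parts):
        for row in part:
            key = key_fn(row)
            dst = key % num_out  # hypothetical partitioner for small int keys
            if dst != src:
                moved += 1  # this row crosses a partition boundary
            out[dst][key].append(row)
    return [dict(d) for d in out], moved

doubled = narrow_map(partitions, lambda x: x * 2)
grouped, rows_moved = wide_group_by(partitions, lambda x: x % 2, num_out=2)

print("narrow:", doubled)
print("wide:  ", grouped, "rows moved:", rows_moved)
```

In the narrow case the partition layout is untouched; in the wide case most rows change partition, which is the exchange that options A and E describe.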

Question 2

Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|f   |
+-------------+---------+-----+-------+---------+----+
|1            |3        |4    |25     |1        |null|
|2            |6        |7    |2      |2        |null|
|3            |3        |null |25     |3        |null|
+-------------+---------+-----+-------+---------+----+

A. transactionsDf.withColumnRemoved("predError", "productId")

B. transactionsDf.drop("predError", "productId", "associateId")

C. transactionsDf.dropColumns("predError", "productId", "associateId")

D. transactionsDf.drop(col("predError", "productId"))

E. transactionsDf.drop(["predError", "productId", "associateId"])

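Correct answer: B (drop() ignores column names that are not present in the DataFrame, so "associateId" is harmless; dropColumns() is not a DataFrame method)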

Question 3

The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.
Code block:
transactionsDf.agg("storeId").avg("value")

A. The avg("value") should be specified as a second argument to agg() instead of being appended to it.

B. agg should be replaced by groupBy.

C. "storeId" and "value" should be swapped.

D. Instead of avg("value"), avg(col("value")) should be used.

E. All column names should be wrapped in col() operators.

Correct answer: B

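With the fix from option B the code block reads transactionsDf.groupBy("storeId").avg("value"). The plain-Python sketch below reproduces what that expression computes. It assumes the three rows of the transactionsDf sample shown earlier on this page (this question shows no data of its own), and the helper name group_avg is made up for illustration. None stands in for null and is skipped, as Spark's avg ignores null values.

```python
from collections import defaultdict

# (storeId, value) pairs taken from the transactionsDf sample; None stands for null.
rows = [(25, 4), (2, 7), (25, None)]

def group_avg(pairs):
    """Average of value per storeId, skipping None values.
    A store whose values are all None is simply omitted from the result."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for store_id, value in pairs:
        if value is None:
            continue
        sums[store_id] += value
        counts[store_id] += 1
    return {store_id: sums[store_id] / counts[store_id] for store_id in sums}

avg_by_store = group_avg(rows)
print(avg_by_store)
```

The grouping step comes first and the aggregate is applied per group, which is why agg("storeId") cannot stand in for groupBy("storeId").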

Question 4

Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type double?

A. spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

B. from pyspark.sql import types as T
spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))

C. spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

D. spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

E. spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

Correct answer: A

Question 5

Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

A. DataFrame.repartition(12)

B. DataFrame.coalesce(6).shuffle()

C. DataFrame.repartition(6)

D. DataFrame.coalesce(6, shuffle=True)

E. DataFrame.coalesce(6)

Correct answer: C

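The difference between options C and E can be illustrated with a toy model in plain Python. This is not Spark's implementation: the two functions below only borrow the names coalesce and repartition, and the rule for choosing which partitions to merge is a simplifying assumption. The point is that merging keeps every existing partition whole, whereas a full shuffle redistributes individual rows.

```python
def coalesce(parts, n):
    """Merge whole partitions into n groups; no partition is split up."""
    out = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        out[i % n].extend(part)  # hypothetical grouping rule, chosen for simplicity
    return out

def repartition(parts, n):
    """Redistribute every row individually (round-robin), i.e. a full shuffle."""
    out = [[] for _ in range(n)]
    rows = [row for part in parts for row in part]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out

# 12 partitions of different sizes: partition p holds p + 1 rows tagged with p.
twelve = [[(p, r) for r in range(p + 1)] for p in range(12)]

merged = coalesce(twelve, 6)
shuffled = repartition(twelve, 6)

print("coalesce sizes:   ", [len(p) for p in merged])
print("repartition sizes:", [len(p) for p in shuffled])
```

Both calls end with 6 partitions, but only the second one moves rows independently of their original partition, so only it evens out the partition sizes.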

Question 6

Which of the following options describes the responsibility of the executors in Spark?

A. The executors accept tasks from the cluster manager, execute those tasks, and return results to the driver.

B. The executors accept tasks from the driver, execute those tasks, and return results to the driver.

C. The executors accept jobs from the driver, plan those jobs, and return results to the cluster manager.

D. The executors accept tasks from the driver, execute those tasks, and return results to the cluster manager.

E. The executors accept jobs from the driver, analyze those jobs, and return results to the driver.

Correct answer: B

Question 7

Which of the following statements about data skew is incorrect?

A. In skewed DataFrames, the largest and the smallest partition consume very different amounts of memory.

B. To mitigate skew, Spark automatically disregards null values in keys when joining.

C. Broadcast joins are a viable way to increase join performance for skewed data over sort-merge joins.

D. Spark will not automatically optimize skew joins by default.

E. Salting can resolve data skew.

Correct answer: B

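Salting, mentioned in option E, can be sketched in plain Python. Everything here is an illustrative assumption: the partition count, the salt range, the key names, and the toy_partition function, which is a deterministic stand-in and not Spark's hash function. The idea is that appending a rotating suffix turns one heavily used key into several distinct keys, so its rows no longer land in a single partition.

```python
from collections import Counter

NUM_PARTITIONS = 4
NUM_SALTS = 8  # hypothetical salt range

def toy_partition(key):
    """Deterministic stand-in for a hash partitioner (not Spark's hash function)."""
    return sum(ord(ch) for ch in key) % NUM_PARTITIONS

# Skewed input: 1000 rows share the key "hot", 10 rows each for three other keys.
keys = ["hot"] * 1000 + ["a"] * 10 + ["b"] * 10 + ["c"] * 10

# Without salt, every "hot" row maps to the same partition.
unsalted = Counter(toy_partition(k) for k in keys)

# Salting: append a rotating suffix so one hot key becomes NUM_SALTS distinct keys.
salted_keys = [f"{k}_{i % NUM_SALTS}" for i, k in enumerate(keys)]
salted = Counter(toy_partition(k) for k in salted_keys)

print("largest partition without salt:", max(unsalted.values()))
print("largest partition with salt:   ", max(salted.values()))
```

In a real join the other side of the join would also need one copy of each matching row per salt value so that salted keys still find their partners; that step is left out of this sketch.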

Question 8

Which of the following describes Spark's standalone deployment mode?

A. Standalone mode uses only a single executor per worker per application.

B. Standalone mode uses a single JVM to run Spark driver and executor processes.

C. Standalone mode is how Spark runs on YARN and Mesos clusters.

D. Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.

E. Standalone mode means that the cluster does not contain the driver.

Correct answer: A

Question 9

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

A. itemsDf.store()

B. itemsDf.cache(StorageLevel.MEMORY_AND_DISK)

C. itemsDf.cache()

D. itemsDf.persist(StorageLevel.MEMORY_ONLY)

E. itemsDf.write.option('destination', 'memory').save()

Correct answer: C

Question 10

Which of the following is one of the big performance advantages that Spark has over Hadoop?

A. Spark achieves great performance by storing data and performing computation in memory, whereas large jobs in Hadoop require a large amount of relatively slow disk I/O operations.

B. Spark achieves higher resiliency for queries since, different from Hadoop, it can be deployed on Kubernetes.

C. Spark achieves great performance by storing data in the DAG format, whereas Hadoop can only use parquet files.

D. Spark achieves great performance by storing data in the HDFS format, whereas Hadoop can only use parquet files.

E. Spark achieves performance gains for developers by extending Hadoop's DataFrames with a user-friendly API.

Correct answer: A
