Databricks-Certified-Professional-Data-Scientist Exam Free Practice Questions (140 Questions): "Databricks Certified Professional Data Scientist Certification"

Question 1

What is a key difference between L1 and L2 regularization?

A. L2 regularization can be vitally important when the application is deployed in resource-constrained environments such as cell phones.

B. All of the above are correct

C. L1 regularization produces a more accurate model

D. The model produced by L1 regularization can be much smaller than the one produced by L2 regularization

Correct answer: D
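The size difference in option D comes from sparsity: an L1 penalty drives some coefficients exactly to zero (so they need not be stored), while an L2 penalty only shrinks them. A minimal sketch, assuming an orthonormal design matrix (XᵀX = I), where both penalized least-squares problems have closed-form solutions in terms of the ordinary least-squares (OLS) coefficients; the coefficient values and penalty strength are hypothetical.

```python
def lasso_orthonormal(ols_coefs, lam):
    """L1 (lasso) solution when X^T X = I: soft-threshold each OLS coefficient.

    Minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1.
    """
    shrunk = []
    for b in ols_coefs:
        magnitude = max(abs(b) - lam, 0.0)  # |b| <= lam becomes exactly 0
        shrunk.append(magnitude if b >= 0 else -magnitude)
    return shrunk


def ridge_orthonormal(ols_coefs, lam):
    """L2 (ridge) solution when X^T X = I: scale every OLS coefficient.

    Minimizes 0.5 * ||y - X b||^2 + 0.5 * lam * ||b||^2;
    nonzero coefficients shrink but stay nonzero.
    """
    return [b / (1.0 + lam) for b in ols_coefs]


def count_nonzero(coefs):
    return sum(1 for c in coefs if c != 0.0)


# Hypothetical OLS coefficients and penalty strength, for illustration only.
OLS = [3.0, -0.4, 0.2, 1.5, -0.1]
LAM = 0.5

l1 = lasso_orthonormal(OLS, LAM)
l2 = ridge_orthonormal(OLS, LAM)
print("L1 nonzero:", count_nonzero(l1), "of", len(OLS))  # L1 nonzero: 2 of 5
print("L2 nonzero:", count_nonzero(l2), "of", len(OLS))  # L2 nonzero: 5 of 5
```

With real, non-orthonormal data the lasso has no closed form and is usually solved by coordinate descent, but the zeroing behavior is the same.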

Question 2

Refer to the exhibit.

You are asked to write a report on how specific variables affect your client's sales, using a data set the client provided. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. A preliminary analysis of the data produced the following findings:
1. Multicollinearity is not an issue among the variables.
2. Only three variables (A, B, and C) have a significant correlation with sales.
You build a linear regression model of the dependent variable, sales, on the independent variables A, B, and C. The regression results are shown in the exhibit. You cannot request additional data. What is a way to try to increase the R² of the model without artificially inflating it?

A. Create clusters based on the data and use them as model inputs

B. Create interaction variables based only on variables A, B, and C

C. Force all 15 variables into the model as independent variables

D. Break variables A, B, and C into their own univariate models

Correct answer: A
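"Artificially inflating" R² refers to the fact that plain R² never decreases when predictors are added, whereas adjusted R² penalizes each extra term. A minimal sketch of both formulas, where n is the number of observations and p the number of predictors (intercept not counted); the sums of squares below are hypothetical and are not taken from the exhibit.

```python
def r_squared(sse, sst):
    """Coefficient of determination: 1 - SSE / SST."""
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical fits on n = 50 rows with a total sum of squares of 1000.
n, sst = 50, 1000.0
r2_abc = r_squared(sse=400.0, sst=sst)  # model on A, B, C (p = 3)
r2_all = r_squared(sse=390.0, sst=sst)  # all 15 variables forced in (p = 15)

print(f"p=3:  R2={r2_abc:.3f}  adjusted={adjusted_r_squared(r2_abc, n, 3):.3f}")
print(f"p=15: R2={r2_all:.3f}  adjusted={adjusted_r_squared(r2_all, n, 15):.3f}")
```

In this made-up example, forcing in twelve weak predictors nudges R² up (0.600 to 0.610) while adjusted R² falls (about 0.574 to 0.438), which is the kind of inflation the question warns against.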

Question 3

Which of the following is a continuous probability distribution?

A. Negative binomial distribution

B. Normal probability distribution

C. Poisson probability distribution

D. Binomial probability distribution

Correct answer: B
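The distinction is how probability is assigned: a continuous distribution such as the normal has a density, and probabilities come from areas (differences of the CDF), so any single point has probability zero; discrete distributions such as the Poisson, binomial, and negative binomial put probability mass on individual integers. A minimal sketch using Python's standard-library `statistics.NormalDist` and a hand-written Poisson PMF; the parameter values are arbitrary.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lam) random variable (discrete)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


normal = NormalDist(mu=0.0, sigma=1.0)

# Continuous: probability is an area under the density, e.g. P(-1 <= X <= 1).
p_within_one_sd = normal.cdf(1.0) - normal.cdf(-1.0)
# A single point has zero width, hence zero probability.
p_exactly_zero = normal.cdf(0.0) - normal.cdf(0.0)

# Discrete: probability mass sits on individual integers.
p_poisson_2 = poisson_pmf(2, lam=3.0)

print(f"Normal  P(-1 <= X <= 1) = {p_within_one_sd:.4f}")
print(f"Normal  P(X = 0)        = {p_exactly_zero:.4f}")
print(f"Poisson P(X = 2)        = {p_poisson_2:.4f}")
```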

Question 4

Refer to the exhibit.

You are using k-means clustering to segment customer behavior for a large retailer. You need to determine the optimal number of customer groups. You plot the within-cluster sum of squares (WSS) as shown in the exhibit.
How many customer groups should you specify?

A. 8

B. 2

C. 4

D. 3

Correct answer: C
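Reading a WSS plot this way is the "elbow" heuristic: pick the k after which adding clusters stops reducing WSS sharply. A minimal sketch that locates the elbow as the k with the largest second difference of the WSS curve; this numeric criterion is one common choice, not the only one, and the WSS values are hypothetical (the exhibit's data is not reproduced here).

```python
def elbow_k(wss):
    """Pick the k with the largest second difference of WSS.

    wss maps k -> within-cluster sum of squares; keys must be consecutive ints.
    The first and last k are skipped because a second difference needs
    both neighbours.
    """
    ks = sorted(wss)
    best_k, best_bend = None, float("-inf")
    for k in ks[1:-1]:
        bend = wss[k - 1] - 2 * wss[k] + wss[k + 1]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical WSS curve, for illustration only.
wss_by_k = {1: 1000, 2: 600, 3: 350, 4: 150, 5: 130, 6: 120, 7: 112, 8: 105}
print(elbow_k(wss_by_k))  # 4
```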

Question 5

What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?

A. The most stable clustering subject to a minimal cost constraint

B. The most stable clustering

C. The lowest cost clustering

D. The lowest cost clustering subject to a stability constraint

Correct answer: D
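Option D can be read as a constrained selection rule: discard clusterings that are not stable enough, then take the cheapest of the rest. A minimal sketch, assuming each candidate has already been scored with a cost (for example WSS) and a stability score in [0, 1] (for example average agreement across reruns); the `ClusteringRun` type, the threshold, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusteringRun:
    k: int
    cost: float       # e.g. WSS; lower means a tighter fit
    stability: float  # e.g. mean agreement across reruns, in [0, 1]


def select_model(runs: List[ClusteringRun], min_stability: float) -> Optional[ClusteringRun]:
    """Return the lowest-cost run whose stability meets the threshold, or None."""
    eligible = [r for r in runs if r.stability >= min_stability]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.cost)


candidates = [
    ClusteringRun(k=3, cost=420.0, stability=0.95),
    ClusteringRun(k=5, cost=310.0, stability=0.88),
    ClusteringRun(k=9, cost=190.0, stability=0.55),  # cheapest, but unstable
]
print(select_model(candidates, min_stability=0.8))
```

Minimizing cost alone would pick k=9 here; the stability constraint rules it out.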

Question 6

Suppose you are given a relatively high-dimensional set of independent variables and asked to build a model that predicts one of two possible outcomes, such as "YES" or "NO". Which of the following techniques fits best?

A. Support vector machines

B. Logistic regression

C. All of the above

D. Naive Bayes

E. Random decision forests

Correct answer: C
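All of the listed techniques can produce a binary YES/NO prediction. As one example, here is a minimal sketch of logistic regression trained by batch gradient descent on the log-loss, in pure Python; the toy data, learning rate, and epoch count are hypothetical, and a genuinely high-dimensional problem would normally use a library implementation with regularization.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "YES" if sigmoid(score) >= 0.5 else "NO"


# Hypothetical two-feature training data: 0 = "NO", 1 = "YES".
X = [[0.0, 0.2], [0.3, 0.1], [0.1, 0.4], [1.0, 0.9], [0.8, 1.1], [1.2, 0.7]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])
```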

Question 7

In unsupervised learning, which statements apply?

A. We tell the machine "Predict Y for our data X"

B. Instead of telling the machine "Predict Y for our data X", we ask "What can you tell me about X?"

C. It does not have a target variable

Correct answer: B, C
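Options B and C can be seen directly in the signature of an unsupervised algorithm: it receives only X, with no target Y. A minimal sketch of Lloyd's k-means on one-dimensional data, assuming naive seeding with the first k points (real implementations use random or k-means++ seeding); the sample values are hypothetical.

```python
def kmeans_1d(xs, k, max_iter=100):
    """Lloyd's k-means on 1-D data. Only the inputs xs are used: no labels."""
    centroids = [float(x) for x in xs[:k]]  # naive seeding with the first k points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


print(kmeans_1d([1, 2, 3, 10, 11, 12], k=2))  # [2.0, 11.0]
```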

Question 8

Suppose you have the following two cases for movie ratings:
1. You recommend a movie to a user with a predicted rating of four stars, but he doesn't like it and would rate it two stars.
2. You recommend a movie with a predicted rating of three stars, but the user loves it (he would rate it five stars).
Which statement correctly applies?

A. In both cases, the contribution to the RMSE could vary

B. In both cases, the contribution to the RMSE is the same

C. In both cases, the contribution to the RMSE is different

D. None of the above

Correct answer: B
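RMSE squares each error, so an overestimate and an underestimate of the same size contribute equally. A minimal sketch of the arithmetic for the two cases in the question:

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over paired ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


case_1 = (4 - 2) ** 2  # predicted 4 stars, user would rate 2: error -2 after sign
case_2 = (3 - 5) ** 2  # predicted 3 stars, user would rate 5: error of size 2
print(case_1, case_2)        # 4 4
print(rmse([4, 3], [2, 5]))  # 2.0
```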

Question 9

Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?

A. There are missing values in the data.

B. There is not enough data to create a test set.

C. The data is unformatted.

D. There are categorical variables in the model.

Correct answer: B
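N-fold (k-fold) cross-validation helps when data is too scarce for a separate test set because every observation is held out exactly once while the model is trained on the rest. A minimal sketch of the index splitting; it uses contiguous folds without shuffling for clarity, whereas in practice rows are usually shuffled first.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one.

    Yields (train_indices, test_indices) pairs; every index is held out exactly once.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


for train, test in k_fold_indices(n=10, k=3):
    print(f"train={train} test={test}")
```

Fitting the regression on each `train` split and scoring it on the matching `test` split, then averaging the k scores, gives the cross-validated error estimate.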

Question 10

What are the advantages of mutual information over Pearson correlation for text classification problems?

A. The mutual information is easier to parallelize.

B. The mutual information doesn't assume that the variables are normally distributed.

C. The mutual information can signal non-linear relationships between the dependent and independent variables.

D. The mutual information has a meaningful test for statistical significance.

Correct answer: A
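The property described in option C can be checked on a tiny example: when y is a deterministic but non-linear function of x, Pearson correlation can be exactly zero while mutual information is clearly positive. A minimal sketch using a plug-in (empirical-frequency) estimate of mutual information for discrete values, in nats; the sample values are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) of two discrete samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # y depends on x, but not linearly
print(round(pearson(xs, ys), 6))             # 0.0
print(round(mutual_information(xs, ys), 4))  # 1.0549
```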
