Databricks-Certified-Professional-Data-Scientist Free Exam Questions: "Databricks Certified Professional Data Scientist Certification"

What is the key difference between L1 and L2 regularization?
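
As background for this question, here is a minimal sketch in plain Python. It compares the two penalties and shows their effect on a single coefficient `z` under a squared-error loss. The function names are hypothetical, and the objectives are a simplified one-coefficient setting chosen for illustration. L1 (lasso) adds the sum of absolute weights and can set coefficients to exactly zero. L2 (ridge) adds the sum of squared weights and shrinks coefficients proportionally without zeroing them.

```python
import math

def l1_penalty(weights, lam):
    """L1 (lasso) penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 (ridge) penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*|w| (soft thresholding).
    Any coefficient with |z| <= lam becomes exactly zero."""
    if abs(z) <= lam:
        return 0.0
    return math.copysign(abs(z) - lam, z)

def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*w**2.
    The coefficient is scaled toward zero but stays nonzero unless z == 0."""
    return z / (1.0 + 2.0 * lam)

unpenalized = [3.0, 0.4, -0.2]
print([l1_shrink(z, 0.5) for z in unpenalized])  # [2.5, 0.0, 0.0]
print([l2_shrink(z, 0.5) for z in unpenalized])
```

The small coefficients are removed under L1 but only scaled down under L2. This is why L1 is often described as performing feature selection.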

Refer to the exhibit.

You are asked to write a report on how specific variables affect your client's sales, using a data set the client provided. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. A preliminary analysis of the data produced the following findings:
1. Multicollinearity is not an issue among the variables.
2. Only three variables (A, B, and C) have a significant correlation with sales.
You build a linear regression model with sales as the dependent variable and A, B, and C as the independent variables. The results of the regression are shown in the exhibit. You cannot request additional data. What is one way you could try to increase the R² of the model without artificially inflating it?
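
For context on "artificially inflating" R²: in ordinary least squares with an intercept, adding predictors can never lower the in-sample R². Adjusted R² corrects for this by penalizing each extra predictor. The sketch below is a minimal plain-Python illustration of both formulas. It does not reproduce the exhibit, which is not available here, and the function names are hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for observed y and fitted y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# The same raw R^2 counts for less as the number of predictors grows.
print(adjusted_r_squared(0.8, n=50, p=3))
print(adjusted_r_squared(0.8, n=50, p=10))
```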

Which of the following is a continuous probability distribution?

Refer to the exhibit.

You are using k-means clustering to classify customer behavior for a large retailer. You need to determine the optimal number of customer groups. You plot the within-cluster sum of squares (WSS) as shown in the exhibit.
How many customer groups should you specify?
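
The exhibit is not reproduced here, but a WSS plot is usually read with the elbow method. You compute WSS for a range of k and look for the point after which increasing k stops reducing WSS sharply. Below is a minimal pure-Python sketch of Lloyd's k-means on made-up 1-D data. The data and the deterministic initialization (evenly spaced picks from the sorted points, not k-means++) are assumptions made only for this illustration.

```python
def kmeans_1d(points, k, max_iter=100):
    """Lloyd's k-means on 1-D data; returns (centroids, wss).
    Initial centroids are evenly spaced picks from the sorted points
    (a deterministic choice made for this sketch, not k-means++)."""
    xs = sorted(points)
    n = len(xs)
    centroids = [xs[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centroids[c]) ** 2)
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    wss = sum((x - centroids[i]) ** 2
              for i, c in enumerate(clusters) for x in c)
    return centroids, wss

# Hypothetical customer-spend values with three obvious groups.
data = [0.8, 1.0, 1.2, 4.9, 5.0, 5.3, 8.8, 9.0, 9.1]
wss_by_k = {k: kmeans_1d(data, k)[1] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))
```

On this toy data, WSS falls steeply from k = 1 to k = 3 and can only fall slightly after that, so the elbow sits at k = 3. A real WSS curve is read the same way.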

What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?
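
As one illustration of measuring stability, you can compare the labelings from two runs with a pair-counting score. The Rand index is the fraction of point pairs on which both runs agree, either keeping the pair together or keeping it apart. Because it depends only on the partition, it ignores arbitrary label ids. The adjusted Rand index is a chance-corrected variant. This sketch is one possible stability measure, not a prescribed answer, and the run labels are made up.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree
    (both put the pair together, or both keep it apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

run1 = [0, 0, 1, 1, 2, 2]
run2 = [2, 2, 0, 0, 1, 1]  # same partition, label ids permuted
run3 = [0, 1, 1, 1, 2, 2]  # one point moved to another cluster
print(rand_index(run1, run2))  # 1.0
print(rand_index(run1, run3))
```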

Suppose you have been given a relatively high-dimensional set of independent variables and are asked to build a model that predicts one of two possible outcomes, such as "YES" or "NO". Which of the following techniques fits best?
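
For reference, logistic regression is one widely used model for a binary outcome. It passes a linear combination of the features through the sigmoid to get P(YES). The sketch below fits it with batch gradient descent on the average log-loss in plain Python. The tiny 2-D data set, learning rate, and epoch count are arbitrary assumptions for illustration, not a recommended setup for high-dimensional data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log-loss.
    X: list of feature vectors; y: list of 0/1 labels (0 = "NO", 1 = "YES")."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss is (predicted probability - label) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x, threshold=0.5):
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "YES" if p >= threshold else "NO"

# Hypothetical training data.
X = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict(w, b, [1.5, 1.5]), predict(w, b, [-1.5, -1.5]))
```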

In unsupervised learning, which statement correctly applies?

Suppose you have the following two cases for movie ratings:
1. You recommend a movie to a user with a predicted rating of four stars, but he really doesn't like it and would rate it two stars.
2. You recommend a movie with a predicted rating of three stars, but the user loves it and would rate it five stars.
Which statement correctly applies?
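
To make the two cases concrete, the arithmetic below computes each prediction error. It assumes the error is measured as predicted minus actual stars and uses MAE and RMSE as example metrics. Both cases are off by two stars in opposite directions, so symmetric metrics such as MAE and RMSE score them identically.

```python
import math

# (predicted stars, rating the user would actually give) for the two cases
cases = {"case 1": (4, 2), "case 2": (3, 5)}

for name, (predicted, actual) in cases.items():
    error = predicted - actual
    print(name, "error:", error, "absolute:", abs(error), "squared:", error ** 2)

errors = [p - a for p, a in cases.values()]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print("MAE:", mae, "RMSE:", rmse)  # MAE: 2.0 RMSE: 2.0
```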

Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?
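
As a reminder of the mechanics, N-fold (k-fold) cross-validation works in three steps. It splits the rows into N folds, trains on N - 1 of them while testing on the held-out fold, and averages the N test scores. Below is a minimal index-splitting sketch in plain Python. The function name is hypothetical, and it assumes the rows are already shuffled, since the folds are contiguous.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k-fold cross-validation.
    Folds are contiguous and differ in size by at most one;
    shuffle the rows beforehand if they are ordered."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

for train, test in k_fold_indices(7, 3):
    print("train:", train, "test:", test)
```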

What are the advantages of the mutual information over the Pearson correlation for text classification problems?
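
One way to see the difference is a case where Y is fully determined by X but not linearly. Pearson correlation measures only linear association, while mutual information (MI) detects any statistical dependence between discrete variables, such as term presence and class labels. The sketch below uses a plug-in MI estimate in bits on made-up data where Y = X². Pearson is exactly zero there, but MI equals the entropy of Y, about 0.918 bits.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

xs = [-1, 0, 1] * 10
ys = [x * x for x in xs]  # y is determined by x, but not linearly
print(pearson(xs, ys))  # 0.0
print(mutual_information(xs, ys))
```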
