DEA-7TT2 Exam Free Question Set: "EMC Associate - Data Science and Big Data Analytics v2 Certification"

You have plotted the distribution of savings account sizes for a bank.

Based on the distribution shown in the exhibit, how would you proceed?
Response:

Which word or phrase completes the statement? Data-ink ratio is to data visualization as _________.
Response:

You are using the Apriori algorithm to determine the likelihood that a person who owns a home has a good credit score. You have determined that the confidence for the rules used in the algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are homeowners".
What can you determine from the lift calculation?
Response:
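
For reference, lift compares how often X and Y actually occur together with how often they would occur together if they were independent. A minimal Python sketch follows; the counts and the `lift` helper are illustrative assumptions, not data from the question.

```python
def lift(n_total, n_x, n_y, n_xy):
    """Lift of the rule X -> Y, computed from raw counts."""
    support_x = n_x / n_total      # P(X)
    support_y = n_y / n_total      # P(Y)
    support_xy = n_xy / n_total    # P(X and Y)
    return support_xy / (support_x * support_y)

# Hypothetical counts: 1000 people, 600 with good credit (X),
# 800 homeowners (Y), 485 with both.
value = lift(1000, 600, 800, 485)
print(round(value, 3))
```

A lift close to 1 means X and Y occur together about as often as independence would predict; values well above 1 point to a stronger association.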

Refer to the exhibit.

To predict whether or not a customer will renew their annual property insurance policy, an insurance company built and operationalized a naive Bayes classification model.
In the model, there are two class labels, renewal and non-renewal, that are assigned to each customer based on their attributes. A subset of the key attributes, their values, and corresponding conditional probabilities are provided in the exhibit.
A customer has the following attributes:
- Age is greater than 65 years
- Owns their own home
- Renewal month is August
If 20% of customers do not renew their policies every year, what is the score for a non-renewal in the naive Bayesian model for the customer described above?
Response:
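
The naive Bayes score for a class is the class prior multiplied by the conditional probability of each observed attribute given that class. The sketch below shows the arithmetic in Python. Only the 20% prior comes from the question; the three conditional probabilities are placeholders, because the exhibit values are not reproduced here, and `naive_bayes_score` is a hypothetical helper name.

```python
from math import prod

def naive_bayes_score(prior, conditionals):
    """Unnormalized naive Bayes score: P(class) * product of P(attribute | class)."""
    return prior * prod(conditionals)

# Prior from the question: 20% of customers do not renew.
prior_non_renewal = 0.20

# PLACEHOLDER values (assumptions), in this order:
# P(age > 65 | non-renewal), P(owns home | non-renewal), P(August | non-renewal)
p_given_non_renewal = [0.5, 0.4, 0.1]

score = naive_bayes_score(prior_non_renewal, p_given_non_renewal)
print(score)
```

Substituting the exhibit's conditional probabilities for the placeholders gives the score the question asks for.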

Which analytic technique would be appropriate to estimate blood pressure based on age and weight?
Response:

Refer to the exhibit.

You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have a significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent variables of A, B, and C. The results of the regression are seen in the exhibit.
Which interpretation is supported by the analysis?
Response:

You are having a discussion with a business colleague. The colleague mentions that they want to perform K-means clustering on text file data stored in HDFS. Which tool should be recommended?
Response:

When would you prefer a Naive Bayes model to a logistic regression model for classification?
Response:

You have been assigned to perform a study of the daily revenue effect of a pricing model of online transactions. All data currently available to you has been loaded into your analytics database. This includes revenue data, pricing data, and online transaction data.
You discover that all data comes in different levels of granularity. The transaction data has timestamps consisting of day, hour, minutes, and seconds. Pricing is stored at the daily level and revenue data is only reported monthly.
What is the next step?
Response:

Refer to the exhibit.

An analyst is searching a corpus of documents for the topic "solid state disk".
In the exhibit, Table A provides the inverse document frequency for each term across the corpus. Table B provides each term's frequency in four documents selected from the corpus.
Which of the four documents is most relevant to the analyst's search?
Response:
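
One common way to rank documents against a query is to sum, over the query terms, the term frequency multiplied by the inverse document frequency. The Python sketch below assumes that simple raw tf × idf weighting; other TF-IDF variants exist. The `idf` and `tf` tables are invented stand-ins for Tables A and B, and `relevance` is a hypothetical helper name.

```python
# Stand-in for Table A: inverse document frequency per term (made-up values).
idf = {"solid": 1.5, "state": 1.0, "disk": 2.0}

# Stand-in for Table B: term frequency per document (made-up values).
tf = {
    "doc_a": {"solid": 2, "state": 3, "disk": 1},
    "doc_b": {"solid": 0, "state": 5, "disk": 2},
}

def relevance(term_counts, idf):
    """Sum of tf * idf over the query terms."""
    return sum(count * idf[term] for term, count in term_counts.items())

scores = {doc: relevance(counts, idf) for doc, counts in tf.items()}
best = max(scores, key=scores.get)  # document with the highest total score
print(scores, best)
```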

In a Logistic Regression, the coefficient for "age" equals -3. What is the correct interpretation of the Logistic Regression coefficient, holding all other variables constant?
Response:
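
A logistic regression coefficient acts on the log-odds scale, so exponentiating it gives the factor by which the odds change per one-unit increase in the variable. The Python sketch below checks this numerically; the intercept is an arbitrary assumption chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

coef_age = -3.0   # coefficient from the question
intercept = 2.0   # hypothetical intercept (assumption)

p_age_1 = sigmoid(intercept + coef_age * 1)
p_age_2 = sigmoid(intercept + coef_age * 2)

# Ratio of odds after a one-unit increase in age, other variables held constant.
ratio = odds(p_age_2) / odds(p_age_1)
print(ratio, math.exp(coef_age))
```

Both printed values agree: each additional unit of age multiplies the odds by exp(-3), roughly 0.05, whatever the intercept is.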

A fair six-sided die is rolled. Let A denote the event that an odd number is rolled. Let C denote the event that a 1, 2, or 3 is rolled. What is the value of the conditional probability, P(C|A)?
Response:
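
Because the die is fair, the conditional probability can be found by counting outcomes: P(C|A) = |A ∩ C| / |A|. A short Python sketch of that enumeration:

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {n for n in outcomes if n % 2 == 1}  # an odd number is rolled
C = {1, 2, 3}                            # a 1, 2, or 3 is rolled

# With equally likely outcomes, P(C|A) is a ratio of set sizes.
p_c_given_a = Fraction(len(A & C), len(A))
print(p_c_given_a)
```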

Refer to the exhibit, which shows pairwise counts for items purchased together.

Consider the following association rules:
- Milk -> Eggs
- Eggs -> Milk
- Bread -> Milk
- Milk -> Bread
Which rule has a confidence higher than 70%?
Response:
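
The confidence of a rule X -> Y is the count of transactions containing both X and Y divided by the count of transactions containing X, so X -> Y and Y -> X generally have different confidence. The Python sketch below uses invented counts, not the exhibit's values; `item_counts`, `pair_counts`, and `confidence` are hypothetical names.

```python
# Made-up counts (assumptions): transactions containing each item...
item_counts = {"milk": 10, "eggs": 5, "bread": 8}

# ...and transactions containing each pair of items.
pair_counts = {
    frozenset({"milk", "eggs"}): 4,
    frozenset({"milk", "bread"}): 5,
}

def confidence(x, y):
    """Confidence of the rule x -> y: count(x and y) / count(x)."""
    return pair_counts[frozenset({x, y})] / item_counts[x]

for x, y in [("milk", "eggs"), ("eggs", "milk"), ("bread", "milk"), ("milk", "bread")]:
    print(f"{x} -> {y}: {confidence(x, y):.3f}")
```

The pair count is the same in both directions; only the denominator changes, which is why the direction of the rule matters.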

You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and would like to find all users who are most similar to each individual.
Which algorithm is the most appropriate for this study?
Response:

Refer to the exhibit.

You are assigned to do an end-of-year sales analysis of 1,000 different products, based on the transaction table. Which column in the end-of-year report requires the use of a window function?
Response:

The R vector "v" contains 16 elements. Which R command modifies the vector to have the same elements in reverse order?
Response:

You need to run a hypothesis test across three normally distributed populations. Which technique should you use?
Response:

Refer to the exhibit.

What is the approximate R-squared value for a linear regression model fitted to the data associated with this scatterplot?
Response:

In association rules, given X -> Y, what is confidence?
Response: