DEA-7TT2 Free Exam Practice Questions: "EMC Associate - Data Science and Big Data Analytics v2 Certification"


A data scientist is preparing a presentation for a meeting with the project's business sponsors. The distribution of per-sale revenue is an important finding from the analysis. The graphics illustrate four ways to plot the per-sale revenue distribution. Which graphic is most appropriate for the sponsor presentation?
Response:

Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
You decide to run the association rules algorithm with a minimum support of 50%. Which rule has a confidence of at least 50%?
Response:
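
The support and confidence values for the four transactions above can be checked with a short sketch. This is a minimal illustration, not a full Apriori implementation: the helper names `support` and `confidence` are hypothetical, and only single-item-to-single-item rules built from frequent pairs are considered.

```python
from itertools import combinations

# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)


min_support = 0.5
items = sorted(set().union(*transactions))

# Item pairs that meet the minimum support threshold.
frequent_pairs = [p for p in combinations(items, 2) if support(p) >= min_support]

# Confidence of both rule directions for each frequent pair.
rules = {}
for a, b in frequent_pairs:
    rules[(a, b)] = confidence([a], [b])
    rules[(b, a)] = confidence([b], [a])

for (lhs, rhs), conf in sorted(rules.items()):
    print(f"{lhs} -> {rhs}: confidence {conf:.2f}")
```

Only {bread, cheese} and {bread, milk} reach 50% support, so candidate rules are limited to those two pairs.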

A study was run to identify general dietary patterns among the residents of a small town. Twelve thousand people were surveyed and the data was subjected to K-means clustering. In one of the iterations, six clusters were formed with 38, 1560, 1799, 2560, 2893, and 3150 respondents.
What should be the next step in identifying optimal clusters?
Response:

What is the primary function of the NameNode in Hadoop?
Response:

Which activity is performed in the Operationalize phase of the Data Analytics Lifecycle?
Response:

You have been assigned to perform a study of the daily revenue effect of a pricing model of online transactions. All data currently available to you has been loaded into your analytics database. This includes revenue data, pricing data, and online transaction data.
You discover that all data comes in different levels of granularity. The transaction data has timestamps consisting of day, hour, minutes, and seconds. Pricing is stored at the daily level and revenue data is only reported monthly.
What is the next step?
Response:
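
To make the granularity mismatch concrete, the sketch below rolls second-level transaction timestamps up to the daily level, which is the grain at which pricing is stored. It is only an illustration of one preparatory step, not a statement of the expected answer; the sample rows and the function name `daily_totals` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction rows: (timestamp to the second, amount).
transactions = [
    ("2024-01-01 09:15:30", 20.0),
    ("2024-01-01 17:42:05", 35.5),
    ("2024-01-02 08:01:59", 12.0),
]


def daily_totals(rows):
    """Sum transaction amounts per calendar day."""
    totals = defaultdict(float)
    for timestamp, amount in rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        totals[day.isoformat()] += amount
    return dict(totals)


print(daily_totals(transactions))
```

Note that the reverse direction is not possible: monthly revenue cannot be split into daily values without additional information.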

Imagine you are trying to hire a Data Scientist for your team. Beyond technical ability and a quantitative background, which essential trait would you look for in people applying for this position?
Response:

You are assigned the task of creating customer profiles for your company. In your database, you have 25 key input variables that come together to define 2,500 customers. You decide to run a K-means cluster analysis on the 25 input variables based on k=4 to build your profiles.
Your analysis resulted in four cluster populations:
Cluster A=1,000 customers
Cluster B=560 customers
Cluster C=925 customers
Cluster D=15 customers
What should be attempted first to more evenly distribute the customer population across clusters?
Response:

When creating a project sponsor presentation, what is the main objective?
Response:

In which phase of the analytic lifecycle would you expect to spend most of the project time?
Response:

Your company has 3 different sales teams. Each team's sales manager has developed incentive offers to increase the size of each sales transaction.
Any sales manager whose incentive program can be shown to increase the size of the average sales transaction will receive a bonus. Data are available for the number and average sale amount for transactions offering one of the incentives as well as transactions offering no incentive.
The VP of Sales has asked you to determine analytically if any of the incentive programs has resulted in a demonstrable increase in the average sale amount.
Which analytical technique would be appropriate in this situation?
Response:
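
Comparing mean sale amounts across more than two groups (three incentive programs plus a no-incentive baseline) is commonly done with a one-way ANOVA. The sketch below computes only the F statistic in plain Python, assuming per-transaction amounts are available; the sample figures and the function name `one_way_anova_f` are hypothetical, and a p-value would additionally require the F distribution (for example from a statistics library).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical sale amounts: no incentive, then incentive programs 1 to 3.
sales = [
    [100.0, 110.0, 95.0, 105.0],
    [120.0, 115.0, 130.0, 125.0],
    [102.0, 98.0, 107.0, 101.0],
    [140.0, 135.0, 150.0, 145.0],
]
print(f"F statistic: {one_way_anova_f(sales):.2f}")
```

A large F value suggests that at least one group mean differs; identifying which program differs would need a follow-up comparison.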

Which word or phrase completes the statement: "A data scientist would consider that an RDBMS is to a table as R is to a _____"?
Response:

Refer to the exhibit.

In the exhibit, the x-axis represents the derived probability of a borrower defaulting on a loan. Also in the exhibit, the pink represents borrowers that are known to have not defaulted on their loan, and the blue represents borrowers that are known to have defaulted on their loan.
Which analytical method could produce the probabilities needed to build this exhibit?
Response:
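
A method that outputs a default probability between 0 and 1 for each borrower is what such an exhibit requires; logistic regression is one well-known example. The sketch below shows only the scoring step of a logistic model, assuming the weights have already been fitted. The function name `default_probability`, the features, and the coefficient values are all hypothetical.

```python
import math


def default_probability(features, weights, bias):
    """Map a linear score to a probability with the logistic (sigmoid) function."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical borrower: [debt-to-income ratio, number of late payments].
borrower = [0.4, 2]
# Hypothetical fitted coefficients, for illustration only.
weights = [1.5, 0.8]
bias = -2.0

print(f"Estimated default probability: {default_probability(borrower, weights, bias):.3f}")
```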

What is the format of the output from the Map function of MapReduce?
Response:
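
In MapReduce, the Map function emits intermediate key/value pairs. The sketch below imitates that shape with the classic word-count mapper in plain Python; it does not use the Hadoop API, and the function name `map_words` is hypothetical.

```python
def map_words(line):
    """Emit one (word, 1) key/value pair for every word in the input line."""
    for word in line.lower().split():
        yield (word, 1)


pairs = list(map_words("big data big analytics"))
print(pairs)
```

The framework would then group these pairs by key before handing them to the Reduce function.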

In data visualization, what is used to focus the audience on a key part of a chart?
Response:

How does Pig's use of a schema differ from that of a traditional RDBMS?
Response:

In data visualization, which type of chart is recommended to represent frequency data?
Response:

There are three criteria for big data analytics projects, which include:
- Decision speed
- Analysis flexibility
What is the additional criterion?
Response:

Refer to the exhibit.

You have run a linear regression model against your data, and have plotted true outcome versus predicted outcome. The R-squared of your model is 0.75. What is your assessment of the model?
Response:

How are window functions different from regular aggregate functions?
Response:
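
The contrast can be seen by running both kinds of query on the same table. The sketch below uses Python's built-in `sqlite3` module and assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (team TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 100.0), ("A", 200.0), ("B", 50.0)],
)

# Regular aggregate: rows are collapsed to one output row per group.
aggregate_rows = conn.execute(
    "SELECT team, SUM(amount) FROM sales GROUP BY team ORDER BY team"
).fetchall()

# Window function: every input row is kept, with the group total alongside it.
window_rows = conn.execute(
    "SELECT team, amount, SUM(amount) OVER (PARTITION BY team) "
    "FROM sales ORDER BY team, amount"
).fetchall()

print(aggregate_rows)
print(window_rows)
conn.close()
```

The aggregate query returns two rows (one per team), while the window query returns all three original rows, each carrying its team total.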