DEA-7TT2 Exam Free Question Set: "EMC Associate - Data Science and Big Data Analytics v2 Certification"

Which analytic technique would be appropriate to estimate blood pressure based on age and weight?
Response:

You have just completed the Discovery phase of a project and finished interviewing the main stakeholders. You have identified the necessary data feeds and are now beginning to set up the analytic sandbox. What is the next step?
Response:

You have been assigned to study the daily revenue effect of a pricing model for online transactions. All the data currently available to you has been loaded into your analytics database: revenue data, pricing data, and online transaction data.
You find that all the data comes in different levels of granularity. The transaction data has timestamps (day, hour, minutes, seconds), pricing is stored at the daily level, and revenue data is only reported monthly.
What is your next step?
Response:

Which word or phrase completes the statement: "Discovering relationships is to Association Rules as generating forecasts is to __________"?
Response:

A call center for a large electronics company handles an average of 35,000 support calls a day. The head of the call center would like to optimize the staffing of the call center during the rollout of a new product due to recent customer complaints of long wait times.
You have been asked to create a model to optimize call center costs and customer wait times. The goals for this project include:
1. Relative to the release of a product, how does the call volume change over time?
2. How to best optimize staffing based on the call volume for the newly released product, relative to old products.
3. Historically, what time of day does the call center need to be most heavily staffed?
4. Determine the frequency of calls by both product type and customer language.
Which goals are suitable to be completed with MapReduce?
Response:

Since R factors are categorical variables, they are most closely related to which data classification level?
Response:

You are using the Apriori algorithm to determine the likelihood that a person who owns a home has a good credit score. You have determined that the confidence for the rules used in the algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are homeowners".
What can you determine from the lift calculation?
Response:
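
For reference, the standard definitions of support, confidence, and lift for a rule X -> Y can be sketched as follows. The counts are hypothetical and are not taken from the question; the function name `rule_metrics` is an illustrative assumption. A lift of exactly 1 corresponds to X and Y co-occurring as often as would be expected if they were statistically independent.

```python
def rule_metrics(n_total, n_x, n_y, n_xy):
    """Return (support, confidence, lift) for the rule X -> Y from raw counts."""
    support = n_xy / n_total             # P(X and Y)
    confidence = n_xy / n_x              # P(Y | X)
    lift = confidence / (n_y / n_total)  # P(Y | X) / P(Y)
    return support, confidence, lift


# Hypothetical counts (assumption, not from the question):
# 1,000 people, 400 with good credit (X), 500 homeowners (Y), 300 with both.
support, confidence, lift = rule_metrics(n_total=1000, n_x=400, n_y=500, n_xy=300)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```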

You have scored your Naive Bayesian Classifier model on "hold out" test data for cross validation. You have determined the way the samples scored and have tabulated them as shown in the exhibit.

What are the Precision and Recall rates of the model?
Response:
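
The exhibit is not reproduced here, so the following is only a minimal sketch of how precision and recall are computed from a confusion matrix. The counts are hypothetical and the function name `precision_recall` is an illustrative assumption.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical counts (assumption, NOT the values from the exhibit):
# 40 true positives, 10 false positives, 20 false negatives.
precision, recall = precision_recall(tp=40, fp=10, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f}")
```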

Data visualization is used in the final presentation of an analytics project. For what else is this technique commonly used?
Response:

The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop.
Which tool should they use?
Response:

Which SQL OLAP grouping extension is used to provide hierarchical groupings without examining all possible combinations?
Response:

If R factors are categorical variables, to which data classification level are they most closely related?
Response:

Refer to the exhibit.

An analyst is searching a corpus of documents for the topic "solid state disk".
In the exhibit, Table A provides the inverse document frequency for each term across the corpus. Table B provides each term's frequency in four documents selected from the corpus.
Which of the four documents is most relevant to the analyst's search?
Response:
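
Tables A and B are not reproduced here. As a rough sketch, one common way to rank documents is to sum term frequency times inverse document frequency over the query terms. The weighting scheme, the table values, and the names `tfidf_score`, `doc_a`, and `doc_b` below are all assumptions made for illustration; the exhibit may intend a different scoring rule.

```python
def tfidf_score(query_terms, term_freqs, idf):
    """Sum of tf * idf over the query terms (terms absent from a table count as 0)."""
    return sum(term_freqs.get(t, 0) * idf.get(t, 0.0) for t in query_terms)


# Hypothetical stand-ins for Table A (IDF) and Table B (term frequencies).
idf = {"solid": 1.0, "state": 0.5, "disk": 2.0}
docs = {
    "doc_a": {"solid": 2, "state": 4, "disk": 1},
    "doc_b": {"solid": 1, "state": 0, "disk": 3},
}

query = ["solid", "state", "disk"]
scores = {name: tfidf_score(query, tf, idf) for name, tf in docs.items()}
best = max(scores, key=scores.get)  # document with the highest TF-IDF sum
print(scores, best)
```

With these made-up values, a document that mentions the rarer term "disk" more often outranks one that repeats the more common term "state".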

Which clause is required by all window functions?
Response:

Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?
Response:
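
As background on the mechanics only, N-fold cross-validation of a regression model can be sketched as below: the data is split into N folds, the model is fitted on N-1 folds, and the error is measured on the held-out fold. The simple one-variable least-squares fit, the interleaved fold assignment, and the toy data are assumptions chosen to keep the example self-contained.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def n_fold_mse(xs, ys, n_folds):
    """Average held-out mean squared error across n_folds folds."""
    fold_errors = []
    for k in range(n_folds):
        # Every n_folds-th point, starting at k, is held out for testing.
        test_idx = set(range(k, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / n_folds


# Toy data (assumption): an exactly linear relationship, so the error is ~0.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
mse = n_fold_mse(xs, ys, n_folds=5)
print(mse)
```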

Which word or phrase completes the statement: "A theater actor is to 'artistic and expressive' as a data scientist is to __________"?
Response:

Consider the example of an analysis for fraud detection on credit card usage. You will need to ensure higher-risk transactions that may indicate fraudulent credit card activity are retained in your data for analysis, and not dropped as outliers during pre-processing.
What will be your approach for loading data into the analytical sandbox for this analysis?
Response: