CCA175 Free Exam Questions: Cloudera CCA Spark and Hadoop Developer Certification
CORRECT TEXT
Problem Scenario 60 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)),
(6,(salmon,turkey)), (6,(salmon,salmon)), (6,(salmon,rabbit)),
(6,(salmon,turkey)), (3,(dog,dog)), (3,(dog,cat)), (3,(dog,gnu)), (3,(dog,bee)), (3,(rat,dog)),
(3,(rat,cat)), (3,(rat,gnu)), (3,(rat,bee)))
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
solution:
b.join(d).collect
join [Pair]: Performs an inner join using two key-value RDDs. Please note that the keys must be generally comparable to make this work. keyBy : Constructs two-component tuples
(key-value pairs) by applying a function to each data item. The result of the function becomes the key, and the original data item becomes the value of the newly created tuple.
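For reference, the whole pipeline can be run in the spark-shell as a minimal sketch (assuming an existing SparkContext sc):
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)   // e.g. (3,"dog"), (6,"salmon"), ...
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
b.join(d).collect           // inner join on the length key produces the Array shown above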
CORRECT TEXT
Problem Scenario 16 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the assignment below.
1. Create a table in Hive as below.
create table departments_hive(department_id int, department_name string);
2. Now import data from the MySQL table departments into this Hive table. Please make sure that the data is visible using the Hive command below: select * from departments_hive
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Create the Hive table as stated.
hive
show tables;
create table departments_hive(department_id int, department_name string);
Step 2 : The important point here is that when we create a table without specifying field delimiters, the default field delimiter for Hive is ^A (\001). Hence, while importing the data we have to provide the matching delimiter.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table departments_hive \
--fields-terminated-by '\001'
Step 3 : Check the data in the directory.
hdfs dfs -ls /user/hive/warehouse/departments_hive
hdfs dfs -cat /user/hive/warehouse/departments_hive/part*
Check data in hive table.
Select * from departments_hive;
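To double-check the delimiter from the spark-shell, a minimal sketch (assuming the warehouse path above and an existing SparkContext sc) could be:
// Read the raw warehouse files and split each record on Hive's default ^A (\u0001) delimiter
val raw = sc.textFile("/user/hive/warehouse/departments_hive/part*")
raw.map(_.split('\u0001').mkString(" | ")).collect.foreach(println)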
CORRECT TEXT
Problem Scenario 14 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities.
1. Create a csv file named updated_departments.csv with the following contents in the local file system.
updated_departments.csv
2,fitness
3,footwear
12,fathematics
13,fcience
14,engineering
1000,management
2. Upload this csv file to the HDFS filesystem.
3. Now export this data from HDFS to the MySQL retail_db.departments table. During the export, make sure existing departments are just updated and new departments are inserted.
4. Now update updated_departments.csv file with below content.
2,Fitness
3,Footwear
12,Fathematics
13,Science
14,Engineering
1000,Management
2000,Quality Check
5. Now upload this file to HDFS.
6. Now export this data from HDFS to the MySQL retail_db.departments table. During the export, make sure existing departments are just updated and no new departments are inserted.
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Create a csv file named updated_departments.csv with the given content.
Step 2 : Now upload this file to HDFS.
Create a directory called new_data.
hdfs dfs -mkdir new_data
hdfs dfs -put updated_departments.csv new_data/
Step 3 : Check whether the file is uploaded or not. hdfs dfs -ls new_data
Step 4 : Export this file to the departments table using Sqoop. Here --update-mode allowinsert updates rows whose department_id already exists and inserts the rest.
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments \
--export-dir new_data \
--batch \
-m 1 \
--update-key department_id \
--update-mode allowinsert
Step 5 : Check whether the required data upsert is done or not.
mysql --user=retail_dba --password=cloudera
show databases; use retail_db;
show tables;
select * from departments;
Step 6 : Update the updated_departments.csv file with the new content.
Step 7 : Overwrite the existing file in HDFS.
hdfs dfs -put -f updated_departments.csv new_data/
Step 8 : Now do the Sqoop export as per the requirement. With --update-mode updateonly, only existing rows are updated; rows with a new department_id (e.g. 2000) are not inserted.
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments \
--export-dir new_data \
--batch \
-m 1 \
--update-key department_id \
--update-mode updateonly
Step 9 : Check whether the required data update is done or not.
mysql --user=retail_dba --password=cloudera
show databases; use retail_db;
show tables;
select * from departments;
CORRECT TEXT
Problem Scenario 20 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.categories
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities.
1. Write a Sqoop job which will import the "retail_db.categories" table to HDFS, in a directory named "categories_targetJob".
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Connect to the existing MySQL database: mysql --user=retail_dba --password=cloudera retail_db
Step 2 : Show all the available tables: show tables;
Step 3 : Below is the command to create the Sqoop job (please note that the space between "--" and "import" is mandatory).
sqoop job --create sqoopjob \
-- import \
--connect "jdbc:mysql://quickstart:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table categories \
--target-dir categories_targetJob \
--fields-terminated-by '|' \
--lines-terminated-by '\n'
Step 4 : List all the Sqoop jobs: sqoop job --list
Step 5 : Show details of the Sqoop job: sqoop job --show sqoopjob
Step 6 : Execute the Sqoop job: sqoop job --exec sqoopjob
Step 7 : Check the output of the import job.
hdfs dfs -ls categories_targetJob
hdfs dfs -cat categories_targetJob/part*
CORRECT TEXT
Problem Scenario 2 :
There is a parent organization called "ABC Group Inc", which has two child companies named Tech Inc and MPTech.
Both companies' employee information is given in two separate text files as below. Please do the following activities for the employee details.
Tech Inc.txt
1,Alok,Hyderabad
2,Krish,Hongkong
3,Jyoti,Mumbai
4,Atul,Banglore
5,Ishan,Gurgaon
MPTech.txt
6,John,Newyork
7,alp2004,California
8,tellme,Mumbai
9,Gagan21,Pune
10,Mukesh,Chennai
1. Which command will you use to check all the available command line options on HDFS, and how will you get help for an individual command?
2. Create a new empty directory named Employee using the command line, and also create an empty file named Techinc.txt in it.
3. Load both companies' employee data into the Employee directory (how do you overwrite an existing file in HDFS?).
4. Merge both employee files into a single file called MergedEmployee.txt; the merged file should have a newline character at the end of each file's content.
5. Upload the merged file to HDFS and change the permissions on the HDFS merged file, so that the owner and group members can read and write, and other users can read the file.
6. Write a command to export an individual file as well as an entire directory from HDFS to the local file system.
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Check all available commands: hdfs dfs
Step 2 : Get help on an individual command: hdfs dfs -help get
Step 3 : Create a directory named Employee in HDFS and create a dummy file in it called Techinc.txt: hdfs dfs -mkdir Employee
Now create an empty file in the Employee directory, e.g. using Hue (or, from the command line, hdfs dfs -touchz Employee/Techinc.txt).
Step 4 : Create a directory on the local file system and then create the two files with the data given in the problem.
Step 5 : We now have an existing directory with content in it; override this existing Employee directory while copying the files from the local file system to HDFS:
cd /home/cloudera/Desktop/
hdfs dfs -put -f Employee
Step 6 : Check that all files in the directory were copied successfully: hdfs dfs -ls Employee
Step 7 : Now merge all the files in the Employee directory: hdfs dfs -getmerge -nl Employee MergedEmployee.txt
Step 8 : Check the content of the file: cat MergedEmployee.txt
Step 9 : Copy the merged file into the Employee directory, from the local file system to HDFS: hdfs dfs -put MergedEmployee.txt Employee/
Step 10 : Check whether the file was copied or not: hdfs dfs -ls Employee
Step 11 : Change the permissions of the merged file on HDFS: hdfs dfs -chmod 664 Employee/MergedEmployee.txt
Step 12 : Get the directory from HDFS to the local file system: hdfs dfs -get Employee Employee_hdfs
CORRECT TEXT
Problem Scenario 83 : In continuation of the previous question, please accomplish the following activities.
1. Select all the records with quantity >= 5000 and name starting with 'Pen'
2. Select all the records with quantity >= 5000, price less than 1.24, and name starting with
'Pen'
3. Select all the records which do not have quantity >= 5000 and whose name does not start with 'Pen'
4. Select all the products whose name is 'Pen Red' or 'Pen Black'
5. Select all the products which have price BETWEEN 1.0 AND 2.0 AND quantity
BETWEEN 1000 AND 2000.
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Select all the records with quantity >= 5000 and name starting with 'Pen'.
val results = sqlContext.sql("""SELECT * FROM products WHERE quantity >= 5000 AND name LIKE 'Pen %'""")
results.show()
Step 2 : Select all the records with quantity >= 5000, price less than 1.24, and name starting with 'Pen'.
val results = sqlContext.sql("""SELECT * FROM products WHERE quantity >= 5000 AND price < 1.24 AND name LIKE 'Pen %'""")
results.show()
Step 3 : Select all the records which do not have quantity >= 5000 and whose name does not start with 'Pen'.
val results = sqlContext.sql("""SELECT * FROM products WHERE NOT (quantity >= 5000 AND name LIKE 'Pen %')""")
results.show()
Step 4 : Select all the products whose name is 'Pen Red' or 'Pen Black'.
val results = sqlContext.sql("""SELECT * FROM products WHERE name IN ('Pen Red', 'Pen Black')""")
results.show()
Step 5 : Select all the products which have price BETWEEN 1.0 AND 2.0 AND quantity BETWEEN 1000 AND 2000.
val results = sqlContext.sql("""SELECT * FROM products WHERE (price BETWEEN 1.0 AND 2.0) AND (quantity BETWEEN 1000 AND 2000)""")
results.show()
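The same filters can also be expressed with the DataFrame API; a minimal sketch (assuming the products table is registered with the same sqlContext) might look like:
import sqlContext.implicits._
val products = sqlContext.table("products")
products.filter($"quantity" >= 5000 && $"name".startsWith("Pen ")).show()              // step 1
products.filter($"name".isin("Pen Red", "Pen Black")).show()                           // step 4
products.filter($"price".between(1.0, 2.0) && $"quantity".between(1000, 2000)).show()  // step 5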
CORRECT TEXT
Problem Scenario 65 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "cat", "owl", "gnu", "ant"), 2)
val b = sc.parallelize(1 to a.count.toInt, 2)
val c = a.zip(b)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(String, Int)] = Array((owl,3), (gnu,4), (dog,1), (cat,2), (ant,5))
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution : c.sortByKey(false).collect
sortByKey [Ordered] : This function sorts the input RDD's data and stores it in a new RDD.
"The output RDD is a shuffled RDD because it stores data that is output by a reducer which has been shuffled. The implementation of this function is actually very clever.
First, it uses a range partitioner to partition the data in ranges within the shuffled RDD.
Then it sorts these ranges individually with mapPartitions using standard sort mechanisms.
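As a minimal spark-shell sketch of the whole pipeline (assuming an existing SparkContext sc):
val a = sc.parallelize(List("dog", "cat", "owl", "gnu", "ant"), 2)
val b = sc.parallelize(1 to a.count.toInt, 2)
val c = a.zip(b)             // (dog,1), (cat,2), (owl,3), (gnu,4), (ant,5)
c.sortByKey(false).collect   // descending by the String key: owl, gnu, dog, cat, ant
c.sortByKey(true).collect    // ascending would give: ant, cat, dog, gnu, owl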
CORRECT TEXT
Problem Scenario 67 : You have been given the below code snippet.
lines = sc.parallelize(['Its fun to have fun,', 'but you have to know how.'])
r1 = lines.map(lambda x: x.replace(',', ' ').replace('.', ' ').replace('-', ' ').lower())
r2 = r1.flatMap(lambda x: x.split())
r3 = r2.map(lambda x: (x, 1))
operation1
r5 = r4.map(lambda x: (x[1], x[0]))
r6 = r5.sortByKey(ascending=False)
r6.take(20)
Write a correct code snippet for operation1 which will produce the desired output, shown below.
[(2, 'fun'), (2, 'to'), (2, 'have'), (1, 'its'), (1, 'know'), (1, 'how'), (1, 'you'), (1, 'but')]
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
r4 = r3.reduceByKey(lambda x,y:x+y)
CORRECT TEXT
Problem Scenario 94 : You have to run your Spark application on YARN with 20GB for each executor,
and the number of executors should be 50. Please replace XXX, YYY, ZZZ.
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class com.hadoopexam.MyTask \
XXX \
--deploy-mode cluster \ # can be client for client mode
YYY \
ZZZ \
/path/to/hadoopexam.jar \
1000
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
XXX : --master yarn
YYY : --executor-memory 20G
ZZZ : --num-executors 50
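Substituting these values into the given template, the full command would look like the sketch below (the HADOOP_CONF_DIR path is an assumption; use your cluster's Hadoop configuration directory):
export HADOOP_CONF_DIR=/etc/hadoop/conf   # assumed location of the Hadoop client configuration
./bin/spark-submit \
--class com.hadoopexam.MyTask \
--master yarn \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/hadoopexam.jar \
1000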
CORRECT TEXT
Problem Scenario 85 : In continuation of the previous question, please accomplish the following activities.
1. Select all the columns from the products table with the output header as below: productID AS ID, code AS Code, name AS Description, price AS 'Unit Price'
2. Select code and name, both separated by ' -', and the header name should be 'Product
Description'.
3. Select all distinct prices.
4. Select the distinct price and name combinations.
5. Select all price data sorted by the code and productID combination.
6. Count the number of products.
7. Count the number of products for each code.
Correct Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Select all the columns from the products table with the output header as below: productID AS ID, code AS Code, name AS Description, price AS 'Unit Price'.
val results = sqlContext.sql("""SELECT productID AS ID, code AS Code, name AS Description, price AS `Unit Price` FROM products ORDER BY ID""")
results.show()
Step 2 : Select code and name, both separated by ' -', with the header name 'Product Description'.
val results = sqlContext.sql("""SELECT CONCAT(code, ' -', name) AS `Product Description`, price FROM products""")
results.show()
Step 3 : Select all distinct prices.
val results = sqlContext.sql("""SELECT DISTINCT price AS `Distinct Price` FROM products""")
results.show()
Step 4 : Select the distinct price and name combinations.
val results = sqlContext.sql("""SELECT DISTINCT price, name FROM products""")
results.show()
Step 5 : Select all price data sorted by the code and productID combination.
val results = sqlContext.sql("""SELECT * FROM products ORDER BY code, productID""")
results.show()
Step 6 : Count the number of products.
val results = sqlContext.sql("""SELECT COUNT(*) AS `Count` FROM products""")
results.show()
Step 7 : Count the number of products for each code.
val results = sqlContext.sql("""SELECT code, COUNT(*) FROM products GROUP BY code""")
results.show()
val results = sqlContext.sql("""SELECT code, COUNT(*) AS count FROM products GROUP BY code ORDER BY count DESC""")
results.show()
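For step 7, an equivalent DataFrame-API sketch (assuming the same registered products table) could be:
import org.apache.spark.sql.functions.desc
val products = sqlContext.table("products")
products.groupBy("code").count().orderBy(desc("count")).show()   // per-code counts, highest first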