The emergence of HFS solves the need to store a large number of small files (below 10MB) in HDFS. At the same time, it is necessary to store some mixed scenes of large files (above 10MB)
Spark Streaming? The fault tolerance mechanism means that if any partition in the RDD fails, it can be recalculated and generated according to its parent RDD. If the parent RDD is lost, you need to go to the disk to find the original data.