MapReduce process on YARN

By | October 29, 2014

1. The client submits a job to the ResourceManager (RM).
2. The ApplicationsManager in the RM assigns this job to a NodeManager, which launches a container for the MapReduce ApplicationMaster (MRApplicationMaster). This MRApplicationMaster is fully responsible for running the job.
3. The MRApplicationMaster works out how much CPU and memory the job's tasks need and sends these resource requests to the RM.
4. The Scheduler in the RM knows the resource usage across the whole cluster. It replies to the MRApplicationMaster with the NodeManagers from which it can obtain the resources it needs.
5. The MRApplicationMaster sends the tasks to these NodeManagers.
6. Each NodeManager launches MapTasks and ReduceTasks to carry out the computation. This is where the concept of a container comes in. A container describes a set of resources, such as CPU and memory. A MapTask or ReduceTask needs CPU and memory to run, so each one runs inside a container.
7. While the tasks run on each NodeManager, they report their current status to the MRApplicationMaster.
8. After all the tasks finish, the MRApplicationMaster reports the job's final status to the ApplicationsManager in the RM.
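The eight steps can be traced in a small simulation. This is a toy Python sketch, not Hadoop code: the classes are named after the roles in the list, but NodeManager, Scheduler, MRApplicationMaster, ResourceManager and all their methods are hypothetical and do not correspond to Hadoop's real Java API. It assumes one vcore per task, a simple first-fit placement policy, and tasks that always succeed; real YARN schedulers (such as the Capacity Scheduler or Fair Scheduler) are far more elaborate.

```python
from dataclasses import dataclass

# Toy model of the YARN steps above. Every class and method here is
# hypothetical and illustrative; this is NOT Hadoop's Java API.


@dataclass
class Container:
    """A bundle of resources on one node (step 6)."""
    node: str
    vcores: int
    memory_mb: int


@dataclass
class NodeManager:
    name: str
    free_vcores: int
    free_memory_mb: int

    def try_allocate(self, vcores, memory_mb):
        # Grant a container only if this node still has enough free resources.
        if vcores <= self.free_vcores and memory_mb <= self.free_memory_mb:
            self.free_vcores -= vcores
            self.free_memory_mb -= memory_mb
            return Container(self.name, vcores, memory_mb)
        return None

    def release(self, container):
        # Return the container's resources to this node.
        self.free_vcores += container.vcores
        self.free_memory_mb += container.memory_mb

    def launch(self, container, task):
        # Steps 6-7: run the task in its container and report a status.
        # A real task would do map or reduce work; this toy always succeeds.
        return "SUCCEEDED"


class Scheduler:
    """Tracks resource usage across the cluster (step 4); first-fit placement."""

    def __init__(self, nodes):
        self.nodes = {nm.name: nm for nm in nodes}

    def allocate(self, vcores, memory_mb):
        for nm in self.nodes.values():
            container = nm.try_allocate(vcores, memory_mb)
            if container is not None:
                return container
        raise RuntimeError("no NodeManager has enough free resources")

    def release(self, container):
        self.nodes[container.node].release(container)


class MRApplicationMaster:
    def __init__(self, job, scheduler):
        self.job = job
        self.scheduler = scheduler

    def run(self):
        # Step 3: one container request per map task and per reduce task.
        tasks = [f"map-{i}" for i in range(self.job["maps"])]
        tasks += [f"reduce-{i}" for i in range(self.job["reduces"])]
        # Step 4: the Scheduler decides which NodeManager serves each request.
        placed = {t: self.scheduler.allocate(1, self.job["task_memory_mb"]) for t in tasks}
        # Steps 5-7: each NodeManager launches its tasks; statuses flow back here.
        status = {}
        for task, c in placed.items():
            status[task] = (self.scheduler.nodes[c.node].launch(c, task), c.node)
            self.scheduler.release(c)
        ok = all(s == "SUCCEEDED" for s, _ in status.values())
        return ("SUCCEEDED" if ok else "FAILED"), status


class ResourceManager:
    def __init__(self, nodes):
        self.scheduler = Scheduler(nodes)

    def submit(self, job):
        # Steps 1-2: the ApplicationsManager obtains a container for the AM itself.
        am_container = self.scheduler.allocate(1, job["am_memory_mb"])
        final, status = MRApplicationMaster(job, self.scheduler).run()
        # Step 8: the AM reports the final status and its container is freed.
        self.scheduler.release(am_container)
        return am_container.node, final, status


# Usage: two hypothetical 4-core, 8 GB NodeManagers and a 3-map, 1-reduce job.
nodes = [NodeManager("nm1", 4, 8192), NodeManager("nm2", 4, 8192)]
rm = ResourceManager(nodes)
job = {"maps": 3, "reduces": 1, "task_memory_mb": 2048, "am_memory_mb": 1024}
am_node, final, status = rm.submit(job)
print(am_node, final)       # nm1 SUCCEEDED
print(status["reduce-0"])   # ('SUCCEEDED', 'nm2')
```

In this run the ApplicationMaster and the three map tasks use up all four vcores on nm1, so the reduce task's container is placed on nm2. That is why step 4 belongs to the Scheduler rather than the ApplicationMaster: only the Scheduler sees the free resources on every node.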