Hadoop performance tuning starts with monitoring. Tracking basic system resources on the cluster nodes, such as CPU utilization and average disk data transfer rates, shows how heavily the hardware is used and helps identify bottlenecks when diagnosing performance issues. Monitoring a Hadoop cluster means watching system resource usage on the cluster nodes along with the key service metrics. The most commonly monitored resources are I/O bandwidth, the number of disk I/O operations per second, average data transfer rate, network latency, and average memory and swap space utilization.
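As a rough illustration of how two of these resource metrics are derived, the sketch below computes CPU utilization and an average transfer rate from pairs of cumulative counter samples. It assumes Linux-style counters (the field order follows the first line of /proc/stat); the function names and the sample numbers are made up for the example.

```python
def cpu_utilization(prev, curr):
    """Return the busy CPU percentage between two cumulative samples.

    Each sample is a tuple of cumulative ticks in the order
    (user, nice, system, idle, iowait). This order is an assumption
    that matches the first five fields of /proc/stat on Linux.
    """
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no time elapsed between the samples
    not_busy = deltas[3] + deltas[4]  # idle + iowait
    return 100.0 * (total - not_busy) / total


def average_transfer_rate(bytes_before, bytes_after, seconds):
    """Return the average transfer rate in bytes per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_after - bytes_before) / seconds


# Hypothetical samples taken some seconds apart on one node.
earlier = (100, 0, 50, 800, 50)
later = (200, 0, 100, 1600, 100)
print(f"CPU busy: {cpu_utilization(earlier, later):.1f}%")
print(f"Disk rate: {average_transfer_rate(0, 10_000_000, 10):.0f} B/s")
```

Working from counter deltas rather than instantaneous readings is what makes the result an average over the sampling interval.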
Performance tuning and monitoring also involve collecting performance counter data to determine whether the response times of various tasks lie within an acceptable execution time range. The average percentage utilization of MapReduce task capacity and HDFS storage capacity over time indicates whether your cluster’s resources are used optimally or are underused.
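The two checks described above can be sketched in a few lines. This is only an illustration: the task names, the acceptable range of 10 to 60 seconds, and the capacity samples are all hypothetical, and a real cluster would take these values from its job counters and capacity reports.

```python
from statistics import mean


def tasks_outside_range(durations, low, high):
    """Return the tasks whose duration (seconds) is outside [low, high]."""
    return {
        task: secs
        for task, secs in durations.items()
        if not (low <= secs <= high)
    }


def average_utilization(samples):
    """Return the mean utilization percentage over time.

    samples is a sequence of (used, capacity) pairs, one per
    observation, for example occupied task slots or HDFS bytes used.
    """
    return 100.0 * mean(used / capacity for used, capacity in samples)


# Hypothetical task durations in seconds.
durations = {"map_0001": 30, "map_0002": 95, "reduce_0001": 5}
print(tasks_outside_range(durations, low=10, high=60))

# Hypothetical (used, capacity) observations collected over time.
hdfs_samples = [(50, 100), (70, 100)]
print(f"Average HDFS utilization: {average_utilization(hdfs_samples):.1f}%")
```

A consistently low average suggests underused resources, while tasks that fall outside the expected range are candidates for closer inspection.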
Hadoop offers a substantial number of metrics and information sources for monitoring and debugging its services. To analyze the overall state of the cluster and diagnose any problems you discover, you need to collect these system and service metrics from the cluster nodes and correlate them.
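One such information source is the JSON that Hadoop daemons serve from their /jmx web endpoint, which returns a list of "beans". The sketch below parses a response of that shape and derives HDFS usage from the NameNode's FSNamesystem bean. The sample payload and its values are invented for illustration, the helper names are hypothetical, and the exact set of attributes can differ between Hadoop versions.

```python
import json

# Invented sample in the shape of a NameNode /jmx response.
SAMPLE_JMX = """
{"beans": [
  {"name": "Hadoop:service=NameNode,name=FSNamesystem",
   "CapacityTotal": 1000,
   "CapacityUsed": 250}
]}
"""


def find_bean(jmx_text, bean_name):
    """Return the bean dict with the given name, or None if absent."""
    for bean in json.loads(jmx_text).get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return None


def hdfs_used_percent(jmx_text):
    """Return HDFS used capacity as a percentage, or None if unknown."""
    bean = find_bean(jmx_text, "Hadoop:service=NameNode,name=FSNamesystem")
    if not bean or not bean.get("CapacityTotal"):
        return None
    return 100.0 * bean["CapacityUsed"] / bean["CapacityTotal"]


print(hdfs_used_percent(SAMPLE_JMX))
```

In practice the text would be fetched over HTTP from each daemon and the extracted values stored with a timestamp so that they can be correlated across nodes.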
You can enhance your monitoring by using proven open source monitoring systems such as Chukwa, Ganglia, Nagios, and Ambari (a nonexhaustive list) to consolidate the various metrics and information sources provided by Hadoop into more meaningful service-specific summaries, graphs, and alerts.
Using Chukwa to monitor Hadoop
Chukwa is an open source data collection system for monitoring and analyzing large distributed systems. It is built on top of Hadoop and includes a powerful and flexible toolkit for monitoring, analyzing, and viewing results. Many components of Chukwa are pluggable, allowing easy customization and enhancement. It provides a standardized framework for processing the collected data and can scale to thousands of nodes in both collection and analysis capacities.
Using Ganglia to monitor Hadoop
Ganglia was originally developed at the University of California, Berkeley. Its purpose is to provide a robust, low-overhead solution for monitoring the performance of a computing cluster, which can contain hundreds or thousands of nodes. Ganglia collects high-level variables such as CPU utilization and free disk space for each monitored node. It can also be used to detect failed cluster nodes.
The current Hadoop version has built-in support for Ganglia (version 3.0+). Ganglia is a highly scalable cluster monitoring tool that provides a graphical view of the state of a single cluster, a set of clusters, or individual machines in a cluster.
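The built-in support is typically switched on through the metrics2 configuration file (hadoop-metrics2.properties) by pointing each daemon at a Ganglia sink. The sketch below generates such settings. It assumes the GangliaSink30 and GangliaSink31 sink classes shipped with Hadoop's metrics2 framework; the function name, host name, port, and reporting period are placeholders to adapt and verify against your Hadoop release.

```python
def ganglia_sink_properties(servers, daemons=("namenode", "datanode"),
                            period_seconds=10, ganglia_31=True):
    """Return hadoop-metrics2.properties lines for a Ganglia sink.

    servers: list of "host:port" strings for the Ganglia daemons.
    daemons: metrics prefixes of the Hadoop daemons to report.
    ganglia_31: choose the sink for Ganglia 3.1+ (else the 3.0 sink).
    """
    sink_class = "GangliaSink31" if ganglia_31 else "GangliaSink30"
    package = "org.apache.hadoop.metrics2.sink.ganglia"
    lines = [
        f"*.sink.ganglia.class={package}.{sink_class}",
        f"*.sink.ganglia.period={period_seconds}",
    ]
    server_list = ",".join(servers)
    for daemon in daemons:
        lines.append(f"{daemon}.sink.ganglia.servers={server_list}")
    return "\n".join(lines)


# Placeholder collector address; 8649 is Ganglia's usual default port.
print(ganglia_sink_properties(["ganglia.example.com:8649"]))
```

Generating the fragment from one function keeps the sink settings identical across daemons, which is easy to get wrong when editing the file by hand on many nodes.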
Ganglia’s architecture and implementation on Hadoop support federations of clusters, monitoring the state within each cluster and aggregating those states. The architecture includes a Ganglia Collector that runs monitoring daemons and collects metrics for each cluster. It also runs a meta daemon that aggregates the metrics for all clusters. The Ganglia Collector provides a web user interface that presents real-time dynamic views of memory usage, disk usage, network statistics, running processes, and other metrics.
Using Nagios to monitor Hadoop
Nagios is a popular open source monitoring system that is heavily used in High Performance Computing (HPC) and other environments and is designed to obtain system resource metrics. You can use it to monitor your Hadoop cluster resources, the status of applications, and operating system attributes such as CPU usage, disk space, and memory utilization.
Nagios has a built-in notification system and focuses on alerting rather than on gathering and tracking system metrics (as Ganglia does). The current version of Nagios allows you to run agents on target hosts and provides a flexible and customizable framework for collecting metrics and information about the state of your Hadoop cluster.
Nagios can be used to address different monitoring perspectives:
• Getting instant information about your Hadoop infrastructure organization
• Raising and receiving alerts on system failures
• Analyzing, reporting, and producing graphs on cluster utilization and making decisions about future hardware acquisitions
• Detecting and anticipating future issues
• Monitoring how heavily loaded the job queues are and finding which nodes are available to run jobs
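Alerting of this kind is usually driven by small check plugins that report a status through their exit code: by Nagios convention 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. The sketch below is a minimal disk-space check written in that style. The thresholds, the function names, and the message wording are illustrative assumptions, not part of any shipped plugin.

```python
import shutil
import sys

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a (exit_code, label) pair."""
    if used_percent >= crit:
        return CRITICAL, "CRITICAL"
    if used_percent >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, message) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - total size reported as 0"
    used_percent = 100.0 * usage.used / usage.total
    code, label = classify(used_percent, warn, crit)
    return code, f"DISK {label} - {used_percent:.1f}% used on {path}"


def main():
    """Entry point when installed as a plugin: print and exit."""
    code, message = check_disk("/")
    print(message)
    sys.exit(code)
```

Calling main() from a script installed on a DataNode would let Nagios run the check through its agent and raise an alert when the exit code is nonzero.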
Using Apache Ambari to monitor Hadoop
The Apache Ambari project simplifies Hadoop management and cluster monitoring. Its primary goal is to simplify the deployment and management of Hadoop clusters in multi-instance environments. Ambari provides a set of intuitive and easy-to-use tools to monitor Hadoop clusters, hiding the complexities of the Hadoop framework. It exposes RESTful APIs that allow administrators to integrate it with other systems. Furthermore, Ambari relies on metrics from Ganglia and on Nagios for its alerting function, which sends e-mails to the administrator when required (for example, when a node fails or the remaining disk space is low). Additionally, Ambari supports Hadoop security: it can install secure (Kerberos-based) Hadoop clusters, and it provides role-based user authentication, authorization, auditing, and integration with LDAP and Active Directory for user management.
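As an example of integrating through the RESTful API, the sketch below builds (but does not send) a request that lists the hosts of a cluster. It assumes the commonly documented /api/v1/clusters/<name>/hosts resource with HTTP basic authentication; the server address, cluster name, and credentials are placeholders, and the helper name is hypothetical.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def ambari_hosts_request(base_url, cluster, user, password):
    """Build a GET request for the host list of an Ambari cluster.

    base_url: for example "http://ambari.example.com:8080"
    cluster:  cluster name as registered in Ambari
    """
    cluster_path = quote(cluster, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/clusters/{cluster_path}/hosts"
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder values; nothing is sent over the network here.
request = ambari_hosts_request(
    "http://ambari.example.com:8080", "demo", "admin", "secret"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call and return JSON that can be parsed with the json module.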