Elastic MapReduce (EMR)

Overview

Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.

Setup

Installation

If you haven’t already, first set up the AWS CloudWatch integration. No additional steps are needed for installation.

Configuration

On the AWS CloudWatch integration page, ensure that the EMR service is selected for metric collection.
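Before enabling collection, it can help to confirm that CloudWatch is reporting EMR metrics for a cluster. The following is a minimal sketch, not part of the integration itself: it only builds the parameters for a CloudWatch GetMetricStatistics request. The helper name emr_metric_request and the cluster ID j-EXAMPLE are hypothetical, and the 5-minute period is an assumption about how often EMR publishes metrics.

```python
from datetime import datetime, timedelta, timezone


def emr_metric_request(cluster_id, metric_name, end=None, minutes=60):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    Hypothetical helper: it performs no network access, it only
    assembles the request parameters for one EMR metric.
    """
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",  # CloudWatch namespace for EMR
        "MetricName": metric_name,            # e.g. "IsIdle"
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,                        # assumed 5-minute granularity
        "Statistics": ["Average"],
    }


# Usage with boto3 (needs AWS credentials and network access):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   response = cloudwatch.get_metric_statistics(
#       **emr_metric_request("j-EXAMPLE", "IsIdle")
#   )
```

If the response contains datapoints, the metric is available in CloudWatch and should be collected once the EMR service is selected.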

Metrics

Hadoop 1 AMIs

Name Description
IsIdle
(boolean)
Indicates that a cluster is no longer performing work, but is still
alive and accruing charges. It is set to 1 if no tasks are running
and no jobs are running, and set to 0 otherwise.
JobsRunning
(count)
The number of jobs in the cluster that are currently running.
JobsFailed
(count)
The number of jobs in the cluster that have failed.
MapTasksRunning
(count)
The number of running map tasks for each job.
MapTasksRemaining
(count)
The number of remaining map tasks for each job.
MapSlotsOpen
(count)
The unused map task capacity. This is calculated as the maximum
number of map tasks for a given cluster, less the total number of
map tasks currently running in that cluster.
RemainingMapTasksPerSlot
(ratio)
The ratio of the total map tasks remaining to the total map slots
available in the cluster.
ReduceTasksRunning
(count)
The number of running reduce tasks for each job.
ReduceTasksRemaining
(count)
The number of remaining reduce tasks for each job.
ReduceSlotsOpen
(count)
Unused reduce task capacity. This is calculated as the maximum
reduce task capacity for a given cluster, less the number of reduce
tasks currently running in that cluster.
CoreNodesRunning
(count)
The number of core nodes working.
CoreNodesPending
(count)
The number of core nodes waiting to be assigned.
LiveDataNodes
(percent)
The percentage of data nodes that are receiving work from Hadoop.
TaskNodesRunning
(count)
The number of task nodes working.
TaskNodesPending
(count)
The number of task nodes waiting to be assigned.
LiveTaskTrackers
(percent)
The percentage of task trackers that are functional.
S3BytesWritten
(bytes)
The number of bytes written to Amazon S3.
S3BytesRead
(bytes)
The number of bytes read from Amazon S3.
HDFSUtilization
(percent)
The percentage of HDFS storage currently used.
HDFSBytesRead
(bytes)
The number of bytes read from HDFS.
HDFSBytesWritten
(bytes)
The number of bytes written to HDFS.
MissingBlocks
(count)
The number of blocks for which HDFS has no replicas. These might be
corrupt blocks.
TotalLoad
(count)
The current, total number of readers and writers reported by all
DataNodes in a cluster.
BackupFailed
(boolean)
Whether the last backup failed. This is set to 0 by default and
updated to 1 if the previous backup attempt failed.
MostRecentBackupDuration
(minutes)
The amount of time it took the previous backup to complete.
TimeSinceLastSuccessfulBackup
(minutes)
The number of elapsed minutes after the last successful HBase
backup started on your cluster.
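Several of the Hadoop 1 metrics above are derived from other values. The sketch below restates the descriptions of IsIdle, MapSlotsOpen, and RemainingMapTasksPerSlot as arithmetic. The function names and input numbers are hypothetical, and returning None for a cluster with zero slots is a choice made for this example, not documented behavior.

```python
def is_idle(tasks_running, jobs_running):
    """1 if no tasks and no jobs are running, 0 otherwise."""
    return 1 if tasks_running == 0 and jobs_running == 0 else 0


def map_slots_open(max_map_tasks, map_tasks_running):
    """Unused map capacity: cluster maximum less running map tasks."""
    return max_map_tasks - map_tasks_running


def remaining_map_tasks_per_slot(map_tasks_remaining, total_map_slots):
    """Ratio of remaining map tasks to total map slots in the cluster."""
    if total_map_slots == 0:
        return None  # assumption: ratio is undefined without any slots
    return map_tasks_remaining / total_map_slots


# Hypothetical cluster: 40 map slots, 28 map tasks running, 90 remaining.
print(is_idle(tasks_running=28, jobs_running=1))      # 0
print(map_slots_open(40, 28))                         # 12
print(remaining_map_tasks_per_slot(90, 40))           # 2.25
```

ReduceSlotsOpen follows the same subtraction as MapSlotsOpen, using reduce task capacity and running reduce tasks.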

Hadoop 2 AMIs

Name Description
IsIdle
(boolean)
Indicates that a cluster is no longer performing work, but is still
alive and accruing charges. It is set to 1 if no tasks are running
and no jobs are running, and set to 0 otherwise.
ContainerAllocated
(count)
The number of resource containers allocated by the ResourceManager.
ContainerReserved
(count)
The number of containers reserved.
ContainerPending
(count)
The number of containers in the queue that have not yet been
allocated.
ContainerPendingRatio
(ratio)
The ratio of pending containers to containers allocated.
AppsCompleted
(count)
The number of applications submitted to YARN that have completed.
AppsFailed
(count)
The number of applications submitted to YARN that have failed to
complete.
AppsKilled
(count)
The number of applications submitted to YARN that have been killed.
AppsPending
(count)
The number of applications submitted to YARN that are in a pending
state.
AppsRunning
(count)
The number of applications submitted to YARN that are running.
AppsSubmitted
(count)
The number of applications submitted to YARN.
CoreNodesRunning
(count)
The number of core nodes working.
CoreNodesPending
(count)
The number of core nodes waiting to be assigned.
LiveDataNodes
(percent)
The percentage of data nodes that are receiving work from Hadoop.
MRTotalNodes
(count)
The number of nodes presently available to MapReduce jobs.
MRActiveNodes
(count)
The number of nodes presently running MapReduce tasks or jobs.
MRLostNodes
(count)
The number of nodes allocated to MapReduce that have been marked in
a LOST state.
MRUnhealthyNodes
(count)
The number of nodes available to MapReduce jobs marked in an
UNHEALTHY state.
MRDecommissionedNodes
(count)
The number of nodes allocated to MapReduce applications that have
been marked in a DECOMMISSIONED state.
MRRebootedNodes
(count)
The number of nodes available to MapReduce that have been rebooted
and marked in a REBOOTED state.
S3BytesWritten
(bytes)
The number of bytes written to Amazon S3.
S3BytesRead
(bytes)
The number of bytes read from Amazon S3.
HDFSUtilization
(percent)
The percentage of HDFS storage currently used.
HDFSBytesRead
(bytes)
The number of bytes read from HDFS. This metric aggregates
MapReduce jobs only, and does not apply for other workloads on EMR.
HDFSBytesWritten
(bytes)
The number of bytes written to HDFS. This metric aggregates
MapReduce jobs only, and does not apply for other workloads on EMR.
MissingBlocks
(count)
The number of blocks for which HDFS has no replicas. These might be
corrupt blocks.
CorruptBlocks
(count)
The number of blocks that HDFS reports as corrupted.
TotalLoad
(count)
The total number of concurrent data transfers.
MemoryTotalMB
(bytes)
The total amount of memory in the cluster.
MemoryReservedMB
(bytes)
The amount of memory reserved.
MemoryAvailableMB
(bytes)
The amount of memory available to be allocated.
YARNMemoryAvailablePercentage
(percent)
The percentage of remaining memory available to YARN
(YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB).
MemoryAllocatedMB
(bytes)
The amount of memory allocated to the cluster.
PendingDeletionBlocks
(count)
The number of blocks marked for deletion.
UnderReplicatedBlocks
(count)
The number of blocks that need to be replicated one or more times.
DfsPendingReplicationBlocks
(count)
The status of block replication: blocks being replicated, age of
replication requests, and unsuccessful replication requests.
CapacityRemainingGB
(bytes)
The amount of remaining HDFS disk capacity.
HbaseBackupFailed
(count)
Whether the last backup failed. This is set to 0 by default and
updated to 1 if the previous backup attempt failed. This metric is
only reported for HBase clusters.
MostRecentBackupDuration
(minutes)
The amount of time it took the previous backup to complete. This
metric is set regardless of whether the last completed backup
succeeded or failed. While the backup is ongoing, this metric
returns the number of minutes after the backup started. This metric
is only reported for HBase clusters.
TimeSinceLastSuccessfulBackup
(minutes)
The number of elapsed minutes after the last successful HBase
backup started on your cluster. This metric is only reported for
HBase clusters.
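Two of the Hadoop 2 metrics above are ratios of other metrics in the same table. The sketch below works them through with hypothetical numbers. The function names are made up for this example. Scaling the memory ratio by 100 to express it as a percentage, and returning None when the denominator is zero, are assumptions of this sketch rather than documented behavior.

```python
def container_pending_ratio(container_pending, container_allocated):
    """Ratio of pending containers to allocated containers."""
    if container_allocated == 0:
        return None  # assumption: treat the ratio as undefined here
    return container_pending / container_allocated


def yarn_memory_available_percentage(memory_available_mb, memory_total_mb):
    """MemoryAvailableMB / MemoryTotalMB, scaled to a percentage."""
    if memory_total_mb == 0:
        return None  # assumption: no memory registered with YARN
    return 100.0 * memory_available_mb / memory_total_mb


# Hypothetical cluster: 6 containers pending, 24 allocated,
# 4096 MB of 16384 MB still available to YARN.
print(container_pending_ratio(6, 24))                   # 0.25
print(yarn_memory_available_percentage(4096, 16384))    # 25.0
```

A rising ContainerPendingRatio together with a low YARNMemoryAvailablePercentage suggests the cluster is short of capacity for the submitted applications.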

Available Tags

Name        Description
awsaccount  AWS account associated with the metrics
clusterid   ID of the cluster
jobflowid   ID of the job flow
jobid       ID of the job
region      Name of the region
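Tags are attached to each metric as key:value strings, so the tags listed above can be used to narrow metrics to one cluster or region. The following is a minimal sketch of that filtering done client-side. The helper names and all tag values (account number, cluster ID, region) are hypothetical sample data.

```python
def parse_tags(tags):
    """Turn ["key:value", ...] into a dict, splitting on the first colon."""
    parsed = {}
    for tag in tags:
        key, _, value = tag.partition(":")
        parsed[key] = value
    return parsed


def matches(tags, **wanted):
    """True if every wanted key is present with the wanted value."""
    parsed = parse_tags(tags)
    return all(parsed.get(key) == value for key, value in wanted.items())


# Hypothetical tag set for one EMR metric series.
sample_tags = [
    "awsaccount:123456789012",
    "clusterid:j-EXAMPLE",
    "region:us-east-1",
]

print(matches(sample_tags, clusterid="j-EXAMPLE", region="us-east-1"))  # True
print(matches(sample_tags, region="eu-west-1"))                         # False
```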