Notes on an MR error: Container is running beyond physical memory limits. Current usage...

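Background:
While using Sqoop to import a MySQL table from a business system into a Hive table, I hit the error below. I had run into it before without writing it down, so this time I am recording it in detail. The cluster runs CDH 5.13.0.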


1. Detailed error log

Note: it is worth reading the log slowly from top to bottom.

Warning: /home/cdh/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/05/15 09:36:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
18/05/15 09:36:43 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/15 09:36:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/05/15 09:36:43 INFO tool.CodeGenTool: Beginning code generation
18/05/15 09:36:43 INFO tool.CodeGenTool: Will generate java class as codegen_iga_leads_basic_info_h
Tue May 15 09:36:44 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
18/05/15 09:36:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `iga_leads_basic_info_h` AS t LIMIT 1
18/05/15 09:36:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `iga_leads_basic_info_h` AS t LIMIT 1
18/05/15 09:36:44 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/cdh/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/6e080911392458107d06fefd05ba56ac/codegen_iga_leads_basic_info_h.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/15 09:36:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6e080911392458107d06fefd05ba56ac/codegen_iga_leads_basic_info_h.jar
18/05/15 09:36:46 INFO tool.ImportTool: Destination directory iga_leads_basic_info_h is not present, hence not deleting.
18/05/15 09:36:46 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/05/15 09:36:46 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/05/15 09:36:46 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/05/15 09:36:46 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/05/15 09:36:46 INFO mapreduce.ImportJobBase: Beginning import of iga_leads_basic_info_h
18/05/15 09:36:46 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
Tue May 15 09:36:47 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
18/05/15 09:36:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `iga_leads_basic_info_h` AS t LIMIT 1
18/05/15 09:36:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `iga_leads_basic_info_h` AS t LIMIT 1
18/05/15 09:36:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://cdh1:9083
18/05/15 09:36:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/05/15 09:36:47 INFO hive.metastore: Connected to metastore.
18/05/15 09:36:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
18/05/15 09:36:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://cdh1:9083
18/05/15 09:36:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/05/15 09:36:47 INFO hive.metastore: Connected to metastore.
18/05/15 09:36:48 INFO hive.metastore: Closed a connection to metastore, current connections: 0
18/05/15 09:36:48 INFO hive.metastore: Trying to connect to metastore with URI thrift://cdh1:9083
18/05/15 09:36:48 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/05/15 09:36:48 INFO hive.metastore: Connected to metastore.
18/05/15 09:36:48 INFO hive.metastore: Closed a connection to metastore, current connections: 0
18/05/15 09:36:48 INFO hive.metastore: Trying to connect to metastore with URI thrift://cdh1:9083
18/05/15 09:36:48 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/05/15 09:36:48 INFO hive.metastore: Connected to metastore.
18/05/15 09:36:48 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: iga_leads_basic_info_h
18/05/15 09:36:48 INFO hive.metastore: Closed a connection to metastore, current connections: 0
18/05/15 09:36:48 INFO hive.metastore: Trying to connect to metastore with URI thrift://cdh1:9083
18/05/15 09:36:48 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/05/15 09:36:48 INFO hive.metastore: Connected to metastore.
18/05/15 09:36:49 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/15 09:36:49 INFO client.RMProxy: Connecting to ResourceManager at cdh1/10.3.1.8:8032
Tue May 15 09:36:59 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
18/05/15 09:36:59 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/15 09:36:59 INFO mapreduce.JobSubmitter: number of splits:1
18/05/15 09:36:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1525923400969_0023
18/05/15 09:36:59 INFO impl.YarnClientImpl: Submitted application application_1525923400969_0023
18/05/15 09:36:59 INFO mapreduce.Job: The url to track the job: http://cdh1:8088/proxy/application_1525923400969_0023/
18/05/15 09:36:59 INFO mapreduce.Job: Running job: job_1525923400969_0023
18/05/15 09:37:11 INFO mapreduce.Job: Job job_1525923400969_0023 running in uber mode : false
18/05/15 09:37:11 INFO mapreduce.Job:  map 0% reduce 0%
18/05/15 09:38:15 INFO mapreduce.Job: Task Id : attempt_1525923400969_0023_m_000000_0, Status : FAILED
Container [pid=27842,containerID=container_1525923400969_0023_01_000002] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.8 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1525923400969_0023_01_000002 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 28027 27842 27842 27842 (java) 10464 706 2861776896 266585 /usr/java/jdk1.8.0_111/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/home/yarn/nm/usercache/root/appcache/application_1525923400969_0023/container_1525923400969_0023_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.3.1.9 59062 attempt_1525923400969_0023_m_000000_0 2 
    |- 27842 27839 27842 27842 (bash) 3 3 115859456 361 /bin/bash -c /usr/java/jdk1.8.0_111/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/home/yarn/nm/usercache/root/appcache/application_1525923400969_0023/container_1525923400969_0023_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.3.1.9 59062 attempt_1525923400969_0023_m_000000_0 2 1>/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000002/stdout 2>/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000002/stderr  

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

18/05/15 09:39:21 INFO mapreduce.Job: Task Id : attempt_1525923400969_0023_m_000000_1, Status : FAILED
Container [pid=32561,containerID=container_1525923400969_0023_01_000003] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.8 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1525923400969_0023_01_000003 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 32561 32559 32561 32561 (bash) 3 2 115859456 361 /bin/bash -c /usr/java/jdk1.8.0_111/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/home/yarn/nm/usercache/root/appcache/application_1525923400969_0023/container_1525923400969_0023_01_000003/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000003 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.3.1.9 59062 attempt_1525923400969_0023_m_000000_1 3 1>/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000003/stdout 2>/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000003/stderr  
    |- 32746 32561 32561 32561 (java) 10706 764 2856566784 270760 /usr/java/jdk1.8.0_111/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/home/yarn/nm/usercache/root/appcache/application_1525923400969_0023/container_1525923400969_0023_01_000003/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000003 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.3.1.9 59062 attempt_1525923400969_0023_m_000000_1 3 

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

18/05/15 09:40:30 INFO mapreduce.Job: Task Id : attempt_1525923400969_0023_m_000000_2, Status : FAILED
Container [pid=89668,containerID=container_1525923400969_0023_01_000004] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.8 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1525923400969_0023_01_000004 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 89966 89668 89668 89668 (java) 10823 859 2856423424 267248 /usr/java/jdk1.8.0_111/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/home/yarn/nm/usercache/root/appcache/application_1525923400969_0023/container_1525923400969_0023_01_000004/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000004 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.3.1.9 59062 attempt_1525923400969_0023_m_000000_2 4 
    |- 89668 89666 89668 89668 (bash) 3 3 115859456 361 /bin/bash -c /usr/java/jdk1.8.0_111/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/home/yarn/nm/usercache/root/appcache/application_1525923400969_0023/container_1525923400969_0023_01_000004/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000004 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.3.1.9 59062 attempt_1525923400969_0023_m_000000_2 4 1>/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000004/stdout 2>/yarn/container-logs/application_1525923400969_0023/container_1525923400969_0023_01_000004/stderr  

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

The key line:
Container [pid=27842,containerID=container_1525923400969_0023_01_000002] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.8 GB of 2.1 GB virtual memory used. Killing container.
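
The figures in that message can be pulled out mechanically, which helps when many failed attempts have to be scanned. A minimal Python sketch, assuming the message keeps the wording shown above; the function name parse_container_limit_message and the dictionary keys are made up for illustration:

```python
import re

# Conversion of the units that may appear in the message to GB.
_UNIT_TO_GB = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}
_SIZE = r"([\d.]+) (MB|GB|TB)"

# The pattern follows the wording of the diagnostic line quoted above.
_PATTERN = re.compile(
    r"containerID=(\S+)\] is running beyond (physical|virtual) memory limits\. "
    rf"Current usage: {_SIZE} of {_SIZE} physical memory used; "
    rf"{_SIZE} of {_SIZE} virtual memory used"
)


def parse_container_limit_message(line):
    """Return the container id, the limit named in the message, and the four sizes in GB."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    # Groups 3..10 are four (number, unit) pairs in message order.
    sizes = [float(m.group(i)) * _UNIT_TO_GB[m.group(i + 1)] for i in (3, 5, 7, 9)]
    return {
        "container": m.group(1),
        "limit_hit": m.group(2),
        "pmem_used_gb": sizes[0],
        "pmem_limit_gb": sizes[1],
        "vmem_used_gb": sizes[2],
        "vmem_limit_gb": sizes[3],
    }


LINE = (
    "Container [pid=27842,containerID=container_1525923400969_0023_01_000002] "
    "is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB "
    "physical memory used; 2.8 GB of 2.1 GB virtual memory used. Killing container."
)
info = parse_container_limit_message(LINE)
print(info)
```

The limit_hit field records which limit the message names, which matters for the analysis that follows.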


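2. Error analysis

1.0 GB: physical memory used by the task
1 GB: the default value of mapreduce.map.memory.mb (1024 MB)
2.8 GB: virtual memory used by the program
2.1 GB: mapreduce.map.memory.mb multiplied by yarn.nodemanager.vmem-pmem-ratio

yarn.nodemanager.vmem-pmem-ratio is the ratio of virtual memory to physical memory allowed for a container. It is set in yarn-site.xml and defaults to 2.1.

The container was using 2.8 GB of virtual memory against an allowance of 2.1 GB, and its physical usage had also reached the 1 GB limit. The message itself names the physical limit ("running beyond physical memory limits"), so that is the check that killed this container; the virtual memory overrun would be just as fatal on a cluster where the virtual memory check is enforced.

The error above came from a map task. The same can happen in a reduce task, in which case the virtual limit is mapreduce.reduce.memory.mb * yarn.nodemanager.vmem-pmem-ratio.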
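
The arithmetic can be checked against the process-tree dump of the first failed attempt. A minimal Python sketch using the VMEM_USAGE(BYTES) and RSSMEM_USAGE(PAGES) columns from that dump; the 4096-byte page size is an assumption (typical for x86-64 Linux) and does not appear in the log:

```python
GIB = 1024 ** 3
PAGE_SIZE = 4096  # assumption: typical x86-64 Linux page size, not taken from the log

map_memory_mb = 1024     # mapreduce.map.memory.mb (default)
vmem_pmem_ratio = 2.1    # yarn.nodemanager.vmem-pmem-ratio (default)

# Limits YARN derives for the container.
pmem_limit = map_memory_mb * 1024 * 1024
vmem_limit = pmem_limit * vmem_pmem_ratio

# (VMEM_USAGE in bytes, RSSMEM_USAGE in pages) for the java and bash
# processes of container_1525923400969_0023_01_000002, copied from the dump.
process_tree = [(2861776896, 266585), (115859456, 361)]

# YARN accounts for the whole process tree, so both processes are summed.
vmem_used = sum(vmem for vmem, _ in process_tree)
pmem_used = sum(pages for _, pages in process_tree) * PAGE_SIZE

print(f"physical: {pmem_used / GIB:.2f} GiB used, limit {pmem_limit / GIB:.2f} GiB")
print(f"virtual:  {vmem_used / GIB:.2f} GiB used, limit {vmem_limit / GIB:.2f} GiB")
```

Under that page-size assumption both totals come out above their limits, which is consistent with the "1.0 GB of 1 GB" and "2.8 GB of 2.1 GB" figures in the message.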

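Note:
Physical memory: the actual hardware (RAM modules).
Virtual memory: the address space the operating system presents to each process. It can be larger than physical RAM because not every page has to be resident: pages may be backed by swap space on disk, by mapped files, or be reserved and never touched. The virtual figure YARN reports is the size of the container's address space (the VMEM_USAGE column in the dump above), not the amount of swap in use.
When physical memory runs short, Linux writes memory pages that are not currently needed out to swap space. This frees physical memory for other uses; when the original content is needed again, it is read back from swap into physical memory.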
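
The difference between the two figures can be observed on any Linux process: the kernel reports address-space size (VmSize) and resident memory (VmRSS) separately in /proc/<pid>/status. A small Python sketch that parses that format; the helper name parse_vm_fields is hypothetical, and the read from /proc only happens where that file exists:

```python
import os


def parse_vm_fields(status_text):
    """Return the VmSize and VmRSS values (in kB) found in /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "   123456 kB"; keep the number only.
            fields[key] = int(rest.split()[0])
    return fields


# Inspect the current process when running on Linux.
if os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        print(parse_vm_fields(f.read()))
```

For a JVM, VmSize is usually far larger than VmRSS, which is why the virtual figure in the error message is well above the physical one.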


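3. Solutions

1. Disable the virtual memory check (not recommended):
Set yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml. This is a NodeManager setting, so it belongs in the NodeManagers' yarn-site.xml rather than in the job configuration.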

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers.</description>
</property>

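Besides virtual memory, physical memory can also be exceeded. Its check can likewise be turned off by setting yarn.nodemanager.pmem-check-enabled to false.
I do not think this approach is a good one: if the program has a problem such as a memory leak, disabling the check could bring down the cluster.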

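2. Increase mapreduce.map.memory.mb or mapreduce.reduce.memory.mb (recommended)
I think this is the option to consider first. It raises the virtual memory allowance, which scales with the container size, and it also covers what is probably the more common case, where physical memory is simply insufficient.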

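3. Increase yarn.nodemanager.vmem-pmem-ratio moderately, so that more virtual memory is allowed per unit of physical memory. Do not set it unreasonably high.

4. If the task's memory usage is far out of proportion, first ask whether the program has a memory leak or whether the data is skewed, and fix such problems in the program before tuning anything else.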
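
For option 2, the container size and the JVM heap are usually raised together: the -Xmx820m in the process dump is about 0.8 of the 1024 MB container, and the gap leaves room for non-heap JVM memory. A Python sketch that derives a heap size from a chosen container size and builds the matching -D options for a Sqoop import. The 0.8 ratio mirrors what the log shows; the 2048 MB figure, the helper names, and the connection arguments are placeholders, not values from the original job:

```python
import math
import shlex


def map_memory_options(container_mb, heap_ratio=0.8):
    """Build -D options that set the map container size and a matching JVM heap."""
    heap_mb = math.ceil(container_mb * heap_ratio)
    return [
        "-D", f"mapreduce.map.memory.mb={container_mb}",
        "-D", f"mapreduce.map.java.opts=-Xmx{heap_mb}m",
    ]


def build_sqoop_import(generic_opts, tool_args):
    """Hadoop generic options (-D ...) go directly after the tool name."""
    return ["sqoop", "import", *generic_opts, *tool_args]


cmd = build_sqoop_import(
    map_memory_options(2048),  # placeholder size: double the default container
    [
        "--connect", "jdbc:mysql://<host>/<db>",  # placeholder connection string
        "--table", "iga_leads_basic_info_h",
        "-m", "1",
    ],
)
print(shlex.join(cmd))
```

Setting mapreduce.map.java.opts replaces whatever value the cluster already uses for it, so any other JVM flags in the existing value would need to be carried over. For a job that fails in the reduce phase, the analogous properties are mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts.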
