Hadoop: using DNS (FQDN, fully qualified domain names) instead of hosts files

Conclusion first:

1. Configure DNS (details omitted here); both forward and reverse resolution must be set up.

2. Set the following two parameters on the cluster:

hadoop.security.dns.interface

hadoop.security.dns.nameserver
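In core-site.xml the two parameters look roughly like this. The interface name and nameserver address below are the example values that appear later in this post; substitute your own:

```xml
<!-- core-site.xml: have Hadoop derive its own hostname via DNS lookup on a
     given network interface, instead of relying on the local hosts file.
     "bond1" and "192.167.1.246" are example values from this post. -->
<property>
  <name>hadoop.security.dns.interface</name>
  <value>bond1</value>
</property>
<property>
  <name>hadoop.security.dns.nameserver</name>
  <value>192.167.1.246</value>
</property>
```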

The process:

  1. The cluster originally mapped hostnames through hosts files. Every time a machine was added, or a new workload appeared (for example, a new spark-streaming instance reading from Kafka), the corresponding entries had to be added to the hosts file on every NodeManager node in the cluster. I had considered syncing the hosts file across machines, but that always felt fragile.

  2. Because maintaining the hosts file on every machine was tedious, we switched to DNS. Then on one machine the hosts file was misconfigured: its entry did not match the hostname, off by a single character, and Kerberos kept failing. I checked the configuration many times without finding the cause (the hosts file actually contained two records for the host, and the first, incorrect one took precedence).

  3. After the problem was fixed, one question remained: why must the correct mapping also be in the hosts file, when DNS already resolves the name correctly? The error log below was a good starting point for finding out.
    2019-03-13 23:38:41,147 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/[email protected] (auth:KERBEROS) cause:java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: yarn/[email protected], expecting: yarn/[email protected]
    2019-03-13 23:38:41,152 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm127
    2019-03-13 23:38:41,155 WARN org.apache.hadoop.ipc.Client: Failed to connect to server: node0/192.167.1.246:8031: retries get failed due to exceeded maximum allowed retries number: 0
    java.net.ConnectException: Connection refused
            at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
            at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
            at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
            at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
            at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
            at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
            at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
            at org.apache.hadoop.ipc.Client.call(Client.java:1480)
            at org.apache.hadoop.ipc.Client.call(Client.java:1441)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
            at com.sun.proxy.$Proxy39.registerNodeManager(Unknown Source)
            at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:260)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
            at com.sun.proxy.$Proxy40.registerNodeManager(Unknown Source)
            at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:275)
            at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:209)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
            at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:329)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:563)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
    2019-03-13 23:38:41,167 INFO org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking registerNodeManager of class ResourceTrackerPBClientImpl over rm127 after 1 fail over attempts. Trying to fail over after sleeping for 255ms.

    This first stack trace is not meaningful by itself: it is thrown from a callback during NodeManager startup, and from the static code alone it is hard to tell exactly where it originates. But the hint is already clear enough: the principal is wrong. So we start from the loginUserFromKeytab call sites and trace downward to find where the wrong value comes from. There are many login call sites, and in the end I never pinned down the exact one; since that is not the focus here, I set it aside to study later. The key code is this:

    String principalName = SecurityUtil.getServerPrincipal(principalConfig,
            hostname);
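What getServerPrincipal does with the configured value is expand the `_HOST` placeholder in a principal such as `yarn/[email protected]` with the hostname it was handed. The sketch below illustrates that substitution; it is a simplified stand-in, not the real SecurityUtil code, and the class and method names are made up for illustration:

```java
// Simplified sketch of how Hadoop's SecurityUtil.getServerPrincipal resolves
// the _HOST placeholder in a configured principal. Illustration only.
public class PrincipalDemo {
    static final String HOSTNAME_PATTERN = "_HOST";

    // Replace the _HOST component with the (lower-cased) resolved FQDN.
    static String replaceHostPattern(String principalConfig, String fqdn) {
        // "yarn/[email protected]" splits into [service, host, realm]
        String[] components = principalConfig.split("[/@]");
        if (components.length == 3 && components[1].equals(HOSTNAME_PATTERN)) {
            return components[0] + "/" + fqdn.toLowerCase() + "@" + components[2];
        }
        return principalConfig; // no placeholder: use the value as configured
    }

    public static void main(String[] args) {
        System.out.println(replaceHostPattern("yarn/[email protected]",
                "node1.hadoop.example.com"));
        // -> yarn/[email protected]
    }
}
```

This is exactly why the resolved hostname matters: if the local lookup yields a bare name like `node1` instead of the FQDN, the expanded principal no longer matches what the KDC issued, producing the "Server has invalid Kerberos principal ... expecting ..." error above.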

    The DataNode's version differs slightly from this; this snippet was found under the yarn directory. Let's look at how the hostname it passes in is produced:

  public static String getLocalHostName(@Nullable Configuration conf)
      throws UnknownHostException {
    if (conf != null) {
      String dnsInterface = conf.get(HADOOP_SECURITY_DNS_INTERFACE_KEY);
      String nameServer = conf.get(HADOOP_SECURITY_DNS_NAMESERVER_KEY);

      if (dnsInterface != null) {
        return DNS.getDefaultHost(dnsInterface, nameServer, true);
      } else if (nameServer != null) {
        throw new IllegalArgumentException(HADOOP_SECURITY_DNS_NAMESERVER_KEY +
            " requires " + HADOOP_SECURITY_DNS_INTERFACE_KEY + ". Check your" +
            "configuration.");
      }
    }

    // Fallback to querying the default hostname as we did before.
    return InetAddress.getLocalHost().getCanonicalHostName();
  }

So a DNS server can be specified through configuration, using exactly the two parameters mentioned at the top. We tried that, and... it still didn't work.

Since we have found the method that builds the principal, let's fetch that value manually:

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.apache.hadoop.net.DNS;

public class Address {
    public static void main(String[] args) throws IOException {
        try {
            // The local host, as the JDK resolves it
            InetAddress inetAddress = InetAddress.getLocalHost();
            String hostName = inetAddress.getHostName();                   // local host name
            String canonicalHostName = inetAddress.getCanonicalHostName(); // FQDN for this address

            // All addresses bound to interface bond1
            String[] addresses = DNS.getIPs("bond1");
            for (String addr : addresses) {
                System.out.println("address on bond1: " + addr);
            }
            System.out.println("resolved: "
                    + InetAddress.getByName(addresses[0]).getHostAddress());

            // This is the call Hadoop uses to derive its own hostname
            String s = DNS.getDefaultHost("bond1", "192.167.1.246");
            System.out.println("DNS.getDefaultHost: " + s);

            // Raw address bytes (sign-correct the last octet, as in the original demo)
            byte[] address = inetAddress.getAddress();
            int a = address[3] < 0 ? address[3] + 256 : address[3];
            System.out.println("raw IP address: "
                    + address[0] + "." + address[1] + "." + address[2] + "." + a);

            System.out.println(inetAddress.toString());
            System.out.println("host name: " + hostName);
            System.out.println("canonical host name (FQDN): " + canonicalHostName);
            System.out.println("host address: " + inetAddress.getHostAddress());
            System.out.println("reachable: " + inetAddress.isReachable(2000));
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }
}

DNS.getDefaultHost did not return the value we expected.

Reading further, there is a key piece of code inside it:

public static String[] getHosts(String strInterface,
                                  @Nullable String nameserver,
                                  boolean tryfallbackResolution)
      throws UnknownHostException {
    final List hosts = new Vector();
    final List addresses =
        getIPsAsInetAddressList(strInterface, true);
    for (InetAddress address : addresses) {
      try {
        hosts.add(reverseDns(address, nameserver));
      } catch (NamingException ignored) {
      }
    }
    if (hosts.isEmpty() && tryfallbackResolution) {
      for (InetAddress address : addresses) {
        final String canonicalHostName = address.getCanonicalHostName();
        // Don't use the result if it looks like an IP address.
        if (!InetAddresses.isInetAddress(canonicalHostName)) {
          hosts.add(canonicalHostName);
        }
      }
    }

So reverse resolution is required. We had not configured that; once the reverse zone was added to DNS, everything started working.
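The code above explains the behavior: reverseDns issues a PTR query, so without a reverse zone it throws NamingException, every address is skipped, and the fallback goes back to getCanonicalHostName, i.e. the hosts file. A minimal sketch of the same PTR mechanism, using the JDK's JNDI DNS provider as Hadoop's DNS.reverseDns does (the nameserver address is a placeholder; this is not Hadoop's exact code):

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class ReverseDnsCheck {
    // Build the PTR query name for an IPv4 address: octets reversed + in-addr.arpa
    static String ptrName(String ipv4) {
        String[] p = ipv4.split("\\.");
        return p[3] + "." + p[2] + "." + p[1] + "." + p[0] + ".in-addr.arpa";
    }

    // Look up the PTR record via JNDI, the same mechanism DNS.reverseDns uses.
    // If the DNS server has no reverse zone, this throws NamingException.
    static String reverseDns(String ipv4, String nameserver) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        if (nameserver != null) {
            env.put("java.naming.provider.url", "dns://" + nameserver);
        }
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(ptrName(ipv4), new String[] {"PTR"});
            return attrs.get("PTR").get().toString();
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(ptrName("192.167.1.246")); // 246.1.167.192.in-addr.arpa
        // reverseDns("192.167.1.246", "192.167.1.246") would return the FQDN,
        // but only if a reverse zone is actually configured on that server.
    }
}
```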

For debugging, IntelliJ remote debugging would have been an option, but the production environment is network-isolated and setting up a test environment is a hassle, so I hand-wrote code to simulate the process instead.
