记录一次线上内存问题的排查过程

所用工具MAT、IDEA

一、发现问题

线上有一个微服务内存已经将近90%,回收不过来,导致频繁gc,cpu也跟着从20%升至40%。

先临时升级机器内存,情况得到缓解,内存回到50%,cpu也降了下来,但是内存还在缓慢增长。

二、定位问题

第一步:怀疑有内存泄露。连续2天,dump了2份该微服务内存。

发现有一个ConcurrentHashMap持有的内存特别大,占整个堆内存的一半。查看这个类下具体的对象内容,发现它的key是dubbo.monitor下的Statistics类。

记录一次线上内存问题的排查过程_第1张图片
记录一次线上内存问题的排查过程_第2张图片
查看dubbo源码,发现dubbo.monitor会对每次接口调用进行统计,记录哪个client调用哪个server的哪个method多少次。

public class DubboMonitor implements Monitor {

    private static final Logger logger = LoggerFactory.getLogger(DubboMonitor.class);

    /**
     * The length of the array which is a container of the statistics
     */
    private static final int LENGTH = 10;

    /**
     * The timer for sending statistics
     */
    private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(3, new NamedThreadFactory("DubboMonitorSendTimer", true));

    /**
     * The future that can cancel the scheduledExecutorService
     */
    private final ScheduledFuture<?> sendFuture;

    private final Invoker<MonitorService> monitorInvoker;

    private final MonitorService monitorService;

    /**
     * The time interval for timer scheduledExecutorService to send data
     */
    private final long monitorInterval;

    private final ConcurrentMap<Statistics, AtomicReference<long[]>> statisticsMap = new ConcurrentHashMap<Statistics, AtomicReference<long[]>>();

    public DubboMonitor(Invoker<MonitorService> monitorInvoker, MonitorService monitorService) {
        this.monitorInvoker = monitorInvoker;
        this.monitorService = monitorService;
        this.monitorInterval = monitorInvoker.getUrl().getPositiveParameter("interval", 60000);
        // collect timer for collecting statistics data
        sendFuture = scheduledExecutorService.scheduleWithFixedDelay(() -> {
            try {
                // collect data
                send();
            } catch (Throwable t) {
                logger.error("Unexpected error occur at send statistic, cause: " + t.getMessage(), t);
            }
        }, monitorInterval, monitorInterval, TimeUnit.MILLISECONDS);
    }

    public void send() {
        logger.debug("Send statistics to monitor " + getUrl());
        String timestamp = String.valueOf(System.currentTimeMillis());
        for (Map.Entry<Statistics, AtomicReference<long[]>> entry : statisticsMap.entrySet()) {
            // get statistics data
            Statistics statistics = entry.getKey();
            AtomicReference<long[]> reference = entry.getValue();
            long[] numbers = reference.get();
            long success = numbers[0];
            long failure = numbers[1];
            long input = numbers[2];
            long output = numbers[3];
            long elapsed = numbers[4];
            long concurrent = numbers[5];
            long maxInput = numbers[6];
            long maxOutput = numbers[7];
            long maxElapsed = numbers[8];
            long maxConcurrent = numbers[9];
            String protocol = getUrl().getParameter(DEFAULT_PROTOCOL);

            // send statistics data
            URL url = statistics.getUrl()
                    .addParameters(MonitorService.TIMESTAMP, timestamp,
                            MonitorService.SUCCESS, String.valueOf(success),
                            MonitorService.FAILURE, String.valueOf(failure),
                            MonitorService.INPUT, String.valueOf(input),
                            MonitorService.OUTPUT, String.valueOf(output),
                            MonitorService.ELAPSED, String.valueOf(elapsed),
                            MonitorService.CONCURRENT, String.valueOf(concurrent),
                            MonitorService.MAX_INPUT, String.valueOf(maxInput),
                            MonitorService.MAX_OUTPUT, String.valueOf(maxOutput),
                            MonitorService.MAX_ELAPSED, String.valueOf(maxElapsed),
                            MonitorService.MAX_CONCURRENT, String.valueOf(maxConcurrent),
                            DEFAULT_PROTOCOL, protocol
                    );
            monitorService.collect(url);

            // reset
            long[] current;
            long[] update = new long[LENGTH];
            do {
                current = reference.get();
                if (current == null) {
                    update[0] = 0;
                    update[1] = 0;
                    update[2] = 0;
                    update[3] = 0;
                    update[4] = 0;
                    update[5] = 0;
                } else {
                    update[0] = current[0] - success;
                    update[1] = current[1] - failure;
                    update[2] = current[2] - input;
                    update[3] = current[3] - output;
                    update[4] = current[4] - elapsed;
                    update[5] = current[5] - concurrent;
                }
            } while (!reference.compareAndSet(current, update));
        }
    }

    @Override
    public void collect(URL url) {
        // data to collect from url
        int success = url.getParameter(MonitorService.SUCCESS, 0);
        int failure = url.getParameter(MonitorService.FAILURE, 0);
        int input = url.getParameter(MonitorService.INPUT, 0);
        int output = url.getParameter(MonitorService.OUTPUT, 0);
        int elapsed = url.getParameter(MonitorService.ELAPSED, 0);
        int concurrent = url.getParameter(MonitorService.CONCURRENT, 0);
        // init atomic reference
        Statistics statistics = new Statistics(url);
        AtomicReference<long[]> reference = statisticsMap.get(statistics);
        if (reference == null) {
            statisticsMap.putIfAbsent(statistics, new AtomicReference<long[]>());
            reference = statisticsMap.get(statistics);
        }
        // use CompareAndSet to sum
        long[] current;
        long[] update = new long[LENGTH];
        do {
            current = reference.get();
            if (current == null) {
                update[0] = success;
                update[1] = failure;
                update[2] = input;
                update[3] = output;
                update[4] = elapsed;
                update[5] = concurrent;
                update[6] = input;
                update[7] = output;
                update[8] = elapsed;
                update[9] = concurrent;
            } else {
                update[0] = current[0] + success;
                update[1] = current[1] + failure;
                update[2] = current[2] + input;
                update[3] = current[3] + output;
                update[4] = current[4] + elapsed;
                update[5] = (current[5] + concurrent) / 2;
                update[6] = current[6] > input ? current[6] : input;
                update[7] = current[7] > output ? current[7] : output;
                update[8] = current[8] > elapsed ? current[8] : elapsed;
                update[9] = current[9] > concurrent ? current[9] : concurrent;
            }
        } while (!reference.compareAndSet(current, update));
    }

    @Override
    public List<URL> lookup(URL query) {
        return monitorService.lookup(query);
    }

    @Override
    public URL getUrl() {
        return monitorInvoker.getUrl();
    }

    @Override
    public boolean isAvailable() {
        return monitorInvoker.isAvailable();
    }

    @Override
    public void destroy() {
        try {
            ExecutorUtil.cancelScheduledFuture(sendFuture);
        } catch (Throwable t) {
            logger.error("Unexpected error occur at cancel sender timer, cause: " + t.getMessage(), t);
        }
        monitorInvoker.destroy();
    }

}

看着好像没什么问题,不存在内存泄露,这应该是dubbo的常规操作。但是为了统计这些东西占用了这么多堆空间,dubbo设计的是不是不合理啊…… 别的微服务堆内存也是这样么?为什么它们的内存没有出现缓慢增长?

第二步:dump了另外2个正常微服务内存对比看看。

一看!震惊了!!另外2个微服务里没有发现一个持有特别大的内存的ConcurrentHashMap!!!

按照占用内存排序,只有一个DubboMonitor和dubbo相关,占得内存稍微多点,但和那个ConcurrentHashMap也不是一个量级的。

打开DubboMonitor查看里面的对象,发现和ConcurrentHashMap存储的内容是类似的,也是记录了dubbo的监控信息。

记录一次线上内存问题的排查过程_第3张图片

那就出现了2个问题:

  1. 为什么异常微服务和正常微服务dubbo监控的内容结构不一样?
  2. 同样是记录dubbo的监控信息,这些微服务都是2C的,存储的内存结构也是类似的,那为什么占用内存差这么多?

第1个问题,在对照代码后发现,异常微服务用的是Apache的dubbo2.7.3(阿里版、Apache版),正常微服务是用的Alibaba的dubbo2.8.4(当当版)

dubbo的版本比较混乱,这里简单介绍一下
dubbo最开始是Alibaba做的,后来贡献给了Apache。目前最新的版本是2.7.6
另外,还有一个当当版的分支,项目名称叫做dubbox。但是因为是基于Alibaba dubbo开发的,所以包的名称还都是alibaba.dubbo。目前最新的版本是2.8.4

第2个问题,再详细分析它们的存储的内容后,发现一个问题:异常微服务的dubbo监控信息里的client的ip都是公网ip(推断是用户ip),正常微服务的dubbo监控信息里的client的ip都是私网ip。(与运维核对后发现是网关ip)用户ip和网关ip当然不是一个数量级的,所有异常微服务的dubbo监控使用了大量的内存……

第三步:看源码。为什么一个记录的用户ip,一个记录的网关ip?

client是从哪里取的值呢?应该是remote_addr或者header.x-forwarded-for

如果从remote_addr里取值的话应该取的是网关的ip,如果从header.x-forwarded-for里取值应该取的是用户的真实ip

看dubbo源码,发现两个版本的dubbo的MonitorFilter处理流程确实是不一样的:

Apache版的MonitorFilter

public class MonitorFilter extends ListenableFilter {

...

	@Override
    public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
        if (invoker.getUrl().hasParameter(MONITOR_KEY)) {
            invocation.setAttachment(MONITOR_FILTER_START_TIME, String.valueOf(System.currentTimeMillis()));
            getConcurrent(invoker, invocation).incrementAndGet(); // count up
        }
        return invoker.invoke(invocation); // proceed invocation chain
    }
	
...

	class MonitorListener implements Listener {

        @Override
        public void onResponse(Result result, Invoker<?> invoker, Invocation invocation) {
            if (invoker.getUrl().hasParameter(MONITOR_KEY)) {
                collect(invoker, invocation, result, RpcContext.getContext().getRemoteHost(), Long.valueOf(invocation.getAttachment(MONITOR_FILTER_START_TIME)), false);
                getConcurrent(invoker, invocation).decrementAndGet(); // count down
            }
        }

        @Override
        public void onError(Throwable t, Invoker<?> invoker, Invocation invocation) {
            if (invoker.getUrl().hasParameter(MONITOR_KEY)) {
                collect(invoker, invocation, null, RpcContext.getContext().getRemoteHost(), Long.valueOf(invocation.getAttachment(MONITOR_FILTER_START_TIME)), true);
                getConcurrent(invoker, invocation).decrementAndGet(); // count down
            }
        }

...
	
}

当当版的MonitorFilter

public class MonitorFilter implements Filter {

...

	public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
        if (invoker.getUrl().hasParameter(Constants.MONITOR_KEY)) {
            RpcContext context = RpcContext.getContext(); // 提供方必须在invoke()之前获取context信息
            long start = System.currentTimeMillis(); // 记录起始时间戮
            getConcurrent(invoker, invocation).incrementAndGet(); // 并发计数
            try {
                Result result = invoker.invoke(invocation); // 让调用链往下执行
                collect(invoker, invocation, result, context, start, false);
                return result;
            } catch (RpcException e) {
                collect(invoker, invocation, null, context, start, true);
                throw e;
            } finally {
                getConcurrent(invoker, invocation).decrementAndGet(); // 并发计数
            }
        } else {
            return invoker.invoke(invocation);
        }
    }

...

}

最主要的区别是:context的获取时机不一样,Apache版是在所有的invoke都调用完执行collect时才获取context;当当版是在invoke执行之前先保存了一份context。为什么要提前保存一份,难道在执行invoke的时候会改变context的值么?其实两个版本collect函数内部的处理和对context处理都有不同,但是后来发现不是这里的问题,所以这里不再展开

第四步:看代码无果,开启debug调试大法(真相在即)

本地debug,追踪ip来源。

发现线程栈帧不同,异常微服务多了一个栈帧RemoteIpValue
记录一次线上内存问题的排查过程_第4张图片

看源码发现是tomcat的类,用来从header.x-forwarded-for里获取用户ip的


public class RemoteIpValve extends ValveBase {

...

    public void invoke(Request request, Response response) throws IOException, ServletException {
        final String originalRemoteAddr = request.getRemoteAddr();
        final String originalRemoteHost = request.getRemoteHost();
        final String originalScheme = request.getScheme();
        final boolean originalSecure = request.isSecure();
        final int originalServerPort = request.getServerPort();
        final String originalProxiesHeader = request.getHeader(proxiesHeader);
        final String originalRemoteIpHeader = request.getHeader(remoteIpHeader);
        boolean isInternal = internalProxies != null &&
                internalProxies.matcher(originalRemoteAddr).matches();

        if (isInternal || (trustedProxies != null &&
                trustedProxies.matcher(originalRemoteAddr).matches())) {
            String remoteIp = null;
            // In java 6, proxiesHeaderValue should be declared as a java.util.Deque
            LinkedList<String> proxiesHeaderValue = new LinkedList<>();
            StringBuilder concatRemoteIpHeaderValue = new StringBuilder();

            for (Enumeration<String> e = request.getHeaders(remoteIpHeader); e.hasMoreElements();) {
                if (concatRemoteIpHeaderValue.length() > 0) {
                    concatRemoteIpHeaderValue.append(", ");
                }

                concatRemoteIpHeaderValue.append(e.nextElement());
            }

            String[] remoteIpHeaderValue = commaDelimitedListToStringArray(concatRemoteIpHeaderValue.toString());
            int idx;
            if (!isInternal) {
                proxiesHeaderValue.addFirst(originalRemoteAddr);
            }
            // loop on remoteIpHeaderValue to find the first trusted remote ip and to build the proxies chain
            for (idx = remoteIpHeaderValue.length - 1; idx >= 0; idx--) {
                String currentRemoteIp = remoteIpHeaderValue[idx];
                remoteIp = currentRemoteIp;
                if (internalProxies !=null && internalProxies.matcher(currentRemoteIp).matches()) {
                    // do nothing, internalProxies IPs are not appended to the
                } else if (trustedProxies != null &&
                        trustedProxies.matcher(currentRemoteIp).matches()) {
                    proxiesHeaderValue.addFirst(currentRemoteIp);
                } else {
                    idx--; // decrement idx because break statement doesn't do it
                    break;
                }
            }
            // continue to loop on remoteIpHeaderValue to build the new value of the remoteIpHeader
            LinkedList<String> newRemoteIpHeaderValue = new LinkedList<>();
            for (; idx >= 0; idx--) {
                String currentRemoteIp = remoteIpHeaderValue[idx];
                newRemoteIpHeaderValue.addFirst(currentRemoteIp);
            }
            if (remoteIp != null) {

                request.setRemoteAddr(remoteIp);
                request.setRemoteHost(remoteIp);

                if (proxiesHeaderValue.size() == 0) {
                    request.getCoyoteRequest().getMimeHeaders().removeHeader(proxiesHeader);
                } else {
                    String commaDelimitedListOfProxies = listToCommaDelimitedString(proxiesHeaderValue);
                    request.getCoyoteRequest().getMimeHeaders().setValue(proxiesHeader).setString(commaDelimitedListOfProxies);
                }
                if (newRemoteIpHeaderValue.size() == 0) {
                    request.getCoyoteRequest().getMimeHeaders().removeHeader(remoteIpHeader);
                } else {
                    String commaDelimitedRemoteIpHeaderValue = listToCommaDelimitedString(newRemoteIpHeaderValue);
                    request.getCoyoteRequest().getMimeHeaders().setValue(remoteIpHeader).setString(commaDelimitedRemoteIpHeaderValue);
                }
            }

            if (protocolHeader != null) {
                String protocolHeaderValue = request.getHeader(protocolHeader);
                if (protocolHeaderValue == null) {
                    // Don't modify the secure, scheme and serverPort attributes
                    // of the request
                } else if (isForwardedProtoHeaderValueSecure(protocolHeaderValue)) {
                    request.setSecure(true);
                    request.getCoyoteRequest().scheme().setString("https");
                    setPorts(request, httpsServerPort);
                } else {
                    request.setSecure(false);
                    request.getCoyoteRequest().scheme().setString("http");
                    setPorts(request, httpServerPort);
                }
            }

            if (log.isDebugEnabled()) {
                log.debug("Incoming request " + request.getRequestURI() + " with originalRemoteAddr '" + originalRemoteAddr
                          + "', originalRemoteHost='" + originalRemoteHost + "', originalSecure='" + originalSecure + "', originalScheme='"
                          + originalScheme + "' will be seen as newRemoteAddr='" + request.getRemoteAddr() + "', newRemoteHost='"
                          + request.getRemoteHost() + "', newScheme='" + request.getScheme() + "', newSecure='" + request.isSecure() + "'");
            }
        } else {
            if (log.isDebugEnabled()) {
                log.debug("Skip RemoteIpValve for request " + request.getRequestURI() + " with originalRemoteAddr '"
                        + request.getRemoteAddr() + "'");
            }
        }
        if (requestAttributesEnabled) {
            request.setAttribute(AccessLog.REMOTE_ADDR_ATTRIBUTE,
                    request.getRemoteAddr());
            request.setAttribute(Globals.REMOTE_ADDR_ATTRIBUTE,
                    request.getRemoteAddr());
            request.setAttribute(AccessLog.REMOTE_HOST_ATTRIBUTE,
                    request.getRemoteHost());
            request.setAttribute(AccessLog.PROTOCOL_ATTRIBUTE,
                    request.getProtocol());
            request.setAttribute(AccessLog.SERVER_PORT_ATTRIBUTE,
                    Integer.valueOf(request.getServerPort()));
        }
        try {
            getNext().invoke(request, response);
        } finally {
            request.setRemoteAddr(originalRemoteAddr);
            request.setRemoteHost(originalRemoteHost);
            request.setSecure(originalSecure);
            request.getCoyoteRequest().scheme().setString(originalScheme);
            request.setServerPort(originalServerPort);

            MimeHeaders headers = request.getCoyoteRequest().getMimeHeaders();
            if (originalProxiesHeader == null || originalProxiesHeader.length() == 0) {
                headers.removeHeader(proxiesHeader);
            } else {
                headers.setValue(proxiesHeader).setString(originalProxiesHeader);
            }

            if (originalRemoteIpHeader == null || originalRemoteIpHeader.length() == 0) {
                headers.removeHeader(remoteIpHeader);
            } else {
                headers.setValue(remoteIpHeader).setString(originalRemoteIpHeader);
            }
        }
    }

...

}

为什么异常微服务调用了RemoteIpValve,而正常微服务没调???

结果发现异常微服务和正常微服务的springboot版本不一样!它们对用户ip的处理逻辑是不一样的 (╯‵□′)╯︵┻━┻

异常微服务用的springboot2.2.2,而正常微服务用的springboot1.5.22

springboot2.2.2有什么特殊处理么?是的,请看:

虽然,ServerProperties类里配置了默认是不使用header.x-forwarded-for的

public class ServerProperties {

...

	/**
	 * Strategy for handling X-Forwarded-* headers.
	 */
	private ForwardHeadersStrategy forwardHeadersStrategy = ForwardHeadersStrategy.NONE;
	
...

}

但是,TomcatWebServerFactoryCustomizer类里有个判断,看当前平台是否为云平台,如果是则会使用header.x-forwarded-for

public class TomcatWebServerFactoryCustomizer implements WebServerFactoryCustomizer<ConfigurableTomcatWebServerFactory>, Ordered {

...

	private boolean getOrDeduceUseForwardHeaders() {
		if (this.serverProperties.getForwardHeadersStrategy().equals(ServerProperties.ForwardHeadersStrategy.NONE)) {
			CloudPlatform platform = CloudPlatform.getActive(this.environment);
			return platform != null && platform.isUsingForwardHeaders();
		}
		return this.serverProperties.getForwardHeadersStrategy().equals(ServerProperties.ForwardHeadersStrategy.NATIVE);
	}

	private void customizeRemoteIpValve(ConfigurableTomcatWebServerFactory factory) {
		Tomcat tomcatProperties = this.serverProperties.getTomcat();
		String protocolHeader = tomcatProperties.getProtocolHeader();
		String remoteIpHeader = tomcatProperties.getRemoteIpHeader();
		// For back compatibility the valve is also enabled if protocol-header is set
		if (StringUtils.hasText(protocolHeader) || StringUtils.hasText(remoteIpHeader)
				|| getOrDeduceUseForwardHeaders()) {
			RemoteIpValve valve = new RemoteIpValve();
			valve.setProtocolHeader(StringUtils.hasLength(protocolHeader) ? protocolHeader : "X-Forwarded-Proto");
			if (StringUtils.hasLength(remoteIpHeader)) {
				valve.setRemoteIpHeader(remoteIpHeader);
			}
			// The internal proxies default to a white list of "safe" internal IP
			// addresses
			valve.setInternalProxies(tomcatProperties.getInternalProxies());
			valve.setHostHeader(tomcatProperties.getHostHeader());
			valve.setPortHeader(tomcatProperties.getPortHeader());
			valve.setProtocolHeaderHttpsValue(tomcatProperties.getProtocolHeaderHttpsValue());
			// ... so it's safe to add this valve by default.
			factory.addEngineValves(valve);
		}
	}
...

}

springboot列举了几种常见的云平台类型:


/**
 * Simple detection for well known cloud platforms. For more advanced cloud provider
 * integration consider the Spring Cloud project.
 *
 * @author Phillip Webb
 * @since 1.3.0
 * @see "https://cloud.spring.io"
 */
public enum CloudPlatform {

	/**
	 * Cloud Foundry platform.
	 */
	CLOUD_FOUNDRY {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("VCAP_APPLICATION") || environment.containsProperty("VCAP_SERVICES");
		}

	},

	/**
	 * Heroku platform.
	 */
	HEROKU {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("DYNO");
		}

	},

	/**
	 * SAP Cloud platform.
	 */
	SAP {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("HC_LANDSCAPE");
		}

	},

	/**
	 * Kubernetes platform.
	 */
	KUBERNETES {

		private static final String SERVICE_HOST_SUFFIX = "_SERVICE_HOST";

		private static final String SERVICE_PORT_SUFFIX = "_SERVICE_PORT";

		@Override
		public boolean isActive(Environment environment) {
			if (environment instanceof ConfigurableEnvironment) {
				return isActive((ConfigurableEnvironment) environment);
			}
			return false;
		}

		private boolean isActive(ConfigurableEnvironment environment) {
			PropertySource<?> environmentPropertySource = environment.getPropertySources()
					.get(StandardEnvironment.SYSTEM_ENVIRONMENT_PROPERTY_SOURCE_NAME);
			if (environmentPropertySource instanceof EnumerablePropertySource) {
				return isActive((EnumerablePropertySource<?>) environmentPropertySource);
			}
			return false;
		}

		private boolean isActive(EnumerablePropertySource<?> environmentPropertySource) {
			for (String propertyName : environmentPropertySource.getPropertyNames()) {
				if (propertyName.endsWith(SERVICE_HOST_SUFFIX)) {
					String serviceName = propertyName.substring(0,
							propertyName.length() - SERVICE_HOST_SUFFIX.length());
					if (environmentPropertySource.getProperty(serviceName + SERVICE_PORT_SUFFIX) != null) {
						return true;
					}
				}
			}
			return false;
		}

	};

	/**
	 * Determines if the platform is active (i.e. the application is running in it).
	 * @param environment the environment
	 * @return if the platform is active.
	 */
	public abstract boolean isActive(Environment environment);

	/**
	 * Returns if the platform is behind a load balancer and uses
	 * {@literal X-Forwarded-For} headers.
	 * @return if {@literal X-Forwarded-For} headers are used
	 */
	public boolean isUsingForwardHeaders() {
		return true;
	}

	/**
	 * Returns the active {@link CloudPlatform} or {@code null} if one cannot be deduced.
	 * @param environment the environment
	 * @return the {@link CloudPlatform} or {@code null}
	 */
	public static CloudPlatform getActive(Environment environment) {
		if (environment != null) {
			for (CloudPlatform cloudPlatform : values()) {
				if (cloudPlatform.isActive(environment)) {
					return cloudPlatform;
				}
			}
		}
		return null;
	}

}

如果系统的环境变量中同时含有_SERVICE_HOST_SERVICE_PORT结尾的系统变量则认为是KUBERNETES平台。isUsingForwardHeaders返回true,所以CloudPlatform中所有的云平台都会使用header.x-forwarded-for。原因是:springboot2认为云平台都有load balancer,所以X-Forwarded-For headers are used。(见isUsingForwardHeaders函数的注释)

记录一次线上内存问题的排查过程_第5张图片
我们刚好用的就是KUBERNETES平台!
系统的环境变量中刚好就是同时含有_SERVICE_HOST_SERVICE_PORT结尾的系统变量!!
DUBBO_MONITOR_27_SERVICE_PRODUCTION_SERVICE_HOST
DUBBO_MONITOR_27_SERVICE_PRODUCTION_SERVICE_PORT
DUBBO_MONITOR_SERVICE_PRODUCTION_MICRO_SERVICE_SERVICE_HOST
DUBBO_MONITOR_SERVICE_PRODUCTION_MICRO_SERVICE_SERVICE_PORT
就是这么巧 !!!

记录一次线上内存问题的排查过程_第6张图片

在springboot1.5.22中的代码与springboot2.2.2略有不同,但是逻辑类似。真正的差异是CloudPlatform的范围不同!!

public enum CloudPlatform {

	/**
	 * Cloud Foundry platform.
	 */
	CLOUD_FOUNDRY {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("VCAP_APPLICATION") || environment.containsProperty("VCAP_SERVICES");
		}

	},

	/**
	 * Heroku platform.
	 */
	HEROKU {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("DYNO");
		}

	};

	/**
	 * Determines if the platform is active (i.e. the application is running in it).
	 * @param environment the environment
	 * @return if the platform is active.
	 */
	public abstract boolean isActive(Environment environment);

	/**
	 * Returns if the platform is behind a load balancer and uses
	 * {@literal X-Forwarded-For} headers.
	 * @return if {@literal X-Forwarded-For} headers are used
	 */
	public boolean isUsingForwardHeaders() {
		return true;
	}

	/**
	 * Returns the active {@link CloudPlatform} or {@code null} if one cannot be deduced.
	 * @param environment the environment
	 * @return the {@link CloudPlatform} or {@code null}
	 */
	public static CloudPlatform getActive(Environment environment) {
		if (environment != null) {
			for (CloudPlatform cloudPlatform : values()) {
				if (cloudPlatform.isActive(environment)) {
					return cloudPlatform;
				}
			}
		}
		return null;
	}

}

三、找到问题

1. springboot获取用户ip的配置

  • 默认使用remote_addr

  • 当系统运行在云平台上时,默认使用header.x-forwarded-for,如果想关闭需要如下配置

    server.forward-headers-strategy=FRAMEWORK
    
    // server.forward-headers-strategy的取值范围
    public enum ForwardHeadersStrategy {
    	NATIVE,  // 强制使用header.x-forwarded-for
    	FRAMEWORK, // 强制使用remote_addr
    	NONE // 默认(普通运行使用remote_addr,云平台运行使用header.x-forwarded-for)
    }
    
  • 用户可以通过配置来指定从哪个Header字段中获取用户ip

    server.use-forward-headers=true
    server.tomcat.protocol-header=X-Forwarded-Proto  # 这行可以省略
    server.tomcat.remote-ip-header=x-forwarded-for
    

2. springboot升级带来的隐藏变更

springboot2.2.2可以识别的云平台:CLOUD_FOUNDRY、HEROKU、SAP、KUBERNETES
springboot1.5.22可以识别的云平台:CLOUD_FOUNDRY、HEROKU

四、结论

异常的微服务使用了springboot2.2.2并且运行在springboot可以识别的云平台上,所以默认从header.x-forwarded-for中获取到了用户真实ip。导致dubbo监控在记录用户ip的时候记的是用户真实ip,从而占用了大量的内存。

你可能感兴趣的:(Dubbo,Java,线上问题)