Kettle version: 8.2.0.0-342
See also: [kettle] pentaho/data-integration debug: how to view the logs
Core error:
org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
The error is raised when one particular request is executed. The log:
2024/01/18 15:24:06 - 获取json.0 - Connecting to [http://xxx.com/apis/query?id=123456] ...
2024/01/18 15:24:06 - 获取json.0 - Header parameter [Authorization]='Bearer ***'
15:24:06,404 DEBUG [BasicClientConnectionManager] Get connection for route {}->http://xxx.com:80
15:24:06,405 DEBUG [DefaultClientConnectionOperator] Connecting to xxx.com:80
15:24:06,407 DEBUG [RequestAddCookies] CookieSpec selected: default
15:24:06,407 DEBUG [RequestAuthCache] Auth cache not set in the context
15:24:06,407 DEBUG [RequestProxyAuthentication] Proxy auth state: UNCHALLENGED
15:24:06,407 DEBUG [DefaultHttpClient] Attempt 1 to execute request
15:24:06,408 DEBUG [DefaultClientConnection] Sending request: POST /apis/query?id=123456 HTTP/1.1
15:24:06,408 DEBUG [wire] >> "POST /apis/query?id=123456 HTTP/1.1[\r][\n]"
15:24:06,408 DEBUG [wire] >> "Authorization: Bearer ***[\r][\n]"
15:24:06,408 DEBUG [wire] >> "Content-Type: application/json[\r][\n]"
15:24:06,408 DEBUG [wire] >> "Content-Length: 0[\r][\n]"
15:24:06,408 DEBUG [wire] >> "Host: xxx.com[\r][\n]"
15:24:06,408 DEBUG [wire] >> "Connection: Keep-Alive[\r][\n]"
15:24:06,408 DEBUG [wire] >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
15:24:06,408 DEBUG [wire] >> "[\r][\n]"
15:24:06,408 DEBUG [headers] >> POST /apis/query?id=123456 HTTP/1.1
15:24:06,408 DEBUG [headers] >> Authorization: Bearer ***
15:24:06,408 DEBUG [headers] >> Content-Type: application/json
15:24:06,408 DEBUG [headers] >> Content-Length: 0
15:24:06,408 DEBUG [headers] >> Host: xxx.com
15:24:06,408 DEBUG [headers] >> Connection: Keep-Alive
15:24:06,408 DEBUG [headers] >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
15:24:06,661 DEBUG [DefaultClientConnection] Connection 0.0.0.0:63956<->10.64.252.13:80 closed
15:24:06,661 DEBUG [DefaultHttpClient] Closing the connection.
15:24:06,666 DEBUG [DefaultClientConnection] Connection 0.0.0.0:63956<->10.64.252.13:80 closed
15:24:06,666 DEBUG [DefaultClientConnection] Connection 0.0.0.0:63956<->10.64.252.13:80 shut down
15:24:06,666 DEBUG [BasicClientConnectionManager] Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@7adaefad
2024/01/18 15:24:06 - 获取json.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Because of an error, this step can't continue:
2024/01/18 15:24:06 - 获取json.0 - Can not result from [http://xxx.com/apis/query?id=123456]
2024/01/18 15:24:06 - 获取json.0 - org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : org.pentaho.di.core.exception.KettleException:
2024/01/18 15:24:06 - 获取json.0 - Can not result from [http://xxx.com/apis/query?id=123456]
2024/01/18 15:24:06 - 获取json.0 - org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 -
2024/01/18 15:24:06 - 获取json.0 - at org.pentaho.di.trans.steps.rest.Rest.callRest(Rest.java:273)
2024/01/18 15:24:06 - 获取json.0 - at org.pentaho.di.trans.steps.rest.Rest.processRow(Rest.java:470)
2024/01/18 15:24:06 - 获取json.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2024/01/18 15:24:06 - 获取json.0 - at java.lang.Thread.run(Thread.java:748)
2024/01/18 15:24:06 - 获取json.0 - Caused by: com.sun.jersey.api.client.ClientHandlerException: org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 - at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:187)
2024/01/18 15:24:06 - 获取json.0 - at com.sun.jersey.api.client.Client.handle(Client.java:652)
2024/01/18 15:24:06 - 获取json.0 - at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
2024/01/18 15:24:06 - 获取json.0 - at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
2024/01/18 15:24:06 - 获取json.0 - at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:570)
2024/01/18 15:24:06 - 获取json.0 - at org.pentaho.di.trans.steps.rest.Rest.callRest(Rest.java:188)
2024/01/18 15:24:06 - 获取json.0 - ... 3 more
2024/01/18 15:24:06 - 获取json.0 - Caused by: org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:281)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:257)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:207)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:684)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118)
2024/01/18 15:24:06 - 获取json.0 - at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
2024/01/18 15:24:06 - 获取json.0 - at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:173)
2024/01/18 15:24:06 - 获取json.0 - ... 8 more
For comparison, here is the log of a request that succeeds:
15:24:06,225 DEBUG [BasicClientConnectionManager] Get connection for route {}->http://xxx.com:80
15:24:06,225 DEBUG [DefaultClientConnectionOperator] Connecting to xxx.com:80
15:24:06,228 DEBUG [RequestAddCookies] CookieSpec selected: default
15:24:06,228 DEBUG [RequestAuthCache] Auth cache not set in the context
15:24:06,228 DEBUG [RequestProxyAuthentication] Proxy auth state: UNCHALLENGED
15:24:06,228 DEBUG [DefaultHttpClient] Attempt 1 to execute request
15:24:06,228 DEBUG [DefaultClientConnection] Sending request: POST /apis/query?id=123456 HTTP/1.1
15:24:06,228 DEBUG [wire] >> "POST /apis/query?id=123456 HTTP/1.1[\r][\n]"
15:24:06,228 DEBUG [wire] >> "Authorization: Bearer ***[\r][\n]"
15:24:06,228 DEBUG [wire] >> "Content-Type: application/json[\r][\n]"
15:24:06,228 DEBUG [wire] >> "Content-Length: 0[\r][\n]"
15:24:06,228 DEBUG [wire] >> "Host: xxx.com[\r][\n]"
15:24:06,228 DEBUG [wire] >> "Connection: Keep-Alive[\r][\n]"
15:24:06,228 DEBUG [wire] >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
15:24:06,228 DEBUG [wire] >> "[\r][\n]"
15:24:06,228 DEBUG [headers] >> POST /apis/query?id=123456 HTTP/1.1
15:24:06,228 DEBUG [headers] >> Authorization: Bearer ***
15:24:06,228 DEBUG [headers] >> Content-Type: application/json
15:24:06,228 DEBUG [headers] >> Content-Length: 0
15:24:06,228 DEBUG [headers] >> Host: xxx.com
15:24:06,229 DEBUG [headers] >> Connection: Keep-Alive
15:24:06,229 DEBUG [headers] >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
15:24:06,343 DEBUG [wire] << "HTTP/1.1 200 [\r][\n]"
15:24:06,343 DEBUG [wire] << "Content-Type: application/json[\r][\n]"
15:24:06,347 DEBUG [wire] << "Transfer-Encoding: chunked[\r][\n]"
15:24:06,347 DEBUG [wire] << "Connection: keep-alive[\r][\n]"
15:24:06,347 DEBUG [wire] << "Vary: Origin[\r][\n]"
15:24:06,347 DEBUG [wire] << "X-Content-Type-Options: nosniff[\r][\n]"
15:24:06,347 DEBUG [wire] << "X-XSS-Protection: 1; mode=block[\r][\n]"
15:24:06,347 DEBUG [wire] << "Cache-Control: no-cache, no-store, max-age=0, must-revalidate[\r][\n]"
15:24:06,347 DEBUG [wire] << "Pragma: no-cache[\r][\n]"
15:24:06,347 DEBUG [wire] << "Expires: 0[\r][\n]"
15:24:06,347 DEBUG [wire] << "Date: Thu, 18 Jan 2024 07:24:06 GMT[\r][\n]"
15:24:06,347 DEBUG [wire] << "Access-Control-Allow-Origin: *[\r][\n]"
15:24:06,347 DEBUG [wire] << "X-Kong-Upstream-Latency: 113[\r][\n]"
15:24:06,347 DEBUG [wire] << "X-Kong-Proxy-Latency: 0[\r][\n]"
15:24:06,348 DEBUG [wire] << "Via: kong/2.7.0[\r][\n]"
15:24:06,348 DEBUG [wire] << "vary: Origin[\r][\n]"
15:24:06,348 DEBUG [wire] << "[\r][\n]"
15:24:06,348 DEBUG [DefaultClientConnection] Receiving response: HTTP/1.1 200
15:24:06,348 DEBUG [headers] << HTTP/1.1 200
15:24:06,348 DEBUG [headers] << Content-Type: application/json
15:24:06,348 DEBUG [headers] << Transfer-Encoding: chunked
15:24:06,348 DEBUG [headers] << Connection: keep-alive
15:24:06,348 DEBUG [headers] << Vary: Origin
15:24:06,348 DEBUG [headers] << X-Content-Type-Options: nosniff
15:24:06,348 DEBUG [headers] << X-XSS-Protection: 1; mode=block
15:24:06,348 DEBUG [headers] << Cache-Control: no-cache, no-store, max-age=0, must-revalidate
15:24:06,348 DEBUG [headers] << Pragma: no-cache
15:24:06,348 DEBUG [headers] << Expires: 0
15:24:06,348 DEBUG [headers] << Date: Thu, 18 Jan 2024 07:24:06 GMT
15:24:06,348 DEBUG [headers] << Access-Control-Allow-Origin: *
15:24:06,348 DEBUG [headers] << X-Kong-Upstream-Latency: 113
15:24:06,348 DEBUG [headers] << X-Kong-Proxy-Latency: 0
15:24:06,348 DEBUG [headers] << Via: kong/2.7.0
15:24:06,348 DEBUG [headers] << vary: Origin
15:24:06,348 DEBUG [DefaultHttpClient] Connection can be kept alive indefinitely
15:24:06,349 DEBUG [wire] << "38[\r][\n]"
15:24:06,349 DEBUG [wire] << "(content hidden)}"
2024/01/18 15:24:06 - 获取json.0 - Response time (milliseconds): [125] for [http://xxx.com/apis/query?id=123456]
2024/01/18 15:24:06 - 获取json.0 - The response code is 200
15:24:06,349 DEBUG [wire] << "[\r][\n]"
15:24:06,349 DEBUG [wire] << "0[\r][\n]"
15:24:06,349 DEBUG [wire] << "[\r][\n]"
15:24:06,349 DEBUG [BasicClientConnectionManager] Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@4bdd086a
15:24:06,349 DEBUG [BasicClientConnectionManager] Connection can be kept alive indefinitely
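Reference article: 记一次NoHttpResponseException:xxx failed to respond (a write-up of a "NoHttpResponseException: xxx failed to respond" incident).
From it I learned that Kettle uses Apache HttpClient as the connection tool for three of its components.
That article traces the problem to the keep-alive configuration.
So I set up a Spring Boot project by hand and added the following configuration:
Customize KeepAliveTimeout to 10 seconds, and have the server close the keep-alive connection after 5 requests.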
import org.apache.catalina.connector.Connector;
import org.apache.coyote.http11.Http11NioProtocol;
import org.springframework.boot.web.embedded.tomcat.TomcatConnectorCustomizer;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.ConfigurableWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WebServerConfiguration implements WebServerFactoryCustomizer<ConfigurableWebServerFactory> {
    @Override
    public void customize(ConfigurableWebServerFactory factory) {
        // Use the hook provided by the factory class to customize the Tomcat connector
        ((TomcatServletWebServerFactory) factory).addConnectorCustomizers(new TomcatConnectorCustomizer() {
            @Override
            public void customize(Connector connector) {
                Http11NioProtocol protocol = (Http11NioProtocol) connector.getProtocolHandler();
                // KeepAliveTimeout: the server closes the keep-alive connection if no request arrives within 10 seconds
                protocol.setKeepAliveTimeout(10000);
                // The server closes the keep-alive connection once the client has sent 5 requests on it
                protocol.setMaxKeepAliveRequests(5);
            }
        });
    }
}
Add a test class:
Retries are disabled and keep-alive is forced to -1.
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;
import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class HttpRequest {
    // Default keep-alive strategy: reads the keep-alive parameters from the response and applies them to the client
    private ConnectionKeepAliveStrategy udfKeepAliveStrategy = DefaultConnectionKeepAliveStrategy.INSTANCE;
    // Forcing -1 means the client always assumes the server will not close the connection
    private ConnectionKeepAliveStrategy noneKeepAliveStrategy = (response, context) -> -1;
    // If this is not configured, a default handler with 3 retries is added anyway; here the retry count is raised.
    private DefaultHttpRequestRetryHandler udfRetryHandler = new DefaultHttpRequestRetryHandler(30, false);
    private PoolingHttpClientConnectionManager manager;
    private String INCR_URL = "http://localhost:8080/api/v1/incr";

    public HttpRequest() {
        manager = new PoolingHttpClientConnectionManager();
        manager.setDefaultMaxPerRoute(100);
        manager.setMaxTotal(200);
        manager.setValidateAfterInactivity(10_000);
    }

    public CloseableHttpClient getClient() {
        HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
        httpClientBuilder
                .setConnectionManager(manager)
                // .setRetryHandler(udfRetryHandler)
                // .setKeepAliveStrategy(udfKeepAliveStrategy)
                .setKeepAliveStrategy(noneKeepAliveStrategy)
                .disableAutomaticRetries();
        CloseableHttpClient client = httpClientBuilder.build();
        return client;
    }

    @Test
    public void httpRequest() throws URISyntaxException, IOException {
        CloseableHttpClient client = getClient();
        URI uri = new URIBuilder(INCR_URL + "/info").build();
        for (int i = 0; i < 10; i++) {
            HttpPost post = new HttpPost(uri);
            CloseableHttpResponse response = client.execute(post);
            Map<String, String> headerMap = new HashMap<>();
            Arrays.stream(response.getAllHeaders()).forEach(f -> headerMap.put(f.getName(), f.getValue()));
            String responseStr = EntityUtils.toString(response.getEntity());
            String headersStr = headerMap.toString();
            System.out.println(String.format("content: %s, headers: %s", responseStr, headersStr));
        }
    }
}
Test output:
16:01:57.448 [main] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: default
16:01:57.448 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache - Auth cache not set in the context
16:01:57.448 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection request: [route: {}->http://localhost:8080][total kept alive: 1; route allocated: 1 of 100; total allocated: 1 of 200]
16:01:57.448 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection leased: [id: 1][route: {}->http://localhost:8080][total kept alive: 0; route allocated: 1 of 100; total allocated: 1 of 200]
16:01:57.448 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Executing request POST /api/v1/incr/info HTTP/1.1
16:01:57.448 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Target auth state: UNCHALLENGED
16:01:57.448 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Proxy auth state: UNCHALLENGED
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> POST /api/v1/incr/info HTTP/1.1
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Content-Length: 0
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Host: localhost:8080
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Connection: Keep-Alive
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Accept-Encoding: gzip,deflate
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "POST /api/v1/incr/info HTTP/1.1[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Content-Length: 0[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Host: localhost:8080[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Connection: Keep-Alive[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Accept-Encoding: gzip,deflate[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "HTTP/1.1 200 [\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Content-Type: application/json[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Transfer-Encoding: chunked[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Date: Wed, 17 Jan 2024 08:01:57 GMT[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Connection: close[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "18[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "{"id":9,"name":"info-9"}[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << HTTP/1.1 200
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Content-Type: application/json
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Transfer-Encoding: chunked
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Date: Wed, 17 Jan 2024 08:01:57 GMT
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Connection: close
16:01:57.759 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "0[\r][\n]"
16:01:57.759 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "[\r][\n]"
16:01:57.759 [main] DEBUG org.apache.http.impl.conn.DefaultManagedHttpClientConnection - http-outgoing-1: Close connection
16:01:57.759 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Connection discarded
16:01:57.759 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection released: [id: 1][route: {}->http://localhost:8080][total kept alive: 0; route allocated: 0 of 100; total allocated: 0 of 200]
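Result:
The program does not fail at all! The debug log does show that the close-connection setting on the Spring Boot side takes effect: every fifth response contains Connection: close. Apache HttpClient reacts to that header by discarding the connection and opening a new one, so no error occurs. Note: all other responses contain Connection: keep-alive, meaning the server has not closed the connection yet.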
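The client-side reaction to Connection: close can be illustrated with a small JDK-only sketch. It is a simplified model and not the HttpClient source: the class name ReuseDecision and the method canReuse are made up for illustration, and the sketch only looks at the Connection response header.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the client-side reuse decision; not HttpClient's own code.
class ReuseDecision {

    // Returns false when the response carries "Connection: close", true otherwise
    // (an HTTP/1.1 connection is treated as persistent unless the peer says close).
    static boolean canReuse(Map<String, String> responseHeaders) {
        for (Map.Entry<String, String> header : responseHeaders.entrySet()) {
            boolean isConnectionHeader = "Connection".equalsIgnoreCase(header.getKey());
            String value = header.getValue();
            if (isConnectionHeader && value != null && "close".equalsIgnoreCase(value.trim())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> fifthResponse = new HashMap<>();
        fifthResponse.put("Connection", "close");
        Map<String, String> otherResponse = new HashMap<>();
        otherResponse.put("Connection", "keep-alive");

        System.out.println(canReuse(fifthResponse)); // false: discard the connection and reconnect
        System.out.println(canReuse(otherResponse)); // true: keep using the same connection
    }
}
```

Because the server announces the close in the response, the client never writes to a dead socket, which would explain why this test produces no NoHttpResponseException.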
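The fix is to add a retry configuration.
Install the jars from kettle/lib into the local Maven repository.
All the required Kettle jars are in the lib directory of the Kettle installation: KETTLE_HOME/lib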
mvn install:install-file "-DgroupId=pentaho-kettle" "-DartifactId=kettle-core" "-Dversion=8.2.0.0-342" "-Dpackaging=jar" "-Dfile=install/kettle-core-8.2.0.0-342.jar"
mvn install:install-file "-DgroupId=pentaho-kettle" "-DartifactId=kettle-engine" "-Dversion=8.2.0.0-342" "-Dpackaging=jar" "-Dfile=install/kettle-engine-8.2.0.0-342.jar"
mvn install:install-file "-DgroupId=pentaho" "-DartifactId=metastore" "-Dversion=8.2.0.0-342" "-Dpackaging=jar" "-Dfile=install/metastore-8.2.0.0-342.jar"
Create a new Maven project.
Add these dependencies to pom.xml:
<dependency>
    <groupId>pentaho-kettle</groupId>
    <artifactId>kettle-core</artifactId>
    <version>8.2.0.0-342</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>pentaho-kettle</groupId>
    <artifactId>kettle-engine</artifactId>
    <version>8.2.0.0-342</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>pentaho</groupId>
    <artifactId>metastore</artifactId>
    <version>8.2.0.0-342</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.3</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.6</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.googlecode.json-simple</groupId>
    <artifactId>json-simple</artifactId>
    <version>1.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.github.rholder</groupId>
    <artifactId>guava-retrying</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
</dependency>
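Following the reference article, go through the classes org.pentaho.di.cluster.SlaveConnectionManager and org.pentaho.di.core.util.HttpClientManager, find every place that creates a client through HttpClients or HttpClientBuilder, and add the retry configuration there.
Note: with DefaultHttpRequestRetryHandler, requestSentRetryEnabled must be set to true. StandardHttpRequestRetryHandler can be used instead, but it does not count POST as idempotent, so POST requests still need true.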
// private ConnectionKeepAliveStrategy udfKeepAliveStrategy = DefaultConnectionKeepAliveStrategy.INSTANCE;
// private DefaultClientConnectionReuseStrategy udfReuseHandler = new DefaultClientConnectionReuseStrategy();
private HttpRequestRetryHandler udfRetryHandler = new DefaultHttpRequestRetryHandler(5, true); // must be true!
// private HttpRequestRetryHandler udfRetryHandler = new StandardHttpRequestRetryHandler(5, true); // the flag only matters for non-idempotent methods such as POST
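The keep-alive and reuse strategies are added by default, so they do not have to be configured. The retry strategy has to be configured explicitly.
Compile, replace the corresponding .class files inside kettle-core-8.2.0.0-342.jar with the freshly compiled ones, and then replace KETTLE_HOME/lib/kettle-core-8.2.0.0-342.jar with the patched jar.
Now call my own Spring Boot service from Kettle:
When Kettle's REST step calls my Spring Boot endpoint, the response headers in the debug log printed in the cmd window never contain Connection: close. So for every row, Kettle's REST or HTTP step creates a new Apache HttpClient object, and connections are not reused at all! Every response the server returns carries Connection: keep-alive, which looks very inefficient.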
2024/01/17 13:35:04 - 写日志.0 -
2024/01/17 13:35:04 - 写日志.0 - ------------> 行号 99------------------------------
2024/01/17 13:35:04 - 写日志.0 - res = {"id":1510,"name":"info-1510"}
2024/01/17 13:35:04 - 写日志.0 -
2024/01/17 13:35:04 - 写日志.0 - ====================
2024/01/17 13:35:04 - REST client.0 - Connecting to [http://localhost:8080/api/v1/incr/info] ...
2024/01/17 13:35:04 - REST client.0 - Connecting to [http://localhost:8080/api/v1/incr/info] ...
2024/01/17 13:35:04 - REST client.0 - Adding HTTP body value [1]
13:35:04,357 DEBUG [BasicClientConnectionManager] Get connection for route {}->http://localhost:8080
13:35:04,358 DEBUG [DefaultClientConnectionOperator] Connecting to localhost:8080
13:35:04,359 DEBUG [RequestAddCookies] CookieSpec selected: default
13:35:04,359 DEBUG [RequestAuthCache] Auth cache not set in the context
13:35:04,359 DEBUG [RequestTargetAuthentication] Target auth state: UNCHALLENGED
13:35:04,359 DEBUG [RequestProxyAuthentication] Proxy auth state: UNCHALLENGED
13:35:04,359 DEBUG [DefaultHttpClient] Attempt 1 to execute request
13:35:04,359 DEBUG [DefaultClientConnection] Sending request: POST /api/v1/incr/info HTTP/1.1
13:35:04,359 DEBUG [wire] >> "POST /api/v1/incr/info HTTP/1.1[\r][\n]"
13:35:04,359 DEBUG [wire] >> "Content-Type: application/json[\r][\n]"
13:35:04,359 DEBUG [wire] >> "Transfer-Encoding: chunked[\r][\n]"
13:35:04,359 DEBUG [wire] >> "Host: localhost:8080[\r][\n]"
13:35:04,359 DEBUG [wire] >> "Connection: Keep-Alive[\r][\n]"
13:35:04,359 DEBUG [wire] >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
13:35:04,360 DEBUG [wire] >> "[\r][\n]"
13:35:04,360 DEBUG [headers] >> POST /api/v1/incr/info HTTP/1.1
13:35:04,360 DEBUG [headers] >> Content-Type: application/json
13:35:04,360 DEBUG [headers] >> Transfer-Encoding: chunked
13:35:04,360 DEBUG [headers] >> Host: localhost:8080
13:35:04,360 DEBUG [headers] >> Connection: Keep-Alive
13:35:04,360 DEBUG [headers] >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
13:35:04,360 DEBUG [wire] >> "1[\r][\n]"
13:35:04,360 DEBUG [wire] >> "1"
13:35:04,360 DEBUG [wire] >> "[\r][\n]"
13:35:04,360 DEBUG [wire] >> "0[\r][\n]"
13:35:04,360 DEBUG [wire] >> "[\r][\n]"
13:35:04,674 DEBUG [wire] << "HTTP/1.1 200 [\r][\n]"
13:35:04,675 DEBUG [wire] << "Content-Type: application/json[\r][\n]"
13:35:04,680 DEBUG [wire] << "Transfer-Encoding: chunked[\r][\n]"
13:35:04,680 DEBUG [wire] << "Date: Wed, 17 Jan 2024 05:35:04 GMT[\r][\n]"
13:35:04,680 DEBUG [wire] << "Keep-Alive: timeout=10[\r][\n]"
13:35:04,680 DEBUG [wire] << "Connection: keep-alive[\r][\n]"
13:35:04,680 DEBUG [wire] << "[\r][\n]"
13:35:04,680 DEBUG [DefaultClientConnection] Receiving response: HTTP/1.1 200
13:35:04,680 DEBUG [headers] << HTTP/1.1 200
13:35:04,680 DEBUG [headers] << Content-Type: application/json
13:35:04,680 DEBUG [headers] << Transfer-Encoding: chunked
13:35:04,680 DEBUG [headers] << Date: Wed, 17 Jan 2024 05:35:04 GMT
13:35:04,680 DEBUG [headers] << Keep-Alive: timeout=10
13:35:04,680 DEBUG [headers] << Connection: keep-alive
13:35:04,681 DEBUG [DefaultHttpClient] Connection can be kept alive for 10000 MILLISECONDS
13:35:04,681 DEBUG [wire] << "1e[\r][\n]"
13:35:04,681 DEBUG [wire] << "{"id":1511,"name":"info-1511"}"
2024/01/17 13:35:04 - REST client.0 - Response time (milliseconds): [325] for [http://localhost:8080/api/v1/incr/info]
2024/01/17 13:35:04 - REST client.0 - The response code is 200
13:35:04,681 DEBUG [wire] << "[\r][\n]"
13:35:04,681 DEBUG [wire] << "0[\r][\n]"
13:35:04,682 DEBUG [wire] << "[\r][\n]"
13:35:04,682 DEBUG [BasicClientConnectionManager] Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@2192ee11
13:35:04,682 DEBUG [BasicClientConnectionManager] Connection can be kept alive for 10000 MILLISECONDS
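At this point it is clear that the problem is not the keep-alive configuration or connection reuse.
Running Kettle against the production service again, the error no longer appears now that retry is in place!
My guess is that after Kettle's Apache HttpClient creates a connection and issues the GET/POST, it finds that the server side has already closed the connection. That is a server-side problem, but I have no permission to fix the server.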
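For details, see org.apache.http.impl.client.HttpClientBuilder: if no retry handler is configured on the HttpClientBuilder, a default one is set during build.
org.apache.http.impl.execchain.RetryExec is the executor that performs the retries; the if check on retryHandler.retryRequest is the most important part of the retry flow.
Does having the default org.apache.http.impl.client.DefaultHttpRequestRetryHandler mean a failed request is actually retried? No. For the default handler to retry, the following must hold:
the attempt is within the limit of 3 retries, and either requestSentRetryEnabled=true or the request is idempotent (or it was never fully sent).
Judging from the source, the default handler does not treat HttpPut or HttpPost as idempotent, so it will not retry them once they have been sent.
StandardHttpRequestRetryHandler overrides the idempotency check handleAsIdempotent and treats GET, HEAD, PUT, DELETE, OPTIONS and TRACE as idempotent, so for those methods it makes no difference whether requestSentRetryEnabled is true or false. POST is not on that list and still needs requestSentRetryEnabled=true.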
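The retry rule described above for DefaultHttpRequestRetryHandler can be written down as a small JDK-only sketch. This is a simplified model under stated assumptions and not the library source: the class RetryDecision and the method shouldRetry are hypothetical names, and the checks the real handler makes on the exception type and on aborted requests are left out.

```java
// Simplified, hypothetical model of the retry rule; not the HttpClient source.
// It leaves out the checks on exception type and on aborted requests.
class RetryDecision {

    static boolean shouldRetry(int executionCount, int retryCount, boolean idempotent,
                               boolean requestSent, boolean requestSentRetryEnabled) {
        if (executionCount > retryCount) {
            return false; // retry budget is used up
        }
        if (idempotent) {
            return true; // repeating the request is considered safe
        }
        // Non-idempotent request: retry only if it never went out, or if the flag forces it.
        return !requestSent || requestSentRetryEnabled;
    }

    public static void main(String[] args) {
        // A POST that was already sent, with the default settings (3 retries, flag false)
        System.out.println(shouldRetry(1, 3, false, true, false)); // false
        // The same POST with a handler built as (5, true)
        System.out.println(shouldRetry(1, 5, false, true, true));  // true
    }
}
```

The first call mirrors the failing case in this article: the POST has already been written to the socket when the exception is raised, so only requestSentRetryEnabled=true lets the handler try again.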
org.apache.http.impl.client.DefaultClientConnectionReuseStrategy inspects the response to see whether the Connection header says close, and returns a boolean accordingly.
org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy parses the timeout value of the Keep-Alive header from the response.
Reuse and keep-alive are both applied in the main executor, in the method org.apache.http.impl.execchain.MainClientExec#execute.
That is, if the response says Connection: keep-alive, the client then reads the Keep-Alive timeout value and stores it on the connection, so that the connection pool can check whether the connection is still valid before handing it out again.
org.apache.http.impl.client.HttpClientBuilder#build adds the reuse and keep-alive handlers by default.
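To make the keep-alive part concrete, here is a minimal JDK-only sketch of turning a Keep-Alive header value into a duration. It is an illustration and not the HttpClient implementation: the class KeepAliveTimeout and the method toMillis are hypothetical names. The -1 result for a missing timeout corresponds to the "kept alive indefinitely" lines in the logs above, and timeout=10 corresponds to "kept alive for 10000 MILLISECONDS".

```java
// Hypothetical helper for illustration; not part of HttpClient.
class KeepAliveTimeout {

    // Parses a Keep-Alive header value such as "timeout=10" or "timeout=10, max=5".
    // Returns the timeout in milliseconds, or -1 when no usable timeout is present.
    static long toMillis(String keepAliveHeaderValue) {
        if (keepAliveHeaderValue == null) {
            return -1L;
        }
        for (String part : keepAliveHeaderValue.split(",")) {
            String[] nameAndValue = part.trim().split("=", 2);
            if (nameAndValue.length == 2 && "timeout".equalsIgnoreCase(nameAndValue[0].trim())) {
                try {
                    return Long.parseLong(nameAndValue[1].trim()) * 1000L;
                } catch (NumberFormatException ignored) {
                    // Malformed number: keep scanning the remaining parameters.
                }
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println(toMillis("timeout=10")); // 10000
        System.out.println(toMillis(null));         // -1
    }
}
```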
You can also implement the retry yourself in an outer layer!
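As a rough idea of what an outer retry layer looks like, here is a hand-rolled JDK-only sketch. It is a stand-in for illustration, not guava-retrying or spring-retry: the class SimpleRetry, the method callWithRetry, and the simulated failure are all assumptions of this example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hand-rolled, hypothetical retry wrapper for illustration only.
class SimpleRetry {

    // Calls the action up to maxAttempts times, retrying only when it throws retryOn
    // (or a subclass), and sleeping waitMillis between attempts.
    static <T> T callWithRetry(Callable<T> action, Class<? extends Exception> retryOn,
                               int maxAttempts, long waitMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not a retryable failure
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis);
                }
            }
        }
        throw last; // every attempt failed with a retryable exception
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated: failed to respond");
            }
            return "ok";
        }, IOException.class, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

In a real step the Callable would wrap the HTTP call, and the retryable type would be narrowed to the exception actually observed, here NoHttpResponseException.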
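Kettle's HTTP or REST steps use the following two classes: org.pentaho.di.trans.steps.http.HTTP and org.pentaho.di.trans.steps.httppost.HTTPPOST. HTTP handles all requests other than POST, and HTTPPOST handles POST.
Both classes are in kettle-engine-8.2.0.0-342.jar.
Copy the source of org.pentaho.di.trans.steps.http.HTTP and org.pentaho.di.trans.steps.httppost.HTTPPOST from the Kettle jar into your own project.
Create a RetryUtils class. It uses guava-retrying here; spring-retry would also work and offers a richer set of policies.
The retry count and the wait time are hard-coded here; adjust them to your needs.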
import com.github.rholder.retry.*;
import org.apache.http.NoHttpResponseException;
import org.apache.http.client.methods.CloseableHttpResponse;

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class RetryUtils {
    public static Retryer<CloseableHttpResponse> getHttpResponseRetryer() {
        Retryer<CloseableHttpResponse> retryer = RetryerBuilder.<CloseableHttpResponse>newBuilder()
                .retryIfExceptionOfType(NoHttpResponseException.class) // exception type that triggers a retry
                .retryIfResult(res -> res == null) // also retry when the result is null
                .withWaitStrategy(WaitStrategies.fixedWait(2, TimeUnit.SECONDS)) // wait between attempts
                .withStopStrategy(StopStrategies.stopAfterAttempt(999)) // maximum number of attempts
                .build();
        return retryer;
    }

    public static CloseableHttpResponse getResponseWithRetry(Callable<CloseableHttpResponse> supplier) throws ExecutionException, RetryException {
        Retryer<CloseableHttpResponse> retryer = getHttpResponseRetryer();
        CloseableHttpResponse res = retryer.call(supplier);
        return res;
    }
}
Modify the source:
In org.pentaho.di.trans.steps.http.HTTP and org.pentaho.di.trans.steps.httppost.HTTPPOST, wrap every httpClient.execute call in RetryUtils.getResponseWithRetry.
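Compile, and replace HTTP.class and HTTPPOST.class inside the original kettle-engine-8.2.0.0-342.jar with the freshly compiled ones. Then replace KETTLE_HOME/lib/kettle-engine-8.2.0.0-342.jar with the patched jar, and copy guava-retrying-2.0.0.jar into KETTLE_HOME/lib as well.
All done. Start Kettle!
Caveat:
Re-sending REST/HTTP requests is only appropriate for requests that just fetch data. If the operation triggered by the request is not idempotent, the retry mechanism will make the server execute it more than once. Keep this in mind!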