I've recently been working on adding UTF-8 support to Impala, to fix its inconsistent behavior on UTF-8 strings compared with Java-based systems like Hive and Spark (e.g. IMPALA-2019, IMPALA-5675, IMPALA-9662). Then I ran a simple substring query in impala-shell and it hung:
[localhost:21050] default> select substr("引擎", 1, 3);
Query: select substr("引擎", 1, 3)
Query submitted at: 2020-10-28 11:54:26 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=464f44d405bb16a1:5b41e00e00000000
+----------------------+
| substr('引擎', 1, 3) |
+----------------------+
| 引 |
+----------------------+
Fetched 1 row(s) in 0.13s
[localhost:21050] default> select substr("引擎", 1, 2);
Query: select substr("引擎", 1, 2)
Query submitted at: 2020-10-28 11:54:32 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=d048231be08f29b4:860c959500000000
Caught exception 'utf8' codec can't decode bytes in position 0-1: unexpected end of data, type=<class 'exceptions.UnicodeDecodeError'> in FetchResults.
The first query, substr("引擎", 1, 3), works fine: each Chinese character is 3 bytes in UTF-8, and Impala treats strings as plain byte arrays, so the first 3 bytes are exactly the first character.
The second query returns only the first 2 bytes of the UTF-8 encoding of "引". Decoding them as UTF-8 fails with "unexpected end of data", and impala-shell hangs; the only way out is Ctrl+C.
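To make the byte-level picture concrete, here is a minimal Python sketch (plain Python for illustration, not Impala code) of what happens when a string is treated as a byte array and sliced mid-character:
# Illustration only: "引擎" as UTF-8 bytes, sliced the way a byte-based substr slices it.
s = u"引擎"
b = s.encode("utf-8")
print(b)                      # b'\xe5\xbc\x95\xe6\x93\x8e' -- 3 bytes per character
print(b[:3].decode("utf-8"))  # '引' -- a complete 3-byte sequence decodes fine
try:
    b[:2].decode("utf-8")     # only 2 of the 3 bytes of '引'
except UnicodeDecodeError as e:
    print(e)                  # "... unexpected end of data", exactly what impala-shell reported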
This used to work in older versions: UTF-8 sequences that failed to decode were displayed as mojibake, and impala-shell did not hang. So I first filed a JIRA to report the problem: IMPALA-10299. I was on the master branch, at commit 1bd27a3.
When a process hangs, the usual first step is to dump its stack and see where it is stuck. impala-shell is a Python process, so let's look at it with pystack first:
$ ps aux | grep shell
quanlong 30957 0.5 0.0 296100 24716 pts/22 S+ 15:06 0:00 bin/../infra/python/env-gcc7.5.0/bin/python /home/quanlong/workspace/Impala/shell/impala_shell.py
quanlong 31003 0.0 0.0 15960 1088 pts/20 S+ 15:06 0:00 grep --color=auto shell
$ sudo /home/quanlong/.local/bin/pystack 30957
Dumping Threads....
File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 2058, in <module>
impala_shell_main()
File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 2023, in impala_shell_main
shell.cmdloop(intro)
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 687, in onecmd
return func(arg)
File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 1113, in do_select
return self._execute_stmt(query_str, print_web_link=True)
File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 1303, in _execute_stmt
self.imp_client.close_query(self.last_query_handle)
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 847, in close_query
resp = self._do_hs2_rpc(CloseImpalaOperation, retry_on_error=True)
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 959, in _do_hs2_rpc
return rpc()
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 845, in CloseImpalaOperation
return self.imp_service.CloseImpalaOperation(req)
File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/ImpalaService/ImpalaHiveServer2Service.py", line 155, in CloseImpalaOperation
return self.recv_CloseImpalaOperation()
File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/ImpalaService/ImpalaHiveServer2Service.py", line 167, in recv_CloseImpalaOperation
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 148, in readMessageBegin
name = self.trans.readAll(sz)
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 162, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 117, in read
buff = self.handle.recv(sz)
File "" , line 1, in <module>
File "" , line 1, in <module>
The stack shows that impala-shell is stuck in a close_query operation: it has sent the HS2 CloseImpalaOperation RPC and is waiting for impalad's reply. All the Thrift-layer frames are blocked in read calls. So the problem is most likely on the server side; the next step is to find out why the CloseImpalaOperation RPC never returns.
The impalad logs show nothing unusual; the query appears to finish normally. For example, I re-ran select substr("引擎", 1, 4): impala-shell hung again, and the impalad log looks like this:
I1106 09:49:19.698134 15454 Frontend.java:1532] fc450a0fa19a6fbc:9315824900000000] Analyzing query: select substr("引擎", 1, 4) db: default
I1106 09:49:19.702113 15454 BaseAuthorizationChecker.java:110] fc450a0fa19a6fbc:9315824900000000] Authorization check took 3 ms
I1106 09:49:19.702214 15454 Frontend.java:1574] fc450a0fa19a6fbc:9315824900000000] Analysis and authorization finished.
W1106 09:49:19.706602 15459 executor-group.cc:164] fc450a0fa19a6fbc:9315824900000000] Executor group empty group (using coordinator only) is unhealthy: 0 out of 1 are available.
I1106 09:49:19.706990 15459 admission-controller.cc:1646] fc450a0fa19a6fbc:9315824900000000] Trying to admit id=fc450a0fa19a6fbc:9315824900000000 in pool_name=default-pool executor_group_name=empty group (using coordinator only) per_host_mem_estimate=10.00 MB dedicated_coord_mem_estimate=100.00 MB max_requests=-1 max_queued=200 max_mem=-1.00 B
I1106 09:49:19.707115 15459 admission-controller.cc:1654] fc450a0fa19a6fbc:9315824900000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[], total_mem_consumed=0; pool_level_stats: num_running=0, min=0, max=0, pool_total_mem=0)
I1106 09:49:19.707186 15459 admission-controller.cc:1200] fc450a0fa19a6fbc:9315824900000000] Admitting query id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.707155 15454 impala-hs2-server.cc:523] ExecuteStatement(): return_val=TExecuteStatementResp {
01: status (struct) = TStatus {
01: statusCode (i32) = 0,
},
02: operationHandle (struct) = TOperationHandle {
01: operationId (struct) = THandleIdentifier {
01: guid (string) = "\xbco\x9a\xa1\x0f\nE\xfc\x00\x00\x00\x00I\x82\x15\x93",
02: secret (string) = "\xa5|\xf9A\xbc\x8aG\xd4\x8a\r\xb8\xae\xb4\xcf\x9f\xfc",
},
02: operationType (i32) = 0,
03: hasResultSet (bool) = true,
},
}
I1106 09:49:19.707319 15459 impala-server.cc:2062] fc450a0fa19a6fbc:9315824900000000] Registering query locations
I1106 09:49:19.707382 15459 coordinator.cc:149] fc450a0fa19a6fbc:9315824900000000] Exec() query_id=fc450a0fa19a6fbc:9315824900000000 stmt=select substr("引擎", 1, 4)
I1106 09:49:19.707787 15454 impala-hs2-server.cc:832] GetResultSetMetadata(): query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.707967 15459 coordinator.cc:473] fc450a0fa19a6fbc:9315824900000000] starting execution on 1 backends for query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.707902 15454 impala-hs2-server.cc:864] GetResultSetMetadata(): return_val=TGetResultSetMetadataResp {
01: status (struct) = TStatus {
01: statusCode (i32) = 0,
},
02: schema (struct) = TTableSchema {
01: columns (list) = list<struct>[1] {
[0] = TColumnDesc {
01: columnName (string) = "substr('\xe5\xbc\x95\xe6\x93\x8e', 1, 4)",
02: typeDesc (struct) = TTypeDesc {
01: types (list) = list<struct>[1] {
[0] = TTypeEntry {
01: primitiveEntry (struct) = TPrimitiveTypeEntry {
01: type (i32) = 7,
},
},
},
},
03: position (i32) = 0,
},
},
},
}
I1106 09:49:19.709297 7513 control-service.cc:142] fc450a0fa19a6fbc:9315824900000000] ExecQueryFInstances(): query_id=fc450a0fa19a6fbc:9315824900000000 coord=quanlong-OptiPlex-BJ:27000 #instances=1
I1106 09:49:19.711102 15462 query-state.cc:897] fc450a0fa19a6fbc:9315824900000000] Executing instance. instance_id=fc450a0fa19a6fbc:9315824900000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
I1106 09:49:19.711118 15459 coordinator.cc:532] fc450a0fa19a6fbc:9315824900000000] started execution on 1 backends for query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.810067 15454 coordinator.cc:672] ExecState: query id=fc450a0fa19a6fbc:9315824900000000 execution completed
I1106 09:49:19.810063 15462 krpc-data-stream-mgr.cc:337] fc450a0fa19a6fbc:9315824900000000] cancelling active streams for fragment_instance_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.810087 15454 coordinator.cc:821] Coordinator waiting for backends to finish, 1 remaining. query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.810119 15462 query-state.cc:906] fc450a0fa19a6fbc:9315824900000000] Instance completed. instance_id=fc450a0fa19a6fbc:9315824900000000 #in-flight=0 status=OK
I1106 09:49:19.810176 15461 query-state.cc:464] fc450a0fa19a6fbc:9315824900000000] UpdateBackendExecState(): last report for fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.810976 7513 coordinator.cc:965] Backend completed: host=quanlong-OptiPlex-BJ:27000 remaining=1 query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.811048 15454 coordinator.cc:1284] Release admission control resources for query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.811172 15461 query-state.cc:686] fc450a0fa19a6fbc:9315824900000000] Cancelling fragment instances as directed by the coordinator. Returned status: Cancelled
I1106 09:49:19.811204 15461 query-state.cc:923] fc450a0fa19a6fbc:9315824900000000] Cancel: query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.811213 15461 krpc-data-stream-mgr.cc:337] fc450a0fa19a6fbc:9315824900000000] cancelling active streams for fragment_instance_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.811383 15454 impala-hs2-server.cc:800] CloseOperation(): query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.811399 15454 impala-server.cc:1320] UnregisterQuery(): query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.811409 15454 coordinator-backend-state.cc:921] query_id=fc450a0fa19a6fbc:9315824900000000 target backend=127.0.0.1:27000: Not cancelling because the backend is already done:
I1106 09:49:19.811415 15454 coordinator.cc:932] CancelBackends() query_id=fc450a0fa19a6fbc:9315824900000000, tried to cancel 0 backends
I1106 09:49:19.812331 8319 impala-server.cc:1352] Query successfully unregistered: query_id=fc450a0fa19a6fbc:9315824900000000
I1106 09:49:19.812489 8319 query-exec-mgr.cc:213] ReleaseQueryState(): deleted query_id=fc450a0fa19a6fbc:9315824900000000
One possibility is that it is stuck somewhere that produces no log at all. Fortunately this is a dev environment and the problem reproduces reliably, so I can add some logging and rebuild to debug it. First, find the code that handles the HS2 CloseImpalaOperation RPC, which lives mainly in be/src/service/impala-hs2-server.cc: https://github.com/apache/impala/blob/1bd27a3/be/src/service/impala-hs2-server.cc#L793-L822
void ImpalaServer::CloseOperation(TCloseOperationResp& return_val,
const TCloseOperationReq& request) {
TCloseImpalaOperationReq request2;
request2.operationHandle = request.operationHandle;
TCloseImpalaOperationResp tmp_resp;
CloseImpalaOperation(tmp_resp, request2);
return_val.status = tmp_resp.status;
}
void ImpalaServer::CloseImpalaOperation(TCloseImpalaOperationResp& return_val,
const TCloseImpalaOperationReq& request) {
TUniqueId query_id;
TUniqueId op_secret;
HS2_RETURN_IF_ERROR(return_val, THandleIdentifierToTUniqueId(
request.operationHandle.operationId, &query_id, &op_secret),
SQLSTATE_GENERAL_ERROR);
VLOG_QUERY << "CloseOperation(): query_id=" << PrintId(query_id);
QueryHandle query_handle;
HS2_RETURN_IF_ERROR(
return_val, GetActiveQueryHandle(query_id, &query_handle), SQLSTATE_GENERAL_ERROR);
ScopedSessionState session_handle(this);
const TUniqueId session_id = query_handle->session_id();
HS2_RETURN_IF_ERROR(return_val,
session_handle.WithSession(session_id, SecretArg::Operation(op_secret, query_id)),
SQLSTATE_GENERAL_ERROR);
if (query_handle->stmt_type() == TStmtType::DML) {
Status query_status;
if (query_handle->GetDmlStats(&return_val.dml_result, &query_status)) {
return_val.__isset.dml_result = true;
}
}
// TODO: use timeout to get rid of unwanted query_handle.
HS2_RETURN_IF_ERROR(return_val, UnregisterQuery(query_id, true),
SQLSTATE_GENERAL_ERROR);
return_val.status.__set_statusCode(thrift::TStatusCode::SUCCESS_STATUS);
}
There are some potentially slow operations here, such as the final UnregisterQuery, which could be where it hangs. The blunt approach is to add logging and see where the function stops. CloseImpalaOperation is also called by CloseOperation just above it, so add a log there too. I made the following change and recompiled:
diff --git a/be/src/service/impala-hs2-server.cc b/be/src/service/impala-hs2-server.cc
index 5c618fb..82a2820 100644
--- a/be/src/service/impala-hs2-server.cc
+++ b/be/src/service/impala-hs2-server.cc
@@ -786,14 +786,17 @@ void ImpalaServer::CloseOperation(TCloseOperationResp& return_val,
TCloseImpalaOperationReq request2;
request2.operationHandle = request.operationHandle;
TCloseImpalaOperationResp tmp_resp;
+ VLOG_QUERY << "0";
CloseImpalaOperation(tmp_resp, request2);
return_val.status = tmp_resp.status;
+ VLOG_QUERY << "5";
}
void ImpalaServer::CloseImpalaOperation(TCloseImpalaOperationResp& return_val,
const TCloseImpalaOperationReq& request) {
TUniqueId query_id;
TUniqueId op_secret;
+ VLOG_QUERY << "1";
HS2_RETURN_IF_ERROR(return_val, THandleIdentifierToTUniqueId(
request.operationHandle.operationId, &query_id, &op_secret),
SQLSTATE_GENERAL_ERROR);
@@ -802,12 +805,14 @@ void ImpalaServer::CloseImpalaOperation(TCloseImpalaOperationResp& return_val,
QueryHandle query_handle;
HS2_RETURN_IF_ERROR(
return_val, GetActiveQueryHandle(query_id, &query_handle), SQLSTATE_GENERAL_ERROR);
+ VLOG_QUERY << "2";
ScopedSessionState session_handle(this);
const TUniqueId session_id = query_handle->session_id();
HS2_RETURN_IF_ERROR(return_val,
session_handle.WithSession(session_id, SecretArg::Operation(op_secret, query_id)),
SQLSTATE_GENERAL_ERROR);
+ VLOG_QUERY << "3";
if (query_handle->stmt_type() == TStmtType::DML) {
Status query_status;
if (query_handle->GetDmlStats(&return_val.dml_result, &query_status)) {
@@ -819,6 +824,7 @@ void ImpalaServer::CloseImpalaOperation(TCloseImpalaOperationResp& return_val,
HS2_RETURN_IF_ERROR(return_val, UnregisterQuery(query_id, true),
SQLSTATE_GENERAL_ERROR);
return_val.status.__set_statusCode(thrift::TStatusCode::SUCCESS_STATUS);
+ VLOG_QUERY << "4";
}
void ImpalaServer::GetResultSetMetadata(TGetResultSetMetadataResp& return_val,
Since only BE code changed, rebuilding impalad with ninja impalad is enough (or make impalad if the build was not configured with ninja). Then restart the cluster and reproduce the problem. All the new logs in ImpalaServer::CloseImpalaOperation showed up, but none of those in ImpalaServer::CloseOperation did. So CloseOperation is never called, and CloseImpalaOperation finishes normally. The logs:
I1106 10:12:25.581028 29674 query-state.cc:464] bc46ff8a74592f6b:9146663900000000] UpdateBackendExecState(): last report for bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.581405 29667 impala-hs2-server.cc:799] 1
I1106 10:12:25.581435 29667 impala-hs2-server.cc:803] CloseOperation(): query_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.581471 29667 impala-hs2-server.cc:808] 2
I1106 10:12:25.581503 29667 impala-hs2-server.cc:815] 3
I1106 10:12:25.581519 29667 impala-server.cc:1320] UnregisterQuery(): query_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.581562 29667 coordinator.cc:672] ExecState: query id=bc46ff8a74592f6b:9146663900000000 execution cancelled
I1106 10:12:25.581620 29667 coordinator-backend-state.cc:921] query_id=bc46ff8a74592f6b:9146663900000000 target backend=127.0.0.1:27000: Sending CancelQueryFInstances rpc
I1106 10:12:25.582509 28909 control-service.cc:213] CancelQueryFInstances(): query_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.582554 28909 query-exec-mgr.cc:126] QueryState: query_id=bc46ff8a74592f6b:9146663900000000 refcnt=3
I1106 10:12:25.582571 28909 query-state.cc:923] Cancel: query_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.582594 28909 krpc-data-stream-mgr.cc:337] cancelling active streams for fragment_instance_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.582669 28908 impala-server.cc:1483] Invalid or unknown query handle: bc46ff8a74592f6b:9146663900000000.
I1106 10:12:25.582744 28908 control-service.cc:173] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): bc46ff8a74592f6b:9146663900000000 remote host=127.0.0.1:43294
I1106 10:12:25.583061 29667 coordinator.cc:932] CancelBackends() query_id=bc46ff8a74592f6b:9146663900000000, tried to cancel 1 backends
I1106 10:12:25.583123 29674 query-state.cc:686] bc46ff8a74592f6b:9146663900000000] Cancelling fragment instances as directed by the coordinator. Returned status: ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): bc46ff8a74592f6b:9146663900000000 remote host=127.0.0.1:43294
I1106 10:12:25.583168 29674 query-state.cc:923] bc46ff8a74592f6b:9146663900000000] Cancel: query_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.583214 29667 coordinator.cc:1284] Release admission control resources for query_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.583902 29667 impala-hs2-server.cc:827] 4
I1106 10:12:25.586117 29513 impala-server.cc:1352] Query successfully unregistered: query_id=bc46ff8a74592f6b:9146663900000000
I1106 10:12:25.586696 29513 query-exec-mgr.cc:213] ReleaseQueryState(): deleted query_id=bc46ff8a74592f6b:9146663900000000
So where could it be stuck? Let's see who calls CloseImpalaOperation inside impalad; the hang might be in the caller. Impala has a GetStackTrace() function, defined in debug-util.h, that prints the call stack at runtime. So I changed log "4" to:
+ VLOG_QUERY << "4" << "\n" << GetStackTrace();
Reproducing the problem again, the stack shows up:
I1106 10:26:58.427156 565 impala-hs2-server.cc:827] 4
@ 0x2580dac impala::ImpalaServer::CloseImpalaOperation()
@ 0x2b5ccba impala::ImpalaHiveServer2ServiceProcessor::process_CloseImpalaOperation()
@ 0x2b5b418 impala::ImpalaHiveServer2ServiceProcessor::dispatchCall()
@ 0x1cd7721 apache::thrift::TDispatchProcessor::process()
@ 0x21ef3fc apache::thrift::server::TAcceptQueueServer::Task::run()
@ 0x21e4810 impala::ThriftThread::RunRunnable()
@ 0x21e5e4c boost::_mfi::mf2<>::operator()()
@ 0x21e5ce0 boost::_bi::list3<>::operator()<>()
@ 0x21e5a26 boost::_bi::bind_t<>::operator()()
@ 0x21e5938 boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x215e37b boost::function0<>::operator()()
@ 0x274cef9 impala::Thread::SuperviseThread()
@ 0x2754e96 boost::_bi::list5<>::operator()<>()
@ 0x2754dba boost::_bi::bind_t<>::operator()()
@ 0x2754d7b boost::detail::thread_data<>::run()
@ 0x3f41e21 thread_proxy
@ 0x7f2e95b706b9 start_thread
@ 0x7f2e9263e4dc clone
I1106 10:26:58.577329 565 query-exec-mgr.cc:213] ReleaseQueryState(): deleted query_id=ea4b5873a9a4bc14:240fca4800000000
Functions like ImpalaHiveServer2ServiceProcessor::process_CloseImpalaOperation are generated by the Thrift compiler (they live in be/generated-sources/gen-cpp/ImpalaHiveServer2Service.cpp), so they are unlikely to be the problem. I also dumped all of impalad's thread stacks with a minidump (see the "使用Minidump替代Core Dump" section of my earlier post《Impala查询卡顿分析案例》, or the official docs on debugging Impala minidumps), but found no related thread. This line of debugging was a dead end.
Time for a change of approach: find which commit introduced the bug and tackle it from there. The place where impala-shell hangs is code newly added by commit fe6e625, namely line 1303 of shell/impala_shell.py, where the UnicodeDecodeError handler calls close_query:
1297 except UnicodeDecodeError as e:
1298 # An error occoured possibly during the fetching.
1299 # Depending of which protocol is at use it can come from different places.
1300 # Possibly occours because we try to display binary data which contains
1301 # undecodable elements.
1302 if self.last_query_handle is not None:
1303 self.imp_client.close_query(self.last_query_handle)
1304 print('UnicodeDecodeError : %s \nPlease check for columns containing binary data '
1305 'to find the possible source of the error.' % (e,), file=sys.stderr)
So the shell first hits a UnicodeDecodeError, and then hangs while trying to close the query. There are two questions to answer: (1) why is a UnicodeDecodeError raised, and (2) why does closing the query hang.
This commit (fe6e625) does add a piece of code that deals with UnicodeDecodeError, but only in the beeswax-protocol path, in shell/impala_client.py:
1016 class ImpalaBeeswaxClient(ImpalaClient):
...
1108 def fetch(self, query_handle):
1109 while True:
1110 result, rpc_status = self._do_beeswax_rpc(
1111 lambda: self.imp_service.fetch(query_handle, False,
1112 self.fetch_size))
1113 if rpc_status != RpcStatus.OK:
1114 raise RPCException()
1115
1116 def split_row_and_decode_if_needed(row):
1117 try:
1118 return row.split('\t')
1119 except UnicodeDecodeError:
1120 return row.decode('utf-8', 'replace').split('\t')
1121
1122 yield [split_row_and_decode_if_needed(row) for row in result.data]
I suddenly realized I had not tried the beeswax protocol yet. In Impala 4.0, impala-shell talks to impalad over HS2 by default; switching back to beeswax requires passing --protocol=beeswax explicitly:
$ bin/impala-shell.sh --protocol=beeswax -q 'select substr("引擎", 1, 4)'
Unknown Exception : 'NoneType' object is not iterable
Traceback (most recent call last):
File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 1255, in _execute_stmt
for rows in rows_fetched:
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1111, in fetch
lambda: self.imp_service.fetch(query_handle, False,
TypeError: 'NoneType' object is not iterable
Could not execute command: select substr("引擎", 1, 4)
With beeswax the shell simply reports an error and exits. The behavior is different, and there is no UTF-8 decode error at all.
However, reverting this commit does not fix the problem. The cause must be some Thrift-related change, since the UTF-8 decode error is thrown from the Thrift layer. Impala 3.4, released back in April, does not have this bug, so I listed all the Thrift-related commits since then and binary-searched them for the offending one.
$ git log --oneline -i --grep=thrift --since='2020-04-01'
981ef10 IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
ee4043e IMPALA-10168: Expose JSON catalog objects in catalogd's debug page
6542b60 IMPALA-10243: ConcurrentModificationException during parallel INSERTs
63c435c IMPALA-9232 Potential overflow in serializeThriftMsg
6bb3b88 IMPALA-9180 (part 1): Remove legacy ImpalaInternalService
28181cb IMPALA-9930 (part 1): Initial refactor for admission control service
79b17db IMPALA-9382: part 1: transposed profile prototype
13f50ea IMPALA-9229: impala-shell 'profile' to show original and retried queries
fe6e625 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
75146c9 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
074731e IMPALA-3127: Support incremental metadata updates in partition level
ee18dc3 IMPALA-9987: Improve logging around HTTP connections
8aeb282 IMPALA-9540 Test that Impala Shell no longer sends duplicate "Host" headers in http mode.
b67c090 IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf
8b09343 IMPALA-9719: Upgrade sasl-0.1.1 -> 0.2.1
e46bbf2 IMPALA-9692 (part 1): Refactor TBackendDescriptor to protobuf
bc9d7e0 IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
bd4458b IMPALA-9422: Re-visit and improve join node and builder's counters
2451846 IMPALA-9543 Reduce duplicate code in thrift CMakeLists.txt
That eventually pinpointed the commit where the problem starts:
bc9d7e0 IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
Details are on GitHub: https://github.com/apache/impala/commit/bc9d7e0 or Gerrit: http://gerrit.cloudera.org:8080/15524
But this commit is huge and does several things at once: it adds an impala-shell option for specifying an arbitrary virtualenv, sets PYTHONIOENCODING=utf8, upgrades the Thrift version impala-shell depends on to 0.11.0, and more. Too much time has passed to revert it cleanly. I suspected PYTHONIOENCODING being set to utf8, but changing it to ascii did not help. Fully understanding this big commit would be a lot of work, so this path was hard to pursue.
Back to the original idea: find out where impalad is stuck and why the RPC result never comes back. A colleague suggested comparing the good query and the bad query with tcpdump, and that turned out to be a great idea. I had already compared the impalad logs of a good and a bad query and they looked the same, but comparing the TCP packets is a deeper debugging technique: if the application layer reveals nothing, go down one layer and look at TCP.
In my dev environment impala-shell connects to impalad via localhost, so the interface to listen on is lo (local loopback). impala-shell on the current master branch uses HS2 by default, so it connects to impalad's port 21050. Since my impala-shell is the only client connected to this impalad, capturing TCP packets on port 21050 captures exactly this shell's traffic. In a production environment you would capture only the port used by your impala-shell, to separate it from other applications talking to impalad.
The command I used is
sudo tcpdump -i lo 'port 21050'
Start impala-shell, run a normal statement, select substr("引擎", 1, 3), and then a bad one, select substr("引擎", 1, 4). The bad statement clearly produces fewer packets.
The normal statement produces 26 TCP packets:
11:04:52.169568 IP localhost.40938 > localhost.21050: Flags [P.], seq 2344715580:2344715671, ack 246892403, win 512, options [nop,nop,TS val 3249330901 ecr 3249311189], length 91
11:04:52.169913 IP localhost.21050 > localhost.40938: Flags [P.], seq 1:181, ack 91, win 512, options [nop,nop,TS val 3249330902 ecr 3249330901], length 180
11:04:52.169944 IP localhost.40938 > localhost.21050: Flags [.], ack 181, win 511, options [nop,nop,TS val 3249330902 ecr 3249330902], length 0
11:04:52.170831 IP localhost.40938 > localhost.21050: Flags [P.], seq 91:292, ack 181, win 512, options [nop,nop,TS val 3249330902 ecr 3249330902], length 201
11:04:52.187515 IP localhost.21050 > localhost.40938: Flags [P.], seq 181:290, ack 292, win 512, options [nop,nop,TS val 3249330919 ecr 3249330902], length 109
11:04:52.187852 IP localhost.40938 > localhost.21050: Flags [P.], seq 292:394, ack 290, win 512, options [nop,nop,TS val 3249330920 ecr 3249330919], length 102
11:04:52.188237 IP localhost.21050 > localhost.40938: Flags [P.], seq 290:411, ack 394, win 512, options [nop,nop,TS val 3249330920 ecr 3249330920], length 121
11:04:52.189283 IP localhost.40938 > localhost.21050: Flags [P.], seq 394:494, ack 411, win 512, options [nop,nop,TS val 3249330921 ecr 3249330920], length 100
11:04:52.189459 IP localhost.21050 > localhost.40938: Flags [P.], seq 411:464, ack 494, win 512, options [nop,nop,TS val 3249330921 ecr 3249330921], length 53
11:04:52.231963 IP localhost.40938 > localhost.21050: Flags [.], ack 464, win 512, options [nop,nop,TS val 3249330964 ecr 3249330921], length 0
11:04:52.290038 IP localhost.40938 > localhost.21050: Flags [P.], seq 494:594, ack 464, win 512, options [nop,nop,TS val 3249331022 ecr 3249330921], length 100
11:04:52.290276 IP localhost.21050 > localhost.40938: Flags [P.], seq 464:517, ack 594, win 512, options [nop,nop,TS val 3249331022 ecr 3249331022], length 53
11:04:52.290348 IP localhost.40938 > localhost.21050: Flags [.], ack 517, win 512, options [nop,nop,TS val 3249331022 ecr 3249331022], length 0
11:04:52.290815 IP localhost.40938 > localhost.21050: Flags [P.], seq 594:706, ack 517, win 512, options [nop,nop,TS val 3249331022 ecr 3249331022], length 112
11:04:52.291191 IP localhost.21050 > localhost.40938: Flags [P.], seq 517:620, ack 706, win 512, options [nop,nop,TS val 3249331023 ecr 3249331022], length 103
11:04:52.292094 IP localhost.40938 > localhost.21050: Flags [P.], seq 706:818, ack 620, win 512, options [nop,nop,TS val 3249331024 ecr 3249331023], length 112
11:04:52.294791 IP localhost.21050 > localhost.40938: Flags [P.], seq 620:715, ack 818, win 512, options [nop,nop,TS val 3249331026 ecr 3249331024], length 95
11:04:52.295077 IP localhost.40938 > localhost.21050: Flags [P.], seq 818:906, ack 715, win 512, options [nop,nop,TS val 3249331027 ecr 3249331026], length 88
11:04:52.295349 IP localhost.21050 > localhost.40938: Flags [P.], seq 715:823, ack 906, win 512, options [nop,nop,TS val 3249331027 ecr 3249331027], length 108
11:04:52.295624 IP localhost.40938 > localhost.21050: Flags [P.], seq 906:1008, ack 823, win 512, options [nop,nop,TS val 3249331027 ecr 3249331027], length 102
11:04:52.335924 IP localhost.21050 > localhost.40938: Flags [.], ack 1008, win 512, options [nop,nop,TS val 3249331068 ecr 3249331027], length 0
11:04:52.458929 IP localhost.21050 > localhost.40938: Flags [P.], seq 823:871, ack 1008, win 512, options [nop,nop,TS val 3249331191 ecr 3249331027], length 48
11:04:52.459074 IP localhost.40938 > localhost.21050: Flags [P.], seq 1008:1172, ack 871, win 512, options [nop,nop,TS val 3249331191 ecr 3249331191], length 164
11:04:52.459080 IP localhost.21050 > localhost.40938: Flags [.], ack 1172, win 511, options [nop,nop,TS val 3249331191 ecr 3249331191], length 0
11:04:52.460225 IP localhost.21050 > localhost.40938: Flags [P.], seq 871:12312, ack 1172, win 512, options [nop,nop,TS val 3249331192 ecr 3249331191], length 11441
11:04:52.503980 IP localhost.40938 > localhost.21050: Flags [.], ack 12312, win 512, options [nop,nop,TS val 3249331236 ecr 3249331192], length 0
The bad statement produces only 19:
11:05:00.805456 IP localhost.40938 > localhost.21050: Flags [P.], seq 1172:1263, ack 12312, win 512, options [nop,nop,TS val 3249339537 ecr 3249331192], length 91
11:05:00.805554 IP localhost.21050 > localhost.40938: Flags [P.], seq 12312:12492, ack 1263, win 512, options [nop,nop,TS val 3249339537 ecr 3249339537], length 180
11:05:00.805561 IP localhost.40938 > localhost.21050: Flags [.], ack 12492, win 511, options [nop,nop,TS val 3249339537 ecr 3249339537], length 0
11:05:00.805751 IP localhost.40938 > localhost.21050: Flags [P.], seq 1263:1464, ack 12492, win 512, options [nop,nop,TS val 3249339537 ecr 3249339537], length 201
11:05:00.810144 IP localhost.21050 > localhost.40938: Flags [P.], seq 12492:12601, ack 1464, win 512, options [nop,nop,TS val 3249339542 ecr 3249339537], length 109
11:05:00.810290 IP localhost.40938 > localhost.21050: Flags [P.], seq 1464:1566, ack 12601, win 512, options [nop,nop,TS val 3249339542 ecr 3249339542], length 102
11:05:00.810415 IP localhost.21050 > localhost.40938: Flags [P.], seq 12601:12722, ack 1566, win 512, options [nop,nop,TS val 3249339542 ecr 3249339542], length 121
11:05:00.810576 IP localhost.40938 > localhost.21050: Flags [P.], seq 1566:1666, ack 12722, win 512, options [nop,nop,TS val 3249339542 ecr 3249339542], length 100
11:05:00.810650 IP localhost.21050 > localhost.40938: Flags [P.], seq 12722:12775, ack 1666, win 512, options [nop,nop,TS val 3249339542 ecr 3249339542], length 53
11:05:00.851921 IP localhost.40938 > localhost.21050: Flags [.], ack 12775, win 512, options [nop,nop,TS val 3249339583 ecr 3249339542], length 0
11:05:00.911128 IP localhost.40938 > localhost.21050: Flags [P.], seq 1666:1766, ack 12775, win 512, options [nop,nop,TS val 3249339643 ecr 3249339542], length 100
11:05:00.911375 IP localhost.21050 > localhost.40938: Flags [P.], seq 12775:12828, ack 1766, win 512, options [nop,nop,TS val 3249339643 ecr 3249339643], length 53
11:05:00.911468 IP localhost.40938 > localhost.21050: Flags [.], ack 12828, win 512, options [nop,nop,TS val 3249339643 ecr 3249339643], length 0
11:05:00.911948 IP localhost.40938 > localhost.21050: Flags [P.], seq 1766:1878, ack 12828, win 512, options [nop,nop,TS val 3249339644 ecr 3249339643], length 112
11:05:00.912416 IP localhost.21050 > localhost.40938: Flags [P.], seq 12828:12932, ack 1878, win 512, options [nop,nop,TS val 3249339644 ecr 3249339644], length 104
11:05:00.912848 IP localhost.40938 > localhost.21050: Flags [P.], seq 1878:1980, ack 12932, win 512, options [nop,nop,TS val 3249339644 ecr 3249339644], length 102
11:05:00.959907 IP localhost.21050 > localhost.40938: Flags [.], ack 1980, win 512, options [nop,nop,TS val 3249339691 ecr 3249339644], length 0
11:05:01.097326 IP localhost.21050 > localhost.40938: Flags [P.], seq 12932:12980, ack 1980, win 512, options [nop,nop,TS val 3249339829 ecr 3249339644], length 48
11:05:01.139967 IP localhost.40938 > localhost.21050: Flags [.], ack 12980, win 512, options [nop,nop,TS val 3249339871 ecr 3249339829], length 0
Here, at last, is a difference! Next, reproduce once more and dump the packets of both queries to files. Open impala-shell in one terminal and in another run
sudo tcpdump -i lo 'port 21050' -w good_query_tcp_dump.txt
which writes the packet contents to the file good_query_tcp_dump.txt. Run the normal query select substr("引擎", 1, 3) in impala-shell, then Ctrl+C the tcpdump process. Then run
sudo tcpdump -i lo 'port 21050' -w bad_query_tcp_dump.txt
and this time run the bad query select substr("引擎", 1, 4) in impala-shell, then Ctrl+C tcpdump again. Now we have the TCP packets of both queries. Opening the dumps in vim, you can see readable text inside: FetchResults, GetLog and CloseImpalaOperation all appear in pairs, one request and one response. In the normal query's dump, CloseImpalaOperation is followed by fetching the profile; in the bad query's dump, CloseImpalaOperation is followed by nothing but a single small, unidentified packet.
From this we can already tell that impalad did return the CloseImpalaOperation response! It looks like impala-shell received that response, then sent out a very short packet and is now waiting for a reply. Time to look at what the Thrift-layer code on the impala-shell side is doing.
We already dumped the impala-shell stack with pystack; the bottom frames are Thrift code:
File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/ImpalaService/ImpalaHiveServer2Service.py", line 155, in CloseImpalaOperation
return self.recv_CloseImpalaOperation()
File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/ImpalaService/ImpalaHiveServer2Service.py", line 167, in recv_CloseImpalaOperation
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 148, in readMessageBegin
name = self.trans.readAll(sz)
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 162, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p2/python/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 117, in read
buff = self.handle.recv(sz)
Conveniently, every frame gives a file name and line number, so the code is easy to find. Going from top to bottom, recv_CloseImpalaOperation() is stuck on its second line, the call to iprot.readMessageBegin():
def recv_CloseImpalaOperation(self):
iprot = self._iprot
(fname, mtype, rseqid) = iprot.readMessageBegin()
if mtype == TMessageType.EXCEPTION:
x = TApplicationException()
x.read(iprot)
iprot.readMessageEnd()
raise x
result = CloseImpalaOperation_result()
result.read(iprot)
iprot.readMessageEnd()
if result.success is not None:
return result.success
raise TApplicationException(TApplicationException.MISSING_RESULT, "CloseImpalaOperation failed: unknown result")
readMessageBegin() is defined in thrift/protocol/TBinaryProtocol.py, which is Thrift's own code, whereas recv_CloseImpalaOperation above lives in ImpalaService/ImpalaHiveServer2Service.py, Python code generated by the Thrift compiler.
133 def readMessageBegin(self):
134 sz = self.readI32()
135 if sz < 0:
136 version = sz & TBinaryProtocol.VERSION_MASK
137 if version != TBinaryProtocol.VERSION_1:
138 raise TProtocolException(
139 type=TProtocolException.BAD_VERSION,
140 message='Bad version in readMessageBegin: %d' % (sz))
141 type = sz & TBinaryProtocol.TYPE_MASK
142 name = self.readString()
143 seqid = self.readI32()
144 else:
145 if self.strictRead:
146 raise TProtocolException(type=TProtocolException.BAD_VERSION,
147 message='No protocol version header')
148 name = self.trans.readAll(sz) # <-------- stuck on this line
149 type = self.readByte()
150 seqid = self.readI32()
151 return (name, type, seqid)
As the stack showed, it is stuck on the readAll() call at line 148. The body of this function is an if statement, and the sign of the int32 read on the first line decides which branch is taken. I patched the code to print sz and compared the good and bad queries: the good query always reads a negative sz, while the bad query reads a positive sz, enters the else branch, and then hangs. So the Thrift code in impala-shell has most likely read a bogus sz value and is now waiting for data that will never arrive, and that misread sz is probably a leftover byte of the UTF-8 sequence that failed to decode earlier.
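Here is a minimal sketch of that framing mix-up (the leftover byte values are made up for illustration; they are not the exact bytes captured in this bug). TBinaryProtocol decides which branch to take purely from the sign of the first int32 it reads:
import struct

def peek_sz(buf):
    # Interpret the first 4 bytes the way readMessageBegin() does: a big-endian i32.
    return struct.unpack('!i', buf[:4])[0]

# A well-formed strict-protocol header: VERSION_1 (0x80010000) | message type.
good_header = b'\x80\x01\x00\x02'
print(peek_sz(good_header))   # -2147418110: negative, so the version branch is taken

# Hypothetical leftover bytes from a half-consumed previous response:
leftover = b'\x00\x00\x30\x39'
print(peek_sz(leftover))      # 12345: positive, so the else branch calls
                              # readAll(12345) and blocks for bytes that never come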
With this hypothesis in hand, I went back to investigate why the UTF-8 decode fails in the first place. After studying some Thrift code, the different str/unicode behavior of Python 2 and Python 3, and the recommended way to handle UTF-8 (the "UTF-8 sandwich": bytes outside the system, UTF-8 text inside, with encoding and decoding only at the boundaries), I finally found a bug in Thrift's fast_decode logic: THRIFT-5303. In short, when deserializing a byte stream into Thrift objects, Thrift has a fast_decode path that calls a C++ library to speed up decoding. That C++ code calls Python's decode function to decode UTF-8 strings, but does not specify an error handler, so a UnicodeDecodeError is thrown straight out of it.
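The effect of the missing error handler is easy to reproduce in plain Python (this is just the symptom in miniature, not the actual Thrift C++ extension):
broken = b'\xe5\xbc'    # the first 2 bytes of '引': an incomplete UTF-8 sequence
try:
    broken.decode('utf-8')              # errors='strict' is the default
except UnicodeDecodeError as e:
    print('strict :', e)                # the error escapes to the caller, as in THRIFT-5303
print('replace:', broken.decode('utf-8', 'replace'))  # replacement character instead of
                                                      # an exception -- the old mojibake behavior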
Fixing that also fixes the impala-shell problem. In fact, thrift-0.11.0 already follows the "UTF-8 sandwich" convention, so decoding Thrift objects on the client side should never raise UnicodeDecodeError: inside the Thrift layer, strings are supposed to be well-formed UTF-8 text already. The catch is that impalad still uses thrift-0.9.3, where strings inside Thrift are still raw byte arrays and nothing UTF-8-related is done, which trips up the newer Thrift client. That said, this is a client-side bug, and fixing it there is enough.
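For reference, the "UTF-8 sandwich" convention in miniature (an illustration of the pattern only, not Impala or Thrift code): decode once at the input boundary, work with text in between, encode once at the output boundary.
def read_from_wire(raw):
    # Boundary in: bytes -> text, decoded exactly once.
    return raw.decode('utf-8', 'replace')

def write_to_wire(text):
    # Boundary out: text -> bytes, encoded exactly once.
    return text.encode('utf-8')

inside = read_from_wire(b'\xe5\xbc\x95\xe6\x93\x8e')  # work with str internally
print(inside)                  # 引擎
print(write_to_wire(inside))   # b'\xe5\xbc\x95\xe6\x93\x8e'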
The debugging was tortuous mainly because I started in the wrong direction, doggedly looking for where impalad was stuck: I could not confirm whether it had actually returned the RPC, and at first it was hard to suspect the Thrift layer. Once the investigation stalled, tcpdump broke it open: it proved that impalad had already returned the RPC and that impala-shell was simply waiting for more input than it would ever get. Only then did the investigation get on track, looking into why the Thrift layer threw a UTF-8 decode error, which led to the fix. The biggest lesson: systems are layered; when a problem is opaque at one layer, go down a layer and it may become clear.
Finally, here are some useful takeaways: