NCCL Source Code Analysis: Series Index
I try to describe what each function does right before it appears. My suggestion: do not dive into the function bodies at first; understand what each function is for, build the overall picture, and then come back to the details.
Convention used in these notes: to make them easy to follow, call relationships are shown by indentation, or sometimes by expanding a function inline, depending on the situation.
For example:
// call proxyConnInit
NCCLCHECK(proxyConnInit(peer, connectionPool, proxyState, (ncclProxyInitReq*) op->reqBuff, (ncclProxyInitResp*) op->respBuff, &op->connection));
// proxyConnInit expanded inline, which makes its parameters easy to see
static ncclResult_t proxyConnInit(struct ncclProxyLocalPeer* peer, struct ncclProxyConnectionPool* connectionPool, struct ncclProxyState* proxyState, ncclProxyInitReq* req, ncclProxyInitResp* resp, struct ncclProxyConnection** connection)
If anything is wrong, please leave a comment and point it out.
Diagrams will be added later;
some topics are not covered yet and will be filled in later;
side remarks will be added later as well.
Each GPU has its own management thread (or process). When communication is set up between cards, an extra proxy thread is created to do that work. The proxy thread is passive: what it should do is dispatched to it over TCP by the GPU's management thread.
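To make the "passive thread driven by commands over a socket" pattern concrete, here is a minimal, self-contained sketch (plain pthreads plus a socketpair standing in for the TCP connection; this is not NCCL code, and the command names are made up):
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

enum cmd { CMD_PING = 1, CMD_STOP = 2 };          // hypothetical command set

// The "proxy": passive, it only reacts to commands read from its socket.
static void* service(void* arg) {
  int fd = *(int*)arg;
  int c;
  while (read(fd, &c, sizeof(c)) == (ssize_t)sizeof(c)) {
    if (c == CMD_PING) printf("proxy: ping\n");
    else if (c == CMD_STOP) break;                // the management thread asked us to stop
  }
  return NULL;
}

int main(void) {
  int sv[2];
  socketpair(AF_UNIX, SOCK_STREAM, 0, sv);        // stands in for the proxy's TCP socket
  pthread_t t;
  pthread_create(&t, NULL, service, &sv[1]);      // one service thread, like NCCL's proxy thread
  int c = CMD_PING; write(sv[0], &c, sizeof(c));  // the management thread drives the proxy
  c = CMD_STOP;     write(sv[0], &c, sizeof(c));
  pthread_join(t, NULL);
  return 0;
}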
The proxy thread's main tasks are walked through step by step below.
Focus first on the initialization of comm->proxyState; it will later be passed to the proxy thread as its argument, so it is also fine to come back to it when it is used.
// Initialization
commAlloc()
    NCCLCHECK(ncclCalloc(&sharedRes, 1));
bootstrapInit()
    // proxy is aborted through a message; don't set abortFlag
    // Allocate the socket object
    NCCLCHECK(ncclCalloc(&proxySocket, 1));
    // Create the socket -> proxySocket
    NCCLCHECK(ncclSocketInit(proxySocket, &bootstrapNetIfAddr, comm->magic, ncclSocketTypeProxy, comm->abortFlag));
    // Put it into the listening state
    NCCLCHECK(ncclSocketListen(proxySocket));
    // Get its address (IP + port) and store it at state->peerProxyAddresses + rank
    NCCLCHECK(ncclSocketGetAddr(proxySocket, state->peerProxyAddresses+rank));
    struct bootstrapState* state;
    comm->bootstrap = state;
    // All-gather across all ranks; state->peerProxyAddresses then holds every rank's address
    NCCLCHECK(bootstrapAllGather(state, state->peerProxyAddresses, sizeof(union ncclSocketAddress)));
    // Allocate and initialize comm->proxyState
    NCCLCHECK(ncclProxyInit(comm, proxySocket, state->peerProxyAddresses));
        NCCLCHECK(ncclCalloc(&comm->sharedRes->proxyState, 1));
        comm->proxyState = comm->sharedRes->proxyState;
        comm->proxyState->refCount = 1;
        comm->proxyState->listenSock = proxySocket;
        comm->proxyState->peerAddresses = state->peerProxyAddresses;
The proxyState fields are initialized mainly in ncclProxyCreate(), which is also where NCCL initialization creates the ncclProxyService thread.
ncclProxyCreate(comm)
{
// proxyState comes from comm->proxyState
struct ncclProxyState* proxyState = comm->proxyState;
// Field initialization; what each field is for will be explained when it is used
proxyState->tpRank = comm->rank;
proxyState->tpnRanks = comm->nRanks;
proxyState->tpLocalnRanks = comm->localRanks;
proxyState->cudaDev = comm->cudaDev;
proxyState->abortFlag = comm->abortFlag;
proxyState->p2pnChannels = comm->p2pnChannels;
proxyState->p2pChunkSize = comm->p2pChunkSize;
proxyState->nChannels = comm->nChannels;
proxyState->allocP2pNetLLBuffers = comm->allocP2pNetLLBuffers;
proxyState->dmaBufSupport = comm->dmaBufSupport;
proxyState->ncclNet = comm->ncclNet;
proxyState->ncclCollNet = comm->ncclCollNet;
memcpy(proxyState->buffSizes, comm->buffSizes, sizeof(comm->buffSizes));
// Create the proxy service thread
pthread_create(&comm->proxyState->thread, NULL, ncclProxyService, comm->proxyState);
}
The proxy service thread code. One proxy thread is started per device; the thread is named NCCL Service %rank.
Inside its main loop the thread does three things: it accepts new connections from local ranks on the listening socket, makes progress on each peer's pending async operations, and receives new commands and acts on them according to their type. The type values are defined as follows:
enum ncclProxyMsgType {
  ncclProxyMsgInit = 1,       // establish the TCP connection
  ncclProxyMsgSharedInit = 2, // proxy thread calls the transport's (ncclTransportComm) proxySharedInit
  ncclProxyMsgSetup = 3,      // proxy thread calls the transport's proxySetup
  ncclProxyMsgConnect = 4,    // proxy thread calls the transport's proxyConnect
  ncclProxyMsgStart = 5,      // not used yet
  ncclProxyMsgClose = 6,      // close the TCP connection
  ncclProxyMsgAbort = 7,      // not used yet
  ncclProxyMsgStop = 8,       // stop this connection; the proxy thread exits only once every connection has been stopped
  ncclProxyMsgConvertFd = 9,  // cuMem API support (UDS)
};
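The log messages in the code below index into ncclProxyMsgTypeStr[type]. That table is not shown in this post; reconstructed from the enum above it would look roughly like this (my reconstruction, check proxy.cc of your NCCL version for the exact strings):
// Reconstruction for readability only; index 0 is unused because the enum starts at 1.
static const char* ncclProxyMsgTypeStr[] = {
  "Unknown", "Init", "SharedInit", "Setup", "Connect",
  "Start", "Close", "Abort", "Stop", "ConvertFd"
};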
The main handler inside the thread is proxyServiceInitOp().
// Thread argument:
args = comm->proxyState
void* ncclProxyService(void* _args) {
struct ncclProxyState* proxyState = (struct ncclProxyState*) _args;
// Prepare poll descriptor
struct ncclProxyConnectionPool connectionPool;
connectionPool.pools = NULL;
connectionPool.banks = 0;
connectionPool.offset = NCCL_PROXY_CONN_POOL_SIZE;
struct pollfd pollfds[NCCL_MAX_LOCAL_RANKS+1];
struct ncclProxyLocalPeer peers[NCCL_MAX_LOCAL_RANKS];
memset(&peers, 0, sizeof(struct ncclProxyLocalPeer)*NCCL_MAX_LOCAL_RANKS);
for (int s=0; s<NCCL_MAX_LOCAL_RANKS; s++) {
pollfds[s].fd = -1;
pollfds[s].events = POLLHUP|POLLIN;
}
if (ncclSocketGetFd(proxyState->listenSock, &pollfds[NCCL_MAX_LOCAL_RANKS].fd) != ncclSuccess) {
WARN("[Proxy Service] Get listenSock fd fails");
return NULL;
};
// Watch the listening socket for input (new connections)
pollfds[NCCL_MAX_LOCAL_RANKS].events = POLLIN;
int maxnpeers = 0;
int npeers = 0;
int stop = 0;
int asyncOpCount = 0;
while (stop == 0 || (stop == 1 && npeers > 0)) {
/* Even if local comm aborts, we cannot let proxy thread exit if we still have peer
* connections. Need to wait until all other related comms call abort and safely exit
* together, or we could face segmentation fault. */
// A local abort only sets stop = 1; the thread cannot exit until the other comms have stopped too
if (*proxyState->abortFlag != 0) stop = 1;
/* never let proxy service thread blocks in poll, or it cannot receive abortFlag. */
int ret;
do {
ret = poll(pollfds, NCCL_MAX_LOCAL_RANKS+1, asyncOpCount ? 0 : 500);
} while (ret < 0 && errno == EINTR);
if (ret < 0) {
WARN("[Proxy Service] Poll failed: %s", strerror(errno));
return NULL;
}
if (pollfds[NCCL_MAX_LOCAL_RANKS].revents) {
int s = 0;
while (s < NCCL_MAX_LOCAL_RANKS && pollfds[s].fd >= 0) s++;
if (s == NCCL_MAX_LOCAL_RANKS) {
WARN("[Proxy service] Too many connections (%d max)", NCCL_MAX_LOCAL_RANKS);
return NULL;
}
if (maxnpeers < s+1) maxnpeers = s+1;
// Initialize the socket for the new peer
if (ncclSocketInit(&peers[s].sock) != ncclSuccess) {
WARN("[Service thread] Initialize peers[%d].sock fails", s);
return NULL;
}
// accept
if (ncclSocketAccept(&peers[s].sock, proxyState->listenSock) != ncclSuccess) {
WARN("[Service thread] Accept failed %s", strerror(errno));
} else {
// Add the accepted fd to pollfds so it gets polled
if (ncclSocketGetFd(&peers[s].sock, &pollfds[s].fd) != ncclSuccess) {
WARN("[Service thread] Get peers[%d].sock fd fails", s);
return NULL;
}
npeers++;
peers[s].tpLocalRank = -1;
}
}
for (int s=0; s<maxnpeers; s++) {
struct ncclProxyLocalPeer* peer = peers+s;
struct ncclSocket* sock = &peer->sock;
int closeConn = 0;
int type = 0;
ncclResult_t res = ncclSuccess;
if (pollfds[s].fd == -1)
continue;
// Progress all ops for this ncclProxyLocalPeer
ncclProxyAsyncOp* op = peer->asyncOps;
while (op != nullptr) {
ncclProxyAsyncOp* opnext = op->next; /* in case op is freed in proxyProgressAsync */
type = op->type;
res = proxyProgressAsync(op, proxyState, &asyncOpCount, peer, &connectionPool);
if (res == ncclSuccess || res == ncclInProgress) {
op = opnext;
} else {
// Res is a bad result
closeConn = 1;
WARN("[Service thread] Error encountered progressing operation=%s, res=%d, closing connection", ncclProxyMsgTypeStr[type], res);
break;
}
}
// Check for additional ops coming in on this peer's socket
if (pollfds[s].revents & POLLIN) {
int closed;
// Receive the message type first
res = ncclSocketTryRecv(sock, &type, sizeof(int), &closed, false /*blocking*/);
if (res != ncclSuccess && res != ncclInProgress) {
WARN("[Service thread] Could not receive type from localRank %d, res=%u, closed=%d", peer->tpLocalRank, res, closed);
closeConn = 1;
} else if (closed) {
INFO(NCCL_INIT|NCCL_NET|NCCL_PROXY, "[Service thread] Connection closed by localRank %d", peer->tpLocalRank);
closeConn = 1;
} else if (res == ncclSuccess) { // We received something from the sock
// We received data; act according to type
if (type == ncclProxyMsgStop) {
// Stop: close this connection and begin shutting down
stop = 1;
closeConn = 1;
} else if (type == ncclProxyMsgClose) {
// Close this connection
closeConn = 1;
} else if (proxyMatchOpType(type)) {
// Handle the client's (i.e. the rank's) request; dispatch on type
res = proxyServiceInitOp(type, peers+s, &connectionPool, proxyState, &asyncOpCount);
} else {
// Unknown command: close the connection
WARN("[Service thread] Unknown command %d from localRank %d", type, peer->tpLocalRank);
closeConn = 1;
}
INFO(NCCL_PROXY, "Received and initiated operation=%s res=%d", ncclProxyMsgTypeStr[type], res);
}
} else if (pollfds[s].revents & POLLHUP) {
// Peer hung up: close the connection
closeConn = 1;
}
if (res != ncclSuccess && res != ncclInProgress) {
// An error occurred: close the connection
WARN("[Proxy Service %d] Failed to execute operation %s from rank %d, retcode %d", proxyState->tpRank, ncclProxyMsgTypeStr[type], peer->tpRank, res);
closeConn = 1;
}
if (closeConn) {
// Close the connection and clean up
ncclSocketClose(sock);
if (op != nullptr) {
asyncProxyOpDequeue(peer, op);
asyncOpCount--;
}
pollfds[s].fd = -1;
npeers--;
}
}
}
// Shutdown path
// Wait for all operations to complete and stop progress thread before freeing any resource
if (ncclProxyProgressDestroy(proxyState) != ncclSuccess) {
WARN("[Proxy Service] proxyDestroy failed");
}
for (int s=0; s<maxnpeers; s++) {
ncclSocketClose(&peers[s].sock);
}
ncclProxyFreeConnections(&connectionPool, proxyState);
ncclSocketClose(proxyState->listenSock);
free(proxyState->listenSock);
proxyOpsFree(proxyState);
return NULL;
}
The main handler in the service thread. The client sends its fields in a fixed order, so the server receives them in the same fixed order and then calls proxyProgressAsync to do the actual work (a client-side sketch of this send order follows the function below).
// proxyState belongs to the local rank
// peers is server-side data describing the clients
// peer is the abstraction of one client
res = proxyServiceInitOp(type, peers+s, &connectionPool, proxyState, &asyncOpCount);
static ncclResult_t proxyServiceInitOp(int type, struct ncclProxyLocalPeer* peer, struct ncclProxyConnectionPool* connectionPool, struct ncclProxyState* proxyState, int* asyncOpCount) {
// Server-side socket for this peer
struct ncclSocket* sock = &peer->sock;
// Allocate the async op
struct ncclProxyAsyncOp* asyncOp;
NCCLCHECK(ncclCalloc(&asyncOp, 1));
asyncOp->type = type;
// Receive the fields in the same order the client sent them
// Receive connection: the pointer identifying the server-side connection object this request targets
NCCLCHECK(ncclSocketRecv(sock, &asyncOp->connection, sizeof(void*)));
// Receive the request size
NCCLCHECK(ncclSocketRecv(sock, &asyncOp->reqSize, sizeof(int)));
// Receive the expected response size
NCCLCHECK(ncclSocketRecv(sock, &asyncOp->respSize, sizeof(int)));
if (asyncOp->reqSize) {
// If the request size is > 0, the client also sends request data,
// so allocate a buffer first and then receive it
NCCLCHECK(ncclCalloc(&asyncOp->reqBuff, asyncOp->reqSize));
NCCLCHECK(ncclSocketRecv(sock, asyncOp->reqBuff, asyncOp->reqSize));
}
// Store opId for completion response
// Receive the client's opId (a pointer value used purely as an identifier)
NCCLCHECK(ncclSocketRecv(sock, &asyncOp->opId, sizeof(asyncOp->opId)));
// If the client expects a response (respSize > 0), allocate the response buffer on the server side
if (asyncOp->respSize)
NCCLCHECK(ncclCalloc(&asyncOp->respBuff, asyncOp->respSize));
// Enqueue asyncOp into this peer's list, peer->asyncOps
asyncProxyOpEnqueue(peer, asyncOp);
(*asyncOpCount)++;
// Process the request
NCCLCHECK(proxyProgressAsync(asyncOp, proxyState, asyncOpCount, peer, connectionPool));
return ncclSuccess;
}
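For reference, the client side sends these fields in exactly this order; the sketch below mirrors what ncclProxyCallAsync does, simplified to blocking ncclSocketSend calls and given a hypothetical name (it is not the real non-blocking implementation):
// Sketch of the client-side send order that proxyServiceInitOp expects.
static ncclResult_t proxyRequestSendSketch(struct ncclSocket* sock, int type,
    void* connection, void* reqBuff, int reqSize, int respSize, void* opId) {
  NCCLCHECK(ncclSocketSend(sock, &type, sizeof(int)));            // 1. message type
  NCCLCHECK(ncclSocketSend(sock, &connection, sizeof(void*)));    // 2. server-side connection pointer
  NCCLCHECK(ncclSocketSend(sock, &reqSize, sizeof(int)));         // 3. request size
  NCCLCHECK(ncclSocketSend(sock, &respSize, sizeof(int)));        // 4. expected response size
  if (reqSize) NCCLCHECK(ncclSocketSend(sock, reqBuff, reqSize)); // 5. request payload, if any
  NCCLCHECK(ncclSocketSend(sock, &opId, sizeof(void*)));          // 6. opId used to match the reply
  return ncclSuccess;
}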
The request handler: it branches on the type argument and, once the operation is done, sends the response back in a fixed order.
NCCLCHECK(proxyProgressAsync(asyncOp, proxyState, asyncOpCount, peer, connectionPool));
static ncclResult_t proxyProgressAsync(struct ncclProxyAsyncOp* op, struct ncclProxyState* proxyState, int* asyncOpCount, struct ncclProxyLocalPeer* peer, struct ncclProxyConnectionPool* connectionPool) {
int done = 1;
if (op->type == ncclProxyMsgSetup) {
// Call the transport's proxySetup handler
TRACE(NCCL_PROXY, "proxyProgressAsync::proxySetup() opId=%p", op->opId);
NCCLCHECK(op->connection->tcomm->proxySetup(op->connection, proxyState, op->reqBuff, op->reqSize, op->respBuff, op->respSize, &done));
} else if (op->type == ncclProxyMsgConnect) {
// Call the transport's proxyConnect handler
TRACE(NCCL_PROXY, "proxyProgressAsync::proxyConnect() opId=%p op.reqBuff=%p", op->opId, op->reqBuff);
NCCLCHECK(op->connection->tcomm->proxyConnect(op->connection, proxyState, op->reqBuff, op->reqSize, op->respBuff, op->respSize, &done));
} else if (op->type == ncclProxyMsgSharedInit) {
int nChannels = (int) *op->reqBuff;
// Call the transport's proxySharedInit handler
TRACE(NCCL_PROXY, "proxyProgressAsync::ncclProxyMsgSharedInit opId=%p op.reqBuff=%p nChannels=%d", op->opId, op->reqBuff, nChannels);
if (op->connection->tcomm->proxySharedInit) NCCLCHECK(op->connection->tcomm->proxySharedInit(op->connection, proxyState, nChannels));
__atomic_store_n(&op->connection->state, connSharedInitialized, __ATOMIC_RELEASE);
} else if (op->type == ncclProxyMsgConvertFd) {
int fd = *(int *)op->reqBuff;
TRACE(NCCL_PROXY, "proxyProgressAsync::ncclProxyMsgConvertFd opId=%p op.reqBuff=%p fd=%d", op->opId, op->reqBuff, fd);
NCCLCHECK(proxyConvertFd(peer, op->opId, proxyState, fd)); // cuMem API support
} else if (op->type == ncclProxyMsgInit) {
//
TRACE(NCCL_PROXY, "proxyProgressAsync::ncclProxyMsgInit opId=%p op.reqBuff=%p", op->opId, op->reqBuff);
NCCLCHECK(proxyConnInit(peer, connectionPool, proxyState, (ncclProxyInitReq*) op->reqBuff, (ncclProxyInitResp*) op->respBuff, &op->connection));
static ncclResult_t proxyConnInit(struct ncclProxyLocalPeer* peer, struct ncclProxyConnectionPool* connectionPool, struct ncclProxyState* proxyState, ncclProxyInitReq* req, ncclProxyInitResp* resp, struct ncclProxyConnection** connection)
{
int id;
// Allocate space in connectionPool->pools if needed,
// then connectionPool->offset++
// id = ((pool->banks-1) << NCCL_PROXY_CONN_POOL_SIZE_POW2) + pool->offset;
// a bank holds (1 << 7) connection slots (offsets)
NCCLCHECK(ncclProxyNewConnection(connectionPool, &id));
// From the id, recover the bank and offset,
// and from those get the address of the ncclProxyConnection, returned in connection
NCCLCHECK(ncclProxyGetConnection(connectionPool, id, connection));
// Fill in the connection
(*connection)->sock = &peer->sock;
(*connection)->transport = req->transport;
(*connection)->send = req->send;
(*connection)->tpLocalRank = req->tpLocalRank;
(*connection)->sameProcess = req->sameProcess;
peer->tpLocalRank = req->tpLocalRank;
peer->tpRank = req->tpRank;
// Hand the connection pointer back to the client via resp->connection
resp->connection = *connection;
(*connection)->tcomm = (*connection)->send ? &ncclTransports[(*connection)->transport]->send : &ncclTransports[(*connection)->transport]->recv;
// If we need proxy progress, let's allocate ops and start the thread
if ((*connection)->tcomm->proxyProgress) {
NCCLCHECK(proxyProgressInit(proxyState));
struct ncclProxyProgressState* state = &proxyState->progressState;
strncpy(resp->devShmPath, state->opsPoolShmSuffix, sizeof(resp->devShmPath));
}
INFO(NCCL_NET|NCCL_PROXY, "New proxy %s connection %d from local rank %d, transport %d", (*connection)->send ? "send":"recv", id, (*connection)->tpLocalRank, (*connection)->transport);
__atomic_store_n(&(*connection)->state, connInitialized, __ATOMIC_RELEASE);
return ncclSuccess;
}
} else
return ncclInternalError;
if (done) {
INFO(NCCL_PROXY, "proxyProgressAsync opId=%p op.type=%d op.reqBuff=%p op.respSize=%d done", op->opId, op->type, op->reqBuff, op->respSize);
if (op->type == ncclProxyMsgSetup)
__atomic_store_n(&op->connection->state, connSetupDone, __ATOMIC_RELEASE);
else if (op->type == ncclProxyMsgConnect)
__atomic_store_n(&op->connection->state, connConnected, __ATOMIC_RELEASE);
/* if setup or connect is done, we should not return any error at this point since
* ncclSocketSend might already send the respBuff to the requester. If we still choose
* to abort and close the connection, it can cause segfault if the requester is using
* the respBuff. */
// Send the opId back so the client can match this response to its request
NCCLCHECK(ncclSocketSend(op->connection->sock, &op->opId, sizeof(op->opId)));
// Send the response size
NCCLCHECK(ncclSocketSend(op->connection->sock, &op->respSize, sizeof(op->respSize)));
if (op->respSize) {
// Send the response payload
NCCLCHECK(ncclSocketSend(op->connection->sock, op->respBuff, op->respSize));
}
// Remove the op from the peer's list
asyncProxyOpDequeue(peer, op);
(*asyncOpCount)--;
return ncclSuccess;
} else if (*proxyState->abortFlag != 0) {
return ncclInternalError;
}
return ncclInProgress;
}
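The __atomic_store_n calls above advance a per-connection state machine (connInitialized, connSharedInitialized, connSetupDone, connConnected). Summarized as a sketch below; the ordering and the initial state are my assumption, not the exact NCCL definition:
// Reconstructed progression of connection->state as driven by the message types above.
enum ncclProxyConnectionStateSketch {
  connUninitialized = 0,  // assumed initial state after ncclProxyNewConnection
  connInitialized,        // after ncclProxyMsgInit       (proxyConnInit)
  connSharedInitialized,  // after ncclProxyMsgSharedInit (transport proxySharedInit)
  connSetupDone,          // after ncclProxyMsgSetup      (transport proxySetup)
  connConnected           // after ncclProxyMsgConnect    (transport proxyConnect); ready for progress ops
};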
Take the connection step as an example: to use the proxy at all, the client must connect first. It sends type ncclProxyMsgInit to tell the proxy "I want to connect"; the proxy thread accepts the socket, sets up the connection, and returns the address of the server-side ncclProxyConnection object.
The connection flow is shown below. Pay attention to what actually travels over the socket: sometimes data, sometimes just a pointer value:
// p2p send connector
// The rank's GPU-side thread connects to the proxy's TCP server; the server records the connection and allocates the resources needed for communication
struct ncclConnector* send
NCCLCHECK(ncclProxyConnect(comm, TRANSPORT_P2P, 1, tpProxyRank, &send->proxyConn));
ncclResult_t ncclProxyConnect(struct ncclComm* comm, int transport, int send, int tpProxyRank, struct ncclProxyConnector* proxyConn) {
struct ncclSocket* sock;
int ready, proxyRank = -1;
struct ncclProxyState* sharedProxyState = comm->proxyState;
// Keep one connection per local rank
for (int i = 0; i < comm->localRanks; ++i) {
/* find the proxy rank in comm. */
if (comm->topParentRanks[comm->localRankToRank[i]] == tpProxyRank) {
proxyRank = comm->localRankToRank[i];
break;
}
}
proxyConn->sameProcess = comm->peerInfo[proxyRank].pidHash == comm->peerInfo[comm->rank].pidHash ? 1 : 0;
// Keep one connection per local rank
proxyConn->connection = NULL;
proxyConn->tpRank = tpProxyRank;
// Initialize peerSocks on first use
if (sharedProxyState->peerSocks == NULL) {
NCCLCHECK(ncclCalloc(&sharedProxyState->peerSocks, comm->sharedRes->tpNLocalRanks));
NCCLCHECK(ncclCalloc(&sharedProxyState->proxyOps, comm->sharedRes->tpNLocalRanks));
NCCLCHECK(ncclCalloc(&sharedProxyState->sharedDevMems, comm->sharedRes->tpNLocalRanks));
for (int i = 0; i < comm->sharedRes->tpNLocalRanks; ++i) {
NCCLCHECK(ncclSocketSetFd(-1, &sharedProxyState->peerSocks[i]));
}
}
proxyConn->tpLocalRank = comm->sharedRes->tpRankToLocalRank[proxyConn->tpRank];
sock = sharedProxyState->peerSocks + proxyConn->tpLocalRank;
NCCLCHECK(ncclSocketReady(sock, &ready));
if (!ready) {
// Initialize the socket
NCCLCHECK(ncclSocketInit(sock, sharedProxyState->peerAddresses+proxyConn->tpRank, comm->sharedRes->magic, ncclSocketTypeProxy, comm->abortFlag));
// Connect to the port that the proxy service thread is listening on
NCCLCHECK(ncclSocketConnect(sock));
}
struct ncclProxyInitReq req = {0};
req.transport = transport;
req.send = send;
req.tpLocalRank = comm->topParentLocalRanks[comm->localRank];
req.tpRank = comm->topParentRanks[comm->rank];
req.sameProcess = proxyConn->sameProcess;
struct ncclProxyInitResp resp = {0};
// This usually sends proxyConn->connection to identify which connection this is.
// However, this is part of the response and therefore is ignored
// Init handshake over the socket; the proxy server allocates memory and establishes the connection
NCCLCHECK(ncclProxyCallBlocking(comm, proxyConn, ncclProxyMsgInit, &req, sizeof(req), &resp, sizeof(resp)));
// resp.connection is the address of the connection object on the server side
proxyConn->connection = resp.connection;
// If we need proxy progress, map progress ops
struct ncclTransportComm* tcomm = send ? &ncclTransports[transport]->send : &ncclTransports[transport]->recv;
if (tcomm->proxyProgress) {
char poolPath[] = "/dev/shm/nccl-XXXXXX";
strncpy(poolPath+sizeof("/dev/shm/nccl-")-1, resp.devShmPath, sizeof("XXXXXX")-1);
struct ncclProxyOps* proxyOps = sharedProxyState->proxyOps + proxyConn->tpLocalRank;
if (proxyOps->pool == NULL) {
NCCLCHECK(ncclShmOpen(poolPath, sizeof(struct ncclProxyOpsPool), (void**)(&proxyOps->pool), NULL, 0, &proxyOps->handle));
proxyOps->nextOps = proxyOps->nextOpsEnd = proxyOps->freeOp = -1;
}
}
INFO(NCCL_NET|NCCL_PROXY, "Connection to proxy localRank %d -> connection %p", proxyConn->tpLocalRank, proxyConn->connection);
return ncclSuccess;
}
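For reference, the Init request/response payloads exchanged above carry exactly the fields filled in by ncclProxyConnect and proxyConnInit. Reconstructed roughly below; the field layout is my assumption, see proxy.cc for the real definitions:
// Reconstructed from the fields used above, not copied from the NCCL sources.
typedef struct {
  int transport;    // req.transport   = transport
  int send;         // req.send        = send (1 = send side, 0 = recv side)
  int tpLocalRank;  // req.tpLocalRank = comm->topParentLocalRanks[comm->localRank]
  int tpRank;       // req.tpRank      = comm->topParentRanks[comm->rank]
  int sameProcess;  // req.sameProcess = proxyConn->sameProcess
} ncclProxyInitReq;

typedef struct {
  struct ncclProxyConnection* connection; // server-side connection pointer, stored into proxyConn->connection
  char devShmPath[6];                     // the XXXXXX suffix of the /dev/shm/nccl-XXXXXX ops pool
} ncclProxyInitResp;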
Calling the proxy thread's interface: send a command, then receive the reply.
// The client asks the proxy server to invoke the corresponding handler; the server dispatches on type
// ncclProxyMsgInit tells the server to initialize the communication
NCCLCHECK(ncclProxyCallBlocking(comm, proxyConn, ncclProxyMsgInit, &req, sizeof(req), &resp, sizeof(resp)));
ncclResult_t ncclProxyCallBlocking(struct ncclComm* comm, struct ncclProxyConnector* proxyConn, int type, void* reqBuff, int reqSize, void* respBuff, int respSize) {
// Alloc some memory to act as a handle
ncclResult_t res = ncclSuccess;
void* opId = malloc(1);
// ncclProxyCallAsync()
// sends the type first,
// then the value of proxyConn->connection,
// then reqSize,
// then respSize,
// then, if reqSize > 0, the request data itself,
// and finally the value of opId
NCCLCHECKGOTO(ncclProxyCallAsync(comm, proxyConn, type, reqBuff, reqSize, respSize, opId), res, fail);
struct ncclProxyState* sharedProxyState = comm->proxyState;
sock = sharedProxyState->peerSocks + proxyConn->tpLocalRank;
// Append the current request to the state->expectedResponses list;
NCCLCHECK(expectedProxyResponseEnqueue(sharedProxyState, opId, respSize));
{
struct ncclExpectedProxyResponse* ex;
NCCLCHECK(ncclCalloc(&ex, 1));
ex->opId = opId;
// Pre-alloc response buffer
ex->respBuff = malloc(respSize);
ex->respSize = respSize;
ex->done = false;
struct ncclExpectedProxyResponse* list = state->expectedResponses;
if (list == NULL) {
state->expectedResponses = ex;
return ncclSuccess;
}
while (list->next) list = list->next;
list->next = ex;
}
do {
res = ncclPollProxyResponse(comm, proxyConn, respBuff, opId);
{
int found = 0;
// If opId is found in the list and its done flag is already true, copy the data into respBuff and set found to 1
NCCLCHECK(expectedProxyResponseDequeue(sharedProxyState, opId, respBuff, &found));
}
} while (res == ncclInProgress);
exit:
free(opId);
return res;
fail:
goto exit;
}
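From a transport's point of view the API is simple: fill a request struct, pick a message type, and block until the proxy thread's handler has produced the response. A hedged usage sketch follows; the request/response structs here are placeholders, each transport defines its own:
struct myTransportSetupReq  { int channelId; };  // hypothetical request payload
struct myTransportSetupResp { void* buffer; };   // hypothetical response payload

static ncclResult_t exampleSetup(struct ncclComm* comm, struct ncclConnector* send) {
  struct myTransportSetupReq req = { .channelId = 0 };
  struct myTransportSetupResp resp;
  // Blocks until the proxy thread has run this transport's proxySetup handler
  // and sent the response back (matched internally by opId).
  NCCLCHECK(ncclProxyCallBlocking(comm, &send->proxyConn, ncclProxyMsgSetup,
                                  &req, sizeof(req), &resp, sizeof(resp)));
  return ncclSuccess;
}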
Each request carries an opId that identifies the exchange, and the proxy thread sends that opId back with its response.
So when receiving, the client compares opIds: if the received opId matches the one it sent, the response is for this request;
if not, the received data is stashed in that other request's buffer and the client keeps receiving.
// Poll until the response for opId arrives
res = ncclPollProxyResponse(comm, proxyConn, respBuff, opId);
ncclResult_t ncclPollProxyResponse(struct ncclComm* comm, struct ncclProxyConnector* proxyConn, void* respBuff, void* opId) {
struct ncclProxyState* sharedProxyState = comm->proxyState;
// Receive the connection pointer from the Proxy
// Check the abort flag
if (*comm->abortFlag) {
WARN("Comm %p is in abort state", comm);
return ncclInternalError;
}
if (sharedProxyState->peerSocks == NULL)
return ncclInternalError;
// Check response queue
int found = 0;
// If opId is found in the list and its done flag is already true, copy the data into respBuff and set found to 1
NCCLCHECK(expectedProxyResponseDequeue(sharedProxyState, opId, respBuff, &found));
if (found == 0) {
// We have sent the request but no reply has arrived yet: the entry for opId exists but its done flag is still false, hence found == 0
// Attempt to read in a new response header from the proxy thread
// For a comm without a parent, tpLocalRank is simply comm->localRank
// Get the socket connected to the proxy
struct ncclSocket* sock = sharedProxyState->peerSocks + proxyConn->tpLocalRank;
void* recvOpId;
int offset = 0;
// Receive the reply; the opId comes first
if (ncclSuccess != ncclSocketProgress(NCCL_SOCKET_RECV, sock, &recvOpId, sizeof(recvOpId), &offset)) {
WARN("Socket recv failed while polling for opId=%p", opId);
return ncclInternalError;
}
// Make sure the whole field is received; offset == 0 means nothing has arrived yet, so return ncclInProgress and keep receiving later
if (offset == 0) {
return ncclInProgress;
// If we've returned a partial response, block to receive the rest of it
} else if (offset < sizeof(recvOpId)) {
while (offset < sizeof(recvOpId))
NCCLCHECK(ncclSocketProgress(NCCL_SOCKET_RECV, sock, &recvOpId, sizeof(recvOpId), &offset));
}
INFO(NCCL_PROXY, "ncclPollProxyResponse Received new opId=%p", recvOpId);
// Now do a blocking recv of the response size
int respSize = 0;
// Receive the size of the response data
NCCLCHECK(ncclSocketRecv(sock, &respSize, sizeof(respSize)));
// If there's a respSize to recv
if (respSize > 0) {
// There is response data to receive
if (recvOpId != opId) {
// Unexpected response, need to buffer the socket data
// For an unexpected opId, allocate a buffer to hold its data
respBuff = malloc(respSize);
}
assert(respBuff != NULL);
// Receive the response data
NCCLCHECK(ncclSocketRecv(sock, respBuff, respSize));
}
if (recvOpId == opId) {
// We received the data for our own opId: remove the matching entry from state->expectedResponses
INFO(NCCL_PROXY, "recvOpId=%p matches expected opId=%p", recvOpId, opId);
NCCLCHECK(expectedProxyResponseRemove(sharedProxyState, recvOpId));
// Success
return ncclSuccess;
} else {
INFO(NCCL_PROXY, "Queuing opId=%p respBuff=%p respSize=%d", recvOpId, respBuff, respSize);
// Store the result and mark response as completed
// The data belongs to a different opId: copy it into that entry's buffer and set its done flag to true
NCCLCHECK(expectedProxyResponseStore(sharedProxyState, recvOpId, respBuff, respSize));
// Return ncclInProgress and keep polling for our own response
return ncclInProgress;
}
} else {
INFO(NCCL_PROXY, "ncclPollProxyResponse Dequeued cached opId=%p", opId);
}
return ncclSuccess;
}
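The expectedResponses bookkeeping used above (enqueue on send, store when someone else's reply arrives, dequeue on a hit) is a plain singly linked list. Below is a sketch of the dequeue side, reconstructed from the struct shown in ncclProxyCallBlocking; the real helper lives in proxy.cc and may differ in details:
// Walk state->expectedResponses; if the entry for opId is already marked done,
// copy its buffered response out, unlink it and free it.
static ncclResult_t expectedProxyResponseDequeueSketch(
    struct ncclProxyState* state, void* opId, void* respBuff, int* found) {
  struct ncclExpectedProxyResponse* elem = state->expectedResponses;
  struct ncclExpectedProxyResponse* prev = NULL;
  *found = 0;
  while (elem) {
    if (elem->opId == opId && elem->done) {
      if (prev == NULL) state->expectedResponses = elem->next;
      else prev->next = elem->next;
      memcpy(respBuff, elem->respBuff, elem->respSize); // hand the buffered response to the caller
      free(elem->respBuff);
      free(elem);
      *found = 1;
      return ncclSuccess;
    }
    prev = elem;
    elem = elem->next;
  }
  return ncclSuccess;
}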