Notes and caveats for this setup
In hdfs-site.xml, the dfs.ha.fencing.methods parameter is set to shell rather than sshfence, because sshfence cannot fail over when the host running the active NameNode goes down (the whole host, not just the service). Most Hadoop HA articles found via Baidu, however, use sshfence.
Official reference documentation:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
Quoted from the official docs: The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service’s TCP port. In order for this fencing option to work, it must be able to SSH to the target node without providing a passphrase. Thus, one must also configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files. For example:
After the connection to the active NameNode fails, the standby SSHes to the previously active NameNode host and kills the process once more.
The problem is that if the host of the previously active NameNode has gone down, the standby cannot SSH to it, so the failover never completes. This kind of high availability therefore only protects against the NameNode process dying; it cannot survive the node itself dying. If the active node's host goes down, the standby's ZKFC log will show the exception java.net.NoRouteToHostException: No route to host (Host unreachable).
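To confirm this failure mode on the standby, a minimal check can be used (the log path is an assumption based on the default Hadoop log layout used later in this guide):
# Run on the standby NameNode host; the file name contains that host's hostname
grep -i "NoRouteToHostException" /opt/hadoop-2.10.1/logs/hadoop-root-zkfc-*.log
# The fencing method actually in effect can be double-checked with:
hdfs getconf -confKey dfs.ha.fencing.methods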
Note 2: to prevent split-brain, Hadoop HA also relies on a majority (quorum) mechanism: with N JournalNodes the system tolerates at most (N - 1) / 2 failures. You therefore need at least three JournalNode daemons, which is why this setup uses three master nodes (one primary and two standby).
Quoted from the official docs:
- JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager. Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
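A quick illustration of the (N - 1) / 2 rule:
# With N JournalNodes the system tolerates at most (N - 1) / 2 failures
for n in 3 5 7; do echo "N=$n JournalNodes -> tolerates $(( (n - 1) / 2 )) failed JournalNodes"; done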
The sections below list the full configuration of each file; many of the settings are not HA-related (for example, the resources each node can offer to the cluster).
For accuracy, many explanations are pasted verbatim from the official documentation; https://fanyi.youdao.com/ can be used to translate them.
Turn off the firewall and SELinux, and make sure the nodes can communicate with each other.
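A minimal sketch of that step on CentOS 7 (run on every node):
systemctl stop firewalld && systemctl disable firewalld                 # stop and disable the firewall
setenforce 0                                                            # turn SELinux off for the running system
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config     # keep it off after a reboot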
Set the hostnames (since there are only a few machines, many components share a node; to meet the HA requirement, three master nodes are prepared).
IP address | Hostname | Description |
---|---|---|
20.88.10.31 | emr-header-01 | Master node, ZooKeeper node |
20.88.10.32 | emr-header-02 | Standby master node, ZooKeeper node |
20.88.10.33 | emr-worker-01 | Standby master node, ZooKeeper node, worker node |
20.88.10.34 | emr-worker-02 | Worker node |
Set the hostname and edit the hosts file
hostnamectl set-hostname emr-header-01;bash
Append the following to the hosts file (identical on every node).
This step is mandatory: the Hadoop configuration refers to the master and worker nodes by hostname, and without these hosts entries hostname resolution will fail.
20.88.10.31 emr-header-01
20.88.10.32 emr-header-02
20.88.10.33 emr-worker-01
20.88.10.34 emr-worker-02
Set up passwordless SSH between the nodes (run on emr-header-01).
Hadoop's start scripts SSH into the other nodes; without passwordless keys you will be prompted for a password.
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Distribute the key to every host
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-header-01
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-header-02
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-worker-01
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-worker-02
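A quick sanity check, assuming the hostnames above, that each login now works without a password (each command should print the remote hostname):
for h in emr-header-01 emr-header-02 emr-worker-01 emr-worker-02; do
  ssh -o BatchMode=yes root@$h hostname
done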
The operating system is CentOS 7, commands are run as root (a non-root user with equivalent privileges also works), and all packages are installed under /opt.
Install the JDK on every node.
Download the JDK that matches the required version (download steps omitted). Java official site: https://www.oracle.com/java/technologies/downloads/
tar xf jdk-8u181-linux-x64.tar.gz -C /opt/
echo 'export JAVA_HOME=/opt/jdk1.8.0_181/
export PATH=${PATH}:${JAVA_HOME}/bin' >>/etc/profile
source /etc/profile
java -version
ZooKeeper is installed on three nodes: emr-header-01, emr-header-02, and emr-worker-01.
Official ZooKeeper download page: https://zookeeper.apache.org/releases.html
Run on emr-header-01:
# Download and upload the tarball (omitted here)
tar xf zookeeper-3.4.13.tar.gz -C /opt/
Edit the configuration
cd /opt/zookeeper-3.4.13/conf/
vim zoo.cfg
Full configuration:
minSessionTimeout=16000
maxSessionTimeout=300000
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/datalog
clientPort=2181
server.1=emr-header-01:2888:3888
server.2=emr-header-02:2888:3888
server.3=emr-worker-01:2888:3888
Add environment variables
echo 'ZK_HOME=/opt/zookeeper-3.4.13' >>/etc/profile
echo 'PATH=$PATH:${ZK_HOME}/bin/' >>/etc/profile
source /etc/profile
Distribute to the other nodes
scp -r /opt/* emr-header-02:/opt/
scp -r /opt/* emr-worker-01:/opt/
scp -r /opt/* emr-worker-02:/opt/
scp /etc/profile emr-header-02:/etc/
scp /etc/profile emr-worker-01:/etc/
scp /etc/hosts emr-header-02:/etc/
scp /etc/hosts emr-worker-01:/etc/
scp /etc/hosts emr-worker-02:/etc/
Run on all nodes
mkdir -p /data/zookeeper/data
cd /data/zookeeper/data
Note: the command differs per node.
The number in each myid file must be globally unique and is not arbitrary: for example, the 1 for emr-header-01 corresponds to the 1 in server.1=emr-header-01:2888:3888 in the configuration above.
# Run on emr-header-01
echo "1" >myid
# Run on emr-header-02
echo "2" >myid
# Run on emr-worker-01
echo "3" >myid
Start ZooKeeper (run on the three ZooKeeper nodes)
source /etc/profile;zkServer.sh start
Check the status
zkServer.sh status
After running this on the three nodes, two of them should report Mode: follower and one should report Mode: leader; if so, ZooKeeper is up correctly.
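Optionally, the same check can be run from emr-header-01 alone over SSH (a sketch using only what was set up above):
for h in emr-header-01 emr-header-02 emr-worker-01; do
  echo -n "$h: "; ssh $h 'source /etc/profile; zkServer.sh status 2>/dev/null | grep Mode'
done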
Hadoop release archive: https://archive.apache.org/dist/hadoop/common/
Here the Tsinghua mirror is used instead: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
Run on emr-header-01
curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
tar xf hadoop-2.10.1.tar.gz -C /opt/
cd /opt/hadoop-2.10.1/etc/hadoop/
Check whether the fuser command exists; if not, install it (on all nodes).
The sshfence method depends on this command: to prevent split-brain it SSHes to the NameNode host that is considered dead and kills the NameNode process again, and that kill is done with fuser. Without it, sshfence cannot fail over even when you merely stop the NameNode on the master with
hadoop-daemon.sh stop namenode
let alone when the whole master host goes down.
yum install -y psmisc
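A sketch for doing the check and install on every node in one go from emr-header-01:
for h in emr-header-01 emr-header-02 emr-worker-01 emr-worker-02; do
  ssh $h 'command -v fuser >/dev/null || yum install -y psmisc'
done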
If your hostnames or their resolution differ from the ones above, adjust the corresponding entries in the files.
Note: in yarn-site.xml, the yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores parameters set the maximum memory and number of CPU cores this node offers to YARN; here they are set to roughly half of the host's memory.
Besides the HA-related settings, the configuration below also contains others, such as enabling log aggregation and capping the resources available per node. You can add them individually or simply overwrite all of your existing configuration. Tested and working with Hadoop 2.10.1.
hadoop-env.sh
Insert source /etc/profile as the first line, or modify the JAVA_HOME variable already in the file.
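One way to apply that edit from the shell (a sketch; the sed patterns assume the stock Hadoop 2.10.1 hadoop-env.sh and the JDK path installed earlier):
cd /opt/hadoop-2.10.1/etc/hadoop/
sed -i '1i source /etc/profile' hadoop-env.sh
# alternatively, point JAVA_HOME at the JDK directly:
# sed -i 's#^export JAVA_HOME=.*#export JAVA_HOME=/opt/jdk1.8.0_181/#' hadoop-env.sh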
slaves
File contents (delete the existing localhost entry)
# List the hostnames of all worker nodes in this file
emr-worker-01
emr-worker-02
core-site.xml
File contents. Note that property names must not be duplicated within the file.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>emr-header-01:2181,emr-header-02:2181,emr-worker-01:2181</value>
    <description>A list of ZooKeeper server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover.</description>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>emr-header-01</value>
    <description>The hostname of the RM.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
    <description>Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of CPUs used by YARN containers. If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux. In other cases, number of vcores is 8 by default.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
    <description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated(in case of Windows and Linux). In other cases, the default is 8192MB.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
    <description>The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3072</value>
    <description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Whether to enable log aggregation. Log aggregation collects each container's logs and moves these logs onto a file-system, for e.g. HDFS, after the application completes. Users can configure the "yarn.nodemanager.remote-app-log-dir" and "yarn.nodemanager.remote-app-log-dir-suffix" properties to determine where these logs are moved to. Users can access the logs via the Application Timeline Server.</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    <description>How long to keep aggregation logs before deleting them. -1 disables. Be careful set this too small and you will spam the name node.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    <description>Enable RM high-availability. When enabled, (1) The RM starts in the Standby mode by default, and transitions to the Active mode when prompted to. (2) The nodes in the RM ensemble are listed in yarn.resourcemanager.ha.rm-ids (3) The id of each RM either comes from yarn.resourcemanager.ha.id if yarn.resourcemanager.ha.id is explicitly specified or can be figured out by matching yarn.resourcemanager.address.{id} with local address (4) The actual physical addresses come from the configs of the pattern - {rpc-config}.{id}</description>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn</value>
    <description>Name of the cluster. In a HA setting, this is used to ensure the RM participates in leader election for this cluster and ensures it does not affect other clusters</description>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    <description>The list of RM nodes in the cluster when HA is enabled. See description of yarn.resourcemanager.ha.enabled for full details on how this is used.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>emr-header-01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>emr-header-01:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>emr-header-01:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>emr-header-01:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>emr-header-01:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>emr-header-02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>emr-header-02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>emr-header-02:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>emr-header-02:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>emr-header-02:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>emr-header-01:2181,emr-header-02:2181,emr-worker-01:2181</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>Enable RM to recover state after starting. If true, then yarn.resourcemanager.store.class must be specified.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <description>The class to use as the persistent store. If org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore is used, the store is implicitly fenced; meaning a single ResourceManager is able to use the store at any point in time. More details on this implicit fencing, along with setting up appropriate ACLs is discussed under yarn.resourcemanager.zk-state-store.root-node.acl.</description>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    <description>Environment variables that containers may override rather than use NodeManager's default.</description>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>emr-header-01:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.</description>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>${hadoop.tmp.dir}/jn</value>
    <description>The directory where the journal edit files are stored.</description>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <description>Comma-separated list of nameservices.</description>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
    <description>dfs.ha.namenodes.EXAMPLENAMESERVICE The prefix for a given nameservice, contains a comma-separated list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE). Unique identifiers for each NameNode in the nameservice, delimited by commas. This will be used by DataNodes to determine all the NameNodes in the cluster. For example, if you used "mycluster" as the nameservice ID previously, and you wanted to use "nn1" and "nn2" as the individual IDs of the NameNodes, you would configure a property dfs.ha.namenodes.mycluster, and its value "nn1,nn2".</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>emr-header-01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>emr-header-02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>emr-worker-01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>emr-header-01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>emr-header-02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>emr-worker-01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://emr-header-01:8485;emr-header-02:8485;emr-worker-01:8485/mycluster</value>
    <description>A directory on shared storage between the multiple namenodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir above. It should be left empty in a non-HA cluster.</description>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
    <description>A list of scripts or Java classes which will be used to fence the Active NameNode during a failover. See the HDFS High Availability documentation for details on automatic HA configuration.</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Whether automatic failover is enabled. See the HDFS High Availability documentation for details on automatic HA configuration.</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.txns</name>
    <value>1000000</value>
    <description>The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every 'dfs.namenode.checkpoint.txns' transactions, regardless of whether 'dfs.namenode.checkpoint.period' has expired.</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.check.period</name>
    <value>60</value>
    <description>The SecondaryNameNode and CheckpointNode will poll the NameNode every 'dfs.namenode.checkpoint.check.period' seconds to query the number of uncheckpointed transactions. Support multiple time unit suffix(case insensitive), as described in dfs.heartbeat.interval. If no time unit is specified then seconds is assumed.</description>
  </property>
  <property>
    <name>dfs.namenode.num.extra.edits.retained</name>
    <value>1000000</value>
    <description>The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. It does not translate directly to file's age, or the number of files kept, but to the number of transactions (here "edits" means transactions). One edit file may contain several transactions (edits). During checkpoint, NameNode will identify the total number of edits to retain as extra by checking the latest checkpoint transaction value, subtracted by the value of this property. Then, it scans edits files to identify the older ones that don't include the computed range of retained transactions that are to be kept around, and purges them subsequently. The retainment can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again. Typically each edit is on the order of a few hundred bytes, so the default of 1 million edits should be on the order of hundreds of MBs or low GBs. NOTE: Fewer extra edits may be retained than value specified for this setting if doing so would mean that more segments would be retained than the number configured by dfs.namenode.max.extra.edits.segments.retained.</description>
  </property>
  <property>
    <name>dfs.namenode.num.checkpoints.retained</name>
    <value>2</value>
    <description>The number of image checkpoint files (fsimage_*) that will be retained by the NameNode and Secondary NameNode in their storage directories. All edit logs (stored on edits_* files) necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained.</description>
  </property>
  <property>
    <name>dfs.namenode.max.extra.edits.segments.retained</name>
    <value>10000</value>
    <description>The maximum number of extra edit log segments which should be retained beyond what is minimally necessary for a NN restart. When used in conjunction with dfs.namenode.num.extra.edits.retained, this configuration property serves to cap the number of extra edits files to a reasonable value.</description>
  </property>
</configuration>
mapred-site.xml
This file does not exist by default; create it from the template first: cp mapred-site.xml.template mapred-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>emr-header-01:50090</value>
  </property>
</configuration>
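The two properties above are HDFS settings. If you also want MapReduce jobs submitted to YARN rather than run by the local job runner, the usual extra property can be appended; this is an optional addition not part of the original steps, shown here as a sketch:
# Insert mapreduce.framework.name=yarn before the closing tag of mapred-site.xml
sed -i 's#</configuration>#<property><name>mapreduce.framework.name</name><value>yarn</value></property>\n</configuration>#' mapred-site.xml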
Edit /opt/hadoop-2.10.1/sbin/start-yarn.sh
# start resourceManager
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
# Insert the following line so that the ResourceManager on emr-header-02 is started at the same time
ssh emr-header-02 "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
# start nodeManager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR start nodemanager
# start proxyserver
#"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start proxyserver
Edit /opt/hadoop-2.10.1/sbin/stop-yarn.sh
# stop resourceManager
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop resourcemanager
# Insert the following line so that the ResourceManager on emr-header-02 is stopped at the same time
ssh emr-header-02 "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop resourcemanager
# stop nodeManager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR stop nodemanager
# stop proxy server
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop proxyserver
Add environment variables
echo 'export HADOOP_HOME=/opt/hadoop-2.10.1/' >>/etc/profile
echo 'export PATH=${PATH}:${HADOOP_HOME}/sbin/:${HADOOP_HOME}/bin/' >>/etc/profile
source /etc/profile
Copy everything to the other nodes (run on emr-header-01)
# Copy the SSH keys to the other nodes, because dfs.ha.fencing.ssh.private-key-files above points at this key file
scp -r /root/.ssh/* emr-header-02:/root/.ssh/
scp /etc/profile emr-header-02:/etc/
scp /etc/profile emr-worker-01:/etc/
scp /etc/profile emr-worker-02:/etc/
scp -r /opt/hadoop-2.10.1/ emr-header-02:/opt/
scp -r /opt/hadoop-2.10.1/ emr-worker-01:/opt/
scp -r /opt/hadoop-2.10.1/ emr-worker-02:/opt/
Reload the environment variables on all nodes
source /etc/profile
Run on the three nodes emr-header-01, emr-header-02, and emr-worker-01
hadoop-daemon.sh start journalnode
To check whether it started, look at the log, e.g. /opt/hadoop-2.10.1/logs/hadoop-root-journalnode-emr-header-01.log (the file name differs per node).
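A quicker check is jps (shipped with the JDK); a JournalNode process should show up on each of the three nodes:
jps | grep JournalNode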
Run on emr-header-01
# Running the next command again will prompt you to confirm with y
hdfs zkfc -formatZK
hdfs namenode -format
hadoop-daemon.sh start namenode
Run on emr-header-02 and emr-worker-01 (this bootstraps the standby NameNodes by syncing the metadata; emr-worker-01 hosts nn3)
hdfs namenode -bootstrapStandby
Run on emr-header-01
stop-all.sh
Now start the cluster
start-all.sh
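A rough way to see what ended up running where (the expected Hadoop 2.x daemon names include NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, and NodeManager):
for h in emr-header-01 emr-header-02 emr-worker-01 emr-worker-02; do
  echo "== $h =="; ssh $h jps
done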
It is recommended to add the hosts entries on your local machine as well. Everything still works without them, but when you land on the standby ResourceManager (i.e. the YARN UI) it redirects to the active node's hostname.
hosts entries (the same as on each host above):
20.88.10.31 emr-header-01
20.88.10.32 emr-header-02
20.88.10.33 emr-worker-01
20.88.10.34 emr-worker-02
Access the web UIs
# HDFS: each of the nodes running a JournalNode (which are also the NameNode hosts) serves the UI
http://20.88.10.31:50070/
http://20.88.10.32:50070/
http://20.88.10.33:50070/
On the HDFS page, the first line Overview 'emr-header-01:8020' (active) marks the active NameNode, and Overview 'emr-header-02:8020' (standby) marks a standby.
# YARN / MapReduce
http://20.88.10.31:8088/
Check the ResourceManager states
rm1 and rm2 are defined in the configuration above: rm1 is emr-header-01 and rm2 is emr-header-02.
[root@emr-header-01 ~]# yarn rmadmin -getServiceState rm1
standby
[root@emr-header-01 ~]# yarn rmadmin -getServiceState rm2
active
[root@emr-header-01 ~]#
Based on the checks above, the active HDFS NameNode is emr-header-01, while the active YARN ResourceManager is emr-header-02.
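The HDFS side can be checked the same way; nn1, nn2, and nn3 come from dfs.ha.namenodes.mycluster in hdfs-site.xml:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -getServiceState nn3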
# Run on emr-header-01 to stop the NameNode; note it is hadoop-daemon.sh, not hadoop-daemons.sh
hadoop-daemon.sh stop namenode
# Run on emr-header-02 to stop the ResourceManager
yarn-daemon.sh stop resourcemanager
The YARN UI on emr-header-02 is now unreachable, while http://emr-header-01:8088/cluster still works.
The HDFS UI on emr-header-01 is unreachable, while emr-header-02 at http://emr-header-02:50070/dfshealth.html#tab-overview now shows active.
Restart the stopped services and they automatically come back as standby. Start commands (run each start on the same node where the corresponding stop was executed):
yarn-daemon.sh start resourcemanager
hadoop-daemon.sh start namenode
Once all of the tests above pass, you can power off either master machine to simulate a real failure and check that failover still happens. With dfs.ha.fencing.methods set to the shell method this works; with sshfence it does not, for the reasons explained at the beginning. By comparison, failover with the shell method takes noticeably longer (roughly under a minute in testing), while sshfence takes over almost immediately after the active NameNode goes down; but only the shell method survives losing the whole host.
If you watch the NameNode data directory regularly, you will notice that in an HA setup the number of NameNode edits files keeps growing.
This is governed by the dfs.namenode.num.extra.edits.retained parameter in the HDFS configuration: by default, 1,000,000 extra edit transactions are retained beyond what a NameNode restart minimally needs.
Hadoop's official explanation is:
The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. It does not translate directly to file’s age, or the number of files kept, but to the number of transactions (here “edits” means transactions). One edit file may contain several transactions (edits). During checkpoint, NameNode will identify the total number of edits to retain as extra by checking the latest checkpoint transaction value, subtracted by the value of this property. Then, it scans edits files to identify the older ones that don’t include the computed range of retained transactions that are to be kept around, and purges them subsequently. The retainment can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again. Typically each edit is on the order of a few hundred bytes, so the default of 1 million edits should be on the order of hundreds of MBs or low GBs. NOTE: Fewer extra edits may be retained than value specified for this setting if doing so would mean that more segments would be retained than the number configured by dfs.namenode.max.extra.edits.segments.retained.
It also mentions the parameter dfs.namenode.max.extra.edits.segments.retained, whose default value is 10000 and which is explained as:
The maximum number of extra edit log segments which should be retained beyond what is minimally necessary for a NN restart. When used in conjunction with dfs.namenode.num.extra.edits.retained, this configuration property serves to cap the number of extra edits files to a reasonable value.
See also: https://blog.csdn.net/weixin_44455125/article/details/122524280
Alibaba Cloud's EMR clusters also use the shell fencing method, yet they only support two master nodes. I don't know why yet and am still looking into it; if anyone does, please share.