chinesesword

Nutch 2.2.1 installation and deployment

http://www.promenade.me/archives/146

When creating the webpage table by hand, change varchar(767) to varchar(255) or text.

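There is a corresponding blog post, but it covers version 2.1, and following it on the latest 2.2.1 release runs into a lot of problems. So I strongly recommend reading this whole article before you start, rather than following me down every detour.

What follows is a blow-by-blow log of the configuration process.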

MySQL configuration:

CREATE DATABASE nutch DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;
USE nutch;
CREATE TABLE `webpage` (
  `id` varchar(767) NOT NULL,
  `headers` blob,
  `text` mediumtext DEFAULT NULL,
  `status` int(11) DEFAULT NULL,
  `markers` blob,
  `parseStatus` blob,
  `modifiedTime` bigint(20) DEFAULT NULL,
  `score` float DEFAULT NULL,
  `typ` varchar(32) CHARACTER SET latin1 DEFAULT NULL,
  `baseUrl` varchar(767) DEFAULT NULL,
  `content` longblob,
  `title` varchar(2048) DEFAULT NULL,
  `reprUrl` varchar(767) DEFAULT NULL,
  `fetchInterval` int(11) DEFAULT NULL,
  `prevFetchTime` bigint(20) DEFAULT NULL,
  `inlinks` mediumblob,
  `prevSignature` blob,
  `outlinks` mediumblob,
  `fetchTime` bigint(20) DEFAULT NULL,
  `retriesSinceFetch` int(11) DEFAULT NULL,
  `protocolStatus` blob,
  `signature` blob,
  `metadata` blob,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
DEFAULT CHARSET=utf8mb4;

In ivy/ivy.xml, uncomment these two lines so that Gora supports MySQL:

<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating" conf="*->default" />
<dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default" />

Fill in the database details in conf/gora.properties:

gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=xxxxx
gora.sqlstore.jdbc.password=xxxxx

In conf/gora-sql-mapping.xml, change the length of the primarykey in both places: the id is now stored as UTF-8, so the data takes more bytes.

The rest concerns crawling itself. Configure conf/nutch-site.xml and add the crawler information:

<property>
<name>http.agent.name</name>
<value>Ade's spider</value>
</property>

<property>
<name>http.accept.language</name>
<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the "Accept-Language" request header field.
This allows selecting non-English language as default one to retrieve.
It is a useful setting for search engines build for certain national group.
</description>
</property>

<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other information
is available</description>
</property>

<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>The Gora DataStore class for storing and retrieving data.
Currently the following stores are available: ….
</description>
</property>
 
      
 
     
   

Because ivy still has to download the SQL connector and gora-sql, run the ant build once more.

Now the crawl can start:

cd ./runtime/local
mkdir -p urls
echo 'http://www.promenade.me' > urls/seed.txt
bin/nutch crawl urls -depth 3 -topN 5

You may run into a problem:

[root@AY131218101252507ad0Z local]# bin/nutch crawl urls -depth 3 -topN 5
InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 0
Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local177967844_0002
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

logs/hadoop.log shows that a null value was passed to the Utf8 class. There is an article online, a collection of Nutch 2.0 configuration and installation exceptions, that explains this one.
Open nutch/src/java/org/apache/nutch/crawl/GeneratorReducer.java and look at around line 100:

batchId = new Utf8(conf.get(GeneratorJob.BATCH_ID));
// change this to
int randomSeed = Math.abs(new Random().nextInt());
String batchIdStr = (System.currentTimeMillis() / 1000) + "-" + randomSeed;
batchId = new Utf8(batchIdStr);
// and don't forget to add this at the top of the file
import java.util.Random;
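The patch above always overwrites the batch ID. A slightly more defensive variant, sketched here as standalone Java, keeps a configured ID when one is present and only generates one when the value is null, the case that hadoop.log complained about. It also uses nextInt(Integer.MAX_VALUE), because Math.abs(Integer.MIN_VALUE) is still negative. BatchIdSketch and resolveBatchId are hypothetical names, and a plain String stands in for Avro's Utf8 so that the example runs without Nutch on the classpath.

```java
import java.util.Random;

public class BatchIdSketch {

    // Hypothetical helper: returns the configured batch ID if there is one,
    // otherwise builds "<epoch seconds>-<random non-negative int>".
    static String resolveBatchId(String configured) {
        if (configured != null) {
            return configured;
        }
        // nextInt(bound) never returns a negative number, unlike
        // Math.abs(nextInt()), which stays negative for Integer.MIN_VALUE.
        int randomSeed = new Random().nextInt(Integer.MAX_VALUE);
        return (System.currentTimeMillis() / 1000) + "-" + randomSeed;
    }

    public static void main(String[] args) {
        // Stands in for conf.get(GeneratorJob.BATCH_ID) returning null.
        System.out.println(resolveBatchId(null));
        // A configured value is passed through unchanged.
        System.out.println(resolveBatchId("1388534400-42"));
    }
}
```

In GeneratorReducer, the result would then presumably be wrapped as new Utf8(resolveBatchId(conf.get(GeneratorJob.BATCH_ID))); that wiring is an assumption and has not been tested against Nutch 2.2.1.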

After that, rebuild and crawl again. Another exception appears; check hadoop.log:

java.lang.Exception: java.lang.NoSuchMethodError: org.apache.gora.persistency.Persistent.getSchema()Lorg/apache/avro/Schema;
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NoSuchMethodError: org.apache.gora.persistency.Persistent.getSchema()Lorg/apache/avro/Schema;
    at org.apache.gora.sql.store.SqlStore.put(SqlStore.java:591)
    at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:191)
    at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:88)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)

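Then I remembered that ivy/ivy.xml carries a note about this.

Right: on the line just above that note, change the gora-core version to 0.2.1. Rebuild and start over... As expected, there is yet another problem. This time the error is:

Unknown column 'batchId' in 'field list'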

A quick check of the database shows what is wrong: this batchId should be the same batchId as in the Utf8 error earlier, so just add that column to the MySQL table (drop the old webpage table first, since CREATE TABLE fails if it already exists).

CREATE TABLE `webpage` (
  `id` varchar(767) NOT NULL,
  `headers` blob,
  `text` mediumtext DEFAULT NULL,
  `status` int(11) DEFAULT NULL,
  `markers` blob,
  `parseStatus` blob,
  `modifiedTime` bigint(20) DEFAULT NULL,
  `score` float DEFAULT NULL,
  `typ` varchar(32) CHARACTER SET latin1 DEFAULT NULL,
  `baseUrl` varchar(767) DEFAULT NULL,
  `content` longblob,
  `title` varchar(2048) DEFAULT NULL,
  `reprUrl` varchar(767) DEFAULT NULL,
  `fetchInterval` int(11) DEFAULT NULL,
  `prevFetchTime` bigint(20) DEFAULT NULL,
  `inlinks` mediumblob,
  `prevSignature` blob,
  `outlinks` mediumblob,
  `fetchTime` bigint(20) DEFAULT NULL,
  `retriesSinceFetch` int(11) DEFAULT NULL,
  `protocolStatus` blob,
  `signature` blob,
  `metadata` blob,
  `batchId` varchar(767) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
DEFAULT CHARSET=utf8mb4;

OK, run it again, and, unbelievably, it actually starts crawling...

=========================

Setting up Nutch 2.1 with MySQL to handle UTF-8

These instructions assume Ubuntu 12.04 and Java 6 or 7 installed and JAVA_HOME configured.

Install MySQL Server and MySQL Client using the Ubuntu software center or sudo apt-get install mysql-server mysql-client at the command line.

As MySQL defaults to latin1 (are we still in the 1990s?), we need to edit the configuration with sudo vi /etc/mysql/my.cnf and add the following under [mysqld]:

innodb_file_format=barracuda
innodb_file_per_table=true
innodb_large_prefix=true
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
max_allowed_packet=500M

The innodb options are to help deal with the small primary key size restriction of MySQL. Restart your machine for the changes to take effect. The max_allowed_packet option is so you don’t run into issues as your database and the pages you store in it get larger.
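As a rough check of why these options matter for the id column: utf8mb4 stores up to 4 bytes per character, and InnoDB limits an index key to 767 bytes by default, or to 3072 bytes when innodb_large_prefix is enabled on a Barracuda table with ROW_FORMAT=COMPRESSED or DYNAMIC. These limits come from the MySQL 5.5/5.6 documentation rather than from this post.

```latex
\begin{align*}
767 \text{ chars} \times 4 \text{ bytes/char} &= 3068 \text{ bytes} \le 3072 \text{ bytes (large prefix enabled)} \\
\lfloor 767 \text{ bytes} / (4 \text{ bytes/char}) \rfloor &= 191 \text{ chars (default limit)}
\end{align*}
```

So a varchar(767) utf8mb4 primary key fits once large prefixes are enabled, whereas under the default limit the key could hold only 191 characters.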

Check to make sure MySQL is running by typing sudo netstat -tap | grep mysql and you should see something like

tcp 0 0 localhost:mysql *:* LISTEN

We need to set up the nutch database manually, as the db schema currently generated by Nutch/Gora/MySQL defaults to latin1. Log into MySQL at the command line with your previously set up MySQL id and password by typing

mysql -u xxxxx -p

then in the MySQL editor type the following:

CREATE DATABASE nutch DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;

and enter followed by

use nutch;

and enter and then copy and paste the following altogether:

CREATE TABLE `webpage` ( `id` varchar(767) NOT NULL, `headers` blob, `text` mediumtext DEFAULT NULL, `status` int(11) DEFAULT NULL, `markers` blob, `parseStatus` blob, `modifiedTime` bigint(20) DEFAULT NULL, `score` float DEFAULT NULL, `typ` varchar(32) CHARACTER SET latin1 DEFAULT NULL, `baseUrl` varchar(767) DEFAULT NULL, `content` longblob, `title` varchar(2048) DEFAULT NULL, `reprUrl` varchar(767) DEFAULT NULL, `fetchInterval` int(11) DEFAULT NULL, `prevFetchTime` bigint(20) DEFAULT NULL, `inlinks` mediumblob, `prevSignature` blob, `outlinks` mediumblob, `fetchTime` bigint(20) DEFAULT NULL, `retriesSinceFetch` int(11) DEFAULT NULL, `protocolStatus` blob, `signature` blob, `metadata` blob, PRIMARY KEY (`id`) ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED DEFAULT CHARSET=utf8mb4;

Then type enter. You are done setting up the MySQL database for Nutch.

Set up Nutch 2.1 by downloading the latest version from http://www.apache.org/dyn/closer.cgi/nutch/. Untar the contents of the file you just downloaded and going forward we will refer to this folder as ${APACHE_NUTCH_HOME}.

From inside the nutch folder ensure the MySQL dependency for Nutch is available by editing the following in ${APACHE_NUTCH_HOME}/ivy/ivy.xml

default”/>

Edit the ${APACHE_NUTCH_HOME}/conf/gora.properties file either deleting or commenting out the Default SqlStore Properties using #. Then add the MySQL properties below replacing xxxxx with the user and password you set up when installing MySQL earlier.

###############################
# MySQL properties #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=xxxxx
gora.sqlstore.jdbc.password=xxxxx

Edit the ${APACHE_NUTCH_HOME}/conf/gora-sql-mapping.xml file, changing the length of the primary key from 512 to 767 in both places.
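If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch: it assumes the length is written as length="512" on the same line as the primarykey element, and widen_primarykey is a made-up helper name, not part of Nutch or Gora. Check the resulting file before building.

```shell
#!/bin/sh
# Sketch: raise the primary-key length from 512 to 767 in a Gora SQL mapping file.
# Assumption: the attribute appears as length="512" on the primarykey line.
# widen_primarykey is a hypothetical helper name.
widen_primarykey() {
    mapping="$1"
    # Keep a .bak copy of the original, then rewrite only primarykey lines.
    cp "$mapping" "$mapping.bak"
    sed '/primarykey/s/length="512"/length="767"/' "$mapping.bak" > "$mapping"
}
```

Called as widen_primarykey "${APACHE_NUTCH_HOME}/conf/gora-sql-mapping.xml", it leaves other length="512" attributes (ordinary fields) untouched and keeps the original next to it with a .bak suffix.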

Configure ${APACHE_NUTCH_HOME}/conf/nutch-site.xml: put a name in the value field of http.agent.name (it can be anything but cannot be left blank), add additional languages if you want (I have added Japanese, ja-jp, below), set utf-8 as the default encoding, and specify SqlStore as the data store.

<property>
<name>http.agent.name</name>
<value>Your Nutch Spider</value>
</property>

<property>
<name>http.accept.language</name>
<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the “Accept-Language” request header field.
This allows selecting non-English language as default one to retrieve.
It is a useful setting for search engines built for certain national group.
</description>
</property>

<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other information
is available</description>
</property>

<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>The Gora DataStore class for storing and retrieving data.
Currently the following stores are available: ….
</description>
</property>

Install ant using the Ubuntu software center or sudo apt-get install ant at the command line.

From the command line, cd to your nutch folder and type ant runtime
This may take a few minutes to compile.

Start your first crawl by typing the lines below at the terminal (replace ‘http://nutch.apache.org/’ with whatever site you want to crawl):
cd ${APACHE_NUTCH_HOME}/runtime/local
mkdir -p urls
echo 'http://nutch.apache.org/' > urls/seed.txt
bin/nutch crawl urls -depth 3 -topN 5

You can easily add more URLs by hand to seed.txt if you want. For the crawl, depth is the number of generate/fetch/parse/update rounds to run (not the link depth, as you might think at first), and topN is the maximum number of links to actually parse in each round. Note, however, that Nutch records every link it encounters in the webpage table and only limits how many it parses to topN, so don’t be surprised to see many more rows in the webpage table than the topN limit suggests.

Check your crawl results by looking at the webpage table in the nutch database.
mysql -u xxxxx -p
use nutch;
SELECT * FROM nutch.webpage;

You should see the results of your crawl (around 159 rows). It will be hard to read the columns so you may want to install MySQL Workbench via sudo apt-get install mysql-workbench and use that instead for viewing the data. You may also want to run the following SQL command select * from webpage where status = 2; to limit the rows in the webpage table to only urls that were actually parsed.

Set up and index with Solr

If you are using Nutch 2.1 at this time you are on the bleeding edge and probably want the latest version of Solr 4.0 as well. Untar it to $HOME/apache-solr-4.0.0-XXXX. This folder will now be referred to as ${APACHE_SOLR_HOME}.
Download http://nlp.solutions.asia/wp-content/uploads/2012/08/schema.xml and use it to replace ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.

From the terminal start solr:
cd ${APACHE_SOLR_HOME}/example
java -jar start.jar

You can check this is running by opening http://localhost:8983/solr in your web browser.

Leave that terminal running and from a different terminal type the following:
cd ${APACHE_NUTCH_HOME}/runtime/local/
bin/nutch solrindex http://127.0.0.1:8983/solr/ -reindex

You can now run Solr queries against your crawled content. Open http://localhost:8983/solr/#/collection1/query and, assuming you have crawled nutch.apache.org, type text:nutch into the input box titled “q” to run a search. The results should list pages from your crawl.

There remains a lot to configure to get a good web search going but you are at least started.

===========================================

Installing Nutch 2.2 with MySQL to handle UTF-8

Enough has changed from Nutch 2.1 to Nutch 2.2 to warrant an update to the installation instructions. These instructions assume Ubuntu 12.04 and Java 7 installed and JAVA_HOME configured.

Install MySQL Server and MySQL Client using the Ubuntu software center or sudo apt-get install mysql-server mysql-client at the command line.

As MySQL defaults to latin we need to edit sudo vi /etc/mysql/my.cnf and under [mysqld] add

innodb_file_format=barracuda
innodb_file_per_table=true
innodb_large_prefix=true
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
max_allowed_packet=500M

The innodb options help deal with MySQL's small primary key size restriction. The character set and collation settings are there to handle Unicode correctly. The max_allowed_packet setting is optional and only necessary for very large pages. Restart your machine for the changes to take effect.
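Before restarting, it can help to confirm that the options really ended up under [mysqld] and not under another section. The following is a minimal sketch that only reads the file; check_mycnf is a hypothetical helper, and it skips max_allowed_packet because that setting is optional.

```shell
#!/bin/sh
# Sketch: report which of the options listed above are missing from the
# [mysqld] section of a my.cnf-style file. check_mycnf is a hypothetical helper.
check_mycnf() {
    cnf="$1"
    missing=0
    for opt in innodb_file_format innodb_file_per_table innodb_large_prefix \
               character-set-server collation-server; do
        # awk prints only the lines that belong to [mysqld];
        # grep then looks for "opt" optionally followed by "=".
        if ! awk '/^\[/ { in_section = ($0 == "[mysqld]"); next } in_section' "$cnf" \
             | grep -Eq "^[[:space:]]*${opt}[[:space:]]*(=|\$)"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_mycnf /etc/mysql/my.cnf prints one "missing: ..." line per absent option and returns non-zero if any are absent. It does not follow include directives, so options placed in other included files would be reported as missing.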

Check to make sure MySQL is running by typing sudo netstat -tap | grep mysql and you should see something like

tcp 0 0 localhost:mysql *:* LISTEN

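As with Nutch 2.1, set up the nutch database manually. Log into MySQL at the command line with the user and password you set up earlier:

mysql -u xxxxx -p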

then in the MySQL editor type the following:

CREATE DATABASE nutch DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;

Press Enter, then type

use nutch;

Press Enter again, then copy and paste the following as a single statement:

CREATE TABLE `webpage` ( `id` varchar(767) NOT NULL, `headers` blob, `text` longtext DEFAULT NULL, `status` int(11) DEFAULT NULL, `markers` blob, `parseStatus` blob, `modifiedTime` bigint(20) DEFAULT NULL, `prevModifiedTime` bigint(20) DEFAULT NULL, `score` float DEFAULT NULL, `typ` varchar(32) CHARACTER SET latin1 DEFAULT NULL, `batchId` varchar(32) CHARACTER SET latin1 DEFAULT NULL, `baseUrl` varchar(767) DEFAULT NULL, `content` longblob, `title` varchar(2048) DEFAULT NULL, `reprUrl` varchar(767) DEFAULT NULL, `fetchInterval` int(11) DEFAULT NULL, `prevFetchTime` bigint(20) DEFAULT NULL, `inlinks` mediumblob, `prevSignature` blob, `outlinks` mediumblob, `fetchTime` bigint(20) DEFAULT NULL, `retriesSinceFetch` int(11) DEFAULT NULL, `protocolStatus` blob, `signature` blob, `metadata` blob, PRIMARY KEY (`id`) ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED DEFAULT CHARSET=utf8mb4;

Then press Enter. You are done setting up the MySQL database for Nutch.

Set up Nutch 2.2 by downloading the apache-nutch-2.2-src.tar.gz version from http://www.apache.org/dyn/closer.cgi/nutch/. Untar the contents of the file you just downloaded to a folder we will refer to going forward as ${APACHE_NUTCH_HOME}. I prefer to use it with Eclipse, so I untar it in the Eclipse workspace, but this is not necessary.

From inside the nutch folder ensure the MySQL dependency for Nutch is available by editing the following in ${APACHE_NUTCH_HOME}/ivy/ivy.xml

change
default”/>
to
default”/>

and uncomment the gora-sql
default” />

and uncomment the mysql connector

default”/>

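Edit the ${APACHE_NUTCH_HOME}/conf/gora-sql-mapping.xml file, changing the length of the primary key from 512 to 767 in both places.

Configure ${APACHE_NUTCH_HOME}/conf/nutch-site.xml with an agent name, utf-8 as the default encoding, and SqlStore as the data store:

<property>
<name>http.agent.name</name>
<value>YourNutchSpider</value>
</property>

<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other information
is available</description>
</property>

<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>The Gora DataStore class for storing and retrieving data.
Currently the following stores are available: ….
</description>
</property>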

Install ant using the Ubuntu software center or sudo apt-get install ant at the command line.

From the command line cd to your nutch folder

If you are using Eclipse, type ant eclipse. When that is finished, start up Eclipse and go to File -> Import -> Existing Projects into Workspace -> Browse and add ${APACHE_NUTCH_HOME}. Go to the new project in the Eclipse project explorer and scroll down until you find build.xml. Right click on build.xml and select Run As -> 1 Ant Build. This may take a little while to compile.

If you are not using Eclipse, cd to ${APACHE_NUTCH_HOME} and simply type ant runtime
This may take a few minutes to compile.

Start your first crawl by typing the lines below at the terminal (replace ‘http://nutch.apache.org/’ with whatever site you want to crawl):
Inject a URL into the DB
cd ${APACHE_NUTCH_HOME}/runtime/local
mkdir -p urls
echo 'http://nutch.apache.org/' > urls/seed.txt

Start crawling. You will want to create your own script later, but to see what is happening, type the following at the command line by hand:
bin/nutch inject urls
bin/nutch generate -topN 20
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb

Repeat the last four commands (generate, fetch, parse and updatedb) again.

For the generate command, topN is the maximum number of links to actually parse in each round. The first time there is only one URL (the one we injected from seed.txt), but after that there are many more. Note, however, that Nutch records every link it encounters in the webpage table and only limits how many it parses to topN, so don’t be surprised to see many more rows in the webpage table than the topN limit suggests.
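The repeated rounds above can be wrapped in a small loop instead of being retyped. This is only a sketch: crawl_rounds and the NUTCH variable are illustrative names, while the four subcommands and their arguments are exactly the ones typed by hand above. It assumes you are in ${APACHE_NUTCH_HOME}/runtime/local and have already run bin/nutch inject urls.

```shell
#!/bin/sh
# Sketch: run several generate/fetch/parse/updatedb rounds in a row.
# NUTCH and crawl_rounds are illustrative names, not part of Nutch.
NUTCH="${NUTCH:-bin/nutch}"

crawl_rounds() {
    rounds="$1"   # how many rounds to run
    topn="$2"     # value passed to generate -topN
    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "round $i of $rounds"
        # Stop at the first failing step so later steps do not run on bad state.
        "$NUTCH" generate -topN "$topn" || return 1
        "$NUTCH" fetch -all || return 1
        "$NUTCH" parse -all || return 1
        "$NUTCH" updatedb || return 1
        i=$((i + 1))
    done
}
```

For example, crawl_rounds 2 20 runs the same two rounds described above.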

Check your crawl results by looking at the webpage table in the nutch database.
mysql -u xxxxx -p
use nutch;
SELECT * FROM nutch.webpage;

You should see the results of your crawl (around 320 rows). It will be hard to read the columns so you may want to install MySQL Workbench via sudo apt-get install mysql-workbench and use that instead for viewing the data. You may also want to run the following SQL command select * from webpage where status = 2; to limit the rows in the webpage table to only urls that were actually parsed.

You can easily add more URLs by hand to seed.txt if you want, and then run bin/nutch inject urls again.
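When adding URLs repeatedly, it is easy to end up with duplicates in seed.txt. A minimal sketch of a helper that appends a URL only if it is not already listed follows; add_seed is a hypothetical name and the seed file path is passed in explicitly.

```shell
#!/bin/sh
# Sketch: append a URL to a seed file only if it is not already listed.
# add_seed is a hypothetical helper, not a Nutch command.
add_seed() {
    seed_file="$1"
    url="$2"
    mkdir -p "$(dirname "$seed_file")"
    touch "$seed_file"
    # -x matches whole lines only, -F treats the URL as a literal string.
    if ! grep -qxF -- "$url" "$seed_file"; then
        echo "$url" >> "$seed_file"
    fi
}
```

For example, add_seed urls/seed.txt 'http://nutch.apache.org/' followed by bin/nutch inject urls.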

Set up and index with Solr

If you are using Nutch 2.2 at this time you are on the bleeding edge and probably want the latest version of Solr 4 as well. Untar it to $HOME/apache-solr-4.X.X-XXXX. This folder will now be referred to as ${APACHE_SOLR_HOME}.
Download this link and use it to replace ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.

From the terminal start solr:
cd ${APACHE_SOLR_HOME}/example
java -jar start.jar

You can check this is running by opening http://localhost:8983/solr in your web browser. Select collection1 from the core selector.

Leave that terminal running and from a different terminal type the following:
cd ${APACHE_NUTCH_HOME}/runtime/local/
bin/nutch solrindex http://127.0.0.1:8983/solr/ -reindex

There remains a lot to configure to get a good web search going but you are at least started.
