waveeee

Heritrix man

A preliminary summary of using Heritrix

http://jason823.iteye.com/blog/84206

http://blog.sina.com.cn/s/blog_4ef8aa560100bxop.html

1. Introduction

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler.

This document explains how to create, configure and run crawls using Heritrix. It is intended for users of the software and presumes that they possess at least a general familiarity with the concept of web crawling.

For a general overview on Heritrix, see An Introduction to Heritrix.

If you want to build Heritrix from source or if you'd like to make contributions and would like to know about contribution conventions, etc., see instead the Developer's Manual.

2. Installing and running Heritrix

This chapter will explain how to set up Heritrix.

Because Heritrix is a pure Java program, it can (in theory, anyway) be run on any platform that has a Java 5.0 VM. However, we are only committed to supporting its operation on Linux, so this chapter only covers setup on that platform. What follows therefore assumes basic Linux administration skills. Other chapters in the user manual are platform agnostic.

This chapter also only covers installing and running the prepackaged binary distributions of Heritrix. For information about downloading and compiling the source see the Developer's Manual.

2.1. Obtaining and installing Heritrix

The packaged binary can be downloaded from the project's sourceforge home page. Each release comes in four flavors, packaged as .tar.gz or .zip and including source or not.

For installation on Linux get the file heritrix-?.?.?.tar.gz (where ?.?.? is the most recent version number).

The packaged binary comes largely ready to run. Once downloaded it can be untarred into the desired directory.

  % tar xfz heritrix-?.?.?.tar.gz

Once you have downloaded and untarred the correct file you can move on to the next step.
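If you want the release to land in a particular directory, the unpacking step can be wrapped in a small helper. The following is a minimal sketch, not part of the Heritrix distribution: the function name install_heritrix and the destination directory are assumptions made for illustration.

```shell
# Hypothetical helper (not shipped with Heritrix): unpack a release archive
# into a destination directory, creating the directory if needed.
install_heritrix() {
  archive="$1"   # the downloaded heritrix-?.?.?.tar.gz, with the real version filled in
  dest="$2"      # directory that will hold the unpacked release

  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi

  # -C makes tar change into the destination before extracting.
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}
```

Called with the archive path and a target directory, the helper leaves the unpacked heritrix-?.?.? directory inside the target, and that directory is what HERITRIX_HOME should later point at.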

2.1.1. System requirements

2.1.1.1. Java Runtime Environment

The Heritrix crawler is implemented purely in Java. This means that the only true requirement for running it is that you have a JRE installed (Building will require a JDK).

The Heritrix crawler, since release 1.10.0, makes use of Java 5.0 features so your JRE must be at least of a 5.0 (1.5.0+) pedigree.
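The version requirement can be checked from a script before launching. The sketch below is an illustration under stated assumptions: java_version_ok is a hypothetical helper name, and it expects a bare version string such as 1.5.0_22 or 17.0.2 rather than the full output of the java command.

```shell
# Hypothetical helper: succeed if a Java version string is 5.0 (1.5.0) or newer.
# Handles the old "1.x.y" numbering and the newer "9", "11.0.2", "17" numbering.
java_version_ok() {
  version="$1"
  major="${version%%.*}"        # text before the first dot
  major="${major%%[!0-9]*}"     # drop any non-numeric suffix such as "-ea"
  rest="${version#*.}"          # text after the first dot
  minor="${rest%%.*}"
  minor="${minor%%[!0-9]*}"

  [ -n "$major" ] || return 1
  if [ "$major" -gt 1 ]; then
    return 0                    # any major number above 1 is newer than 1.5
  fi
  [ "$major" -eq 1 ] && [ -n "$minor" ] && [ "$minor" -ge 5 ]
}
```

On common JVMs the version string appears in double quotes on the first line that java -version writes to standard error, so it has to be extracted from there before being passed to the helper.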

We currently include all of the free/open source third-party libraries necessary to run Heritrix in the distribution package. See dependencies for the complete list (Licenses for all of the listed libraries are listed in the dependencies section of the raw project.xml at the root of the src download or on Sourceforge).

2.1.1.1.1. Installing Java

If you do not have Java installed you can download Java from:

Sun -- java.sun.com
IBM -- www.ibm.com/java

2.1.1.2. Hardware

A default java heap is 256MB RAM, which is usually suitable for crawls that range over hundreds of hosts. Assign more -- see Section 2.2.1.3, “JAVA_OPTS” for how -- of your available RAM to the heap if you are crawling thousands of hosts or experience Java out-of-memory problems.

2.1.1.3. Linux

The Heritrix crawler has been built and tested primarily on Linux. It has seen some informal use on Macintosh, Windows 2000 and Windows XP, but is not tested, packaged, nor supported on platforms other than Linux at this time.

2.2. Running Heritrix

To run Heritrix, first do the following:

  % export HERITRIX_HOME=/PATH/TO/BUILT/HERITRIX

...where $HERITRIX_HOME is the location of your untarred heritrix-?.?.?.tar.gz.

Next run:

  % cd $HERITRIX_HOME
  % chmod u+x $HERITRIX_HOME/bin/heritrix
  % $HERITRIX_HOME/bin/heritrix --help

This should give you usage output like the following:

  Usage: heritrix --help
  Usage: heritrix --nowui ORDER.XML
  Usage: heritrix [--port=#] [--run] [--bind=IP,IP...] --admin=LOGIN:PASSWORD \
      [ORDER.XML]
  Usage: heritrix [--port=#] --selftest[=TESTNAME]
  Version: @VERSION@
  Options:
   -b,--bind       Comma-separated list of IP addresses or hostnames for web
                   server to listen on.  Set to / to listen on all available
                   network interfaces.  Default is 127.0.0.1.
   -a,--admin      Login and password for web user interface administration.
                   Required (unless passed via the 'heritrix.cmdline.admin'
                   system property).  Pass value of the form 'LOGIN:PASSWORD'.
   -h,--help       Prints this message and exits.
   -n,--nowui      Put heritrix into run mode and begin crawl using ORDER.XML. Do
                   not put up web user interface.
   -p,--port       Port to run web user interface on.  Default: 8080.
   -r,--run        Put heritrix into run mode. If ORDER.XML begin crawl.
   -s,--selftest   Run the integrated selftests. Pass test name to test it only
                   (Case sensitive: E.g. pass 'Charset' to run charset selftest).
  Arguments:
   ORDER.XML       Crawl order to run.

Launch the crawler with the UI enabled by doing the following:

  % $HERITRIX_HOME/bin/heritrix --admin=LOGIN:PASSWORD

This will start up Heritrix printing out a startup message that looks like the following:

  [b116-dyn-60 619] heritrix-0.4.0 > ./bin/heritrix
  Tue Feb 10 17:03:01 PST 2004 Starting heritrix...
  Tue Feb 10 17:03:05 PST 2004 Heritrix 0.4.0 is running.
  Web UI is at: http://b116-dyn-60.archive.org:8080/admin
  Login and password: admin/letmein

Note

By default, as of version 1.10.x, Heritrix binds to localhost only. This means that you need to be running Heritrix on the same machine as your browser to access the Heritrix UI. Read about the --bind argument above if you need to access the Heritrix UI over a network.

See Section 3, “Web based user interface” and Section 4, “A quick guide to running your first crawl job” to get your first crawl up and running.
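To make the --bind, --port and --admin options from the usage output concrete, here is a small dry-run sketch. It only prints a launch command for review and starts nothing. The helper name heritrix_launch_command and the sample address are assumptions; the option names are the ones listed in the usage output.

```shell
# Hypothetical helper: print (not run) a launch command that makes the web UI
# listen on the given address and port.
heritrix_launch_command() {
  bind_addr="$1"      # address for the web UI to listen on
  port="$2"           # web UI port; the documented default is 8080
  credentials="$3"    # LOGIN:PASSWORD pair for --admin

  case "$credentials" in
    ?*:?*) ;;         # require a non-empty login and a non-empty password
    *) echo "credentials must look like LOGIN:PASSWORD" >&2; return 1 ;;
  esac

  printf '%s\n' "\$HERITRIX_HOME/bin/heritrix --bind=$bind_addr --port=$port --admin=$credentials"
}
```

Keep in mind that binding to anything other than localhost exposes the UI to the network, so the precautions in Section 2.3 apply.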

2.2.1. Environment variables

Below are environment variables that affect Heritrix operation.

2.2.1.1. HERITRIX_HOME

Set this environment variable to point at the Heritrix home directory. For example, if you've unpacked Heritrix in your home directory and Heritrix is sitting in the heritrix-1.0.0 directory, you'd set HERITRIX_HOME as follows. Assuming your shell is bash:

  % export HERITRIX_HOME=~/heritrix-1.0.0

If you don't set this environment variable, the Heritrix start script makes a guess at the home for Heritrix. It doesn't always guess correctly.
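Since the start script's guess can be wrong, it may help to verify the directory yourself before launching. A minimal sketch, assuming only that a valid home contains the bin/heritrix script used in Section 2.2; the helper name heritrix_home_ok is hypothetical.

```shell
# Hypothetical check: succeed only if the given directory contains the
# bin/heritrix launch script.
heritrix_home_ok() {
  home_dir="$1"
  [ -n "$home_dir" ] && [ -f "$home_dir/bin/heritrix" ]
}
```

A wrapper script could call this with "$HERITRIX_HOME" and refuse to continue when the check fails.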

2.2.1.2. JAVA_HOME

This environment variable may already exist. It should point to the Java installation on the machine. An example of how this might be set (assuming your shell is bash):

  % export JAVA_HOME=/usr/local/java/jre/

2.2.1.3. JAVA_OPTS

Pass options to the Heritrix JVM by populating the JAVA_OPTS environment variable with values. For example, if you want to have Heritrix run with a larger heap, say 512 megs, you could do either of the following (assuming your shell is bash):

  % export JAVA_OPTS="-Xmx512M"
  % $HERITRIX_HOME/bin/heritrix

Or, you could do it all on the one line as follows:

  % JAVA_OPTS="-Xmx512m" $HERITRIX_HOME/bin/heritrix
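Both forms above overwrite any value JAVA_OPTS already had. If other options are set elsewhere in your environment, you may prefer to append instead. The following sketch shows one way to do that; append_java_opt is a hypothetical helper name, not something Heritrix provides.

```shell
# Hypothetical helper: append one option to JAVA_OPTS, keeping whatever was
# already set. The ${VAR:+...} expansion adds the separating space only when
# JAVA_OPTS is non-empty.
append_java_opt() {
  JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }$1"
  export JAVA_OPTS
}

# Example: a larger heap plus an alternate properties file.
append_java_opt "-Xmx512m"
append_java_opt "-Dheritrix.properties=/tmp/alternate.properties"
```

Any other JVM option or -D system property can be added the same way.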

2.2.2. System properties

Below we document the system properties passed on the command-line that can influence Heritrix's behavior. If you are using the /bin/heritrix script to launch Heritrix you may have to edit it to change/set these properties or else pass them as part of JAVA_OPTS.

2.2.2.1. heritrix.properties

Set this property to point at an alternate heritrix.properties file -- e.g.: -Dheritrix.properties=/tmp/alternate.properties -- when you want heritrix to use a properties file other than that found at conf/heritrix.properties.

2.2.2.2. heritrix.context

Provide an alternate context for the Heritrix admin UI. Usually the admin webapp is mounted on root: i.e. '/'.

2.2.2.3. heritrix.development

Set this property when you want to run the crawler from eclipse. This property takes no arguments. When this property is set, the conf and webapps directories will be found in their development locations and startup messages will show on the text console (standard out).

2.2.2.4. heritrix.home

The directory where Heritrix is homed. Usually passed by the heritrix launch script.

2.2.2.5. heritrix.out

Where stdout/stderr are sent, usually heritrix_out.log and passed by the heritrix launch script.

2.2.2.6. heritrix.version

Version of heritrix set by the heritrix build into heritrix.properties.

2.2.2.7. heritrix.jobsdir

Where to drop heritrix jobs. Usually empty. Default location is ${HERITRIX_HOME}/jobs.

2.2.2.8. heritrix.conf

Specify an alternate configuration directory other than the default $HERITRIX_HOME/conf.

2.2.2.9. heritrix.cmdline

This set of system properties is rarely used. They are for use when Heritrix has NOT been started from the command line (e.g. it has been embedded in another application) and the startup configuration that is usually set by command-line options instead needs to be done via system properties alone.

2.2.2.9.1. heritrix.cmdline.admin

Value is a colon-delimited string giving the user name and password for the admin GUI.

2.2.2.9.2. heritrix.cmdline.nowui

If set to true, will prevent embedded web server crawler control interface from starting up.

2.2.2.9.3. heritrix.cmdline.order

If set to a string file path, will use the specified crawl order XML file.

2.2.2.9.4. heritrix.cmdline.port

Value is the port to run the GUI on.

2.2.2.9.5. heritrix.cmdline.run

If true, crawler is set into run mode on startup.
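As an illustration of how these properties mirror the command-line options, the sketch below assembles them as -D flags for whatever JVM hosts the embedded crawler. The property names come from this section; the helper name heritrix_cmdline_props and the idea of passing the flags on the host JVM's command line are assumptions.

```shell
# Hypothetical helper: print the -D flags that correspond to the
# --admin, --port and --run command-line options.
heritrix_cmdline_props() {
  credentials="$1"   # LOGIN:PASSWORD
  port="$2"          # port for the admin GUI
  run="$3"           # "true" to enter run mode on startup

  printf '%s %s %s\n' \
    "-Dheritrix.cmdline.admin=$credentials" \
    "-Dheritrix.cmdline.port=$port" \
    "-Dheritrix.cmdline.run=$run"
}
```

The printed flags would then be added to the embedding application's own java invocation.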

2.2.2.10. javax.net.ssl.trustStore

Heritrix has its own trust store at conf/heritrix.cacerts that it uses if the FetcherHTTP is configured to use a trust level other than open (open is the default setting). In the unusual case where you'd like to have Heritrix use an alternate truststore, point at the alternate by supplying the JSSE javax.net.ssl.trustStore property on the command line.
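One way to supply the property, sketched here under assumptions, is through JAVA_OPTS as described in Section 2.2.1.3. The trust store path below is a placeholder, not a file that ships with Heritrix.

```shell
# Assumed path to an alternate trust store; replace it with your own file.
ALT_TRUSTSTORE="/path/to/alternate.cacerts"

# Hand the standard JSSE property to the Heritrix JVM through JAVA_OPTS.
JAVA_OPTS="-Djavax.net.ssl.trustStore=$ALT_TRUSTSTORE"
export JAVA_OPTS

# Then launch as usual:
#   % $HERITRIX_HOME/bin/heritrix --admin=LOGIN:PASSWORD
```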

2.2.2.11. java.util.logging.config.file

The Heritrix conf directory includes a file named heritrix.properties. A section of this file specifies the default Heritrix logging configuration. To override these settings, point java.util.logging.config.file at a properties file with an alternate logging configuration. Below we reproduce the default heritrix.properties for reference:

  # Basic logging setup; to console, all levels
  handlers= java.util.logging.ConsoleHandler
  java.util.logging.ConsoleHandler.level= ALL

  # Default global logging level: only warnings or higher
  .level= WARNING

  # currently necessary (?) for standard logs to work
  crawl.level= INFO
  runtime-errors.level= INFO
  uri-errors.level= INFO
  progress-statistics.level= INFO
  recover.level= INFO

  # HttpClient is too chatty... only want to hear about severe problems
  org.apache.commons.httpclient.level= SEVERE

Here's an example of how you might specify an override:

  % JAVA_OPTS="-Djava.util.logging.config.file=heritrix.properties" \
      ./bin/heritrix --nowui order.xml

Alternatively you could edit the default file.
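If you would rather leave the default file untouched, you can keep the override in a file of its own. The sketch below writes such a file. The property keys are copied from the default configuration in this section; the helper name write_logging_override and the choice to raise crawl.level to FINE are assumptions made purely for illustration.

```shell
# Hypothetical helper: write an alternate logging configuration to the given
# path. It repeats the default settings and changes one level.
write_logging_override() {
  target="$1"
  cat > "$target" <<'EOF'
# Basic logging setup; to console, all levels
handlers= java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level= ALL

# Default global logging level: only warnings or higher
.level= WARNING

# Illustrative change: more detail in the crawl log
crawl.level= FINE
runtime-errors.level= INFO
uri-errors.level= INFO
progress-statistics.level= INFO
recover.level= INFO

org.apache.commons.httpclient.level= SEVERE
EOF
}
```

The resulting file is then named in java.util.logging.config.file, passed through JAVA_OPTS.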

2.2.2.12. java.io.tmpdir

Specify an alternate tmp directory. Default is /tmp.

2.2.2.13. com.sun.management.jmxremote.port

What port to start up JMX Agent on. Default is 8849. See also the environment variable JMX_PORT.

2.3. Security Considerations

The crawler is a large and active network application which presents security implications, both local to the machine where it operates, and remotely for machines it contacts.

2.3.1. Local to the Crawling Machine

It is important to recognize that the web UI (discussed in Section 3, “Web based user interface”) and JMX agent (discussed in Section 9.5, “Remote Monitoring and Control”) allow remote control of the crawler process in ways that might potentially disrupt a crawl, change the crawler's behavior, read or write locally-accessible files, and perform or trigger other actions in the Java VM or local machine.

The administrative login and password are currently only a very mild protection against unauthorized access, unless you take additional steps to prevent access to the crawler machine. We strongly recommend some combination of the following practices:

First, use network configuration tools, like a firewall, to only allow trusted remote hosts to contact the web UI and, if applicable, JMX agent ports. (The default web UI port is 8080; JMX is 8849.)

Second, use a strong and unique username/password combination to secure the web UI and JMX agent. However, keep in mind that the default administrative web server uses plain HTTP for access, so these values are susceptible to eavesdropping in transit if network links between your browser and the crawler are compromised. (An upcoming update will change the default to HTTPS.) Also, setting the username/password on the command-line may result in their values being visible to other users of the crawling machine, and they are additionally printed to the console and heritrix_out.log for operator reference.

Third, run the crawler as a user with the minimum privileges necessary for its operation, so that in the event of unauthorized access to the web UI or JMX agent, the potential damage is limited.
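As a sketch of the first recommendation, the helper below prints firewall rules that admit a single trusted host to a port and drop everything else. It is a dry run that applies nothing. The choice of iptables as the firewall tool, the helper name print_firewall_rules and the sample address are all assumptions; adapt them to whatever your site uses.

```shell
# Hypothetical dry-run helper: print iptables rules that accept TCP traffic
# to a port from one trusted host and drop all other TCP traffic to it.
# Nothing is applied; review the output before running it as root.
print_firewall_rules() {
  trusted_host="$1"   # address of the machine allowed to connect
  port="$2"           # 8080 for the web UI, 8849 for JMX by default

  printf 'iptables -A INPUT -p tcp --dport %s -s %s -j ACCEPT\n' "$port" "$trusted_host"
  printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
}
```

The ACCEPT rule has to come before the DROP rule, because rules appended to a chain are evaluated in order.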

Successful unauthorized access to the web UI or JMX agent could trivially end or corrupt a crawl, or change the crawler's behavior to be a nuisance to other network hosts. By adjusting configuration paths, unauthorized access could potentially delete, corrupt, or replace files accessible to the crawler process, and thus cause more extensive problems on the crawler machine.

Another potential risk is that some worst-case or maliciously-crafted crawled content might, in combination with crawler bugs, disrupt the crawl or other files or operations of the local system. For example, in the past, even without malicious intent, some rich-media content has caused runaway memory use in 3rd-party libraries used by the crawler, resulting in a memory-exhaustion condition that can stop or corrupt a crawl in progress. Similarly, atypical input patterns have at times caused runaway CPU use by crawler link-extraction regular expressions, severely slowing crawls. Crawl operators should monitor their crawls closely and stay informed via the project discussion list and bug database for any newly discovered similar bugs.

3. Web based user interface

After Heritrix has been launched from the command line, the web based user interface (WUI) becomes accessible.

The URI to access the WUI is printed on the text console from which the program was launched (typically http://<host>:8080/admin/).

The WUI is password protected. There is no default login for access; one must be specified using either the '-a'/'--admin' command-line option at startup or by setting the 'heritrix.cmdline.admin' system property. The currently valid username and password combination will be printed out to the console, along with the access URL for the WUI, at startup.

The WUI can be accessed via any web browser. While we've endeavoured to make certain that it functions in all recent browsers, Mozilla 5 or newer is recommended. IE 6 or newer should also work without problems.

The initial login page takes the username/password combination discussed above. Logins will time out after a period of non-use.

Caution

By default, communication with the WUI is not done over an encrypted HTTPS connection! Passwords will be submitted over the network in plain text, so you should take additional steps to protect your crawler administrative interface from unauthorized access, as described in Section 2.3, “Security Considerations”.

4. A quick guide to running your first crawl job

Once you've installed Heritrix and logged into the WUI (see above) you are presented with the web Console page. Near the top there is a row of tabs.

Step 1. Create a job

To create a new job, choose the Jobs tab; this will take you to the Jobs page. Once there, you are presented with three options for creating a new job. Select 'With defaults'. This will create a new job based on the default profile (see Section 5.2, “Profile”).

On the screen that comes next you will be asked to supply a name, description and a seed list for the new job.

For the name, supply a short text with no special characters or spaces (dash and underscore are allowed). You can skip the description if you like. In the seeds list, type the URLs of the sites you are interested in harvesting, one URL per line.

Creating a job is covered in greater detail in Section 5, “Creating jobs and profiles”.

Step 2. Configure the job

Once you've entered this information, you are ready to go to the configuration pages. Click the Modules button in the row of buttons at the bottom of the page.

This will take you to the modules configuration page (more details in Section 6.1, “Modules (Scope, Frontier, and Processors)”). For now we are only interested in the option second from the top named Select crawl scope. It allows you to specify the limits of the crawl. By default it is limited to the domains that your seeds span. This may be suitable for your purposes. If not you can choose a broad scope (not limited to the domains of its seeds) or the more restrictive host scope that limits the crawl to the hosts that its seeds span. For more on scopes refer to Section 6.1.1, “Crawl Scope”.

To change scopes, select the new one from the combobox and click the Change button.

Next turn your attention to the second row of tabs at the top of the page, below the usual tabs. You are currently on the far left tab. Now select the tab called Settings near the middle of the row.

This takes you to the Settings page. It allows you to configure various details of the crawl. Exhaustive coverage of this page can be found in Section 6.3, “Settings”. For now we are only interested in the two settings under http-headers. These are the user-agent and from fields of the HTTP headers in the crawler's requests. You must set them to valid values before a crawl can be run. In the current values, the parts that need replacing are written in upper case. If you have trouble with that, please refer to Section 6.3.1.3, “HTTP headers” for what is regarded as valid values.

Once you've set the http-headers settings to proper values (and made any other desired changes), you can click the Submit job tab at the far right of the second row of tabs. The crawl job is now configured and ready to run.

Configuring a job is covered in greater detail in Section 6, “Configuring jobs and profiles”.

Step 3. Running the job

Submitted new jobs are placed in a queue of pending jobs. The crawler does not start processing jobs from this queue until the crawler is started. While the crawler is stopped, jobs are simply held.

To start the crawler, click on the Console tab. Once on the Console page, you will find the option Start at the top of the Crawler Status box, just to the right of the indicator of current status. Clicking this option will put the crawler into Crawling Jobs mode, where it will begin crawling the next pending job, such as the job you just created and configured.

The Console will update to display progress information about the on-going crawl. Click the Refresh option (or the top-left Heritrix logo) to update this information.

For more information about running a job see Section 7, “Running a job”.

Detailed information about evaluating the progress of a job can be found in Section 8, “Analysis of jobs”.

5. Creating jobs and profiles

In order to run a crawl a configuration must be created that defines it. In Heritrix such a configuration is called a crawl job.

5.1. Crawl job

A crawl job encompasses the configurations needed to run a single crawl. It also contains some additional elements such as file locations, status etc.

Once logged onto the WUI, new jobs can be created by going to the Jobs tab. Once the Jobs page loads, users can create jobs by choosing one of the following three options:

Based on existing job

This option allows the user to create a job by basing it on any existing job, regardless of whether it has been crawled or not. This can be useful for repeating crawls or recovering a crawl that had problems. (See Section 9.3, “Recovery of Frontier State and recover.gz”.)

Based on a profile

This option allows the user to create a job by basing it on any existing profile.

With defaults

This option creates a new crawl job based on the default profile.

The first two options display a list of the existing jobs or profiles to choose from. Initially there are two profiles and no existing jobs.

All crawl jobs are created by basing them on profiles (see Section 5.2, “Profile”) or existing jobs.

Once the proper profile/job has been chosen to base the new job on, a simple page will appear asking for the new job's:

Name

The name must only contain letters, numbers, dash (-) and underscore (_). No other characters are allowed. This name will be used to identify the crawl in the WUI but it need not be unique. The name cannot be changed later.

Description

A short description of the job. This is a freetext input box and can be edited later.

Seeds

The seed URIs to use for the job. This list can be edited later along with the general configurations.
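The naming rule can be checked before you type a name into the WUI, for instance in a script that prepares job names. This is a minimal sketch with a hypothetical helper name, valid_job_name, and it assumes that "letters" means ASCII letters; the WUI itself performs the authoritative check.

```shell
# Hypothetical check mirroring the naming rule: only letters, digits, dash
# and underscore are accepted, and the name must not be empty.
valid_job_name() {
  case "$1" in
    ""|*[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}
```

For example, my-first_crawl2 passes, while a name containing a space or an exclamation mark is rejected.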

Below these input fields there are several buttons. The last one Submit job will immediately submit the job and (assuming it is properly configured) it will be ready to run (see Section 7, “Running a job”). The other buttons will take the user to the relevant configuration pages (those are covered in detail in Section 6, “Configuring jobs and profiles”). Once all desired changes have been made to the configuration, click the 'Submit job' tab (usually displayed top and bottom right) to submit it to the list of waiting jobs.

Note

Changes made afterwards to the original jobs or profiles that a new job is based on will not in any way affect the newly created job.

Note

Jobs based on the default profile provided with Heritrix are not ready to run as is. Their HTTP header information must be set to valid values. See Section 6.3.1.3, “HTTP headers” for details.

5.2. Profile

A profile is a template for a crawl job. It contains all the configurations that a crawl job would, but is not considered to be 'crawlable'. That is, Heritrix will not allow you to directly crawl a profile, only jobs based on profiles. The reason for this is that while profiles may in fact be complete, they may also be incomplete.

A common example is leaving the HTTP headers (user-agent, from) in an illegal state in a profile to force the user to input valid data. This applies to the default (default) profile that comes with Heritrix. Other examples would be leaving the seeds list empty, not specifying some processors (such as the writer/indexer) etc.

In general there is less error checking of profiles.

To manage profiles, go to the Profiles tab in the WUI. That page will display a list of existing profiles. To create a new profile select the option of creating a "New profile based on it" from the existing profile to use as a template. Much like jobs, profiles can only be created based on other profiles. It is not possible to create profiles based on existing jobs.

The process from there on mirrors the creation of jobs. A page will ask for the new profile's name, description and seeds list. Unlike job names, profile names must be unique among profiles (a job and a profile can share the same name); otherwise the same rules apply.

The user then proceeds to the configuration pages (see Section 6, “Configuring jobs and profiles”) to modify the behavior of the new profile from that of the parent profile.

Note

Even though profiles are based on other profiles, changes made to the original profiles afterwards will not affect the new ones.

你可能感兴趣的:(xml,Web,linux,UI,Access)

Ubuntu，centos下源码安装cmake指定版本你若盛开，清风自来！ ubuntu centos linux
网址：Indexof/files/v3.23常规安装出错1.先把安装包cmake-3.12.4-Linux-x86_64.tar.gz复制到指定目录2.解压tar-zxvfcmake-3.12.4-Linux-x86_64.tar.gz3.进入解压之后的文件夹cdcmake-3.12.4-Linux-x86_64.tar.gz4.运行下面命令出错bash:./bootstrap:Nosuchfil
web前端常见面试题 JackieDYH 程序猿面试题前端 javascript vue 面试题
html文件开头DOCTYPE作用DOCTYPE（文档类型）是HTML文档的开头，它指定了HTML文档使用的HTML版本及文档类型，告诉浏览器以哪种规范来解析HTML文档。它的作用有以下几个方面：声明HTML版本：DOCTYPE声明可以让浏览器知道使用哪个HTML版本来解析当前文档，从而根据规范来处理文档中的元素和属性。帮助浏览器正确解析文档：DOCTYPE声明可以确保浏览器以标准模式渲染页面，而
【2025年】全国CTF夺旗赛-从零基础入门到竞赛，看这一篇就稳了！白帽黑客鹏哥 web安全 CTF 网络安全大赛 python Linux
基于入门网络安全/黑客打造的：黑客&网络安全入门&进阶学习资源包目录一、CTF简介二、CTF竞赛模式三、CTF各大题型简介四、CTF学习路线4.1、初期1、html+css+js（2-3天）2、apache+php（4-5天）3、mysql（2-3天）4、python(2-3天)5、burpsuite（1-2天）4.2、中期1、SQL注入（7-8天）2、文件上传（7-8天）3、其他漏洞（14-15
【现代前端框架中本地图片资源的处理方案】 Gazer_S 前端框架前端缓存 javascript chrome
现代前端框架中本地图片资源的处理方案前言在前端开发中，正确引用本地图片资源是一个常见但容易被忽视的问题。我们不能像在HTML中那样简单地使用相对路径，因为JavaScript模块中的路径解析规则与HTML不同，且现代构建工具对静态资源有特殊的处理机制。本文将详细探讨在webpack和Vite等构建工具中处理本地图片引用的各种方法。传统方式的局限性在传统开发中，我们可能习惯这样引用图片：constl
javaweb将上传的图片保存在项目文件webapp下的upload文件夹下 yuren_xia 后端技术前端技术 web app java tomcat eclipse
前端HTML表单(upload.html)首先，创建一个HTML页面，允许用户选择并上传图片。图片上传上传图片注意：表单的method设置为"post"，enctype需设置成"multipart/form-data"2.后端Servlet(UploadServlet.java)接下来，创建一个Servlet来处理文件上传请求，并将上传的图片保存到webapp/load目录下。packagecom
非常实用的linux操作系统一键巡检脚本我科绝伦（Huanhuan Zhou） linux linux chrome 运维
[root@localhost~]#chmod+xsystem_check.sh[root@localhost~]#./system_check.sh[root@localhost~]#cat/root/check_log/check-20250227.txt脚本内容：#!/bin/bash#@Author:zhh#beseemCentOS6.XCentOS7.X#date:20250224#检查
【linux自动化实践】linux shell 脚本替换某文本忙碌的菠萝 linux自动化实践 linux 自动化运维
在Linuxshell脚本中，可以使用sed命令来替换文本。以下是一个基本的例子，它将在文件example.txt中查找文本old_text并将其替换为new_textsed-i's/old_text/new_text/g'example.txt解释：sed:是streameditor的缩写，用于处理文本数据。-i:表示直接修改文件内容。s:表示替换操作。old_text:要被替换的文本。new_
Spring 核心技术解析【纯干货版】- XII：Spring 数据访问模块 Spring-R2dbc 模块精讲 m0_74825003 面试学习路线阿里巴巴 spring java 后端
在现代应用架构中，高并发、低延迟的需求推动了响应式编程的发展，而传统的JDBC由于其同步阻塞机制，在高吞吐场景下可能成为瓶颈。R2DBC（ReactiveRelationalDatabaseConnectivity）作为响应式关系型数据库访问标准，正是为了解决这一问题而诞生的。SpringR2DBC作为Spring生态对R2DBC的封装，提供了非阻塞、异步的数据库访问能力，并与SpringWebF
网络安全工具 AWVS 与 Nmap：原理、使用及代码示例阿贾克斯的黎明网络安全安全 web安全网络
目录网络安全工具AWVS与Nmap：原理、使用及代码示例AWVS：Web漏洞扫描的利器1.工具概述2.工作原理3.使用方法4.代码示例（Python调用AWVSAPI进行扫描）Nmap：网络探测与端口扫描的神器1.工具概述2.工作原理3.使用方法4.代码示例（Python调用Nmap进行扫描）总结在网络安全领域，AWVS（AcunetixWebVulnerabilityScanner）和Nmap是
深入剖析 Weblogic、ThinkPHP、Jboss、Struct2 历史漏洞阿贾克斯的黎明网络安全 web安全
目录深入剖析Weblogic、ThinkPHP、Jboss、Struct2历史漏洞一、Weblogic漏洞（一）漏洞原理（二）漏洞利用代码（Python示例）（三）防范措施二、ThinkPHP漏洞（一）漏洞原理（二）漏洞利用代码（示例，假设存在漏洞的代码片段）（三）防范措施三、Jboss漏洞（一）漏洞原理（二）漏洞利用代码（Java示例，用于构造恶意序列化数据）（三）防范措施四、Struct2漏洞
Qt5.6在Linux中无法切换中文输入法问题解决糯米藕片经验分享 qt linux 开发语言
注意Qt5.6.1要编译1.0.6版本源码chmod777赋权复制两个地方so重启QtCreatorsudocplibfcitxplatforminputcontextplugin.so/home/shen/Qt5.6.1/Tools/QtCreator/lib/Qt/plugins/platforminputcontextssudocplibfcitxplatforminputcontextpl
Transformer 代码剖析2 - 模型训练（pytorch实现） lczdyx Transformer代码剖析 transformer pytorch 深度学习人工智能 python
一、模型初始化模块参考：项目代码1.1参数统计函数defcount_parameters(model):returnsum(p.numel()forpinmodel.parameters()ifp.requires_grad)遍历模型参数筛选可训练参数统计参数数量返回总数技术解析：numel()方法计算张量元素总数requires_grad筛选需要梯度更新的参数统计结果反映模型复杂度，典型Tran
【Python专栏】Python的发展历程雾岛心情 Python入门到精通 python 开发语言
Python的创始人为吉多·范罗苏姆（GuidovanRossum），人称龟叔1989年，为了打发圣诞节假期，Guido开始写Python语言的编译器。Python这个名字，来自Guido所挚爱的电视剧MontyPython’sFlyingCircus。他希望这个新的叫做Python的语言，能符合他的理想：创造一种C和shell之间，功能全面，易学易用，可拓展的语言。Python的具体发展历史和版
Spring Cloud Alibaba Spring Cloud Spring Boot 版本对应关系马丁半只瞄 java spring spring boot spring cloud
版本不对应可能有以下报错：Failedtobindpropertiesundermybatis-plus.configuration.result-maps[0]NoClassDefFoundError:reactor/netty/http/server/WebsocketServerSpec$Builderreactor.netty.resources.ConnectionProvider.el
Llama.cpp 服务器安装指南（使用 Docker，GPU 专用）田猿笔记 AI 高级应用 llama 服务器 docker llama.cpp
前置条件在开始之前，请确保你的系统满足以下要求：操作系统：Ubuntu20.04/22.04（或支持Docker的Linux系统）。硬件：NVIDIAGPU（例如RTX4090）。内存：16GB+系统内存，GPU需12GB+显存（RTX4090有24GB）。存储：15GB+可用空间（用于源码、镜像和模型文件）。网络：需要互联网连接以下载源码和依赖。软件：已安装并运行Docker。已安装NVIDIA
Spring Boot@Component注解下的类无法@Autowired的问题 Micrle_007 springboot Spring
这个问题心累(确实)在把我的一个非Web程序迁移从Spring迁移到SpringBoot时，出现了在@Component注解下@Autowired的类为null的情况，也就是没注入成功，或者说是此类在bean加载之前就被调用了。试了各种办法，修改扫描包，修改@Component注解等等，皆未成功，后来看到了一个方法，探究了一下。@ComponentpublicclassComponentClass
ArrayList 源码分析 2401_85327573 java 开发语言
ArrayList简介ArrayList的底层是数组队列，相当于动态数组。与Java中的数组相比，它的容量能动态增长。在添加大量元素前，应用程序可以使用ensureCapacity操作来增加ArrayList实例的容量。这可以减少递增式再分配的数量。ArrayList继承于AbstractList，实现了List,RandomAccess,Cloneable,java.io.Serializabl
javaweb文件上传：@MultipartConfig注解与Apache Commons FileUpload对比 yuren_xia 后端技术 apache java tomcat
在JavaWeb应用中处理文件上传时，可以选择使用@MultipartConfig注解或第三方库如ApacheCommonsFileUpload（通常简称为fileupload）。以下是两者的比较和建议：使用@MultipartConfig注解简介：@MultipartConfig是JavaServlet规范中用于处理multipart/form-data请求（通常是文件上传）的注解。它简化了在S
Composer如何通过GitHub Personal Access Token安装私有包：完整教程 lihuang319 composer github php
使用Composer安全管理您的PHP私有依赖包一、前言在PHP开发中，我们经常需要将内部工具包托管为私有仓库。传统的账号密码验证方式存在安全隐患，而GitHubPersonalAccessToken（PAT）提供了一种更安全的鉴权方案。本文将通过4个核心步骤+3个避坑指南，手把手教您在Composer中优雅地使用PAT安装私有包。二、为什么要用PAT？安全性：细粒度权限控制（可设置过期时间/单仓
Golang的Aes加解密工具类张声录1 golang 开发语言后端
packagemainimport("bytes""crypto/aes""crypto/sha1""encoding/binary""encoding/hex""fmt")//SHA1PRNG模拟Java的SHA1PRNG算法typeSHA1PRNGstruct{state[sha1.Size]bytecounteruint32indexint}//NewSHA1PRNG使用种子初始化SHA1P
驱动开发系列39 - Linux Graphics 3D 绘制流程（二）- 设置渲染管线黑不溜秋的 GPU驱动专栏驱动开发
一：概述Intel的Iris驱动是Mesa中的Gallium驱动，主要用于IntelGen8+GPU（Broadwell及更新架构）。它负责与i915内核DRM驱动交互，并通过Vulkan（ANV）、OpenGL（IrisGallium）、或OpenCL（Clover）来提供3D加速。在Iris驱动中，GPUPipeline设置涉及多个部分，包括编译和上传着色器、设置渲染目标、绑定缓冲区、配置固定
Linux驱动开发: USB驱动开发 DS小龙哥 Linux系统编程与驱动开发 linux USB驱动嵌入式
一、USB简介1.1什么是USB?USB是连接计算机系统与外部设备的一种串口总线标准，也是一种输入输出接口的技术规范，被广泛地应用于个人电脑和移动设备等信息通讯产品，USB就是简写，中文叫通用串行总线。最早出现在1995年，伴随着奔腾机发展而来。自微软在Windows98中加入对USB接口的支持后，USB接口才推广开来，USB设备也日渐增多，如数码相机、摄像头、扫描仪、游戏杆、打印机、键盘、鼠标等
关闭linux系统端口占用,关闭linux系统端口的两种方法爱吃面的喵关闭linux系统端口占用
1、通过杀掉进程的方法来关闭端口每个端口都有一个守护进程，kill掉这个守护进程就可以了每个端口都是一个进程占用着，第一步、用下面命令netstat-anp|grep端口找出占用这个端口的进程，第二步、用下面命令kill-9PID杀掉就行了2、通过开启关闭服务的方法来开启/关闭端口因为每个端口都有对应的服务，因此要关闭端口只要关闭相应的服务就可以了。linux中开机自动启动的服务一般都存放在两个地
Linux 查看端口占用命令酒酿小圆子～ linux 运维服务器
文章目录1、lsof-i:端口号2、netstat命令2.1netstat-tunlp命令2.2netstat-anp命令1、lsof-i:端口号用于查看某一端口的占用情况，比如查看5000端口使用情况：sudolsof-i:5000注意：这里最好使用sudo开启管理员权限，未开启管理员权限时，可能会检测不到相关进程。（并非所有进程都能被检测到，所有非本用户的进程信息将不会显示，如果想看到所有信息
Linux Device Driver 3rd 上 xiaozi63 linux 内核驱动程序
第一章设备驱动程序的简介处于上层应用与底层硬件设备的软件层区分机制和策略是Linux最好的思想之一，机制指的是需要提供什么功能，策略指的是如何使用这个功能！通常不同的环境需要不同的方式来使用硬件，则驱动应当尽可能地不实现策略．驱动程序设计需要考虑一下几个方面的因素：提供给用户尽量多的选项编写驱动程序所占用的时间，驱动程序的操作耗时需要尽量缩减．尽量保持程序简单内核概览：进程管理:负责创建和销毁进程
最通用的跨平台引擎：ShiVa 3D引擎 pizi0475 图形图像其它文章图形引擎游戏引擎引擎跨平台脚本服务器 ssl soap
ShiVa3D引擎是最通用的跨平台引擎，可以在Web浏览器运行并且也支持Windows，Mac，Linux，Wii，iPhone，iPad，Android，WebOS和AirplaySDK。该引擎支持SSL–securized插件扩展，很像PhysX引擎，FMOD声音库，ARToolkit和ScaleformHUD引擎。ClassicGeometry经典的图形处理支持多边形网，其中包括：-静态网格
Linux系统如何排查端口占用程序猿000001号 linux 运维服务器
如何在Linux系统中排查端口占用在Linux系统中，当您遇到网络服务无法启动或响应异常的情况时，可能是因为某个特定的端口已经被其他进程占用。这时，您需要进行端口占用情况的排查来解决问题。本文将介绍几种常用的命令行工具和方法，帮助您快速定位并解决端口占用的问题。1.使用netstat命令netstat是一个网络统计工具，它可以显示网络连接、路由表、接口统计等信息。要检查端口占用情况，可以使用以下命
Linux查看端口占用情况的几种方式 liu_caihong linux 服务器网络
Linux查看端口占用情况的几种方式概述测试环境为Centos7.9，本文简单给出了几种检测端口的例子。一、查看本机端口占用1、netstat#安装netstatyum-yinstallnet-tools#检测端口占用netstat-npl|grep"端口"[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-FFUW0j6I-1655191692938)(C:%5CUse
探索React的深度应用：React Survey——构建问卷神器傅尉艺Maggie
探索React的深度应用：ReactSurvey——构建问卷神器去发现同类优质开源项目:https://gitcode.com/在浩瀚的前端开发世界里，React与Redux已成为构建复杂Web应用的得力助手，但它们的强大往往隐藏在基础教程之后。今天，让我们一同探索【ReactSurvey】，一个将React与Redux之力发挥至极致的开源项目，教你如何轻松打造专业的在线问卷系统。项目介绍Reac
推荐使用：react-native-cn-quill - 为React Native打造的富文本编辑器秋玥多
推荐使用：react-native-cn-quill-为ReactNative打造的富文本编辑器react-native-cn-quillQuillrich-texteditorforreact-native项目地址:https://gitcode.com/gh_mirrors/re/react-native-cn-quill项目介绍react-native-cn-quill是一个基于QuillA