weixin_30586085

【转载】Hadoop可视化分析利器之Hue

http://qindongliang.iteye.com/blog/2212619

先来看下hue的架构图：

（1）Hue是什么？

Hue是一个可快速开发和调试Hadoop生态系统各种应用的一个基于浏览器的图形化用户接口。

（2）Hue能干什么？

1，访问HDFS和文件浏览
2，通过web调试和开发hive以及数据结果展示
3，查询solr和结果展示，报表生成
4，通过web调试和开发impala交互式SQL Query
5，spark调试和开发
6，Pig开发和调试
7，oozie任务的开发，监控，和工作流协调调度
8，Hbase数据查询和修改，数据展示
9，Hive的元数据（metastore）查询
10，MapReduce任务进度查看，日志追踪
11，创建和提交MapReduce，Streaming，Java job任务
12，Sqoop2的开发和调试
13，Zookeeper的浏览和编辑
14，数据库（MySQL，PostGres，SQlite，Oracle）的查询和展示

（3）Hue怎么用或者什么时候应该用？

如果你们公司用的是CDH的hadoop，那么很幸运，Hue也是出自CDH公司，自家的东西用起来当然很爽。

如果你们公司用的是Apache Hadoop或者是HDP的hadoop，那么也没事，Hue是开源的，而且支持任何版本的hadoop。

关于什么时候用，这纯属一个锦上添花的功能，你完全可以不用hue，因为各种开源项目都有自己的使用方式和开发接口，hue只不过是统一了各个项目的开发方式在一个接口里而已，这样比较方便而已，不用你一会准备使用hive，就开一个hive的cli终端，一会用pig，你就得开一个pig的grunt，或者你又想查Hbase，又得需要开一个Hbase的shell终端。如果你们使用hadoop生态系统的组件很多的情况下，使用hue还是比较方便的，另外一个好处就是hue提供了一个web的界面来开发和调试任务，不用我们再频繁登陆Linux来操作了。

你可以在任何时候，只要能上网，就可以通过hue来开发和调试数据，不用再装Linux的客户端来远程登陆操作了，这也是B/S架构的好处。

（4）如何下载，安装和编译Hue？

centos系统，执行命令：
yum install -y asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel

1，hue的依赖（centos系统）

Java代码

ant
asciidoc
cyrus-sasl-devel
cyrus-sasl-gssapi
gcc
gcc-c++
krb5-devel
libtidy (for unit tests only)
libxml2-devel
libxslt-devel
make
mvn (from maven package or maven3 tarball)
mysql
mysql-devel
openldap-devel
python-devel
sqlite-devel
openssl-devel (for version 7+)

2，散仙的在安装hue前，centos上已经安装好了，jdk，maven，ant，hadoop，hive，oozie等，环境变量如下：

Java代码

user="search"
# java
export JAVA_HOME="/usr/local/jdk"
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
# ant
export ANT_HOME=/usr/local/ant
export CLASSPATH=$CLASSPATH:$ANT_HOME/lib
export PATH=$PATH:$ANT_HOME/bin
# maven
export MAVEN_HOME="/usr/local/maven"
export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
export PATH=$PATH:$MAVEN_HOME/bin
##Hadoop2.2的变量设置
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_HOME=/home/search/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$CLASSPATH:$HADOOP_COMMON_HOME:$HADOOP_COMMON_HOMEi/lib:$HADOOP_MAPRED_HOME:$HADOOP_HDFS_HOME:$HADOOP_HDFS_HOME
# Hive
export HIVE_HOME=/home/search/hive
export HIVE_CONF_DIR=/home/search/hive/conf
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf
export OOZIE_HOME="/home/search/oozie-4.1.0"
export PATH=$PATH:$OOZIE_HOME/sbin:$OOZIE_HOME/bin

3,本文散仙主要是采用tar包的方式安装hue，除了tar包的方式，hue还能采用cm安装，当然这就与cdh的系统依赖比较大了。

hue最新的版本是3.8.1，散仙这里用的3.7.0的版本
下载地址：https://github.com/cloudera/hue/releases

hue的github地址：https://github.com/cloudera/hue

4，下载完后，解压tar包，并进入hue的根目录执行命令
make apps编译

5，编译成功后，需要配置/home/search/hue/desktop/conf/pseudo-distributed.ini文件，里面包含了hdfs，yarn，mapreduce，hive，oozie，pig，spark，solr等的ip地址和端口号配置，可根据自己的情况设置，如果没有安装某个应用，那就无须配置，只不过这个应用在web上不能使用而已，并不会影响其他框架的使用。

一个例子如下：

Java代码

#####################################
# DEVELOPMENT EDITION
#####################################
# Hue configuration file
# ===================================
#
# For complete documentation about the contents of this file, run
# $ /build/env/bin/hue config_help
#
# All .ini files under the current directory are treated equally. Their
# contents are merged to form the Hue configuration, which can
# can be viewed on the Hue at
# http://:/dump_config
###########################################################################
# General configuration for core Desktop features (authentication, etc)
###########################################################################
[desktop]
send_dbug_messages=1
# To show database transactions, set database_logging to 1
database_logging=0
# Set this to a random string, the longer the better.
# This is used for secure hashing in the session store.
secret_key=search
# Webserver listens on this address and port
http_host=0.0.0.0
http_port=8000
# Time zone name
time_zone=Asia/Shanghai
# Enable or disable Django debug mode
## django_debug_mode=true
# Enable or disable backtrace for server error
## http_500_debug_mode=true
# Enable or disable memory profiling.
## memory_profiler=false
# Server email for internal error messages
## django_server_email='[email protected]'
# Email backend
## django_email_backend=django.core.mail.backends.smtp.EmailBackend
# Webserver runs as this user
server_user=search
server_group=search
# This should be the Hue admin and proxy user
default_user=search
# This should be the hadoop cluster admin
default_hdfs_superuser=search
# If set to false, runcpserver will not actually start the web server.
# Used if Apache is being used as a WSGI container.
## enable_server=yes
# Number of threads used by the CherryPy web server
## cherrypy_server_threads=10
# Filename of SSL Certificate
## ssl_certificate=
# Filename of SSL RSA Private Key
## ssl_private_key=
# List of allowed and disallowed ciphers in cipher list format.
# See http://www.openssl.org/docs/apps/ciphers.html for more information on cipher list format.
## ssl_cipher_list=DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2
# LDAP username and password of the hue user used for LDAP authentications.
# Set it to use LDAP Authentication with HiveServer2 and Impala.
## ldap_username=hue
## ldap_password=
# Default encoding for site data
## default_site_encoding=utf-8
# Help improve Hue with anonymous usage analytics.
# Use Google Analytics to see how many times an application or specific section of an application is used, nothing more.
## collect_usage=true
# Support for HTTPS termination at the load-balancer level with SECURE_PROXY_SSL_HEADER.
## secure_proxy_ssl_header=false
# Comma-separated list of Django middleware classes to use.
# See https://docs.djangoproject.com/en/1.4/ref/middleware/ for more details on middlewares in Django.
## middleware=desktop.auth.backend.LdapSynchronizationBackend
# Comma-separated list of regular expressions, which match the redirect URL.
# For example, to restrict to your local domain and FQDN, the following value can be used:
# ^\/.*$,^http:\/\/www.mydomain.com\/.*$
## redirect_whitelist=
# Comma separated list of apps to not load at server startup.
# e.g.: pig,zookeeper
## app_blacklist=
# The directory where to store the auditing logs. Auditing is disable if the value is empty.
# e.g. /var/log/hue/audit.log
## audit_event_log_dir=
# Size in KB/MB/GB for audit log to rollover.
## audit_log_max_file_size=100MB
#poll_enabled=false
# Administrators
# ----------------
[[django_admins]]
## [[[admin1]]]
## name=john
## email=john@doe.com
# UI customizations
# -------------------
[[custom]]
# Top banner HTML code
#banner_top_html=Search Team Hadoop Manager
# Configuration options for user authentication into the web application
# ------------------------------------------------------------------------
[[auth]]
# Authentication backend. Common settings are:
# - django.contrib.auth.backends.ModelBackend (entirely Django backend)
# - desktop.auth.backend.AllowAllBackend (allows everyone)
# - desktop.auth.backend.AllowFirstUserDjangoBackend
# (Default. Relies on Django and user manager, after the first login)
# - desktop.auth.backend.LdapBackend
# - desktop.auth.backend.PamBackend
# - desktop.auth.backend.SpnegoDjangoBackend
# - desktop.auth.backend.RemoteUserDjangoBackend
# - libsaml.backend.SAML2Backend
# - libopenid.backend.OpenIDBackend
# - liboauth.backend.OAuthBackend
# (New oauth, support Twitter, Facebook, Google+ and Linkedin
## backend=desktop.auth.backend.AllowFirstUserDjangoBackend
# The service to use when querying PAM.
## pam_service=login
# When using the desktop.auth.backend.RemoteUserDjangoBackend, this sets
# the normalized name of the header that contains the remote user.
# The HTTP header in the request is converted to a key by converting
# all characters to uppercase, replacing any hyphens with underscores
# and adding an HTTP_ prefix to the name. So, for example, if the header
# is called Remote-User that would be configured as HTTP_REMOTE_USER
#
# Defaults to HTTP_REMOTE_USER
## remote_user_header=HTTP_REMOTE_USER
# Ignore the case of usernames when searching for existing users.
# Only supported in remoteUserDjangoBackend.
## ignore_username_case=false
# Ignore the case of usernames when searching for existing users to authenticate with.
# Only supported in remoteUserDjangoBackend.
## force_username_lowercase=false
# Users will expire after they have not logged in for 'n' amount of seconds.
# A negative number means that users will never expire.
## expires_after=-1
# Apply 'expires_after' to superusers.
## expire_superusers=true
# Configuration options for connecting to LDAP and Active Directory
# -------------------------------------------------------------------
[[ldap]]
# The search base for finding users and groups
## base_dn="DC=mycompany,DC=com"
# URL of the LDAP server
## ldap_url=ldap://auth.mycompany.com
# A PEM-format file containing certificates for the CA's that
# Hue will trust for authentication over TLS.
# The certificate for the CA that signed the
# LDAP server certificate must be included among these certificates.
# See more here http://www.openldap.org/doc/admin24/tls.html.
## ldap_cert=
## use_start_tls=true
# Distinguished name of the user to bind as -- not necessary if the LDAP server
# supports anonymous searches
## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"
# Password of the bind user -- not necessary if the LDAP server supports
# anonymous searches
## bind_password=
# Pattern for searching for usernames -- Use for the parameter
# For use when using LdapBackend for Hue authentication
## ldap_username_pattern="uid=,ou=People,dc=mycompany,dc=com"
# Create users in Hue when they try to login with their LDAP credentials
# For use when using LdapBackend for Hue authentication
## create_users_on_login = true
# Synchronize a users groups when they login
## sync_groups_on_login=false
# Ignore the case of usernames when searching for existing users in Hue.
## ignore_username_case=false
# Force usernames to lowercase when creating new users from LDAP.
## force_username_lowercase=false
# Use search bind authentication.
## search_bind_authentication=true
# Choose which kind of subgrouping to use: nested or suboordinate (deprecated).
## subgroups=suboordinate
# Define the number of levels to search for nested members.
## nested_members_search_depth=10
[[[users]]]
# Base filter for searching for users
## user_filter="objectclass=*"
# The username attribute in the LDAP schema
## user_name_attr=sAMAccountName
[[[groups]]]
# Base filter for searching for groups
## group_filter="objectclass=*"
# The username attribute in the LDAP schema
## group_name_attr=cn
[[[ldap_servers]]]
## [[[[mycompany]]]]
# The search base for finding users and groups
## base_dn="DC=mycompany,DC=com"
# URL of the LDAP server
## ldap_url=ldap://auth.mycompany.com
# A PEM-format file containing certificates for the CA's that
# Hue will trust for authentication over TLS.
# The certificate for the CA that signed the
# LDAP server certificate must be included among these certificates.
# See more here http://www.openldap.org/doc/admin24/tls.html.
## ldap_cert=
## use_start_tls=true
# Distinguished name of the user to bind as -- not necessary if the LDAP server
# supports anonymous searches
## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"
# Password of the bind user -- not necessary if the LDAP server supports
# anonymous searches
## bind_password=
# Pattern for searching for usernames -- Use for the parameter
# For use when using LdapBackend for Hue authentication
## ldap_username_pattern="uid=,ou=People,dc=mycompany,dc=com"
## Use search bind authentication.
## search_bind_authentication=true
## [[[[[users]]]]]
# Base filter for searching for users
## user_filter="objectclass=Person"
# The username attribute in the LDAP schema
## user_name_attr=sAMAccountName
## [[[[[groups]]]]]
# Base filter for searching for groups
## group_filter="objectclass=groupOfNames"
# The username attribute in the LDAP schema
## group_name_attr=cn
# Configuration options for specifying the Desktop Database. For more info,
# see http://docs.djangoproject.com/en/1.4/ref/settings/#database-engine
# ------------------------------------------------------------------------
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, 'name', below is a a path to the filename. For other backends, it is the database name.
# Note for Oracle, options={ 'threaded':true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=:/".
## engine=sqlite3
## host=
## port=
## user=
## password=
## name=desktop/desktop.db
## options={}
# Configuration options for specifying the Desktop session.
# For more info, see https://docs.djangoproject.com/en/1.4/topics/http/sessions/
# ------------------------------------------------------------------------
[[session]]
# The cookie containing the users' session ID will expire after this amount of time in seconds.
# Default is 2 weeks.
## ttl=1209600
# The cookie containing the users' session ID will be secure.
# Should only be enabled with HTTPS.
## secure=false
# The cookie containing the users' session ID will use the HTTP only flag.
## http_only=false
# Use session-length cookies. Logs out the user when she closes the browser window.
## expire_at_browser_close=false
# Configuration options for connecting to an external SMTP server
# ------------------------------------------------------------------------
[[smtp]]
# The SMTP server information for email notification delivery
host=localhost
port=25
user=
password=
# Whether to use a TLS (secure) connection when talking to the SMTP server
tls=no
# Default email address to use for various automated notification from Hue
## default_from_email=hue@localhost
# Configuration options for Kerberos integration for secured Hadoop clusters
# ------------------------------------------------------------------------
[[kerberos]]
# Path to Hue's Kerberos keytab file
## hue_keytab=
# Kerberos principal name for Hue
## hue_principal=hue/hostname.foo.com
# Path to kinit
## kinit_path=/path/to/kinit
# Configuration options for using OAuthBackend (Core) login
# ------------------------------------------------------------------------
[[oauth]]
# The Consumer key of the application
## consumer_key=XXXXXXXXXXXXXXXXXXXXX
# The Consumer secret of the application
## consumer_secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# The Request token URL
## request_token_url=https://api.twitter.com/oauth/request_token
# The Access token URL
## access_token_url=https://api.twitter.com/oauth/access_token
# The Authorize URL
## authenticate_url=https://api.twitter.com/oauth/authorize
###########################################################################
# Settings to configure SAML
###########################################################################
[libsaml]
# Xmlsec1 binary path. This program should be executable by the user running Hue.
## xmlsec_binary=/usr/local/bin/xmlsec1
# Entity ID for Hue acting as service provider.
# Can also accept a pattern where '' will be replaced with server URL base.
## entity_id="/saml2/metadata/"
# Create users from SSO on login.
## create_users_on_login=true
# Required attributes to ask for from IdP.
# This requires a comma separated list.
## required_attributes=uid
# Optional attributes to ask for from IdP.
# This requires a comma separated list.
## optional_attributes=
# IdP metadata in the form of a file. This is generally an XML file containing metadata that the Identity Provider generates.
## metadata_file=
# Private key to encrypt metadata with.
## key_file=
# Signed certificate to send along with encrypted metadata.
## cert_file=
# A mapping from attributes in the response from the IdP to django user attributes.
## user_attribute_mapping={ 'uid':'username'}
# Have Hue initiated authn requests be signed and provide a certificate.
## authn_requests_signed=false
# Have Hue initiated logout requests be signed and provide a certificate.
## logout_requests_signed=false
# Username can be sourced from 'attributes' or 'nameid'.
## username_source=attributes
# Performs the logout or not.
## logout_enabled=true
###########################################################################
# Settings to configure OpenId
###########################################################################
[libopenid]
# (Required) OpenId SSO endpoint url.
## server_endpoint_url=https://www.google.com/accounts/o8/id
# OpenId 1.1 identity url prefix to be used instead of SSO endpoint url
# This is only supported if you are using an OpenId 1.1 endpoint
## identity_url_prefix=https://app.onelogin.com/openid/your_company.com/
# Create users from OPENID on login.
## create_users_on_login=true
# Use email for username
## use_email_for_username=true
###########################################################################
# Settings to configure OAuth
###########################################################################
[liboauth]
# NOTE:
# To work, each of the active (i.e. uncommented) service must have
# applications created on the social network.
# Then the "consumer key" and "consumer secret" must be provided here.
#
# The addresses where to do so are:
# Twitter: https://dev.twitter.com/apps
# Google+ : https://cloud.google.com/
# Facebook: https://developers.facebook.com/apps
# Linkedin: https://www.linkedin.com/secure/developer
#
# Additionnaly, the following must be set in the application settings:
# Twitter: Callback URL (aka Redirect URL) must be set to http://YOUR_HUE_IP_OR_DOMAIN_NAME/oauth/social_login/oauth_authenticated
# Google+ : CONSENT SCREEN must have email address
# Facebook: Sandbox Mode must be DISABLED
# Linkedin: "In OAuth User Agreement", r_emailaddress is REQUIRED
# The Consumer key of the application
## consumer_key_twitter=
## consumer_key_google=
## consumer_key_facebook=
## consumer_key_linkedin=
# The Consumer secret of the application
## consumer_secret_twitter=
## consumer_secret_google=
## consumer_secret_facebook=
## consumer_secret_linkedin=
# The Request token URL
## request_token_url_twitter=https://api.twitter.com/oauth/request_token
## request_token_url_google=https://accounts.google.com/o/oauth2/auth
## request_token_url_linkedin=https://www.linkedin.com/uas/oauth2/authorization
## request_token_url_facebook=https://graph.facebook.com/oauth/authorize
# The Access token URL
## access_token_url_twitter=https://api.twitter.com/oauth/access_token
## access_token_url_google=https://accounts.google.com/o/oauth2/token
## access_token_url_facebook=https://graph.facebook.com/oauth/access_token
## access_token_url_linkedin=https://api.linkedin.com/uas/oauth2/accessToken
# The Authenticate URL
## authenticate_url_twitter=https://api.twitter.com/oauth/authorize
## authenticate_url_google=https://www.googleapis.com/oauth2/v1/userinfo?access_token=
## authenticate_url_facebook=https://graph.facebook.com/me?access_token=
## authenticate_url_linkedin=https://api.linkedin.com/v1/people/~:(email-address)?format=json&oauth2_access_token=
# Username Map. Json Hash format.
# Replaces username parts in order to simplify usernames obtained
# Example: { "@sub1.domain.com":"_S1", "@sub2.domain.com":"_S2"}
# converts '[email protected]' to 'email_S1'
## username_map={}
# Whitelisted domains (only applies to Google OAuth). CSV format.
## whitelisted_domains_google=
###########################################################################
# Settings for the RDBMS application
###########################################################################
[librdbms]
# The RDBMS app can have any number of databases configured in the databases
# section. A database is known by its section name
# (IE sqlite, mysql, psql, and oracle in the list below).
[[databases]]
# sqlite configuration.
## [[[sqlite]]]
# Name to show in the UI.
## nice_name=SQLite
# For SQLite, name defines the path to the database.
## name=/tmp/sqlite.db
# Database backend to use.
## engine=sqlite
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
## options={}
# mysql, oracle, or postgresql configuration.
## [[[mysql]]]
# Name to show in the UI.
## nice_name="My SQL DB"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
## name=mysqldb
# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
## engine=mysql
# IP or hostname of the database to connect to.
## host=localhost
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
## port=3306
# Username to authenticate with when connecting to the database.
## user=example
# Password matching the username to authenticate with when
# connecting to the database.
## password=example
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
## options={}
###########################################################################
# Settings to configure your Hadoop cluster.
###########################################################################
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://h1:8020
# NameNode logical name.
logical_name=h1
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://h1:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
security_enabled=false
# Default umask for file and directory creation, specified in an octal value.
umask=022
hadoop_conf_dir=/home/search/hadoop/etc/hadoop
# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=h1
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://h1:8088
# URL of the ProxyServer API
proxy_api_url=http://h1:8088
# URL of the HistoryServer API
history_server_api_url=http://h1:19888
# HA support by specifying multiple clusters
# e.g.
# [[[ha]]]
# Resource Manager logical name (required for HA)
## logical_name=my-rm-name
# Configuration for MapReduce (MR1)
# ------------------------------------------------------------------------
[[mapred_clusters]]
[[[default]]]
# Enter the host on which you are running the Hadoop JobTracker
jobtracker_host=h1
# The port where the JobTracker IPC listens on
#jobtracker_port=8021
# JobTracker logical name for HA
## logical_name=
# Thrift plug-in port for the JobTracker
## thrift_port=9290
# Whether to submit jobs to this cluster
submit_to=False
# Change this if your MapReduce cluster is Kerberos-secured
## security_enabled=false
# HA support by specifying multiple clusters
# e.g.
# [[[ha]]]
# Enter the logical name of the JobTrackers
# logical_name=my-jt-name
###########################################################################
# Settings to configure the Filebrowser app
###########################################################################
[filebrowser]
# Location on local filesystem where the uploaded archives are temporary stored.
## archive_upload_tempdir=/tmp
###########################################################################
# Settings to configure liboozie
###########################################################################
[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
## oozie_url=http://localhost:11000/oozie
oozie_url=http://h1:11000/oozie
# Requires FQDN in oozie_url if enabled
## security_enabled=false
# Location on HDFS where the workflows/coordinator are deployed when submitted.
remote_deployement_dir=/user/hue/oozie/deployments
###########################################################################
# Settings to configure the Oozie app
###########################################################################
[oozie]
# Location on local FS where the examples are stored.
local_data_dir=apps/oozie/examples/
# Location on local FS where the data for the examples is stored.
## sample_data_dir=...thirdparty/sample_data
# Location on HDFS where the oozie examples and workflows are stored.
remote_data_dir=apps/oozie/workspaces
# Maximum of Oozie workflows or coodinators to retrieve in one API call.
oozie_jobs_count=100
# Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
## enable_cron_scheduling=true
enable_cron_scheduling=true
###########################################################################
# Settings to configure Beeswax with Hive
###########################################################################
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=h1
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/home/search/hive/conf
# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120
# Set a LIMIT clause when browsing a partitioned table.
# A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
browse_partitioned_table_limit=250
# A limit to the number of rows that can be downloaded from a query.
# A value of -1 means there will be no limit.
# A maximum of 65,000 is applied to XLS downloads.
download_row_limit=1000000
# Hue will try to close the Hive query when the user leaves the editor page.
# This will free all the query resources in HiveServer2, but also make its results inaccessible.
## close_queries=false
# Thrift version to use when communicating with HiveServer2
## thrift_version=5
[[ssl]]
# SSL communication enabled for this server.
## enabled=false
# Path to Certificate Authority certificates.
## cacerts=/etc/hue/cacerts.pem
# Path to the private key file.
## key=/etc/hue/key.pem
# Path to the public certificate file.
## cert=/etc/hue/cert.pem
# Choose whether Hue should validate certificates received from the server.
## validate=true
###########################################################################
# Settings to configure Pig
###########################################################################
[pig]
# Location of piggybank.jar on local filesystem.
local_sample_dir=/home/search/hue/apps/pig/examples
# Location piggybank.jar will be copied to in HDFS.
remote_data_dir=/home/search/pig/examples
###########################################################################
# Settings to configure Sqoop
###########################################################################
[sqoop]
# For autocompletion, fill out the librdbms section.
# Sqoop server URL
server_url=http://h1:12000/sqoop
###########################################################################
# Settings to configure Proxy
###########################################################################
[proxy]
# Comma-separated list of regular expressions,
# which match 'host:port' of requested proxy target.
## whitelist=(localhost|127\.0\.0\.1):(50030|50070|50060|50075)
# Comma-separated list of regular expressions,
# which match any prefix of 'host:port/path' of requested proxy target.
# This does not support matching GET parameters.
## blacklist=
###########################################################################
# Settings to configure Impala
###########################################################################
[impala]
# Host of the Impala Server (one of the Impalad)
## server_host=localhost
# Port of the Impala Server
## server_port=21050
# Kerberos principal
## impala_principal=impala/hostname.foo.com
# Turn on/off impersonation mechanism when talking to Impala
## impersonation_enabled=False
# Number of initial rows of a result set to ask Impala to cache in order
# to support re-fetching them for downloading them.
# Set to 0 for disabling the option and backward compatibility.
## querycache_rows=50000
# Timeout in seconds for thrift calls
## server_conn_timeout=120
# Hue will try to close the Impala query when the user leaves the editor page.
# This will free all the query resources in Impala, but also make its results inaccessible.
## close_queries=true
# If QUERY_TIMEOUT_S > 0, the query will be timed out (i.e. cancelled) if Impala does not do any work
# (compute or send back results) for that query within QUERY_TIMEOUT_S seconds.
## query_timeout_s=600
###########################################################################
# Settings to configure HBase Browser
###########################################################################
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
## hbase_clusters=(Cluster|localhost:9090)
# HBase configuration directory, where hbase-site.xml is located.
## hbase_conf_dir=/etc/hbase/conf
# Hard limit of rows or columns per row fetched before truncating.
## truncate_limit = 500
# 'buffered' is the default of the HBase Thrift Server and supports security.
# 'framed' can be used to chunk up responses,
# which is useful when used in conjunction with the nonblocking server in Thrift.
## thrift_transport=buffered
###########################################################################
# Settings to configure Solr Search
###########################################################################
[search]
# URL of the Solr Server
solr_url=http://172.21.50.41:8983/solr/
# Requires FQDN in solr_url if enabled
## security_enabled=false
## Query sent when no term is entered
## empty_query=*:*
###########################################################################
# Settings to configure Solr Indexer
###########################################################################
[indexer]
# Location of the solrctl binary.
## solrctl_path=/usr/bin/solrctl
# Location of the solr home.
## solr_home=/usr/lib/solr
# Zookeeper ensemble.
## solr_zk_ensemble=localhost:2181/solr
# The contents of this directory will be copied over to the solrctl host to its temporary directory.
## config_template_path=/../hue/desktop/libs/indexer/src/data/solr_configs
###########################################################################
# Settings to configure Job Designer
###########################################################################
[jobsub]
# Location on local FS where examples and template are stored.
## local_data_dir=..../data
# Location on local FS where sample data is stored
## sample_data_dir=...thirdparty/sample_data
###########################################################################
# Settings to configure Job Browser
###########################################################################
[jobbrowser]
# Share submitted jobs information with all users. If set to false,
# submitted jobs are visible only to the owner and administrators.
## share_jobs=true
###########################################################################
# Settings to configure the Zookeeper application.
###########################################################################
[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=zk1:2181
# The URL of the REST contrib service (required for znode browsing)
## rest_url=http://localhost:9998
###########################################################################
# Settings to configure the Spark application.
###########################################################################
[spark]
# URL of the REST Spark Job Server.
server_url=http://h1:8080/
###########################################################################
# Settings for the User Admin application
###########################################################################
[useradmin]
# The name of the default user group that users will be a member of
## default_user_group=default
###########################################################################
# Settings for the Sentry lib
###########################################################################
[libsentry]
# Hostname or IP of server.
## hostname=localhost
# Port the sentry service is running on.
## port=8038
# Sentry configuration directory, where sentry-site.xml is located.
## sentry_conf_dir=/etc/sentry/conf

编译好的目录如下：

Java代码

-rw-rw-r-- 1 search search 2782 5月 19 06:04 app.reg
-rw-rw-r-- 1 search search 2782 5月 19 05:41 app.reg.bak
drwxrwxr-x 22 search search 4096 5月 20 01:05 apps
drwxrwxr-x 3 search search 4096 5月 19 05:41 build
drwxr-xr-x 2 search search 4096 5月 19 05:40 data
drwxrwxr-x 7 search search 4096 5月 20 01:29 desktop
drwxrwxr-x 2 search search 4096 5月 19 05:41 dist
drwxrwxr-x 7 search search 4096 5月 19 05:40 docs
drwxrwxr-x 3 search search 4096 5月 19 05:40 ext
-rw-rw-r-- 1 search search 11358 5月 19 05:38 LICENSE.txt
drwxrwxr-x 2 search search 4096 5月 20 01:29 logs
-rw-rw-r-- 1 search search 8121 5月 19 05:41 Makefile
-rw-rw-r-- 1 search search 8505 5月 19 05:41 Makefile.sdk
-rw-rw-r-- 1 search search 3093 5月 19 05:40 Makefile.tarball
-rw-rw-r-- 1 search search 3498 5月 19 05:41 Makefile.vars
-rw-rw-r-- 1 search search 2302 5月 19 05:41 Makefile.vars.priv
drwxrwxr-x 2 search search 4096 5月 19 05:41 maven
-rw-rw-r-- 1 search search 801 5月 19 05:40 NOTICE.txt
-rw-rw-r-- 1 search search 4733 5月 19 05:41 README.rst
-rw-rw-r-- 1 search search 52 5月 19 05:38 start.sh
-rw-rw-r-- 1 search search 65 5月 19 05:41 stop.sh
drwxrwxr-x 9 search search 4096 5月 19 05:38 tools
-rw-rw-r-- 1 search search 932 5月 19 05:41 VERSION

6，启动hue，执行命令：build/env/bin/supervisor

Java代码

[search@h1 hue]$ build/env/bin/supervisor
[INFO] Not running as root, skipping privilege drop
starting server with options { 'ssl_certificate': None, 'workdir': None, 'server_name': 'localhost', 'host': '0.0.0.0', 'daemonize': False, 'threads': 10, 'pidfile': None, 'ssl_private_key': None, 'server_group': 'search', 'ssl_cipher_list': 'DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2', 'port': 8000, 'server_user': 'search'}

然后我们就可以访问安装机ip+8000端口来查看了：

工具箱界面：

hive的界面：

在配置hive（散仙这里是0.13的版本）的时候，需要注意以下几个方面：

hive的metastrore的服务和hiveserver2服务都需要启动
执行下面命令
bin/hive --service metastore
bin/hiveserver2
除此之外，还需要关闭的hive的SAL认证，否则，使用hue访问会出现问题。
注意下面三项的配置

Java代码

hive.metastore.warehouse.dir
/user/hive/warehouse
location of default database for the warehouse
hive.server2.thrift.port
10000
Port number of HiveServer2 Thrift interface.
Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT
hive.server2.thrift.bind.host
h1
Bind host on which to run the HiveServer2 Thrift interface.
Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST
hive.server2.authentication
NOSASL
Client authentication types.
NONE: no authentication check
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider
(Use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module.

除了上面的配置外，还需要把hive.server2.long.polling.timeout的参数值，默认是5000L给改成5000,否则使用beenline连接时候，会出错，这是hive的一个bug。

pig的界面：

solr的界面如下：

最后需要注意一点，hue也需要在hadoop的core-site.xml里面配置相应的代理用户，示例如下：

Java代码

hadoop.proxyuser.hue.hosts
*
hadoop.proxyuser.hue.groups
*

ok至此，我们的hue已经能完美工作了，我们可以根据自己的需要，定制相应的app插件，非常灵活！

转载于:https://www.cnblogs.com/bluejoe/p/5213641.html

你可能感兴趣的:(ldap,java,大数据)

Long类型前后端数据不一致 igotyback 前端
响应给前端的数据浏览器控制台中response中看到的Long类型的数据是正常的到前端数据不一致前后端数据类型不匹配是一个常见问题，尤其是当后端使用Java的Long类型（64位）与前端JavaScript的Number类型（最大安全整数为2^53-1，即16位）进行数据交互时，很容易出现精度丢失的问题。这是因为JavaScript中的Number类型无法安全地表示超过16位的整数。为了解决这个问
LocalDateTime 转 String igotyback java 开发语言
importjava.time.LocalDateTime;importjava.time.format.DateTimeFormatter;publicclassMain{publicstaticvoidmain(String[]args){//获取当前时间LocalDateTimenow=LocalDateTime.now();//定义日期格式化器DateTimeFormatterformat
Linux下QT开发的动态库界面弹出操作（SDL2） 13jjyao QT类 qt 开发语言 sdl2 linux
需求：操作系统为linux，开发框架为qt，做成需带界面的qt动态库，调用方为java等非qt程序难点：调用方为java等非qt程序，也就是说调用方肯定不带QApplication::exec()，缺少了这个，QTimer等事件和QT创建的窗口将不能弹出(包括opencv也是不能弹出)；这与qt调用本身qt库是有本质的区别的思路：1.调用方缺QApplication::exec()，那么我们在接口
DIV+CSS+JavaScript技术制作网页（旅游主题网页设计与制作）云南大理 STU学生网页设计网页设计期末网页作业 html静态网页 html5期末大作业网页设计 web大作业
️精彩专栏推荐作者主页:【进入主页—获取更多源码】web前端期末大作业：【HTML5网页期末作业(1000套)】程序员有趣的告白方式：【HTML七夕情人节表白网页制作(110套)】文章目录二、网站介绍三、网站效果▶️1.视频演示2.图片演示四、网站代码HTML结构代码CSS样式代码五、更多源码二、网站介绍网站布局方面：计划采用目前主流的、能兼容各大主流浏览器、显示效果稳定的浮动网页布局结构。网站程
【华为OD机试真题2023B卷 JAVA&JS】We Are A Team 若博豆 java 算法华为 javascript
华为OD2023（B卷）机试题库全覆盖，刷题指南点这里WeAreATeam时间限制：1秒|内存限制：32768K|语言限制：不限题目描述：总共有n个人在机房，每个人有一个标号（1<=标号<=n），他们分成了多个团队，需要你根据收到的m条消息判定指定的两个人是否在一个团队中，具体的：1、消息构成为：abc，整数a、b分别代
关于城市旅游的HTML网页设计——(旅游风景云南 5页)HTML+CSS+JavaScript 二挡起步 web前端期末大作业 javascript html css 旅游风景
⛵源码获取文末联系✈Web前端开发技术描述网页设计题材，DIV+CSS布局制作,HTML+CSS网页设计期末课程大作业|游景点介绍|旅游风景区|家乡介绍|等网站的设计与制作|HTML期末大学生网页设计作业，Web大学生网页HTML：结构CSS：样式在操作方面上运用了html5和css3，采用了div+css结构、表单、超链接、浮动、绝对定位、相对定位、字体样式、引用视频等基础知识JavaScrip
HTML网页设计制作大作业（div+css）云南我的家乡旅游景点带文字滚动二挡起步 web前端期末大作业 web设计网页规划与设计 html css javascript dreamweaver 前端
Web前端开发技术描述网页设计题材，DIV+CSS布局制作,HTML+CSS网页设计期末课程大作业游景点介绍|旅游风景区|家乡介绍|等网站的设计与制作HTML期末大学生网页设计作业HTML：结构CSS：样式在操作方面上运用了html5和css3，采用了div+css结构、表单、超链接、浮动、绝对定位、相对定位、字体样式、引用视频等基础知识JavaScript：做与用户的交互行为文章目录前端学习路线
node.js学习小猿L node.js node.js 学习 vim
node.js学习实操及笔记温故node.js，node.js学习实操过程及笔记~node.js学习视频node.js官网node.js中文网实操笔记githubcsdn笔记为什么学node.js可以让别人访问我们编写的网页为后续的框架学习打下基础，三大框架vuereactangular离不开node.jsnode.js是什么官网：node.js是一个开源的、跨平台的运行JavaScript的运行
nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
ES聚合分析原理与代码实例讲解光剑书架上的书大厂Offer收割机面试题简历程序员读书硅基计算碳基计算认知计算生物计算深度学习神经网络大数据 AIGC AGI LLM Java Python 架构设计 Agent 程序员实现财富自由
ES聚合分析原理与代码实例讲解1.背景介绍1.1问题的由来在大规模数据分析场景中，特别是在使用Elasticsearch（ES）进行数据存储和检索时，聚合分析成为了一个至关重要的功能。聚合分析允许用户对数据集进行细分和分组，以便深入探索数据的结构和模式。这在诸如实时监控、日志分析、业务洞察等领域具有广泛的应用。1.2研究现状目前，ES聚合分析已经成为现代大数据平台的核心组件之一。它支持多种类型的聚
Java 重写(Override)与重载(Overload) 叨唧唧的
Java重写(Override)与重载(Overload)重写(Override)重写是子类对父类的允许访问的方法的实现过程进行重新编写,返回值和形参都不能改变。即外壳不变，核心重写！重写的好处在于子类可以根据需要，定义特定于自己的行为。也就是说子类能够根据需要实现父类的方法。重写方法不能抛出新的检查异常或者比被重写方法申明更加宽泛的异常。例如：父类的一个方法申明了一个检查异常IOExceptio
简单了解 JVM 记得开心一点啊 jvm
目录♫什么是JVM♫JVM的运行流程♫JVM运行时数据区♪虚拟机栈♪本地方法栈♪堆♪程序计数器♪方法区/元数据区♫类加载的过程♫双亲委派模型♫垃圾回收机制♫什么是JVMJVM是JavaVirtualMachine的简称，意为Java虚拟机。虚拟机是指通过软件模拟的具有完整硬件功能的、运行在一个完全隔离的环境中的完整计算机系统（如：JVM、VMwave、VirtualBox）。JVM和其他两个虚拟机
1分钟解决 -bash: mvn: command not found，在Centos 7中安装Maven Energet!c 开发语言
1分钟解决-bash:mvn:commandnotfound，在Centos7中安装Maven检查Java环境1下载Maven2解压Maven3配置环境变量4验证安装5常见问题与注意事项6总结检查Java环境Maven依赖Java环境，请确保系统已经安装了Java并配置了环境变量。可以通过以下命令检查：java-version如果未安装，请先安装Java。1下载Maven从官网下载：前往Apach
Java企业面试题3 马龙强_ java
1.break和continue的作用(智*图)break：用于完全退出一个循环（如for,while）或一个switch语句。当在循环体内遇到break语句时，程序会立即跳出当前循环体，继续执行循环之后的代码。continue：用于跳过当前循环体中剩余的部分，并开始下一次循环。如果是在for循环中使用continue，则会直接进行条件判断以决定是否执行下一轮循环。2.if分支语句和switch分
JVM、JRE和 JDK：理解Java开发的三大核心组件 Y雨何时停T Java java
Java是一门跨平台的编程语言，它的成功离不开背后强大的运行环境与开发工具的支持。在Java的生态中，JVM（Java虚拟机）、JRE（Java运行时环境）和JDK（Java开发工具包）是三个至关重要的核心组件。本文将探讨JVM、JDK和JRE的区别，帮助你更好地理解Java的运行机制。1.JVM：Java虚拟机（JavaVirtualMachine）什么是JVM？JVM，即Java虚拟机，是Ja
Java面试题精选：消息队列(二) 芒果不是芒 Java面试题精选 java kafka
一、Kafka的特性1.消息持久化：消息存储在磁盘，所以消息不会丢失2.高吞吐量：可以轻松实现单机百万级别的并发3.扩展性：扩展性强，还是动态扩展4.多客户端支持：支持多种语言（Java、C、C++、GO、）5.KafkaStreams（一个天生的流处理）:在双十一或者销售大屏就会用到这种流处理。使用KafkaStreams可以快速的把销售额统计出来6.安全机制：Kafka进行生产或者消费的时候会
白骑士的Java教学基础篇 2.5 控制流语句白骑士所长 Java 教学 java 开发语言
欢迎继续学习Java编程的基础篇！在前面的章节中，我们了解了Java的变量、数据类型和运算符。接下来，我们将探讨Java中的控制流语句。控制流语句用于控制程序的执行顺序，使我们能够根据特定条件执行不同的代码块，或重复执行某段代码。这是编写复杂程序的基础。通过学习这一节内容，你将掌握如何使用条件语句和循环语句来编写更加灵活和高效的代码。条件语句条件语句用于根据条件的真假来执行不同的代码块。if语句‘
python语法——三目运算符 HappyRocking python python 三目运算符
在java中，有三目运算符，如：intc=(a>b)?a:b表示c取两者中的较大值。但是在python，不能直接这样使用，估计是因为冒号在python有分行的关键作用。那么在python中，如何实现类似功能呢？可以使用ifelse语句，也是一行可以完成，格式为：aifbelsec表示如果b为True，则表达式等于a，否则等于c。如：c=(aif(a>b)elseb)同样是完成了取最大值的功能。
ArrayList 源码解析程序猿进阶 Java基础 ArrayList List java 面试性能优化架构设计 idea
ArrayList是Java集合框架中的一个动态数组实现，提供了可变大小的数组功能。它继承自AbstractList并实现了List接口，是顺序容器，即元素存放的数据与放进去的顺序相同，允许放入null元素，底层通过数组实现。除该类未实现同步外，其余跟Vector大致相同。每个ArrayList都有一个容量capacity，表示底层数组的实际大小，容器内存储元素的个数不能多于当前容量。当向容器中添
Java爬虫框架（一）--架构设计狼图腾-狼之传说 java 框架 java 任务 html解析器存储电子商务
一、架构图那里搜网络爬虫框架主要针对电子商务网站进行数据爬取，分析，存储，索引。爬虫：爬虫负责爬取，解析，处理电子商务网站的网页的内容数据库：存储商品信息索引：商品的全文搜索索引Task队列：需要爬取的网页列表Visited表：已经爬取过的网页列表爬虫监控平台：web平台可以启动，停止爬虫，管理爬虫，task队列，visited表。二、爬虫1.流程1)Scheduler启动爬虫器，TaskMast
Java：爬虫框架 dingcho Java java 爬虫
一、ApacheNutch2【参考地址】Nutch是一个开源Java实现的搜索引擎。它提供了我们运行自己的搜索引擎所需的全部工具。包括全文搜索和Web爬虫。Nutch致力于让每个人能很容易,同时花费很少就可以配置世界一流的Web搜索引擎.为了完成这一宏伟的目标,Nutch必须能够做到:每个月取几十亿网页为这些网页维护一个索引对索引文件进行每秒上千次的搜索提供高质量的搜索结果简单来说Nutch支持分
python怎么将png转为tif_png转tif weixin_39977276
发国外的文章要求图片是tif，cmyk色彩空间的。大小尺寸还有要求。比如网上大神多，找到了一段代码，感谢！https://www.jianshu.com/p/ec2af4311f56https://github.com/KevinZc007/image2Tifimportjava.awt.image.BufferedImage;importjava.io.File;importjava.io.Fi
JavaScript 中，深拷贝（Deep Copy）和浅拷贝（Shallow Copy）跳房子的前端前端面试 javascript 开发语言 ecmascript
在JavaScript中，深拷贝（DeepCopy）和浅拷贝（ShallowCopy）是用于复制对象或数组的两种不同方法。了解它们的区别和应用场景对于避免潜在的bugs和高效地处理数据非常重要。以下是对深拷贝和浅拷贝的详细解释，包括它们的概念、用途、优缺点以及实现方式。1.浅拷贝（ShallowCopy）概念定义：浅拷贝是指创建一个新的对象或数组，其中包含了原对象或数组的基本数据类型的值和对引用数
JAVA·一个简单的登录窗口 MortalTom java 开发语言学习
文章目录概要整体架构流程技术名词解释技术细节资源概要JavaSwing是Java基础类库的一部分，主要用于开发图形用户界面（GUI）程序整体架构流程新建项目，导入sql.jar包（链接放在了文末），编译项目并运行技术名词解释一、特点丰富的组件提供了多种可视化组件，如按钮（JButton）、文本框（JTextField）、标签（JLabel）、下拉列表（JComboBox）等，可以满足不同的界面设计
WebMagic：强大的Java爬虫框架解析与实战 Aaron_945 Java java 爬虫开发语言
文章目录引言官网链接WebMagic原理概述基础使用1.添加依赖2.编写PageProcessor高级使用1.自定义Pipeline2.分布式抓取优点结论引言在大数据时代，网络爬虫作为数据收集的重要工具，扮演着不可或缺的角色。Java作为一门广泛使用的编程语言，在爬虫开发领域也有其独特的优势。WebMagic是一个开源的Java爬虫框架，它提供了简单灵活的API，支持多线程、分布式抓取，以及丰富的
博客网站制作教程 2401_85194651 java maven
首先就是技术框架：后端：Java+SpringBoot数据库：MySQL前端：Vue.js数据库连接：JPA(JavaPersistenceAPI)1.项目结构blog-app/├──backend/│├──src/main/java/com/example/blogapp/││├──BlogApplication.java││├──config/│││└──DatabaseConfig.java
00. 这里整理了最全的爬虫框架（Java + Python）有一只柴犬爬虫系列爬虫 java python
目录1、前言2、什么是网络爬虫3、常见的爬虫框架3.1、java框架3.1.1、WebMagic3.1.2、Jsoup3.1.3、HttpClient3.1.4、Crawler4j3.1.5、HtmlUnit3.1.6、Selenium3.2、Python框架3.2.1、Scrapy3.2.2、BeautifulSoup+Requests3.2.3、Selenium3.2.4、PyQuery3.2
JAVA学习笔记之23种设计模式学习 victorfreedom Java技术设计模式 android java 常用设计模式
博主最近买了《设计模式》这本书来学习，无奈这本书是以C++语言为基础进行说明，整个学习流程下来效率不是很高，虽然有的设计模式通俗易懂，但感觉还是没有充分的掌握了所有的设计模式。于是博主百度了一番，发现有大神写过了这方面的问题，于是博主迅速拿来学习。一、设计模式的分类总体来说设计模式分为三大类：创建型模式，共五种：工厂方法模式、抽象工厂模式、单例模式、建造者模式、原型模式。结构型模式，共七种：适配器
JavaScript `Map` 和 `WeakMap`详细解释跳房子的前端 JavaScript 原生方法 javascript 前端开发语言
在JavaScript中，Map和WeakMap都是用于存储键值对的数据结构，但它们有一些关键的不同之处。MapMap是一种可以存储任意类型的键值对的集合。它保持了键值对的插入顺序，并且可以通过键快速查找对应的值。Map提供了一些非常有用的方法和属性来操作这些数据对：set(key,value):将一个键值对添加到Map中。如果键已经存在，则更新其对应的值。get(key):获取指定键的值。如果键
免费的GPT可在线直接使用（一键收藏） kkai人工智能 gpt
1、LuminAI（https://kk.zlrxjh.top）LuminAI标志着一款融合了星辰大数据模型与文脉深度模型的先进知识增强型语言处理系统，旨在自然语言处理（NLP）的技术开发领域发光发热。此系统展现了卓越的语义把握与内容生成能力，轻松驾驭多样化的自然语言处理任务。VisionAI在NLP界的应用领域广泛，能够胜任从机器翻译、文本概要撰写、情绪分析到问答等众多任务。通过对大量文本数据的
怎么样才能成为专业的程序员？ cocos2d-x小菜编程 PHP
如何要想成为一名专业的程序员？仅仅会写代码是不够的。从团队合作去解决问题到版本控制，你还得具备其他关键技能的工具包。当我们询问相关的专业开发人员，那些必备的关键技能都是什么的时候，下面是我们了解到的情况。关于如何学习代码，各种声音很多，然后很多人就被误导为成为专业开发人员懂得一门编程语言就够了？！呵呵，就像其他工作一样，光会一个技能那是远远不够的。如果你想要成为
java web开发高并发处理 BreakingBad java Web 并发开发处理高
java处理高并发高负载类网站中数据库的设计方法（java教程,java处理大量数据，java高负载数据）一：高并发高负载类网站关注点之数据库没错,首先是数据库,这是大多数应用所面临的首个SPOF。尤其是Web2.0的应用，数据库的响应是首先要解决的。一般来说MySQL是最常用的，可能最初是一个mysql主机，当数据增加到100万以上，那么，MySQL的效能急剧下降。常用的优化措施是M-S（
mysql批量更新 ekian mysql
mysql更新优化：一版的更新的话都是采用update set的方式，但是如果需要批量更新的话，只能for循环的执行更新。或者采用executeBatch的方式，执行更新。无论哪种方式，性能都不见得多好。三千多条的更新，需要3分多钟。查询了批量更新的优化，有说replace into的方式，即： replace into tableName(id,status) values
微软BI（3） 18289753290 微软BI SSIS
1) Q：该列违反了完整性约束错误；已获得 OLE DB 记录。源:“Microsoft SQL Server Native Client 11.0” Hresult: 0x80004005 说明:“不能将值 NULL 插入列 'FZCHID'，表 'JRB_EnterpriseCredit.dbo.QYFZCH'；列不允许有 Null 值。INSERT 失败。”。 A：一般这类问题的存在是
Java中的List g21121 java
List是一个有序的 collection（也称为序列）。此接口的用户可以对列表中每个元素的插入位置进行精确地控制。用户可以根据元素的整数索引（在列表中的位置）访问元素，并搜索列表中的元素。与 set 不同，列表通常允许重复
读书笔记永夜-极光读书笔记
1. K是一家加工厂,需要采购原材料,有A,B,C,D 4家供应商,其中A给出的价格最低,性价比最高,那么假如你是这家企业的采购经理,你会如何决策? 传统决策: A:100%订单 B,C,D:0% &nbs
centos 安装 Codeblocks 随便小屋 codeblocks
1.安装gcc,需要c和c++两部分,默认安装下,CentOS不安装编译器的,在终端输入以下命令即可yum install gccyum install gcc-c++ 2.安装gtk2-devel,因为默认已经安装了正式产品需要的支持库,但是没有安装开发所需要的文档.yum install gtk2* 3. 安装wxGTK yum search w
23种设计模式的形象比喻 aijuans 设计模式
1、ABSTRACT FACTORY—追MM少不了请吃饭了，麦当劳的鸡翅和肯德基的鸡翅都是MM爱吃的东西，虽然口味有所不同，但不管你带MM去麦当劳或肯德基，只管向服务员说“来四个鸡翅”就行了。麦当劳和肯德基就是生产鸡翅的Factory 　　工厂模式：客户类和工厂类分开。消费者任何时候需要某种产品，只需向工厂请求即可。消费者无须修改就可以接纳新产品。缺点是当产品修改时，工厂类也要做相应的修改。如：
开发管理 CheckLists aoyouzi 开发管理 CheckLists
开发管理 CheckLists(23) -使项目组度过完整的生命周期开发管理 CheckLists(22) -组织项目资源开发管理 CheckLists(21) -控制项目的范围开发管理 CheckLists(20) -项目利益相关者责任开发管理 CheckLists(19) -选择合适的团队成员开发管理 CheckLists(18) -敏捷开发 Scrum Master 工作开发管理 C
js实现切换百合不是茶 JavaScript 栏目切换
js主要功能之一就是实现页面的特效,窗体的切换可以减少页面的大小,被门户网站大量应用思路: 1,先将要显示的设置为display:bisible 否则设为none 2,设置栏目的id ,js获取栏目的id,如果id为Null就设置为显示 3,判断js获取的id名字;再设置是否显示代码实现: html代码: <di
周鸿祎在360新员工入职培训上的讲话 bijian1013 感悟项目管理人生职场
这篇文章也是最近偶尔看到的，考虑到原博客发布者可能将其删除等原因，也更方便个人查找，特将原文拷贝再发布的。“学东西是为自己的，不要整天以混的姿态来跟公司博弈，就算是混，我觉得你要是能在混的时间里，收获一些别的有利于人生发展的东西，也是不错的，看你怎么把握了”，看了之后，对这句话记忆犹新。 &
前端Web开发的页面效果 Bill_chen html Web Microsoft
1.IE6下png图片的透明显示： <img src="图片地址" border="0" style="Filter.Alpha(Opacity)=数值(100),style=数值(3)"/> 或在<head></head>间加一段JS代码让透明png图片正常显示。 2.<li>标
【JVM五】老年代垃圾回收：并发标记清理GC(CMS GC) bit1129 垃圾回收
CMS概述并发标记清理垃圾回收(Concurrent Mark and Sweep GC）算法的主要目标是在GC过程中，减少暂停用户线程的次数以及在不得不暂停用户线程的请夸功能，尽可能短的暂停用户线程的时间。这对于交互式应用，比如web应用来说，是非常重要的。 CMS垃圾回收针对新生代和老年代采用不同的策略。相比同吞吐量垃圾回收，它要复杂的多。吞吐量垃圾回收在执
Struts2技术总结白糖_ struts2
必备jar文件早在struts2.0.*的时候，struts2的必备jar包需要如下几个： commons-logging-*.jar Apache旗下commons项目的log日志包 freemarker-*.jar
Jquery easyui layout应用注意事项 bozch jquery 浏览器 easyui layout
在jquery easyui中提供了easyui-layout布局，他的布局比较局限，类似java中GUI的border布局。下面对其使用注意事项作简要介绍：如果在现有的工程中前台界面均应用了jquery easyui，那么在布局的时候最好应用jquery eaysui的layout布局，否则在表单页面（编辑、查看、添加等等）在不同的浏览器会出
java-拷贝特殊链表：有一个特殊的链表，其中每个节点不但有指向下一个节点的指针pNext，还有一个指向链表中任意节点的指针pRand，如何拷贝这个特殊链表？ bylijinnan java
public class CopySpecialLinkedList { /** * 题目：有一个特殊的链表，其中每个节点不但有指向下一个节点的指针pNext，还有一个指向链表中任意节点的指针pRand，如何拷贝这个特殊链表？拷贝pNext指针非常容易，所以题目的难点是如何拷贝pRand指针。假设原来链表为A1 -> A2 ->... -> An，新拷贝
color Chen.H JavaScript html css
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML> <HEAD>&nbs
[信息与战争]移动通讯与网络 comsci 网络
两个坚持:手机的电池必须可以取下来光纤不能够入户,只能够到楼宇建议大家找这本书看看:<&
oracle flashback query(闪回查询) daizj oracle flashback query flashback table
在Oracle 10g中，Flash back家族分为以下成员： Flashback Database Flashback Drop Flashback Table Flashback Query(分Flashback Query,Flashback Version Query，Flashback Transaction Query) 下面介绍一下Flashback Drop 和Flas
zeus持久层DAO单元测试 deng520159 单元测试
zeus代码测试正紧张进行中,但由于工作比较忙,但速度比较慢.现在已经完成读写分离单元测试了,现在把几种情况单元测试的例子发出来,希望有人能进出意见,让它走下去. 本文是zeus的dao单元测试: 1.单元测试直接上代码 package com.dengliang.zeus.webdemo.test; import org.junit.Test; import o
C语言学习三printf函数和scanf函数学习 dcj3sjt126com c printf scanf language
printf函数 /* 2013年3月10日20:42:32 地点：北京潘家园功能：目的：测试%x %X %#x %#X的用法 */ # include <stdio.h> int main(void) { printf("哈哈！\n"); // \n表示换行 int i = 10; printf
那你为什么小时候不好好读书? dcj3sjt126com life
dady, 我今天捡到了十块钱, 不过我还给那个人了 good girl! 那个人有没有和你讲thank you啊没有啦....他拉我的耳朵我才把钱还给他的, 他哪里会和我讲thank you 爸爸, 如果地上有一张5块一张10块你拿哪一张呢.... 当然是拿十块的咯... 爸爸你很笨的, 你不会两张都拿爸爸为什么上个月那个人来跟你讨钱, 你告诉他没
iptables开放端口 Fanyucai linux iptables 端口
1，找到配置文件 vi /etc/sysconfig/iptables 2，添加端口开放，增加一行，开放18081端口 -A INPUT -m state --state NEW -m tcp -p tcp --dport 18081 -j ACCEPT 3，保存 ESC :wq! 4，重启服务 service iptables
Ehcache（05）——缓存的查询 234390216 排序 ehcache 统计 query
缓存的查询目录 1. 使Cache可查询 1.1 基于Xml配置 1.2 基于代码的配置 2 指定可搜索的属性 2.1 可查询属性类型 2.2 &
通过hashset找到数组中重复的元素 jackyrong hashset
如何在hashset中快速找到重复的元素呢?方法很多，下面是其中一个办法： int[] array = {1,1,2,3,4,5,6,7,8,8}; Set<Integer> set = new HashSet<Integer>(); for(int i = 0
使用ajax和window.history.pushState无刷新改变页面内容和地址栏URL lanrikey history
后退时关闭当前页面 <script type="text/javascript"> jQuery(document).ready(function ($) { if (window.history && window.history.pushState) {
应用程序的通信成本 netkiller.github.com 虚拟机应用服务器陈景峰 netkiller neo
应用程序的通信成本什么是通信一个程序中两个以上功能相互传递信号或数据叫做通信。什么是成本这是是指时间成本与空间成本。时间就是传递数据所花费的时间。空间是指传递过程耗费容量大小。都有哪些通信方式全局变量线程间通信共享内存共享文件管道 Socket 硬件（串口，USB）等等全局变量全局变量是成本最低通信方法，通过设置
一维数组与二维数组的声明与定义恋洁e生二维数组一维数组定义声明初始化
/** * */ package test20111005; /** * @author FlyingFire * @date:2011-11-18 上午04:33:36 * @author ：代码整理 * @introduce :一维数组与二维数组的初始化 *summary： */ public c
Spring Mybatis独立事务配置 toknowme mybatis
在项目中有很多地方会使用到独立事务，下面以获取主键为例（1）修改配置文件spring-mybatis.xml  <tx:annotation-driven transaction-manager="transactionManager" /> &n
更新Anadroid SDK Tooks之后，Eclipse提示No update were found xp9802 eclipse
使用Android SDK Manager 更新了Anadroid SDK Tooks 之后，打开eclipse提示 This Android SDK requires Android Developer Toolkit version 23.0.0 or above, 点击Check for Updates 检测一会后提示 No update were found