qindongliang1922

Hadoop可视化分析利器之Hue

先来看下hue的架构图：

（1）Hue是什么？

Hue是一个可快速开发和调试Hadoop生态系统各种应用的一个基于浏览器的图形化用户接口。

（2）Hue能干什么？

1，访问HDFS和文件浏览
2，通过web调试和开发hive以及数据结果展示
3，查询solr和结果展示，报表生成
4，通过web调试和开发impala交互式SQL Query
5，spark调试和开发
6，Pig开发和调试
7，oozie任务的开发，监控，和工作流协调调度
8，Hbase数据查询和修改，数据展示
9，Hive的元数据（metastore）查询
10，MapReduce任务进度查看，日志追踪
11，创建和提交MapReduce，Streaming，Java job任务
12，Sqoop2的开发和调试
13，Zookeeper的浏览和编辑
14，数据库（MySQL，PostGres，SQlite，Oracle）的查询和展示

（3）Hue怎么用或者什么时候应该用？

如果你们公司用的是CDH的hadoop，那么很幸运，Hue也是出自CDH公司，自家的东西用起来当然很爽。

如果你们公司用的是Apache Hadoop或者是HDP的hadoop，那么也没事，Hue是开源的，而且支持任何版本的hadoop。

关于什么时候用，这纯属一个锦上添花的功能，你完全可以不用hue，因为各种开源项目都有自己的使用方式和开发接口，hue只不过是统一了各个项目的开发方式在一个接口里而已，这样比较方便而已，不用你一会准备使用hive，就开一个hive的cli终端，一会用pig，你就得开一个pig的grunt，或者你又想查Hbase，又得需要开一个Hbase的shell终端。如果你们使用hadoop生态系统的组件很多的情况下，使用hue还是比较方便的，另外一个好处就是hue提供了一个web的界面来开发和调试任务，不用我们再频繁登陆Linux来操作了。

你可以在任何时候，只要能上网，就可以通过hue来开发和调试数据，不用再装Linux的客户端来远程登陆操作了，这也是B/S架构的好处。

（4）如何下载，安装和编译Hue？

1，hue的依赖（centos系统）

ant
asciidoc
cyrus-sasl-devel
cyrus-sasl-gssapi
gcc
gcc-c++
krb5-devel
libtidy (for unit tests only)
libxml2-devel
libxslt-devel
make
mvn (from maven package or maven3 tarball)
mysql
mysql-devel
openldap-devel
python-devel
sqlite-devel
openssl-devel (for version 7+)

2，散仙的在安装hue前，centos上已经安装好了，jdk，maven，ant，hadoop，hive，oozie等，环境变量如下：

user="search"

# java
export JAVA_HOME="/usr/local/jdk"
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin

# ant 
export ANT_HOME=/usr/local/ant
export CLASSPATH=$CLASSPATH:$ANT_HOME/lib
export PATH=$PATH:$ANT_HOME/bin

# maven 
export MAVEN_HOME="/usr/local/maven"
export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
export PATH=$PATH:$MAVEN_HOME/bin



##Hadoop2.2的变量设置
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_HOME=/home/search/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$CLASSPATH:$HADOOP_COMMON_HOME:$HADOOP_COMMON_HOMEi/lib:$HADOOP_MAPRED_HOME:$HADOOP_HDFS_HOME:$HADOOP_HDFS_HOME



# Hive 

export HIVE_HOME=/home/search/hive  
export HIVE_CONF_DIR=/home/search/hive/conf  
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib  
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf 


export OOZIE_HOME="/home/search/oozie-4.1.0"
export PATH=$PATH:$OOZIE_HOME/sbin:$OOZIE_HOME/bin

3,本文散仙主要是采用tar包的方式安装hue，除了tar包的方式，hue还能采用cm安装，当然这就与cdh的系统依赖比较大了。

hue最新的版本是3.8.1，散仙这里用的3.7.0的版本
下载地址： https://github.com/cloudera/hue/releases

hue的github地址： https://github.com/cloudera/hue

4，下载完后，解压tar包，并进入hue的根目录执行命令
make apps编译

5，编译成功后，需要配置/home/search/hue/desktop/conf/pseudo-distributed.ini文件，里面包含了hdfs，yarn，mapreduce，hive，oozie，pig，spark，solr等的ip地址和端口号配置，可根据自己的情况设置，如果没有安装某个应用，那就无须配置，只不过这个应用在web上不能使用而已，并不会影响其他框架的使用。

一个例子如下：

#####################################
# DEVELOPMENT EDITION
#####################################

# Hue configuration file
# ===================================
#
# For complete documentation about the contents of this file, run
#       $ <hue_root>/build/env/bin/hue config_help
#
# All .ini files under the current directory are treated equally.  Their
# contents are merged to form the Hue configuration, which can
# can be viewed on the Hue at
#       http://<hue_host>:<port>/dump_config


###########################################################################
# General configuration for core Desktop features (authentication, etc)
###########################################################################

[desktop]

  send_dbug_messages=1

  # To show database transactions, set database_logging to 1
  database_logging=0

  # Set this to a random string, the longer the better.
  # This is used for secure hashing in the session store.
  secret_key=search

  # Webserver listens on this address and port
  http_host=0.0.0.0
  http_port=8000

  # Time zone name
  time_zone=Asia/Shanghai

  # Enable or disable Django debug mode
  ## django_debug_mode=true

  # Enable or disable backtrace for server error
  ## http_500_debug_mode=true

  # Enable or disable memory profiling.
  ## memory_profiler=false

  # Server email for internal error messages
  ## django_server_email='[email protected]'

  # Email backend
  ## django_email_backend=django.core.mail.backends.smtp.EmailBackend

  # Webserver runs as this user
  server_user=search
  server_group=search

  # This should be the Hue admin and proxy user
  default_user=search

  # This should be the hadoop cluster admin
  default_hdfs_superuser=search

  # If set to false, runcpserver will not actually start the web server.
  # Used if Apache is being used as a WSGI container.
  ## enable_server=yes

  # Number of threads used by the CherryPy web server
  ## cherrypy_server_threads=10

  # Filename of SSL Certificate
  ## ssl_certificate=

  # Filename of SSL RSA Private Key
  ## ssl_private_key=

  # List of allowed and disallowed ciphers in cipher list format.
  # See http://www.openssl.org/docs/apps/ciphers.html for more information on cipher list format.
  ## ssl_cipher_list=DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2

  # LDAP username and password of the hue user used for LDAP authentications.
  # Set it to use LDAP Authentication with HiveServer2 and Impala.
  ## ldap_username=hue
  ## ldap_password=

  # Default encoding for site data
  ## default_site_encoding=utf-8

  # Help improve Hue with anonymous usage analytics.
  # Use Google Analytics to see how many times an application or specific section of an application is used, nothing more.
  ## collect_usage=true

  # Support for HTTPS termination at the load-balancer level with SECURE_PROXY_SSL_HEADER.
  ## secure_proxy_ssl_header=false

  # Comma-separated list of Django middleware classes to use.
  # See https://docs.djangoproject.com/en/1.4/ref/middleware/ for more details on middlewares in Django.
  ## middleware=desktop.auth.backend.LdapSynchronizationBackend

  # Comma-separated list of regular expressions, which match the redirect URL.
  # For example, to restrict to your local domain and FQDN, the following value can be used:
  # ^\/.*$,^http:\/\/www.mydomain.com\/.*$
  ## redirect_whitelist=

  # Comma separated list of apps to not load at server startup.
  # e.g.: pig,zookeeper
  ## app_blacklist=

  # The directory where to store the auditing logs. Auditing is disable if the value is empty.
  # e.g. /var/log/hue/audit.log
  ## audit_event_log_dir=

  # Size in KB/MB/GB for audit log to rollover.
  ## audit_log_max_file_size=100MB

#poll_enabled=false

  # Administrators
  # ----------------
  [[django_admins]]
    ## [[[admin1]]]
    ## name=john
    ## [email protected]

  # UI customizations
  # -------------------
  [[custom]]

  # Top banner HTML code
  #banner_top_html=Search Team Hadoop Manager

  # Configuration options for user authentication into the web application
  # ------------------------------------------------------------------------
  [[auth]]

    # Authentication backend. Common settings are:
    # - django.contrib.auth.backends.ModelBackend (entirely Django backend)
    # - desktop.auth.backend.AllowAllBackend (allows everyone)
    # - desktop.auth.backend.AllowFirstUserDjangoBackend
    #     (Default. Relies on Django and user manager, after the first login)
    # - desktop.auth.backend.LdapBackend
    # - desktop.auth.backend.PamBackend
    # - desktop.auth.backend.SpnegoDjangoBackend
    # - desktop.auth.backend.RemoteUserDjangoBackend
    # - libsaml.backend.SAML2Backend
    # - libopenid.backend.OpenIDBackend
    # - liboauth.backend.OAuthBackend
    #     (New oauth, support Twitter, Facebook, Google+ and Linkedin
    ## backend=desktop.auth.backend.AllowFirstUserDjangoBackend

    # The service to use when querying PAM.
    ## pam_service=login

    # When using the desktop.auth.backend.RemoteUserDjangoBackend, this sets
    # the normalized name of the header that contains the remote user.
    # The HTTP header in the request is converted to a key by converting
    # all characters to uppercase, replacing any hyphens with underscores
    # and adding an HTTP_ prefix to the name. So, for example, if the header
    # is called Remote-User that would be configured as HTTP_REMOTE_USER
    #
    # Defaults to HTTP_REMOTE_USER
    ## remote_user_header=HTTP_REMOTE_USER

    # Ignore the case of usernames when searching for existing users.
    # Only supported in remoteUserDjangoBackend.
    ## ignore_username_case=false

    # Ignore the case of usernames when searching for existing users to authenticate with.
    # Only supported in remoteUserDjangoBackend.
    ## force_username_lowercase=false

    # Users will expire after they have not logged in for 'n' amount of seconds.
    # A negative number means that users will never expire.
    ## expires_after=-1

    # Apply 'expires_after' to superusers.
    ## expire_superusers=true

  # Configuration options for connecting to LDAP and Active Directory
  # -------------------------------------------------------------------
  [[ldap]]

    # The search base for finding users and groups
    ## base_dn="DC=mycompany,DC=com"

    # URL of the LDAP server
    ## ldap_url=ldap://auth.mycompany.com

    # A PEM-format file containing certificates for the CA's that
    # Hue will trust for authentication over TLS.
    # The certificate for the CA that signed the
    # LDAP server certificate must be included among these certificates.
    # See more here http://www.openldap.org/doc/admin24/tls.html.
    ## ldap_cert=
    ## use_start_tls=true

    # Distinguished name of the user to bind as -- not necessary if the LDAP server
    # supports anonymous searches
    ## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"

    # Password of the bind user -- not necessary if the LDAP server supports
    # anonymous searches
    ## bind_password=

    # Pattern for searching for usernames -- Use <username> for the parameter
    # For use when using LdapBackend for Hue authentication
    ## ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com"

    # Create users in Hue when they try to login with their LDAP credentials
    # For use when using LdapBackend for Hue authentication
    ## create_users_on_login = true

    # Synchronize a users groups when they login
    ## sync_groups_on_login=false

    # Ignore the case of usernames when searching for existing users in Hue.
    ## ignore_username_case=false

    # Force usernames to lowercase when creating new users from LDAP.
    ## force_username_lowercase=false

    # Use search bind authentication.
    ## search_bind_authentication=true

    # Choose which kind of subgrouping to use: nested or suboordinate (deprecated).
    ## subgroups=suboordinate

    # Define the number of levels to search for nested members.
    ## nested_members_search_depth=10

    [[[users]]]

      # Base filter for searching for users
      ## user_filter="objectclass=*"

      # The username attribute in the LDAP schema
      ## user_name_attr=sAMAccountName

    [[[groups]]]

      # Base filter for searching for groups
      ## group_filter="objectclass=*"

      # The username attribute in the LDAP schema
      ## group_name_attr=cn

    [[[ldap_servers]]]

      ## [[[[mycompany]]]]

        # The search base for finding users and groups
        ## base_dn="DC=mycompany,DC=com"

        # URL of the LDAP server
        ## ldap_url=ldap://auth.mycompany.com

        # A PEM-format file containing certificates for the CA's that
        # Hue will trust for authentication over TLS.
        # The certificate for the CA that signed the
        # LDAP server certificate must be included among these certificates.
        # See more here http://www.openldap.org/doc/admin24/tls.html.
        ## ldap_cert=
        ## use_start_tls=true

        # Distinguished name of the user to bind as -- not necessary if the LDAP server
        # supports anonymous searches
        ## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"

        # Password of the bind user -- not necessary if the LDAP server supports
        # anonymous searches
        ## bind_password=

        # Pattern for searching for usernames -- Use <username> for the parameter
        # For use when using LdapBackend for Hue authentication
        ## ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com"

        ## Use search bind authentication.
        ## search_bind_authentication=true

        ## [[[[[users]]]]]

          # Base filter for searching for users
          ## user_filter="objectclass=Person"

          # The username attribute in the LDAP schema
          ## user_name_attr=sAMAccountName

        ## [[[[[groups]]]]]

          # Base filter for searching for groups
          ## group_filter="objectclass=groupOfNames"

          # The username attribute in the LDAP schema
          ## group_name_attr=cn

  # Configuration options for specifying the Desktop Database. For more info,
  # see http://docs.djangoproject.com/en/1.4/ref/settings/#database-engine
  # ------------------------------------------------------------------------
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, sqlite3 or oracle.
    #
    # Note that for sqlite3, 'name', below is a a path to the filename. For other backends, it is the database name.
    # Note for Oracle, options={'threaded':true} must be set in order to avoid crashes.
    # Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=<host>:<port>/<service_name>".
    ## engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    ## name=desktop/desktop.db
    ## options={}

  # Configuration options for specifying the Desktop session.
  # For more info, see https://docs.djangoproject.com/en/1.4/topics/http/sessions/
  # ------------------------------------------------------------------------
  [[session]]
    # The cookie containing the users' session ID will expire after this amount of time in seconds.
    # Default is 2 weeks.
    ## ttl=1209600

    # The cookie containing the users' session ID will be secure.
    # Should only be enabled with HTTPS.
    ## secure=false

    # The cookie containing the users' session ID will use the HTTP only flag.
    ## http_only=false

    # Use session-length cookies. Logs out the user when she closes the browser window.
    ## expire_at_browser_close=false


  # Configuration options for connecting to an external SMTP server
  # ------------------------------------------------------------------------
  [[smtp]]

    # The SMTP server information for email notification delivery
    host=localhost
    port=25
    user=
    password=

    # Whether to use a TLS (secure) connection when talking to the SMTP server
    tls=no

    # Default email address to use for various automated notification from Hue
    ## default_from_email=hue@localhost


  # Configuration options for Kerberos integration for secured Hadoop clusters
  # ------------------------------------------------------------------------
  [[kerberos]]

    # Path to Hue's Kerberos keytab file
    ## hue_keytab=
    # Kerberos principal name for Hue
    ## hue_principal=hue/hostname.foo.com
    # Path to kinit
    ## kinit_path=/path/to/kinit


  # Configuration options for using OAuthBackend (Core) login
  # ------------------------------------------------------------------------
  [[oauth]]
    # The Consumer key of the application
    ## consumer_key=XXXXXXXXXXXXXXXXXXXXX

    # The Consumer secret of the application
    ## consumer_secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    # The Request token URL
    ## request_token_url=https://api.twitter.com/oauth/request_token

    # The Access token URL
    ## access_token_url=https://api.twitter.com/oauth/access_token

    # The Authorize URL
    ## authenticate_url=https://api.twitter.com/oauth/authorize


###########################################################################
# Settings to configure SAML
###########################################################################

[libsaml]
  # Xmlsec1 binary path. This program should be executable by the user running Hue.
  ## xmlsec_binary=/usr/local/bin/xmlsec1

  # Entity ID for Hue acting as service provider.
  # Can also accept a pattern where '<base_url>' will be replaced with server URL base.
  ## entity_id="<base_url>/saml2/metadata/"

  # Create users from SSO on login.
  ## create_users_on_login=true

  # Required attributes to ask for from IdP.
  # This requires a comma separated list.
  ## required_attributes=uid

  # Optional attributes to ask for from IdP.
  # This requires a comma separated list.
  ## optional_attributes=

  # IdP metadata in the form of a file. This is generally an XML file containing metadata that the Identity Provider generates.
  ## metadata_file=

  # Private key to encrypt metadata with.
  ## key_file=

  # Signed certificate to send along with encrypted metadata.
  ## cert_file=

  # A mapping from attributes in the response from the IdP to django user attributes.
  ## user_attribute_mapping={'uid':'username'}

  # Have Hue initiated authn requests be signed and provide a certificate.
  ## authn_requests_signed=false

  # Have Hue initiated logout requests be signed and provide a certificate.
  ## logout_requests_signed=false

  # Username can be sourced from 'attributes' or 'nameid'.
  ## username_source=attributes

  # Performs the logout or not.
  ## logout_enabled=true


###########################################################################
# Settings to configure OpenId
###########################################################################

[libopenid]
  # (Required) OpenId SSO endpoint url.
  ## server_endpoint_url=https://www.google.com/accounts/o8/id

  # OpenId 1.1 identity url prefix to be used instead of SSO endpoint url
  # This is only supported if you are using an OpenId 1.1 endpoint
  ## identity_url_prefix=https://app.onelogin.com/openid/your_company.com/

  # Create users from OPENID on login.
  ## create_users_on_login=true

  # Use email for username
  ## use_email_for_username=true


###########################################################################
# Settings to configure OAuth
###########################################################################

[liboauth]
  # NOTE:
  # To work, each of the active (i.e. uncommented) service must have
  # applications created on the social network.
  # Then the "consumer key" and "consumer secret" must be provided here.
  #
  # The addresses where to do so are:
  # Twitter:  https://dev.twitter.com/apps
  # Google+ : https://cloud.google.com/
  # Facebook: https://developers.facebook.com/apps
  # Linkedin: https://www.linkedin.com/secure/developer
  #
  # Additionnaly, the following must be set in the application settings:
  # Twitter:  Callback URL (aka Redirect URL) must be set to http://YOUR_HUE_IP_OR_DOMAIN_NAME/oauth/social_login/oauth_authenticated
  # Google+ : CONSENT SCREEN must have email address
  # Facebook: Sandbox Mode must be DISABLED
  # Linkedin: "In OAuth User Agreement", r_emailaddress is REQUIRED

  # The Consumer key of the application
  ## consumer_key_twitter=
  ## consumer_key_google=
  ## consumer_key_facebook=
  ## consumer_key_linkedin=

  # The Consumer secret of the application
  ## consumer_secret_twitter=
  ## consumer_secret_google=
  ## consumer_secret_facebook=
  ## consumer_secret_linkedin=

  # The Request token URL
  ## request_token_url_twitter=https://api.twitter.com/oauth/request_token
  ## request_token_url_google=https://accounts.google.com/o/oauth2/auth
  ## request_token_url_linkedin=https://www.linkedin.com/uas/oauth2/authorization
  ## request_token_url_facebook=https://graph.facebook.com/oauth/authorize

  # The Access token URL
  ## access_token_url_twitter=https://api.twitter.com/oauth/access_token
  ## access_token_url_google=https://accounts.google.com/o/oauth2/token
  ## access_token_url_facebook=https://graph.facebook.com/oauth/access_token
  ## access_token_url_linkedin=https://api.linkedin.com/uas/oauth2/accessToken

  # The Authenticate URL
  ## authenticate_url_twitter=https://api.twitter.com/oauth/authorize
  ## authenticate_url_google=https://www.googleapis.com/oauth2/v1/userinfo?access_token=
  ## authenticate_url_facebook=https://graph.facebook.com/me?access_token=
  ## authenticate_url_linkedin=https://api.linkedin.com/v1/people/~:(email-address)?format=json&oauth2_access_token=

  # Username Map. Json Hash format.
  # Replaces username parts in order to simplify usernames obtained
  # Example: {"@sub1.domain.com":"_S1", "@sub2.domain.com":"_S2"}
  # converts '[email protected]' to 'email_S1'
  ## username_map={}

  # Whitelisted domains (only applies to Google OAuth). CSV format.
  ## whitelisted_domains_google=

###########################################################################
# Settings for the RDBMS application
###########################################################################

[librdbms]
  # The RDBMS app can have any number of databases configured in the databases
  # section. A database is known by its section name
  # (IE sqlite, mysql, psql, and oracle in the list below).

  [[databases]]
    # sqlite configuration.
    ## [[[sqlite]]]
      # Name to show in the UI.
      ## nice_name=SQLite

      # For SQLite, name defines the path to the database.
      ## name=/tmp/sqlite.db

      # Database backend to use.
      ## engine=sqlite

      # Database options to send to the server when connecting.
      # https://docs.djangoproject.com/en/1.4/ref/databases/
      ## options={}

    # mysql, oracle, or postgresql configuration.
    ## [[[mysql]]]
      # Name to show in the UI.
      ## nice_name="My SQL DB"

      # For MySQL and PostgreSQL, name is the name of the database.
      # For Oracle, Name is instance of the Oracle server. For express edition
      # this is 'xe' by default.
      ## name=mysqldb

      # Database backend to use. This can be:
      # 1. mysql
      # 2. postgresql
      # 3. oracle
      ## engine=mysql

      # IP or hostname of the database to connect to.
      ## host=localhost

      # Port the database server is listening to. Defaults are:
      # 1. MySQL: 3306
      # 2. PostgreSQL: 5432
      # 3. Oracle Express Edition: 1521
      ## port=3306

      # Username to authenticate with when connecting to the database.
      ## user=example

      # Password matching the username to authenticate with when
      # connecting to the database.
      ## password=example

      # Database options to send to the server when connecting.
      # https://docs.djangoproject.com/en/1.4/ref/databases/
      ## options={}

###########################################################################
# Settings to configure your Hadoop cluster.
###########################################################################

[hadoop]

  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs

    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://h1:8020

      # NameNode logical name.
      logical_name=h1

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://h1:50070/webhdfs/v1

      # Change this if your HDFS cluster is Kerberos-secured
      security_enabled=false

      # Default umask for file and directory creation, specified in an octal value.
      umask=022
      hadoop_conf_dir=/home/search/hadoop/etc/hadoop

  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=h1

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      ## logical_name=

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://h1:8088

      # URL of the ProxyServer API
      proxy_api_url=http://h1:8088

      # URL of the HistoryServer API
      history_server_api_url=http://h1:19888

    # HA support by specifying multiple clusters
    # e.g.

    # [[[ha]]]
      # Resource Manager logical name (required for HA)
      ## logical_name=my-rm-name

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]

    [[[default]]]
      # Enter the host on which you are running the Hadoop JobTracker
     jobtracker_host=h1

      # The port where the JobTracker IPC listens on
     #jobtracker_port=8021

      # JobTracker logical name for HA
      ## logical_name=

      # Thrift plug-in port for the JobTracker
      ## thrift_port=9290

      # Whether to submit jobs to this cluster
      submit_to=False

      # Change this if your MapReduce cluster is Kerberos-secured
      ## security_enabled=false

    # HA support by specifying multiple clusters
    # e.g.

    # [[[ha]]]
      # Enter the logical name of the JobTrackers
      # logical_name=my-jt-name


###########################################################################
# Settings to configure the Filebrowser app
###########################################################################

[filebrowser]
  # Location on local filesystem where the uploaded archives are temporary stored.
  ## archive_upload_tempdir=/tmp

###########################################################################
# Settings to configure liboozie
###########################################################################

[liboozie]
  # The URL where the Oozie service runs on. This is required in order for
  # users to submit jobs. Empty value disables the config check.
  ## oozie_url=http://localhost:11000/oozie
  oozie_url=http://h1:11000/oozie

  # Requires FQDN in oozie_url if enabled
  ## security_enabled=false

  # Location on HDFS where the workflows/coordinator are deployed when submitted.
  remote_deployement_dir=/user/hue/oozie/deployments


###########################################################################
# Settings to configure the Oozie app
###########################################################################

[oozie]
  # Location on local FS where the examples are stored.
  local_data_dir=apps/oozie/examples/

  # Location on local FS where the data for the examples is stored.
  ## sample_data_dir=...thirdparty/sample_data

  # Location on HDFS where the oozie examples and workflows are stored.
  remote_data_dir=apps/oozie/workspaces

  # Maximum of Oozie workflows or coodinators to retrieve in one API call.
  oozie_jobs_count=100

  # Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
  ## enable_cron_scheduling=true
  enable_cron_scheduling=true


###########################################################################
# Settings to configure Beeswax with Hive
###########################################################################

[beeswax]

  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=h1

  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/home/search/hive/conf

  # Timeout in seconds for thrift calls to Hive service
  server_conn_timeout=120

  # Set a LIMIT clause when browsing a partitioned table.
  # A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
  browse_partitioned_table_limit=250

  # A limit to the number of rows that can be downloaded from a query.
  # A value of -1 means there will be no limit.
  # A maximum of 65,000 is applied to XLS downloads.
  download_row_limit=1000000

  # Hue will try to close the Hive query when the user leaves the editor page.
  # This will free all the query resources in HiveServer2, but also make its results inaccessible.
  ## close_queries=false

  # Thrift version to use when communicating with HiveServer2
  ## thrift_version=5

  [[ssl]]
    # SSL communication enabled for this server.
    ## enabled=false

    # Path to Certificate Authority certificates.
    ## cacerts=/etc/hue/cacerts.pem

    # Path to the private key file.
    ## key=/etc/hue/key.pem

    # Path to the public certificate file.
    ## cert=/etc/hue/cert.pem

    # Choose whether Hue should validate certificates received from the server.
    ## validate=true


###########################################################################
# Settings to configure Pig
###########################################################################

[pig]
  # Location of piggybank.jar on local filesystem.
  local_sample_dir=/home/search/hue/apps/pig/examples

  # Location piggybank.jar will be copied to in HDFS.
  remote_data_dir=/home/search/pig/examples


###########################################################################
# Settings to configure Sqoop
###########################################################################

[sqoop]
  # For autocompletion, fill out the librdbms section.

  # Sqoop server URL
  server_url=http://h1:12000/sqoop


###########################################################################
# Settings to configure Proxy
###########################################################################

[proxy]
  # Comma-separated list of regular expressions,
  # which match 'host:port' of requested proxy target.
  ## whitelist=(localhost|127\.0\.0\.1):(50030|50070|50060|50075)

  # Comma-separated list of regular expressions,
  # which match any prefix of 'host:port/path' of requested proxy target.
  # This does not support matching GET parameters.
  ## blacklist=


###########################################################################
# Settings to configure Impala
###########################################################################

[impala]
  # Host of the Impala Server (one of the Impalad)
  ## server_host=localhost

  # Port of the Impala Server
  ## server_port=21050

  # Kerberos principal
  ## impala_principal=impala/hostname.foo.com

  # Turn on/off impersonation mechanism when talking to Impala
  ## impersonation_enabled=False

  # Number of initial rows of a result set to ask Impala to cache in order
  # to support re-fetching them for downloading them.
  # Set to 0 for disabling the option and backward compatibility.
  ## querycache_rows=50000

  # Timeout in seconds for thrift calls
  ## server_conn_timeout=120

  # Hue will try to close the Impala query when the user leaves the editor page.
  # This will free all the query resources in Impala, but also make its results inaccessible.
  ## close_queries=true

  # If QUERY_TIMEOUT_S > 0, the query will be timed out (i.e. cancelled) if Impala does not do any work
  # (compute or send back results) for that query within QUERY_TIMEOUT_S seconds.
  ## query_timeout_s=600


###########################################################################
# Settings to configure HBase Browser
###########################################################################

[hbase]
  # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
  # Use full hostname with security.
  ## hbase_clusters=(Cluster|localhost:9090)

  # HBase configuration directory, where hbase-site.xml is located.
  ## hbase_conf_dir=/etc/hbase/conf

  # Hard limit of rows or columns per row fetched before truncating.
  ## truncate_limit = 500

  # 'buffered' is the default of the HBase Thrift Server and supports security.
  # 'framed' can be used to chunk up responses,
  # which is useful when used in conjunction with the nonblocking server in Thrift.
  ## thrift_transport=buffered


###########################################################################
# Settings to configure Solr Search
###########################################################################

[search]

  # URL of the Solr Server
  solr_url=http://172.21.50.41:8983/solr/

  # Requires FQDN in solr_url if enabled
  ## security_enabled=false

  ## Query sent when no term is entered
  ## empty_query=*:*


###########################################################################
# Settings to configure Solr Indexer
###########################################################################

[indexer]

  # Location of the solrctl binary.
  ## solrctl_path=/usr/bin/solrctl

  # Location of the solr home.
  ## solr_home=/usr/lib/solr

  # Zookeeper ensemble.
  ## solr_zk_ensemble=localhost:2181/solr

  # The contents of this directory will be copied over to the solrctl host to its temporary directory.
  ## config_template_path=/../hue/desktop/libs/indexer/src/data/solr_configs


###########################################################################
# Settings to configure Job Designer
###########################################################################

[jobsub]

  # Location on local FS where examples and template are stored.
  ## local_data_dir=..../data

  # Location on local FS where sample data is stored
  ## sample_data_dir=...thirdparty/sample_data


###########################################################################
# Settings to configure Job Browser
###########################################################################

[jobbrowser]
  # Share submitted jobs information with all users. If set to false,
  # submitted jobs are visible only to the owner and administrators.
  ## share_jobs=true


###########################################################################
# Settings to configure the Zookeeper application.
###########################################################################

[zookeeper]

  [[clusters]]

    [[[default]]]
      # Zookeeper ensemble. Comma separated list of Host/Port.
      # e.g. localhost:2181,localhost:2182,localhost:2183
      host_ports=zk1:2181

      # The URL of the REST contrib service (required for znode browsing)
      ## rest_url=http://localhost:9998


###########################################################################
# Settings to configure the Spark application.
###########################################################################

[spark]
  # URL of the REST Spark Job Server.
  server_url=http://h1:8080/


###########################################################################
# Settings for the User Admin application
###########################################################################

[useradmin]
  # The name of the default user group that users will be a member of
  ## default_user_group=default


###########################################################################
# Settings for the Sentry lib
###########################################################################

[libsentry]
  # Hostname or IP of server.
  ## hostname=localhost

  # Port the sentry service is running on.
  ## port=8038

  # Sentry configuration directory, where sentry-site.xml is located.
  ## sentry_conf_dir=/etc/sentry/conf

编译好的目录如下：

-rw-rw-r--  1 search search  2782 5月  19 06:04 app.reg
-rw-rw-r--  1 search search  2782 5月  19 05:41 app.reg.bak
drwxrwxr-x 22 search search  4096 5月  20 01:05 apps
drwxrwxr-x  3 search search  4096 5月  19 05:41 build
drwxr-xr-x  2 search search  4096 5月  19 05:40 data
drwxrwxr-x  7 search search  4096 5月  20 01:29 desktop
drwxrwxr-x  2 search search  4096 5月  19 05:41 dist
drwxrwxr-x  7 search search  4096 5月  19 05:40 docs
drwxrwxr-x  3 search search  4096 5月  19 05:40 ext
-rw-rw-r--  1 search search 11358 5月  19 05:38 LICENSE.txt
drwxrwxr-x  2 search search  4096 5月  20 01:29 logs
-rw-rw-r--  1 search search  8121 5月  19 05:41 Makefile
-rw-rw-r--  1 search search  8505 5月  19 05:41 Makefile.sdk
-rw-rw-r--  1 search search  3093 5月  19 05:40 Makefile.tarball
-rw-rw-r--  1 search search  3498 5月  19 05:41 Makefile.vars
-rw-rw-r--  1 search search  2302 5月  19 05:41 Makefile.vars.priv
drwxrwxr-x  2 search search  4096 5月  19 05:41 maven
-rw-rw-r--  1 search search   801 5月  19 05:40 NOTICE.txt
-rw-rw-r--  1 search search  4733 5月  19 05:41 README.rst
-rw-rw-r--  1 search search    52 5月  19 05:38 start.sh
-rw-rw-r--  1 search search    65 5月  19 05:41 stop.sh
drwxrwxr-x  9 search search  4096 5月  19 05:38 tools
-rw-rw-r--  1 search search   932 5月  19 05:41 VERSION

6，启动hue，执行命令：build/env/bin/supervisor

[search@h1 hue]$ build/env/bin/supervisor 
[INFO] Not running as root, skipping privilege drop
starting server with options {'ssl_certificate': None, 'workdir': None, 'server_name': 'localhost', 'host': '0.0.0.0', 'daemonize': False, 'threads': 10, 'pidfile': None, 'ssl_private_key': None, 'server_group': 'search', 'ssl_cipher_list': 'DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2', 'port': 8000, 'server_user': 'search'}

然后我们就可以访问安装机ip+8000端口来查看了：

工具箱界面：

hive的界面：

在配置hive（散仙这里是0.13的版本）的时候，需要注意以下几个方面：

hive的metastrore的服务和hiveserver2服务都需要启动
执行下面命令
bin/hive --service metastore
bin/hiveserver2
除此之外，还需要关闭的hive的SAL认证，否则，使用hue访问会出现问题。
注意下面三项的配置

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface.
  Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>

<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>h1</value>
  <description>Bind host on which to run the HiveServer2 Thrift interface.
  Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>

<property>
  <name>hive.server2.authentication</name>
  <value>NOSASL</value>
  <description>
    Client authentication types.
       NONE: no authentication check
       LDAP: LDAP/AD based authentication
       KERBEROS: Kerberos/GSSAPI authentication
       CUSTOM: Custom authentication provider
               (Use with property hive.server2.custom.authentication.class)
       PAM: Pluggable authentication module.
  </description>
</property>

除了上面的配置外，还需要把hive.server2.long.polling.timeout的参数值，默认是5000L给改成5000,否则使用beenline连接时候，会出错，这是hive的一个bug。

pig的界面：

solr的界面如下：

最后需要注意一点，hue也需要在hadoop的core-site.xml里面配置相应的代理用户，示例如下：

 <property>
       <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
  </property>


    <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
     </property>

ok至此，我们的hue已经能完美工作了，我们可以根据自己的需要，定制相应的app插件，非常灵活！

你可能感兴趣的:(hadoop,hive,pig,hue)

RHEL 安装 Hadoop 服务器 XhClojure hadoop 服务器大数据
在这篇文章中，我们将探讨如何在RedHatEnterpriseLinux(RHEL)上安装和配置Hadoop服务器。Hadoop是一个开源的分布式数据处理框架，用于处理大规模数据集。以下是在RHEL上安装Hadoop的详细步骤。步骤1：安装Java在安装Hadoop之前，我们需要确保系统上安装了JavaDevelopmentKit(JDK)。执行以下命令安装JDK：sudoyuminstallja
如何安装Hadoop 薇晶晶 hadoop 大数据分布式
Hadoop入门(一)——CentOS7下载+VM上安装（手动分区）Hadoop入门(二)——VMware虚拟网络设置+Windows10的IP地址配置+CentOS静态IP设置Hadoop入门(三)——XSHELL7远程访问工具+XFTP7文件传输Hadoop入门(四)——模板虚拟机环境准备Hadoop入门(五)——Hadoop集群搭建-克隆三台虚拟机Hadoop入门(六)——JDK安装Hado
安装配置MAVEN ByteVoyager maven java
安装配置MAVEN1.获取安装包下载apache-maven-3.8.1-bin.zip，下载地址：https://archive.apache.org/dist/maven/maven-3/3.8.1/binaries/apache-maven-3.8.1-bin.zip。2.解压maven压缩包3.配置maven环境变量新建环境变量MAVEN_HOME:右击【此电脑】->【属性】->【高级系统
flutter pigeon gomobile 插件中使用go工具类 yujunlong3919 flutter golang swift kotlin
文章目录为什么flutter要用go写工具类1.下载pigeon插件模版2.编写go代码3.生成greeting.aar，Greeting.xcframework4.ios5.android6.dart中使用为什么flutter要用go写工具类在Flutter应用中，有些场景涉及到大量的计算，比如复杂的加密算法、数据压缩/解压缩或者图形处理中的数学计算等1.下载pigeon插件模版base_plu
MapReduce 读取 Hive ORC ArrayIndexOutOfBoundsException: 1024 异常解决一张假钞 mapreduce hive 大数据
个人博客地址：MapReduce读取HiveORCArrayIndexOutOfBoundsException:1024异常解决|一张假钞的真实世界在MR处理ORC的时候遇到如下异常：Exceptioninthread"main"java.lang.ArrayIndexOutOfBoundsException:1024atorg.apache.orc.impl.RunLengthIntegerRe
Anaconda 配置镜像源猿代码_xiao python pytorch python 深度学习
Anaconda镜像使用帮助Anaconda是一个用于科学计算的Python发行版，支持Linux,Mac,Windows,包含了众多流行的科学计算、数据分析的Python包。Anaconda安装包可以到https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/下载。TUNA还提供了Anaconda仓库与第三方源（conda-forge、msys2
linux安装python开发环境燃冰结晶 linux python linux install python jupyter python web开发环境
安装Anaconda下载Anacondawgethttps://repo.anaconda.com/archive/Anaconda3-5.3.0-Linux-x86_64.sh安装AnacondabashAnaconda3-5.3.0-Linux-x86_64.sh会选择安装路径会选择是否需要安装vscode,国内安装不上,所以不要安装了刷新环境配置source~/.bashrc验证是否安装成功
Hadoop01-入门&集群环境搭建--非原创（test） xl.liu 大数据 Test
Hadoop01-入门&集群环境搭建今日内容Hadoop的介绍集群环境搭建准备工作Linux命令和Shell脚本增强集群环境搭建来来来大数据概述大数据:就是对海量数据进行分析处理，得到一些有价值的信息，然后帮助企业做出判断和决策.处理流程:1:获取数据2:处理数据3:展示结果1：Hadoop介绍Hadoop是一个分布式系基础框架,它允许使用简单的编程模型跨大型计算机的大型数据集进行分布式处理.它主
Hadoop管理工具dfsadmin和fsck的使用脚本无敌 Hadoop hadoop npm 大数据
Hadoop提供了多个管理工具，其中dfsadmin和fsck是用于管理HDFS（Hadoop分布式文件系统）的重要工具。以下是它们的使用方法和常见命令。1.dfsadmin工具dfsadmin是用于管理HDFS集群的命令行工具，主要用于监控和管理HDFS的状态。常用命令查看HDFS状态hdfsdfsadmin-report显示HDFS集群的总体状态，包括数据节点（DataNode）的状态、存储容
（一）大数据---Hadoop整体介绍（架构层）----（组件(3) 2401_84166965 程序员大数据 hadoop 架构
复杂性:体现在数据的管理和操作上。如何抽取，转换，加载，连接，关联以把握数据内蕴的有用信息已经变得越来越有挑战性二、大数据技术有哪些（重点）===================================================================================基础的技术包含数据的采集、数据预处理、分布式存储、NoSQL数据库、数据仓库、机器学习、并行计
实现MySQL数据全量迁移至Hive的简单脚本 xiaoxaoyu 数仓数据仓库
1、主要思路：编写脚本执行建表语句、sqoop命令1.1、编写建表语句脚本思路：在虚拟机下执行hive-f/脚本路径即可执行hql脚本1.2、编写shell脚本脚本内容为分为两部分执行hql建表语句脚本sqoop迁移命令2、示范案例：2.1、hive建表脚本：--示范案例dropdatabaseifexistsods_myshopscascade;createdatabaseods_myshops
Hive的动态分区与静态分区（区别及详解）东南枝上的大雄 Hadoop hive 大数据 hadoop
静态分区与动态分区的区别：1、静态分区2、动态分区静态分区与动态分区的区别：静态分区是先把分区表创好，然后手动把数据导入到对应的分区里面去。静态分区实在编译期间指定分区名。静态分区支持load、insert两种插入方式。静态分区是用于分区少，分区名可以明确的数据。动态分区是有一份数据集（2015-2022年的），按照数据集的字段给动态的生成分区。动态分区实在SQL执行的时候确定的。动态分区前需打开
hive—常用的函数整理风子~ hive hadoop 数据仓库
1、size(split(...))函数用于计算分割后字符串数组的长度1）实例：由客户编号列表计算客户编号个数--数据准备withtmp_test01as(select'tag074445270'tag_id,'202501'busi_mon,'012399931003,012399931000'index_valunionallselect'tag074445271'tag_id,'202501
Hive 分区详解 mm_ren 分区表 hadoop 大数据 hive spark
在Hive中处理数据时，当处理的一张表的数据量过大的时候，每次查询都是遍历整张表，显然对于计算机来说，是负担比较重的。所以我们可不可以针对数据进行分类，查询时只遍历该分类中的数据，这样就能有效的解决问题。所以就会Hive在表的架构下，就会有分区的这个概念，就是为了满足此需求。分区表的一个分区对应hdfs上的一个目录分区表包括静态分区表和动态分区表，根据分区会不会自动创建来区分多级分区表，即创建的时
Hive的动态分区的原理肥猪猪爸大数据 hive hadoop 数据仓库大数据 sql 面试
Hive动态分区原理详解1.什么是Hive动态分区？在Hive中，分区（Partition）是对表数据的一种划分方式，类似于关系型数据库中的分区表。例如，在电商数据中，可以按year、month、day进行分区存储，以便提高查询效率。静态分区（StaticPartition）：用户在INSERT数据时手动指定分区字段的值。动态分区（DynamicPartition）：分区字段的值从数据本身自动提取
hive全量迁移脚本我要用代码向我喜欢的女孩表白数据迁移 bigdata-大数据专栏 hive hadoop 数据仓库
#!/bin/bash#场景：数据在同一库下，并且hive是内部表（前缀的hdfs地址是相同的）#1.读取一个文件，获取表名#echo"时间$dt_jian_2-------------------------">>/home/hadoop/qianyi_zengliang/rs.txt#跟客户宽带有关，万兆(1.2g)，然后咨询业务后，看监控高峰，大概可以用一般600mb/spinjie="ha
笔记：DataSphere Studio安装部署流程右边com Java 大数据
一、标准版部署标准版：有一定的安装难度，体现在Hadoop、Hive和Spark版本不同时，可能需要重新编译，可能会出现包冲突问题。适合于试用和生产使用，2~3小时即可部署起来。支持的功能有：数据开发IDE-Scriptis工作流实时执行信号功能和邮件功能数据可视化-Visualis数据质量-Qualitis(单机版)工作流定时调度-Azkaban(单机版)Linkis管理台二、基础环境准备2.1
HIVE- SPARK 流川枫_ 20210706 hdfs hive spark
日常记录备忘Hive修改字段类型之后（varchar->string）Hive可以查到数据，Presto查询报错;分区字段数据类型和表结构字段类型不一样；spark-sql分区表和非分区表兼容问题，不能关联可以建临时表把分区数据导入，用完数据将表删除；count有数据，select没数据可能是压缩格式所导致；优化合全量任务，之前是row_number()函数先插入当天增量，取出最新的数据插入全量表
Hive的ReduceJoin/MapJoin/SMBJoin for your wish Hive 面试Interview hive hadoop
Hive中就是把Map，Reduce的Join拿过来，通过SQL来表示。参考链接：LanguageManualJoins-ApacheHive-ApacheSoftwareFoundation1.Reduce/Common/ShuffleJoinReduceJoin在Hive中也叫CommonJoin或ShuffleJoin它会进行把相同key的value合在一起，正好符合我们在sql中的join
hive-site.xml 配置总结 hxsln11 hive xml hadoop
在Hive安装后，hive主要的配置文件为conf中hive-site.xml那该文件中那么多的配置选项都是什么含义呢。下面这篇文章带你解密这些配置请跟随以下这些问题来看以下配置：1.hive输出格式的配置项是哪个？2.hive被各种语言调用如何配置？3.hive提交作业是在hive中还是hadoop中？4.一个查询的最后一个map/reduce任务输出是否被压缩的标志，通过哪个配置项？5.当用户
小白也能安装：Ubuntu20.04 安装 RabbitMQ Valishment RabbitMQ ubuntu rabbitmq linux 阿里云 java
开始我使用的是阿里云的轻量级服务器Ubuntu20.04系统镜像作为平台因为要使用RabbitMQ,想着步骤有点繁琐,写篇记一记安装基本依赖项更新源sudoapt-getupdate-y下载签名密钥和软件包所需的先决条件sudoapt-getinstallcurlgnupgdebian-keyringdebian-archive-keyring-y添加存储库签名密钥(指示易于信任由该密钥签名的软件
最新Apache Hudi 1.0.1源码编译详细教程以及常见问题处理 Toroidals 大数据组件安装部署教程 hudi1.0.1 源码编译教程最新
1.最新ApacheHudi1.0.1源码编译2.Flink、Spark、Hive集成Hudi1.0.13.flinkstreaming写入hudi目录1.版本介绍2.安装maven2.1.下载maven2.2.设置环境变量2.3.添加Maven镜像3.编译hudi3.1.下载hudi源码3.2.修改hudi源码3.3.修改hudi-1.0.1/pom.xml，注释或去掉410行内容3.4.安装c
Python 算法交易秘籍（五）绝不原创的飞龙默认分类默认分类
原文：zh.annas-archive.org/md5/010eca9c9f84c67fe4f8eb1d9bd1d316译者：飞龙协议：CCBY-NC-SA4.0第十一章：算法交易-实际交易现在我们已经建立了各种算法交易策略，并成功地进行了令人满意的回测，并在实时市场中进行了纸上交易，现在终于到了进行实际交易的时候了。实际交易是指我们在真实市场小时内用真钱执行交易策略。如果您的策略在回测和纸上交易
常见Linux命令程序员小柴后端工程化 linux 服务器运维
第八章常见Linux命令学习目标1熟练文件目录类命令2熟悉用户管理命令3熟悉组管理命令4熟练文件权限命令5熟悉搜索查找类命令6熟练压缩和解压缩命令7熟练进程线程类命令8了解磁盘分区类命令第一节文件目录类命令（1）pwd打印当前目录的绝对路径(printworkingdirectory)基本语法pwd（功能描述：显示当前工作目录的绝对路径）案例实操显示当前工作目录的绝对路径[root@hadoop1
Hive中文乱码解决方法快乐骑行^_^ 大数据大数据平台二次开发
Hive中文乱码解决方法一、Hive中文乱码原因二、Hive中文乱码解决方法三、修改hive配置文件四、再次查看表信息，中文注释正常一、Hive中文乱码原因hive的元数据是由mysql管理的，mysql默认编码是latin1，中文存储进去容易乱码，所以最好把mysql的编码改成utf-8二、Hive中文乱码解决方法需要把相应注释的地方的字符集由latin1改成utf-8，用到注释的就三个地方，表
Fink与Hadoop的简介以及联系 Bugkillers hadoop 大数据分布式
Fink和Hadoop是两个常用于大数据处理的开源工具，它们可以搭配使用以构建高效的数据处理系统。一、Fink和Hadoop的关系Fink：1、Fink是一个分布式流处理框架，专注于实时数据处理。它支持高吞吐、低延迟的流处理，适用于实时分析、事件驱动应用等场景。2、Fink提供精确一次（exactly-once）语义，确保数据处理的准确性。Hadoop：1、Hadoop是一个分布式存储和批处理框架
Hbase深入浅出天才之上数据存储 Hbase 大数据存储
目录HBase在大数据生态圈中的位置HBase与传统关系数据库的区别HBase相关的模块以及HBase表格的特性HBase的使用建议Phoenix的使用总结HBase在大数据生态圈中的位置提到大数据的存储，大多数人首先联想到的是Hadoop和Hadoop中的HDFS模块。大家熟知的Spark、以及Hadoop的MapReduce，可以理解为一种计算框架。而HDFS，我们可以认为是为计算框架服务的存
HBase简介：高效分布式数据存储和处理代码指四方分布式 hbase 数据库大数据
HBase简介：高效分布式数据存储和处理HBase是一个高效的、可扩展的分布式数据库，它是构建在ApacheHadoop之上的开源项目。HBase的设计目标是为大规模数据存储和处理提供高吞吐量和低延迟的解决方案。它可以在成百上千台服务器上运行，并能够处理海量的结构化和半结构化数据。HBase的核心特点包括：分布式存储：HBase使用Hadoop分布式文件系统（HDFS）作为底层存储，数据被分布在集
在Hadoop集群中实现数据安全：技术与策略并行 Echo_Wish 实战高阶大数据 hadoop 大数据分布式
在Hadoop集群中实现数据安全：技术与策略并行随着大数据技术的广泛应用，Hadoop已经成为处理和存储海量数据的首选平台。然而，随着数据规模的扩大，如何确保Hadoop集群中的数据安全也成为了亟待解决的难题。毕竟，数据安全不仅关系到企业的隐私保护，也直接影响到数据的可信度与可用性。本文将探讨如何在Hadoop集群中实现数据安全，分析数据加密、访问控制、审计日志等方面的技术与策略，并通过一些具体的
hive建表语句增加字段、分区基础操作节点。csn 数据库 #hive hive hadoop big data
目录hive建表内部分区表外部分区表表结构复制：hive表删除hive表重命名表修改操作增加分区修改分区删除分区新增表字段hive建表IFNOTEXISTS:表不存在才会创建分隔符：field.delim是表的两个列字段之间的文件中的字段分隔符.serialization.format是文件序列化时表中两个列字段之间的文件中的字段分隔符.分区partition:创建表时可指定分区字段，多个分区字段
如何用ruby来写hadoop的mapreduce并生成jar包 wudixiaotie mapreduce
ruby来写hadoop的mapreduce，我用的方法是rubydoop。怎么配置环境呢： 1.安装rvm：不说了网上有 2.安装ruby：由于我以前是做ruby的，所以习惯性的先安装了ruby，起码调试起来比jruby快多了。 3.安装jruby： rvm install jruby然后等待安
java编程思想 -- 访问控制权限百合不是茶 java 访问控制权限单例模式
访问权限是java中一个比较中要的知识点,它规定者什么方法可以访问,什么不可以访问一:包访问权限; 自定义包: package com.wj.control; //包 public class Demo { //定义一个无参的方法 public void DemoPackage(){ System.out.println("调用
[生物与医学]请审慎食用小龙虾 comsci 生物
现在的餐馆里面出售的小龙虾,有一些是在野外捕捉的,这些小龙虾身体里面可能带有某些病毒和细菌,人食用以后可能会导致一些疾病,严重的甚至会死亡..... 所以,参加聚餐的时候,最好不要点小龙虾...就吃养殖的猪肉,牛肉,羊肉和鱼,等动物蛋白质
org.apache.jasper.JasperException: Unable to compile class for JSP: 商人shang maven 2.2 jdk1.8
环境： jdk1.8 maven tomcat7-maven-plugin 2.0 原因： tomcat7-maven-plugin 2.0 不知吃 jdk 1.8，换成 tomcat7-maven-plugin 2.2就行，即 <plugin>
你的垃圾你处理掉了吗?GC oloz GC
前序:本人菜鸟，此文研究学习来自网络，各位牛牛多指教　 1.垃圾收集算法的核心思想　　Java语言建立了垃圾收集机制，用以跟踪正在使用的对象和发现并回收不再使用(引用)的对象。该机制可以有效防范动态内存分配中可能发生的两个危险：因内存垃圾过多而引发的内存耗尽，以及不恰当的内存释放所造成的内存非法引用。　　垃圾收集算法的核心思想是：对虚拟机可用内存空间，即堆空间中的对象进行识别
shiro 和 SESSSION 杨白白 shiro
shiro 在web项目里默认使用的是web容器提供的session，也就是说shiro使用的session是web容器产生的，并不是自己产生的，在用于非web环境时可用其他来源代替。在web工程启动的时候它就和容器绑定在了一起，这是通过web.xml里面的shiroFilter实现的。通过session.getSession()方法会在浏览器cokkice产生JESSIONID，当关闭浏览器，此
移动互联网终端淘宝客如何实现盈利小桔子移動客戶端淘客淘寶App
2012年淘宝联盟平台为站长和淘宝客带来的分成收入突破30亿元，同比增长100%。而来自移动端的分成达1亿元，其中美丽说、蘑菇街、果库、口袋购物等App运营商分成近5000万元。可以看出，虽然目前阶段PC端对于淘客而言仍旧是盈利的大头，但移动端已经呈现出爆发之势。而且这个势头将随着智能终端(手机，平板)的加速普及而更加迅猛
wordpress小工具制作 aichenglong wordpress 小工具
wordpress 使用侧边栏的小工具，很方便调整页面结构小工具的制作过程 1 在自己的主题文件中新建一个文件夹(如widget)，在文件夹中创建一个php(AWP_posts-category.php) 小工具是一个类,想侧边栏一样，还得使用代码注册，他才可以再后台使用，基本的代码一层不变 <?php class AWP_Post_Category extends WP_Wi
JS微信分享 AILIKES js
// 所有功能必须包含在 WeixinApi.ready 中进行 WeixinApi.ready(function(Api) { // 微信分享的数据 var wxData = { &nb
封装探讨百合不是茶 JAVA面向对象封装
//封装属性方法将某些东西包装在一起，通过创建对象或使用静态的方法来调用，称为封装；封装其实就是有选择性地公开或隐藏某些信息，它解决了数据的安全性问题，增加代码的可读性和可维护性在 Aname类中申明三个属性，将其封装在一个类中：通过对象来调用例如 1： //属性将其设为私有姓名 name 可以公开
jquery radio/checkbox change事件不能触发的问题 bijian1013 JavaScript jquery
我想让radio来控制当前我选择的是机动车还是特种车，如下所示： <html> <head> <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js" type="text/javascript"><
AngularJS中安全性措施 bijian1013 JavaScript AngularJS 安全性 XSRF JSON漏洞
在使用web应用中，安全性是应该首要考虑的一个问题。AngularJS提供了一些辅助机制，用来防护来自两个常见攻击方向的网络攻击。一.JSON漏洞当使用一个GET请求获取JSON数组信息的时候（尤其是当这一信息非常敏感，
[Maven学习笔记九]Maven发布web项目 bit1129 maven
基于Maven的web项目的标准项目结构 user-project user-core user-service user-web src
【Hive七】Hive用户自定义聚合函数(UDAF) bit1129 hive
用户自定义聚合函数，用户提供的多个入参通过聚合计算(求和、求最大值、求最小值)得到一个聚合计算结果的函数。问题：UDF也可以提供输入多个参数然后输出一个结果的运算，比如加法运算add(3，5)，add这个UDF需要实现UDF的evaluate方法,那么UDF和UDAF的实质分别究竟是什么？ Double evaluate(Double a, Double b)
通过 nginx-lua 给 Nginx 增加 OAuth 支持 ronin47
前言：我们使用Nginx的Lua中间件建立了OAuth2认证和授权层。如果你也有此打算，阅读下面的文档，实现自动化并获得收益。SeatGeek 在过去几年中取得了发展，我们已经积累了不少针对各种任务的不同管理接口。我们通常为新的展示需求创建新模块，比如我们自己的博客、图表等。我们还定期开发内部工具来处理诸如部署、可视化操作及事件处理等事务。在处理这些事务中，我们使用了几个不同的接口来认证： &n
利用tomcat-redis-session-manager做session同步时自定义类对象属性保存不上的解决方法 bsr1983 session
在利用tomcat-redis-session-manager做session同步时，遇到了在session保存一个自定义对象时，修改该对象中的某个属性，session未进行序列化，属性没有被存储到redis中。在 tomcat-redis-session-manager的github上有如下说明： Session Change Tracking As noted in the &qu
《代码大全》表驱动法-Table Driven Approach-1 bylijinnan java 算法
关于Table Driven Approach的一篇非常好的文章： http://www.codeproject.com/Articles/42732/Table-driven-Approach package com.ljn.base; import java.util.Random; public class TableDriven { public
Sybase封锁原理 chicony Sybase
昨天在操作Sybase IQ12.7时意外操作造成了数据库表锁定，不能删除被锁定表数据也不能往其中写入数据。由于着急往该表抽入数据，因此立马着手解决该表的解锁问题。无奈此前没有接触过Sybase IQ12.7这套数据库产品，加之当时已属于下班时间无法求助于支持人员支持，因此只有借助搜索引擎强大的
java异常处理机制 CrazyMizzz java
java异常关键字有以下几个，分别为 try catch final throw throws 他们的定义分别为 try： Opening exception-handling statement. catch： Captures the exception. finally： Runs its code before terminating
hive 数据插入DML语法汇总 daizj hive DML 数据插入
Hive的数据插入DML语法汇总1、Loading files into tables语法：1) LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]解释：1)、上面命令执行环境为hive客户端环境下： hive>l
工厂设计模式 dcj3sjt126com 设计模式
使用设计模式是促进最佳实践和良好设计的好办法。设计模式可以提供针对常见的编程问题的灵活的解决方案。工厂模式工厂模式（Factory）允许你在代码执行时实例化对象。它之所以被称为工厂模式是因为它负责“生产”对象。工厂方法的参数是你要生成的对象对应的类名称。 Example #1 调用工厂方法（带参数） <?phpclass Example{
mysql字符串查找函数 dcj3sjt126com mysql
FIND_IN_SET(str,strlist) 假如字符串str 在由N 子链组成的字符串列表strlist 中，则返回值的范围在1到 N 之间。一个字符串列表就是一个由一些被‘,’符号分开的自链组成的字符串。如果第一个参数是一个常数字符串，而第二个是type SET列，则 FIND_IN_SET() 函数被优化，使用比特计算。如果str不在strlist 或st
jvm内存管理 easterfly jvm
一、JVM堆内存的划分分为年轻代和年老代。年轻代又分为三部分：一个eden,两个survivor。工作过程是这样的：e区空间满了后，执行minor gc，存活下来的对象放入s0, 对s0仍会进行minor gc，存活下来的的对象放入s1中，对s1同样执行minor gc，依旧存活的对象就放入年老代中；年老代满了之后会执行major gc，这个是stop the word模式，执行
CentOS-6.3安装配置JDK-8 gengzg centos
JAVA_HOME=/usr/java/jdk1.8.0_45 JRE_HOME=/usr/java/jdk1.8.0_45/jre PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib export JAVA_HOME
【转】关于web路径的获取方法 huangyc1210 Web 路径
假定你的web application 名称为news,你在浏览器中输入请求路径： http://localhost:8080/news/main/list.jsp 则执行下面向行代码后打印出如下结果： 1、 System.out.println(request.getContextPath()); //可返回站点的根路径。也就是项
php里获取第一个中文首字母并排序远去的渡口数据结构 PHP
很久没来更新博客了，还是觉得工作需要多总结的好。今天来更新一个自己认为比较有成就的问题吧。最近在做储值结算，需求里结算首页需要按门店的首字母A-Z排序。我的数据结构原本是这样的： Array ( [0] => Array ( [sid] => 2885842 [recetcstoredpay] =&g
java内部类 hm4123660 java 内部类匿名内部类成员内部类方法内部类
　在Java中，可以将一个类定义在另一个类里面或者一个方法里面，这样的类称为内部类。内部类仍然是一个独立的类，在编译之后内部类会被编译成独立的.class文件，但是前面冠以外部类的类名和$符号。内部类可以间接解决多继承问题,可以使用内部类继承一个类，外部类继承一个类，实现多继承。 &nb
Caused by: java.lang.IncompatibleClassChangeError: class org.hibernate.cfg.Exten zhb8015
maven pom.xml关于hibernate的配置和异常信息如下，查了好多资料，问题还是没有解决。只知道是包冲突，就是不知道是哪个包....遇到这个问题的分享下是怎么解决的。。 maven pom: <dependency> <groupId>org.hibernate</groupId> <ar
Spark 性能相关参数配置详解－任务调度篇 Stark_Summer spark cache cpu 任务调度 yarn
随着Spark的逐渐成熟完善, 越来越多的可配置参数被添加到Spark中来, 本文试图通过阐述这其中部分参数的工作原理和配置思路, 和大家一起探讨一下如何根据实际场合对Spark进行配置优化。由于篇幅较长，所以在这里分篇组织，如果要看最新完整的网页版内容，可以戳这里：http://spark-config.readthedocs.org/，主要是便
css3滤镜 wangkeheng html css
经常看到一些网站的底部有一些灰色的图标，鼠标移入的时候会变亮，开始以为是js操作src或者bg呢，搜索了一下，发现了一个更好的方法：通过css3的滤镜方法。 html代码： <a href='' class='icon'><img src='utv.jpg' /></a> css代码： .icon{-webkit-filter: graysc