xiaoL_clo

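Hue: A Powerful Tool for Visual Analysis on Hadoop

Let's start with Hue's architecture diagram:

(1) What is Hue?

Hue is a browser-based graphical user interface for quickly developing and debugging applications across the Hadoop ecosystem.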

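(2) What can Hue do?

1. Access HDFS and browse files
2. Develop and debug Hive queries on the web and display the results
3. Query Solr, display the results, and generate reports
4. Develop and debug Impala interactive SQL queries on the web
5. Develop and debug Spark jobs
6. Develop and debug Pig scripts
7. Develop and monitor Oozie jobs, and coordinate and schedule workflows
8. Query, modify, and display HBase data
9. Query the Hive metadata (metastore)
10. View MapReduce job progress and trace logs
11. Create and submit MapReduce, Streaming, and Java jobs
12. Develop and debug Sqoop2 jobs
13. Browse and edit ZooKeeper
14. Query and display databases (MySQL, PostgreSQL, SQLite, Oracle)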

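(3) How and when should you use Hue?

If your company runs the CDH distribution of Hadoop, you're in luck: Hue also comes from Cloudera, the company behind CDH, so the two work together smoothly.

If your company runs Apache Hadoop or HDP, that's fine too: Hue is open source and is not tied to any particular Hadoop distribution.

As for when to use it, Hue is purely a nice-to-have. You can do without it entirely, because every open source project has its own usage and development interfaces. Hue simply brings those projects together behind one interface, which is more convenient: you no longer have to open a Hive CLI when you want Hive, a Pig Grunt shell when you want Pig, and an HBase shell when you want to query HBase. If you use many components of the Hadoop ecosystem, Hue is quite handy. Another benefit is that Hue provides a web interface for developing and debugging jobs, so you don't have to keep logging in to Linux machines.

Whenever you have network access, you can develop and debug through Hue without installing a Linux client for remote login. That is the advantage of a browser/server (B/S) architecture.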

(4) How do you download, build, and install Hue?

1. Hue's dependencies (on CentOS)

    ant
    asciidoc
    cyrus-sasl-devel
    cyrus-sasl-gssapi
    gcc
    gcc-c++
    krb5-devel
    libtidy (for unit tests only)
    libxml2-devel
    libxslt-devel
    make
    mvn (from maven package or maven3 tarball)
    mysql
    mysql-devel
    openldap-devel
    python-devel
    sqlite-devel
    openssl-devel (for version 7+)
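
On CentOS, most of these dependencies can usually be installed with a single `yum` call. The sketch below is illustrative only: it assumes each name in the list is also a valid yum package name on your release, which may not hold everywhere, and it leaves out `libtidy` and `mvn` because the list marks them as test-only or installed separately. The helper name `build_yum_command` is hypothetical; it only builds the command and never runs it.

```python
import shlex

# Package names copied from the dependency list above. Assumption: each one
# is a valid yum package name on your CentOS release (check with `yum info`).
HUE_BUILD_DEPS = [
    "ant", "asciidoc", "cyrus-sasl-devel", "cyrus-sasl-gssapi",
    "gcc", "gcc-c++", "krb5-devel", "libxml2-devel", "libxslt-devel",
    "make", "mysql", "mysql-devel", "openldap-devel", "python-devel",
    "sqlite-devel", "openssl-devel",
]


def build_yum_command(packages):
    """Return the argument list for a non-interactive `yum install`."""
    # Drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(packages))
    return ["yum", "install", "-y"] + unique


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it as root.
    command = build_yum_command(HUE_BUILD_DEPS)
    print(" ".join(shlex.quote(arg) for arg in command))
```

Printing the command instead of executing it keeps the sketch safe to run on any machine; paste the output into a root shell once the package names have been checked.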

2. Before installing Hue, I had already installed the JDK, Maven, Ant, Hadoop, Hive, Oozie, and so on under CentOS. The environment variables are as follows:

    user="search"
    # java
    export JAVA_HOME="/usr/local/jdk"
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export PATH=$PATH:$JAVA_HOME/bin
    # ant
    export ANT_HOME=/usr/local/ant
    export CLASSPATH=$CLASSPATH:$ANT_HOME/lib
    export PATH=$PATH:$ANT_HOME/bin
    # maven
    export MAVEN_HOME="/usr/local/maven"
    export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
    export PATH=$PATH:$MAVEN_HOME/bin
    # Hadoop 2.2 variables
    export HADOOP_HOME=/home/search/hadoop
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export CLASSPATH=.:$CLASSPATH:$HADOOP_COMMON_HOME:$HADOOP_COMMON_HOME/lib:$HADOOP_MAPRED_HOME:$HADOOP_HDFS_HOME
    # Hive
    export HIVE_HOME=/home/search/hive
    export HIVE_CONF_DIR=/home/search/hive/conf
    export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
    export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf
    # Oozie
    export OOZIE_HOME="/home/search/oozie-4.1.0"
    export PATH=$PATH:$OOZIE_HOME/sbin:$OOZIE_HOME/bin
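
Before building Hue it can help to confirm that these variables really are set in the current shell and point at existing directories. The following is a minimal sketch of such a check, not part of Hue: the function name `find_bad_homes` is hypothetical, and the variable names are the ones from the listing above.

```python
import os

# Variables taken from the environment listing above.
REQUIRED_HOMES = [
    "JAVA_HOME", "ANT_HOME", "MAVEN_HOME",
    "HADOOP_HOME", "HIVE_HOME", "OOZIE_HOME",
]


def find_bad_homes(env, names=REQUIRED_HOMES):
    """Return {name: reason} for every variable that is unset or not a directory."""
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = "not a directory: " + value
    return problems


if __name__ == "__main__":
    # Report every problem found in the current process environment.
    for name, reason in find_bad_homes(os.environ).items():
        print(name + ": " + reason)
```

Run it from the same shell session that will run the build, so that it sees the exported values.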

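3. In this article I install Hue from the tarball. Hue can also be installed through Cloudera Manager (CM), although that route depends heavily on a CDH environment.

The latest Hue release at the time of writing is 3.8.1; I use 3.7.0 here.

Download: https://github.com/cloudera/hue/releases

Hue on GitHub: https://github.com/cloudera/hue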

4. After downloading, extract the tarball, change into the Hue root directory, and build with:

    make apps

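5. Once the build succeeds, configure /home/search/hue/desktop/conf/pseudo-distributed.ini. This file holds the IP addresses and ports for HDFS, YARN, MapReduce, Hive, Oozie, Pig, Spark, Solr, and so on; set them to match your own environment. If a component is not installed, you can leave it unconfigured: that app simply won't be usable in the web UI, and the other components are unaffected.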
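
The file uses nested sections, where the number of square brackets gives the nesting depth (`[hadoop]`, then `[[hdfs_clusters]]`, then `[[[default]]]`). As a rough illustration of how that layout maps to a nested dictionary, here is a small stand-alone parser. It is a sketch for inspecting your own settings, not the parser Hue itself uses; the name `parse_nested_ini` is hypothetical, and the sample values are copied from the example configuration in this article.

```python
import re

# One or more '[', a section name, then one or more ']'.
SECTION_RE = re.compile(r"^(\[+)\s*([^\[\]]+?)\s*(\]+)$")

SAMPLE = """
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      fs_defaultfs=hdfs://h1:8020
      webhdfs_url=http://h1:50070/webhdfs/v1
  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_host=h1
      ## security_enabled=false
[beeswax]
  hive_server_host=h1
  hive_server_port=10000
"""


def parse_nested_ini(text):
    """Parse ini text whose section depth is given by the number of brackets."""
    root = {}
    stack = [root]  # stack[d] is the dict of the open section at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment, including '## key=value' defaults
        match = SECTION_RE.match(line)
        if match:
            depth = len(match.group(1))
            if depth != len(match.group(3)) or depth > len(stack):
                raise ValueError("malformed section header: " + line)
            del stack[depth:]  # close any deeper or sibling sections
            section = stack[-1].setdefault(match.group(2), {})
            stack.append(section)
        elif "=" in line:
            key, value = line.split("=", 1)
            stack[-1][key.strip()] = value.strip()
    return root


if __name__ == "__main__":
    conf = parse_nested_ini(SAMPLE)
    print(conf["hadoop"]["hdfs_clusters"]["default"]["webhdfs_url"])
```

Lines starting with `##` are commented-out defaults, so the sketch skips them; only values you have actually uncommented show up in the result.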

An example follows:

    #####################################
    # DEVELOPMENT EDITION
    #####################################
    # Hue configuration file
    # ===================================
    #
    # For complete documentation about the contents of this file, run
    #       $ <hue_root>/build/env/bin/hue config_help
    #
    # All .ini files under the current directory are treated equally.  Their
    # contents are merged to form the Hue configuration, which can
    # be viewed in Hue at
    #       http://<hue_host>:<port>/dump_config
    ###########################################################################
    # General configuration for core Desktop features (authentication, etc)
    ###########################################################################
    [desktop]
      send_dbg_messages=1
      # To show database transactions, set database_logging to 1
      database_logging=0
      # Set this to a random string, the longer the better.
      # This is used for secure hashing in the session store.
      secret_key=search
      # Webserver listens on this address and port
      http_host=0.0.0.0
      http_port=8000
      # Time zone name
      time_zone=Asia/Shanghai
      # Enable or disable Django debug mode
      ## django_debug_mode=true
      # Enable or disable backtrace for server error
      ## http_500_debug_mode=true
      # Enable or disable memory profiling.
      ## memory_profiler=false
      # Server email for internal error messages
      ## django_server_email='hue@localhost.localdomain'
      # Email backend
      ## django_email_backend=django.core.mail.backends.smtp.EmailBackend
      # Webserver runs as this user
      server_user=search
      server_group=search
      # This should be the Hue admin and proxy user
      default_user=search
      # This should be the hadoop cluster admin
      default_hdfs_superuser=search
      # If set to false, runcpserver will not actually start the web server.
      # Used if Apache is being used as a WSGI container.
      ## enable_server=yes
      # Number of threads used by the CherryPy web server
      ## cherrypy_server_threads=10
      # Filename of SSL Certificate
      ## ssl_certificate=
      # Filename of SSL RSA Private Key
      ## ssl_private_key=
      # List of allowed and disallowed ciphers in cipher list format.
      # See http://www.openssl.org/docs/apps/ciphers.html for more information on cipher list format.
      ## ssl_cipher_list=DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2
      # LDAP username and password of the hue user used for LDAP authentications.
      # Set it to use LDAP Authentication with HiveServer2 and Impala.
      ## ldap_username=hue
      ## ldap_password=
      # Default encoding for site data
      ## default_site_encoding=utf-8
      # Help improve Hue with anonymous usage analytics.
      # Use Google Analytics to see how many times an application or specific section of an application is used, nothing more.
      ## collect_usage=true
      # Support for HTTPS termination at the load-balancer level with SECURE_PROXY_SSL_HEADER.
      ## secure_proxy_ssl_header=false
      # Comma-separated list of Django middleware classes to use.
      # See https://docs.djangoproject.com/en/1.4/ref/middleware/ for more details on middlewares in Django.
      ## middleware=desktop.auth.backend.LdapSynchronizationBackend
      # Comma-separated list of regular expressions, which match the redirect URL.
      # For example, to restrict to your local domain and FQDN, the following value can be used:
      # ^\/.*$,^http:\/\/www.mydomain.com\/.*$
      ## redirect_whitelist=
      # Comma separated list of apps to not load at server startup.
      # e.g.: pig,zookeeper
      ## app_blacklist=
      # The directory where to store the auditing logs. Auditing is disabled if the value is empty.
      # e.g. /var/log/hue/audit.log
      ## audit_event_log_dir=
      # Size in KB/MB/GB for audit log to rollover.
      ## audit_log_max_file_size=100MB
    #poll_enabled=false
      # Administrators
      # ----------------
      [[django_admins]]
        ## [[[admin1]]]
        ## name=john
        ## email=john@doe.com
      # UI customizations
      # -------------------
      [[custom]]
      # Top banner HTML code
      #banner_top_html=Search Team Hadoop Manager
      # Configuration options for user authentication into the web application
      # ------------------------------------------------------------------------
      [[auth]]
        # Authentication backend. Common settings are:
        # - django.contrib.auth.backends.ModelBackend (entirely Django backend)
        # - desktop.auth.backend.AllowAllBackend (allows everyone)
        # - desktop.auth.backend.AllowFirstUserDjangoBackend
        #     (Default. Relies on Django and user manager, after the first login)
        # - desktop.auth.backend.LdapBackend
        # - desktop.auth.backend.PamBackend
        # - desktop.auth.backend.SpnegoDjangoBackend
        # - desktop.auth.backend.RemoteUserDjangoBackend
        # - libsaml.backend.SAML2Backend
        # - libopenid.backend.OpenIDBackend
        # - liboauth.backend.OAuthBackend
        #     (New oauth, support Twitter, Facebook, Google+ and Linkedin)
        ## backend=desktop.auth.backend.AllowFirstUserDjangoBackend
        # The service to use when querying PAM.
        ## pam_service=login
        # When using the desktop.auth.backend.RemoteUserDjangoBackend, this sets
        # the normalized name of the header that contains the remote user.
        # The HTTP header in the request is converted to a key by converting
        # all characters to uppercase, replacing any hyphens with underscores
        # and adding an HTTP_ prefix to the name. So, for example, if the header
        # is called Remote-User that would be configured as HTTP_REMOTE_USER
        #
        # Defaults to HTTP_REMOTE_USER
        ## remote_user_header=HTTP_REMOTE_USER
        # Ignore the case of usernames when searching for existing users.
        # Only supported in remoteUserDjangoBackend.
        ## ignore_username_case=false
        # Ignore the case of usernames when searching for existing users to authenticate with.
        # Only supported in remoteUserDjangoBackend.
        ## force_username_lowercase=false
        # Users will expire after they have not logged in for 'n' amount of seconds.
        # A negative number means that users will never expire.
        ## expires_after=-1
        # Apply 'expires_after' to superusers.
        ## expire_superusers=true
      # Configuration options for connecting to LDAP and Active Directory
      # -------------------------------------------------------------------
      [[ldap]]
        # The search base for finding users and groups
        ## base_dn="DC=mycompany,DC=com"
        # URL of the LDAP server
        ## ldap_url=ldap://auth.mycompany.com
        # A PEM-format file containing certificates for the CA's that
        # Hue will trust for authentication over TLS.
        # The certificate for the CA that signed the
        # LDAP server certificate must be included among these certificates.
        # See more here http://www.openldap.org/doc/admin24/tls.html.
        ## ldap_cert=
        ## use_start_tls=true
        # Distinguished name of the user to bind as -- not necessary if the LDAP server
        # supports anonymous searches
        ## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"
        # Password of the bind user -- not necessary if the LDAP server supports
        # anonymous searches
        ## bind_password=
        # Pattern for searching for usernames -- Use <username> for the parameter
        # For use when using LdapBackend for Hue authentication
        ## ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com"
        # Create users in Hue when they try to login with their LDAP credentials
        # For use when using LdapBackend for Hue authentication
        ## create_users_on_login = true
        # Synchronize a users groups when they login
        ## sync_groups_on_login=false
        # Ignore the case of usernames when searching for existing users in Hue.
        ## ignore_username_case=false
        # Force usernames to lowercase when creating new users from LDAP.
        ## force_username_lowercase=false
        # Use search bind authentication.
        ## search_bind_authentication=true
        # Choose which kind of subgrouping to use: nested or suboordinate (deprecated).
        ## subgroups=suboordinate
        # Define the number of levels to search for nested members.
        ## nested_members_search_depth=10
        [[[users]]]
          # Base filter for searching for users
          ## user_filter="objectclass=*"
          # The username attribute in the LDAP schema
          ## user_name_attr=sAMAccountName
        [[[groups]]]
          # Base filter for searching for groups
          ## group_filter="objectclass=*"
          # The username attribute in the LDAP schema
          ## group_name_attr=cn
        [[[ldap_servers]]]
          ## [[[[mycompany]]]]
            # The search base for finding users and groups
            ## base_dn="DC=mycompany,DC=com"
            # URL of the LDAP server
            ## ldap_url=ldap://auth.mycompany.com
            # A PEM-format file containing certificates for the CA's that
            # Hue will trust for authentication over TLS.
            # The certificate for the CA that signed the
            # LDAP server certificate must be included among these certificates.
            # See more here http://www.openldap.org/doc/admin24/tls.html.
            ## ldap_cert=
            ## use_start_tls=true
            # Distinguished name of the user to bind as -- not necessary if the LDAP server
            # supports anonymous searches
            ## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"
            # Password of the bind user -- not necessary if the LDAP server supports
            # anonymous searches
            ## bind_password=
            # Pattern for searching for usernames -- Use <username> for the parameter
            # For use when using LdapBackend for Hue authentication
            ## ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com"
            ## Use search bind authentication.
            ## search_bind_authentication=true
            ## [[[[[users]]]]]
              # Base filter for searching for users
              ## user_filter="objectclass=Person"
              # The username attribute in the LDAP schema
              ## user_name_attr=sAMAccountName
            ## [[[[[groups]]]]]
              # Base filter for searching for groups
              ## group_filter="objectclass=groupOfNames"
              # The username attribute in the LDAP schema
              ## group_name_attr=cn
      # Configuration options for specifying the Desktop Database. For more info,
      # see http://docs.djangoproject.com/en ... gs/#database-engine
      # ------------------------------------------------------------------------
      [[database]]
        # Database engine is typically one of:
        # postgresql_psycopg2, mysql, sqlite3 or oracle.
        #
        # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name.
        # Note for Oracle, options={'threaded':true} must be set in order to avoid crashes.
        # Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=<host>:<port>/<service_name>".
        ## engine=sqlite3
        ## host=
        ## port=
        ## user=
        ## password=
        ## name=desktop/desktop.db
        ## options={}
      # Configuration options for specifying the Desktop session.
      # For more info, see https://docs.djangoproject.com/en/1.4/topics/http/sessions/
      # ------------------------------------------------------------------------
      [[session]]
        # The cookie containing the users' session ID will expire after this amount of time in seconds.
        # Default is 2 weeks.
        ## ttl=1209600
        # The cookie containing the users' session ID will be secure.
        # Should only be enabled with HTTPS.
        ## secure=false
        # The cookie containing the users' session ID will use the HTTP only flag.
        ## http_only=false
        # Use session-length cookies. Logs out the user when she closes the browser window.
        ## expire_at_browser_close=false
      # Configuration options for connecting to an external SMTP server
      # ------------------------------------------------------------------------
      [[smtp]]
        # The SMTP server information for email notification delivery
        host=localhost
        port=25
        user=
        password=
        # Whether to use a TLS (secure) connection when talking to the SMTP server
        tls=no
        # Default email address to use for various automated notification from Hue
        ## default_from_email=hue@localhost
      # Configuration options for Kerberos integration for secured Hadoop clusters
      # ------------------------------------------------------------------------
      [[kerberos]]
        # Path to Hue's Kerberos keytab file
        ## hue_keytab=
        # Kerberos principal name for Hue
        ## hue_principal=hue/hostname.foo.com
        # Path to kinit
        ## kinit_path=/path/to/kinit
      # Configuration options for using OAuthBackend (Core) login
      # ------------------------------------------------------------------------
      [[oauth]]
        # The Consumer key of the application
        ## consumer_key=XXXXXXXXXXXXXXXXXXXXX
        # The Consumer secret of the application
        ## consumer_secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        # The Request token URL
        ## request_token_url=https://api.twitter.com/oauth/request_token
        # The Access token URL
        ## access_token_url=https://api.twitter.com/oauth/access_token
        # The Authorize URL
        ## authenticate_url=https://api.twitter.com/oauth/authorize
    ###########################################################################
    # Settings to configure SAML
    ###########################################################################
    [libsaml]
      # Xmlsec1 binary path. This program should be executable by the user running Hue.
      ## xmlsec_binary=/usr/local/bin/xmlsec1
      # Entity ID for Hue acting as service provider.
      # Can also accept a pattern where '<base_url>' will be replaced with server URL base.
      ## entity_id="<base_url>/saml2/metadata/"
      # Create users from SSO on login.
      ## create_users_on_login=true
      # Required attributes to ask for from IdP.
      # This requires a comma separated list.
      ## required_attributes=uid
      # Optional attributes to ask for from IdP.
      # This requires a comma separated list.
      ## optional_attributes=
      # IdP metadata in the form of a file. This is generally an XML file containing metadata that the Identity Provider generates.
      ## metadata_file=
      # Private key to encrypt metadata with.
      ## key_file=
      # Signed certificate to send along with encrypted metadata.
      ## cert_file=
      # A mapping from attributes in the response from the IdP to django user attributes.
      ## user_attribute_mapping={'uid':'username'}
      # Have Hue initiated authn requests be signed and provide a certificate.
      ## authn_requests_signed=false
      # Have Hue initiated logout requests be signed and provide a certificate.
      ## logout_requests_signed=false
      # Username can be sourced from 'attributes' or 'nameid'.
      ## username_source=attributes
      # Performs the logout or not.
      ## logout_enabled=true
    ###########################################################################
    # Settings to configure OpenId
    ###########################################################################
    [libopenid]
      # (Required) OpenId SSO endpoint url.
      ## server_endpoint_url=https://www.google.com/accounts/o8/id
      # OpenId 1.1 identity url prefix to be used instead of SSO endpoint url
      # This is only supported if you are using an OpenId 1.1 endpoint
      ## identity_url_prefix=https://app.onelogin.com/openid/your_company.com/
      # Create users from OPENID on login.
      ## create_users_on_login=true
      # Use email for username
      ## use_email_for_username=true
    ###########################################################################
    # Settings to configure OAuth
    ###########################################################################
    [liboauth]
      # NOTE:
      # To work, each of the active (i.e. uncommented) service must have
      # applications created on the social network.
      # Then the "consumer key" and "consumer secret" must be provided here.
      #
      # The addresses where to do so are:
      # Twitter:  https://dev.twitter.com/apps
      # Google+ : https://cloud.google.com/
      # Facebook: https://developers.facebook.com/apps
      # Linkedin: https://www.linkedin.com/secure/developer
      #
      # Additionally, the following must be set in the application settings:
      # Twitter:  Callback URL (aka Redirect URL) must be set to http://YOUR_HUE_IP_OR_DOMAIN_NAME/oauth/social_login/oauth_authenticated
      # Google+ : CONSENT SCREEN must have email address
      # Facebook: Sandbox Mode must be DISABLED
      # Linkedin: "In OAuth User Agreement", r_emailaddress is REQUIRED
      # The Consumer key of the application
      ## consumer_key_twitter=
      ## consumer_key_google=
      ## consumer_key_facebook=
      ## consumer_key_linkedin=
      # The Consumer secret of the application
      ## consumer_secret_twitter=
      ## consumer_secret_google=
      ## consumer_secret_facebook=
      ## consumer_secret_linkedin=
      # The Request token URL
      ## request_token_url_twitter=https://api.twitter.com/oauth/request_token
      ## request_token_url_google=https://accounts.google.com/o/oauth2/auth
      ## request_token_url_linkedin=https://www.linkedin.com/uas/oauth2/authorization
      ## request_token_url_facebook=https://graph.facebook.com/oauth/authorize
      # The Access token URL
      ## access_token_url_twitter=https://api.twitter.com/oauth/access_token
      ## access_token_url_google=https://accounts.google.com/o/oauth2/token
      ## access_token_url_facebook=https://graph.facebook.com/oauth/access_token
      ## access_token_url_linkedin=https://api.linkedin.com/uas/oauth2/accessToken
      # The Authenticate URL
      ## authenticate_url_twitter=https://api.twitter.com/oauth/authorize
      ## authenticate_url_google=https://www.googleapis.com/oauth2/v1/userinfo?access_token=
      ## authenticate_url_facebook=https://graph.facebook.com/me?access_token=
      ## authenticate_url_linkedin=https://api.linkedin.com/v1/people/~:(email-address)?format=json&oauth2_access_token=
      # Username Map. Json Hash format.
      # Replaces username parts in order to simplify usernames obtained
      # Example: {"@sub1.domain.com":"_S1", "@sub2.domain.com":"_S2"}
      # converts 'email@sub1.domain.com' to 'email_S1'
      ## username_map={}
      # Whitelisted domains (only applies to Google OAuth). CSV format.
      ## whitelisted_domains_google=
    ###########################################################################
    # Settings for the RDBMS application
    ###########################################################################
    [librdbms]
      # The RDBMS app can have any number of databases configured in the databases
      # section. A database is known by its section name
      # (IE sqlite, mysql, psql, and oracle in the list below).
      [[databases]]
        # sqlite configuration.
        ## [[[sqlite]]]
          # Name to show in the UI.
          ## nice_name=SQLite
          # For SQLite, name defines the path to the database.
          ## name=/tmp/sqlite.db
          # Database backend to use.
          ## engine=sqlite
          # Database options to send to the server when connecting.
          # https://docs.djangoproject.com/en/1.4/ref/databases/
          ## options={}
        # mysql, oracle, or postgresql configuration.
        ## [[[mysql]]]
          # Name to show in the UI.
          ## nice_name="My SQL DB"
          # For MySQL and PostgreSQL, name is the name of the database.
          # For Oracle, Name is instance of the Oracle server. For express edition
          # this is 'xe' by default.
          ## name=mysqldb
          # Database backend to use. This can be:
          # 1. mysql
          # 2. postgresql
          # 3. oracle
          ## engine=mysql
          # IP or hostname of the database to connect to.
          ## host=localhost
          # Port the database server is listening to. Defaults are:
          # 1. MySQL: 3306
          # 2. PostgreSQL: 5432
          # 3. Oracle Express Edition: 1521
          ## port=3306
          # Username to authenticate with when connecting to the database.
          ## user=example
          # Password matching the username to authenticate with when
          # connecting to the database.
          ## password=example
          # Database options to send to the server when connecting.
          # https://docs.djangoproject.com/en/1.4/ref/databases/
          ## options={}
    ###########################################################################
    # Settings to configure your Hadoop cluster.
    ###########################################################################
    [hadoop]
      # Configuration for HDFS NameNode
      # ------------------------------------------------------------------------
      [[hdfs_clusters]]
        # HA support by using HttpFs
        [[[default]]]
          # Enter the filesystem uri
          fs_defaultfs=hdfs://h1:8020
          # NameNode logical name.
          logical_name=h1
          # Use WebHdfs/HttpFs as the communication mechanism.
          # Domain should be the NameNode or HttpFs host.
          # Default port is 14000 for HttpFs.
          webhdfs_url=http://h1:50070/webhdfs/v1
          # Change this if your HDFS cluster is Kerberos-secured
          security_enabled=false
          # Default umask for file and directory creation, specified in an octal value.
          umask=022
          hadoop_conf_dir=/home/search/hadoop/etc/hadoop
      # Configuration for YARN (MR2)
      # ------------------------------------------------------------------------
      [[yarn_clusters]]
        [[[default]]]
          # Enter the host on which you are running the ResourceManager
          resourcemanager_host=h1
          # The port where the ResourceManager IPC listens on
          resourcemanager_port=8032
          # Whether to submit jobs to this cluster
          submit_to=True
          # Resource Manager logical name (required for HA)
          ## logical_name=
          # Change this if your YARN cluster is Kerberos-secured
          ## security_enabled=false
          # URL of the ResourceManager API
          resourcemanager_api_url=http://h1:8088
          # URL of the ProxyServer API
          proxy_api_url=http://h1:8088
          # URL of the HistoryServer API
          history_server_api_url=http://h1:19888
        # HA support by specifying multiple clusters
        # e.g.
        # [[[ha]]]
          # Resource Manager logical name (required for HA)
          ## logical_name=my-rm-name
      # Configuration for MapReduce (MR1)
      # ------------------------------------------------------------------------
      [[mapred_clusters]]
        [[[default]]]
          # Enter the host on which you are running the Hadoop JobTracker
          jobtracker_host=h1
          # The port where the JobTracker IPC listens on
          #jobtracker_port=8021
          # JobTracker logical name for HA
          ## logical_name=
          # Thrift plug-in port for the JobTracker
          ## thrift_port=9290
          # Whether to submit jobs to this cluster
          submit_to=False
          # Change this if your MapReduce cluster is Kerberos-secured
          ## security_enabled=false
        # HA support by specifying multiple clusters
        # e.g.
        # [[[ha]]]
          # Enter the logical name of the JobTrackers
          # logical_name=my-jt-name
    ###########################################################################
    # Settings to configure the Filebrowser app
    ###########################################################################
    [filebrowser]
      # Location on local filesystem where the uploaded archives are temporarily stored.
      ## archive_upload_tempdir=/tmp
    ###########################################################################
    # Settings to configure liboozie
    ###########################################################################
    [liboozie]
      # The URL where the Oozie service runs on. This is required in order for
      # users to submit jobs. Empty value disables the config check.
      ## oozie_url=http://localhost:11000/oozie
      oozie_url=http://h1:11000/oozie
      # Requires FQDN in oozie_url if enabled
      ## security_enabled=false
      # Location on HDFS where the workflows/coordinator are deployed when submitted.
      remote_deployement_dir=/user/hue/oozie/deployments
    ###########################################################################
    # Settings to configure the Oozie app
    ###########################################################################
    [oozie]
      # Location on local FS where the examples are stored.
      local_data_dir=apps/oozie/examples/
      # Location on local FS where the data for the examples is stored.
      ## sample_data_dir=...thirdparty/sample_data
      # Location on HDFS where the oozie examples and workflows are stored.
      remote_data_dir=apps/oozie/workspaces
      # Maximum of Oozie workflows or coordinators to retrieve in one API call.
      oozie_jobs_count=100
      # Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
      ## enable_cron_scheduling=true
      enable_cron_scheduling=true
    ###########################################################################
    # Settings to configure Beeswax with Hive
    ###########################################################################
    [beeswax]
      # Host where HiveServer2 is running.
      # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
      hive_server_host=h1
      # Port where HiveServer2 Thrift server runs on.
      hive_server_port=10000
      # Hive configuration directory, where hive-site.xml is located
      hive_conf_dir=/home/search/hive/conf
      # Timeout in seconds for thrift calls to Hive service
      server_conn_timeout=120
      # Set a LIMIT clause when browsing a partitioned table.
      # A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
      browse_partitioned_table_limit=250
      # A limit to the number of rows that can be downloaded from a query.
      # A value of -1 means there will be no limit.
      # A maximum of 65,000 is applied to XLS downloads.
      download_row_limit=1000000
      # Hue will try to close the Hive query when the user leaves the editor page.
      # This will free all the query resources in HiveServer2, but also make its results inaccessible.
      ## close_queries=false
      # Thrift version to use when communicating with HiveServer2
      ## thrift_version=5
      [[ssl]]
        # SSL communication enabled for this server.
        ## enabled=false
        # Path to Certificate Authority certificates.
        ## cacerts=/etc/hue/cacerts.pem
        # Path to the private key file.
        ## key=/etc/hue/key.pem
        # Path to the public certificate file.
        ## cert=/etc/hue/cert.pem
        # Choose whether Hue should validate certificates received from the server.
        ## validate=true
    ###########################################################################
    # Settings to configure Pig
    ###########################################################################
    [pig]
      # Location of piggybank.jar on local filesystem.
      local_sample_dir=/home/search/hue/apps/pig/examples
      # Location piggybank.jar will be copied to in HDFS.
      remote_data_dir=/home/search/pig/examples
    ###########################################################################
    # Settings to configure Sqoop
    ###########################################################################
    [sqoop]
      # For autocompletion, fill out the librdbms section.
      # Sqoop server URL
      server_url=http://h1:12000/sqoop
    ###########################################################################
    # Settings to configure Proxy
    ###########################################################################
    [proxy]
      # Comma-separated list of regular expressions,
      # which match 'host:port' of requested proxy target.
      ## whitelist=(localhost|127\.0\.0\.1):(50030|50070|50060|50075)
      # Comma-separated list of regular expressions,
      # which match any prefix of 'host:port/path' of requested proxy target.
      # This does not support matching GET parameters.
      ## blacklist=
    ###########################################################################
    # Settings to configure Impala
    ###########################################################################
    [impala]
      # Host of the Impala Server (one of the Impalad)
      ## server_host=localhost
      # Port of the Impala Server
      ## server_port=21050
      # Kerberos principal
      ## impala_principal=impala/hostname.foo.com
      # Turn on/off impersonation mechanism when talking to Impala
      ## impersonation_enabled=False
      # Number of initial rows of a result set to ask Impala to cache in order
      # to support re-fetching them for downloading them.
      # Set to 0 for disabling the option and backward compatibility.
      ## querycache_rows=50000
      # Timeout in seconds for thrift calls
      ## server_conn_timeout=120
      # Hue will try to close the Impala query when the user leaves the editor page.
      # This will free all the query resources in Impala, but also make its results inaccessible.
      ## close_queries=true
      # If QUERY_TIMEOUT_S > 0, the query will be timed out (i.e. cancelled) if Impala does not do any work
      # (compute or send back results) for that query within QUERY_TIMEOUT_S seconds.
      ## query_timeout_s=600
    ###########################################################################
    # Settings to configure HBase Browser
    ###########################################################################
    [hbase]
      # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
      # Use full hostname with security.
      ## hbase_clusters=(Cluster|localhost:9090)
      # HBase configuration directory, where hbase-site.xml is located.
      ## hbase_conf_dir=/etc/hbase/conf
      # Hard limit of rows or columns per row fetched before truncating.
      ## truncate_limit = 500
      # 'buffered' is the default of the HBase Thrift Server and supports security.
      # 'framed' can be used to chunk up responses,
      # which is useful when used in conjunction with the nonblocking server in Thrift.
      ## thrift_transport=buffered
    ###########################################################################
    # Settings to configure Solr Search
    ###########################################################################
    [search]
      # URL of the Solr Server
      solr_url=http://172.21.50.41:8983/solr/
      # Requires FQDN in solr_url if enabled
      ## security_enabled=false
      ## Query sent when no term is entered
      ## empty_query=*:*
    ###########################################################################
    # Settings to configure Solr Indexer
    ###########################################################################
    [indexer]
      # Location of the solrctl binary.
      ## solrctl_path=/usr/bin/solrctl
      # Location of the solr home.
      ## solr_home=/usr/lib/solr
      # Zookeeper ensemble.
      ## solr_zk_ensemble=localhost:2181/solr
      # The contents of this directory will be copied over to the solrctl host to its temporary directory.
      ## config_template_path=/../hue/desktop/libs/indexer/src/data/solr_configs
    ###########################################################################
    # Settings to configure Job Designer
    ###########################################################################
    [jobsub]
      # Location on local FS where examples and template are stored.
      ## local_data_dir=..../data
      # Location on local FS where sample data is stored
      ## sample_data_dir=...thirdparty/sample_data
    ###########################################################################
    # Settings to configure Job Browser
    ###########################################################################
    [jobbrowser]
      # Share submitted jobs information with all users. If set to false,
      # submitted jobs are visible only to the owner and administrators.
      ## share_jobs=true
    ###########################################################################
    # Settings to configure the Zookeeper application.
    ###########################################################################
    [zookeeper]
      [[clusters]]
        [[[default]]]
          # Zookeeper ensemble. Comma separated list of Host/Port.
          # e.g. localhost:2181,localhost:2182,localhost:2183
          host_ports=zk1:2181
          # The URL of the REST contrib service (required for znode browsing)
          ## rest_url=http://localhost:9998
    ###########################################################################
    # Settings to configure the Spark application.
    ###########################################################################
    [spark]
      # URL of the REST Spark Job Server.
      server_url=http://h1:8080/
    ###########################################################################
    # Settings for the User Admin application
    ###########################################################################
    [useradmin]
      # The name of the default user group that users will be a member of
      ## default_user_group=default
    ###########################################################################
    # Settings for the Sentry lib
    ###########################################################################
    [libsentry]
      # Hostname or IP of server.
      ## hostname=localhost
      # Port the sentry service is running on.
      ## port=8038
      # Sentry configuration directory, where sentry-site.xml is located.
      ## sentry_conf_dir=/etc/sentry/conf
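
To see at a glance which values in a hue.ini like the one above were actually changed from the commented-out `##` defaults, a small script can list only the active lines. This is a minimal sketch and not Hue's own configuration loader; the function name `active_settings` and the `SAMPLE` text are assumptions made for illustration.

```python
import re

# Matches section headers such as [beeswax], [[ssl]] or [[[default]]].
SECTION_RE = re.compile(r"^(\[+)([^\[\]]+)(\]+)$")


def active_settings(ini_text):
    """Return {dotted.section.path: {key: value}} for uncommented settings."""
    path = []      # current section nesting, e.g. ["beeswax", "ssl"]
    result = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments, including "## key=value" defaults
        match = SECTION_RE.match(line)
        if match and len(match.group(1)) == len(match.group(3)):
            depth = len(match.group(1))  # [x] is depth 1, [[x]] is depth 2, ...
            path = path[:depth - 1] + [match.group(2).strip()]
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            result.setdefault(".".join(path), {})[key.strip()] = value.strip()
    return result


# Hypothetical excerpt in the same shape as the configuration above.
SAMPLE = """
[beeswax]
  hive_server_host=h1
  hive_server_port=10000
  ## close_queries=false
  [[ssl]]
    ## enabled=false
[zookeeper]
  [[clusters]]
    [[[default]]]
      host_ports=zk1:2181
"""

if __name__ == "__main__":
    for section, values in active_settings(SAMPLE).items():
        print(section, values)
```

Sections that contain only commented defaults, such as `[[ssl]]` in the sample, do not appear in the result.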

    -rw-rw-r--  1 search search  2782 5月 19 06:04 app.reg
    -rw-rw-r--  1 search search  2782 5月 19 05:41 app.reg.bak
    drwxrwxr-x 22 search search  4096 5月 20 01:05 apps
    drwxrwxr-x  3 search search  4096 5月 19 05:41 build
    drwxr-xr-x  2 search search  4096 5月 19 05:40 data
    drwxrwxr-x  7 search search  4096 5月 20 01:29 desktop
    drwxrwxr-x  2 search search  4096 5月 19 05:41 dist
    drwxrwxr-x  7 search search  4096 5月 19 05:40 docs
    drwxrwxr-x  3 search search  4096 5月 19 05:40 ext
    -rw-rw-r--  1 search search 11358 5月 19 05:38 LICENSE.txt
    drwxrwxr-x  2 search search  4096 5月 20 01:29 logs
    -rw-rw-r--  1 search search  8121 5月 19 05:41 Makefile
    -rw-rw-r--  1 search search  8505 5月 19 05:41 Makefile.sdk
    -rw-rw-r--  1 search search  3093 5月 19 05:40 Makefile.tarball
    -rw-rw-r--  1 search search  3498 5月 19 05:41 Makefile.vars
    -rw-rw-r--  1 search search  2302 5月 19 05:41 Makefile.vars.priv
    drwxrwxr-x  2 search search  4096 5月 19 05:41 maven
    -rw-rw-r--  1 search search   801 5月 19 05:40 NOTICE.txt
    -rw-rw-r--  1 search search  4733 5月 19 05:41 README.rst
    -rw-rw-r--  1 search search    52 5月 19 05:38 start.sh
    -rw-rw-r--  1 search search    65 5月 19 05:41 stop.sh
    drwxrwxr-x  9 search search  4096 5月 19 05:38 tools
    -rw-rw-r--  1 search search   932 5月 19 05:41 VERSION

6. Start Hue by running the command build/env/bin/supervisor:

    [search@h1 hue]$ build/env/bin/supervisor
    [INFO] Not running as root, skipping privilege drop
    starting server with options {'ssl_certificate': None, 'workdir': None, 'server_name': 'localhost', 'host': '0.0.0.0', 'daemonize': False, 'threads': 10, 'pidfile': None, 'ssl_private_key': None, 'server_group': 'search', 'ssl_cipher_list': 'DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2', 'port': 8000, 'server_user': 'search'}

Hue can then be opened in a browser at port 8000 on the IP of the machine where it was installed.
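
Before opening the browser, it can help to confirm that something is actually listening on that port. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is an assumption, and `h1` and `8000` are simply the host name and port from this article's setup.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call for the setup described in this article (replace with your own host):
#     is_port_open("h1", 8000)
```

A `False` result only says that the TCP connection failed; the Hue logs under the `logs` directory are the place to look for the reason.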

The toolbox screen:

The Hive screen:

When configuring Hive (version 0.13 in my case), note the following points.

Both the Hive metastore service and the HiveServer2 service must be running.

Run the following commands:

    bin/hive --service metastore
    bin/hiveserver2

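In addition, Hive's SASL authentication must be turned off; otherwise access from Hue will fail.

Pay attention to the following hive-site.xml properties: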

    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
      <description>location of default database for the warehouse</description>
    </property>
    <property>
      <name>hive.server2.thrift.port</name>
      <value>10000</value>
      <description>Port number of HiveServer2 Thrift interface.
      Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
    </property>
    <property>
      <name>hive.server2.thrift.bind.host</name>
      <value>h1</value>
      <description>Bind host on which to run the HiveServer2 Thrift interface.
      Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
    </property>
    <property>
      <name>hive.server2.authentication</name>
      <value>NOSASL</value>
      <description>
        Client authentication types.
          NONE: no authentication check
          LDAP: LDAP/AD based authentication
          KERBEROS: Kerberos/GSSAPI authentication
          CUSTOM: Custom authentication provider
                  (Use with property hive.server2.custom.authentication.class)
          PAM: Pluggable authentication module.
      </description>
    </property>
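
These Hive settings have to agree with the `[beeswax]` section of hue.ini, so a quick consistency check can save a round of debugging. The sketch below is an illustration under stated assumptions: the function names and `SAMPLE_HIVE_SITE` are hypothetical, the NOSASL check encodes this article's advice rather than a general Hive requirement, and 10000 is used as the port when the property is absent.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_hive_for_hue(props, hue_port="10000"):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if props.get("hive.server2.authentication") != "NOSASL":
        problems.append("hive.server2.authentication should be NOSASL")
    if props.get("hive.server2.thrift.port", "10000") != hue_port:
        problems.append(
            "hive.server2.thrift.port does not match hive_server_port in hue.ini"
        )
    return problems


# Hypothetical hive-site.xml excerpt using the values from this article.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    props = read_hadoop_properties(SAMPLE_HIVE_SITE)
    print(check_hive_for_hue(props) or "hive-site.xml is consistent with hue.ini")
```

In practice the XML text would be read from the file under `hive_conf_dir` (here `/home/search/hive/conf`) instead of the embedded sample.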

In addition to those hive-site.xml settings, change the value of hive.server2.long.polling.timeout from its default of 5000L to 5000; otherwise connecting with beeline fails. This is a Hive bug.
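
The article does not show the error itself. A plausible reading, and it is only an assumption, is that a strict integer parser rejects the Java-style `L` suffix. Python's `int()` rejects it in the same way, which makes the effect easy to demonstrate. The helper names below are hypothetical.

```python
import re

# A run of digits followed by a single Java-style long suffix.
LONG_LITERAL_RE = re.compile(r"^(\d+)[lL]$")


def parses_as_int(value):
    """Return True if a strict integer parser accepts the string."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def strip_long_suffix(value):
    """Turn '5000L' into '5000'; leave every other value untouched."""
    match = LONG_LITERAL_RE.match(value.strip())
    return match.group(1) if match else value


if __name__ == "__main__":
    for candidate in ("5000L", "5000"):
        print(candidate, parses_as_int(candidate), strip_long_suffix(candidate))
```

Running `strip_long_suffix` over the values of a parsed hive-site.xml is one way to spot other properties that carry the same suffix.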

The Pig screen:

The Solr screen:

One last point: Hue also needs a matching proxy user configured in Hadoop's core-site.xml, for example:

    <property>
      <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
    </property>
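
The two proxy-user properties follow a fixed naming pattern, `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups`, so they can be generated for any user name. A minimal sketch; the function name `proxyuser_properties` is an assumption, and the `*` defaults mirror the permissive values used above.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user, hosts="*", groups="*"):
    """Build the two hadoop.proxyuser.<user>.* <property> elements as XML text."""
    blocks = []
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Prints the two properties for the "hue" user, ready to paste into core-site.xml.
    print(proxyuser_properties("hue"))
```

On a shared cluster it is usually safer to pass a specific host list and group list than to keep `*`.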

At this point Hue is fully working. We can build custom app plugins to suit our own needs, which makes it very flexible.
