I. Add hive-site.xml
Add the hive-site.xml configuration file under $SPARK_HOME/conf so that Spark can access the Hive metastore.
vim hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.1.201:3306/hiveDB?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <!-- Print column names in query output -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <!-- Display the current database name -->
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
Note: Hive does not need to be deployed on this node; it is enough that the node can reach Hive's metastore.
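The properties above connect straight to the metastore database over JDBC. If a standalone Hive metastore service is already running, an alternative sketch is to point Spark at it via hive.metastore.uris instead; the host and port below are placeholders (9083 is the Hive default metastore port):

<!-- Alternative (untested here): go through a running metastore service instead of direct JDBC -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.1.201:9083</value>
</property>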
II. Start the thriftserver service
[hadoop@hadoop003 spark]$ ./sbin/start-thriftserver.sh --jars ~/softwares/mysql-connector-java-5.1.47.jar
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/hadoop/app/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop003.out
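As an aside, instead of passing the MySQL connector with --jars on every start, the jar can be copied once into Spark's jars directory (a sketch; the jar path is the one used above):

# Copy the JDBC driver onto Spark's classpath permanently, then --jars is no longer needed
cp ~/softwares/mysql-connector-java-5.1.47.jar $SPARK_HOME/jars/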
Check the log to make sure that the thriftserver service starts normally
[hadoop@hadoop003 spark]$ tail -50f /home/hadoop/app/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop003.out
19/05/21 09:39:14 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/05/21 09:39:15 INFO metastore.ObjectStore: Initialized ObjectStore
19/05/21 09:39:15 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/05/21 09:39:15 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
19/05/21 09:39:15 INFO metastore.HiveMetaStore: Added admin role in metastore
19/05/21 09:39:15 INFO metastore.HiveMetaStore: Added public role in metastore
19/05/21 09:39:15 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
19/05/21 09:39:15 INFO metastore.HiveMetaStore: 0: get_all_databases
19/05/21 09:39:15 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_all_databases
19/05/21 09:39:15 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
19/05/21 09:39:15 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_functions: db=default pat=*
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:16 INFO session.SessionState: Created local directory: /tmp/73df82dd-1fd3-4dd5-97f1-680d53bd44bc_resources
19/05/21 09:39:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc
19/05/21 09:39:16 INFO session.SessionState: Created local directory: /tmp/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc
19/05/21 09:39:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc/_tmp_space.db
19/05/21 09:39:16 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/home/hadoop/app/spark-2.4.2-bin-hadoop-2.6.0-cdh5.7.0/spark-warehouse
19/05/21 09:39:16 INFO session.SessionManager: Operation log root directory is created: /tmp/hadoop/operation_logs
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread pool size: 100
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread wait queue size: 100
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread keepalive time: 10 seconds
19/05/21 09:39:16 INFO service.AbstractService: Service:OperationManager is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:SessionManager is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service: CLIService is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:ThriftBinaryCLIService is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service: HiveServer2 is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:OperationManager is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:SessionManager is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:CLIService is started.
19/05/21 09:39:16 INFO metastore.ObjectStore: ObjectStore, initialize called
19/05/21 09:39:16 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
19/05/21 09:39:16 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/05/21 09:39:16 INFO metastore.ObjectStore: Initialized ObjectStore
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: get_databases: default
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_databases: default
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: Shutting down the object store...
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Shutting down the object store...
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Metastore shutdown complete.
19/05/21 09:39:16 INFO service.AbstractService: Service:ThriftBinaryCLIService is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:HiveServer2 is started.
19/05/21 09:39:16 INFO thriftserver.HiveThriftServer2: HiveThriftServer2 started
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a7c72af{/sqlserver,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e24bab6{/sqlserver/json,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@391e2a{/sqlserver/session,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@641cea11{/sqlserver/session/json,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads    # <- this line indicates a successful start
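The server listens on port 10000 by default. If that port is already taken, the listening port and bind host can be overridden at startup with the standard HiveServer2 properties; the values below are examples, not from the original setup:

# Example: start the thriftserver on a non-default port/host
./sbin/start-thriftserver.sh \
  --jars ~/softwares/mysql-connector-java-5.1.47.jar \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.bind.host=hadoop003

# Stop it again with the matching script
./sbin/stop-thriftserver.sh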
III. Start beeline
[hadoop@hadoop003 spark]$ ./bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop
Connecting to jdbc:hive2://localhost:10000
19/05/21 09:46:19 INFO jdbc.Utils: Supplied authorities: localhost:10000
19/05/21 09:46:19 INFO jdbc.Utils: Resolved authority: localhost:10000
19/05/21 09:46:19 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.4.2)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> select * from student.student limit 5;
+---------+-----------+-----------------+--------------------------------------------+--+
| stu_id  | stu_name  | stu_phone_num   | stu_email                                  |
+---------+-----------+-----------------+--------------------------------------------+--+
| 1       | Burke     | 1-300-746-8446  | ullamcorper.velit.in@ametnullaDonec.co.uk  |
| 2       | Kamal     | 1-668-571-5046  | pede.Suspendisse@interdumenim.edu          |
| 3       | Olga      | 1-956-311-1686  | Aenean.eget.metus@dictumcursusNunc.edu     |
| 4       | Belle     | 1-246-894-6340  | vitae.aliquet.nec@neque.co.uk              |
| 5       | Trevor    | 1-300-527-4967  | dapibus.id@acturpisegestas.net             |
+---------+-----------+-----------------+--------------------------------------------+--+
5 rows selected (3.275 seconds)
0: jdbc:hive2://localhost:10000>
Beeline connected successfully.
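Beeline can also run a statement non-interactively, which is handy for a quick smoke test. A sketch against the same server (student.student is the table queried above):

# Run one statement and exit, using the same connection parameters
./bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop -e "select count(*) from student.student;"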
IV. Notes
1. It is better to start beeline via its full path under spark/bin
If Hive is also deployed on the node and the Hive entry in PATH happens to come before the Spark entry, a bare beeline command will launch Hive's beeline instead of Spark's.
For example:
[hadoop@hadoop003 spark]$ beeline
ls: cannot access /home/hadoop/app/spark/lib/spark-assembly-*.jar: No such file or directory
which: no hbase in (/home/hadoop/app/hive/bin:/home/hadoop/app/spark/bin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0//bin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0//sbin:/home/hadoop/app/zookeeper/bin:/usr/java/jdk1.8.0_131/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin)
Beeline version 1.1.0-cdh5.7.0 by Apache Hive    # <- this is Hive's beeline, not Spark's
beeline>
Now check the environment variables:
[hadoop@hadoop003 spark]$ cat ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH

#####JAVA_HOME#####
export JAVA_HOME=/usr/java/jdk1.8.0_131

####ZOOKEEPER_HOME####
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper

#####HADOOP_HOME######
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/

export SPARK_HOME=/home/hadoop/app/spark

#####HIVE_HOME#####
export HIVE_HOME=/home/hadoop/app/hive

export PATH=$HIVE_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH
Sure enough: $HIVE_HOME/bin comes first in PATH, so an unqualified beeline command resolves to Hive's beeline rather than Spark's.
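To confirm which binary a bare command resolves to, and to fix the ordering, something like the following works (a sketch; the paths are the ones from this profile):

# List every beeline on the PATH, in resolution order
type -a beeline

# One-off fix: call Spark's beeline by its full path
$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop

# Permanent fix: move $SPARK_HOME/bin ahead of $HIVE_HOME/bin in ~/.bash_profile
export PATH=$SPARK_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH

Note that reordering PATH this way shadows Hive's own beeline, so if both are needed, calling each by its full path is safer.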