Hue + Oozie: stepping on the pits again, until workflow and coordinator finally run on schedule

Keywords: Hive, JDBC, Hadoop, HBase

Earlier posts summarized some of the pitfalls of Sqoop1, Oozie and HBase under Hue. With the project deadline hitting today, the Oozie workflow and its scheduled execution finally had to be made to run on time.

1. The pitfalls of Sqoop MySQL import/export were covered earlier. It later turned out that CDH (5.15) does not configure Sqoop1 automatically; after configuring it by hand and installing the Oozie sharelib, a few missing jars (sqoop, hbase, mysql, oozie, etc.) still had to be copied in before jobs would basically run in Hue (Hue can drive Oozie, but the XML it generates from Python has escaping bugs, e.g. values cannot be wrapped in quotes). The first run then failed because the MySQL driver could not be found, and adding the driver to Oozie's lib, libext and libtools directories and the various Sqoop lib directories still did not help, so the proxy-user settings in core-site.xml on HDFS were changed:

<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.oozie.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.oozie.groups</name><value>*</value></property>
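
Besides copying jars around, a common way to make the JDBC driver visible to Oozie's Sqoop action is to put it into the Oozie sharelib on HDFS and refresh the sharelib. A minimal sketch, assuming the usual CDH connector location and this cluster's Oozie URL (substitute the actual timestamped lib_ directory):

# upload the MySQL connector into the sharelib's sqoop directory (paths are assumptions)
hdfs dfs -put /usr/share/java/mysql-connector-java.jar \
    /user/oozie/share/lib/lib_20180927/sqoop/
# ask the running Oozie server to reload the sharelib without a restart
oozie admin -oozie http://master:11000/oozie -sharelibupdate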

Manual configuration of sqoop1:

Sqoop 1 Client Service Environment Advanced Configuration Snippet (Safety Valve):

SQOOP_CONF_DIR=/etc/sqoop/conf
HADOOP_COMMON_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hbase
HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
ZOOCFGDIR=/opt/cloudera/parcels/CDH/lib/zookeeper

Sqoop 1 Client Client Advanced Configuration Snippet (Safety Valve) for sqoop-conf/sqoop-env.sh:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/cloudera/parcels/CDH/lib/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce

#set the path to where bin/hbase is available
export HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/opt/cloudera/parcels/CDH/lib/zookeeper

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/SQOOP_TERADATA_CONNECTOR-1.7c5/lib/tdgssconfig.jar:/opt/cloudera/parcels/SQOOP_TERADATA_CONNECTOR-1.7c5/lib/terajdbc4.jar

export SQOOP_CONF_DIR=/etc/sqoop/conf
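
To check that the gateway picks up this configuration, something like the following can be run (the connection string and user match the examples later in this post; treat them as placeholders):

# prints the Sqoop build; fails early if HADOOP_COMMON_HOME etc. are not resolved
sqoop version
# a quick end-to-end test of the MySQL JDBC driver; -P prompts for the password
sqoop list-databases --connect jdbc:mysql://master:3306/ --username bigdata -P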

2. Because rows in the relational database get modified, the data is imported into HBase so that incremental synchronization after updates does not leave duplicates and consistency is preserved. Then there are type conversions to handle, such as int, float(11,2) and datetime to the corresponding Java types. int is the tricky one: tinyint columns in some relational databases, if not mapped explicitly, land in HBase as true/false. In the Hive warehouse that still looks fine, but Impala complains.
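
A common workaround is to tell the MySQL JDBC driver not to treat tinyint(1) as a boolean and, if needed, to map the column explicitly. A sketch only; the database, table, column family and column names below are made up for illustration:

# tinyInt1isBit=false stops the driver from converting tinyint(1) to true/false
sqoop import \
  --connect "jdbc:mysql://master:3306/mydb?tinyInt1isBit=false" \
  --username bigdata -P \
  --table orders \
  --hbase-table orders --column-family cf --hbase-row-key id \
  --map-column-java status=Integer \
  -m 1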

3. With the data in place, Hive external tables are built over HBase to bring it into the warehouse, and then the calculations start. Because the computation is incremental, the result tables have to be dropped, recreated and exported to MySQL at regular intervals. The export has its own pits, namely the --columns option and the null-conversion parameters (plus a custom class if the type problems are to be solved thoroughly). Another pit is that the export tables live on HDFS and each user has different permissions. Hue only offers a share function for scripts, and a shared script cannot actually be executed by someone else; it reports HDFS permission errors. After adding all the group members as proxy users in the same core-site.xml as above, the errors stopped, but problems remained: for example, when whoever created a Hive table drops it, the directory under HDFS is not deleted, which breaks recreating and reloading it. The conclusion is that Hue is still for running one's own programs and scripts; as in the online write-ups, if everything runs under a single account there is no problem, but real collaboration is limited to sharing and viewing scripts. Users cannot simply call each other's scripts, because with different privileges some temporary directories get created that the proxy users cannot operate on.
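
For reference, an export with --columns and the null-conversion parameters looks roughly like this; the target database, table, columns and HDFS path are placeholders:

# --input-null-string / --input-null-non-string map Hive's \N back to SQL NULL;
# \001 is Hive's default field delimiter
sqoop export \
  --connect "jdbc:mysql://master:3306/report" \
  --username bigdata -P \
  --table daily_summary \
  --columns "id,amount,updated_at" \
  --export-dir /user/hive/warehouse/daily_summary \
  --input-fields-terminated-by '\001' \
  --input-null-string '\\N' \
  --input-null-non-string '\\N' \
  -m 1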

4. With the import and export scripts and the SQL ready, they need to be strung together and invoked on a schedule, but Oozie, like Sqoop1 before it, would at first only run things from the command line. Later tests got a workflow running, yet configuring it through the workspace did not work and the configuration kept being restored. That is how I learned that many settings in Hue, unlike in an open-source installation, cannot be changed by editing the configuration files directly: Hue may not use them and keeps its own directories, and some key parameters have default values, for example oozie.wf.application.path and oozie.coord.application.path. The coordinator schedule generated from the menu of the workflow design page could not be submitted at all and just showed "undefined"! After ruling out all the problems above, I ran the oozie-examples from scratch; they still worked on the command line, but the copied examples would not run in Hue. I tried every approach and read through write-ups on Oozie's principles and operation several times; two worth mentioning: http://shiyanjun.cn/archives/684.html and

https://www.cnblogs.com/en-heng/p/5581331.html. Only then did I understand that workflow and coordinator are peers rather than one containing the other. I built a new coordinator from the Hue menu, selected the workflow in it, and it finally ran on time; the coordinator created from inside the workflow designer is the real trap. Many posts online cannot run in Hue at all: they just write the workflow files on the command line from within Hue, but in Hue 4.1 (CDH 5.15.1) that does not run, either the parameters are duplicated or the configuration is restored after submission.
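
For comparison, running a workflow from the command line is simply the following (the Oozie server URL is an assumption for this cluster):

# submit and start the workflow described by job.properties
oozie job -oozie http://master:11000/oozie -config job.properties -run
# check progress with the job id that the previous command prints
oozie job -oozie http://master:11000/oozie -info <job-id>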

There are other pits. Oozie works in UTC almost everywhere, so, following the official documentation, the Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml was changed:

<property><name>oozie.processing.timezone</name><value>GMT+0800</value></property>

The coordinator then uses Shanghai time:

<coordinator-app name="MY_APP" frequency="${coord:minutes(2)}" start="${start}" end="${end}" timezone="Asia/Shanghai" xmlns="uri:oozie:coordinator:0.2">
   <action>
      <workflow>
         <app-path>${workflowAppUri}</app-path>
      </workflow>
   </action>
</coordinator-app>

On submission it then complains that the start/end dates must include the +0800 suffix:

Error: E1003 : E1003: Invalid coordinator application attributes, parameter [start] = [2018-09-19T16:35] must be Date in GMT+08:00 format (yyyy-MM-dd'T'HH:mm+0800). Parsing error java.text.ParseException: Could not parse [2018-09-19T16:35] using [yyyy-MM-dd'T'HH:mm+0800] mask
So job.properties becomes:

oozie.use.system.libpath=true
security_enabled=False
dryrun=False
send_email=False
jobTracker=master:8032
start=2018-09-27T16:35+0800
nameNode=hdfs://master:8020
end=2018-09-27T18:35+0800
workflowAppUri=${nameNode}/user/hue/oozie/apps/sqoop
oozie.coord.application.path=${nameNode}/user/hue/oozie/apps/sqoop
(For a workflow that Hue generates itself, these last two lines do not need to be written at all.)

And the workflow.xml:

<workflow-app name="My Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="sqoop-ace0"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-ace0">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command> list-databases --connect jdbc:mysql://master:3306/ --username bigdata --password xxxxx </command>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

The workflow and job.properties created in Hue:

<workflow-app name="Workflow-1" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-d593"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-d593" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-d593.sql</script>
        </hive2>
        <ok to="hive-9e6a"/>
        <error to="Kill"/>
    </action>
    <action name="hive-9e6a" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-9e6a.sql</script>
        </hive2>
        <ok to="hive-016c"/>
        <error to="Kill"/>
    </action>
    <action name="hive-016c" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-016c.sql</script>
        </hive2>
        <ok to="hive-02ec"/>
        <error to="Kill"/>
    </action>
    <action name="hive-3c77" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-3c77.sql</script>
        </hive2>
        <ok to="hive-6ffa"/>
        <error to="Kill"/>
    </action>
    <action name="hive-02ec" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-02ec.sql</script>
        </hive2>
        <ok to="hive-3c77"/>
        <error to="Kill"/>
    </action>
    <action name="hive-6ffa" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-6ffa.sql</script>
        </hive2>
        <ok to="hive-bbd4"/>
        <error to="Kill"/>
    </action>
    <action name="hive-bbd4" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-bbd4.sql</script>
        </hive2>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
start_date=2018-09-27T17:28
end_date=2018-09-27T18:28

And the coordinator that Hue generates automatically:

<coordinator-app name="Schedule-1"
  frequency="0,33,40 * * * *"
  start="${start_date}" end="${end_date}" timezone="Asia/Shanghai"
  xmlns="uri:oozie:coordinator:0.2"
  >
  <controls>
    <execution>FIFO</execution>
  </controls>
  <action>
    <workflow>
      <app-path>${wf_application_path}</app-path>
      <configuration>
        <property>
            <name>oozie.use.system.libpath</name>
            <value>True</value>
        </property>
        <property>
            <name>start_date</name>
            <value>${start_date}</value>
        </property>
        <property>
            <name>end_date</name>
            <value>${end_date}</value>
        </property>
      </configuration>
   </workflow>
  </action>
</coordinator-app>

The coordinator's submitted configuration:

oozie.use.system.libpath True
security_enabled False
oozie.coord.application.path hdfs://master:8020/user/hue/oozie/deployments/_hue_-oozie-3509-1538040624.53
dryrun False
end_date 2018-09-27T18:28+0800
jobTracker master:8032
mapreduce.job.user.name hue
user.name hue
hue-id-c 3509
nameNode hdfs://master:8020
wf_application_path hdfs://master:8020/user/hue/oozie/workspaces/hue-oozie-1537954075.59
start_date 2018-09-27T17:28+0800
