Integrating Spark with Ranger via Submarine

Preface

This article describes how to integrate Spark with Ranger through Submarine, so that Spark SQL permissions can be controlled by Ranger.

Premise

Spark, Hive, Kerberos, and Ranger are already installed, and Hive is already integrated with Ranger. The environment in this article is managed by Ambari.

Building the Submarine Spark Security plugin

Official documentation (version 0.6.0): https://submarine.apache.org/docs/userDocs/submarine-security/spark-security/README

git clone https://github.com/apache/submarine

The submarine-spark-security module does not exist in the current master branch (0.7.0), so you need to switch to the tag release-0.6.0-RC0.
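For example, checking the tag out into a local branch:

cd submarine
git checkout -b release-0.6.0 release-0.6.0-RC0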

Then package the module with the following mvn command:

mvn clean package -Dmaven.javadoc.skip=true -DskipTests -pl :submarine-spark-security -Pspark-2.4 -Pranger-1.2

This produces submarine-spark-security-0.6.0.jar in the directory submarine/submarine-security/spark-security/target

Then upload submarine-spark-security-0.6.0.jar to $SPARK_HOME/jars
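For example, from the repository root (assuming $SPARK_HOME points at your Spark installation):

cp submarine-security/spark-security/target/submarine-spark-security-0.6.0.jar $SPARK_HOME/jars/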

Adding a sparkServer service in the Ranger Admin UI

Address: http://yourIp:16080/

Under the existing Hive service type, add a new service; the name is up to you (sparkServer here).

The following three keys must all be set to spark. If they are left empty, there may be problems; whether they are strictly required and what exactly they mean has not been investigated here.

tag.download.auth.users
policy.download.auth.users
policy.grantrevoke.auth.users

jdbc.url is the same as in the Hive service configuration. Then test the connection; once it succeeds, the new service appears in the Ranger Admin UI.
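As an alternative to the UI, the same service can be created through Ranger's public REST API. This is only an illustrative sketch, not from the original article: the admin credentials, password, and jdbc.url are placeholders, and the endpoint and field names should be checked against your Ranger version.

# hypothetical values; replace credentials and jdbc.url with those of your cluster
curl -u admin:REPLACE_ME -H 'Content-Type: application/json' \
  -X POST 'http://yourIp:16080/service/public/v2/api/service' \
  -d '{
        "name": "sparkServer",
        "type": "hive",
        "configs": {
          "username": "hive",
          "password": "REPLACE_ME",
          "jdbc.driverClassName": "org.apache.hive.jdbc.HiveDriver",
          "jdbc.url": "jdbc:hive2://yourIp:10000",
          "tag.download.auth.users": "spark",
          "policy.download.auth.users": "spark",
          "policy.grantrevoke.auth.users": "spark"
        }
      }'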

To test the effect, first delete the other policies and keep only the default "all - database, table, column" policy, which already contains the spark user.

Configure Spark

Following the official documentation, create the following two configuration files in $SPARK_HOME/conf

ranger-spark-security.xml

<configuration>

    <!-- URL of the Ranger Admin REST endpoint that policies are pulled from -->
    <property>
        <name>ranger.plugin.spark.policy.rest.url</name>
        <value>http://yourIp:16080</value>
    </property>

    <!-- Must match the service name created in the Ranger Admin UI -->
    <property>
        <name>ranger.plugin.spark.service.name</name>
        <value>sparkServer</value>
    </property>

    <!-- Local directory where downloaded policies are cached -->
    <property>
        <name>ranger.plugin.spark.policy.cache.dir</name>
        <value>/etc/ranger/sparkServer/policycache</value>
    </property>

    <!-- How often, in milliseconds, to poll Ranger Admin for policy changes -->
    <property>
        <name>ranger.plugin.spark.policy.pollIntervalMs</name>
        <value>30000</value>
    </property>

    <!-- Client implementation used to talk to Ranger Admin -->
    <property>
        <name>ranger.plugin.spark.policy.source.impl</name>
        <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
    </property>

</configuration>

ranger-spark-audit.xml

<configuration>

    <!-- Master switch for audit logging -->
    <property>
        <name>xasecure.audit.is.enabled</name>
        <value>true</value>
    </property>

    <!-- Direct-to-database audit destination is disabled here; the JDBC
         settings below are carried over from the Hive plugin configuration -->
    <property>
        <name>xasecure.audit.destination.db</name>
        <value>false</value>
    </property>

    <property>
        <name>xasecure.audit.destination.db.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>xasecure.audit.destination.db.jdbc.url</name>
        <value>jdbc:mysql://yourIp:3306/ranger</value>
    </property>

    <property>
        <name>xasecure.audit.destination.db.password</name>
        <value>ranger</value>
    </property>

    <property>
        <name>xasecure.audit.destination.db.user</name>
        <value>ranger</value>
    </property>

    <!-- Kerberos identity used by the audit JAAS client -->
    <property>
        <name>xasecure.audit.jaas.Client.option.keyTab</name>
        <value>/etc/security/keytabs/hive.service.keytab</value>
    </property>

    <property>
        <name>xasecure.audit.jaas.Client.option.principal</name>
        <value>hive/_HOST@INDATA.COM</value>
    </property>

</configuration>

The specific IP, port, user name, and password can be found in the existing Hive Ranger plugin configuration files, or by searching in the Ambari UI.
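To confirm that the keytab referenced above actually contains the configured principal, you can list its entries with the standard Kerberos client tools:

klist -kt /etc/security/keytabs/hive.service.keytab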

Starting spark-sql / Spark Server

spark-sql

spark-sql --master yarn --deploy-mode client \
  --conf 'spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkSQLExtension' \
  --principal spark/yourIp@INDATA.COM \
  --keytab /etc/security/keytabs/spark.service.keytab

Spark Server (Thrift Server)

spark-submit --master yarn --deploy-mode client \
  --executor-memory 2G --num-executors 3 --executor-cores 2 \
  --driver-memory 4G --driver-cores 2 \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  --name Thrift-11111 \
  --conf 'spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkSQLExtension' \
  --principal spark/yourIp@INDATA.COM \
  --keytab /etc/security/keytabs/spark.service.keytab \
  --hiveconf hive.server2.thrift.http.port=11111

Note that you need to add the Submarine extension and authenticate to Kerberos as the spark user, because Ranger controls permissions based on the Kerberos-authenticated user.

spark-sql is used here as the demonstration. After starting spark-sql, execute show tables; this reads the sparkServer policies from Ranger according to the configuration file. If they are read and parsed successfully, a corresponding JSON file is generated in the configured cache directory, for example:

ll /etc/ranger/sparkServer/policycache/
total 28
-rw-r--r-- 1 root root 27223 Dec  2 09:52 sparkSql_sparkServer.json

At this point, the spark user has permission to read table information.
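For example, inside the spark-sql shell (test_table is a hypothetical table name used only for illustration):

spark-sql> show tables;                        -- succeeds while spark is in the policy
spark-sql> select * from test_table limit 10;  -- also allowed by the "all" policy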

Then remove the spark user from the sparkServer policy in Ranger. After a short wait (the plugin polls Ranger every 30 seconds per ranger.plugin.spark.policy.pollIntervalMs above), spark-sql refreshes the Ranger policies; running show tables again now fails because the spark user no longer has permission to read table information.

This verifies that Spark SQL permissions can be controlled through Ranger. Next, the spark user can be added back to confirm that access is restored, and the policies can be refined further, for example which tables can be viewed, or which operations (select, update, and so on) are permitted. For formal use, of course, Spark Server should be used rather than the spark-sql shell.
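For the Spark Server path, a client would connect with beeline. A minimal sketch, assuming the Thrift Server started above runs in HTTP transport mode on port 11111 (transportMode and httpPath must match your hive-site settings; the principal is the server's Kerberos principal):

beeline -u 'jdbc:hive2://yourIp:11111/default;transportMode=http;httpPath=cliservice;principal=spark/yourIp@INDATA.COM'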

Problems

Other Kerberos users

When authenticating as other Kerberos users, permissions can still be controlled, but exceptions are thrown. The Spark Server throws no exceptions, but it cannot control the permissions of other users either.

Combining multiple spark.sql.extensions

Because Submarine is enabled through spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkSQLExtension, enabling the Hudi extension at the same time conflicts with it, and creating Hudi tables from spark-sql fails: on Spark 2.4, spark.sql.extensions accepts only a single extension class, so the Submarine and Hudi extensions cannot both be active.
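For reference, Spark 3.0 and later accept a comma-separated list in spark.sql.extensions, so the two extensions could in principle be combined there; this sketch has not been verified against the 0.6.0 plugin:

--conf 'spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkSQLExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension'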
