The influence of a firewall on Ceph

Keywords: osd firewall Ceph Python

When we deploy Ceph, we almost always disable the firewall first. What actually happens if we don't?

Some time ago I deployed a Ceph cluster remotely for a customer, on three nodes that also ran OpenShift. The deployment ran into many problems; the most painful one was that the OSDs kept dying while the health check reported an error with 0 OSDs up and 0 in. It took a while to find the cause: on nodes with OpenShift installed the firewall could not simply be turned off, so several of the ports Ceph needs were unreachable.

Now let's reproduce what happens when the firewall is left running.

1 I created three pools, each with a different replica count (a sketch of the creation commands follows the list)

rep1: 1 replica

rep2: 2 replicas

rep3: 3 replicas
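For reference, a minimal sketch of how pools like these can be created with the standard Ceph CLI; the pg count of 32 matches the pool details shown further down, and min_size may need to be set separately to match:

[root@admin ~]# ceph osd pool create rep1 32 32 replicated
[root@admin ~]# ceph osd pool create rep2 32 32 replicated
[root@admin ~]# ceph osd pool create rep3 32 32 replicated
# adjust the replica count of each pool
[root@admin ~]# ceph osd pool set rep1 size 1
[root@admin ~]# ceph osd pool set rep2 size 2
[root@admin ~]# ceph osd pool set rep3 size 3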

[root@admin ~]# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED 
    89044M     72066M       16977M         19.07 
POOLS:
    NAME      ID     USED     %USED     MAX AVAIL     OBJECTS 
    rbd       0      500M      2.17        22537M           2 
    pool0     1         0         0        22537M           1 
    rep1      5         0         0        67612M           1 
    rep2      6         0         0        33806M           1 
    rep3      7         0         0        22537M           0

[root@admin ~]# ceph osd pool get rep1 size
size: 1
[root@admin ~]# ceph osd pool get rep2 size
size: 2
[root@admin ~]# ceph osd pool get rep3 size
size: 3

# The replica counts can also be seen in the pool details.
[root@admin ~]# ceph osd pool ls detail|grep rep
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 115 flags hashpspool stripe_width 0
pool 1 'pool0' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 31 flags hashpspool stripe_width 0
pool 5 'rep1' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 233 flags hashpspool stripe_width 0
pool 6 'rep2' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 235 flags hashpspool stripe_width 0
pool 7 'rep3' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 237 flags hashpspool stripe_width 0

I have put several new objects into the rep1 and rep2 pools and picked a few representative ones to experiment with.

[root@admin ~]# ceph osd map rep1 file
osdmap e237 pool 'rep1' (5) object 'file' -> pg 5.2e6fb49a (5.1a) -> up ([6], p6) acting ([6], p6)
[root@admin ~]# ceph osd map rep2 file2
osdmap e237 pool 'rep2' (6) object 'file2' -> pg 6.be5b00c1 (6.1) -> up ([1,6], p1) acting ([1,6], p1)

As the output shows, rep1 keeps only one replica, so pg 5.1a lives only on osd.6 and that single copy is the primary. rep2 keeps two replicas, so pg 6.1 lives on osd.1 and osd.6, with the primary on osd.1. Now let's see which nodes osd.1 and osd.6 are on.

[root@admin ~]# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.08487 root default                                     
-2 0.02829     host admin                                   
 6 0.02829         osd.6       up  1.00000          1.00000 
-3 0.02829     host node1                                   
 0 0.02829         osd.0       up  1.00000          1.00000 
-4 0.02829     host node2                                   
 1 0.02829         osd.1       up  1.00000          1.00000

We can see that osd.6 is on the admin node and osd.1 is on node2.
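When the tree is larger, another quick way to locate the host an OSD lives on is ceph osd find, which prints the OSD's address and CRUSH location, for example:

[root@admin ~]# ceph osd find 6
[root@admin ~]# ceph osd find 1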

Now let's start the experiment:

1 Start the firewall on the admin node

[root@admin ~]# systemctl start firewalld
[root@admin ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2018-02-11 22:17:43 EST; 15s ago
 Main PID: 15005 (firewalld)
   CGroup: /system.slice/firewalld.service
           └─15005 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid

Feb 11 22:17:42 admin systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 11 22:17:43 admin systemd[1]: Started firewalld - dynamic firewall daemon.
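For context, with Ceph's default messenger settings the monitor listens on TCP port 6789 and each OSD binds ports in the 6800-7300 range; these are the ports that firewalld is now blocking for new inbound connections on admin. You can check what the daemons are actually listening on, for example:

[root@admin ~]# ss -tlnp | grep ceph
# or, if ss is not available:
[root@admin ~]# netstat -tlnp | grep ceph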

2 Write data to pg 5.1a from the admin node

[root@admin ~]# rados -p rep1 put file /etc/hosts
[root@admin ~]#

The command returns quickly; the data is written normally.

3 Write data to pg 5.1a from another node

[root@node2 ~]# rados -p rep1 put file /etc/ceph/ceph.conf 
^C
[root@node2 ~]# 
# After waiting about 10 minutes the write still had not completed; it simply hangs until interrupted

Why? Because we started the firewall on admin, and osd.6 lives on admin. Writing to pg 5.1a from the admin node itself works fine, but a write from any other node to pg 5.1a on osd.6 never gets through, because admin's firewall drops the incoming traffic.
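If the firewall cannot simply be turned off (as on the OpenShift nodes mentioned at the beginning), the alternative is to open Ceph's ports explicitly. A minimal sketch with firewalld, assuming the default port ranges (6789/tcp for the mon, 6800-7300/tcp for the OSDs) and the public zone:

[root@admin ~]# firewall-cmd --zone=public --permanent --add-port=6789/tcp
[root@admin ~]# firewall-cmd --zone=public --permanent --add-port=6800-7300/tcp
[root@admin ~]# firewall-cmd --reload
# newer firewalld versions also ship predefined ceph and ceph-mon services:
# firewall-cmd --zone=public --permanent --add-service=ceph --add-service=ceph-mon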

4 Write data from the admin node to pg 6.1 in rep2

[root@admin ~]# rados -p rep2 put file /etc/hosts
[root@admin ~]# 
#Normal write

5 Write data from another node to pg 6.1 in rep2

[root@node2 ~]# rados -p rep2 put www /etc/hosts
[root@node2 ~]#
# Can also write normally

Why is that? Isn't osd.6 blocked by the firewall on its node, admin? pg 6.1 also has a copy on osd.6, so why can it still be written? The reason is that although pg 6.1 (in rep2) does have a replica on osd.6, its primary is osd.1, which sits on node2, where the firewall is not running, so the client's write goes through.

Next, a word on the relationship between the primary PG and its secondary PGs. A client writes directly to the primary PG, and the data is then copied from the primary to the secondaries. That is why pg 6.1 can be written normally: the write lands on the primary on osd.1 first, and the copy on osd.6 is then replicated from the primary. Even though the firewall is up on osd.6's node, that replication traffic runs over a connection that, from osd.6's point of view, goes from inside the firewall outwards (and was most likely established before the firewall was started), so it is not affected. From this we can guess that if the primary PG were on osd.6, the write would not go through.
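We can test this guess directly: look for an object name in rep2 that maps to a PG whose primary is osd.6, and try writing it from node2. A sketch, where test6 is just a hypothetical name to illustrate the idea; keep trying names until ceph osd map reports osd.6 as the primary:

[root@node2 ~]# ceph osd map rep2 test6
# once the output shows the acting set with p6 (osd.6 as primary), try the write:
[root@node2 ~]# rados -p rep2 put test6 /etc/hosts
# expectation: this write hangs, just like the rep1 case above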

This also explains the flapping described at the beginning. The other OSDs try to contact the OSD sitting behind the firewall; with the firewall up, those connections are refused. The other OSDs conclude it is dead and report it to the mon, which marks it down. But the OSD itself is still alive and well, so it reports to the mon and gets marked up again. Then the other OSDs fail to reach it once more, and the down/up cycle repeats over and over.

So when you see an OSD that keeps bouncing between down and up, it is very likely a firewall problem.

If the reporting intervals are long, the OSD may even stay marked up the whole time while writes simply hang, which is especially confusing in production.
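To watch the flapping as it happens, ceph -w (or a repeated ceph osd tree) shows OSDs being marked down and up. How quickly that happens is governed by the heartbeat and reporting options (for example osd_heartbeat_interval, osd_heartbeat_grace and mon_osd_min_down_reporters); the values on a running OSD can be inspected like this:

[root@admin ~]# ceph -w
[root@admin ~]# ceph daemon osd.6 config get osd_heartbeat_interval
[root@admin ~]# ceph daemon osd.6 config get osd_heartbeat_grace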

Posted by spheonix on Thu, 16 May 2019 21:24:05 -0700