Record a production on-line problem

Keywords: Java jvm Linux Oracle

Yesterday the project came online, and after verification, I took a deep breath and went online perfectly without any problems.However, half an hour later, I received an alert from the Operations and Maintenance Team that several ports of a server were not detected.At that time, I heard that I was in a tight heart. No, we have observed the system logs after the server. All transactions are normal. If the system starts abnormally, the logs will not be printed.So I rushed to log on to the server between operations and detect the port.

       netstat -anpl | grep 1099

       netstat -anpl | grep 8900

The output shows that the port is indeed not started.But the system's ports for external services are all fine.So we found the system boot log and located the following exception:

       

So I searched the web for articles about Failed to write core dump, many of which were not consistent with the problem we encountered this time.

At the same time, we found an additional log file hs_err_pid13893.log in the startup script directory, which is the thread stack information of the JVM that had an exception at startup, as described below:

       

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007fa0a94e14e2, pid=13893, tid=0x00007f9e8fefe700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 1.8.0_172-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libzip.so+0x124e2]  newEntry+0x62
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007f9ed4111000):  JavaThread "pool-FramePools-thread2" [_thread_in_native, id=14081, stack(0x00007f9e8fdfe000,0x00007f9e8feff000)]

siginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007f9efc0cb0fa

Registers:
RAX=0x00007f9ee830cc60, RBX=0x00007f9f6e0e1e00, RCX=0x0000000000000000, RDX=0x00007f9ee830cc60
RSP=0x00007f9e8fefb4b0, RBP=0x00007f9e8fefb500, RSI=0x0000000000000000, RDI=0x00007f9ee8000020
R8 =0x00007f9ee8000088, R9 =0x00007f9e8fefb580, R10=0x0000000000000048, R11=0x00000000c6bfe5f0
R12=0x00007f9efc0cb0dd, R13=0x00007f9ee830cc60, R14=0x000000001cb26b69, R15=0x00007f9f6e0de110
RIP=0x00007fa0a94e14e2, EFLAGS=0x0000000000010206, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f9e8fefb4b0)
0x00007f9e8fefb4b0:   00000000c6b13fe0 0000000100000000
0x00007f9e8fefb4c0:   00007f9f6e0de1e0 00007fa0b4c1a02e
0x00007f9e8fefb4d0:   00007f9ee830cc60 00007f9f6e0e1e00
0x00007f9e8fefb4e0:   00007f9f6e0e10b0 00007f9f6e0de110
0x00007f9e8fefb4f0:   000000001cb26b69 000000000000004f
0x00007f9e8fefb500:   00007f9e8fefb560 00007fa0a94e1c3f
0x00007f9e8fefb510:   00007f9ed4111000 0100000000000000
0x00007f9e8fefb520:   00007f9e8fefb580 00007f9e8fefb5d0
0x00007f9e8fefb530:   00007f9e8fefb5ce 00007f9e8fefb580
0x00007f9e8fefb540:   000000000000004f 00007f9ed41111f8
0x00007f9e8fefb550:   00007f9e8fefb9d8 0000000000000001
0x00007f9e8fefb560:   00007f9e8fefb9c0 00007fa0a94d2e90
0x00007f9e8fefb570:   00007fa09a4cf7a7 00007f9f6e0de110
0x00007f9e8fefb580:   672f63642f6d6f63 636e616e7265766f
0x00007f9e8fefb590:   61646174656d2f65 736c706d692f6174
0x00007f9e8fefb5a0:   656d69746e75722f 5f72656b6361502f
0x00007f9e8fefb5b0:   6465525f736e6f63 56626d4362625a6d
0x00007f9e8fefb5c0:   4f534a5f6f745f31 007373616c632e4e
0x00007f9e8fefb5d0:   00007f9e8fefb650 0000000000800000
0x00007f9e8fefb5e0:   00007f9ee8000078 00007f9ee8000020
0x00007f9e8fefb5f0:   00007f9ee8000020 0000000000000000
0x00007f9e8fefb600:   0000000000008010 00000038ad67a567
0x00007f9e8fefb610:   00007f9ed4111000 0000000000008000
0x00007f9e8fefb620:   00007f9e8fefb690 00007f9e8fefb868
0x00007f9e8fefb630:   000000768fefb690 0000000000000200
0x00007f9e8fefb640:   0000000000000040 0000000000000000
0x00007f9e8fefb650:   0000000000000001 0000000000000008
0x00007f9e8fefb660:   0000000000008030 0000023000000801
0x00007f9e8fefb670:   0000007c0000009b 0000007600000078
0x00007f9e8fefb680:   00007fa09a515f20 00007f9ee8000020
0x00007f9e8fefb690:   0000000000008000 0000000000000000
0x00007f9e8fefb6a0:   0000000000000019 00000000c6bfe409 

Instructions: (pc=0x00007fa0a94e14e2)
0x00007fa0a94e14c2:   00 48 c7 40 28 00 00 00 00 41 80 7f 30 00 0f 84
0x00007fa0a94e14d2:   8a 02 00 00 4c 8b 63 08 4d 2b 67 28 4d 03 67 18
0x00007fa0a94e14e2:   41 0f b6 5c 24 1d 41 0f b6 44 24 1c c1 e3 08 09
0x00007fa0a94e14f2:   c3 41 0f b6 44 24 1e 88 45 bd 41 0f b6 54 24 20 

Register to memory mapping:

RAX=0x00007f9ee830cc60 is an unknown value
RBX=0x00007f9f6e0e1e00 is an unknown value
RCX=0x0000000000000000 is an unknown value
RDX=0x00007f9ee830cc60 is an unknown value
RSP=0x00007f9e8fefb4b0 is pointing into the stack for thread: 0x00007f9ed4111000
RBP=0x00007f9e8fefb500 is pointing into the stack for thread: 0x00007f9ed4111000
RSI=0x0000000000000000 is an unknown value
RDI=0x00007f9ee8000020 is an unknown value
R8 =0x00007f9ee8000088 is an unknown value
R9 =0x00007f9e8fefb580 is pointing into the stack for thread: 0x00007f9ed4111000
R10=0x0000000000000048 is an unknown value
R11=

Key points are as follows:

     

The general meaning is that the java virtual machine had an original exception at startup, and the framework that caused the problem is C [libzip.so+0x124e2] newEntry+0x62.Then I searched the web for C [libzip.so+0x124e2] newEntry+0x62, and finally I found this article: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8214788 This is an exception to JVM. When java operates on zip files, it calls the libzip.so Library of the linux system. If a zip file operation is done while the JVM is running and the zip file is modified, there is a risk that the JVM will crash.Searching in oracle's java bug database found many similar problems:

     

Of course, Oracle officially offers a solution:

1. Add in the JVM startup parameter: -Dsun.zip.disableMemoryMapping=true, but this scheme has a risk that it will cause a huge performance risk for the JVM when it reads jar files on disk, provided you read them over and over again.However, when I searched the Internet for similar questions, I found that some netizens added this startup parameter but encountered other problems.

2. Upgrade to JDK9.The problem is perfectly solved in this version.

Summary: Our system loads zip files at startup. Perhaps our operators restarted the next server immediately before other servers finished startup. By chance, both servers were operating on the same zip file, which resulted in JVM not starting completely.

     

   

Posted by Pavel_Nedved on Thu, 15 Aug 2019 20:27:06 -0700