Why does the Alibaba Java Development Manual advise against using + for string concatenation inside a loop body?

Keywords: Programming Java JDK

While reading the Alibaba Java Development Manual, I came across a recommendation about string concatenation inside a loop body: use the append() method of StringBuilder rather than +.

Let's start with an example that compares the performance of concatenating strings with + and with StringBuilder inside a loop body (JDK version is jdk1.8.0_).

package com.wupx.demo;

/**
 * @author wupx
 * @date 2019/10/23
 */
public class StringConcatDemo {
    public static void main(String[] args) {
        long s1 = System.currentTimeMillis();
        new StringConcatDemo().addMethod();
        System.out.println("Use + Splicing:" + (System.currentTimeMillis() - s1));

        s1 = System.currentTimeMillis();
        new StringConcatDemo().stringBuilderMethod();
        System.out.println("Use StringBuilder Splicing:" + (System.currentTimeMillis() - s1));
    }

    public String addMethod() {
        String result = "";
        for (int i = 0; i < 100000; i++) {
            result += (i + "wupx");
        }
        return result;
    }

    public String stringBuilderMethod() {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            result.append(i).append("wupx");
        }
        return result.toString();
    }
}

The results (elapsed time in milliseconds) are as follows:

Use + Splicing:29282
Use StringBuilder Splicing:4

Why is there such a large gap between the two approaches? Let's dig deeper.

Why is StringBuilder so much faster than +?

At the bytecode level, why is StringBuilder so much faster than + for string concatenation inside a loop body?

Compile the source file with the javac StringConcatDemo.java command, then view the contents of the bytecode file with the javap -c StringConcatDemo command.

The bytecode of the addMethod() method is as follows:

  public java.lang.String addMethod();
    Code:
       0: ldc           #16                 // String
       2: astore_1
       3: iconst_0
       4: istore_2
       5: iload_2
       6: ldc           #17                 // int 100000
       8: if_icmpge     41
      11: new           #7                  // class java/lang/StringBuilder
      14: dup
      15: invokespecial #8                  // Method java/lang/StringBuilder."<init>":()V
      18: aload_1
      19: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      22: iload_2
      23: invokevirtual #18                 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      26: ldc           #19                 // String wupx
      28: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      31: invokevirtual #12                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      34: astore_1
      35: iinc          2, 1
      38: goto          5
      41: aload_1
      42: areturn

The numbers on the left are bytecode offsets. Offsets 5 to 38 form the loop: the loop condition is checked at offset 8, and if it no longer holds, execution jumps to offset 41. The compiler has rewritten the + expression: it creates a StringBuilder object at offset 11, calls append() three times at offsets 19, 23, and 28, and then calls toString() at offset 31. However, this means a new StringBuilder object is created on every iteration.

Let's look at the bytecode of stringBuilderMethod():

  public java.lang.String stringBuilderMethod();
    Code:
       0: new           #7                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #8                  // Method java/lang/StringBuilder."<init>":()V
       7: astore_1
       8: iconst_0
       9: istore_2
      10: iload_2
      11: ldc           #17                 // int 100000
      13: if_icmpge     33
      16: aload_1
      17: iload_2
      18: invokevirtual #18                 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      21: ldc           #19                 // String wupx
      23: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      26: pop
      27: iinc          2, 1
      30: goto          10
      33: aload_1
      34: invokevirtual #12                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      37: areturn

Offsets 10 to 30 form the loop. As you can see, the StringBuilder object is created once at offsets 0 to 4, before the loop starts, and the loop body only calls the append() method.

This shows that when + is used inside a for loop, every iteration creates a new StringBuilder, copies the current String into it, appends the new content, and then converts the result back to a String with toString(). Because result keeps growing, each iteration copies all the characters accumulated so far, so the total work grows roughly quadratically with the number of iterations. Creating objects this frequently costs time and wastes memory. At the bytecode level, this is why + is not recommended for string concatenation inside a loop body.

Next, let's look at how + and StringBuilder each concatenate strings.
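
The javap output for addMethod() corresponds roughly to the hand-written Java below. This is a sketch reconstructed from the bytecode listing above, not actual compiler output; the class name DesugaredConcatDemo, the method names, and the parameter n are made up for illustration.

```java
// A sketch of what javac on JDK 8 effectively generates for
// `result += (i + "wupx")` inside a loop. All names here are hypothetical.
class DesugaredConcatDemo {

    // Counterpart of addMethod(): a fresh StringBuilder on every iteration.
    static String plusLoopDesugared(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            // Copy everything accumulated so far into a new builder,
            // append the new pieces, then copy it all out again.
            result = new StringBuilder()
                    .append(result)
                    .append(i)
                    .append("wupx")
                    .toString();
        }
        return result;
    }

    // Counterpart of stringBuilderMethod(): one builder reused by all iterations.
    static String singleBuilder(int n) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < n; i++) {
            result.append(i).append("wupx");
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusLoopDesugared(3)); // 0wupx1wupx2wupx
        // Both variants build the same string; only the amount of copying differs.
        System.out.println(plusLoopDesugared(1000).equals(singleBuilder(1000))); // true
    }
}
```

Written out this way, the repeated allocation and the two full copies per iteration are visible directly in the source.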

Using + to concatenate strings

In Java development, the simplest and most common way to concatenate strings is to use + directly:

String boy = "wupx";
String girl = "huyx";
String love = boy + girl;

The decompiled code is as follows (decompiled with jad):

String boy = "wupx";
String girl = "huyx";
String love = (new StringBuilder()).append(boy).append(girl).toString();

The decompiled code shows that when String variables are concatenated with +, the compiler creates a StringBuilder and calls its append() method for each operand, then calls toString() to produce the result.

In other words, on the JDK 8 compiler used here, + on strings is implemented with StringBuilder.append(). Using + to concatenate strings is just syntactic sugar provided by Java.

Using StringBuilder to concatenate strings

The append() method of StringBuilder is the second common way to concatenate strings.

Similar to the String class, the StringBuilder class also encapsulates a character array, which is defined as follows:

char[] value;

Unlike the array in String, this one is not final, so it can be replaced and modified. Also unlike String, not every position in the character array is necessarily in use. An instance variable records how many characters of the array are used. It is defined as follows:

int count;
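
The difference between the size of the array (value.length) and the number of characters in use (count) can be observed through the public capacity() and length() methods. A minimal sketch; the class name CapacityVsLengthDemo is made up for illustration.

```java
// Shows that a StringBuilder's storage can be larger than its content.
class CapacityVsLengthDemo {
    public static void main(String[] args) {
        // The no-argument constructor is documented to start with a capacity of 16.
        StringBuilder sb = new StringBuilder();
        sb.append("wupx");

        System.out.println(sb.length());   // characters in use (count): 4
        System.out.println(sb.capacity()); // available storage: 16
    }
}
```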

The source code of its append() method is as follows:

public StringBuilder append(String str) {
    super.append(str);
    return this;
}

StringBuilder extends the AbstractStringBuilder class. Here is the append() method of the parent class:

public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull();
    int len = str.length();
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    count += len;
    return this;
}

First, it checks whether the string str to append is null. If so, it calls the appendNull() method. The source code of the appendNull() method is as follows:

private AbstractStringBuilder appendNull() {
    int c = count;
    ensureCapacityInternal(c + 4);
    final char[] value = this.value;
    value[c++] = 'n';
    value[c++] = 'u';
    value[c++] = 'l';
    value[c++] = 'l';
    count = c;
    return this;
}
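
The effect of appendNull() is easy to observe from the outside: appending a null String adds the four characters "null" instead of throwing an exception. A small sketch; the class and method names are made up for illustration.

```java
// Demonstrates what append() does with a null String argument.
class AppendNullDemo {
    static String appendNullString() {
        String str = null;
        // append(String) detects null and appends the literal text "null".
        return new StringBuilder("value=").append(str).toString();
    }

    public static void main(String[] args) {
        System.out.println(appendNullString()); // value=null
    }
}
```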

If str is not null, ensureCapacityInternal() checks whether the length after concatenation exceeds the length of the current character array. If it does, Arrays.copyOf() is called to allocate a larger array and copy the existing contents into it. The source code of the ensureCapacityInternal() method is as follows:

private void ensureCapacityInternal(int minimumCapacity) {
    if (minimumCapacity - value.length > 0) {
        value = Arrays.copyOf(value,
                newCapacity(minimumCapacity));
    }
}
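
The expansion can be observed with capacity(), and avoided by passing an initial capacity to the constructor when the final size is roughly known. A sketch under that assumption; the class name BuilderGrowthDemo and the helper capacityAfterAppending are made up for illustration, and the exact capacity after growth depends on the JDK's growth policy, so it is not spelled out here.

```java
// Compares a builder that must grow with one that was presized.
class BuilderGrowthDemo {
    // Appends `chars` characters to a builder created with `initialCapacity`
    // and reports the capacity afterwards.
    static int capacityAfterAppending(int initialCapacity, int chars) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        // 17 characters do not fit into 16 slots, so the array is
        // reallocated and copied once; the capacity ends up above 16.
        System.out.println(capacityAfterAppending(16, 17) > 16); // true

        // With enough room up front, no reallocation takes place.
        System.out.println(capacityAfterAppending(64, 17)); // 64
    }
}
```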

Finally, the characters of str are copied into the target array value, starting at index count:

str.getChars(0, len, value, count);
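
The same copy step can be tried on a plain array. In this sketch the local variables value and count only imitate the fields of AbstractStringBuilder; the class name GetCharsDemo and the helper appendTo are made up for illustration.

```java
// Imitates the final copy step of append() with a plain char array.
class GetCharsDemo {
    // Copies str into value starting at index count and returns the new count.
    static int appendTo(char[] value, int count, String str) {
        int len = str.length();
        // getChars(srcBegin, srcEnd, dst, dstBegin)
        str.getChars(0, len, value, count);
        return count + len;
    }

    public static void main(String[] args) {
        char[] value = new char[8]; // room for 8 characters, none used yet
        int count = 0;

        count = appendTo(value, count, "ab");
        count = appendTo(value, count, "wupx");

        // Only the first `count` characters are part of the content.
        System.out.println(new String(value, 0, count)); // abwupx
    }
}
```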

Summary

Starting from the recommendation on string concatenation inside a loop body in the Alibaba Java Development Manual, this article explained at the bytecode level why StringBuilder is faster than +, and described how + and StringBuilder each work. The conclusion: when concatenating strings inside a loop body, use the append() method of StringBuilder.

Posted by stueee on Wed, 23 Oct 2019 10:12:04 -0700