A recent project ran normally after going live. Some time later, administrators reported that exporting users to Excel failed with an error and that the user list page on the front end would not display. The cause turned out to be emoji in users' WeChat (Weixin) nicknames.
About emoji
The WeChat interface delivers emoji as raw UTF-8 byte sequences without decoding them, so an emoji sent by a WeChat user shows up on the receiving side as a square box or an undisplayable character and has to be transcoded.
In fact, every emoji has a corresponding Unicode code point. When parsing emoji in the text that users send to an official account, we can match or store the emoji by their Unicode code points. Likewise, when sending a text message that contains emoji to a user, we can convert the emoji to their binary form from the Unicode code points before sending.
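To make the code point idea concrete, here is a minimal sketch in C# that lists the Unicode code points in a string. Most emoji lie above U+FFFF, so in a .NET string they occupy two char values (a surrogate pair) that have to be combined. The class and method names (CodePointDemo, GetCodePoints) are my own for illustration and are not part of any WeChat API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class CodePointDemo
{
    // Returns the Unicode code points in a string, treating each
    // valid surrogate pair as a single code point.
    public static List<int> GetCodePoints(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(text[i])
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]);
            if (isPair)
            {
                // Combine the two UTF-16 code units into one code point.
                result.Add(char.ConvertToUtf32(text[i], text[i + 1]));
                i++; // skip the low surrogate
            }
            else
            {
                // A BMP character (or a lone surrogate) is stored as-is.
                result.Add(text[i]);
            }
        }
        return result;
    }
}
```

For example, GetCodePoints("A\uD83D\uDE00") yields two entries, 0x41 and 0x1F600, even though the string has a Length of 3.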
I searched online, but everything I found was written in PHP or Java, and testing those did not solve the problem. What a pit. I kept looking, then adapted the code and asked a friend for help, and that solved it.
I took the simple, crude approach of filtering out the emoji characters directly, and so far no errors have turned up.
using System;
using System.Text;

// The class name EmojiFilter is only illustrative; the methods can live in any utility class.
public static class EmojiFilter
{
    #region Remove emoji
    /// <summary>
    /// Returns true if the UTF-16 code unit should be treated as an emoji or other special symbol.
    /// </summary>
    /// <param name="codePoint">A single char (UTF-16 code unit)</param>
    /// <returns></returns>
    public static bool isEmojiCharacter(char codePoint)
    {
        // A char never exceeds 0xFFFF, so characters at U+10000 and above arrive as
        // surrogate pairs and are caught by the 0xD800-0xDFFF range below.
        return (codePoint >= 0x2600 && codePoint <= 0x27BF) // Miscellaneous Symbols and Dingbats
            || codePoint == 0x303D // Part alternation mark
            || codePoint == 0x2049 // Exclamation question mark
            || codePoint == 0x203C // Double exclamation mark
            || (codePoint >= 0x2000 && codePoint <= 0x200F) // Spaces and invisible format characters
            || (codePoint >= 0x2028 && codePoint <= 0x202F) // Line/paragraph separators, directional formatting
            || codePoint == 0x205F // Medium mathematical space
            || (codePoint >= 0x2065 && codePoint <= 0x206F) // Invisible format characters
            /* End of the punctuation area */
            || (codePoint >= 0x2100 && codePoint <= 0x214F) // Letterlike Symbols
            || (codePoint >= 0x2300 && codePoint <= 0x23FF) // Miscellaneous Technical
            || (codePoint >= 0x2B00 && codePoint <= 0x2BFF) // Miscellaneous Symbols and Arrows
            || (codePoint >= 0x2900 && codePoint <= 0x297F) // Supplemental Arrows-B
            || (codePoint >= 0x3200 && codePoint <= 0x32FF) // Enclosed CJK Letters and Months
            || (codePoint >= 0xD800 && codePoint <= 0xDFFF) // High and low surrogates
            || (codePoint >= 0xE000 && codePoint <= 0xF8FF) // Private Use Area
            || (codePoint >= 0xFE00 && codePoint <= 0xFE0F); // Variation selectors
    }

    /// <summary>
    /// Checks whether the string contains any emoji character.
    /// </summary>
    /// <param name="source"></param>
    /// <returns></returns>
    public static bool containsEmoji(String source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return false;
        }

        foreach (char codePoint in source)
        {
            if (isEmojiCharacter(codePoint))
            {
                return true; // Found at least one emoji character
            }
        }

        return false;
    }

    /// <summary>
    /// Filters out emoji and other non-text characters.
    /// </summary>
    /// <param name="source">The string to filter</param>
    /// <returns></returns>
    public static String filterEmoji(String source)
    {
        if (string.IsNullOrWhiteSpace(source))
        {
            return "";
        }

        // The earlier Replace("[^\\u0000-\\uFFFF]", "") was dropped: string.Replace matches
        // literal text, not a regex, so it did nothing. Surrogates are handled in isEmojiCharacter.
        // Remove "??", presumably left behind by emoji that were already mangled in conversion.
        source = source.Replace("??", "");

        if (!containsEmoji(source))
        {
            return source; // Nothing to filter, so avoid building a new string
        }

        // At least one character must be removed, so copy over only the characters we keep.
        // If every character is an emoji this returns an empty string.
        StringBuilder buf = new StringBuilder(source.Length);
        foreach (char codePoint in source)
        {
            if (!isEmojiCharacter(codePoint))
            {
                buf.Append(codePoint);
            }
        }

        return buf.ToString();
    }
    #endregion
}
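For comparison, the surrogate part of the filter can also be written with a regular expression. In C#, string.Replace treats its argument as literal text, so a pattern has to go through Regex.Replace instead. This is only a sketch: the names EmojiRegexSketch and StripNonBmp are made up for illustration, and it removes only characters above U+FFFF, so symbols inside the Basic Multilingual Plane (such as U+2600) are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class EmojiRegexSketch
{
    // .NET regexes work on UTF-16 code units, so this class matches every
    // surrogate, i.e. every character outside the Basic Multilingual Plane.
    private static readonly Regex Surrogates = new Regex(@"[\uD800-\uDFFF]");

    public static string StripNonBmp(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return "";
        }
        return Surrogates.Replace(source, "");
    }
}
```

A nickname such as "Tom\uD83D\uDE00" comes back as "Tom", while plain text passes through unchanged.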
Front end
Success...
With this, the problem is solved: emoji and other special characters in WeChat nicknames no longer stop the list from displaying or cause the Excel export to fail.
The code is not perfect and there is room for optimization. Many thanks to "burning ice".