Binary conversion of Unicode code (Java)

Xiangxi assassin Wang Hu 2021-02-23 16:49:20
binary conversion unicode code java

This post records my personal learning notes. Accuracy is not guaranteed, and corrections are welcome.

Sometimes we come across strings that start with \u. These are Unicode escapes: each \uxxxx group corresponds to one Unicode character (for characters in the Basic Multilingual Plane). What is the actual binary storage format of these encoded characters?
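
As a small illustration of how a \uxxxx string maps back to characters, the sketch below parses each group of four hex digits into a char. The class name UnicodeEscapeDemo and the helper unescape are hypothetical names introduced here, not part of any library, and the sketch only handles the plain four-digit form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDemo {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces every escape group with the char it denotes.
    // Text that is not part of an escape is copied unchanged.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes as literal text in the string.
        System.out.println(unescape("\\u6d4b\\u8bd5")); // the characters U+6D4B and U+8BD5
    }
}
```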
Unicode can represent most of the world's writing systems. In its most common encoding, UTF-8, a single character takes a variable 1 to 4 bytes of storage. The origin and advantages of this design are not covered in detail here; the focus is the conversion between the \u code strings we see and their binary form.
In Java, with UTF-8 as the encoding, printing the bytes and characters of the two-character string "测试" ("test") gives the following result:

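String s = "测试";
byte[] bs = s.getBytes(java.nio.charset.StandardCharsets.UTF_8); // explicit charset, independent of the platform default
for (char c : s.toCharArray()) System.out.print(Integer.toHexString(c) + " "); // each char as hex
System.out.println("\n" + java.util.Arrays.toString(bs)); // the UTF-8 bytes as signed values
/* Output:
6d4b 8bd5
[-26, -75, -117, -24, -81, -107] */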

As the output shows, the two characters of "测试" occupy six bytes in UTF-8. Writing the 6 values [-26, -75, -117, -24, -81, -107] in two's-complement binary gives the binary storage of the two characters:
11100110 10110101 10001011 11101000 10101111 10010101
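
The two's-complement conversion can be reproduced with a few lines of Java. This is only a sketch: the class ByteBinaryDemo and its toBinary helpers are names made up for this example. The key step is masking each signed byte with 0xFF so that, for instance, -26 is treated as 230.

```java
import java.nio.charset.StandardCharsets;

class ByteBinaryDemo {
    // Formats one byte as exactly 8 binary digits. The 0xFF mask removes the
    // sign extension that would otherwise produce a 32-bit pattern.
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    // Formats a byte array as space-separated 8-bit groups.
    static String toBinary(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(toBinary(b));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bs = "\u6d4b\u8bd5".getBytes(StandardCharsets.UTF_8);
        System.out.println(toBinary(bs));
        // prints 11100110 10110101 10001011 11101000 10101111 10010101
    }
}
```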
The 6d4b 8bd5 obtained by converting each char to hexadecimal (for example with Integer.toHexString) is the Unicode code of these two characters.
How are the two related?
The encyclopedia page for UTF-8 explains it as follows:

The meaning of UTF-8 encoded bytes
  • For any byte B in a UTF-8 encoding, if the first bit of B is 0, B represents a character on its own (an ASCII code);
  • If the first bit of B is 1 and the second is 0, B is a continuation byte of a multi-byte (non-ASCII) character;
  • If the first two bits of B are 1 and the third is 0, B is the first byte of a character represented by two bytes;
  • If the first three bits of B are 1 and the fourth is 0, B is the first byte of a character represented by three bytes;
  • If the first four bits of B are 1 and the fifth is 0, B is the first byte of a character represented by four bytes;
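
The rules above can be sketched as a chain of bit-mask tests. The class Utf8ByteKind and the method classify are hypothetical names for this illustration, and the returned labels are just descriptive strings, not anything defined by a standard.

```java
class Utf8ByteKind {
    // Classifies a single byte by its leading bits, following the listed rules.
    static String classify(byte b) {
        int v = b & 0xFF; // work with the unsigned value 0..255
        if ((v & 0b1000_0000) == 0) {
            return "single byte (ASCII)";
        }
        if ((v & 0b1100_0000) == 0b1000_0000) {
            return "continuation byte";
        }
        if ((v & 0b1110_0000) == 0b1100_0000) {
            return "first byte of a 2-byte character";
        }
        if ((v & 0b1111_0000) == 0b1110_0000) {
            return "first byte of a 3-byte character";
        }
        if ((v & 0b1111_1000) == 0b1111_0000) {
            return "first byte of a 4-byte character";
        }
        return "not covered by the rules above";
    }

    public static void main(String[] args) {
        byte[] bs = {-26, -75, -117, -24, -81, -107};
        for (byte b : bs) {
            System.out.println(b + " -> " + classify(b));
        }
    }
}
```

For the six bytes of the example, this reports a 3-byte lead followed by two continuation bytes, twice over.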

Therefore, in the binary string obtained above, the leading bits of each 8-bit group act as markers. A 1110 prefix means the current character needs 3 bytes and that this byte is the first of the three; the following bytes start with 10, marking them as continuation bytes of the current character's encoding.
Removing the marker bits from the first three bytes and concatenating what remains gives 0110 110101 001011. Converting the hexadecimal Unicode code of "测" (6d4b) to binary gives 0110 1101 0100 1011, which is the same 16 bits.
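
The strip-and-merge step can be written out with masks and shifts. This is a minimal sketch that assumes a well-formed 3-byte sequence and does no validation; the class Utf8ThreeByteDecode and the method decode3 are names invented for this example.

```java
class Utf8ThreeByteDecode {
    // Decodes one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
    // into its code point. Assumes the input is well formed.
    static int decode3(byte b0, byte b1, byte b2) {
        int high = b0 & 0x0F; // drop the 1110 marker, keep 4 payload bits
        int mid = b1 & 0x3F;  // drop the 10 marker, keep 6 payload bits
        int low = b2 & 0x3F;  // drop the 10 marker, keep 6 payload bits
        return (high << 12) | (mid << 6) | low;
    }

    public static void main(String[] args) {
        int first = decode3((byte) -26, (byte) -75, (byte) -117);
        int second = decode3((byte) -24, (byte) -81, (byte) -107);
        System.out.println(Integer.toHexString(first));  // prints 6d4b
        System.out.println(Integer.toHexString(second)); // prints 8bd5
    }
}
```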
The advantages are clear. The scheme is easy to extend (the original design allowed sequences of up to 6 bytes, although the current standard, RFC 3629, limits UTF-8 to 4), and apart from the marker bits the encoding adds no overhead, so data stays compact and easy to transmit. Single-byte UTF-8 codes are also fully compatible with ASCII, so UTF-8 is arguably the best choice in most scenarios.
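
To see the variable length and the ASCII compatibility in practice, the following sketch prints how many UTF-8 bytes a few sample code points take. The sample values and the names Utf8LengthDemo and utf8Length are chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Returns the number of bytes the given code point occupies in UTF-8.
    // Intended for valid, non-surrogate code points.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // An ASCII letter, a Latin letter with an accent, the first character
        // of the example string, and a code point outside the BMP.
        int[] samples = {0x41, 0xE9, 0x6D4B, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

The ASCII letter takes 1 byte, and the other three samples take 2, 3 and 4 bytes respectively.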

This article was written by [Xiangxi assassin Wang Hu]. Please include a link to the original when reposting. Thank you.
