This is the clearest article on character sets that I have found.

1. Summary
This article covers the following topics: encoding basics, Java, system software, URLs, and tools.
The examples below use the two characters "中文". Looking them up in the code tables gives the GB2312 encoding "d6d0 cec4", the Unicode encoding "4e2d 6587", and the UTF-8 encoding "e4b8ad e69687". Note that these characters have no iso8859-1 encoding, although iso8859-1 can still be used to "represent" them.
2. Encoding Basics
The earliest encoding was ASCII, and iso8859-1 is a closely related single-byte extension of it. To make it convenient to represent other languages, many more standard encodings emerged. The important ones are described below.
2.1. iso8859-1
It is a single-byte encoding that can represent at most 256 characters (byte values 0-255), and it suits English and other Western European languages. For example, the letter 'a' is encoded as 0x61 = 97.
Clearly, iso8859-1 covers a narrow range of characters and cannot represent Chinese characters. However, because it is a single-byte encoding and so matches the computer's most basic unit of representation, it is still used in many situations, and many protocols use it by default.
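The way iso8859-1 can "represent" bytes it cannot interpret can be sketched in Java. This is a minimal sketch, assuming the standard ISO_8859_1 constant from java.nio.charset.StandardCharsets; the class and method names are illustrative only. Every byte value 0-255 maps to exactly one iso8859-1 character, so any byte sequence, including the GB2312 bytes of "中文", should survive a decode-and-encode round trip unchanged:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Decode arbitrary bytes as ISO-8859-1, then encode the string again.
    // Each byte becomes one char (U+0000..U+00FF) and back, so nothing is lost.
    static byte[] roundTrip(byte[] raw) {
        String carrier = new String(raw, StandardCharsets.ISO_8859_1);
        return carrier.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The letter 'a' is the single byte 0x61.
        byte[] a = "a".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Integer.toHexString(a[0] & 0xff)); // 61

        // The GB2312 bytes of the sample characters, carried through ISO-8859-1.
        byte[] gb = {(byte) 0xd6, (byte) 0xd0, (byte) 0xce, (byte) 0xc4};
        System.out.println(Arrays.equals(gb, roundTrip(gb))); // true
    }
}
```

The intermediate string is unreadable as text, but it holds the original bytes intact, which is why iso8859-1 is often used as a neutral carrier.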
2.2. GB2312/GBK
This is the Chinese national standard encoding, designed specifically for Chinese characters. Each Chinese character takes two bytes, while English letters take one byte with the same values as in iso8859-1 (that is, the ASCII range is compatible). GBK can represent both traditional and simplified characters, whereas GB2312 covers only simplified characters. GBK is backward compatible with GB2312.
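The difference in coverage can be checked with a short Java sketch. It assumes the running JDK ships the GB2312 and GBK charsets (most full JDK distributions do, but a trimmed runtime may not), and the class and method names are illustrative only. The traditional character U+66F8 is used as a sample that GBK is expected to cover and GB2312 is not:

```java
import java.nio.charset.Charset;

class GbkVsGb2312 {
    static final Charset GB2312 = Charset.forName("GB2312");
    static final Charset GBK = Charset.forName("GBK");

    // Ask the charset's encoder whether it has a mapping for this character.
    static boolean canEncode(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char simplified = '\u4e66';  // simplified form of "book"
        char traditional = '\u66f8'; // traditional form of "book"

        System.out.println(canEncode(GB2312, simplified));  // true
        System.out.println(canEncode(GB2312, traditional)); // false
        System.out.println(canEncode(GBK, traditional));    // true

        // One byte for the English letter plus two for the Chinese character.
        System.out.println("a\u4e2d".getBytes(GBK).length); // 3
    }
}
```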
2.3. unicode
This is the most unified encoding: it can represent the characters of all languages. In the form described here it is a fixed-length two-byte encoding (a four-byte form also exists), and this applies to English letters as well. It is therefore not byte-compatible with iso8859-1 or with any other encoding. However, relative to iso8859-1, the Unicode encoding of a character simply adds a leading 0 byte; the letter 'a', for example, is "00 61".
Note that a fixed-length encoding is convenient for programs to process (GB2312/GBK is not fixed length), and Unicode can represent all characters, so much software uses Unicode internally. Java is one example.
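The two-byte values quoted above can be reproduced with a small Java sketch. It assumes big-endian UTF-16 without a byte order mark (StandardCharsets.UTF_16BE) as the byte form of "Unicode"; the class and helper names are illustrative only:

```java
import java.nio.charset.StandardCharsets;

class Utf16Demo {
    // Render bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode the string as big-endian UTF-16 and return the hex form.
    static String utf16Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        System.out.println(utf16Hex("a"));            // 0061
        System.out.println(utf16Hex("\u4e2d\u6587")); // 4e2d6587
    }
}
```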
2.4. UTF-8
Unicode is not byte-compatible with iso8859-1, and it tends to take more space: even English letters need two bytes. That makes Unicode inconvenient for transmission and storage, which led to the UTF-8 encoding. UTF-8 is compatible with the ASCII range of iso8859-1 (bytes 0x00-0x7f) and can also represent the characters of all languages. It is a variable-length encoding: the original design allowed 1 to 6 bytes per character, and the current standard limits this to 1 to 4. UTF-8 also has a simple self-checking property, because lead bytes and continuation bytes follow fixed bit patterns. In general, an English letter takes one byte and a Chinese character takes three.
Note that UTF-8 saves space only relative to Unicode. If the text is known to be Chinese, GB2312/GBK is the most economical choice. On the other hand, although UTF-8 uses three bytes per Chinese character, even a Chinese web page is usually smaller in UTF-8 than in two-byte Unicode, because the page contains many English characters.
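The space trade-off can be illustrated with a short Java sketch that measures one tiny piece of markup in three encodings. The sample string, class name, and method name are illustrative only, and the sketch assumes the JDK provides the GBK charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedSizes {
    // Number of bytes the text occupies in the given charset.
    static int size(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // 7 ASCII characters of markup plus the 2 sample Chinese characters.
        String page = "<p>\u4e2d\u6587</p>";

        System.out.println(size(page, StandardCharsets.UTF_8));    // 13 = 7*1 + 2*3
        System.out.println(size(page, StandardCharsets.UTF_16BE)); // 18 = 9*2
        System.out.println(size(page, Charset.forName("GBK")));    // 11 = 7*1 + 2*2
    }
}
```

GBK is smallest here, and UTF-8 still beats two-byte Unicode because most of the characters are markup.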
3. Character Handling in Java
Java applications touch character set encoding in many places. Some of them only need to be configured correctly; others need explicit handling in code.
3.1. getBytes(charset)
This is a standard Java string method. It encodes the characters of the string with the given charset and returns the result as a byte array. Note that a string in Java memory is always stored as Unicode. For example, "中文" is normally (when nothing has gone wrong) stored as "4e2d 6587". If charset is "gbk", it is encoded as "d6d0 cec4" and the bytes "d6 d0 ce c4" are returned. If charset is "utf8", the result is "e4 b8 ad e6 96 87". If charset is "iso8859-1", the characters cannot be encoded, so the result is "3f 3f" (two question marks).
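The three cases can be tried with the following Java sketch. It uses the canonical charset names "GBK", "UTF-8", and "ISO-8859-1" and assumes the JDK provides GBK; the class and helper names are illustrative only:

```java
import java.nio.charset.Charset;

class GetBytesDemo {
    // Render bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode the string with the named charset and return the hex form.
    static String encode(String s, String charsetName) {
        return hex(s.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        String s = "\u4e2d\u6587"; // the two sample characters

        System.out.println(encode(s, "GBK"));        // d6d0cec4
        System.out.println(encode(s, "UTF-8"));      // e4b8ade69687
        // Unencodable characters are replaced by '?' (0x3f).
        System.out.println(encode(s, "ISO-8859-1")); // 3f3f
    }
}
```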
3.2. new String(bytes, charset)
This is another standard Java string operation, the inverse of the previous one: it interprets the byte array according to charset and converts the result to Unicode for storage. With the getBytes example above, "gbk" and "utf8" both give the correct result "4e2d 6587", but iso8859-1 gives "003f 003f" (two question marks).
Because utf8 can encode all characters, new String(str.getBytes("utf8"), "utf8").equals(str) holds; the conversion is fully reversible.
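A minimal Java sketch of the decoding direction, continuing with the byte values from the getBytes example. The class and helper names are illustrative only, and the JDK is assumed to provide GBK:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Build a byte array from int literals so the hex values stay readable.
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Interpret the raw bytes according to the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String expected = "\u4e2d\u6587"; // the two sample characters

        System.out.println(decode(bytes(0xd6, 0xd0, 0xce, 0xc4), "GBK").equals(expected));               // true
        System.out.println(decode(bytes(0xe4, 0xb8, 0xad, 0xe6, 0x96, 0x87), "UTF-8").equals(expected)); // true
        // The two bytes left by the failed iso8859-1 encoding decode to "??".
        System.out.println(decode(bytes(0x3f, 0x3f), "ISO-8859-1"));                                     // ??

        // Encoding and decoding with UTF-8 returns the original string.
        byte[] utf8 = expected.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(expected));                   // true
    }
}
```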
3.3. setCharacterEncoding()
This method sets the character encoding of an HTTP request or response.
For a request, it specifies the encoding of the submitted content. Once it is set, getParameter() returns the correct string directly. If it is not set, iso8859-1 is used by default and the result needs further processing; see "Form input" below. Note that no getParameter() call may be made before setCharacterEncoding(). The javadoc says: "This method must be called prior to reading request parameters or reading input using getReader()." In addition, the setting takes effect only for the POST method, not for GET. The likely reason is that on the first getParameter() call Java decodes all submitted content with the current encoding, and later getParameter() calls do not decode it again, so a subsequent setCharacterEncoding() has no effect. When a form is submitted with GET, the content travels in the URL and has already been decoded by then, so setCharacterEncoding() does not help.
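The effect of the default encoding can be simulated without a servlet container. The sketch below is not the container's actual code: it uses java.net.URLDecoder to stand in for the parameter-decoding step, assumes the browser submitted the form in UTF-8, and uses hypothetical class and method names. It shows what getParameter() would plausibly return with and without the right encoding, and one common way to repair a value that was decoded as iso8859-1:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class FormDecodingDemo {
    // Stand-in for the container decoding one percent-encoded parameter value.
    static String decodeParam(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Recover the original bytes through iso8859-1, then decode them as UTF-8.
    static String repair(String wronglyDecoded) {
        byte[] raw = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two sample characters as a UTF-8 form submission would carry them.
        String body = "%E4%B8%AD%E6%96%87";

        // Encoding set correctly: the value comes out right.
        System.out.println(decodeParam(body, "UTF-8").equals("\u4e2d\u6587")); // true

        // Default iso8859-1: six meaningless characters, one per byte.
        String wrong = decodeParam(body, "ISO-8859-1");
        System.out.println(wrong.length());                                    // 6

        // The bytes are intact, so the value can still be repaired.
        System.out.println(repair(wrong).equals("\u4e2d\u6587"));              // true
    }
}
```

The repair works only because iso8859-1 preserves every byte; a value decoded with a lossy charset could not be recovered this way.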
