This is an overview of Japanese encoding standards. Three primary encodings are currently used to process Japanese characters:
  • JIS – Japanese Industrial Standard
  • Shift-JIS
  • EUC-JP – Extended Unix Code

Japanese encoding standards need two bytes for most characters (kanji and kana), as opposed to Western languages, which are typically based on 1-byte encodings. In addition to these three, another international standard of growing importance is Unicode, designed by the Unicode Consortium. It can be used to represent most of the world's languages. Unicode unifies the Chinese characters (kanji in Japanese) used in traditional and simplified Chinese as well as in Japanese and Korean. This unified set of characters is referred to as CJK (Chinese, Japanese, Korean).
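As a rough illustration of the 1-byte versus 2-byte difference, the following Python sketch counts the bytes that one ASCII letter and one kanji occupy in several encodings. The helper name `byte_lengths` and the sample characters are chosen here for illustration only; UTF-8 is included as one common encoding form of Unicode.

```python
# Codecs from Python's standard library; no third-party packages are needed.
CODECS = ["shift_jis", "euc_jp", "utf-8"]

def byte_lengths(ch):
    """Return the encoded length of one character in each codec (illustrative helper)."""
    return {codec: len(ch.encode(codec)) for codec in CODECS}

for ch in ["A", "日"]:
    print(ch, byte_lengths(ch))
```

The ASCII letter takes one byte everywhere, while the kanji takes two bytes in Shift-JIS and EUC-JP and three bytes in UTF-8.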

JIS

The Japanese Industrial Standard uses 7-bit bytes and works with ASCII characters as well as with escape sequences to delimit Japanese from other languages. It is mostly used for network transmissions such as sending and receiving email or network news, since many networks do not read the eighth bit of 8-bit bytes. Japanese email clients on a Japanese operating system will automatically convert messages into JIS and back. While most modern browsers recognise all three encoding types ("Auto-Detect"), the escape sequences in JIS explicitly alert the browser to switch to Japanese. ISO-2022-JP (JIS) encoding defines a standard way to send data in multiple character sets when the transmission medium supports only 7-bit bytes.

Shift-JIS

Initially developed by ASCII Corporation together with Microsoft, Shift-JIS (also known as SJIS, X-SJIS or MS Kanji) is mainly used for internal computer coding in PCs and Macs. It uses 8-bit bytes, resulting in double-byte dependencies: a given byte in the ASCII range may be a single-byte ASCII character meant to stand alone, or it may be the second byte of a 2-byte character, meant to be read together with the preceding byte. This is especially problematic if the eighth bit has been cut off by the network, as mentioned above.
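The double-byte dependency can be made concrete with a short Python sketch that looks for characters whose second Shift-JIS byte falls in the ASCII range. The helper name `trail_bytes_in_ascii` and the sample characters are hypothetical choices for this example.

```python
def trail_bytes_in_ascii(text):
    """Return (character, second byte) pairs whose Shift-JIS trail byte is below 0x80."""
    hits = []
    for ch in text:
        raw = ch.encode("shift_jis")
        # Only double-byte characters have a trail byte worth inspecting.
        if len(raw) == 2 and raw[1] < 0x80:
            hits.append((ch, raw[1]))
    return hits

for ch, trail in trail_bytes_in_ascii("ソ表日"):
    # chr(trail) shows which ASCII character the trail byte would be mistaken for.
    print(ch, hex(trail), repr(chr(trail)))
```

For ソ and 表 the second byte is 0x5C, the same value as an ASCII backslash, so software that scans byte by byte without tracking lead bytes can misread it. 日 is not reported because its second byte has the eighth bit set.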

EUC-JP

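Extended UNIX Code (formerly also called X-EUC-JP) is commonly used on Japanese UNIX systems. Web pages that reside on UNIX systems are often encoded in EUC. EUC-JP is very similar to JIS, but without the escape sequences and with the eighth bit turned on in the bytes of each Japanese character. It is highly recommended to use EUC-JP together with PHP and MySQL. Last, but not least, XML documents can be written in EUC-JP if the encoding is named in the XML declaration, although XML is not limited to it: every XML processor is required to support only UTF-8 and UTF-16, and support for other encodings depends on the parser.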
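The relationship between JIS and EUC-JP can be checked with a small Python sketch: take the ISO-2022-JP bytes, drop the escape sequences, and set the eighth bit of each remaining byte. The function name `jis_kanji_to_euc` is hypothetical, and the sketch assumes the input consists only of double-byte JIS X 0208 characters (no ASCII, no half-width katakana).

```python
JIS_KANJI_IN = b"\x1b$B"  # escape sequence: switch to the double-byte Japanese set
JIS_ASCII_IN = b"\x1b(B"  # escape sequence: switch back to ASCII

def jis_kanji_to_euc(text):
    """Derive EUC-JP bytes from ISO-2022-JP for purely double-byte text (illustrative)."""
    jis = text.encode("iso2022_jp")
    if not (jis.startswith(JIS_KANJI_IN) and jis.endswith(JIS_ASCII_IN)):
        raise ValueError("expected a single run of double-byte characters")
    # Strip the leading and trailing escape sequences.
    payload = jis[len(JIS_KANJI_IN):-len(JIS_ASCII_IN)]
    # Turn on the eighth bit of every byte.
    return bytes(b | 0x80 for b in payload)

sample = "日本語"
print(jis_kanji_to_euc(sample) == sample.encode("euc_jp"))  # prints True
```

Because the eighth bit marks every byte of a Japanese character, EUC-JP needs no escape sequences and its Japanese bytes never collide with ASCII values.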

Links: