The Free On-line Dictionary of Computing (19 January 2023):

UTF-8

   (UCS transformation format 8) An ASCII-compatible multibyte
   Unicode and UCS encoding, used by Java and Plan 9.

   The Unicode character set occupies a 16-bit code space.  The
   most obvious Unicode encoding (known as UCS-2) consists of a
   sequence of 16-bit words.  Such strings can contain bytes like
   '\0' or '/' which have a special meaning in filenames and
   other C library function parameters.  In addition, most Unix
   tools expect ASCII files and can't read 16-bit words as
   characters without major modifications.  For
   these reasons, UCS-2 is not a suitable external encoding of
   Unicode in filenames, text files, environment variables, etc.
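
   As a rough illustration of the problem, the sketch below encodes
   a short string as 16-bit words and checks for the troublesome
   byte values.  It assumes Python's "utf-16-be" codec as a stand-in
   for big-endian UCS-2 (the two agree for characters that fit in 16
   bits); the helper name ucs2_bytes is made up for this example.

```python
def ucs2_bytes(text: str) -> bytes:
    """Encode text as big-endian 16-bit words.

    Assumption: UTF-16-BE is used as a stand-in for UCS-2, which is
    valid only while every character fits in a single 16-bit word.
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character does not fit in a 16-bit word")
    return text.encode("utf-16-be")


# 'A' followed by U+012F; neither character is NUL or '/'.
name = "A\u012f"
raw = ucs2_bytes(name)

print(raw.hex(" "))       # the four bytes of the two 16-bit words
has_nul = b"\x00" in raw  # high byte of 'A' is 0x00
has_slash = b"/" in raw   # low byte of U+012F is 0x2F, i.e. '/'
print(has_nul, has_slash)
```

   Neither character in the string is a NUL or a slash, yet both
   byte values appear in the encoded form, which is what breaks
   byte-oriented C library interfaces.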

   The ISO 10646 Universal Character Set (UCS), a superset of
   Unicode, occupies a 31-bit code space and the obvious UCS-4
   encoding for it (a sequence of 32-bit words) has the same
   problems.

   The UTF-8 encoding of Unicode and UCS avoids the problems of
   fixed-length Unicode encodings because an ASCII file encoded
   in UTF-8 is exactly the same as the original ASCII file, and
   every byte in the encoding of a non-ASCII character is
   guaranteed to have the most significant bit (0x80) set.  This
   means that normal tools for text searching etc. work as
   expected.
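
   Both properties can be checked with a few lines of code.  The
   following is a minimal sketch using Python's built-in "utf-8"
   and "ascii" codecs; the function name high_bit_set_everywhere
   and the sample strings are illustrative only.

```python
def high_bit_set_everywhere(ch: str) -> bool:
    """Return True if every UTF-8 byte of the character has bit 0x80 set."""
    return all(byte & 0x80 for byte in ch.encode("utf-8"))


# Property 1: pure ASCII text is byte-for-byte unchanged by UTF-8.
ascii_text = "/usr/lib"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Property 2: non-ASCII characters never produce bytes below 0x80,
# so no stray NUL or '/' can appear inside their encodings.
sample = "A\u012f\u20ac"  # 'A', U+012F, U+20AC
utf8 = sample.encode("utf-8")
non_ascii_ok = all(high_bit_set_everywhere(ch) for ch in sample if ord(ch) >= 0x80)

print(same_as_ascii, non_ascii_ok)
print(utf8.hex(" "))
print(b"/" in utf8, b"\x00" in utf8)
```

   Because bytes below 0x80 only ever stand for themselves, a
   byte-wise search for an ASCII string such as "/" cannot match in
   the middle of a multibyte character.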

   UTF-8 was defined in RFC 2279, which has since been obsoleted
   by RFC 3629.

   ["File System Safe UCS Transformation Format (FSS_UTF)",
   X/Open Preliminary Specification, X/Open Company Ltd.,
   Document Number: P316.  This information also appears in
   ISO/IEC 10646, Annex P].

   Plan 9 UTF manual entry
   (ftp://ftp.uu.net/doc/obi/Bell.Labs/plan9pm/09utf.ps.Z).

   (1998-07-29)