UTF-8 to UTF-32 Conversion

John J. Boyer director at chpi.org
Sun Apr 18 04:30:05 UTC 2004


Dave,

Thanks for the info. Did you find this on the Web? I searched for a long 
time and couldn't find it, and what information I had was incorrect. If 
it is on the Web, what is the URL?

John


On Sat, 17 Apr 2004, Dave Mielke wrote:

> [quoted lines by John J. Boyer on 2004/04/17 at 09:13 -0500]
> 
> >For one of my projects I need to convert UTF-8 to UTF-32. However, I 
> >can't find information on which bits are set in the various bytes of a 
> >multi-byte UTF-8 character. 
> 
> 0X00 through 0X7F are literal, i.e. single-byte characters.
> 
> If bit 7 is set and bit 6 is clear, i.e. the range 0X80 through 0XBF, it's a
> continuation byte containing six more bits. The first byte of a multi-byte
> character is never within this range.
> 
> If bits 7 and 6 are set but bit 5 isn't, i.e. the range 0XC0 through 0XDF, then
> it's the first byte of a two-byte character and holds the first 5 bits of its
> value. Together with the six bits from the one continuation byte, the
> resultant value is an 11-bit character in the range 0 through 0X7FF.
> 
> Each time the first clear bit is moved one position to the right the length of
> the multi-byte character increases by one byte and the number of value bits
> held in the first byte decreases by 1. Every non-leading byte, as mentioned above,
> has bit 7 set and bit 6 clear, i.e. is within the range 0X80 through 0XBF, and
> appends six bits to the value. Here's a table to illustrate:
> 
>    First  Range of    Num of  Init  Total  Max Unicode
>    0-Bit  First Byte  Bytes   Bits  Bits   Character
>      7    0X00-0X7F     1       7      7   0X0000007F
>      5    0XC0-0XDF     2       5     11   0X000007FF
>      4    0XE0-0XEF     3       4     16   0X0000FFFF
>      3    0XF0-0XF7     4       3     21   0X001FFFFF
>      2    0XF8-0XFB     5       2     26   0X03FFFFFF
>      1    0XFC-0XFD     6       1     31   0X7FFFFFFF
> 
> 
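
The rules quoted above can be sketched in code roughly as follows. This is
only a minimal illustration in Python, not code from Dave's message: the
function names utf8_to_utf32 and pack_utf32_le are made up for the example,
and little-endian output is an assumption. It follows the table as given,
including the five- and six-byte forms, and does not check for overlong
encodings.

```python
import struct


def utf8_to_utf32(data):
    """Decode UTF-8 bytes into a list of integer character values.

    Hypothetical helper that follows the bit layout in the table above.
    """
    values = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            # Bit 7 clear: a literal single-byte character.
            length, value = 1, first
        elif first < 0xC0:
            # 0x80 through 0xBF is a continuation byte; it cannot start
            # a character.
            raise ValueError("unexpected continuation byte at offset %d" % i)
        else:
            # Count the set bits before the first clear bit; that count
            # is the number of bytes in the character.
            length = 0
            mask = 0x80
            while first & mask:
                length += 1
                mask >>= 1
            if length > 6:
                # 0xFE and 0xFF are not in the table.
                raise ValueError("invalid first byte at offset %d" % i)
            # The bits below the first clear bit start the value.
            value = first & (mask - 1)
        if i + length > len(data):
            raise ValueError("truncated character at offset %d" % i)
        for byte in data[i + 1:i + length]:
            if (byte & 0xC0) != 0x80:
                raise ValueError("expected a continuation byte")
            # Each continuation byte appends six more bits.
            value = (value << 6) | (byte & 0x3F)
        values.append(value)
        i += length
    return values


def pack_utf32_le(values):
    """Pack character values as little-endian UTF-32 (an assumed byte order)."""
    return struct.pack("<%dI" % len(values), *values)
```

For example, utf8_to_utf32(b"\xc3\xa9") takes the low five bits of 0xC3
(0x03) and appends the low six bits of 0xA9 (0x29), giving 0xE9.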

-- 
John J. Boyer; Executive Director, Chief Software Developer
Computers to Help People, Inc.
http://www.chpi.org
825 East Johnson; Madison, WI 53703





