UTF-8 Multibyte String Encoding

Decode in PHP

# Hexadecimal byte escapes (UTF-8)
#
# UTF-8, 1 byte (ASCII): U+002A *
# \x2a
#
# UTF-8, 2 bytes: U+00E9 é
# \xc3\xa9
#
# UTF-8, 3 bytes: U+53C2 参
# \xe5\x8f\x82
#
# UTF-8, 4 bytes: U+1F601 😁
# \xf0\x9f\x98\x81
#



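# decode in PHP: double-quoted strings turn \xNN escapes into raw bytes,
# so $str already holds the UTF-8 bytes of 参
$str = "\xe5\x8f\x82";
echo $str;
# output: 参
# converting from the internal encoding (UTF-8 by default) to UTF-8 is a no-op
echo mb_convert_encoding($str, 'UTF-8');
# output: 参
# or convert from whatever mb_detect_encoding() reports (it returns false if detection fails)
echo mb_convert_encoding($str, 'UTF-8', mb_detect_encoding($str));
# output: 参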
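
To show where the 1-, 2-, 3- and 4-byte forms above come from, here is a minimal sketch that builds the UTF-8 bytes of a code point by hand from the standard bit layouts. The helper name utf8_encode_cp() is hypothetical (not a PHP built-in); in real code mb_chr($cp, 'UTF-8') does the same job.

```php
<?php
// Hypothetical helper: encode one Unicode code point (U+0000..U+10FFFF) as UTF-8.
function utf8_encode_cp(int $cp): string
{
    if ($cp < 0x80) {                       // 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {                      // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {                    // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Code points of the four samples: *, é, 参, 😁
foreach ([0x2A, 0xE9, 0x53C2, 0x1F601] as $cp) {
    $bytes = utf8_encode_cp($cp);
    printf("U+%04X -> %d byte(s): %s\n", $cp, strlen($bytes), bin2hex($bytes));
}
```

The loop prints 2a, c3a9, e58f82 and f09f9881, matching the escapes in the listing; strlen() counts bytes, which is why it reports 1 to 4.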




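# JSON / JavaScript \u escapes (a UTF-16 surrogate pair, written in hexadecimal)
#
# \ud83d\ude01  (U+1F601 😁)
#
# decode in PHP: PHP strings have no \uXXXX escape, so let json_decode() read it;
# in a single-quoted string the backslashes stay literal
echo json_decode('"\ud83d\ude01"');
# output: 😁
# PHP 7+ alternative: the \u{...} escape takes the code point, not the surrogates
echo "\u{1F601}";
# output: 😁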
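
The pair \ud83d\ude01 maps to U+1F601 through the UTF-16 surrogate formula, code point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). A small sketch, assuming PHP 7.2+ with mbstring for mb_chr(); surrogates_to_cp() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: combine a high (0xD800-0xDBFF) and low (0xDC00-0xDFFF)
// surrogate into one code point; each surrogate carries 10 bits.
function surrogates_to_cp(int $high, int $low): int
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

$cp = surrogates_to_cp(0xD83D, 0xDE01);
printf("U+%X\n", $cp);                          // U+1F601

$fromJson   = json_decode('"\ud83d\ude01"');    // JSON \u escapes
$fromEscape = "\u{1F601}";                      // PHP 7+ string escape
$fromMb     = mb_chr($cp, 'UTF-8');             // mbstring, PHP 7.2+

// All three produce the same four UTF-8 bytes; prints bool(true) twice.
var_dump($fromJson === $fromEscape, $fromEscape === $fromMb);
echo bin2hex($fromEscape), "\n";                // f09f9881
```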




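# HTML entities and UTF-16
#
# &#188;     (HTML entity for ¼, U+00BC)
# \x10\x00   (UTF-16BE bytes of U+1000)
#
# decode in PHP
echo mb_convert_encoding('&#188;', 'UTF-8', 'HTML-ENTITIES');
# output: ¼   (the HTML-ENTITIES encoding is deprecated in mbstring as of PHP 8.2)
echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
# output: က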
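
A sketch of the same two decodes, assuming PHP 8 with mbstring enabled: html_entity_decode() handles the entity without the deprecated HTML-ENTITIES pseudo-encoding, and bin2hex() makes the UTF-8 and UTF-16 bytes visible. For characters in the Basic Multilingual Plane, the UTF-16 code unit is the code point itself, so \x10\x00 is U+1000.

```php
<?php
// Decode a numeric HTML entity straight to UTF-8.
$quarter = html_entity_decode('&#188;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $quarter, "\n";                  // ¼
echo bin2hex($quarter), "\n";         // c2bc (U+00BC as UTF-8)

// UTF-16BE -> UTF-8: U+1000 needs three bytes in UTF-8.
$ka = mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
echo $ka, "\n";                       // က
echo bin2hex($ka), "\n";              // e18080

// Back to UTF-16 in both byte orders to compare.
echo bin2hex(mb_convert_encoding($ka, 'UTF-16BE', 'UTF-8')), "\n"; // 1000
echo bin2hex(mb_convert_encoding($ka, 'UTF-16LE', 'UTF-8')), "\n"; // 0010
```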

Reference

https://stackoverflow.com/questions/6058394/unicode-character-in-php-string