Correct me if I am wrong, but in my opinion PHP does not really support development with UTF-8: core string functions such as strlen() and ord() work on bytes, not on characters. This is why my function checks the bytes at a low level.
Simply change the variable $string to the text you want to convert, and the output will list the code point value and the character, row by row.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us" dir="ltr" > <head> <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> </head> <body>
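<?php
$string = "|^€{}[~]\\";
$count = 0;

// Walk the string one character at a time; ordUTF8() reports via $count
// how many bytes the current character occupies.
for ($i = 0; $i < strlen($string); $i += max(1, $count)) {
    $code = ordUTF8($string, $i, $count);
    if ($code === false) {
        // Invalid or truncated sequence: report it and skip a single byte.
        echo "invalid byte at offset ".$i."<br />";
        continue;
    }
    // Print the whole character, not only its first byte.
    echo $code." ".htmlspecialchars(substr($string, $i, $count), ENT_QUOTES, 'UTF-8')."<br />";
}

/**
 * Returns the Unicode code point of the UTF-8 character that starts at byte
 * offset $index, or false if the sequence is invalid or truncated.
 * $bytes receives the number of bytes the character occupies (0 on failure).
 * Note: continuation bytes are not checked for the 10xxxxxx pattern.
 */
function ordUTF8($string, $index = 0, &$bytes = null)
{
    $len = strlen($string);
    $bytes = 0;

    if ($index >= $len) {
        return false;
    }

    // Square brackets: the $string{...} offset syntax was removed in PHP 8.
    $h = ord($string[$index]);

    if ($h <= 0x7F) {
        // 0xxxxxxx: single-byte ASCII
        $bytes = 1;
        return $h;
    } else if ($h < 0xC2) {
        // Stray continuation byte (0x80-0xBF) or overlong lead byte (0xC0, 0xC1)
        return false;
    } else if ($h <= 0xDF && $index < $len - 1) {
        // 110xxxxx 10xxxxxx
        $bytes = 2;
        return (($h & 0x1F) << 6)
            | (ord($string[$index + 1]) & 0x3F);
    } else if ($h <= 0xEF && $index < $len - 2) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        $bytes = 3;
        return (($h & 0x0F) << 12)
            | ((ord($string[$index + 1]) & 0x3F) << 6)
            | (ord($string[$index + 2]) & 0x3F);
    } else if ($h <= 0xF4 && $index < $len - 3) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: the lead byte carries 3 payload bits
        $bytes = 4;
        return (($h & 0x07) << 18)
            | ((ord($string[$index + 1]) & 0x3F) << 12)
            | ((ord($string[$index + 2]) & 0x3F) << 6)
            | (ord($string[$index + 3]) & 0x3F);
    } else {
        return false;
    }
}
?>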
</body> </html>
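To make the bit arithmetic a little more concrete, here is a small worked sketch that decodes the three bytes of "€" by hand. It is only an illustration: the variable names are mine, and the cross-check at the end assumes the mbstring extension (PHP 7.2 or later) is installed, which may not be true on your server.

```php
<?php
// The UTF-8 encoding of "€" (U+20AC) is the three bytes E2 82 AC.
$euro = "\xE2\x82\xAC";

// strlen() counts bytes, not characters.
$byteCount = strlen($euro);

$b0 = ord($euro[0]); // 0xE2 = 1110 0010 -> lead byte, 4 payload bits
$b1 = ord($euro[1]); // 0x82 = 10 000010 -> continuation, 6 payload bits
$b2 = ord($euro[2]); // 0xAC = 10 101100 -> continuation, 6 payload bits

// Strip the marker bits and stitch the payload bits together.
$codePoint = (($b0 & 0x0F) << 12) | (($b1 & 0x3F) << 6) | ($b2 & 0x3F);

printf("bytes: %d, code point: U+%04X (%d)\n", $byteCount, $codePoint, $codePoint);
// prints "bytes: 3, code point: U+20AC (8364)"

// Optional cross-check, assuming mbstring is available.
if (function_exists('mb_ord')) {
    var_dump(mb_ord($euro, 'UTF-8') === $codePoint);
}
```

The masks 0x0F and 0x3F are the same ones the three-byte branch of ordUTF8() uses, so the value printed here should match the row the script outputs for "€".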