A Lua implementation of the JavaScript function charCodeAt
@(Lua JavaScript charCodeAt)
I wanted a charCodeAt function in Lua that works exactly like the one in JavaScript. Lua 5.1, however, has no built-in support for UTF-8 or Unicode: its strings are plain byte sequences.
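To make the limitation concrete, here is a minimal sketch of what Lua 5.1 reports for a single CJK character. The decimal byte escapes spell out the UTF-8 encoding of '你' explicitly, so the result does not depend on how the source file is saved.

```lua
-- '你' (U+4F60) encoded as UTF-8 is the three bytes E4 BD A0.
local s = "\228\189\160"

-- The length operator counts bytes, not characters.
print(#s)                     -- 3

-- string.byte returns the raw byte values (228, 189, 160),
-- not the code point 20320 that JavaScript's charCodeAt gives.
print(string.byte(s, 1, -1))
```

So the code point has to be reconstructed from the bytes by hand.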
1: How charCodeAt works in JavaScript
To open the console in Chrome, press F12 (Mac: Cmd+Alt+J), then evaluate:
[
'你'.charCodeAt(0),
'ñ'.charCodeAt(0),
'n'.charCodeAt(0)
]
This outputs [20320, 241, 110], the numeric Unicode values of the characters: '你' = 20320, 'ñ' = 241, 'n' = 110.
The charCodeAt() method returns the numeric Unicode value of the character at the given index, except for code points above 0xFFFF, for which it returns one half of a UTF-16 surrogate pair.
Using the function utf8.charbytes from alexander-yakushev's utf8.lua, we can find out how many bytes a UTF-8 character takes:
[https://github.com/alexander-yakushev/awesompd/blob/master/utf8.lua]
function utf8.charbytes (s, i)
    -- argument defaults
    i = i or 1
    local c = string.byte(s, i)
    -- determine bytes needed for character, based on RFC 3629
    if c > 0 and c <= 127 then
        -- UTF8-1 byte
        return 1
    elseif c >= 194 and c <= 223 then
        -- UTF8-2 byte
        return 2
    elseif c >= 224 and c <= 239 then
        -- UTF8-3 byte
        return 3
    elseif c >= 240 and c <= 244 then
        -- UTF8-4 byte
        return 4
    end
end
Converting between Unicode and UTF-8
Unicode code range (hex) | UTF-8 encoding (binary) | Example |
---|---|---|
0000 0000-0000 007F | 0xxxxxxx | n (ASCII letters) |
0000 0080-0000 07FF | 110xxxxx 10xxxxxx | ñ |
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx | 你 (most CJK) |
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | other characters |
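The table can be turned into a small decoder. The sketch below is only an illustration: utf8_codepoint is a hypothetical helper name (it is not part of utf8.lua), it assumes well-formed UTF-8 input, and it uses plain arithmetic because Lua 5.1 has no bitwise operators.

```lua
-- Hypothetical helper: decode the UTF-8 character starting at byte
-- position i of s. Returns the code point and the number of bytes used,
-- or nil if the byte at i cannot start a character.
local function utf8_codepoint(s, i)
  i = i or 1
  local c = string.byte(s, i)
  if not c then
    return nil                       -- i is past the end of the string
  end
  if c <= 127 then
    return c, 1                      -- 0xxxxxxx: the byte is the code point
  end
  local n, cp
  if c >= 194 and c <= 223 then
    n, cp = 2, c - 192               -- 110xxxxx: keep the low 5 bits
  elseif c >= 224 and c <= 239 then
    n, cp = 3, c - 224               -- 1110xxxx: keep the low 4 bits
  elseif c >= 240 and c <= 244 then
    n, cp = 4, c - 240               -- 11110xxx: keep the low 3 bits
  else
    return nil                       -- continuation or invalid lead byte
  end
  for k = 1, n - 1 do
    local b = string.byte(s, i + k)
    if not b then
      return nil                     -- truncated sequence
    end
    cp = cp * 64 + (b - 128)         -- 10xxxxxx: append 6 payload bits
  end
  return cp, n
end

print(utf8_codepoint("\228\189\160"))  -- '你': code point 20320, 3 bytes
print(utf8_codepoint("\195\177"))      -- 'ñ': code point 241, 2 bytes
print(utf8_codepoint("n"))             -- 'n': code point 110, 1 byte
```

For characters in the first three rows of the table, the decoded code point is already the value that charCodeAt returns in JavaScript.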
But we should pay attention to 4-byte UTF-8 characters such as emoji, which are not that simple.
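Special method
JavaScript engines use UTF-16. For characters in the Basic Multilingual Plane, the UTF-16 code unit is the same as the Unicode code point. A character in a Supplementary Plane is instead stored as two code units, a high surrogate H and a low surrogate L, computed from the code point c with the formula below. The Supplementary Plane characters we usually encounter are emoji (4-byte UTF-8 characters).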
-- formula 1 (c is the Unicode code point)
H = math.floor((c - 0x10000) / 0x400) + 0xD800
L = (c - 0x10000) % 0x400 + 0xDC00
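As a worked example of formula 1, here is a minimal Lua 5.1 sketch. The name to_surrogates is hypothetical and chosen only for illustration. It assumes c is a valid code point no greater than 0x10FFFF.

```lua
-- Hypothetical helper: return the UTF-16 code unit(s) for code point c.
-- BMP code points come back unchanged as a single value; supplementary
-- code points come back as two values, the high and the low surrogate.
local function to_surrogates(c)
  if c < 0x10000 then
    return c
  end
  local offset = c - 0x10000
  local high = math.floor(offset / 0x400) + 0xD800
  local low  = offset % 0x400 + 0xDC00
  return high, low
end

-- U+1F600 (an emoji) splits into 0xD83D and 0xDE00.
local h, l = to_surrogates(0x1F600)
print(h, l)  -- 55357 and 56832
```

These are the values JavaScript returns for charCodeAt(0) and charCodeAt(1) on a string holding that single emoji.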
The code is here:
https://github.com/lilien1010/lua-bit
Feedback & Bug Report
- Twitter: [@lilien1010]
- Email: [email protected]
Thank you for reading. If you have a better idea, please share it.