Problem with test.js saved in UTF-8 encoding #8
Reference
BeRo1985/besen#8
Hi, here is my JS file, which is saved with UTF-8 encoding:

    function test(a) { println(a); return 1; }
    test('哈哈');

I used

    BESENShell.exe test.js

to execute it, but '哈哈' is not printed.

The BESENShell.exe in the repo is built with Delphi 7 (which dates from August 2002, before the official Unicode support era in Delphi), and BESENShell uses Write/WriteLn, so this behavior is expected.
For UTF-8 output on the Win32 console with older Delphi versions (for example Delphi 7), you need to call SetConsoleOutputCP(CP_UTF8); and change the console font to a Unicode-capable one via GetStdHandle(STD_OUTPUT_HANDLE), GetCurrentConsoleFontEx and SetCurrentConsoleFontEx, and then use something like WriteConsoleW(ConsoleHandle,PWideChar(s),80,Written,nil); instead of Write/WriteLn.
For more details, see MSDN.
And since it isn't a BESEN issue per se, I'll close this issue.
But if test.js is ANSI-encoded and I use […] to convert it to UTF-8, then '哈哈' prints, and the resulting character codes are different from those of the test.js saved as UTF-8.
And if I use Delphi XE2, '哈哈' prints as '1t1t'.
BESENGetFileContent is defined as

    function BESENGetFileContent(fn:TBESENANSISTRING):TBESENANSISTRING;

so it returns an AnsiString. BESENConvertToUTF8 converts this "ANSI" string into a UTF-8 string (still using AnsiString as the container, to support older Delphi versions), encoding the ANSI code points. So either insert a UTF-8 BOM (0xEF 0xBB 0xBF) at the beginning or remove the BESENConvertToUTF8 call, because in the UTF-8 BOM case BESENConvertToUTF8 just does:

[…]

You should then use BESENUTF8ToUTF16 to convert the UTF-8 string held in an AnsiString into a WideString for the Unicode-capable Win32 API, or, on newer Delphi versions, simply convert it from the AnsiString container to a real UTF8String:

[…]

Also have a look at http://stackoverflow.com/questions/26255148/is-writeln-capable-of-supporting-unicode regarding the general Unicode problems with Write/WriteLn.

TL;DR: BESEN misuses the AnsiString datatype to hold UTF-8 data so that it also works on older Delphi and FreePascal versions, and you must convert it in a raw way to a real Unicode string datatype before printing it to the screen.
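
To make the "raw" conversion step concrete, here is a minimal JavaScript sketch of turning UTF-8 bytes held in a plain byte container into UTF-16 code units, which is the kind of data a wide-character API such as WriteConsoleW expects. The helper name `utf8BytesToUtf16Units` is hypothetical and this is not BESEN's BESENUTF8ToUTF16; it only illustrates the idea with the standard `TextDecoder`.

```javascript
// Hypothetical helper (not part of BESEN): decode UTF-8 bytes stored in a
// plain byte container into an array of UTF-16 code units.
function utf8BytesToUtf16Units(bytes) {
  // TextDecoder drops a leading UTF-8 BOM by default;
  // fatal: true makes it throw on invalid UTF-8 instead of substituting U+FFFD.
  const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i)); // JavaScript strings are UTF-16 internally
  }
  return units;
}

// UTF-8 BOM (EF BB BF) followed by '哈哈' (E5 93 88 twice).
const fileBytes = Uint8Array.from([0xef, 0xbb, 0xbf, 0xe5, 0x93, 0x88, 0xe5, 0x93, 0x88]);
const codeUnits = utf8BytesToUtf16Units(fileBytes);

console.log(codeUnits.map(u => u.toString(16))); // code units as hex strings
console.log(codeUnits.length);                   // number of UTF-16 units
```

The length of the resulting array is the character count a wide-character write call would take, rather than a fixed value such as the 80 in the illustrative call above.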
Thanks for your answer, but what I mean is this:

In Delphi 7, if test.js is ANSI-encoded and I use

    BESENConvertToUTF8(BESENGetFileContent('test.js'))

to convert '哈哈' to UTF-8, the result is '鹿镁鹿镁', and '哈哈' prints.

But if test.js is UTF-8 encoded (with BOM) and I use

    BESENConvertToUTF8(BESENGetFileContent('test.js'))

to convert '哈哈' to UTF-8, the result is '鍝堝搱', and '哈哈' does not print.

So I think the BESENConvertToUTF8 function may have an issue that leads to the wrong encoding?
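
The two results can be compared at the byte level. The sketch below is JavaScript, not BESEN code, and it rests on two assumptions: that the "ANSI" file is GBK (code page 936), where '哈' is the byte pair B9 FE, and that the non-BOM path treats each byte as a code point in the range 0..255 and encodes it as UTF-8, as described earlier in the thread. The helper names `bytesAsCodePointsToUtf8` and `hex` are hypothetical.

```javascript
// Hypothetical helper: treat every input byte as a code point (0..255)
// and encode the result as UTF-8. This models the described non-BOM path;
// it is not BESEN's actual implementation.
function bytesAsCodePointsToUtf8(bytes) {
  const text = String.fromCharCode(...bytes); // one UTF-16 unit per input byte
  return new TextEncoder().encode(text);      // TextEncoder always emits UTF-8
}

// Format bytes as lowercase, space-separated hex.
function hex(bytes) {
  return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join(' ');
}

// Assumption: '哈哈' saved as GBK (CP936) is B9 FE B9 FE.
const ansiFileBytes = Uint8Array.from([0xb9, 0xfe, 0xb9, 0xfe]);
// The same text saved as UTF-8 (BOM left out here).
const utf8FileBytes = new TextEncoder().encode('哈哈');

const convertedAnsi = hex(bytesAsCodePointsToUtf8(ansiFileBytes));
const utf8Untouched = hex(utf8FileBytes);
const utf8ConvertedAgain = hex(bytesAsCodePointsToUtf8(utf8FileBytes));

console.log(convertedAnsi);      // c2 b9 c3 be c2 b9 c3 be
console.log(utf8Untouched);      // e5 93 88 e5 93 88
console.log(utf8ConvertedAgain); // c3 a5 c2 93 c2 88 c3 a5 c2 93 c2 88
```

Viewed in a GBK editor, the first sequence should read as '鹿镁鹿镁' and the second as '鍝堝搱', which matches the two strings reported above. Under these assumptions the second string is simply valid UTF-8 for '哈哈' displayed through a GBK code page, so the difference would come from how the bytes are shown rather than from a wrong conversion. The third line shows the double encoding that a UTF-8 file without a BOM would get if it went through the byte-as-code-point path.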