Hi, I found the single point where a slight modification unblocks proper Unicode handling for all of tbaMUD, and for CircleMUD as well!
My previous post was about the fact that tbaMUD can display UTF-8 room descriptions when the wld files are edited in a UTF-8 editor (like nano in a Linux terminal). So stock tbaMUD was already able to send and properly display Unicode text, but you could not enter Unicode strings inside the MUD OLC, because non-ASCII letters get gagged on input.
The C isascii() and isprint() checks only let printable US-ASCII characters through. Every byte of a multi-byte UTF-8 character has its high bit set, so isascii() rejects it and the rest is gagged out: accented Latin letters, Greek, Chinese, etc.
In comm.c this is the key part. This else if gags out non-US-ASCII characters:
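To make the gagging visible, here is a minimal sketch of what the stock test does to the bytes of a UTF-8 string. The helper names passes_stock_filter and count_surviving_bytes are made up for illustration and are not part of comm.c; the sketch also writes `c < 0x80` instead of calling isascii(), which should be equivalent but avoids depending on a non-ISO function.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not in comm.c): applies the same test as the stock
 * input loop to one byte. `c < 0x80` stands in for isascii(c). */
static int passes_stock_filter(unsigned char c)
{
    return c < 0x80 && isprint(c);
}

/* Count how many bytes of s would survive the stock filter. */
static size_t count_surviving_bytes(const char *s)
{
    size_t kept = 0;

    for (; *s != '\0'; s++)
        if (passes_stock_filter((unsigned char)*s))
            kept++;
    return kept;
}
```

For example, "café" in UTF-8 is the five bytes 'c', 'a', 'f', 0xC3, 0xA9. Only the first three pass the test, so the accented letter silently disappears from the input.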
Code:
  } else if (isascii(*ptr) && isprint(*ptr)) {
    if ((*(write_point++) = *ptr) == '$') { /* copy one character */
      *(write_point++) = '$'; /* if it's a $, double it */
      space_left -= 2;
    } else
      space_left--;
  }
}
change this line:
Code:
} else if (isascii(*ptr) && isprint(*ptr)) {
to this:
Code:
} else if (*ptr == *ptr) {
That's it, nothing else should be touched...
OK, I know it is an ugly hack, changing an if condition so that it is always true, but the original code is also an ugly hack that has been sitting in comm.c since CircleMUD 2.01 (released in 1993)! I know Unicode was not everywhere back then; UTF-8 probably existed only in the Plan 9 OS from Bell Labs in the early 90s. Just check the old CircleMUD sources and search for isascii in comm.c if you don't believe me!
The code block is a workaround for an obscure issue where an incoming string ending with $ causes problems on some Unix systems, so it changes $ to $$ to work around them. Well, the bad news is that the same block has prevented UTF-8 capability in CircleMUD/tbaMUD for as long as they have existed!
What would be good is to implement a Unicode equivalent of isascii() & isprint() for this loop, probably iswascii and iswprint or similar stuff, because my fix removes the filter between user input and the MUD, so it is possible to enter nonprintable/obscure characters as well. In the worst case, some weird non-alphanumeric invisible characters may cause issues. Anyway, this fix works now: you can implement truly international commands and use Hindi or Cyrillic letters for mobs, rooms, triggers. It works!
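As a possible middle ground between the stock filter and the always-true condition, here is a rough byte-level sketch. It is only an assumption about what a safer test could look like, and the function name is_acceptable_input_byte is hypothetical. It keeps printable ASCII, keeps bytes that can be part of a UTF-8 sequence, and drops ASCII control characters plus the byte values that can never occur in well-formed UTF-8 (0xC0, 0xC1 and 0xF5 to 0xFF, which also covers the telnet IAC byte 0xFF).

```c
#include <ctype.h>

/* Hypothetical replacement for the `isascii(*ptr) && isprint(*ptr)` test.
 * It works on single bytes and does NOT check that the multi-byte
 * sequences are well formed. */
static int is_acceptable_input_byte(unsigned char c)
{
    if (c < 0x80)
        return isprint(c) != 0;   /* printable ASCII only, no ESC, BEL, etc. */
    if (c == 0xC0 || c == 0xC1 || c >= 0xF5)
        return 0;                 /* never valid in UTF-8 */
    return 1;                     /* UTF-8 lead or continuation byte */
}
```

With this helper the condition in the loop would read `} else if (is_acceptable_input_byte((unsigned char)*ptr)) {`. It still lets through invisible Unicode characters such as zero-width spaces. Filtering those would need the bytes decoded into wide characters first (for example with mbrtowc()) and then checked with iswprint() from <wctype.h>, which depends on the server's locale.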
Here is the full block of the $ issue in the main loop from the stock code:
Code:
/* The '> 1' reserves room for a '$ => $$' expansion. */
for (ptr = read_point; (space_left > 1) && (ptr < nl_pos); ptr++) {
  if (*ptr == '\b' || *ptr == 127) { /* handle backspacing or delete key */
    if (write_point > tmp) {
      if (*(--write_point) == '$') {
        write_point--;
        space_left += 2;
      } else
        space_left++;
    }
  } else if (isascii(*ptr) && isprint(*ptr)) {
    if ((*(write_point++) = *ptr) == '$') { /* copy one character */
      *(write_point++) = '$'; /* if it's a $, double it */
      space_left -= 2;
    } else
      space_left--;
  }
}
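One thing to watch in the backspace branch above once multi-byte input gets through: it steps back one byte (or two for a doubled $), so if a client sends backspace characters inline, deleting an accented letter would leave half of a UTF-8 sequence in the buffer. Below is a minimal sketch of stepping back over a whole UTF-8 character instead. The helper utf8_backspace is hypothetical, it is not in tbaMUD, and it does not handle the $$ case, which would still need the stock check.

```c
#include <stddef.h>

/* Hypothetical helper: move *write_point back over one UTF-8 character.
 * Continuation bytes look like 10xxxxxx, so keep stepping back until a
 * byte that is not a continuation byte (a lead byte or plain ASCII) has
 * been removed. Returns the number of bytes removed, 0 if the buffer is
 * already empty. */
static size_t utf8_backspace(const char *start, char **write_point)
{
    size_t removed = 0;
    char *p = *write_point;

    while (p > start) {
        unsigned char c = (unsigned char)*(--p);

        removed++;
        if ((c & 0xC0) != 0x80)
            break;
    }
    *write_point = p;
    return removed;
}
```

In the loop, the return value would be added back to space_left, so the accounting stays in step with the bytes actually removed.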
My server: Ubuntu Core 16.04 running on a NanoPi Neo (ARM CPU), stock GCC, stock tbaMUD 3.68 compiled and started with autorun.sh. All Linux locales are set to en_EN.utf8 or hu_HU.utf8.
Test systems: on the server itself with stock telnet, UTF-8 works. From a Linux desktop with the latest Ubuntu, in a terminal with stock telnet (system locale set to us_EN.utf8), Unicode was sent and received properly. On OS X 10.x with tt++, UTF-8 works; on Windows 7 with tt++ and PuTTY, Unicode works.