problem with patch: proc_color in comm.c file

5 years 7 months ago #1705 by felipevr
Hi there, how are you doing?

I'm starting to try some things as a coder with tbaMUD, but I've run into a big problem.

I tried to apply a patch for Latin accented characters, but after some work I needed to find
proc_color in the comm.c file. The problem: in my tbaMUD 3.64 it doesn't exist!
So I ask: what can I do now? How can I solve this problem and use accented characters?

The patch is here:
dl.dropbox.com/u/61348426/acentos.rar

Thank you!
Felipe


5 years 7 months ago #1722 by Vatiken
proc_color() is outdated and has been replaced by ProtocolOutput() found in the protocol.c file.

/******************************************************************************
How do I use unicode characters?
******************************************************************************/

Unicode characters can be displayed in a similar way to colour, using square
brackets to provide both a unicode value and an ASCII substitute. For example:

\t[U9814/Rook]

The above will draw a rook (the chess piece - unicode value 9814) if the client
supports UTF-8, otherwise it'll display the text "Rook".

As with extended colour, support for UTF-8 is detected automatically - in this
case using the CHARSET telnet option. However it's not possible to detect if
their font includes that particular character, or even if they're actually
using a unicode font at all, so some care will need to be taken.

A free unicode font that I've found good is Fixedsys Excelsior, which you can
download from here: www.fixedsysexcelsior.com/

Also of interest: en.wikipedia.org/wiki/List_of_Unicode_characters

Unicode includes all Latin letters, so tbaMUD can display Latin characters to anyone whose client has CHARSET enabled and uses a suitable font. Not sure if this helps or not; I'm a little green when it comes to adding Latin characters to my MUDs.
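
To make the "unicode value" in \t[U9814/Rook] concrete, here is a minimal sketch of how a decimal code point such as 9814 becomes UTF-8 bytes on the wire. This is not the code from protocol.c; the name utf8_encode is hypothetical and only illustrates the standard UTF-8 encoding rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of tbaMUD: encode one Unicode code point
 * as UTF-8. 'out' must have room for 4 bytes. Returns the number of bytes
 * written, or 0 if the code point is not encodable. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* plain ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

With this sketch, code point 9814 (the rook) encodes to the three bytes E2 99 96, and an accented Latin letter such as U+00E1 encodes to two bytes, which matters for the length problems discussed later in the thread.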

tbaMUD developer/programmer


5 years 7 months ago #1754 by felipevr
Thank you!
I got it.
I was able to solve the problem with another method, which I'm posting here:

index:comm.c
@@@15,3
 #include "conf.h"
 #include "sysdep.h"
+#include <locale.h>


 /* Begin conf.h dependent includes */
@@@749,5
   gettimeofday(&last_time, (struct timezone *) 0);

   /* The Main Loop. The Big Cheese. The Top Dog. The Head Honcho. The.. */
+  setlocale(LC_CTYPE, "hu_HU.ISO-8859-2"); /* modify this to your locale */
   while (!circle_shutdown) {

@@@1902,5
           } else
             space_left++;
         }
-      } else if (isascii(*ptr) && isprint(*ptr)) {
+      } else if (isprint((unsigned char) *ptr)) { /* cast: never pass a negative char to isprint() */
         if ((*(write_point++) = *ptr) == '$') { /* copy one character */
           *(write_point++) = '$'; /* if it's a $, double it */
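
A note on why the setlocale() call is part of this patch: in the default "C" locale, isprint() rejects every byte above 127, so dropping isascii() alone changes nothing. The following is only a sketch under that assumption; select_ctype_locale and is_printable_byte are hypothetical helper names, not tbaMUD functions.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select the character-classification locale.
 * setlocale() returns NULL when the requested locale is not available,
 * so the result is worth checking instead of being ignored. */
int select_ctype_locale(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Hypothetical helper: the patched test from the input loop.
 * The cast matters because plain char may be signed, and passing a
 * negative value (other than EOF) to isprint() is undefined behaviour. */
int is_printable_byte(char c)
{
    return isprint((unsigned char)c) != 0;
}
```

In the "C" locale, is_printable_byte('\xE9') is false, so accented ISO-8859 bytes are still dropped until a matching locale has been selected successfully. If select_ctype_locale() returns 0 for your locale name, the locale is probably not installed on the server.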

Thank you again!


11 months 3 weeks ago #7108 by sz4bo
Hi, sorry for resurrecting this thread. I am tinkering with Unicode as well, with the goal of getting accented Latin letters to work in commands and descriptions.

It is 2017, and as far as I can see everything is UTF-8 in the latest Linux distributions and in the Windows 8-10 command line as well. Unicode strings now go through from server to telnet client without issue, and if telnet fails, MUD clients such as TinTin++ can handle it too. I have a proof-of-concept Python MUD server that can send UTF-8 text and receive UTF-8 commands while connected to the standard Linux telnet client. This PoC is based on the open-source MiniBoa server. MiniBoa does not even implement CHARSET negotiation; it has the following options and nothing more:
BINARY = chr(0)   # Transmit Binary
ECHO   = chr(1)   # Echo characters back to sender
RECON  = chr(2)   # Reconnection
SGA    = chr(3)   # Suppress Go-Ahead
TTYPE  = chr(24)  # Terminal Type
NAWS   = chr(31)  # Negotiate About Window Size
LINEMO = chr(34)  # Line Mode
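
For comparison, this is roughly what the missing CHARSET step looks like at the byte level. It is a sketch based on the Telnet CHARSET option (RFC 2066, option code 42), not code taken from MiniBoa or tbaMUD; build_charset_request is a hypothetical name, and a real server would send this only after the WILL/DO CHARSET exchange has succeeded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Telnet command bytes and the CHARSET option (RFC 2066). */
enum {
    TELNET_IAC      = 255,  /* Interpret As Command */
    TELNET_SB       = 250,  /* start of subnegotiation */
    TELNET_SE       = 240,  /* end of subnegotiation */
    TELOPT_CHARSET  = 42,   /* CHARSET option code */
    CHARSET_REQUEST = 1     /* REQUEST subcommand */
};

/* Hypothetical helper: build "IAC SB CHARSET REQUEST ;<charset> IAC SE".
 * ';' is the separator chosen here; RFC 2066 lets the sender pick it.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t build_charset_request(unsigned char *buf, size_t size,
                             const char *charset)
{
    size_t len, n = 0;

    assert(buf != NULL && charset != NULL);
    len = strlen(charset);
    if (size < len + 7)                 /* 4 header + separator + 2 trailer */
        return 0;

    buf[n++] = TELNET_IAC;
    buf[n++] = TELNET_SB;
    buf[n++] = TELOPT_CHARSET;
    buf[n++] = CHARSET_REQUEST;
    buf[n++] = ';';
    memcpy(buf + n, charset, len);
    n += len;
    buf[n++] = TELNET_IAC;
    buf[n++] = TELNET_SE;
    return n;
}
```

Calling build_charset_request(buf, sizeof buf, "UTF-8") yields 12 bytes. A client that supports the option answers with an ACCEPTED or REJECTED subnegotiation; a client that ignores it simply keeps whatever encoding its terminal uses.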

I have not yet tried the solution given above, modifying tbaMUD's comm.c to remove the isascii() check, but I will try it soon.

What I have found is probably related to that isascii() check: it gags non-ASCII text from all input while playing the MUD or using OLC. This produces funny results:

500H 100M 82V > say Árvíztűrő
You say, 'rvztr'
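
The 'rvztr' result can be reproduced outside the MUD. Below is a small sketch that applies the same kind of test as the stock input loop (7-bit ASCII and printable) to the UTF-8 bytes of the word; stock_filter is a hypothetical stand-in, not the actual comm.c code, and it leaves out the backspace and '$' handling.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the stock filter: copy only bytes that are
 * 7-bit ASCII and printable. Every byte of a multi-byte UTF-8 sequence
 * has its high bit set, so accented letters vanish completely.
 * Returns the number of bytes kept; 'out' is always NUL-terminated. */
size_t stock_filter(const char *in, char *out, size_t out_size)
{
    size_t w = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    for (; *in != '\0' && w + 1 < out_size; in++) {
        unsigned char c = (unsigned char)*in;
        if (c < 0x80 && isprint(c))       /* same idea as isascii && isprint */
            out[w++] = (char)c;
    }
    out[w] = '\0';
    return w;
}
```

Feeding it the UTF-8 bytes of "Árvíztűrő" (written with hex escapes as "\xC3\x81rv\xC3\xADzt\xC5\xB1r\xC5\x91") leaves only the five unaccented letters, which matches what the MUD echoed.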

I have also found that when you edit the /lib/world/wld files directly under Linux with nano, you can enter Unicode text in room descriptions, and the descriptions display correctly in a Linux telnet client. Even if the CHARSET/UTF-8 negotiation fails, it works as long as the client terminal is set to a Unicode locale.

One idea would be to remove the sanity checks on strings after input, maybe throughout tbaMUD? That would unblock the flow of Unicode characters and solve this kind of problem. Maybe make it a configuration option?

Cheers!

PS: I know about the \t[U9814/Rook] method in tbaMUD, but it is not a good option for long Unicode text.

11 months 3 weeks ago - 11 months 3 weeks ago #7123 by sz4bo
Hi, I found the single point where a slight modification unblocks proper Unicode handling for the whole of tbaMUD, and CircleMUD as well!

My previous post was about the fact that tbaMUD can display UTF-8 room descriptions when the wld files are edited in a UTF-8 editor (such as nano in a Linux terminal). Although you cannot enter Unicode strings inside the MUD's OLC, because non-ASCII letters are gagged, stock tbaMUD was able to send and properly display Unicode text.

The C isascii() check accepts only 7-bit US-ASCII values, and isprint() in the default "C" locale accepts only printable ASCII, so everything else is gagged out: accented Latin UTF-8 characters, Greek, Chinese, and so on.

In comm.c this is the key part; this else-if gags non-US-ASCII characters:
      } else if (isascii(*ptr) && isprint(*ptr)) {
        if ((*(write_point++) = *ptr) == '$') {         /* copy one character */
          *(write_point++) = '$';       /* if it's a $, double it */
          space_left -= 2;
        } else
          space_left--;
      }
    }

Change this line:
} else if (isascii(*ptr) && isprint(*ptr)) {
to this:
} else if (*ptr == *ptr) {

That's it, nothing else needs to be touched.

OK, I know it is an ugly hack to change an if condition so that it is always true, but the original code is also an ugly hack that has been sitting in comm.c since CircleMUD 2.01 (released in 1993)! I know Unicode was not everywhere back then; UTF-8 had only just been designed for the Plan 9 OS at Bell Labs. Just check the old CircleMUD sources and search for isascii in comm.c if you don't believe me! ;-)

The code block is a workaround for an obscure issue where a $ in an incoming string causes problems on some Unix systems, so it changes $ to $$ to prevent them. The bad news is that it has blocked UTF-8 capability in CircleMUD/tbaMUD for as long as they have existed! :-(

What would be good is to implement a Unicode equivalent of isascii() and isprint() for this loop, probably iswascii and iswprint or something similar, because my fix removes the filter between user input and the MUD, so it becomes possible to enter non-printable or obscure characters as well. In the worst case, some weird invisible non-alphanumeric characters may cause issues. Anyway, this fix works now: you can implement truly international commands and use Hindi or Cyrillic letters for mobs, rooms, and triggers. It works!
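
As a middle ground between the stock ASCII-only test and the always-true condition, one option is to let through printable ASCII plus well-formed UTF-8 sequences and keep dropping control characters and malformed bytes. The following is only a sketch of that idea, not a tested tbaMUD patch: utf8_seq_len and filter_utf8_input are hypothetical names, and the backspace and '$' doubling logic of the real loop is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the well-formed UTF-8 sequence at s (avail bytes readable),
 * or 0 if the bytes do not start a valid sequence. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t len, i;

    if (avail == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF)
        len = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF)
        len = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4)
        len = 4;
    else
        return 0;               /* stray continuation or invalid lead byte */
    if (avail < len)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* continuation bytes must be 10xxxxxx */
    if ((s[0] == 0xE0 && s[1] < 0xA0) ||    /* overlong 3-byte form */
        (s[0] == 0xED && s[1] > 0x9F) ||    /* UTF-16 surrogates */
        (s[0] == 0xF0 && s[1] < 0x90) ||    /* overlong 4-byte form */
        (s[0] == 0xF4 && s[1] > 0x8F))      /* above U+10FFFF */
        return 0;
    return len;
}

/* Copy printable ASCII and well-formed UTF-8 from in to out; drop ASCII
 * and C1 control characters and malformed bytes. Returns bytes written,
 * not counting the terminating NUL. */
size_t filter_utf8_input(const char *in, size_t in_len,
                         char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t i = 0, w = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    while (i < in_len) {
        size_t len = utf8_seq_len(p + i, in_len - i);
        int is_ctrl = (len == 1 && (p[i] < 0x20 || p[i] == 0x7F)) ||
                      (len == 2 && p[i] == 0xC2 && p[i + 1] < 0xA0);

        if (len == 0 || is_ctrl) {
            i += (len == 0) ? 1 : len;  /* skip the offending byte(s) */
            continue;
        }
        if (w + len >= out_size)
            break;                      /* never split a sequence */
        memcpy(out + w, p + i, len);
        w += len;
        i += len;
    }
    out[w] = '\0';
    return w;
}
```

This works on bytes rather than wchar_t, so it does not depend on the server's locale. It still cannot tell whether a valid code point is invisible or otherwise troublesome, so it narrows the risk described above without removing it.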

Here is the full block of the $ issue in the main loop from the stock code:
/* The '> 1' reserves room for a '$ => $$' expansion. */
    for (ptr = read_point; (space_left > 1) && (ptr < nl_pos); ptr++) {
      if (*ptr == '\b' || *ptr == 127) { /* handle backspacing or delete key */
        if (write_point > tmp) {
          if (*(--write_point) == '$') {
            write_point--;
            space_left += 2;
          } else
            space_left++;
        }
      } else if (isascii(*ptr) && isprint(*ptr)) {
        if ((*(write_point++) = *ptr) == '$') {         /* copy one character */
          *(write_point++) = '$';       /* if it's a $, double it */
          space_left -= 2;
        } else
          space_left--;
      }
    }

My server: Ubuntu Core 16.04 running on a NanoPi Neo (ARM CPU), stock GCC, stock tbaMUD 3.68 compiled and started with autorun.sh. All Linux locales are set to en_EN.utf8 or hu_HU.utf8.
Test systems: on the server itself with stock telnet, UTF-8 works. From a Linux desktop running the latest Ubuntu, in a terminal with stock telnet (system locale set to us_EN.utf8), Unicode was sent and received properly. On OS X 10.x with tt++, UTF-8 works; on Windows 7 with tt++ and PuTTY, Unicode works.

Last edit: 11 months 3 weeks ago by sz4bo.


11 months 3 weeks ago - 11 months 3 weeks ago #7125 by sz4bo
Hi again. The method seemingly works, though my modification definitely causes glitches in tbaMUD at runtime. The glitches are caused by multi-byte characters in strings, such as accented letters that are two bytes long. Code that expects ASCII strings, where one letter equals one byte, can yield unwanted results. It all boils down to a string's byte length differing from its character count when it holds Unicode rather than ASCII, and to how the MUD code handles that difference.

wiki.sei.cmu.edu/confluence/display/c/ST...trings+and+functions

Yep.

The full code needs to get rid of strncpy, stpcpy, strdup, isascii, isalnum, islower and the like; they are C functions for single-byte text, and they need to be replaced with their wide-character (wcs) cousins such as wcscpy, wcsncpy, wcsdup, iswalnum, etc. The wchar.h and wctype.h headers need to be included as well. Then tbaMUD will be able to deliver Unicode 100% safely.
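
The length mismatch itself is easy to show without converting anything to wide characters. Here is a small sketch that counts characters in a UTF-8 string by skipping continuation bytes; utf8_count and utf8_extra_bytes are hypothetical helpers, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a valid UTF-8 string.
 * Continuation bytes look like 10xxxxxx; every other byte starts a
 * new character. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: how many bytes strlen() reports beyond the number
 * of characters. Column-padding code that relies on strlen() is off by
 * exactly this amount. */
size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_count(s);
}
```

For "Árvíztűrő" in UTF-8, strlen() reports 13 while utf8_count() reports 9, so any code that pads or truncates by byte count will misalign or cut a character in half. Whether to fix such spots one by one or move the whole codebase to wchar_t is the design question raised above; this sketch only makes the difference measurable.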

What are your thoughts?

Cheers!
Last edit: 11 months 3 weeks ago by sz4bo.
