SYSERR: Write to socket: Connection reset by peer

5 years 8 months ago - 5 years 8 months ago #7386 by JTP
Hmm

Yesterday I logged on and the MUD had restarted, and the same thing happened again today.

The only thing the syslog/crash file says, twice, is:

SYSERR: Write to socket: Connection reset by peer


Does anyone else get this, and know how to fix it?
Last edit: 5 years 8 months ago by JTP.

5 years 8 months ago - 5 years 8 months ago #7387 by JTP
These entries were logged 6 seconds before the MUD started up again.


The crash file shows:
Jan 17 04:10:20 :: No connections. Going to sleep.
Jan 17 04:10:20 :: New connection. Waking up.
SYSERR: gethostbyaddr: No such file or directory
Jan 17 04:10:27 :: WARNING: EOF on socket read (connection broken by peer)
Jan 17 04:10:27 :: Losing descriptor without char.
Jan 17 04:10:27 :: No connections. Going to sleep.
Jan 17 04:10:27 :: New connection. Waking up.
Jan 17 04:10:34 :: WARNING: EOF on socket read (connection broken by peer)
Jan 17 04:10:34 :: Losing descriptor without char.
Jan 17 04:10:34 :: No connections. Going to sleep.
Jan 17 04:10:35 :: New connection. Waking up.
SYSERR: gethostbyaddr: No such file or directory
SYSERR: Write to socket: Connection reset by peer
Jan 17 04:10:35 :: Losing descriptor without char.
Jan 17 04:10:35 :: Losing descriptor without char.


Notice the two "Losing descriptor" entries in exactly the same second.
Last edit: 5 years 8 months ago by JTP.

5 years 8 months ago #7395 by cunning
That usually means a person tried to connect and then broke their connection to the game.

5 years 8 months ago #7396 by JTP
Might be, but apparently it crashes the MUD.

It has happened twice now over the last 2 days.


6 seconds after the last log entry, the MUD started again.

5 years 8 months ago #7397 by WhiskyTest
Is it the lookup table bug that Thomas fixed in another thread?

5 years 8 months ago #7398 by JTP
That thread looks like it has something to do with dg_scripts?

5 years 8 months ago #7399 by cunning
It is not the bug Thomas fixed. That one is preceded by "add_to_lookup failed. Already there. (uid = 15398)". The question now is whether the bug can be triggered.

Without any GDB traceback and info local, this is all speculation.

5 years 8 months ago #7400 by JTP
But is it fixed? If it's in the stock code, we should all get this issue every now and then.

5 years 8 months ago #7401 by cunning
You are confusing the issue here. You need to share some GDB output: a backtrace and info local. Right now nobody has any idea what is causing your crash. My bug in the stock trigger system has been there for quite a long time.

5 years 8 months ago #7403 by JTP
It happened twice in two days, and both backtraces show this:

Program terminated with signal 6, Aborted.
#0 0x008a6402 in __kernel_vsyscall ()
(gdb) Hangup detected on fd 0
Error detected on fd 0
error detected on stdin

5 years 8 months ago #7407 by cunning
This tells me nothing. I suspect you made code changes and recompiled. I would suggest you run the game under GDB and capture the crashes.

You can do this via "gdb bin/circle", then once inside type run <port#>.

Then, if you have a crash, you can capture the backtrace, info local, and any other details.

5 years 8 months ago #7409 by JTP
I didn't make any changes.

I'm pretty sure this is a stock problem that most of you would experience from time to time.


My problem is that GDB won't run for very long; then it runs out of memory.

5 years 8 months ago #7410 by Treblin
I have the same issue on my test port, which is running the stock 2018 release with the clan patch as the only addition. I let the MUD run idle and noticed it had crashed from the same issue. I will try to get it up and running under gdb this weekend to see if I can reproduce it, although it has now been running for nearly 3 days without crashing, so results may take time.

5 years 8 months ago #7411 by Treblin
Completely forgot I had it lying around, here is the gdb output.
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Core was generated by `bin/circle -q 6000'.
Program terminated with signal 11, Segmentation fault.
#0  0x003223d0 in _int_free () from /lib/libc.so.6
(gdb) bt
#0  0x003223d0 in _int_free () from /lib/libc.so.6
#1  0x00326a89 in free () from /lib/libc.so.6
#2  0x0809d541 in close_socket (d=0x938cf70) at comm.c:2094
#3  0x0809f8f6 in game_loop (local_mother_desc=3) at comm.c:905
#4  0x080a0a77 in init_game (argc=Cannot access memory at address 0x1
) at comm.c:532
#5  main (argc=Cannot access memory at address 0x1
) at comm.c:352
(gdb) list
193     #endif
194
195       t->tv_sec = (int) (millisec / 1000);
196       t->tv_usec = (millisec % 1000) * 1000;
197     }
198
199     #endif  /* CIRCLE_WINDOWS || CIRCLE_MACINTOSH */
200
201     int main(int argc, char **argv)
202     {
(gdb) info local
No symbol table info available.
(gdb) up
#1  0x00326a89 in free () from /lib/libc.so.6
(gdb) list
203       int pos = 1;
204       const char *dir;
205
206     #ifdef MEMORY_DEBUG
207       zmalloc_init();
208     #endif
209
210     #if CIRCLE_GNU_LIBC_MEMORY_TRACK
211       mtrace();     /* This must come before any use of malloc(). */
212     #endif
(gdb) info local
No symbol table info available.
(gdb) up
#2  0x0809d541 in close_socket (d=0x938cf70) at comm.c:2094
2094            free(d->history[cnt]);
(gdb) list
2089      /* Clear the command history. */
2090      if (d->history) {
2091        int cnt;
2092        for (cnt = 0; cnt < HISTORY_SIZE; cnt++)
2093          if (d->history[cnt])
2094            free(d->history[cnt]);
2095        free(d->history);
2096      }
2097
2098      if (d->showstr_head)
(gdb) info local
temp = <value optimized out>
(gdb) up
#3  0x0809f8f6 in game_loop (local_mother_desc=3) at comm.c:905
905               close_socket(d);
(gdb) list
900         for (d = descriptor_list; d; d = next_d) {
901           next_d = d->next;
902           if (*(d->output) && FD_ISSET(d->descriptor, &output_set)) {
903             /* Output for this player is ready */
904             if (process_output(d) < 0)
905               close_socket(d);
906             else
907               d->has_prompt = 1;
908           }
909         }
(gdb) info local
input_set = {__fds_bits = {16, 0 <repeats 31 times>}}
output_set = {__fds_bits = {16, 0 <repeats 31 times>}}
exc_set = {__fds_bits = {0 <repeats 32 times>}}
null_set = {__fds_bits = {0 <repeats 32 times>}}
last_time = {tv_sec = 1516187435, tv_usec = 770570}
opt_time = {tv_sec = 0, tv_usec = 100000}
process_time = {tv_sec = 0, tv_usec = 800}
temp_time = {tv_sec = 0, tv_usec = 99200}
before_sleep = {tv_sec = 1516187435, tv_usec = 671370}
now = {tv_sec = 1516187435, tv_usec = 771302}
timeout = {tv_sec = 0, tv_usec = 0}
comm = "GET http://www.boxun.com/ HTTP/1.1\000 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)\000\000\334_+\000\370\242)\000\060j2\000\000\000\000\000\374\245)\000\230\242)\000\000\000\000\000\000\070\366\267\002\000+\000\021ii\r\000\000\000\000\000\000\000\000\061\350*\000\000\000\000\000h_+\000\006\000\000\000\f\000\000\000\370\301,\000\230b+\000\374\245)\000\230\242)\000\214&\366\267\330އ\277e\351*\000\b\000\000\000\210\000\000\000h\244)\000\360އ\277\030\352*\000\210\000\000\000\300_+\000\001\000\000\000 \337\207\277<\337\207\277\066\252*\000\000\000\000\000\b\000\000\000p"...
d = <value optimized out>
next_d = <value optimized out>
missed_pulses = <value optimized out>
maxdesc = <value optimized out>
aliased = 0
(gdb) up
#4  0x080a0a77 in init_game (argc=Cannot access memory at address 0x1
) at comm.c:532
532       game_loop(mother_desc);
(gdb) list
527       if (fCopyOver) /* reload players */
528       copyover_recover();
529
530       log("Entering game loop.");
531
532       game_loop(mother_desc);
533
534       Crash_save_all();
535
536       log("Closing all sockets.");
(gdb) info local
No locals.
(gdb) up
#5  main (argc=Cannot access memory at address 0x1
) at comm.c:352
352         init_game(port);
(gdb) list
347
348       if (scheck)
349         boot_world();
350       else {
351         log("Running game on port %d.", port);
352         init_game(port);
353       }
354
355       log("Clearing game world.");
356       destroy_db();
(gdb) info local
pos = <value optimized out>
dir = 0x8480078 "lib"

5 years 8 months ago #7412 by thomas
OK, so this is a memory issue again.

Either a free() on previously free'd memory or a free on a stray pointer. I'll have a look.
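
To illustrate the first possibility (a free() on memory that was already freed), here is a minimal sketch of a cleanup helper that clears each pointer as it releases it, so a repeated call finds NULL instead of a dangling pointer. This is not tbaMUD's code: `struct desc_sketch`, `init_history` and `free_history` are hypothetical names, and `HISTORY_SIZE` is set to 5 only as an assumption for the example.

```c
#include <stdlib.h>
#include <string.h>

#define HISTORY_SIZE 5  /* assumed value for this sketch only */

/* Hypothetical stand-in for the descriptor; only the field discussed here. */
struct desc_sketch {
  char **history;
};

/* Allocate a zeroed history table. Returns 0 on success, -1 on failure. */
int init_history(struct desc_sketch *d)
{
  d->history = calloc(HISTORY_SIZE, sizeof(char *));
  return d->history ? 0 : -1;
}

/* Free every history slot and clear each pointer as it is released, so a
 * second call sees NULL pointers instead of freeing the same block twice. */
void free_history(struct desc_sketch *d)
{
  int cnt;

  if (!d || !d->history)
    return;

  for (cnt = 0; cnt < HISTORY_SIZE; cnt++) {
    free(d->history[cnt]);   /* free(NULL) is a no-op */
    d->history[cnt] = NULL;
  }
  free(d->history);
  d->history = NULL;
}
```

A pattern like this only guards against freeing the same slots twice through the same descriptor. It would not help if some other code had overwritten the heap or the pointers themselves.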

5 years 8 months ago #7425 by thomas
Ok, so I've tried replicating this on stock TBA. It just works, no issues, even with multiple simultaneous users timing out.

I've looked through the code, and it stands up to static analysis. So, do you have the exact same file in comm.c as this: github.com/tbamud/tbamud/blob/master/src/comm.c ?

Do you have code touching d->history in places other than comm.c (you shouldn't)?

5 years 8 months ago #7426 by Treblin
A diff of this comm.c and mine comes back identical. As stated, when I received the error it was on a completely stock 2018.1 version with only the clans patch added. It has also happened only the one time.

5 years 8 months ago #7427 by thomas
Yeah, I was afraid of that.

It's actually most likely a buffer overrun somewhere else; as I said, the code stands up to static analysis.
That takes some more time to track down. Would you mind running these commands in gdb?
up
up
(you should be in frame #2 now)
print *d
print d->history[0]
print d->history[1]
print d->history[2]
print d->history[3]
print d->history[4]
print d->output

5 years 8 months ago #7429 by cunning
Guys, I just went through a year's worth of cores I had, and it was not easy. I also saw this crash three times last year. I suspected a buffer overflow as well. I have yet to dig into that one because of my normal everyday job. Now that I see it's more of an issue, I will dig into this a little more.


I had that same issue with aliases. Once an alias was over 256 characters, it corrupted the entire ASCII chain of aliases. I had to go back to the get_line() limit of 256, and noticed that we could handle 512 characters in the buffers we had locally.
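
The kind of overflow described here (a string longer than the buffer it is copied into) can be avoided with a length-checked copy. Below is a minimal sketch, assuming a 256-byte destination; `copy_bounded` and `ALIAS_BUF_LEN` are hypothetical names for illustration and are not part of the stock code.

```c
#include <stdio.h>
#include <string.h>

#define ALIAS_BUF_LEN 256  /* assumed limit, matching the 256 mentioned above */

/* Copy src into dst (capacity dst_len) without writing past the end.
 * Returns 0 if the whole string fit, or -1 if it was truncated or the
 * arguments were unusable. dst is always NUL-terminated when dst_len > 0. */
int copy_bounded(char *dst, size_t dst_len, const char *src)
{
  int needed;

  if (!dst || !src || dst_len == 0)
    return -1;

  /* snprintf returns the length the full string would have needed. */
  needed = snprintf(dst, dst_len, "%s", src);
  if (needed < 0 || (size_t)needed >= dst_len)
    return -1;

  return 0;
}
```

A caller can treat the -1 return as "input too long" and reject the alias rather than store a truncated or corrupting value.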

5 years 8 months ago - 5 years 8 months ago #7433 by Treblin
Here is the requested output:
(gdb) up
#1  0x00326a89 in free () from /lib/libc.so.6
(gdb) up
#2  0x0809d541 in close_socket (d=0x938cf70) at comm.c:2094
2094            free(d->history[cnt]);
(gdb) print *d
$1 = {descriptor = 154734240, host = "p!A", '\000' <repeats 37 times>,
  bad_pws = 0 '\000', idle_tics = 0 '\000', connected = 32, desc_num = 6,
  login_time = 1516187435, showstr_head = 0x0, showstr_vector = 0x0,
  showstr_count = 0, showstr_page = 0, str = 0x0, backstr = 0x0, max_str = 0,
  mail_to = 0, has_prompt = 0,
  inbuf = "\000ET http://www.boxun.com/ HTTP/1.1\r\nHost: www.boxun.com\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML,"...,
  last_input = "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)", '\000' <repeats 425 times>,
  small_outbuf = "Attempting to Detect Client, Please Wait...\r\n\377\375\030Collecting Protocol Information... Please Wait.\r\n", '\000' <repeats 926 times>,
  output = 0x93901d0 "Attempting to Detect Client, Please Wait...\r\n\377\375\030Collecting Protocol Information... Please Wait.\r\n", history = 0x8bde4b0,
  history_pos = 1, bufptr = 97, bufspace = 926, large_outbuf = 0x0, input = {
    head = 0x0, tail = 0x9390c88}, character = 0x0, original = 0x0,
  snooping = 0x0, snoop_by = 0x0, next = 0x0, olc = 0x0,
  pProtocol = 0x9390ea8, events = 0x9390a30}

(gdb) print d->history[0]
$2 = 0x9390b00 ".1"
(gdb) print d->history[1]
$3 = 0x9390b08 "(\v9\t: www.boxun.com"
(gdb) print d->history[2]
$4 = 0x9390b48 "p\v9\tection: keep-alive"
(gdb) print d->history[3]
$5 = 0x9390b98 "\310\v9\tpt-Encoding: gzip, deflate"
(gdb) print d->history[4]
$6 = 0x9390bf8 "\200\f9\tpt: */*"
(gdb) print d->output
$7 = 0x93901d0 "Attempting to Detect Client, Please Wait...\r\n\377\375\030Collecting Protocol Information... Please Wait.\r\n"
Last edit: 5 years 8 months ago by Treblin.
