[1.199.025] EOS Extended Queue Overflow

Slushtrap Gamer shared this bug 2 years ago
Need More Information

When playing on EOS servers, the game sometimes automatically disconnects players, giving the reason "Extended Queue Overflow". The cause is still unknown, but one possibility is that a sufficiently loaded server disconnects players to save on performance(?). By contrast, I have not seen this happen on any Steam servers.

Replies (5)

photo
1

Hello, Slushtrap Gamer!

Sorry to hear you're experiencing this issue. Is this happening on a specific server? Are the servers with mods?

Could you please supply a log from when this happens?

You can access your log files by typing %appdata% into your Windows search bar and you will be redirected to the hidden Roaming folder. After that just follow: \Roaming\SpaceEngineers.
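For anyone who would rather locate the logs from a script, here is a minimal Python sketch of the path described above. It assumes the log files end in `.log`, which the thread does not state; the function names are made up for illustration.

```python
import os
from pathlib import Path


def space_engineers_dir(appdata=None):
    """Return the SpaceEngineers folder inside the Roaming directory."""
    # On Windows, %appdata% already points at ...\AppData\Roaming.
    base = appdata if appdata is not None else os.environ.get("APPDATA", "")
    return Path(base) / "SpaceEngineers"


def newest_logs(folder, limit=5):
    """List the most recently modified *.log files, newest first.

    The ".log" extension is an assumption, not something confirmed here.
    """
    logs = sorted(
        Path(folder).glob("*.log"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return logs[:limit]
```

Calling `newest_logs(space_engineers_dir())` on the affected machine should list the files worth attaching, assuming the folder exists.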

It would also be useful to see your Windows event log, if you are able to supply it.

Kind Regards

Laura, QA Department

photo
1

So far I have only experienced this on one particular server, but then that's the only EOS server I've ever played on, and the server is completely vanilla. I have the server's log, which shows multiple instances of this bug affecting several players. However, I couldn't find any logs on my own computer that showed the bug.

photo
1

Hello, Slushtrap Gamer!

Thank you for the log. Can you please provide a save file of the server? Also, do you have any reproduction steps before the disconnection of the players? Any information on how to get this to happen would be a huge help.

  • You can access your save files by typing %appdata% into your Windows search bar and you will be redirected to the hidden Roaming folder. After that just follow: \Roaming\SpaceEngineers\Saves. There should be a folder with your SteamID and your saves.
  • Please zip the file and attach it here. If you are having difficulty attaching files you can optionally use Google Drive. When sharing a google drive link please make sure it is set to be downloadable by anyone with the link.
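The zipping step can also be scripted. The Python sketch below is only an illustration: `zip_save` is a hypothetical helper, and the save folder path is whatever your own world folder under `Saves` happens to be.

```python
import shutil
from pathlib import Path


def zip_save(save_dir, out_dir):
    """Zip one save folder and return the path of the resulting archive."""
    save_dir = Path(save_dir).resolve()
    archive_base = Path(out_dir).resolve() / save_dir.name
    # make_archive appends ".zip" itself. Using base_dir keeps the world
    # folder name as the top-level entry inside the archive.
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(save_dir.parent),
        base_dir=save_dir.name,
    )
```

The returned path is the file to attach here or upload to Google Drive.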

Kind Regards

Laura, QA Department

photo
1

Here's the world from the same server, as well as a new log file from a day when the bug manifested itself. While the bug is "relatively" uncommon, it is unfortunately random and I cannot provide any concrete way of reproducing it. However, I have noticed that the bug starts to manifest itself when the server has been running for a long time, say 16-20+ hours, and is strained hard enough, for example by performance-heavy grids.

photo
1

Having experienced the bug multiple times, I'm almost certain it has to do with the server having been running for a while.

photo
1

You might be able to get around the problem by making the server restart itself every 12 hours.

photo
1

It does potentially reduce it, but it doesn't entirely eliminate the problem. There was an instance where the server had been running for only 1-3 hours and a queue overflow still occurred. Also, every time the bug occurs it is followed by a message stating: "P2P packet not sent". Reason still unclear.

photo
1

Hello, Slushtrap Gamer!

Thank you for the save and log. Do you have a copy of any of the logs from any of the players at all? Do you know if they experience this with any other server? Are they running Windows 10 or 7? I ask because another issue I'm dealing with on here occurs on Windows 7 only. It is not actually related, but I just wanted to check if there were any similarities at all :) I could also really do with some reproduction steps, if you do happen to notice that anything in particular prompts the issue rather than just the time the server has spent running :)

Kind Regards

Laura, QA Department

photo
1

I would love to provide steps, but there seems to be no recurring theme or reason as to why the bug manifests itself; the only common factors are time, heavy performance load, or a combination of both. Our own server runs Windows 10. As for whether this happens on other servers, unfortunately that's only shown in the server logs, which I obviously can't provide. It's also difficult to say, from a player's perspective, whether they got disconnected by a queue overflow or the server is simply freezing, since the bug leaves no traces, as far as I know, in our personal logs.

photo
1

Hello, Slushtrap Gamer!

I appreciate the further information. Sadly, we have not received any other reports of this. If you do realise that this is happening due to a certain prompt, please let me know as I will require some steps to pass on the issue or it can make these things really tricky to find and fix which I hope you can appreciate :)

Kind Regards

Laura, QA Department

photo
1

I'll spend the next few days trying to work out a repeatable method of triggering the bug.

photo
1

For starters, I have found that running CPU-intensive operations on the machine while the server is also running gives a much higher chance of triggering this bug. For example, while downloading and installing some large files, the bug manifested itself far more often than ever before, occurring several times within the span of a minute, even though the server was not at all loaded and had only been running for an hour at most. I can only speculate, but given the name of the log entry, it likely has to do with the game queueing packets but not being able to dispatch them fast enough, thus causing an overflow which somehow results in the players being disconnected.
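The speculation above can be illustrated with a toy model in Python. This is not the game's or EOS's actual code; the function name, the per-tick rates and the queue limit are all invented for illustration. It only shows that a bounded queue overflows sooner when the sender gets less time to drain it.

```python
def simulate_queue(produced_per_tick, sent_per_tick, max_queued, ticks):
    """Return the first tick at which the backlog exceeds max_queued.

    Returns None if the queue never overflows within the given ticks.
    All numbers are hypothetical "messages per tick", not measured values.
    """
    queued = 0
    for tick in range(1, ticks + 1):
        queued += produced_per_tick          # server enqueues new messages
        queued -= min(queued, sent_per_tick)  # network layer drains some
        if queued > max_queued:
            return tick
    return None
```

With a limit of 100 messages, `simulate_queue(12, 10, 100, 1000)` overflows at tick 51, `simulate_queue(12, 4, 100, 1000)` (the sender starved of CPU time) overflows at tick 13, and `simulate_queue(10, 10, 100, 1000)` never overflows. That would be consistent with both the long-uptime and the CPU-load observations, if the real queue behaves anything like this.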

photo
1

Hello, Slushtrap Gamer!

Thank you for saying you will do some testing on this.

Thank you also for mentioning the CPU-intensive operations. Do you think this could be the source of the issue, or is it just not helping the situation? Do you think it could be narrowed down to just when you are performing these tasks? Have you had this happen while not doing this?

Kind Regards

Laura, QA Department

photo
1

I don't think it's entirely down to CPU-intensive operations, as we had the bug occur before without such operations going on at the time, or at least I lean heavily in that direction. They do, however, seem to heavily influence the rate at which this issue is encountered, and that would certainly make it easier to create a reproducible method.

photo
1

A very strong lead has been discovered: scripts. Certain scripts are known to be inappropriate for use in multiplayer; I have heard claims that this is because they overload the bandwidth with the amount of information being transferred(?). One such example is Rdav's Fleet Command Mk.2, see https://steamcommunity.com/sharedfiles/filedetails/?id=1162841676. By enabling and properly utilising the LCD UI on a dedicated server of the EOS type, I managed to trigger the bug, and although I'm not 100% certain, it's the first time the bug has been triggered intentionally.

photo
1

Hello, Slushtrap Gamer!

Thank you for the information regarding the linked blueprint. I have not used this before and have set it up how I believe it should be set up (but I may be incorrect). I have not managed to make the bug happen after a few tries. Do you happen to have a save with this that I can test? I appreciate that the bug is triggered by it, so you may need to leave the last step for me, so that you can save the world in a state where I can do that last part and test it. Any help would be excellent with this and much appreciated :)

Kind Regards

Laura, QA Department

photo
1

Alright, let's try to put this to rest once and for all. I've provided a zip with the proper world in it. This world has the command module set up and running, and a large number of drones to make sure the script is properly set up and stressed. According to the experiment I've run, you have to "use" the script a little bit to trigger the bug, as it doesn't happen instantly. To do this you simply get in the chair and control the remote; if you move the mouse in this state, the bug should start manifesting itself, provided the world is run on a dedicated server.

photo
1

Hello, Slushtrap Gamer!

Thank you for this. So far I have experienced the issue but weirdly not by your steps. I have had a few instances when loading the world, it will immediately disconnect me and I can see the queue overflow. I have also had the same result but by trying to open the admin menu if it is showing 'streaming' just after I have loaded in. I was reliably able to get it to do this but now I have not had it happen for the last 5 times I have loaded the game. I will keep testing this :)

Kind Regards

Laura, QA Department

photo
1

While my server isn't totally vanilla, I haven't had issues with this connection error until recently. Just gonna attach the log file below to see if anything useful can be found :). As it stands though none of my friends across various platforms can get into the server, and restarting the server and the client doesn't appear to be doing anything either. The game will load exactly once for a client, then the server boots that player, and no other players will even get as far as rendering the world again.

photo
1

Hello, donaldoherty4!

Thank you for the log and input. You say this only started happening recently, has anything changed at all? PC? Mods? Internet? Just trying to establish if anything could have caused the sudden change. If you were to disable any mods/scripts are people able to enter at all? It's interesting that you say it will load in once, I was getting a similar situation with the above save file where I loaded in and immediately it disconnected me. I could then connect without issue after however.

Would it be possible to have a copy of your save file at all?

Kind Regards

Laura, QA Department

photo
1

Hi Laura, yup, there have been no changes in the mods loaded, hardware, or networking. I haven't tried disabling mods, but we can't use scripts on my server because the other players are on console, so to the best of my knowledge there shouldn't be any scripts in the mods.

As you describe with load fail once then works, that was the case for me before, but now it disconnects players before they make it past the loading screen with the Extended Queue Overflow error.

I've attached the most recent backup of my server below :)

photo
2

Hello, donaldoherty4!

Thank you for the further information and the save file. So far, I have successfully loaded in without issue every time interestingly. I have not had a single instance of it disconnecting! I will keep trying this, however, and see if I can get the issue to arise :)

Kind Regards

Laura, QA Department

photo
1

Hmmm, could it be that my server is no longer powerful enough? It's a VM running on Proxmox with 16GB of RAM and 6 cores provided to the system. The underlying hardware is a dual Xeon X5675 system with 96GB of RAM. I don't have many other VMs though, so it's not as if the server is taxed as is.

photo
1

The crux of this story is in the name of the bug itself: what sort of queue is overflowing, and how is that affecting the server? Knowing this could help narrow down which part of the system the bug manifests in, e.g. a queue of network packets, or maybe a queue of tasks.

photo
1

In that case, could PCAPs possibly help, to see the network traffic going to and from the server?

photo
1

Hello, Engineers!

I've tried connecting multiple times with donaldoherty4's save file and still haven't triggered it. I've had the file running too to see if that would do anything but strangely I cannot seem to get it to trigger any disconnection sadly.

Kind Regards

Laura, QA Department

photo
1

Hmmm, I'll allocate more processors to the server and see if that fixes my issue so, though I know frequency is king.

photo
1

Hello, donaldoherty4!

Please let me know how you get on and if anything changes with this :)

Kind Regards

Laura, QA Department

photo
1

Yup, giving the virtual machine 2 extra cores restored my access. The queue overflowed the first time, then I was able to re-join twice in succession. Console players, who had an even harder time loading in, are now able to access the server too. The first-kick issue persists even when giving the VM 10 cores to work with.

That said, I'll provide any additional information I can, be it a PCAP or other network analysis, where possible and if needed.

photo
1

Hello, donaldoherty4!

Thank you so much for checking and appreciate the quick response. It would be great to see if this is at all possible for Slushtrap Gamer too. I will speak internally and see if we require any additional information and update the thread :)

Kind Regards

Laura, QA Department

photo
1

I don't think I can "change" how many cores are allocated, because we don't run a virtual machine here. But from the replies it would seem to be an overflow in processing, so more of a CPU issue than a networking one. This would also coincide with my earlier observation that a CPU-intensive process, such as installing large programs and/or downloading large files, could significantly increase the chances of the bug manifesting. While we never had the issue that doherty describes, running multiple instances of heavy scripts has at one point very nearly caused the overflow, regardless of any outside factors.

photo
1

Just gonna add that the server has since locked up again with the same symptoms as before. No changes in the server aside from a few lights and a POC ship.

photo
1

Hello, donaldoherty4!

Did this happen directly after making the lights/POC ship changes? Or, did it happen randomly?

Kind Regards

Laura, QA Department

photo
1

It happened when we were next trying to log in. I culled as many unowned grids as I could, and that improved availability.

Curiously one player can access without fail from xbox series x, while the other player on the same platform is most frequently affected by it. I myself on pc still have frequent issues trying to connect. No amount of vm restarts, increasing ram or cpu cores is helping anymore.

photo
2

I wonder how big and complex the world you're trying to load is, and whether that has any impact on the bug manifesting; for instance, you could try running the same world with the same settings etc. but without anything in it. Also note whether there are any particularly heavy scripts lying around; those are one of the more common factors too.

photo
1

I've attached the most recent log and backup here anyhow. I wouldn't have thought it was massively complex, but it must be pushing limits somewhere. Maybe the removal of one of the planets is in order?

As for scripts, the server is run with console compatibility, so it would make sense if both console players could connect 100% of the time. However, only 1 can, and they're both using Xbox Series X, so realistically there shouldn't be a hardware bottleneck on their end. This also rules out scripting as the issue (on my end, though it can definitely be a contributing factor, as you've already tested), as scripts cannot run on console (to the best of my knowledge).

The interesting thing I've observed in task manager though is none of the individual threads, even under loading in conditions seem to go above ~75% utilization, so it's not as if there isn't headroom for the process?

photo
1

Hello, donaldoherty4!

Thank you for this. Have you had a chance to run the server with the mods disabled? I have just been running it without and cannot get anything questionable to arise. I'd like to know if you have any issues without? If you don't, I wonder if you could add them in one by one to see if anything changes here?

Kind Regards

Laura, QA Department

photo
1

Only thing left to try I guess, I'll let ye know when I've had a chance!

photo
1

Can confirm that removing the mods made no difference in my case, extended queue overflow still occurs

photo
1

Also tried running the server from my own PC with a Ryzen 3800x and 16GB of DDR4, issue persisted from there too. Interestingly, disabling console compatibility briefly gave my pc access, but then upon leaving voluntarily and reconnecting the extended queue error appeared again.

photo
2

That is most peculiar... but I believe I may have found something here. Looking at your logs, it would almost appear as if EOS is trying to process too much information and is unable to send it through the network fast enough, so its queue fills up and overflows, thus denying you access to the server. However, that would mean this bug is more on the network end of things than the CPU end, unlike what I thought earlier. This also raises the question of why worlds would get an overflow if run for a long time, or why it happens during CPU-intensive operations. In the first case, imo, it seems that the network queue isn't emptied properly, so some parts are retained until it is full enough to overflow. The second case may be explained like this: since the CPU processes network packets like most things, or so I think, running CPU-intensive operations takes away from the processor's ability to handle the network packets, thus again causing an overflow.

photo
1

What's more interesting is that, like Laura, I don't seem to have any trouble running the world; there is no overflow and it doesn't kick me out. However, I'm also running the server entirely locally, i.e. on the same machine that I play on, so perhaps the real issue lies in whatever medium your server uses to transfer data between client and server.

photo
1

Are you running the dedicated server locally and then connecting to it through EOS? Probably a silly question, but I just want to be sure lol. Anywho, I'm attaching a PCAP from the server. It's mostly encrypted. I also tested on another server in the house with an EPYC 7551 processor in it that is not behind my personal firewall, to see if that made any difference. The first connection works; any subsequent connections fail with the overflow.

photo
1

Yeah, I just run the inbuilt dedicated server with the world loaded in, proper configs etc., and then just connect through EOS, with the aforementioned results. I do wonder whether you have the server machine at home with you, connected to the same network as you, and then it's just routed through EOS.

photo
1

It's definitely being routed through EOS; even typing the IP in directly fails. I've only ever been able to connect to my servers through EOS.

photo
1

In that case, I can only see one possible explanation as to why you have so many buffer overflows, and I believe it has to do with the virtual machine you run the server on. By no means do I know why or how, but all things considered it's the only thing that could potentially be interfering with the network, assuming that you have a pretty good connection.

photo
1

Ya see that's the thing. I have a second server that I tested on, and after the first login it did the same thing. The server with the epyc 7551 is running purely Windows on baremetal hardware. That test was to bring the issue away from both virtual machine and from behind my firewalls to make sure they weren't interfering.

It couldn't be a memory issue on that server either cause it has access to 256GB of the stuff lol.

I wonder is there a way to increase that GC buffer value we see getting filled in the logs?

photo
1

Could have sworn, a pity. Unfortunately I'm not well versed enough to know how to change that, I'm afraid. However, I'll upload logs of me running the world and connecting to it the same way as you; hopefully that will shed some light on where it goes wrong.

photo
1

I wonder if it has something to do with ownership of grids. Maybe assign the Maw of creation and a few other ships to your character to see if ownership has any effect on this. Other than that though, I honestly don't know where it's going wrong.

The "packets not being sent" log entry after the queue fills is the most bizarre part to me. Surely it should keep sending packets to try to empty the queue? I wonder if this is an Epic Online Services issue rather than one of Keen's?

photo
1

Yeah... it's becoming more evident that this issue happens in the medium between the server and the client. After looking through both logs and comparing them side by side, I found this line:

EOS Networking: Dropping client [RE Zippingdonal] for exceeding the maximum queue limit. Queue: 1120 messages totaling 5536960 bytes.

Which raises the question of why the queue is not being emptied in the first place. If the packets cannot be processed by EOS, that would explain why it keeps backing up like that. This might also explain why running the server for a long time can cause the bug to manifest itself. However, it's still an open question why EOS would be unable to process the packets for you in particular and not for, say, the rest of us. It doesn't really make sense.
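For anyone comparing their own server logs, a small Python sketch like the following can pull these entries out. The pattern is written against the single line quoted above, so it assumes other occurrences use exactly the same wording; `find_queue_drops` is a made-up helper name.

```python
import re

# Pattern built from the one log line quoted in this thread. Other game
# versions may word the entry differently, so treat this as an assumption.
DROP_RE = re.compile(
    r"EOS Networking: Dropping client \[(?P<client>.+?)\] "
    r"for exceeding the maximum queue limit\. "
    r"Queue: (?P<messages>\d+) messages totaling (?P<bytes>\d+) bytes\."
)


def find_queue_drops(lines):
    """Return (client, message_count, byte_count) for each matching line."""
    drops = []
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            drops.append(
                (match["client"], int(match["messages"]), int(match["bytes"]))
            )
    return drops
```

For the line above this yields `("RE Zippingdonal", 1120, 5536960)`, i.e. about 5.5 MB queued at roughly 4,944 bytes per message. Comparing the message and byte counts across several drops could show whether the client is always cut off at the same threshold.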

photo
1

The only thing I can think of is the ISP... and funnily enough both myself and the other console player have the same ISP: Virgin Media.... Actually, what the other console player has said is that he clears a cache on his Xbox if he has issues, and that fixes it, but I've never seen a log with his name and the extended queue overflow... I'm due an enterprise firewall soon, so I'll configure that and have it log everything to see if I can catch any sessions being dropped at random.

photo
1

I have thought of something: what if the server was run on Steam instead of EOS? Yes, it would disable crossplay, but the real question is whether the type of service has an effect on this bug. The bug exists on Steam too, of course, but from what I've seen it happens much more rarely, so perhaps the way EOS works has some effect on whether the bug manifests itself and how often. Your world seems to consistently get the overflow, so it would make the perfect experiment to test this with.

photo
1

I'll try that in a little while and update thread with results so, and I'll run it on 2 separate servers to verify

photo
1

Here are the logs from running both. I had no issue connecting to the server running on Steam. Even direct connect worked for the first time lol. The moment I switched back to EOS though, I ran into the same error again. Also, can ye direct connect using EOS? Cause I can't when using my public IP and the necessary port (port 3636 is open the whole way down to the Space Engineers server and worked for Steam).

photo
1

Then perhaps EOS is the real issue here. From what I understand, the way EOS works is that it creates a sort of "translate-link", which looks something like eos://xxxxxxxxxxxxxxxxxxxxxxx. This is the "direct connect" link you have to use to connect to an EOS server. I wonder if the bottleneck occurs because EOS acts as a medium between client and server, and somehow enough information backs up there to cause the overflow.

photo
1

An interesting experiment would be for you to run the EOS server for a while, and for me to attempt to connect to it from here, to see whether the overflow occurs on the client side or on the server side. If the ISP is the problem, like you mentioned before, then I should still be able to connect, without the overflow occurring. I do wonder whether joining other EOS servers causes you to overflow too, or if it's just this particular server that you run yourself. We could also try this the other way around, where I run a server with this particular world and you attempt to join. Either way we ought to get some interesting results.

photo
1

We'll try that so, it's "dod_server" and the password is Jimothy55. Let me know how you get on!

photo
1

So, at 14:20 GMT I managed to join the server, apparently without problems. I was not thrown out, which would indicate that an overflow didn't occur either.

photo
1

Hmm strange, would it be possible for you to try to spawn where the Maw of creation is?

photo
1

I managed to spawn into a drop pod above Alien, so I don't see why I wouldn't be able to, but I'll give it a try.

photo
1

There is an interesting phenomenon at play though: I appear to see two instances of the same server but under different links. One I can join as before, but not the other.

photo
1

Go for the lower one. The service restarted on my main PC, though I had stopped it earlier and disabled it. It must've restarted; I'll deal with it soon.

photo
1

Okay, even more interesting: I logged in over Steam and moved my character about 10km away from where I was trying to spawn, then switched back to EOS and was able to log in. So we're back to CPU constraints, but only when communicating with the EOS servers.

photo
1

Yup, it looks like the area we build stuff seems to be too dense for EOS to handle, moved back into my cryo pod and tried connecting and it immediately threw the error again

photo
1

That doesn't explain why I could explore that same area when I'm running the world. By setting it to EOS you force the game to use EOS as a medium, and because of the translate link you shouldn't be able to direct connect regardless of how local the connection is. Although I do wonder whether running the server on the same machine as the client had an impact on that.

photo
1

I'll reword my last post: it looks like the area is too dense for EOS to let characters spawn there. If you move towards it once you're spawned though, the bite-sized chunks allow the area to load normally, because everything there isn't being sent at once.

My test was loading the server and connecting through Steam. After that I reloaded the server with EOS as the medium and moved back to the dense location. After trying to reconnect through EOS though, the error presented itself again.

I wonder if it has something to do with ownership as well though. I'll try to test that later.
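The "all at once versus bite-sized chunks" idea can be sketched as a toy model in Python. None of this is taken from the game or EOS; the function name, the byte counts and the drain rate are all invented to illustrate why the same total amount of data could overflow a queue on spawn but not while approaching gradually.

```python
def peak_backlog(enqueued_per_tick, drained_per_tick):
    """Return the largest backlog seen while sending the given chunks.

    enqueued_per_tick lists the (hypothetical) bytes queued at each tick;
    drained_per_tick is how many bytes the link can send per tick.
    """
    backlog = 0
    peak = 0
    for enqueued in enqueued_per_tick:
        backlog += enqueued
        peak = max(peak, backlog)
        backlog -= min(backlog, drained_per_tick)
    return peak
```

Sending 6,000,000 bytes in one burst gives `peak_backlog([6_000_000], 500_000) == 6_000_000`, while the same data streamed as twelve 500,000-byte chunks gives `peak_backlog([500_000] * 12, 500_000) == 500_000`. If the queue limit sits somewhere between those two peaks, only the spawn-in case would trip it.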

photo
1

Not sure how ownership would have anything to do with it. That's just a long identifying the character/entity that owns the grid, and it's present on every grid, so its impact should be minimal at best. To add to the point, I could load the world and spawn at the Maw of creation without it overflowing (on my local server at least). On your server I kept respawning at my pod, so I couldn't test it, unfortunately.

photo
1

Ah well, if that's the case, it probably isn't ownership then lol. I'm just throwing ideas out to see what sticks lol

photo
1

As of today (2022-01-07) the bug can still be observed to happen.

photo
1

Bug can still be observed in the newest version (1.200)

photo
1

So far the only solution we could find was to move all our stuff to a new world. The same issues haven't appeared again there, but the issue is still present in the original save (one person was able to load in and make blueprints of our ships, which made transferring them into another world possible).

photo
1

Still being observed in the newest version (1.201)
