Servers deadlocked on load

Jack Bishop shared this bug 11 months ago
Won't Fix

On random occasions, servers can deadlock indefinitely while loading the voxel data of a planet. This happens on both Torch and the vanilla Dedicated Server (DS).


2023-04-11 07:18:04.502 - Thread: 9 -> Planet init info - MutableStorage:True StorageName:Moon-1353915701d19000 storage?:False
2023-04-11 07:18:04.502 - Thread: 7 -> Loading voxel storage from file 'C:\Users\Marc\AppData\Roaming\SpaceEngineersDedicated\Saves\Niko GTC UK #1 03-21-2023 10-22-31\EarthLike-1779144428d120000.vx2'
2023-04-11 07:18:04.502 - Thread: 9 -> Loading voxel storage from file 'C:\Users\Marc\AppData\Roaming\SpaceEngineersDedicated\Saves\Niko GTC UK #1 03-21-2023 10-22-31\Moon-1353915701d19000.vx2'
2023-04-11 07:18:04.801 - Thread: 1 -> Planet init info - MutableStorage:True StorageName:Titan-2124704365d19000 storage?:False
2023-04-11 07:18:04.801 - Thread: 1 -> Loading voxel storage from file 'C:\Users\Marc\AppData\Roaming\SpaceEngineersDedicated\Saves\Niko GTC UK #1 03-21-2023 10-22-31\Titan-2124704365d19000.vx2'
2023-04-11 07:18:05.236 - Thread: 8 -> Initialized large grid Black Widow Mk. VI (202 AI Test Setup) 16355 PCU
2023-04-11 07:18:05.236 - Thread: 8 -> Planet init info - MutableStorage:True StorageName:Mars-2044023682d120000 storage?:False
2023-04-11 07:18:05.236 - Thread: 8 -> Loading voxel storage from file 'C:\Users\Marc\AppData\Roaming\SpaceEngineersDedicated\Saves\Niko GTC UK #1 03-21-2023 10-22-31\Mars-2044023682d120000.vx2'


The log above is from NikolasMarch.


[Image: the stalled server as shown in Torch]


The image shows how the hang appears in Torch.


I can provide full Keen and Torch logs on request, but there are no steps to reproduce. Sometimes this happens five times in a row before the server finally boots up.

Replies (14)

This happens here as well, seemingly at random and without any other errors. When it happens, the server just gets stuck on one of the planets and stops loading.

The section of the log shown is the end of it. The server stayed hung there for hours before I decided to force-stop it. This wasn't the first occurrence: it had frozen twice more in the week leading up to the major update.

It hasn't happened since then, but the server was only restarting every 6 hours, so it hasn't had many chances to lock up. I've set the restart cycle to every 20 minutes for testing, so I will check again after some sleep.

I had this happen while loading a single-player world as well, but not reliably. I can't force it to happen :/

This has happened on one of our servers as well; it took 5 tries for that server to finally boot up...

f37355a6694186385983afbbf3eaf437

This has occurred 3 times on one instance for me; each time I force-killed it and booted again until it came up fine. I can't reproduce it on demand, though; it is random.

This has occurred many times on our Rumorsquad Vanilla Creative server. We have to restart it manually again before it boots up. Please fix ASAP!

Confirmed, happens here too.

Same issue after the update. It’s hit or miss whether a server restarts successfully, but if it doesn’t, it hangs at the planet-loading stage shown in that log.

Related issue: https://support.keenswh.com/spaceengineers/pc/topic/24210-performance-pre-calculate-or-cache-mydefinitionid-tostring-results

While the caching added in 1.202.066 works, it can also deadlock. Lock contention also lowers performance, so it is better to solve this without locking.


What would work is adding a read-only dictionary as a primary cache, used directly (without locking) for lookups. If a key is not found there, a secondary read-write cache guarded by a regular mutex is used, and newly formatted strings are cached there.


From time to time, or when the number of items in the secondary cache exceeds a threshold, the union of the primary and secondary caches is produced, and the primary cache is replaced by simply overwriting its reference (an atomic operation). The old primary cache is then just garbage collected. Since basically no new strings are formatted after the game loads, this is very performant.
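The two-layer scheme described above can be sketched roughly as follows. This is a minimal Python illustration, not the linked C# `CacheForever` implementation: the class name `TwoLayerCache`, the `factory`, `merge_threshold` and `flush` names, and the choice to run the factory outside the mutex are all hypothetical. It assumes that rebinding `self._primary` to a new dict is a single atomic reference write, which holds in CPython.

```python
import threading
from typing import Any, Callable, Dict, Hashable

_MISSING = object()  # sentinel so a cached None still counts as a hit


class TwoLayerCache:
    """Hypothetical sketch: lock-free primary dict plus mutex-guarded secondary."""

    def __init__(self, factory: Callable[[Hashable], Any], merge_threshold: int = 1024) -> None:
        self._factory = factory                # e.g. formats an ID to a string
        self._merge_threshold = merge_threshold
        self._primary: Dict[Hashable, Any] = {}    # published snapshot, never mutated
        self._secondary: Dict[Hashable, Any] = {}  # only touched while holding _lock
        self._lock = threading.Lock()

    def _lookup_locked(self, key: Hashable) -> Any:
        # Caller holds _lock. Re-check the primary: a merge may have
        # published a new snapshot since the lock-free read in get().
        value = self._primary.get(key, _MISSING)
        if value is _MISSING:
            value = self._secondary.get(key, _MISSING)
        return value

    def _merge_locked(self) -> None:
        # Build the union in a brand-new dict, then publish it with one
        # reference assignment. Readers still holding the old snapshot keep
        # using it; once dropped, it is garbage collected.
        merged = dict(self._primary)
        merged.update(self._secondary)
        self._primary = merged
        self._secondary = {}

    def get(self, key: Hashable) -> Any:
        # Fast path, no lock: _primary is only ever replaced, never modified.
        value = self._primary.get(key, _MISSING)
        if value is not _MISSING:
            return value
        with self._lock:
            value = self._lookup_locked(key)
        if value is not _MISSING:
            return value
        # Assumption: format outside the lock, so a factory that calls back
        # into this cache cannot self-deadlock on the non-reentrant mutex.
        computed = self._factory(key)
        with self._lock:
            value = self._lookup_locked(key)
            if value is _MISSING:
                value = computed
                self._secondary[key] = computed
                if len(self._secondary) >= self._merge_threshold:
                    self._merge_locked()
        return value

    def flush(self) -> None:
        # The "from time to time" merge, e.g. driven by a periodic timer.
        with self._lock:
            if self._secondary:
                self._merge_locked()
```

For example, `TwoLayerCache(lambda key: f"{key[0]}/{key[1]}").get(("Ore", "Iron"))` returns `"Ore/Iron"`. After loading, nearly every call takes the lock-free fast path, so the mutex is only contended while new strings are still being formatted. The design choice of running the factory outside the lock means two threads may occasionally format the same key, but only the first result is kept.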


The above solution had been tested in Performance Improvements 1.10.5 with game version 1.201.014 (the previous release) for almost a year.


This was the implementation as a Harmony patch:

https://github.com/viktor-ferenczi/performance-improvements/blob/a952fb9b7166882bccbd3cb44fd683a64c1779a3/Shared/Patches/Memory/MyDefinitionIdToStringPatch.cs


The two-layer cache implementation:

https://github.com/viktor-ferenczi/performance-improvements/blob/a952fb9b7166882bccbd3cb44fd683a64c1779a3/Shared/Tools/CacheForever.cs

I can confirm as well that it happens on both the Official Dedicated Server and Torch Server. It's a random occurrence; sometimes it works six times in a row, and sometimes it does not.

Exactly the same problem

Hello, Jack Bishop and everyone,


there was a hotfix, and there will be another one by tomorrow. Did it fix the issue for you? Thank you for your response.


Kind regards,

Keen Software House: QA Department

Hello Engineers,

as there has been no reply to this thread in some time, we will close it. If you're experiencing this issue, please create a new thread.

Kind regards,

Keen Software House: QA Department
