System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

Viktor Ferenczi shared this bug 22 months ago
Need More Information

Dedicated server 1.193.101

Torch v1.3.1.98-master

Exception occurred in original dedicated server code:

18:47:45.6411 [FATAL] Initializer: System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

  at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at Sandbox.Game.Entities.Cube.MyDisconnectHelper.AddNeighbours(MySlimBlock firstBlock, Boolean& anyWithPhysics, MySlimBlock testBlock)
at Sandbox.Game.Entities.Cube.MyDisconnectHelper.Disconnect(MyCubeGrid grid, MyTestDisconnectsReason reason, MySlimBlock testBlock, Boolean testDisconnect)
at Sandbox.Game.Entities.MyCubeGrid.DetectDisconnects()
at Sandbox.Game.Entities.MyCubeGrid.DoLazyUpdates()
at Sandbox.Game.Entities.MyCubeGrid.UpdateAfterSimulation()
   at Sandbox.Game.Entities.MyEntities.<>c.<UpdateAfterSimulation>b__113_0(MyEntity x)
at VRage.Collections.MyDistributedUpdater`2.Iterate(Action`1 p)
at Sandbox.Game.Entities.MyEntities.UpdateAfterSimulation()
at Sandbox.Game.World.MySector.UpdateAfterSimulation()
at Sandbox.Game.World.MySession.UpdateComponents()
at Sandbox.Game.World.MySession.Update(MyTimeSpan updateTime)
at Sandbox.MySandboxGame.Update()
at Sandbox.Engine.Platform.Game.UpdateInternal()
at Sandbox.Engine.Platform.Game.RunSingleFrame()
at Sandbox.Engine.Platform.FixedLoop.<>c__DisplayClass11_0.<Run>b__0()
  at Sandbox.Engine.Platform.GenericLoop.Run(VoidAction tickCallback)
at Sandbox.Engine.Platform.Game.RunLoop()
  at Sandbox.MySandboxGame.Run(Boolean customRenderLoop, Action disposeSplashScreen)
at Torch.VRageGame.DoStart() in C:\jenkins\workspace\Torch_Torch_master\Torch\VRageGame.cs:line 245
at Torch.VRageGame.Run() in C:\jenkins\workspace\Torch_Torch_master\Torch\VRageGame.cs:line 118
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
  at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at Sandbox.Game.Entities.Cube.MyDisconnectHelper.AddNeighbours(MySlimBlock firstBlock, Boolean& anyWithPhysics, MySlimBlock testBlock)
at Sandbox.Game.Entities.Cube.MyDisconnectHelper.Disconnect(MyCubeGrid grid, MyTestDisconnectsReason reason, MySlimBlock testBlock, Boolean testDisconnect)
at Sandbox.Game.Entities.MyCubeGrid.DetectDisconnects()
at Sandbox.Game.Entities.MyCubeGrid.DoLazyUpdates()
at Sandbox.Game.Entities.MyCubeGrid.UpdateAfterSimulation()
   at Sandbox.Game.Entities.MyEntities.<>c.<UpdateAfterSimulation>b__113_0(MyEntity x)
at VRage.Collections.MyDistributedUpdater`2.Iterate(Action`1 p)
at Sandbox.Game.Entities.MyEntities.UpdateAfterSimulation()
at Sandbox.Game.World.MySector.UpdateAfterSimulation()
at Sandbox.Game.World.MySession.UpdateComponents()
at Sandbox.Game.World.MySession.Update(MyTimeSpan updateTime)
at Sandbox.MySandboxGame.Update()
at Sandbox.Engine.Platform.Game.UpdateInternal()
at Sandbox.Engine.Platform.Game.RunSingleFrame()
at Sandbox.Engine.Platform.FixedLoop.<>c__DisplayClass11_0.<Run>b__0()
  at Sandbox.Engine.Platform.GenericLoop.Run(VoidAction tickCallback)
at Sandbox.Engine.Platform.Game.RunLoop()
  at Sandbox.MySandboxGame.Run(Boolean customRenderLoop, Action disposeSplashScreen)
at Torch.VRageGame.DoStart() in C:\jenkins\workspace\Torch_Torch_master\Torch\VRageGame.cs:line 245
at Torch.VRageGame.Run() in C:\jenkins\workspace\Torch_Torch_master\Torch\VRageGame.cs:line 118
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
I suspect this is a race condition between MyCubeGrid.DetectDisconnects() where the crash happened and some other code actually separating the disconnected grid part.

World was relatively simple, well under 100000 PCU in total. Two bases and two ships, singe grid each. No pistons, nor rotors. None of the connectors were locked.

One ship rammed one of the bases and shooting it to pieces with its turrets from close distance. So there have been a lot of block destruction and grid disconnect events going on at the same time. I suspect this triggered the crash.

Only two Torch plugins were in use. None of them seems to be involved, since they have not changed any grids for sure when the crash happened. Dedicated server has been running for 20 minutes only.

It happened once so far. I will report below if it happens again or we can reproduce reliably.

Best Answer
photo

Analysis of the traceback, because somebody has to do it. Based on your current SE client code:

FirstElement actually creates an iterator and uses MoveNext once:

namespace System.Collections.Generic
{
  public static class HashSetExtensions
  {
    public static T FirstElement<T>(this HashSet<T> hashset)
    {
      using (HashSet<T>.Enumerator enumerator = hashset.GetEnumerator())
        return enumerator.MoveNext() ? enumerator.Current : throw new InvalidOperationException();
    }
  }
}

That's where the iteration happens.

FirstElement is called from MyDisconnectHelper.Disconnect:

this.AddNeighbours(this.m_disconnectHelper.FirstElement<MySlimBlock>(), out group.IsValid, testBlock);

The collection iterated is:

private HashSet<MySlimBlock> m_disconnectHelper = new HashSet<MySlimBlock>();

This is the one being modified in the meantime by another thread at the same time.

There is a race condition here.

Look for all usages of m_disconnectHelper, the code changing it must be somewhere. Since m_disconnectHelper is private, all of them must be inside the MyDisconnectHelper class. Look into where you call those methods from, which thread.

The m_disconnectHelper HashSet either needs synchronization or some other way to guarantee they are not running at the same time. Just add synchronization if in doubt, I suggest using a read-write lock for performance, which is based on a spinlock. (Since accesses to the HashSet are very brief, we don't expect long time lock holding, thus not much lock contention.)

This crash happens only very rarely, since the FirstElement method uses the iterator only for a very short time. It is unlikely that the other code changing the HashSet is running at the some time. But if there is a big collision with lots of grid splits, then it can definitely happen with some finite probability.

Just my 2 cents. Please don't wait until I fix it with a client plugin.

Comments (5)

photo
1

Still happens, and pretty often on all servers!


System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.Collections.Generic.HashSet`1.Enumerator.MoveNext()
at Sandbox.Game.Entities.MyParallelEntityUpdateOrchestrator.UpdateBeforeSimulation()
at Sandbox.Game.Entities.MyParallelEntityUpdateOrchestrator.DispatchBeforeSimulation()
at Sandbox.Game.Entities.MyEntities.UpdateBeforeSimulation()
at Sandbox.Game.World.MySector.UpdateBeforeSimulation()
at Patched_Sandbox.Game.World.MySessionUpdateComponents_0(Object )
at Sandbox.Game.World.MySession.Update(MyTimeSpan updateTime)

photo
1

Hello Viktor!

I appreciate this thread is quite old now. Does this issue still persist or has it been fixed? We have had a number of updates since this post and are now on version 197.075 so hope this is no longer an issue. Please let me know if I can close this case or if you are still having problems.

Kind Regards

Laura, QA Department

photo
1

I haven't seen this happening for a while, at least not until November 2020 when I started to work on SE servers again. We had a lot of grid splits on our racing servers and it never crashed like the this anymore. Looks good so far during the past 3 months, at least for me.

Certainly, it does not guarantee this problem does not happen to others.

photo
1

Please see my exception and code analysis below.

photo
1

Analysis of the traceback, because somebody has to do it. Based on your current SE client code:

FirstElement actually creates an iterator and uses MoveNext once:

namespace System.Collections.Generic
{
  public static class HashSetExtensions
  {
    public static T FirstElement<T>(this HashSet<T> hashset)
    {
      using (HashSet<T>.Enumerator enumerator = hashset.GetEnumerator())
        return enumerator.MoveNext() ? enumerator.Current : throw new InvalidOperationException();
    }
  }
}

That's where the iteration happens.

FirstElement is called from MyDisconnectHelper.Disconnect:

this.AddNeighbours(this.m_disconnectHelper.FirstElement<MySlimBlock>(), out group.IsValid, testBlock);

The collection iterated is:

private HashSet<MySlimBlock> m_disconnectHelper = new HashSet<MySlimBlock>();

This is the one being modified in the meantime by another thread at the same time.

There is a race condition here.

Look for all usages of m_disconnectHelper, the code changing it must be somewhere. Since m_disconnectHelper is private, all of them must be inside the MyDisconnectHelper class. Look into where you call those methods from, which thread.

The m_disconnectHelper HashSet either needs synchronization or some other way to guarantee they are not running at the same time. Just add synchronization if in doubt, I suggest using a read-write lock for performance, which is based on a spinlock. (Since accesses to the HashSet are very brief, we don't expect long time lock holding, thus not much lock contention.)

This crash happens only very rarely, since the FirstElement method uses the iterator only for a very short time. It is unlikely that the other code changing the HashSet is running at the some time. But if there is a big collision with lots of grid splits, then it can definitely happen with some finite probability.

Just my 2 cents. Please don't wait until I fix it with a client plugin.