Fun with micro-optimization in C#


Let's speak about code optimization. It is a real fun making the code run faster. Just imagine: you take a cup of tea, sit down in front of a code block and speed it up.
How many things in this world can be more satisfying?

Recently I spent a good bit of time optimizing our customer's code. In that code we found a simple loop:

  xFilters.ToList().ForEach(n =>
                XmlNodeList nodesToDelete = xml.SelectNodes(BuildXPath(n.EntityName));
                foreach (XmlNode node in nodesToDelete)

My colleague told me something like:

Let's change .ForEach() to regular foreach. Not a big optimization but foreach should be faster.

Surprisingly the method began to work slower after our "optimization"!

So after a couple minutes of thinking I've built a little sample console application to test this behavior in depth.

Stop! Show me the code!

First of all lets create a simple test class to create objects that we will store in our collections.

    public class User
        public string Name { get; set; }
        public Tuple<int, string> Info { get; set; }
        public User(string name, int age, string hobby)
            this.Name = name;
            this.Info = new Tuple<int, string>(age, hobby);
       static double ArrayForeach(User[] array)
            int sum = 0;
            var watch = Stopwatch.StartNew();
            foreach (var item in array)
                sum += item.Info.Item1;
            return watch.Elapsed.TotalMilliseconds;

        static double ArrayFor(User[] array)
            int sum = 0;
            var watch = Stopwatch.StartNew();
            int arrLength = array.Length;
            for (int i = 0; i < arrLength; i++)
                sum += array[i].Info.Item1;
            return watch.Elapsed.TotalMilliseconds;

        static double ArrayForEach(User[] array)
            int sum = 0;
            var watch = Stopwatch.StartNew();
            Array.ForEach(array, i => { sum += i.Info.Item1; });
            return watch.Elapsed.TotalMilliseconds;

        static double ListForeach(List<User> list)
            int sum = 0;
            var watch = Stopwatch.StartNew();
            foreach (var item in list)
                sum += item.Info.Item1;
            return watch.Elapsed.TotalMilliseconds;

        static double ListForEach(List<User> list)
            int sum = 0;
            var watch = Stopwatch.StartNew();
            list.ForEach(i => { sum += i.Info.Item1; });
            return watch.Elapsed.TotalMilliseconds;

        static double ListFor(List<User> list)
            int sum = 0;
            var watch = Stopwatch.StartNew();
            int length = list.Count;
            for (int i = 0; i < length; i++)
                sum += list[i].Info.Item1;
            return watch.Elapsed.TotalMilliseconds;

So we need to make effective time calculation. That's why we should call all methods in cycles, calculate average and print the result after.

 int cycleCount = 1000;
 double sum = 0.0;
 for (int i = 0; i < cycleCount; i++)
     sum += ArrayForeach(arr);
 arrayAverageForeachTime = sum / cycleCount;

And same code for all other methods.


Lets make two charts. One for array and one for list.



Array's for became faster only on really big amount of elements. But for list we can see very interesting result.
For looping is faster than other variants as expected. But it was really surprise for me that instance .ForEach is much faster than usual foreach loop!

Ok. But why?

Let's look at IL code of foreach loop in ListForEach method.

IL_0009:  callvirt   instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1<class ConsoleApplication1.User>::GetEnumerator()  
  IL_000e:  stloc.3
    IL_000f:  br.s       IL_0027
    IL_0011:  ldloca.s   CS$5$0000
    IL_0013:  call       instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<class ConsoleApplication1.User>::get_Current()
    IL_0018:  stloc.2
    IL_0019:  ldloc.0
    IL_001a:  ldloc.2
    IL_001b:  callvirt   instance class [mscorlib]System.Tuple`2<int32,string> ConsoleApplication1.User::get_Info()
    IL_0020:  callvirt   instance !0 class [mscorlib]System.Tuple`2<int32,string>::get_Item1()
    IL_0025:  add
    IL_0026:  stloc.0
    IL_0027:  ldloca.s   CS$5$0000
    IL_0029:  call       instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<class ConsoleApplication1.User>::MoveNext()
    IL_002e:  brtrue.s   IL_0011
    IL_0030:  leave.s    IL_0040
  }  // end .try
    IL_0032:  ldloca.s   CS$5$0000
    IL_0034:  constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<class ConsoleApplication1.User>
    IL_003a:  callvirt   instance void [mscorlib]System.IDisposable::Dispose()
    IL_003f:  endfinally
  }  // end handler

As we can see on each iteration we have 2 method calls: Enumerator<class ConsoleApplication1.User>::get_Current() and Enumerator<class ConsoleApplication1.User>::MoveNext()

Now lets look how list's ForEach work:

.method public hidebysig instance void  ForEach(class System.Action`1<!T> action) cil managed
  .custom instance void __DynamicallyInvokableAttribute::.ctor() = ( 01 00 00 00 ) 
  // Code size       91 (0x5b)
  .maxstack  3
  .locals init (int32 V_0,
           int32 V_1)
  IL_0000:  ldarg.1
  IL_0001:  brtrue.s   IL_0009
  IL_0003:  ldc.i4.8
  IL_0004:  call       void System.ThrowHelper::ThrowArgumentNullException(valuetype System.ExceptionArgument)
  IL_0009:  ldarg.0
  IL_000a:  ldfld      int32 class System.Collections.Generic.List`1<!T>::_version
  IL_000f:  stloc.0
  IL_0010:  ldc.i4.0
  IL_0011:  stloc.1
  IL_0012:  br.s       IL_003a
  IL_0014:  ldloc.0
  IL_0015:  ldarg.0
  IL_0016:  ldfld      int32 class System.Collections.Generic.List`1<!T>::_version
  IL_001b:  beq.s      IL_0024
  IL_001d:  call       bool System.Runtime.Versioning.BinaryCompatibility::get_TargetsAtLeast_Desktop_V4_5()
  IL_0022:  brtrue.s   IL_0043
  IL_0024:  ldarg.1
  IL_0025:  ldarg.0
  IL_0026:  ldfld      !0[] class System.Collections.Generic.List`1<!T>::_items
  IL_002b:  ldloc.1
  IL_002c:  ldelem     !T
  IL_0031:  callvirt   instance void class System.Action`1<!T>::Invoke(!0)
  IL_0036:  ldloc.1
  IL_0037:  ldc.i4.1
  IL_0038:  add
  IL_0039:  stloc.1
  IL_003a:  ldloc.1
  IL_003b:  ldarg.0
  IL_003c:  ldfld      int32 class System.Collections.Generic.List`1<!T>::_size
  IL_0041:  blt.s      IL_0014
  IL_0043:  ldloc.0
  IL_0044:  ldarg.0
  IL_0045:  ldfld      int32 class System.Collections.Generic.List`1<!T>::_version
  IL_004a:  beq.s      IL_005a
  IL_004c:  call       bool System.Runtime.Versioning.BinaryCompatibility::get_TargetsAtLeast_Desktop_V4_5()
  IL_0051:  brfalse.s  IL_005a
  IL_0053:  ldc.i4.s   32
  IL_0055:  call       void System.ThrowHelper::ThrowInvalidOperationException(valuetype System.ExceptionResource)
  IL_005a:  ret
} // end of method List`1::ForEach

Also decompiled code of ForEach (used Jetbrains DotPeek 2016.2.2):

  public void ForEach(Action<T> action) {
            if( action == null) {

            int version = _version;

            for(int i = 0 ; i < _size; i++) {
                if (version != _version && BinaryCompatibility.TargetsAtLeast_Desktop_V4_5) {

            if (version != _version && BinaryCompatibility.TargetsAtLeast_Desktop_V4_5)

As we can see, there is only one method call: instance void class System.Action1<!T>::Invoke(!0) or action(_items[i]); from decompiled code. It is just regular for loop with one call of our lambda method.


Micro-optimization is really good way to understand how your code and platform work. Performance is important, and you should always test your apps and looking for the ways to improve it. But keep in mind that in most cases code readability and maintainability is more important then 50ms speed improvements. Don’t sacrifice application maintainability for performance when you don’t actually need it. And if you see that your colleague doesn't use LINQ or foreach because they are a little bit slower, and writes a big piece of code instead of them, please stop them.