How do I sort list of numeric strings without parsing?

I have a list of strings with high values (greater than int 32), How do i sort them in ascending order without parsing?

List = {"4852154879","2652154879","9852154879","1952154879","0652154879"}

I tried parsing as below, but looking for an alternative and better approach without parsing

Sorted List = List.OrderBy(x => long.Parse(x.serialNumber)).ToList();
Jon Skeet
people
quotationmark

Firstly, I almost certainly wouldn't take the approach below. I'd either convert the input to a List<long>, or just use the code you've already got, at least until I'd absolutely proved that it wasn't good enough.

However, as this is quite an interesting problem, let's try to write a fast IComparer<T>. This relies on:

  • No negative values
  • Possible zero padding, but no other padding
  • Assume that all values are valid

When comparing two values, if the values are the same length, we can just use an ordinal string comparison. Otherwise:

  • Find the number of leading zeroes in each string, to work out the "logical" length of each
  • If the resulting logical lengths are different, whichever is longer is bigger
  • Otherwise, compare the strings from just after the leading zeroes

This manages to perform each comparison with no object allocations.

Something like (completely untested):

public sealed class NumericComparer : IComparer<string>
{
    public static readonly IComparer<string> Instance { get; } = new NumericComparer();

    private NumericComparer() {}

    public int Compare(string x, string y)
    {
        if (x.Length == y.Length)
        {
            return string.Compare(x, y, StringComparison.Ordinal);
        }
        int xIndex = FindFirstNonZeroIndex(x);
        int yIndex = FindFirstNonZeroIndex(y);
        int lengthComparison = (x.Length - xIndex).CompareTo(y.Length - yIndex);
        if (lengthComparison != 0)
        {
            return lengthComparison;
        }
        return string.Compare(x, xIndex, y, yIndex, x.Length, StringComparison.Ordinal);
    }

    private static int FindFirstNonZeroIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] != '0')
            {
                return i;
            }
        }
        // All zeroes? Return text.Length - 1, so that we treat this as
        // "0".
        return text.Length - 1;
    }
}

You can then sort a list in-place with:

list.Sort(NumericComparer.Instance);

Now I've just been benchmarking this... and it looks like it's roughly the same performance as parsing, as far as I can tell. Actually very slightly worse - but much better than the padding form.

Benchmarking code:

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    private readonly List<string> list;

    public Program()
    {
        list = Enumerable.Range(0, 100000)
            .Select(_ => GenerateValue())
            .ToList();
    }

    // Just to test the impact of copying...
    [Benchmark]
    public List<string> NoSorting()
    {
        var copy = new List<string>(list);
        return copy;
    }

    [Benchmark]
    public List<string> NoParsing()
    {
        var copy = new List<string>(list);
        copy.Sort(NumericComparer.Instance);
        return copy;
    }

    [Benchmark]
    public List<string> WithParsing() => list.OrderBy(x => long.Parse(x)).ToList();

    static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }

    [Benchmark]
    public List<string> WithPadding()
    {
        int maxLength = list.Max(y => y.Length);
        return list.OrderBy(x => x.PadLeft(maxLength, '0')).ToList();
    }        

    // Use the same seed on all tests
    static readonly Random random = new Random(1);
    static string GenerateValue()
    {
        // Up to 11 digits...
        long leading = random.Next(100000);
        long trailing = random.Next(1000000);
        long value = leading * 1000000 + trailing;
        // Pad to 9, 10 or 11 randomly
        int width = random.Next(3) + 9;
        return value.ToString().PadLeft(width, '0');
    }
}
// NumericComparer as per post

Results:

      Method |         Mean |        Error |      StdDev |
------------ |-------------:|-------------:|------------:|
   NoSorting |     473.3 us |     9.359 us |    25.62 us |
   NoParsing |  46,684.7 us |   932.466 us | 1,366.80 us |
 WithParsing |  43,149.8 us |   790.116 us |   700.42 us |
 WithPadding | 275,843.4 us | 3,083.376 us | 2,733.33 us |

Alternative idea, which is definitely simpler:

  • If the strings are the same length, compare ordinally
  • Otherwise, just parse both

(I haven't benchmarked that yet.)

people

See more on this question at Stackoverflow