Thursday, December 12, 2013

Pattern matching extension methods

Sometimes you need to select a subset of items in a collection based on a pattern match of one of their properties. Think "all of the people whose name starts with D" or "all of the invoices with 'CREDIT' in the description".

In SQL you'd use LIKE. In .NET, for simple patterns, string has StartsWith, EndsWith and Contains. For anything more complex you'd probably resort to regular expressions. Personally I've never been a huge fan of regular expressions, though they are very powerful. The syntax is just too arcane for me to keep in my head. I can pound away at a regex and get it right, but as soon as I look away I have absolutely no idea what it is doing. For this reason I find them difficult to maintain, extend and debug. That's just me though.
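For the simple cases, the built-in string methods combined with Where are usually enough. The sketch below is only an illustration: the class and method names (BuiltInStringFilters, StartingWith, Containing) are made up for this example, and it assumes a case-insensitive comparison is what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named for illustration only.
public static class BuiltInStringFilters
{
    // Roughly LIKE 'prefix%': items that start with the prefix, ignoring case.
    public static List<string> StartingWith(IEnumerable<string> items, string prefix)
    {
        return items
            .Where(s => s != null && s.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    // Roughly LIKE '%fragment%': items that contain the fragment, ignoring case.
    public static List<string> Containing(IEnumerable<string> items, string fragment)
    {
        // IndexOf with a StringComparison is used here because a Contains overload
        // taking a StringComparison is not available on every framework version.
        return items
            .Where(s => s != null && s.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

This works well until the pattern needs a wildcard in the middle or a single-character placeholder, which is where LIKE or a regex comes in.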

LIKE, on the other hand, is simple to understand just by looking at it. It's not as expressive or as powerful as regex, but for everyday usage it goes a long way toward getting the job done when you need pattern matching.

If you are working in .NET land and using LINQ to SQL, you can easily use SqlMethods.Like. If you're working with LINQ to Objects, however, you either have to use the built-in methods of string or resort to regex. The extension class below provides a way to use LIKE syntax for pattern matching in LINQ to Objects as well. Under the hood, it converts the LIKE pattern to a regex pattern and uses the Regex engine to do the matching. As such, it supports both regex queries (through the Match methods) and LIKE queries (through the Like methods).
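To make the LIKE-to-regex conversion concrete, here are a few LIKE patterns next to the regex they should end up as. The pairs were worked out by hand from the conversion rules in the listing in this post, and the class and member names (LikeRegexEquivalents and friends) are invented for this sketch.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class holding hand-translated LIKE -> regex pairs.
public static class LikeRegexEquivalents
{
    // LIKE '%credit%' -> % becomes .* (any run of characters)
    public const string ContainsCredit = "^.*credit.*$";

    // LIKE 'D%' -> anchored at the start, anything afterwards
    public const string StartsWithD = "^D.*$";

    // LIKE 'INV_1' -> _ becomes . (exactly one character)
    public const string SingleChar = "^INV.1$";

    // LIKE '[ab]%' -> character classes pass through unchanged
    public const string CharClass = "^[ab].*$";

    // LIKE '100[%]' -> [%] stays a literal percent sign
    public const string LiteralPercent = "^100[%]$";

    // Null-safe, case-insensitive match, mirroring what the Like extension does.
    public static bool IsMatch(string input, string regex)
    {
        return input != null && Regex.IsMatch(input, regex, RegexOptions.IgnoreCase);
    }
}
```

The ^ and $ anchors are what make 'D%' mean "starts with D" rather than "contains D".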

I've found it handy in a couple of spots and its basic usage is like this:

var invoices = GetInvoices();
var credits = invoices.Like(i => i.Description, "%credit%");

The Like extension returns all of the invoice objects whose description contains the word "credit", case-insensitively. The full extension class follows.

using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Text;

namespace PatternMatching
{
    public static class PatternMatchExtensions
    {
        public static IEnumerable<string> Like(this IEnumerable<string> source, string pattern)
        {
            var regex = ConvertLikeToRegex(pattern);
            return source.Match(regex, RegexOptions.IgnoreCase);
        }

        public static IEnumerable<T> Like<T>(this IEnumerable<T> source, Func<T, string> selector, string pattern)
        {
            var regex = ConvertLikeToRegex(pattern);
            return source.Match<T>(selector, regex, RegexOptions.IgnoreCase);
        }

        public static IEnumerable<T> Match<T>(this IEnumerable<T> source, string regex, RegexOptions options = RegexOptions.None)
        {
            return source.Match<T>(t => t == null ? null : t.ToString(), regex, options);
        }

        public static IEnumerable<T> Match<T>(this IEnumerable<T> source, Func<T, string> selector, string regex, RegexOptions options = RegexOptions.None)
        {
            return source.Match<T>(selector, new Regex(regex, options));
        }

        public static IEnumerable<string> Match(this IEnumerable<string> source, Regex regex)
        {
            return source.Where(s => IsMatch(s, regex));
        }

        public static IEnumerable<T> Match<T>(this IEnumerable<T> source, Func<T, string> selector, Regex regex)
        {
            return source.Where<T>(t => IsMatch(selector(t), regex));
        }

        static bool IsMatch(string input, Regex regex)
        {
            if (input == null)
                return false;

            return regex.IsMatch(input);
        }

        static string ConvertLikeToRegex(string pattern)
        {
            StringBuilder builder = new StringBuilder();
            // Turn "off" all regular expression related syntax in the pattern string
            // and add regex beginning of and end of input tokens so '%abc' and 'abc%' work as expected
            builder.Append("^").Append(Regex.Escape(pattern)).Append("$");

            /* Replace the SQL LIKE wildcard metacharacters with the
            * equivalent regular expression metacharacters. */
            builder.Replace("%", ".*").Replace("_", ".");

            /* The previous call to Regex.Escape actually turned off
            * too many metacharacters, i.e. those which are recognized by
            * both the regular expression engine and the SQL LIKE
            * statement ([...] and [^...]). Those metacharacters have
            * to be manually unescaped here. */
            builder.Replace(@"\[", "[").Replace(@"\]", "]").Replace(@"\^", "^");

            // put SQL LIKE wildcard literals back
            builder.Replace("[.*]", "[%]").Replace("[.]", "[_]");

            return builder.ToString();
        }
    }
}
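
One caveat worth knowing: because % is translated to .*, and the regex dot does not match a newline by default, a LIKE pattern will not match across line breaks in multi-line text. The sketch below shows the difference using the regex that "%credit%" converts to. The class name MultiLineCaveat is made up for the example, and whether you want RegexOptions.Singleline depends on your data.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class illustrating the newline behaviour of the converted pattern.
public static class MultiLineCaveat
{
    // The regex that the LIKE pattern "%credit%" converts to.
    const string Pattern = "^.*credit.*$";

    // Same options the Like extension uses: dot stops at newlines.
    public static bool DefaultOptions(string input)
    {
        return Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase);
    }

    // Singleline lets the dot match newlines too, which is closer to SQL's %.
    public static bool WithSingleline(string input)
    {
        return Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

If multi-line descriptions matter, one option is to call the Match overload that takes RegexOptions with a hand-written regex and pass RegexOptions.Singleline yourself.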