Basics: LINQ to Objects and Extension Methods
While it’s true that .NET 3.5 has been out for some time, many people still haven’t had a chance to learn its newer features, simply because they are still tasked with working on .NET 2.0 code and the opportunity hasn’t arisen.
This post comes as a direct result of a session I did a short while ago (hi! if you were in the room) where I was explaining some of the LINQ and Extension methods features in .NET 3.5 to a group of developers. I thought it might be worth sharing with you as well.
I think the best way to learn something is to try it yourself first, so here’s your mission: take the following strings, sort them alphabetically, and add a prefix to each string showing the count of the word “it” in each sentence using the “[nnn] ” format. Obviously it’s a very contrived example, but hopefully it’s simple enough to get your head around, and it shows how the same goal can be achieved in both .NET 2.0 and 3.5.
The strings are:
“One it”, “It should have one its”, “Three it it it its”, “NAMBLA has no its”.
Before you continue, why don’t you do just that? Write a small app that meets the requirements and then come back here and read on.
A Test Driven Approach
Now those who know me know that I like to encourage people to write tests first, so it would be remiss of me not to do the same in a blog post. Here’s a simple test that will help us prove that what we want to implement actually works.
[TestMethod]
public void NonLinqMethodShouldWork()
{
    var s1 = "One it";
    var s2 = "It should have one its";
    var s3 = "Three it it it its";
    var s4 = "NAMBLA has no its";
    var strings = new List<string> { s1, s2, s3, s4 };
    var result = NonLinqSorter.SortAndPrefix(strings);
    Assert.AreEqual("[1] " + s2, result[0]);
    Assert.AreEqual("[0] " + s4, result[1]);
    Assert.AreEqual("[1] " + s1, result[2]);
    Assert.AreEqual("[3] " + s3, result[3]);
    Assert.AreEqual(4, result.Count());
}
As you can see it’s a straightforward test. We create the 4 strings, place them in a list and then call the SortAndPrefix method (using a static class). From the results list we then Assert() that the output is as we expected.
Note that I’ve used the var keyword in most places. Strictly speaking this is a C# 3.0 language feature (the compiler that shipped alongside .NET 3.5) that lets the compiler infer the type of a variable from whatever the right hand side of the assignment returns. The variable is still strongly typed; you just don’t have to spell the type out. Let’s avoid a discussion over the use of “var” and whether it’s the best way to do things or not and just say that my personal preference is to use var simply because I find it makes the code easier to read.
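If you haven’t seen var before, here’s a minimal sketch of what the compiler is doing for you. The class name VarDemo and the “Explicit” variables are just names I’ve made up for this illustration; they aren’t part of the exercise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the exercise code.
public static class VarDemo
{
    public static void Main()
    {
        // Inferred as string; the variable is still statically typed.
        var s1 = "One it";
        // Inferred as List<string> from the right-hand side.
        var strings = new List<string> { s1 };

        // These explicit declarations mean exactly the same thing to the compiler.
        string s1Explicit = "One it";
        List<string> stringsExplicit = new List<string> { s1Explicit };

        Console.WriteLine(s1.GetType() == s1Explicit.GetType());           // True
        Console.WriteLine(strings.GetType() == stringsExplicit.GetType()); // True

        // s1 = 42;  // would not compile: an int cannot be assigned to a string
    }
}
```

The commented-out line is the important one: var is not a “variant” type, so once the compiler has decided s1 is a string it stays a string.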
A .NET 2.0 Implementation
Now that we have our test, it’s time to actually write an implementation. Let’s do a .NET 2.0 implementation of it, i.e. without using LINQ or extension methods. You may want to compare it to yours (you did do one, didn’t you?).
Thinking about the problem and breaking it down, we simply want to sort the strings and then for each string count the “it” words. We then want to add a prefix to the front of each string and return the result as a new list.
Hopefully you’d end up writing code that looks something like the following:
public static class NonLinqSorter
{
    public static List<string> SortAndPrefix(List<string> strings)
    {
        strings.Sort();
        List<string> results = new List<string>();
        foreach (var s in strings)
        {
            var words = s.Split(' ');
            var wordCount = 0;
            foreach (var word in words)
            {
                if (word.Equals("it", StringComparison.CurrentCultureIgnoreCase))
                {
                    wordCount++;
                }
            }
            results.Add(String.Format("[{0}] {1}", wordCount, s));
        }
        return results;
    }
}
Let’s see what’s happening here. We start by doing a Sort() on our initial list (note that Sort() sorts in place, so the caller’s list is reordered as well), then we iterate over that list. For each string in the list we break it down into its words using the Split() method and iterate over the resulting words looking for “it” occurrences. Note that we do the comparison in a case-insensitive manner.
Finally we take our “it” count and the original string and combine them using a String.Format() call, placing the result in the results list.
A quick look at this code and you might say that it’s decent code. Admittedly it has a nested foreach loop and there’s an if statement stuck inside the innermost loop, but overall it’s not too bad.
If we were to use a tool like SourceMonitor to get the metrics for the code block we’d see that it has a cyclomatic complexity of 4. Probably about as good as we can get while still maintaining readability.
A .NET 3.5 Implementation
What we now want to do is try and put this together using LINQ and extension methods and see how it would look then.
For those who don’t know, an extension method is a way of writing code that adds new functionality to existing classes without needing to actually get into those classes and change them.
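As a quick warm-up, here’s a minimal sketch of an extension method on string. The names StringExtensions and Shout() are invented for this example and have nothing to do with the exercise; the point is only to show the shape of the thing.

```csharp
using System;

// Hypothetical helper class; extension methods must live in a static class.
public static class StringExtensions
{
    // The "this" before the first parameter makes Shout() callable
    // as if it were an instance method on string.
    public static string Shout(this string s)
    {
        return s.ToUpperInvariant() + "!";
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        Console.WriteLine("one it".Shout());                 // ONE IT!
        // The compiler turns the call above into an ordinary static call:
        Console.WriteLine(StringExtensions.Shout("one it")); // ONE IT!
    }
}
```

Nothing about the string class itself has changed. The extension method is just a static method with some nicer calling syntax, which also means it can only use the public members of the type it extends.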
Before we go there, however, we first need to change the test to describe how we want our code to behave. Let’s take the original test and replace the two lines that create the string list and call the SortAndPrefix() method with a single line, like so:
var result = new List<string> { s1, s2, s3, s4 }.SortAndPrefix();
We’re going to use an extension method here. We’re going to extend the List<string> class by adding a method called SortAndPrefix() to it. This saves a line of code whose sole purpose is to pass an object to the next line, and instead combines the creation of the List<> and the method call into a single statement.
Let’s have a look at how we might code this using .NET 3.5, LINQ and Extension methods:
public static class LinqSorter
{
    public static List<string> SortAndPrefix(this List<string> strings)
    {
        return (from s in strings
                orderby s
                select s.Prefixed()).ToList();
    }

    public static string Prefixed(this string s)
    {
        return String.Format("[{0}] {1}",
            s.Split().Count(w => w.Equals("it", StringComparison.CurrentCultureIgnoreCase)),
            s);
    }
}
SortAndPrefix()
The SortAndPrefix() method is an extension method. An extension method has two obvious characteristics: First it’s a static method on a static class, and secondly the first parameter has the keyword “this” before the type identifier. The “this” tells the compiler what class the extension method should be attached to.
In this case the SortAndPrefix() method is applied to objects of type List<string>, which is what we wanted to use in our unit test.
The next part of the method is the LINQ statement itself.
Let’s break it down a little to make it easier to understand:
1. “from s in strings”. We’re processing the List<> and assigning each string in the list, in turn, to the variable “s”. This is equivalent to the “foreach (var s in strings)” from the .NET 2.0 version.
2. “orderby s”. This very simply ensures that the strings are returned in alphabetical order.
3. “select s.Prefixed()”. This calls the Prefixed() method (see below) on each string selected by the query and returns the result of that method instead of the unprocessed string.
4. “.ToList()”. This executes the query and returns the results to the caller as a list.
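The query syntax is really just a friendlier way of writing a chain of extension method calls. The steps above could be sketched in method syntax roughly as follows. The names MethodSyntaxSorter and SortAndPrefixFluent() are ones I’ve invented for the comparison, and Prefixed() is repeated only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class for comparison with the query-syntax version.
public static class MethodSyntaxSorter
{
    // Same logic as Prefixed() in the post, repeated to keep the sketch self-contained.
    public static string Prefixed(this string s)
    {
        return String.Format("[{0}] {1}",
            s.Split().Count(w => w.Equals("it", StringComparison.CurrentCultureIgnoreCase)),
            s);
    }

    public static List<string> SortAndPrefixFluent(this List<string> strings)
    {
        return strings
            .OrderBy(s => s)            // "orderby s"
            .Select(s => s.Prefixed())  // "select s.Prefixed()"
            .ToList();                  // runs the query and builds the list
    }
}
```

Which form you use is mostly a matter of taste. I find the query syntax reads better for this example, but it’s worth knowing that both end up as the same calls.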
Now there are a few things to be aware of here. A LINQ query is just a query definition, not the results themselves, and a query definition does nothing until something asks for its results. This is known as deferred execution. The query runs when it is enumerated, for example by a foreach loop or by a method such as ToList() or Count() that needs the results in order to do its job. You can think of this a little like ADO.NET, where you can write a SQL statement in your code, but nothing happens until you actually execute the command, for example by calling ExecuteReader() to get a data reader.
In our case the query is executed when the ToList() method is called.
Obviously, the results of the query are returned to the caller as a List<>.
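To see for yourself that a query does nothing until something asks for its results (this is usually called deferred execution), you can try a small experiment like the one below. DeferredDemo, RunQueryAfterLateAdd() and the “A late it” string are all made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the exercise code.
public static class DeferredDemo
{
    public static List<string> RunQueryAfterLateAdd()
    {
        var strings = new List<string> { "One it", "It should have one its" };

        // This only defines the query; nothing has been sorted yet.
        IEnumerable<string> query = from s in strings
                                    orderby s
                                    select s;

        // Added after the query was defined, but before it has run.
        strings.Add("A late it");

        // ToList() enumerates the query, so the late addition is included.
        return query.ToList();
    }

    public static void Main()
    {
        var results = RunQueryAfterLateAdd();
        Console.WriteLine(results.Count); // 3
        Console.WriteLine(results[0]);    // A late it
    }
}
```

If the query had run at the point where it was defined, the late string would be missing from the results. It’s there, and sorted into first place, because the work only happened inside ToList().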
Prefixed()
The Prefixed() extension method (yes, we’re using a second extension method) simply takes a string, counts the “it” words in it, once again using LINQ, and returns the string with that count as a prefix.
This time we take the string, split it into its component words and then count the “it” words. Interestingly, the counting is done by supplying a lambda expression to the Count() method.
Lambdas aren’t anything complex. They are just methods defined inline instead of elsewhere in the code, and since they are inline, they don’t need a name (which is why they are described as anonymous functions).
The “=>” operator simply defines the parameter names for the method on the left and the method body on the right. So if we had to read aloud the statement in the Prefixed() method we might say something like the following:
Take a string “s” and Split() it to get an array of strings containing the “words” from s. Next get a count of “it” words by processing the words as follows: each word is passed in as a parameter “w” to a method that checks if w is the word “it”. If the comparison returns true then the word is counted.
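If the lambda still looks a bit odd, it may help to see it next to the more long-winded ways of saying the same thing. This is only a sketch; LambdaDemo, IsIt() and CountAllThreeWays() are names I’ve invented for it.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, not part of the exercise code.
public static class LambdaDemo
{
    // A named method with the same shape as the lambda: string in, bool out.
    public static bool IsIt(string w)
    {
        return w.Equals("it", StringComparison.CurrentCultureIgnoreCase);
    }

    public static int[] CountAllThreeWays(string sentence)
    {
        var words = sentence.Split(' ');

        // 1. A named method passed to Count().
        int named = words.Count(IsIt);
        // 2. The C# 2.0 anonymous method syntax.
        int anonymous = words.Count(delegate(string w) { return IsIt(w); });
        // 3. The lambda expression used in Prefixed().
        int lambda = words.Count(w => w.Equals("it", StringComparison.CurrentCultureIgnoreCase));

        return new[] { named, anonymous, lambda };
    }

    public static void Main()
    {
        var counts = CountAllThreeWays("Three it it it its");
        Console.WriteLine("{0} {1} {2}", counts[0], counts[1], counts[2]); // 3 3 3
    }
}
```

All three hand Count() a method that takes a word and returns true or false. The lambda is simply the shortest way to write it.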
Wrap Up
Now technically there is no need to use two extension methods. We could have just put all the code from the Prefixed() method in the “select” clause of the SortAndPrefix() method and everything would have worked just as well. However, by splitting it out we improve the readability of the code and improve its maintainability.
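For comparison, here’s roughly what the single-method version would look like. InlineLinqSorter and SortAndPrefixInline() are hypothetical names, chosen so they don’t clash with the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical single-method variant of the LINQ implementation.
public static class InlineLinqSorter
{
    public static List<string> SortAndPrefixInline(this List<string> strings)
    {
        // The body of Prefixed() has been moved straight into the select clause.
        return (from s in strings
                orderby s
                select String.Format("[{0}] {1}",
                    s.Split().Count(w => w.Equals("it", StringComparison.CurrentCultureIgnoreCase)),
                    s)).ToList();
    }
}
```

It works, but the select clause is now doing two jobs at once, and you’d have to read it a couple of times to see what the prefix actually is.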
If we look at the metrics and run SourceMonitor over our code base, we find that we now have a total complexity of 2, with each method having a complexity of 1, and there are a total of 2 statements. It’s certainly more concise and less nested than the foreach version of the code.
By the way, Visual Studio does its metric analysis at the IL level and will produce different results, especially where LINQ and foreach are concerned, as these constructs actually wrap a number of behind-the-scenes calls.
Conclusion
Hopefully that has given you an idea of how LINQ and Extension methods work (at least at a basic level) and shows you how you can improve your existing code to make it easier to read and more maintainable.
If you’ve got any questions, please feel free to ask.