Thursday, November 25, 2010

HashSet vs SortedSet

Today I came across a good article explaining the difference between the HashSet and SortedSet collections in .NET.

Below is the link to the original article:
http://msdn.microsoft.com/en-us/vcsharp/ee906600.aspx

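The .NET 4.0 library includes an additional collection class: SortedSet<T>.
SortedSet will typically be faster than HashSet when the majority of your operations require enumerating the set in one particular (sorted) order. If, instead, most of the operations are searches, you’ll get better performance from HashSet. The frequency of insert operations also affects which collection is the better choice: the more frequently inserts occur, the more likely HashSet will be faster. This follows from how the two are built: HashSet<T> is hash-based, so lookups and inserts take constant time on average, while SortedSet<T> keeps its elements in a balanced binary tree, so lookups and inserts take O(log n) time but the elements are always available in sorted order.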
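To make the ordering difference concrete, here is a minimal sketch of my own (not taken from the linked article). The class SetDemo and its methods are hypothetical names; HashSet<T>, SortedSet<T> and SortedSet<T>.GetViewBetween are the real .NET 4.0 types and method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SetDemo
{
    // HashSet<T> is hash-based: Add and Contains are O(1) on average,
    // but the order in which elements are enumerated is unspecified.
    public static HashSet<int> BuildHashSet(IEnumerable<int> items)
    {
        return new HashSet<int>(items);
    }

    // SortedSet<T> keeps its elements in a self-balancing binary tree:
    // Add and Contains are O(log n), and enumeration is always ascending.
    public static int[] InSortedOrder(IEnumerable<int> items)
    {
        return new SortedSet<int>(items).ToArray();
    }

    // Ordered range query (both bounds inclusive), which a HashSet<T> could
    // only answer by scanning and sorting every element.
    public static int[] Between(IEnumerable<int> items, int lower, int upper)
    {
        return new SortedSet<int>(items).GetViewBetween(lower, upper).ToArray();
    }
}
```

Both sets drop duplicates, so { 42, 7, 19, 3, 7, 88 } yields five elements either way; only the SortedSet guarantees they come back as 3, 7, 19, 42, 88.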

Monday, November 8, 2010

LINQ – In-Memory Collections

In this article we will cover only the querying of in-memory collections.

This article has been designed to give you a core understanding of LINQ that we will rely heavily on in subsequent parts of this series.

Before diving into the code, it is essential to define what LINQ actually is. LINQ is not C# 3.0, and vice versa. LINQ relies heavily on the new language enhancements introduced in C# 3.0; however, LINQ is essentially a set of standard query operators that allow you to work with data in a more intuitive way, regardless of the data source.

The benefits of using LINQ are significant: queries are first-class citizens within the C# language, they benefit from compile-time checking, and you can debug (step through) them. We can expect the next Visual Studio IDE to take full advantage of these benefits; certainly the March 2007 CTP of Visual Studio Orcas does!

In-Memory Collections

The best way to teach new technologies is just to show you an example and then explain what the heck is going on! That will be my approach throughout this series; hopefully it is a wise decision.

For our first example we will compose a query to retrieve all the items in a generic List collection (Fig. 1).

Figure 1: Selecting all the items in a generic List collection

private static List<string> people = new List<string>() 
{ 
  "Granville", "John", "Rachel", "Betty", 
  "Chandler", "Ross", "Monica" 
};
 
public static void Example1() 
{
  IEnumerable<string> query = from p in people select p;
  foreach (string person in query) 
  {
    Console.WriteLine(person);
  }
}

The code example given in Fig. 1 is very basic, and its functionality could have been replicated more easily by simply enumerating the items in the List with a foreach loop.

In Fig. 1 we compose a query that returns each of the items in the people List collection by aliasing each element of people with the range variable p and then selecting p. Remember that p is of type string, as people is a List of immutable string objects.

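You may notice that query is of type IEnumerable<string>; this is because we know that query will hold an enumeration of strings. When we iterate over query with foreach, its GetEnumerator method is invoked, and only at that point is the query actually executed (this is known as deferred execution).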
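A minimal sketch of deferred execution: the query is composed first, the source list is modified afterwards, and the late addition still shows up because nothing runs until foreach calls GetEnumerator. The class DeferredExecutionDemo and its Run method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static List<string> Run()
    {
        var people = new List<string> { "Granville", "John", "Rachel" };

        // Nothing is executed here; 'query' only describes the work to do.
        IEnumerable<string> query = from p in people select p;

        // Added after the query was composed, but before it is enumerated.
        people.Add("Monica");

        var results = new List<string>();
        foreach (string person in query) // GetEnumerator runs the query now
        {
            results.Add(person);
        }
        return results; // includes "Monica"
    }
}
```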

At this point it is worth looking at exactly what the compiler-generated code looks like (Fig. 2).

Figure 2: Compiler generated code for Fig. 1

public static void Example1()
{
  IEnumerable<string> query = people.Select<string, string>(delegate (string p) 
  {
    return p;
  });
  foreach (string person in query)
  {
    Console.WriteLine(person);
  }
}

Fig. 2 reveals that our query has actually been converted by the compiler to use an extension method (in this case just the Select extension method is used) taking a delegate as its argument.

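You will find that query expressions and lambda expressions are simply a facade that makes our lives easier: under the covers, for in-memory (IEnumerable<T>) sources, the compiler generates the appropriate method calls using delegates. (For IQueryable<T> sources such as LINQ to SQL, the lambdas are turned into expression trees instead.) Be aware of this internal compiler behavior!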

Also be aware that a static field caching the anonymous delegate is generated at compile time as well (Fig. 3); we will discuss this particular feature in future articles.

Figure 3: Compiler generated cached anonymous delegate method

[CompilerGenerated]
private static Func<string, string> <>9__CachedAnonymousMethodDelegate1;
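To see that the query syntax in Fig. 1 and a direct call to the Select extension method are interchangeable, here is a small sketch that writes both forms side by side; in the method form, the lambda p => p plays the role of the anonymous delegate shown in Fig. 2. The class QueryTranslationDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QueryTranslationDemo
{
    private static readonly List<string> people = new List<string>
    {
        "Granville", "John", "Rachel", "Betty",
        "Chandler", "Ross", "Monica"
    };

    // Query expression, as written in Fig. 1.
    public static IEnumerable<string> QuerySyntax()
    {
        return from p in people select p;
    }

    // The method-call form the compiler translates it into; the lambda
    // is converted to a Func<string, string> delegate.
    public static IEnumerable<string> MethodSyntax()
    {
        return people.Select(p => p);
    }
}
```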

We will now take a look at a more complex query over the same collection, which retrieves, in alphabetical order, all strings in the List whose length is greater than 5 (Fig. 4).

Figure 4: A more complex query

public static void Example2() 
{
  IEnumerable<string> query = from p in people where p.Length > 5 
  orderby p select p;
 
  foreach (string person in query) 
  {
    Console.WriteLine(person);
  }
}

The example in Fig. 4 relies on two other standard query operators, Where and OrderBy (written here with the where and orderby clauses), to achieve the desired results.

If we examine the code generated by the compiler for the Example2 method, we see the code shown in Fig. 5. Notice also that we now have another two cached anonymous delegates (Fig. 6), each with the type signature of its corresponding delegate (the Where predicate and the OrderBy key selector).

Figure 5: Compiler generated code for Fig. 4

public static void Example2()
{
  IEnumerable<string> query = people.Where<string>(delegate (string p) 
  {
    return (p.Length > 5);
  }).OrderBy<string, string>(delegate (string p) 
  {
    return p;
  });
  foreach (string person in query)
  {
    Console.WriteLine(person);
  }
}

Figure 6: Cached anonymous delegates for the respective Where and OrderBy delegates defined in Fig. 5

[CompilerGenerated]
private static Func<string, bool> <>9__CachedAnonymousMethodDelegate4;
[CompilerGenerated]
private static Func<string, string> <>9__CachedAnonymousMethodDelegate5;

The type signature of the Where delegate (Fig. 5) is Func<string, bool>: the delegate takes a string argument and returns a bool indicating whether the string is longer than 5 characters. Similarly, the OrderBy delegate (Fig. 5) is a Func<string, string>: it takes a string argument and returns a string, which is used as the sort key.
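The two delegate types can also be written out by hand. This sketch declares a Func<string, bool> predicate and a Func<string, string> key selector explicitly and passes them to Where and OrderBy, producing the same result as Example2. The class and field names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExplicitDelegatesDemo
{
    private static readonly List<string> people = new List<string>
    {
        "Granville", "John", "Rachel", "Betty",
        "Chandler", "Ross", "Monica"
    };

    // Same shape as the cached Where delegate: string in, bool out.
    private static readonly Func<string, bool> longerThanFive = p => p.Length > 5;

    // Same shape as the cached OrderBy key selector: string in, sort key out.
    private static readonly Func<string, string> byName = p => p;

    public static List<string> Example2WithDelegates()
    {
        return people.Where(longerThanFive).OrderBy(byName).ToList();
    }
}
```

With the sample names, the filter keeps Granville, Rachel, Chandler and Monica, and the sort returns them as Chandler, Granville, Monica, Rachel.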

LINQ Query Operators: Part 3

AsEnumerable Operator

I found the AsEnumerable operator to be really important in understanding where a query gets executed: whether it gets converted to SQL and performed on SQL Server, or whether LINQ to Objects is used and the query is performed in memory. The ideal use I have found for AsEnumerable is when I know that certain functionality is not available in SQL Server: I can perform part of the query using LINQ to SQL (IQueryable<T>) and execute the rest as LINQ to Objects (IEnumerable<T>). Basically, AsEnumerable is a hint to perform the rest of the query using LINQ to Objects. This is how the prototype looks:

public static IEnumerable<T> AsEnumerable<T>(
    this IEnumerable<T> source);

The prototype operates on a source of type IEnumerable<T> and also returns an IEnumerable<T>. This is because the standard query operators operate on IEnumerable<T>, whereas LINQ to SQL operates on IQueryable<T>, which also happens to implement IEnumerable<T>. So when you call an operator like Where on an IQueryable<T> (domain objects), the compiler binds to the LINQ to SQL implementation of Where, and as a result the query gets executed on SQL Server. But what if we know in advance that a certain operator would fail on SQL Server because LINQ to SQL has no translation for it? In that case it's good to use the AsEnumerable operator: it changes only the compile-time type of the sequence to IEnumerable<T>, so the operators that follow it run in memory using LINQ to Objects. Let's see an example:

public static void AsEnumerableExample()
{
    NorthwindDataContext db = new NorthwindDataContext();

    var firstproduct = (from product in db.Products
                        where product.Category.CategoryName == "Beverages"
                        select product
                       ).ElementAt(0);

    Console.WriteLine(firstproduct.ProductName);
}

When you run this query, it throws a NotSupportedException saying that ElementAt is not supported, because LINQ to SQL cannot translate ElementAt into SQL. When I add AsEnumerable, the query executes fine: the where filter is still translated to SQL and run on SQL Server, and only ElementAt runs in memory over the returned rows:

public static void AsEnumerableExample()
{
    NorthwindDataContext db = new NorthwindDataContext();

    var firstproduct = (from product in db.Products
                        where product.Category.CategoryName == "Beverages"
                        select product
                       ).AsEnumerable().ElementAt(0);

    Console.WriteLine(firstproduct.ProductName);
}
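The Northwind example needs a database, but the mechanism can be observed in memory. In this sketch, Queryable.AsQueryable wraps an array in an IQueryable<string> (standing in for db.Products, which is an assumption of the demo, not LINQ to SQL itself). Calling Where on it binds to Queryable.Where and yields another IQueryable; calling AsEnumerable first makes the compiler pick Enumerable.Where, i.e. LINQ to Objects. The class AsEnumerableDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableDemo
{
    // In-memory stand-in for a LINQ to SQL table.
    private static readonly IQueryable<string> fruits =
        new[] { "Apple", "Fig", "Banana", "Kiwi" }.AsQueryable();

    // Static type IQueryable<string>: Queryable.Where is chosen, the lambda
    // is captured as an expression tree, and the result is still IQueryable.
    public static IEnumerable<string> WithoutAsEnumerable()
    {
        return fruits.Where(f => f.Length > 3);
    }

    // AsEnumerable changes only the static type to IEnumerable<string>, so
    // Enumerable.Where (LINQ to Objects, compiled delegate) is chosen instead.
    public static IEnumerable<string> WithAsEnumerable()
    {
        return fruits.AsEnumerable().Where(f => f.Length > 3);
    }
}
```

Both methods return Apple, Banana and Kiwi; the difference is only in which provider evaluates the filter, which is exactly what decides whether a real LINQ to SQL query runs on the server or in memory.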

DefaultIfEmpty

The DefaultIfEmpty operator returns a default element if the input sequence is empty: in that case it returns a sequence with a single element of default(T), which, for reference types, is null. If the input sequence is not empty, its elements are returned unchanged. The operator also has an overload that lets you specify the default value to be returned.

public static void DefaultIfEmptyExample()
{
    string[] fruits = { "Apple", "pear", "grapes", "orange" };

    string banana = fruits.Where(f => f.Equals("Banana")).First();

    Console.WriteLine(banana);
}

The above example throws an InvalidOperationException because the First operator requires that the sequence not be empty. If we use DefaultIfEmpty instead, this is how it looks:

public static void DefaultIfEmptyExample1()
{
    string[] fruits = { "Apple", "pear", "grapes", "orange" };

    string banana =
        fruits.Where(f => f.Equals("Banana")).DefaultIfEmpty("Not Found").First();

    Console.WriteLine(banana);
}
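A short sketch of the three cases described above, using int sequences so that default(T) is visible as 0. The class DefaultIfEmptyDemo and its methods are hypothetical names; DefaultIfEmpty() and DefaultIfEmpty(defaultValue) are the two real overloads.

```csharp
using System.Linq;

public static class DefaultIfEmptyDemo
{
    // Empty source: a single element equal to default(int), i.e. 0.
    public static int[] EmptyInts()
    {
        return new int[0].DefaultIfEmpty().ToArray();
    }

    // Empty source with an explicit default value.
    public static int[] EmptyIntsWithDefault()
    {
        return new int[0].DefaultIfEmpty(-1).ToArray();
    }

    // Non-empty source: elements pass through unchanged, no default is added.
    public static int[] NonEmptyInts()
    {
        return new[] { 1, 2, 3 }.DefaultIfEmpty(-1).ToArray();
    }
}
```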

Another interesting use of DefaultIfEmpty is to perform a left outer join using GroupJoin. Here is an example that illustrates that:

public class Category
{
    public string CategoryName { get; set; }
}

public class Product
{
    public string ProductName { get; set; }
    public string CategoryName { get; set; }
}

public static void LeftOuterJoin()
{
    Category[] categories = {
        new Category{CategoryName="Beverages"},
        new Category{CategoryName="Condiments"},
        new Category{CategoryName="Dairy Products"},
        new Category{CategoryName="Grains/Cereals"}
    };

    Product[] products = {
        new Product{ProductName="Chai",
                    CategoryName="Beverages"},
        new Product{ProductName="Northwoods Cranberry Sauce",
                    CategoryName="Condiments"},
        new Product{ProductName="Butter",
                    CategoryName="Dairy Products"},
    };

    var prodcategory =
        categories.GroupJoin(
            products,
            c => c.CategoryName,
            p => p.CategoryName,
            (category, prodcat) => prodcat.DefaultIfEmpty()
                .Select(pc => new { category.CategoryName,
                    ProductName = pc != null ? pc.ProductName : "No" })
        ).SelectMany(s => s);

    foreach (var product in prodcategory)
    {
        Console.WriteLine("Category :{0}, Product = {1}", product.CategoryName,
            product.ProductName);
    }
}

In the example above, I use a left outer join to list all categories, regardless of whether they have any products.
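The same left outer join can be written in query syntax: join ... into compiles to GroupJoin, and the following from ... in g.DefaultIfEmpty() compiles to SelectMany, which does the flattening that the explicit SelectMany(s => s) does above. This is a sketch that reuses the Category and Product shapes from the example; the class LeftOuterJoinQuerySyntax and its Run method are hypothetical, and it returns formatted strings instead of printing them so the result is easy to check.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public string CategoryName { get; set; }
}

public class Product
{
    public string ProductName { get; set; }
    public string CategoryName { get; set; }
}

public static class LeftOuterJoinQuerySyntax
{
    public static List<string> Run(IEnumerable<Category> categories,
                                   IEnumerable<Product> products)
    {
        var query =
            from c in categories
            join p in products on c.CategoryName equals p.CategoryName into g
            from pc in g.DefaultIfEmpty() // null when a category has no products
            select string.Format("Category :{0}, Product = {1}",
                c.CategoryName, pc != null ? pc.ProductName : "No");
        return query.ToList();
    }
}
```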

LINQ Query Operators: Part 2

OrderBy Operators

The ordering operators allow collections to be ordered using OrderBy, OrderByDescending, ThenBy and ThenByDescending. Here is what the prototype of OrderBy looks like:

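public static IOrderedEnumerable<T> OrderBy<T, K>(
    this IEnumerable<T> source,
    Func<T, K> keySelector);

(The actual .NET signature puts no generic constraint on K. The keys are compared with Comparer<K>.Default, so K must implement IComparable<K> or IComparable, and this is checked at run time.)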

The operator takes an IEnumerable<T> and orders the collection based on the key selector, which returns the key value for each element in the sequence. An important point worth mentioning is that, in LINQ to Objects, OrderBy and OrderByDescending perform a stable sort: if two elements return the same key from the key selector, they keep their original relative order in the output. Other LINQ providers (such as LINQ to SQL, where the database does the sorting) make no such guarantee, so if the order of elements with equal keys matters, state it explicitly with ThenBy or ThenByDescending rather than relying on it.
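A small sketch to check how LINQ to Objects handles equal keys. Per the documentation of Enumerable.OrderBy, it performs a stable sort, so words of the same length should come out in their original relative order. The class StableSortDemo is a hypothetical name.

```csharp
using System.Linq;

public static class StableSortDemo
{
    // Sort words by length only. "pear", "kiwi" and "plum" all have length 4;
    // because the sort is stable they keep their original relative order.
    public static string[] ByLength()
    {
        string[] words = { "pear", "fig", "kiwi", "apple", "plum" };
        return words.OrderBy(w => w.Length).ToArray();
    }
}
```

The result is fig, pear, kiwi, plum, apple: the three four-letter words appear in the same order as in the source array.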

Another interesting point I found is that OrderBy and OrderByDescending take an IEnumerable<T> as their input source, as you can see from the prototype, but return an IOrderedEnumerable<T>. Since IOrderedEnumerable<T> implements IEnumerable<T>, you could pass the result to another OrderBy call, but that would start a brand-new primary sort on the new key rather than adding a secondary key. To order by more than one key, use the ThenBy or ThenByDescending operators, which take an IOrderedEnumerable<T> and refine the existing ordering. Let's proceed to an example:

 
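// Minimal Book class assumed by the examples in this post
public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
}

public static void OrderByOperators()
{
    Book[] books =
    {
        new Book{Title="ASP.NET 2.0 Website Programming: Problem - " +
                       "Design - Solution",
                 Author="Marco Bellinaso"},
        new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
        new Book{Title="Pro ASP.NET 3.5 in C# 2008, Second Edition",
                 Author="MacDonald and Mario Szpuszta"},
        new Book{Title="ASP.NET 3.5 Unleashed", Author="Stephen Walther"},
    };

    // Primary key: Author; secondary key (books by the same author): Title
    IEnumerable<Book> orderedbooks =
        books.OrderBy(b => b.Author).ThenBy(b => b.Title);

    foreach (Book book in orderedbooks)
    {
        Console.WriteLine("{0} - {1}", book.Author, book.Title);
    }
}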
 

In the above example, I use both OrderBy and ThenBy: OrderBy sorts the books by author, and ThenBy then sorts the books by the same author by title. The second prototype of OrderBy takes an IComparer<K>, so the key returned by the keySelector delegate does not need to implement IComparable; the comparer you supply decides the order instead. Here is what the prototype looks like:

public static IOrderedEnumerable<T> OrderBy<T, K>(
    this IEnumerable<T> source,
    Func<T, K> keySelector,
    IComparer<K> comparer);

interface IComparer<T> {
    int Compare(T x, T y);
}

The Compare method returns a value less than zero if the first argument is less than the second, zero if both arguments are equal, and a value greater than zero if the first argument is greater than the second. Let's see an example of that:

 
public class LengthComparer : IComparer<string>
{
    // Orders strings by length only: shorter titles sort first
    public int Compare(string title1, string title2)
    {
        if (title1.Length < title2.Length) return -1;
        else if (title1.Length > title2.Length) return 1;
        else return 0;
    }
}
 

I have created a custom comparer which compares book titles by their length. When the first title is shorter than the second, I return -1; if it's longer, I return 1; otherwise, I return 0 because both titles have the same length. This is how I use the comparer in my LINQ query:

public static void OrderByUsingCustomComparer()
{
    Book[] books =
    {
        new Book{Title="ASP.NET 2.0 Website Programming: Problem - " +
                       "Design - Solution",
                 Author="Marco Bellinaso"},
        new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
        new Book{Title="Pro ASP.NET 3.5 in C# 2008, Second Edition",
                 Author="MacDonald and Mario Szpuszta"},
        new Book{Title="ASP.NET 3.5 Unleashed", Author="Stephen Walther"},
    };

    // Sort from the shortest title to the longest using LengthComparer
    IEnumerable<Book> orderedbooks = books.OrderBy(b => b.Title,
        new LengthComparer());
}
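On .NET 4.5 or later (newer than the framework this post targets), Comparer<T>.Create can build the same length comparer from a lambda, without a separate class. Because a length comparer treats equal-length titles as ties, the sketch also adds a ThenBy with StringComparer.Ordinal so the tie order is explicit rather than inherited from the source. The class LengthOrderingDemo and its method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LengthOrderingDemo
{
    // Equivalent to LengthComparer: negative, zero or positive by length.
    private static readonly IComparer<string> byLength =
        Comparer<string>.Create((x, y) => x.Length.CompareTo(y.Length));

    public static string[] SortByLength(IEnumerable<string> titles)
    {
        return titles.OrderBy(t => t, byLength)               // primary: length
                     .ThenBy(t => t, StringComparer.Ordinal)   // ties: ordinal text
                     .ToArray();
    }
}
```

For the input "Sets", "C#", "Linq", "ASP", the result is C#, ASP, Linq, Sets; without the ThenBy, the stable sort would have left "Sets" ahead of "Linq" as in the source.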