Background: Over the next month, I'll be giving three talks about, or at least including, LINQ in the context of C#. I'd like to know which topics are worth giving a fair amount of attention to, based on what people may find hard to understand, or what they may have a mistaken impression of. I won't be specifically talking about LINQ to SQL or the Entity Framework except as examples of how queries can be executed remotely using expression trees (and usually IQueryable).
So, what have you found hard about LINQ? What misunderstandings have you seen? Examples might be any of the following, but please don't limit yourself!
How the C# compiler treats query expressions
IQueryable
Delayed execution
I know the deferred execution concept should be beaten into me by now, but this example really helped me get a practical grasp of it:
using System;
using System.Collections.Generic;
using System.Linq;

static void Linq_Deferred_Execution_Demo()
{
    List<string> items = new List<string> { "Bob", "Alice", "Trent" };
    var results = from s in items select s;

    Console.WriteLine("Before add:");
    foreach (var result in results)
    {
        Console.WriteLine(result);
    }

    items.Add("Mallory");

    //
    // Enumerating the results again will return the new item, even
    // though we did not re-assign the LINQ expression to it!
    //
    Console.WriteLine("\nAfter add:");
    foreach (var result in results)
    {
        Console.WriteLine(result);
    }
}
The above code prints the following:
Before add:
Bob
Alice
Trent
After add:
Bob
Alice
Trent
Mallory
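The usual way to opt out of deferred execution is to materialize the query, for example with ToList(). This is a minimal sketch with made-up data. The snapshot is computed once, so it does not see later changes to the source list, while the deferred query does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<string> { "Bob", "Alice", "Trent" };

// Deferred: only a description of the query; nothing runs yet.
IEnumerable<string> deferred = items.Where(s => s.Length > 3);

// Immediate: ToList() enumerates the source now and copies the results.
List<string> snapshot = items.Where(s => s.Length > 3).ToList();

items.Add("Mallory");

// The deferred query sees the new item; the snapshot does not.
Console.WriteLine(string.Join(", ", deferred)); // Alice, Trent, Mallory
Console.WriteLine(string.Join(", ", snapshot)); // Alice, Trent
```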
That there is more to LINQ than LINQ to SQL, and that its features are more than just a SQL parser embedded in the language.
Big O notation [1]. LINQ makes it incredibly easy to write O(n^4) algorithms without realizing it, if you don't know what you're doing.
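A common way this happens is calling List<T>.Contains inside Where. Contains is a linear scan, so the query does roughly n·m comparisons, and nesting a few of these compounds quickly. This is a minimal sketch with made-up data. Building a HashSet<T> first gives the same result with average O(1) lookups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> orders = Enumerable.Range(0, 5_000).ToList();
List<int> wanted = Enumerable.Range(2_500, 5_000).ToList();

// Looks innocent, but List<int>.Contains scans the list on every call:
// roughly orders.Count * wanted.Count comparisons in total.
List<int> slow = orders.Where(o => wanted.Contains(o)).ToList();

// Same result, but HashSet<int>.Contains is O(1) on average,
// so the query is roughly linear in orders.Count + wanted.Count.
var wantedSet = new HashSet<int>(wanted);
List<int> fast = orders.Where(o => wantedSet.Contains(o)).ToList();

Console.WriteLine(slow.SequenceEqual(fast)); // True
Console.WriteLine(fast.Count);               // 2500
```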
[1] http://en.wikipedia.org/wiki/Big_O_notation
I think it's the fact that a lambda expression can resolve to either an expression tree or an anonymous delegate, so you can pass the same declarative lambda expression to both IEnumerable<T> extension methods and IQueryable<T> extension methods.
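Here is a small sketch of the same lambda text converted both ways. The data is illustrative, and AsQueryable() over an array stands in for a real remote provider. The compiler picks the conversion from the target type: a Func<...> gets compiled code, while an Expression<Func<...>> gets a data structure that can be inspected or compiled later.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// The same lambda text, converted two different ways by the compiler.
Func<int, bool> asDelegate = x => x > 2;          // compiled code
Expression<Func<int, bool>> asTree = x => x > 2;  // data describing the code

int[] numbers = { 1, 2, 3, 4 };

// Enumerable.Where takes the delegate and simply calls it.
int[] viaEnumerable = numbers.Where(asDelegate).ToArray();

// Queryable.Where takes the expression tree; the provider (here the
// in-memory one behind AsQueryable) decides how to execute it.
int[] viaQueryable = numbers.AsQueryable().Where(asTree).ToArray();

// Because the tree is data, it can be inspected or compiled on demand.
Console.WriteLine(asTree.Body.NodeType);           // GreaterThan
Console.WriteLine(asTree.Compile()(5));            // True
Console.WriteLine(string.Join(",", viaQueryable)); // 3,4
```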
It took me way too long to realize that many LINQ extension methods, such as Single(), SingleOrDefault() etc., have overloads that take lambdas.
You can do:
Single(x => x.id == id)
and don't need to write this, which some bad tutorial got me in the habit of doing:
Where(x => x.id == id).Single()
Count(), among others. Do you know if there's any performance difference, in addition to the obvious bonus of code readability? - Justin Morgan
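This sketch uses made-up Id/Name tuples to show that the predicate overloads give the same answers as the Where(...) form, for Single() and for Count(). In LINQ to Objects, Single throws InvalidOperationException in either form when zero or several elements match.

```csharp
using System;
using System.Linq;

var people = new[] { (Id: 1, Name: "Bob"), (Id: 2, Name: "Alice"), (Id: 3, Name: "Trent") };
int id = 2;

// Predicate overloads: filter and pick in one call.
var a = people.Single(p => p.Id == id);
var b = people.Where(p => p.Id == id).Single(); // same result, just noisier

int countA = people.Count(p => p.Name.Length > 3);
int countB = people.Where(p => p.Name.Length > 3).Count();

Console.WriteLine(a == b);           // True
Console.WriteLine(countA == countB); // True

// Single throws when more than one element matches, in either form.
try { people.Single(p => p.Id > 1); }
catch (InvalidOperationException) { Console.WriteLine("more than one match"); }
```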
In LINQ to SQL I constantly see people not understanding the DataContext: how it can be used and how it should be used. Too many people don't see the DataContext for what it is: a Unit of Work object, not a persistent object.
I've seen plenty of times where people try to make a DataContext a singleton, store it in the session, etc., rather than creating a new one for each operation.
And then there's disposing of the DataContext before the IQueryable has been evaluated, but that's more of a problem with people not understanding IQueryable than the DataContext.
The other concept I see a lot of confusion with is Query Syntax vs Expression Syntax. I will use whichever is the easiest at that point, often sticking with Expression Syntax. A lot of people still don't realise that they will produce the same thing in the end; Query Syntax is compiled into Expression Syntax, after all.
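Here is a minimal sketch with made-up data. The query-syntax version and the hand-written method-call version below describe the same query and produce the same sequence.

```csharp
using System;
using System.Linq;

string[] words = { "apple", "Banana", "cherry", "date" };

// Query syntax...
var q1 = from w in words
         where w.Length > 4
         orderby w
         select w.ToUpper();

// ...is translated by the compiler into these method calls.
var q2 = words.Where(w => w.Length > 4)
              .OrderBy(w => w)
              .Select(w => w.ToUpper());

Console.WriteLine(q1.SequenceEqual(q2)); // True
```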
I think the misunderstood part of LINQ is that it is a language extension, not a database extension or construct.
LINQ is so much more than LINQ to SQL.
Now that most of us have used LINQ on collections, we will NEVER go back!
LINQ is the single most significant feature added to .NET since Generics in 2.0, and Anonymous Types in 3.0.
And now that we have lambdas, I can't wait for parallel programming!
I for one would sure like to know if I need to know what expression trees are, and why.
I'm fairly new to LINQ. Here are the things I stumbled over in my first attempt.
Something that I didn't originally realise was that the LINQ syntax doesn't require IEnumerable<T> or IQueryable<T> to work; LINQ is just about pattern matching.
[Image: "Is IQueryable the right choice for me?" diagram by Bart De Smet [1]]
Here is the answer [2] (no, I didn't write that blog, Bart De Smet did, and he's one of the best bloggers on LINQ I've found).
[1] http://bartdesmet.info/images_wlw/QIsIQueryabletheRightChoiceforMe_13478/image_thumb_3.png
I still have trouble with the "let" clause (which I've never found a use for) and SelectMany (which I've used, but I'm not sure I've done it right).
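Here is a minimal sketch with made-up customer data. `let` names an intermediate value inside a query, so it is computed once and can be reused in later clauses. SelectMany flattens a sequence of sequences, which is also what a second `from` clause compiles into.

```csharp
using System;
using System.Linq;

var customers = new[]
{
    (Name: "Alice", Orders: new[] { 10m, 25m }),
    (Name: "Bob",   Orders: new decimal[0]),
    (Name: "Trent", Orders: new[] { 5m })
};

// let: compute the total once, then reuse it in where and select.
var bigSpenders = from c in customers
                  let total = c.Orders.Sum()
                  where total > 20m
                  select new { c.Name, total };

// SelectMany: one output row per (customer, order) pair,
// flattening the nested Orders arrays.
var allOrders = customers.SelectMany(
    c => c.Orders,
    (c, amount) => new { c.Name, amount });

// Prints Alice: 10, Alice: 25 and Trent: 5 on separate lines.
foreach (var o in allOrders)
    Console.WriteLine($"{o.Name}: {o.amount}");
```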
Understanding when the abstraction among Linq providers leaks. Some things work on objects but not SQL (e.g., .TakeWhile). Some methods can get translated into SQL (ToUpper) while others can't. Some techniques are more efficient in objects where others are more effective in SQL (different join methods).
Couple of things.
OK, due to demand, I've written up some of the Expression stuff. I'm not 100% happy with how Blogger and Live Writer have conspired to format it, but it'll do for now...
Anyway, here goes... I'd love any feedback, especially if there are areas where people want more information.
Here it is [1], like it or hate it...
[1] http://marcgravell.blogspot.com/2008/10/express-yourself.html
Some of the error messages, especially from LINQ to SQL, can be pretty confusing. (grin)
I've been bitten by the deferred execution a couple of times like everyone else. I think the most confusing thing for me has been the SQL Server Query Provider and what you can and can't do with it.
I'm still amazed by the fact you can't do a Sum() on a decimal/money column that's sometimes empty. Using DefaultIfEmpty() just won't work. :(
I think a great thing to cover in LINQ is how you can get yourself in trouble performance-wise. For instance, using LINQ's Count() as a loop condition is really, really not smart.
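The problem is that Count() on a deferred query re-runs the whole query on every loop test. This sketch uses an illustrative counter inside the predicate to make that visible. Materializing the query once and using List<T>.Count runs the predicate only once per source element.

```csharp
using System;
using System.Linq;

int evaluations = 0;

// A deferred query whose predicate counts how often it runs.
var evens = Enumerable.Range(1, 100).Where(n => { evaluations++; return n % 2 == 0; });

// Bad: Count() re-runs the query on every loop test,
// and ElementAt(i) walks the sequence from the start again.
int sumBad = 0;
for (int i = 0; i < evens.Count(); i++)
    sumBad += evens.ElementAt(i);
Console.WriteLine($"loop with Count(): {evaluations} predicate calls");

// Better: materialize once, then use the list's O(1) Count property.
evaluations = 0;
var evenList = evens.ToList();
int sumGood = 0;
for (int i = 0; i < evenList.Count; i++)
    sumGood += evenList[i];
Console.WriteLine($"materialized once: {evaluations} predicate calls"); // 100
```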
That an IQueryable accepts both Expression<Func<T1, T2, T3, ...>> and Func<T1, T2, T3, ...> (the latter silently binding to the IEnumerable extension methods), without giving any hint about the performance degradation in the second case.
Here is a code example that demonstrates what I mean:
[TestMethod]
public void QueryComplexityTest()
{
    var users = _dataContext.Users;

    Func<User, bool> funcSelector = q => q.UserName.StartsWith("Test");
    Expression<Func<User, bool>> expressionSelector = q => q.UserName.StartsWith("Test");

    // Binds to Enumerable.Where: returns IEnumerable<User> and filters the
    // data on the client side (all rows are fetched first)
    IQueryable<User> func = users.Where(funcSelector).AsQueryable();

    // Binds to Queryable.Where: returns IQueryable<User> and filters the
    // data on the server side:
    // SELECT ... FROM [dbo].[User] AS [t0] WHERE [t0].[user_name] LIKE @p0
    IQueryable<User> exp = users.Where(expressionSelector);
}
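The same trap can be reproduced without a database. In this sketch, AsQueryable() over an in-memory array stands in for the real table. Passing a Func makes overload resolution pick Enumerable.Where, so the result is no longer an IQueryable and the provider never sees the filter.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

IQueryable<string> users = new[] { "TestUser", "Alice", "Tester" }.AsQueryable();

Func<string, bool> funcSelector = n => n.StartsWith("Test");
Expression<Func<string, bool>> expressionSelector = n => n.StartsWith("Test");

// Binds to Enumerable.Where: the provider is bypassed and filtering
// happens in memory after all rows have been fetched.
var viaFunc = users.Where(funcSelector);

// Binds to Queryable.Where: the predicate becomes part of the query's
// expression tree, which a provider such as LINQ to SQL can translate.
var viaExpression = users.Where(expressionSelector);

Console.WriteLine(viaFunc is IQueryable<string>);       // False
Console.WriteLine(viaExpression is IQueryable<string>); // True
Console.WriteLine(viaExpression.Expression);            // prints the tree, including the Where call
```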
I don't know if it qualifies as misunderstood - but for me, simply unknown.
I was pleased to learn about DataLoadOptions and how I can control which tables are joined when I make a particular query.
See here for more info: MSDN: DataLoadOptions [1]
[1] http://msdn.microsoft.com/en-us/library/system.data.linq.dataloadoptions.aspx
I would say the most misunderstood (or should that be non-understood?) aspect of LINQ is IQueryable and custom LINQ providers.
I have been using LINQ for a while now and am completely comfortable in the IEnumerable world, and can solve most problems with LINQ.
But when I started to look at and read about IQueryable, Expressions and custom LINQ providers, it made my head spin. Take a look at how LINQ to SQL works if you want to see some pretty complex logic.
I look forward to understanding that aspect of LINQ...
As most people said, I think the most misunderstood part is assuming LINQ is just a replacement for T-SQL. My manager, who considers himself a T-SQL guru, would not let us use LINQ in our project and even hates MS for releasing such a thing!!!
What does var represent when a query is executed?
Is it IQueryable, ISingleResult, IMultipleResults, or does it change based on the implementation? There's some speculation about using (what appears to be) dynamic typing vs the standard static typing in C#.
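var is not dynamic typing. The compiler infers a single static type from the right-hand side, so the result depends on the source and on which extension methods the query binds to. This is a sketch with in-memory data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };

// Over a List<T>, Enumerable.Where is chosen: the static type is IEnumerable<int>.
var fromList = from n in list where n > 1 select n;

// Over an IQueryable<T>, Queryable.Where is chosen: the static type is IQueryable<int>.
var fromQueryable = from n in list.AsQueryable() where n > 1 select n;

IEnumerable<int> e = fromList;     // compiles: var was IEnumerable<int>
IQueryable<int> q = fromQueryable; // compiles: var was IQueryable<int>
// IQueryable<int> bad = fromList; // would not compile

// The runtime object is some implementation of that interface.
Console.WriteLine(fromList.GetType().Name);
```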
How easy it is to nest a loop is something I don't think everyone understands.
For example:
from outerloopitem in outerloopitems
from innerloopitem in outerloopitem.childitems
select new { outerloopitem, innerloopitem }
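Here is a runnable version of that nested loop with made-up parent/child data. The second `from` clause becomes a single SelectMany call, which produces one result per (parent, child) pair.

```csharp
using System;
using System.Linq;

var parents = new[]
{
    (Name: "A", Children: new[] { "a1", "a2" }),
    (Name: "B", Children: new[] { "b1" })
};

// Two from clauses: the second iterates each outer item's children.
var pairs = from p in parents
            from child in p.Children
            select new { Parent = p.Name, Child = child };

// Roughly what the compiler turns it into: one SelectMany call.
var pairs2 = parents.SelectMany(p => p.Children,
                                (p, child) => new { Parent = p.Name, Child = child });

// Prints A -> a1, A -> a2 and B -> b1 on separate lines.
foreach (var x in pairs)
    Console.WriteLine($"{x.Parent} -> {x.Child}");
```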
group by still makes my head spin.
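It helps to know what a group clause produces: a sequence of IGrouping<TKey, TElement>, where each group has a Key and is itself a sequence of the grouped elements. Here is a minimal sketch with made-up words.

```csharp
using System;
using System.Linq;

string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" };

// Each g is an IGrouping<char, string>: a Key plus the matching words.
var byLetter = from w in words
               group w by w[0] into g
               select new { Letter = g.Key, Count = g.Count(), Words = g.ToList() };

foreach (var grp in byLetter)
    Console.WriteLine($"{grp.Letter}: {grp.Count} ({string.Join(", ", grp.Words)})");
// a: 2 (apple, avocado)
// b: 2 (banana, blueberry)
// c: 1 (cherry)
```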
Any confusion about deferred execution [1] can be resolved by stepping through some simple LINQ-based code and playing around in the watch window.
[1] http://blogs.msdn.com/charlie/archive/2007/12/09/deferred-execution.aspx
The fact that you can't chain IQueryables because they are method calls (while still being nothing but SQL-translatable!), and that it is almost impossible to work around this, is mind-boggling and creates a huge violation of DRY. I need my IQueryables for ad-hoc queries, for which I don't have compiled queries (I only have compiled queries for the heavy scenarios), but I can't use them in compiled queries and instead need to write regular query syntax again. Now I'm doing the same subqueries in two places, need to remember to update both if something changes, and so forth. A nightmare.
I think the #1 misconception about LINQ to SQL is that you STILL HAVE TO KNOW SQL in order to make effective use of it.
Another misunderstood thing about Linq to Sql is that you still have to lower your database security to the point of absurdity in order to make it work.
A third point is that using LINQ to SQL along with dynamic classes (meaning the class definition is created at runtime) causes a tremendous amount of just-in-time compiling, which can absolutely kill performance.
Lazy Loading.
As mentioned, lazy loading and deferred execution
How LINQ to Objects and LINQ to XML (IEnumerable) are different from LINQ to SQL (IQueryable).
HOW to build a Data Access Layer, Business Layer, and Presentation Layer with LINQ in all layers... and a good example.
Transactions (without using TransactionScope)
I think you should give more attention to the most commonly used features of LINQ in detail - Lambda expressions and Anonymous types, rather than wasting time on "hard to understand" stuff that is rarely used in real world programs.
Which is faster: inline LINQ to SQL, or LINQ to SQL using T-SQL sprocs?
... and are there cases where it's better to use server-side (sproc) or client-side (inline LINQ) queries?
Comprehension syntax 'magic'. How does comprehension syntax get translated into method calls, and which method calls are chosen?
How, for example, does the following get translated into method calls?
from a in b
from c in d
where a > c
select new { a, c }
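Roughly, the compiler turns the second `from` into SelectMany. It packs a and c into a compiler-generated "transparent identifier" so that the later `where` and `select` clauses can still see both. This sketch approximates that with an anonymous type, and the int arrays for b and d are made up.

```csharp
using System;
using System.Linq;

int[] b = { 1, 5, 9 };
int[] d = { 2, 6 };

var query = from a in b
            from c in d
            where a > c
            select new { a, c };

// Approximately what the compiler generates. The anonymous type t stands
// in for the transparent identifier carrying both range variables.
var methods = b.SelectMany(a => d, (a, c) => new { a, c })
               .Where(t => t.a > t.c)
               .Select(t => new { t.a, t.c });

Console.WriteLine(string.Join(" ", query));
// { a = 5, c = 2 } { a = 9, c = 2 } { a = 9, c = 6 }
```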
For LINQ2SQL: getting your head around some of the generated SQL and writing LINQ queries that translate to good (fast) SQL. This is part of the larger issue of knowing how to balance the declarative nature of LINQ queries with the realism that they need to execute fast in a known environment (SQL Server).
You can get a completely different SQL generated query by changing a tiny tiny thing in the LINQ code. Can be especially dangerous if you are creating an expression tree based on conditional statements (i.e. adding optional filtering criteria).
I find it a bit disappointing that the C# query expression syntax only supports a subset of the LINQ functionality, so you cannot avoid chaining extension methods every now and then. E.g. the Distinct method cannot be called using the query expression syntax; to use it, you need to call the extension method. On the other hand, the query expression syntax is very handy in many cases, so you don't want to skip that either.
A talk on LINQ could include some practical guidelines for when to prefer one syntax over the other and how to mix them.
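One common way to mix the two syntaxes is to wrap the query expression in parentheses and continue with method calls for operators that have no query keyword. Here is a minimal sketch with made-up names.

```csharp
using System;
using System.Linq;

string[] names = { "Bob", "Alice", "bob", "Trent", "Alice" };

// Query syntax for the part it handles well, then method syntax
// for operators it has no keyword for (Distinct, Take, ...).
var result = (from n in names
              where n.Length > 2
              select n.ToLowerInvariant())
             .Distinct()
             .OrderBy(n => n)
             .ToList();

Console.WriteLine(string.Join(", ", result)); // alice, bob, trent
```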
This is of course not 'the most hardest', but just something to add to the list:
ThenBy() extension method
Without looking at its implementation I'm initially puzzled as to how it works. Everyone understands just fine how comma separated sort fields work in SQL - but on face value I'm skeptical that ThenBy is going to do what I really want it to do. How can it 'know' what the previous sort field was - it seems like it ought to.
I'm off to research it now...
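The answer lies in the types. OrderBy returns IOrderedEnumerable<T> (IOrderedQueryable<T> for providers), and ThenBy is only defined on those interfaces, so the earlier sort keys travel along with the result. This is a sketch with made-up data, equivalent to ORDER BY City, Name in SQL.

```csharp
using System;
using System.Linq;

var people = new[]
{
    (City: "Paris",  Name: "Zoe"),
    (City: "Berlin", Name: "Max"),
    (City: "Paris",  Name: "Anna"),
    (City: "Berlin", Name: "Eva")
};

// ThenBy only breaks ties left by the earlier OrderBy key; it does not re-sort.
IOrderedEnumerable<(string City, string Name)> ordered =
    people.OrderBy(p => p.City).ThenBy(p => p.Name);

// Query syntax writes the same thing with a comma, as in SQL.
var ordered2 = from p in people
               orderby p.City, p.Name
               select p;

// Prints Berlin Eva, Berlin Max, Paris Anna, Paris Zoe (one per line).
foreach (var person in ordered)
    Console.WriteLine($"{person.City} {person.Name}");
```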
Suppose that we have a table with three fields, A, B and C (they are integers, and the table name is "Table1").
I show it like this:
[A, B, C]
Now we want to get a result like this:
[X = A, Y = B + C]
And we have such a class:
public class Temp
{
    public Temp(int x, int y)
    {
        this.X = x;
        this.Y = y;
    }

    public int X { get; private set; }
    public int Y { get; private set; }
}
Then we use it like this:
using (MyDataContext db = new MyDataContext())
{
    var result = db.Table1.Select(row =>
        new Temp(row.A, row.B + row.C)).ToList();
}
The generated SQL query is:
SELECT [t0].[A] AS [x], [t0].[B] + [t0].[C] AS [y]
FROM [Table1] AS [t0]
It translates the .ctor of Temp. It knows that I want "row.B + row.C" (and even more...) passed as the "y" parameter of my class constructor!
These translations are very interesting to me. I like that, and I think writing such translators (LINQ to Something) is a little hard!
Of course, there's bad news: LINQ to Entities (4.0) does not support constructors with parameters. (Why not?)
I find "Creating an Expression Tree" to be tough. There are many things that bug me w.r.t. what you can do with LINQ, LINQ to SQL and ADO.NET altogether.
Explain why LINQ does not handle left outer joins as simply as SQL syntax does. See these articles: Implementing a Left Join with LINQ [1], How to: Perform Left Outer Joins (C# Programming Guide) [2]. I got so disappointed when I came across this obstacle that all my respect for the language vanished, and I decided that it was just something that would quickly fade away. No serious person would want to work with a syntax that lacks these battlefield-proven primitives. If you could explain why these sorts of set operations are not supported, I would become a better and more open-minded person.
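For reference, this is roughly the idiom those articles describe, sketched in LINQ to Objects with made-up customers and orders. A `join ... into` performs a group join, and DefaultIfEmpty() supplies a single default (here null) for customers with no orders, which gives left outer join semantics.

```csharp
using System;
using System.Linq;

var customers = new[] { new { Id = 1, Name = "Alice" }, new { Id = 2, Name = "Bob" } };
var orders = new[] { new { CustomerId = 1, Total = 30m }, new { CustomerId = 1, Total = 12m } };

var leftJoin =
    from c in customers
    join o in orders on c.Id equals o.CustomerId into customerOrders
    from co in customerOrders.DefaultIfEmpty() // null when the customer has no orders
    select new { c.Name, Total = co == null ? (decimal?)null : co.Total };

// Prints Alice: 30, Alice: 12, Bob: no orders (one per line).
foreach (var row in leftJoin)
{
    string total = row.Total?.ToString() ?? "no orders";
    Console.WriteLine($"{row.Name}: {total}");
}
```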
[1] http://www.developer.com/db/article.php/3739391/Implementing-a-Left-Join-with-LINQ.htm
I have found it hard to find clear information about anonymous types, especially with regard to performance in web applications. Also, I would suggest better, more practical lambda expression examples and a "How to" section on querying and performance-related topics.
Hope my brief list can help!
Something I bet almost no one knows: you can use inline ifs (the conditional operator) in a LINQ query. Something like this:
var result = from foo in bars where (
((foo.baz != null) ? foo.baz : false) &&
foo.blah == "this")
select foo;
I would suppose you can insert lambdas as well, although I haven't tried.
(foo.baz != null) ? foo.baz : false is equivalent to (foo.baz != null) && foo.baz. I think this can be applied to any ternary expression that could be passed as a where condition. So that's not all that surprising, IMHO. - Justin Morgan
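Here is a runnable sketch, with the assumption (the original doesn't say) that baz is a nullable bool. In that case the ternary needs .Value, because && is not defined for bool? operands. The ?? operator and a comparison with true are shorter equivalents. The Baz/Blah tuples are made-up data.

```csharp
using System;
using System.Linq;

// Assumption: Baz is a bool? (nullable), Blah is a string.
var bars = new[]
{
    (Baz: (bool?)true,  Blah: "this"),
    (Baz: (bool?)null,  Blah: "this"),
    (Baz: (bool?)false, Blah: "this"),
    (Baz: (bool?)true,  Blah: "that")
};

// Conditional operator inside a where clause; .Value is needed for &&.
var withTernary = from foo in bars
                  where (foo.Baz != null ? foo.Baz.Value : false) && foo.Blah == "this"
                  select foo;

// Equivalent, shorter forms of the same test.
var withCoalesce = bars.Where(foo => (foo.Baz ?? false) && foo.Blah == "this");
var withEquals = bars.Where(foo => foo.Baz == true && foo.Blah == "this");

Console.WriteLine(withTernary.Count()); // 1
```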