LINQ to Entities, Cross Apply, and Left Outer Join

0
65

Introduction

The more I use it, the more I learn about the Entity Framework and LINQ to Entities. I started writing stored procedures long before I ever started using Object/Relational Mapping (O/RM) tools like the Entity Framework, so sometimes I play with getting my LINQ code to generate the exact query syntax I want. In this article, I share some of the patterns I’ve discovered.

The Pattern and the Problem

Many of you are familiar with the database structure presented here – it’s a fairly standard pattern that many developers have implemented in some form over the course of their careers. My implementation consists of a Person table, a set of logon names in a PersonLogonName table, and a User table. Users may or may not be Persons, and Persons may or may not be Users.

Figure 1 - Sample Schema

Although the User table and the PersonLogonName table both store a logon name, the User table is only concerned with the logon name currently used to access the system while the PersonLogonName table is more concerned with associating a history of logon names with a particular person. The User table is part of a code module that handles settings such as how many rows the user prefers to display on a certain grid, while the Person table actually associates people (who may or may not be users) with other system objects. Since the two tables are used for entirely different purposes, and since a logon name may exist in either table while not in the other, a foreign key does not exist. This lack of a foreign key will become important later.

The goal is to retrieve a list of active logon names for use during data entry. I decided to start with active Persons, which is easy:

SELECT PersonLogonName
FROM Person INNER JOIN PersonLogonName
 ON Person.PersonId = PersonLogonName.PersonId
WHERE Person.IsActive = 1

Creating this query with LINQ to Entities is also relatively easy. Here is one way to write this code after generating a data model based on the database schema (and pluralizing entity names):

var query = entities.People
 .Include(
 personEntity => personEntity.PeopleToLogonNames)
 .Where(personEntity => personEntity.IsActive);

Working the Problem

Although simple, the query above was returning the entire history of logon names when all I wanted was the latest, active, non-expired logon name. Retrieving one-out-of-many child objects is a Top per Group pattern, which is something I’ve implemented in the past with a SELECT MAX in a subquery. SQL Server 2005 introduced the CROSS APPLY operator, which is a great tool for implementing this pattern when coupled with an ordered SELECT TOP 1 subquery like the following:

SELECT PersonLogonName
FROM Person
CROSS APPLY (SELECT TOP (1) PersonLogonName
 FROM PersonLogonName
 WHERE Person.PersonId = PersonLogonName.PersonId
 AND PersonLogonName.ActiveDate <= SysDateTime()
 AND (PersonLogonName.ExpirationDate IS NULL
 OR PersonLogonName.ExpirationDate > SysDateTime())
 ORDER BY PersonLogonName.ActiveDate DESC) AS LatestLogonName
WHERE Person.IsActive = 1

I originally thought the LINQ to Entities version would be straight-forward and started with this simple projection technique:

var query = entities.People
 .Where(personEntity => personEntity.IsActive)
 .Select(personEntity => personEntity.PeopleToLogonNames
 .Where(personLogonNameEntity =>
 personLogonNameEntity.ActiveDate <= DateTime.Now
 && (personLogonNameEntity.ExpirationDate == null
 || personLogonNameEntity.ExpirationDate > DateTime.Now))
 .OrderByDescending(personLogonNameEntity => personLogonNameEntity.ActiveDate)
 .FirstOrDefault());

This resulted in the generation of an OUTER APPLY, which gave me some NULL logon names (not all Persons have logon names in my system). What I needed was a CROSS APPLY, and the only way I could get LINQ to Entities generate a CROSS APPLY operator was to add an additional WHERE clause, like this:

var query = entities.People
 .Where(personEntity => personEntity.IsActive)
 .Select(personEntity => personEntity.PeopleToLogonNames
 .Where(personLogonNameEntity =>
 personLogonNameEntity.ActiveDate <= DateTime.Now
 && (personLogonNameEntity.ExpirationDate == null
 || personLogonNameEntity.ExpirationDate > DateTime.Now))
 .OrderByDescending(personLogonNameEntity => personLogonNameEntity.ActiveDate)
 .FirstOrDefault())
 .Where(entity => entity.LogonName != null);

This may seem strange, but the Entity Framework is actually intelligent enough to know that the best way to eliminate NULLs from your results is to use a CROSS APPLY in lieu of an OUTER APPLY. EF would still render the WHERE clause in the generated SQL, but it is ignored by the query processor when the query is compiled.

Next I needed to include data from the User table, which presented its own problems. As previously mentioned, there is no foreign key between the User and PersonLogonName tables, which precludes the use of the Include method or any Navigation Properties the Entity Framework automatically generates. Also, the Entity Framework doesn’t support the T-SQL FULL OUTER JOIN operator (which is understandable, since not all database systems include support for this). So I decided to use a UNION operator, like this:

var query = entities.People
 .Where(personEntity => personEntity.IsActive)
 .Select(personEntity => personEntity.PeopleToLogonNames
 .Where(personLogonNameEntity =>
 personLogonNameEntity.ActiveDate <= DateTime.Now
 && (personLogonNameEntity.ExpirationDate == null
 || personLogonNameEntity.ExpirationDate > DateTime.Now))
 .OrderByDescending(personLogonNameEntity => personLogonNameEntity.ActiveDate)
 .FirstOrDefault())
 .Where(entity => entity.LogonName != null)
 .Select(entity => new { entity.LogonName })
 .Union(entities.Users
 .Select(userEntity => new { userEntity.LogonName }))
 .OrderBy(entity => entity.LogonName);

Notice the use of Select projections to limit the column lists in both result sets and ensure the lists are identical. Failing to do this will result in compilation errors. This worked well, but caused logon names marked as expired in the PersonLogonName table to show when they also existed in the User table. I needed a way to filter them out, and I figured a LEFT OUTER JOIN on the second SELECT of my UNION was the best way to accomplish this. So I was looking to generate SQL similar to the following:

SELECT PersonLogonName
FROM Person
CROSS APPLY (SELECT TOP (1) PersonLogonName
 FROM PersonLogonName
 WHERE Person.PersonId = PersonLogonName.PersonId
 AND PersonLogonName.ActiveDate <= SysDateTime()
 AND (PersonLogonName.ExpirationDate IS NULL
 OR PersonLogonName.ExpirationDate > SysDateTime())
 ORDER BY PersonLogonName.ActiveDate DESC) AS LatestLogonName
WHERE Person.IsActive = 1
UNION
SELECT [User].UserLogonName
FROM [User] LEFT OUTER JOIN PersonLogonName
 ON [User].UserLogonName = PersonLogonName.PersonLogonName
WHERE PersonLogonName.ExpirationDate IS NULL
 OR PersonLogonName.ExpirationDate > SysDateTime()
ORDER BY PersonLogonName

Since there were no Navigation Properties available, I had no choice but to use the Join method in LINQ to Entities. I originally tried the following:

var query = entities.People
 .Where(personEntity => personEntity.IsActive)
 .Select(personEntity => personEntity.PeopleToLogonNames
 .Where(personLogonNameEntity =>
 personLogonNameEntity.ActiveDate <= DateTime.Now
 && (personLogonNameEntity.ExpirationDate == null
 || personLogonNameEntity.ExpirationDate > DateTime.Now))
 .OrderByDescending(personLogonNameEntity => personLogonNameEntity.ActiveDate)
 .FirstOrDefault())
 .Where(entity => entity.LogonName != null)
 .Select(entity => new { entity.LogonName })
 .Union(entities.Users
 .Join(entities.PersonLogonNames,
 outerEntity => outerEntity.LogonName,
 innerEntity => innerEntity.LogonName,
 (userEntity, innerEntity) => new
 {
 LogonName = userEntity.LogonName,
 ExpirationDate = innerEntity.ExpirationDate
 })
 .Where(entity => entity.ExpirationDate == null
 || entity.ExpirationDate > DateTime.Now)
 .Select(entity => new { entity.LogonName }))
 .OrderBy(entity => entity.LogonName);

Notice the addition of the Join method. The table being joined and both the operands to be used in the generated ON clause are represented in the first three parameters. The fourth parameter specifies the structure of the result set, so this method also performs a projection. Looks perfect, but the SQL generated would always be an INNER JOIN, not a LEFT OUTER JOIN.

The Final Solution

I tried modifying the above code with many different combinations of DefaultIfEmpty method calls and NULL comparisons, but could not persuade the Entity Framework to generate a LEFT OUTER JOIN. I researched this behavior, found others experiencing the same problem, and saw that the GroupJoin method would produce the LEFT OUTER JOIN that I desired. Unfortunately, the purpose of the GroupJoin method is to create result sets with multiple child records in the form of a parent-to-children tree, so I needed to flatten this resulting set with a call to the SelectMany method. Here is the final code:

var query = entities.People
 .Where(personEntity => personEntity.IsActive)
 .Select(personEntity => personEntity.PeopleToLogonNames
 .Where(personLogonNameEntity =>
 personLogonNameEntity.ActiveDate <= DateTime.Now
 && (personLogonNameEntity.ExpirationDate == null
 || personLogonNameEntity.ExpirationDate > DateTime.Now))
 .OrderByDescending(personLogonNameEntity => personLogonNameEntity.ActiveDate)
 .FirstOrDefault())
 .Where(entity => entity.LogonName != null)
 .Select(entity => new { entity.LogonName })
 .Union(entities.Users
 .GroupJoin(entities.PersonLogonNames,
 outerEntity => outerEntity.LogonName,
 innerEntity => innerEntity.LogonName,
 (innerEntity, outerEntities) => new
 {
 LogonName = innerEntity.LogonName,
 PersonLogonNames = outerEntities
 })
 .SelectMany(personLogonNameEntities =>
 personLogonNameEntities.PersonLogonNames
 .DefaultIfEmpty(),
 (combinedUserEntity, personLogonNameEntity) => new
 {
 combinedUserEntity.LogonName,
 personLogonNameEntity.ExpirationDate
 })
 .Where(combinedUserEntity =>
 combinedUserEntity.ExpirationDate == null
 || combinedUserEntity.ExpirationDate > DateTime.Now)
 .Select(combinedUserEntity => new { combinedUserEntity.LogonName }))
 .OrderBy(entity => entity.LogonName);

I admit, that looks like a lot of complicated code for something so simple, but the SQL it generates is exactly what I originally intended and executes very quickly. Here is the generated SQL:

SELECT
 [Distinct1].[C1] AS [C1],
 [Distinct1].[C2] AS [C2]
FROM (SELECT DISTINCT
 [UnionAll1].[C1] AS [C1],
 [UnionAll1].[PersonLogonName] AS [C2]
 FROM (SELECT
 1 AS [C1],
 [Limit1].[PersonLogonName] AS [PersonLogonName]
 FROM (SELECT
 [Extent1].[PersonId] AS [PersonId]
 FROM [dbo].[Person] AS [Extent1]
 WHERE [Extent1].[IsActive] = 1 ) AS [Filter1]
 CROSS APPLY (SELECT TOP (1)
 [Project1].[PersonLogonName] AS [PersonLogonName]
 FROM (SELECT
 [Extent2].[PersonLogonName] AS [PersonLogonName],
 [Extent2].[ActiveDate] AS [ActiveDate]
 FROM [dbo].[PersonLogonName] AS [Extent2]
 WHERE ([Filter1].[PersonId] = [Extent2].[PersonId])
 AND ([Extent2].[ActiveDate] <= SysDateTime())
 AND (([Extent2].[ExpirationDate] IS NULL)
 OR ([Extent2].[ExpirationDate] > SysDateTime()))
 ) AS [Project1]
 ORDER BY [Project1].[ActiveDate] DESC) AS [Limit1]
 WHERE [Limit1].[PersonLogonName] IS NOT NULL
 UNION ALL
 SELECT
 1 AS [C1],
 [Extent3].[UserLogonName] AS [UserLogonName]
 FROM [dbo].[User] AS [Extent3]
 LEFT OUTER JOIN [dbo].[PersonLogonName] AS [Extent4]
  ON [Extent3].[UserLogonName] = [Extent4].[PersonLogonName]
 WHERE ([Extent4].[ExpirationDate] IS NULL)
  OR ([Extent4].[ExpirationDate] > SysDateTime())) AS [UnionAll1]
 ) AS [Distinct1]
 ORDER BY [Distinct1].[C2] ASC

Points of Interest

Using a meticulous troubleshooting process that broke the problem down into smaller pieces and concentrated on each piece individually, I was able to finally solve the larger problem with unexpected code. It was an interesting trip with a satisfying result.

Entity Framework version 6.1.3 was used for this article.

History

  • 12 May, 2017: First version

LEAVE A REPLY