Find unreferenced entities with Entity Framework Core

One of my projects stores meta data of files in the database. The information is held in a FileEntity. Any other entity can then have a relation to this FileEntity. The idea is, that files are uploaded first and are referenced afterwards. For example when a file for a photo album is uploaded, it will first create a FileEntity and then a PhotoAlbumEntity that contains a reference to the FileEntity. The downside is that unreferenced files are using up space. So I had to find a way to find unreferenced file entities with Entity Framework Core.

It's important to know that I didn't want to do anything manually. So using inverse navigation properties to find all the FileEntities was not an option. Also I didn't want to adjust code if a new entity with a relation to the FileEntity was added.

Eventually I came up with two solutions, both have pros and cons.

Using reflection

I have to admit that I'm not very familiar with reflection and this is probably the reason it gave me headaches to do this. But in the end I was able to get a working solution and I'm also a bit proud of it tbh.

public async Task DeleteAsync(DbContext dbContext, CancellationToken cancellationToken) { var query = dbContext.Files.AsQueryable(); foreach (var entityType in dbContext.Model.GetEntityTypes()) { foreach (var navigation in entityType.GetNavigations()) { if (navigation.ClrType == typeof(FileEntity)) { query = query.Where( GetNotAnyEqualExpression( GetDbSetQueryable(dbContext as DbContext, entityType.ClrType)!.Expression, // Use the expression of the db-set query entityType.ClrType, navigation.ForeignKey.Properties.Single().Name) ); } } } var fileEntities = await query.ToListAsync(cancellationToken); dbContext.RemoveRange(fileEntities); dbContext.SaveChangeAsync(cancellationToken); // TODO: Delete files from file-storage or similar... } private IQueryable GetDbSetQueryable(DbContext dbContext, Type type) { var setMethod = typeof(DbContext) .GetMethods() .Single(p => p.Name == nameof(DbContext.Set) && p.ContainsGenericParameters && !p.GetParameters().Any()) .MakeGenericMethod(type); var queryable = setMethod.Invoke(dbContext, null) as IQueryable; return queryable!; } private Expression<Func<FileEntity, bool>> GetNotAnyEqualExpression(Expression dbSetExpression, Type type, string foreignKeyPropertyName) { var file = Expression.Parameter(typeof(FileEntity), "file"); var relationToFile = Expression.Parameter(type, "relationToFile"); var equal = Expression.Equal( Expression.Property(relationToFile, foreignKeyPropertyName), Expression.PropertyOrField(file, nameof(FileEntity.Id)) ); var equalLambda = Expression.Lambda(equal, relationToFile); var notAnyEqual = Expression.Not(Expression.Call(typeof(Enumerable), "Any", new[] { type }, dbSetExpression, equalLambda)); return Expression.Lambda<Func<FileEntity, bool>>(notAnyEqual, file); }

The DeleteAsync method starts by building a query on the file table and then loops over all entity type definitions of the DbContext. Then checks if there is a navigation property related to a FileEntity. If that's the case it will add an additional where condition to the current query. This is built with reflection. It will make a subquery to table that contains the foreign key with the condition that it must not find any entry. This will be done for every other foreign key of an entity that is related to the FileEntity. At the end, the query will fetch all the FileEntities that are not referenced by any other entity. We are then able to delete these entities our file storage.

The pros are that this solution works without having to change anything to our existing constellation. On the other hand it might get slow to query all the tables over time, if the schema and the amount of data grows.

Using change tracking

Another way to do this was to use the change tracking system.

public class MyDbContext : DbContext { public UniversityContext(DbContextOptions<MyDbContext> options) : base(options) { SavingChanges += UniversityContext_SavingChanges; } private void MyDbContext_SavingChanges(object sender, SavingChangesEventArgs e) { foreach (EntityEntry entry in dbContext.ChangeTracker.Entries()) { foreach (var navigation in entry.Navigations) { if (navigation.Metadata.ClrType == typeof(FileEntity)) { if (navigation.Metadata is RuntimeNavigation runtimeNavigation) { var foreignKeyProperty = runtimeNavigation.ForeignKey.Properties.First(); var originalValue = entry.OriginalValues[foreignKeyProperty] as Guid?; var currentValue = entry.CurrentValues[foreignKeyProperty] as Guid?; var modified = originalValue != currentValue; // Was file reference removed? if ((modified || entry.State == EntityState.Deleted) && originalValue != default) { var removedFileEntity = await dbContext.Set<FileEntity>().FindAsync([originalValue], cancellationToken: cancellationToken); if (removedFileEntity != null) { removedFileEntity.References--; } } // Was file reference added? if ((modified || entry.State == EntityState.Added) && currentValue != default) { var addedFileEntity = await dbContext.Set<FileEntity>().FindAsync([currentValue], cancellationToken: cancellationToken); if (addedFileEntity != null) { addedFileEntity.References++; } } } } } } } } public async Task DeleteAsync(DbContext dbContext, CancellationToken cancellationToken) { var fileEntities = await dbContext.Files.Where(file => file.References <= 0).ToListAsync(cancellationToken); dbContext.RemoveRange(fileEntities); dbContext.SaveChangeAsync(cancellationToken); // TODO: Delete files from file-storage or similar... }

We loop over all entities before they are saved and find out if either a reference to a FileEntity was added or removed. Then we are incrementing or decrementing the count of references on the FileEntity.

If the amount is zero then we know that the entity is not being referenced and is free to be deleted. This information can be used in a batchjob or in a background task to frequently delete unused files.

This approach is much better regarding scalability and not using as many resources as the first approach. But it requires another property to track the reference amount, is not atomic and is only reliable if the database is only manipulated by the application.