Azure Cosmos DB now supports full-text search. It enables efficient text searches using techniques such as stemming, and it evaluates the relevance of documents to a given search query. It can be combined with vector search (i.e. hybrid search) to improve the accuracy of responses in some AI scenarios. EF Core lets you model the database with properties enabled for full-text search and use full-text search functions inside queries targeting Azure Cosmos DB.
Model configuration
A property can be configured inside OnModelCreating to use full-text search by enabling it for the property and defining a full-text index:
public class Blog
{
    ...
    public string Contents { get; set; }
}

public class BloggingContext : DbContext
{
    ...
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>(b =>
        {
            b.Property(x => x.Contents).EnableFullTextSearch();
            b.HasIndex(x => x.Contents).IsFullTextIndex();
        });
    }
}
Note
Configuring the index is not mandatory, but it is recommended as it greatly improves performance of full-text search queries.
Full-text search operations are language-specific, using American English (en-US) by default. You can customize the language for individual properties as part of the EnableFullTextSearch call:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Blog>(b =>
    {
        b.Property(x => x.Contents).EnableFullTextSearch();
        b.HasIndex(x => x.Contents).IsFullTextIndex();
        b.Property(x => x.ContentsGerman).EnableFullTextSearch("de-DE");
        b.HasIndex(x => x.ContentsGerman).IsFullTextIndex();
    });
}
You can also set a default language for the container. Unless overridden in the EnableFullTextSearch method, all full-text properties inside the container will use that language.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Blog>(b =>
    {
        b.HasDefaultFullTextLanguage("de-DE");
        b.Property(x => x.ContentsEnglish).EnableFullTextSearch("en-US");
        b.HasIndex(x => x.ContentsEnglish).IsFullTextIndex();
        b.Property(x => x.ContentsGerman).EnableFullTextSearch();
        b.HasIndex(x => x.ContentsGerman).IsFullTextIndex();
        b.Property(x => x.TagsGerman).EnableFullTextSearch();
        b.HasIndex(x => x.TagsGerman).IsFullTextIndex();
    });
}
Querying
As part of the full-text search feature, Azure Cosmos DB introduced several built-in functions for efficiently querying content in properties enabled for full-text search. These functions are FullTextContains, FullTextContainsAll, and FullTextContainsAny, which look for a specific keyword or keywords, and FullTextScore, which returns a BM25 score based on the provided keywords.
Note
FullTextScore can only be used inside OrderBy to rank the documents based on the score.
EF Core exposes these functions as part of EF.Functions so they can be used in queries:
var cosmosBlogs = await context.Blogs
    .Where(x => EF.Functions.FullTextContainsAll(x.Contents, "database", "cosmos"))
    .ToListAsync();

var keywords = new string[] { "AI", "agent", "breakthrough" };
var mostInteresting = await context.Blogs
    .OrderBy(x => EF.Functions.FullTextScore(x.Contents, keywords))
    .Take(5)
    .ToListAsync();
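
The other two keyword functions follow the same pattern as FullTextContainsAll. The sketch below is illustrative rather than definitive. It assumes the Blog and BloggingContext types from the model configuration section, a Blogs DbSet on the context (not shown in the original), and that FullTextContains takes a single keyword while FullTextContainsAny takes several. The BlogSearch class and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical helper class. Blog and BloggingContext come from the model
// configuration section; context.Blogs is assumed to be a DbSet<Blog>.
public static class BlogSearch
{
    // Blogs whose Contents contain the single keyword "cosmos".
    public static Task<List<Blog>> MentioningCosmosAsync(BloggingContext context) =>
        context.Blogs
            .Where(x => EF.Functions.FullTextContains(x.Contents, "cosmos"))
            .ToListAsync();

    // Blogs whose Contents contain at least one of the given keywords.
    public static Task<List<Blog>> MentioningAnyAsync(BloggingContext context) =>
        context.Blogs
            .Where(x => EF.Functions.FullTextContainsAny(x.Contents, "database", "cosmos"))
            .ToListAsync();
}
```

Both methods only filter. To rank the matches by relevance as well, an OrderBy with FullTextScore can be chained after the Where call.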
Hybrid search
Full-text search can be used with vector search in the same query (i.e. hybrid search) by combining the results of the FullTextScore and VectorDistance functions. This can be done using the RRF (Reciprocal Rank Fusion) function, which EF Core also provides inside EF.Functions:
public class Blog
{
    ...
    public float[] Vector { get; set; }
    public string Contents { get; set; }
}

public class BloggingContext : DbContext
{
    ...
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>(b =>
        {
            b.Property(x => x.Contents).EnableFullTextSearch();
            b.HasIndex(x => x.Contents).IsFullTextIndex();
            b.Property(x => x.Vector).IsVectorProperty(DistanceFunction.Cosine, dimensions: 1536);
            b.HasIndex(x => x.Vector).IsVectorIndex(VectorIndexType.Flat);
        });
    }
}
float[] myVector = /* generate vector data from text, image, etc. */
var hybrid = await context.Blogs
    .OrderBy(x => EF.Functions.Rrf(
        EF.Functions.FullTextScore(x.Contents, "database"),
        EF.Functions.VectorDistance(x.Vector, myVector)))
    .Take(10)
    .ToListAsync();
Tip
You can combine more than two scoring functions inside an Rrf call, and the call can also consist of only FullTextScore or only VectorDistance functions.
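
To make the ranking idea concrete, here is a minimal, self-contained sketch of reciprocal rank fusion as it is commonly described: each document receives the sum of 1 / (k + rank) over the individual rankings, and documents are then ordered by that sum. It is an illustration only. Azure Cosmos DB computes RRF on the server, and the RrfSketch class, the Fuse method, and the constant k = 60 are assumptions made for this example, not part of EF Core or a statement of how Cosmos DB implements the function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of reciprocal rank fusion; not an EF Core API.
public static class RrfSketch
{
    // Each ranking lists document ids from best to worst.
    // k = 60 is a commonly cited smoothing constant, assumed here.
    public static List<KeyValuePair<string, double>> Fuse(
        IEnumerable<IReadOnlyList<string>> rankings, double k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
        {
            for (var i = 0; i < ranking.Count; i++)
            {
                var id = ranking[i];
                // current is 0 when the id has not been seen yet.
                scores.TryGetValue(id, out var current);
                // Ranks are 1-based, so position i contributes 1 / (k + i + 1).
                scores[id] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties are broken by id for determinism.
        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

For the two rankings [a, b, c] and [b, c, a], Fuse orders the documents b, a, c: b scores 1/62 + 1/61, a scores 1/61 + 1/63, and c scores 1/63 + 1/62. A document that ranks well in both lists therefore rises above one that is first in a single list and last in the other.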