Full text search

Azure Cosmos DB supports full-text search. It enables efficient text search using techniques such as stemming, and it evaluates the relevance of documents to a given search query. It can be combined with vector search (hybrid search) to improve the accuracy of responses in some AI scenarios. EF Core lets you model the database with properties enabled for full-text search and use full-text search functions in queries that target Azure Cosmos DB.

Model configuration

A property can be configured inside OnModelCreating to use full-text search by enabling it for the property and defining a full-text index:

public class Blog
{
    ...

    public string Contents { get; set; }
}

public class BloggingContext : DbContext
{
    ...

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>(b =>
        {
            b.Property(x => x.Contents).EnableFullTextSearch();
            b.HasIndex(x => x.Contents).IsFullTextIndex();
        });
    }
}

Note

Configuring the index is not mandatory, but it is recommended as it greatly improves performance of full-text search queries.

Full-text search operations are language-specific and use American English (en-US) by default. You can customize the language for individual properties as part of the EnableFullTextSearch call:

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>(b =>
        {
            b.Property(x => x.Contents).EnableFullTextSearch();
            b.HasIndex(x => x.Contents).IsFullTextIndex();
            b.Property(x => x.ContentsGerman).EnableFullTextSearch("de-DE");
            b.HasIndex(x => x.ContentsGerman).IsFullTextIndex();
        });
    }

You can also set a default language for the container. Unless overridden in the EnableFullTextSearch call, all full-text properties in the container use that language:

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>(b =>
        {
            b.HasDefaultFullTextLanguage("de-DE");
            b.Property(x => x.ContentsEnglish).EnableFullTextSearch("en-US");
            b.HasIndex(x => x.ContentsEnglish).IsFullTextIndex();
            b.Property(x => x.ContentsGerman).EnableFullTextSearch();
            b.HasIndex(x => x.ContentsGerman).IsFullTextIndex();
            b.Property(x => x.TagsGerman).EnableFullTextSearch();
            b.HasIndex(x => x.TagsGerman).IsFullTextIndex();
        });
    }
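The language snippets above reference properties that the earlier Blog class does not declare. A minimal sketch of an entity shape they could map to follows; the property types and the Id key are assumptions made for illustration, not something the original specifies.

```csharp
// Hypothetical entity shape for the default-language configuration above.
public class Blog
{
    // Assumed key property; the original elides the rest of the class.
    public int Id { get; set; }

    // Configured explicitly with "en-US" in the snippet above.
    public string ContentsEnglish { get; set; }

    // No language argument, so these fall back to the container default ("de-DE").
    public string ContentsGerman { get; set; }
    public string TagsGerman { get; set; }
}
```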

Querying

As part of the full-text search feature, Azure Cosmos DB introduced several built-in functions for efficiently querying the content of properties enabled for full-text search: FullTextContains, FullTextContainsAll, and FullTextContainsAny, which look for a specific keyword or keywords, and FullTextScore, which returns a BM25 score based on the provided keywords.

Note

FullTextScore can only be used inside OrderBy to rank the documents based on the score.

EF Core exposes these functions as part of EF.Functions so they can be used in queries:

var cosmosBlogs = await context.Blogs.Where(x => EF.Functions.FullTextContainsAll(x.Contents, "database", "cosmos")).ToListAsync();

var keywords = new string[] { "AI", "agent", "breakthrough" };
var mostInteresting = await context.Blogs.OrderBy(x => EF.Functions.FullTextScore(x.Contents, keywords)).Take(5).ToListAsync();
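The remaining functions follow the same pattern. The sketch below is illustrative rather than definitive: it relies on the Blog entity and BloggingContext configured earlier, assumes that FullTextContains takes a single keyword and that FullTextContainsAny accepts several (mirroring FullTextContainsAll above), and wraps the queries in a helper class whose name and methods are hypothetical. It needs a configured Azure Cosmos DB container to run.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class FullTextQuerySamples // hypothetical helper, not part of EF Core
{
    // Blogs whose Contents contain the keyword "cosmos".
    public static Task<List<Blog>> ContainingCosmosAsync(BloggingContext context)
        => context.Blogs
            .Where(x => EF.Functions.FullTextContains(x.Contents, "cosmos"))
            .ToListAsync();

    // Blogs mentioning at least one keyword, ranked by FullTextScore for the same keywords.
    public static Task<List<Blog>> TopMatchesAsync(BloggingContext context)
        => context.Blogs
            .Where(x => EF.Functions.FullTextContainsAny(x.Contents, "database", "cosmos"))
            .OrderBy(x => EF.Functions.FullTextScore(x.Contents, "database", "cosmos"))
            .Take(5)
            .ToListAsync();
}
```

Filtering with Where and ranking with OrderBy are kept as separate steps here, in line with the note that FullTextScore is only usable inside OrderBy.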

Full-text search can be used together with vector search in the same query (hybrid search) by combining the results of the FullTextScore and VectorDistance functions. This is done with the Rrf (Reciprocal Rank Fusion) function, which EF Core also exposes through EF.Functions:

public class Blog
{
    ...

    public float[] Vector { get; set; }
    public string Contents { get; set; }
}

public class BloggingContext : DbContext
{
    ...

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>(b =>
        {
            b.Property(x => x.Contents).EnableFullTextSearch();
            b.HasIndex(x => x.Contents).IsFullTextIndex();

            b.Property(x => x.Vector).IsVectorProperty(DistanceFunction.Cosine, dimensions: 1536);
            b.HasIndex(x => x.Vector).IsVectorIndex(VectorIndexType.Flat);
        });
    }
}

float[] myVector = /* generate vector data from text, image, etc. */;
var hybrid = await context.Blogs.OrderBy(x => EF.Functions.Rrf(
        EF.Functions.FullTextScore(x.Contents, "database"),
        EF.Functions.VectorDistance(x.Vector, myVector)))
    .Take(10)
    .ToListAsync();
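To build intuition for what Reciprocal Rank Fusion does with the individual rankings, here is a small in-memory sketch of the commonly cited formulation, in which each document scores the sum of 1 / (k + rank) over the lists it appears in. It is not the Azure Cosmos DB implementation: the RrfSketch class, the Fuse method, and the constant k = 60 are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfSketch // illustrative only; not the Cosmos DB implementation
{
    // Fuses several ranked lists of document ids into one list, best first.
    // Each document scores sum(1 / (k + rank)) over the lists it appears in,
    // with rank starting at 1. k = 60 is an assumed, conventional default.
    public static List<string> Fuse(IEnumerable<IReadOnlyList<string>> rankings, double k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
        {
            for (var i = 0; i < ranking.Count; i++)
            {
                scores.TryGetValue(ranking[i], out var current); // 0 when absent
                scores[ranking[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal) // deterministic tie-break
            .Select(p => p.Key)
            .ToList();
    }
}
```

For example, fusing a text ranking of a, b, c with a vector ranking of b, c, a places b first: it sits near the top of both lists, whereas a is first in one list but last in the other.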

Tip

You can combine more than two scoring functions in a single Rrf call, or use only FullTextScore or only VectorDistance.
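As a sketch of the tip above, the following query fuses two full-text scores with one vector distance. It assumes the hybrid-search model shown earlier and that Rrf accepts a variable number of scoring arguments, as the tip implies; the HybridSearchSamples class and the SearchAsync method are hypothetical names, and the query needs a configured Azure Cosmos DB container to run.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class HybridSearchSamples // hypothetical helper, not part of EF Core
{
    // Ranks blogs by fusing two keyword scores and one vector similarity.
    public static Task<List<Blog>> SearchAsync(BloggingContext context, float[] queryVector)
        => context.Blogs
            .OrderBy(x => EF.Functions.Rrf(
                EF.Functions.FullTextScore(x.Contents, "database"),
                EF.Functions.FullTextScore(x.Contents, "cosmos"),
                EF.Functions.VectorDistance(x.Vector, queryVector)))
            .Take(10)
            .ToListAsync();
}
```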