Vertex AI Matching Engine - Vector Similarity Search

  • Published: 25 Jul 2024
  • Putting a similarity index into production at scale is a pretty hard challenge. It requires a whole stack of infrastructure working closely together: you need to handle large amounts of data at low latency, and it introduces you to topics like sharding, hashing, trees, load balancing, efficient data transfer, data replication, and much more.
    Check out the notebook and the article on how to get started with Google Cloud Vertex AI Matching Engine
    📓 Notebook: colab.research.google.com/dri...
    📖 Article: / all-you-need-to-know-a...
    If you enjoyed this video, please subscribe to the channel ❤️
    🎉 Subscribe for Article and Video Updates!
    / subscribe
    / membership
    You can find me here:
    LinkedIn: / saschaheyer
    Twitter: / heyersascha
    If you or your company is looking for advice on the cloud or ML, check out the company I work for.
    www.doit.com/
    We offer consulting, workshops, and training at zero cost. Imagine an extension for your team without additional costs.
    #vertexai #googlecloud #machinelearning #mlengineer #doit
    ▬ My current recording equipment ▬▬▬▬▬▬▬▬
    ► Camera for recording and streaming in 4K amzn.to/3QQzwiN
    ► Lens with nice background blur amzn.to/3dVDAjb
    ► Connect the camera to PC 4K amzn.to/3ciYyrE
    ► Light amzn.to/3Rb065M
    ► Most flexible way to mount your camera + mic amzn.to/3TedZC5
    ► Microphone (I love it) amzn.to/3QV3mmb
    ► Audio Interface amzn.to/3CBxj5M
    Support my channel by buying through those Amazon links
    ▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    00:00 Introduction
    00:32 Statement
    00:47 Use Cases
    01:25 Embedding
    01:47 Input
    02:23 Types
    02:54 VPC
    04:05 Create Embeddings
    06:50 Setup
    07:00 VPC Setup
    08:39 Create Index
    11:58 Create Endpoint
    13:00 Deploy Index
    14:23 Update Index
    15:10 Scale Index
    16:46 Query
    22:23 Bye
  • Science

Comments • 73

  • @jobiquirobi123
    @jobiquirobi123 2 years ago +4

    Nice tutorial. Matching Engine is really promising, but it does require some setup. I will try to reproduce this tutorial and see what happens.

    • @ml-engineer
      @ml-engineer  2 years ago +2

      I see many customers moving to Matching Engine, and they're all happy with it. Only the time to update the index could be quicker, but I guess this depends on the requirements. Getting new embeddings into the index in real time is not possible, though there are workarounds.

    • @rubenszimbres
      @rubenszimbres 1 year ago

      @@ml-engineer Sascha, do you think online inference is possible by running Cloud Run / Cloud Functions on a Vertex AI endpoint, getting the embeddings, and then submitting them to the Matching Engine ANN? I was wondering if this may be a solution ....

    • @ml-engineer
      @ml-engineer  1 year ago +1

      Hi Rubens,
      yes, embedding models are usually hosted with Vertex AI Endpoints or with Cloud Run (if you don't need a GPU).
      After the inference you need to tell the Matching Engine index to take the new embeddings/vectors, stored on Cloud Storage, into account. And that's currently the bottleneck, as this indexing process takes quite a long time.
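      A minimal sketch, assuming the embeddings delta was already written to Cloud Storage, of triggering that batch update via the Python SDK (bucket path and index resource name are made up):

      from google.cloud import aiplatform

      index = aiplatform.MatchingEngineIndex(
          index_name="projects/my-project/locations/us-central1/indexes/123"  # hypothetical
      )

      # Points the index at the delta folder on GCS and kicks off the (slow)
      # batch re-indexing process mentioned above
      index.update_embeddings(contents_delta_uri="gs://my-bucket/embeddings-delta/")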

  • @user-fw7bj3jj7x
    @user-fw7bj3jj7x 1 year ago +1

    Amazing, thank you!
    I'm really keen to see that video about how to use Cloud Run to make the Vertex AI endpoint more accessible. Did you end up making that video?

    • @ml-engineer
      @ml-engineer  1 year ago

      Hi,
      Google released public endpoints, so it is no longer required to use a VPC network. Therefore you don't necessarily need a Cloud Run service in front.
      Here is the documentation for the public Matching Engine endpoint:
      cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-public
      In case you are still interested in the Cloud Run approach, here is a sample implementation of an image similarity matching solution:
      github.com/SaschaHeyer/image-similarity-search/tree/main/query-service
      The critical part is the cloudbuild.yaml, which contains the reference to the VPC network: github.com/SaschaHeyer/image-similarity-search/blob/main/query-service/cloudbuild.yaml
      Let me know if that helps
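      A minimal sketch, assuming the current Vertex AI SDK, of creating such a public endpoint (project and display name are made up):

      from google.cloud import aiplatform

      aiplatform.init(project="my-project", location="us-central1")  # hypothetical project

      # public_endpoint_enabled=True exposes the endpoint without VPC peering
      endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
          display_name="matching-engine-public",
          public_endpoint_enabled=True,
      )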

  • @LucasGomide
    @LucasGomide 1 year ago +1

    Great content. Can you tell me about some alternatives? I am studying options such as using pgvector with a model to generate the embeddings vs. Matching Engine.
    I would like to understand the pros/cons of those approaches.

    • @ml-engineer
      @ml-engineer  1 year ago +1

      Hi Lucas,
      Pinecone is also a highly recommended product. Or you can go open source with Faiss or Annoy, but this requires you to take care of the infrastructure yourself.
      If you want a managed similarity search, I recommend going with either Matching Engine or Pinecone.

  • @tyronehou3553
    @tyronehou3553 9 months ago +1

    Great tutorial! Can you update algorithm parameters like leafNodeEmbeddingCount and leafNodesToSearchPercent on the fly? I tried using the gcloud update index command, but nothing changes when I describe the index afterward, even when the operation is complete

    • @ml-engineer
      @ml-engineer  9 months ago

      Hi,
      no, they can only be set during index creation; it is not possible to update them. That's because an update would require a full index rebuild, which in the end is the same as creating a new index.
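      A minimal sketch, assuming the SDK's create_tree_ah_index signature, of fixing those parameters at creation time (all names and values are made up):

      from google.cloud import aiplatform

      index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
          display_name="my-index",                          # hypothetical
          contents_delta_uri="gs://my-bucket/embeddings/",  # hypothetical
          dimensions=768,
          approximate_neighbors_count=150,
          leaf_node_embedding_count=1000,    # fixed once the index is created
          leaf_nodes_to_search_percent=10,   # fixed once the index is created
      )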

  • @ramsure9246
    @ramsure9246 1 year ago +1

    Thanks for the tutorial. Is there any LangChain-compatible retriever for this Matching Engine index?

    • @ml-engineer
      @ml-engineer  1 year ago +1

      Yes, there is LangChain support for Matching Engine. The Google team implemented it a few weeks ago:
      github.com/hwchase17/langchain/pull/3104
      I'm currently writing an article on it that will be published in the next few days.

      from langchain.vectorstores.matching_engine import MatchingEngine

      vector_store = MatchingEngine.from_components(
          index_id=INDEX_NAME,
          region=MATCHING_ENGINE_REGION,
          embedding=embeddings_llm,
          project_id=PROJECT_ID,
          endpoint_id=ENDPOINT_NAME,
          gcs_bucket_name=DOCS_BUCKET,
      )

      relevant_documentation = vector_store.similarity_search(question, k=8)

  • @anjanak8303
    @anjanak8303 2 years ago +1

    Thank you for the tutorial. With the Avro format there is an allow and deny option that you can set for the inserted embeddings. There is little documentation on how to use this in a query. Could you help with this?

    • @ml-engineer
      @ml-engineer  2 years ago +1

      Hello Anjana,
      are you referring to the filtering functionality?
      cloud.google.com/vertex-ai/docs/matching-engine/filtering

    • @anjanak8303
      @anjanak8303 2 years ago

      @@ml-engineer Yes, the same. Could you tell me how to incorporate that into a query? I have an idea of how to insert it into the index, but it would be good if you could give some clarity there as well. Thanks for replying :)

    • @ml-engineer
      @ml-engineer  2 years ago +1

      Got it. Yeah, it's not well documented. But if you check the proto file you can get an understanding of how to use it when querying the Matching Engine.
      It's as simple as applying it to your query request:

      namespace = match_service_pb2.Namespace()
      namespace.name = 'color'
      namespace.allow_tokens.append('red')

      request = match_service_pb2.MatchRequest()
      request.deployed_index_id = DEPLOYED_INDEX_ID
      request.restricts.append(namespace)
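      For completeness, a sketch of what the datapoint side could look like in the JSON-lines index input, so the query filter above matches it (values are made up):

      import json

      # Hypothetical datapoint carrying an allow token in the 'color' namespace
      datapoint = {"id": "1", "embedding": [0.1, 0.2],
                   "restricts": [{"namespace": "color", "allow": ["red"]}]}
      print(json.dumps(datapoint))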

    • @anjanak8303
      @anjanak8303 2 years ago +1

      @@ml-engineer This worked!! I had not looked at the proto file in much detail, thank you so much 😃

    • @ml-engineer
      @ml-engineer  2 years ago +1

      @@anjanak8303 Perfect. I'll add this to the article; I hope it helps more people who have the same question.

  • @federicoph3407
    @federicoph3407 1 year ago +2

    Thank you for the tutorial!
    Is it possible to choose the machine type? I tried with 100 vectors (94 KB), and in the endpoint's basic info I see machine-type: n1-standard-16. The documentation suggests there is a default machine type based on shard size. It says: "When you create an index you must specify the shard size of the index", but there is no parameter that refers to shard size during index creation. It also says "you can determine what machine type to use when you deploy your index" but, same as before, there is no parameter that refers to machine type. I am a bit confused :/

    • @federicoph3407
      @federicoph3407 1 year ago

      documentation: matching-engine -> create-manage-index?hl=en#create-index

    • @ml-engineer
      @ml-engineer  1 year ago

      Hello Federicoph,
      that is indeed a good question that is not covered in the video nor the article =).
      The machine type can be defined when deploying the index, as mentioned in the documentation for deploy_index.
      But if you check the gcloud command, there is nothing documented:
      cloud.google.com/sdk/gcloud/reference/alpha/ai/index-endpoints/deploy-index
      So I always fall back to the actual implementation, and there you can see the deploy_index method does indeed accept a machine type.
      See here:
      github.com/googleapis/python-aiplatform/blob/90bb8ef3d675af62b7cc1f0d2fdf99b476e8dde5/google/cloud/aiplatform/matching_engine/matching_engine_index_endpoint.py#L542
      In your use case you can set it to the smallest machine. Also, because you only have 100 vectors, I recommend using the brute-force algorithm.
      Let me know if that helps.
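      A minimal sketch, assuming the deploy_index signature linked above (resource names and machine type are made up):

      from google.cloud import aiplatform

      index = aiplatform.MatchingEngineIndex(
          "projects/my-project/locations/us-central1/indexes/123"  # hypothetical
      )
      endpoint = aiplatform.MatchingEngineIndexEndpoint(
          "projects/my-project/locations/us-central1/indexEndpoints/456"  # hypothetical
      )

      endpoint.deploy_index(
          index=index,
          deployed_index_id="my_deployed_index",
          machine_type="e2-standard-2",  # smallest type; check the shard-size table in the docs
          min_replica_count=1,
          max_replica_count=1,
      )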

    • @ml-engineer
      @ml-engineer  1 year ago

      Quick appendix:
      this is reflected in the API documentation as well:
      cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexEndpoints/deployIndex
      See the request body:
      cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexEndpoints#DeployedIndex
      especially this part:
      cloud.google.com/vertex-ai/docs/reference/rest/v1/DedicatedResources

  • @alexchan5643
    @alexchan5643 11 months ago +3

    Thanks for the walkthrough. The documentation from GCP is quite messy.
    It doesn't seem to have great support for metadata filtering compared to other stores, only very basic operations. Any thoughts from your experience?

    • @ArmandoCuevas-sx5cf
      @ArmandoCuevas-sx5cf 11 months ago

      I would like to know the answer to this one too. I don't see support for metadata like Pinecone has.

    • @alexchan5643
      @alexchan5643 11 months ago

      ​@@ArmandoCuevas-sx5cf Based on my further investigation over the past week, the metadata filtering is restricted to string matching on key/value pairs (so no comparators on numeric values), and the idea is to pair the Matching Engine IDs with another key-value store like Bigtable where you could do further complex filtering. Comparing this setup to Pinecone or Qdrant, and considering the costs, I don't think I would use Matching Engine.

    • @ml-engineer
      @ml-engineer  11 months ago +1

      Hi Alex,
      hi Armando,
      Matching Engine, as Alex already said, supports string matching on metadata:
      cloud.google.com/vertex-ai/docs/matching-engine/filtering
      Pinecone is indeed more flexible on this point.

    • @ArmandoCuevas-sx5cf
      @ArmandoCuevas-sx5cf 11 months ago +1

      @@alexchan5643 Thanks a lot, that's helpful. And you're right, having metadata filtering available is a big advantage for Pinecone.

  • @MOHAMMADAUSAF
    @MOHAMMADAUSAF 5 months ago +1

    Hey, awesome starter. Just a question: given I have an index created from a bucket, if I were to add new files to the same bucket, will the index reflect the new data files, either by itself or by triggering something? Simply put, how can I add new data from a bucket to an existing index without rebuilding the entire index, something equivalent to Pinecone's or Weaviate's upsert functionality? The docs aren't helping me here.

    • @ml-engineer
      @ml-engineer  4 months ago

      Hi Mohammad,
      I recommend using the streaming capabilities of Vertex AI Vector Search / Matching Engine. This way you can simply send new data to the vector database via the SDK.
      Check out my sample repo to get started:
      github.com/SaschaHeyer/Real-Time-Deep-Learning-Vector-Similarity-Search
      It's the same process as Pinecone's upsert.
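      A minimal sketch, assuming the SDK's upsert_datapoints method and an index created with streaming updates enabled (resource name and vector are made up):

      from google.cloud import aiplatform
      from google.cloud.aiplatform_v1.types import IndexDatapoint

      index = aiplatform.MatchingEngineIndex(
          index_name="projects/my-project/locations/us-central1/indexes/123"  # hypothetical
      )

      # Adds or overwrites a vector without waiting for a full batch re-index
      index.upsert_datapoints(
          datapoints=[IndexDatapoint(datapoint_id="doc-42", feature_vector=[0.1, 0.2, 0.3])]
      )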

  • @nooralsmadi5017
    @nooralsmadi5017 1 year ago +1

    Hi,
    how can I make it work from outside the network? I mean, send a request and get a response from outside the network?

    • @ml-engineer
      @ml-engineer  1 year ago +2

      To send your request to the Matching Engine you need to be "inside" the network. This can be complicated if you want to integrate it into a service that is running outside of that network.
      There is one simple approach that I really like: you can implement a Cloud Run service that is part of the VPC network and takes your requests. This Cloud Run service can also be reached from outside the network.
      I have implemented exactly that in one of my other articles:
      medium.com/google-cloud/recommendation-systems-with-deep-learning-69e5c1772571
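      A minimal sketch of such a proxy, my own illustration rather than the article's code (resource names are made up):

      from flask import Flask, jsonify, request
      from google.cloud import aiplatform

      app = Flask(__name__)

      # Reachable only because this Cloud Run service is attached to the VPC
      endpoint = aiplatform.MatchingEngineIndexEndpoint(
          "projects/my-project/locations/us-central1/indexEndpoints/123"  # hypothetical
      )

      @app.route("/query", methods=["POST"])
      def query():
          body = request.get_json()
          # match() queries the private endpoint; one neighbor list per query vector
          matches = endpoint.match(
              deployed_index_id="my_deployed_index",  # hypothetical
              queries=[body["embedding"]],
              num_neighbors=body.get("num_neighbors", 10),
          )
          return jsonify([{"id": n.id, "distance": n.distance} for n in matches[0]])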

  • @akarshjainable
    @akarshjainable 1 year ago +1

    Where did you mention the schema of the data file (the one with the input embedding vectors)?

    • @ml-engineer
      @ml-engineer  1 year ago

      What do you mean by schema? The Matching Engine does not need a schema, as we just provide the embeddings.
      Can you rephrase 🙂 in case I misunderstood your question.

    • @akarshjainable
      @akarshjainable 1 year ago +1

      @@ml-engineer Aah, got it. So the embedding input vector file has to be in the format {"id": "string", "embedding": [vector]}

    • @ml-engineer
      @ml-engineer  1 year ago

      are you referring to this section of the video?
      ruclips.net/video/KMTApM5ajAw/видео.html

    • @akarshjainable
      @akarshjainable 1 year ago

      @@ml-engineer yes precisely.

    • @ml-engineer
      @ml-engineer  1 year ago

      Yes, exactly. Alternative file formats are CSV or Avro.
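      A minimal sketch of producing that JSON-lines format (file name and values are made up):

      import json

      records = [
          {"id": "doc-1", "embedding": [0.002792, 0.000492]},
          {"id": "doc-2", "embedding": [0.013861, 0.009422]},
      ]

      # One JSON object per line; upload the file to the GCS folder the index reads from
      with open("embeddings_0001.json", "w") as f:
          for record in records:
              f.write(json.dumps(record) + "\n")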

  • @kadapa-rl6jg
    @kadapa-rl6jg 1 year ago +1

    Hi,
    can you please help me understand how to orchestrate Vertex AI through Cloud Composer?

    • @ml-engineer
      @ml-engineer  1 year ago +1

      Hi,
      I have written a comparison article between Cloud Composer and Vertex AI Pipelines for orchestrating ML pipelines:
      medium.com/google-cloud/vertex-ai-pipelines-vs-cloud-composer-for-orchestration-4bba129759de
      In general, if you want to use Vertex AI's capabilities as part of Cloud Composer, you can simply use the Vertex AI SDK inside your Composer tasks.
      Though I would highly recommend switching to Vertex AI Pipelines:
      ruclips.net/video/gtVHw5YCRhE/видео.html

    • @kadapa-rl6jg
      @kadapa-rl6jg 1 year ago

      @@ml-engineer My requirement is to orchestrate a Vertex AI pipeline through Cloud Composer via Terraform code.

  • @akarshjainable
    @akarshjainable 1 year ago +1

    Can I do a batch prediction on the index? If yes, do I need a VPC network for that?

    • @ml-engineer
      @ml-engineer  1 year ago

      You need a VPC network; this is a requirement for running queries against the index.
      Batch prediction over the complete index is not possible. This is due to the nature of the index: you only get the k nearest neighbors.

    • @akarshjainable
      @akarshjainable 1 year ago +1

      Probably getting a bit greedy here, but do you have plans to upload a tutorial on two-tower?

    • @ml-engineer
      @ml-engineer  1 year ago

      No worries, I love all the comments here on YouTube.
      Yes, I'll release an article next week.
      It's a deep dive on how to use the two-tower algorithm + Matching Engine + Vertex AI Pipelines to build a deep learning recommendation engine.

    • @akarshjainable
      @akarshjainable 1 year ago +1

      @@ml-engineer Thanks a ton

    • @ml-engineer
      @ml-engineer  1 year ago

      The article is published:
      medium.com/google-cloud/recommendation-systems-with-deep-learning-69e5c1772571

  • @elijahdecalmer613
    @elijahdecalmer613 1 year ago +1

    you are a legend

    • @ml-engineer
      @ml-engineer  1 year ago

      ¯\_(ツ)_/¯

    • @elijahdecalmer613
      @elijahdecalmer613 1 year ago

      Excuse me, you briefly mentioned that there are workarounds to simulate real-time indexing. Could you explain the options for this, or point me to some docs? Beginner trying to work it out for a project :)

    • @ml-engineer
      @ml-engineer  1 year ago +1

      The feasibility of the solution depends on the number of new vectors you get between the indexing updates.
      You store the vectors that need to be indexed in the next index update round in Memorystore for fast millisecond access.
      Then you build, for example, a Cloud Run application that takes the vectors from Memorystore and calculates the distances yourself (it's just simple math). The same Cloud Run application also calls the Matching Engine, and in the end you combine the results if the distance is in your desired range.
      In the long term I hope for quicker index updates using GPUs.
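      A minimal sketch of that merge step, my own illustration of the idea using cosine distance (all names are made up):

      import numpy as np

      def cosine_distance(a, b):
          a, b = np.asarray(a), np.asarray(b)
          return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      def merged_neighbors(query, fresh_vectors, ann_results, k=10):
          # fresh_vectors: {id: vector} pulled from Memorystore (not yet indexed)
          # ann_results:   [(id, distance)] returned by the Matching Engine
          fresh = [(vid, cosine_distance(query, vec)) for vid, vec in fresh_vectors.items()]
          # keep the k closest candidates across both sources
          return sorted(fresh + ann_results, key=lambda pair: pair[1])[:k]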

    • @ml-engineer
      @ml-engineer  1 year ago

      @@elijahdecalmer613 Google added streaming support, which makes it easier to get new vectors into the index.

    • @ahmedmansouri2054
      @ahmedmansouri2054 6 months ago +1

      @@ml-engineer If I want to update the index in real time, can I just add new files to the GCS folder where the vector data is stored, or do I have to add it programmatically?

  • @majidalikhani2765
    @majidalikhani2765 1 year ago +1

    Hey, what is the parameter that decides the number of neighbours returned? I tried changing num_neighbours to no avail; it only returns 10 neighbours.

    • @ml-engineer
      @ml-engineer  1 year ago +1

      Hi Majid,
      you can define the number of neighbors you want to retrieve when calling the matching endpoint:

      response = my_index_endpoint.match(
          deployed_index_id=DEPLOYED_INDEX_ID,
          queries=...,
          num_neighbors=NUM_NEIGHBOURS,
      )

    • @majidalikhani2765
      @majidalikhani2765 1 year ago

      @@ml-engineer But in this tutorial you don't query this way. Instead, match_service.proto is used, which has a field num_neighbours = 3, but it always returns 10 neighbours.

    • @ml-engineer
      @ml-engineer  1 year ago +1

      @@majidalikhani2765 Yes, Google changed the way to get matching results since I released the video. No need for complex .proto file handling anymore. Just use the SDK, the same way as when creating the index. Much easier.

    • @ml-engineer
      @ml-engineer  1 year ago +1

      I will add the new way to the notebook in the next few days and publish an additional video.

    • @majidalikhani2765
      @majidalikhani2765 1 year ago +1

      @@ml-engineer Google's documentation is very poor, smh. I got it working via the SDK, thanks.

  • @AyushMandloi
    @AyushMandloi 8 months ago +1

    What is the need for endpoints?
    When will you be uploading more videos?

    • @ml-engineer
      @ml-engineer  8 months ago

      Hi Ayush,
      what do you mean by your endpoint question?
      I'm recording 4 new videos about generative AI on Google Cloud at the moment; they will be released in the next weeks.

  • @niladrishekhardutt
    @niladrishekhardutt 1 year ago +1

    Great tutorial! How does the deny list work?
    Let's say I have a namespace fruit which will ONLY have deny-list tokens (no allow), such as "apple", "mango", etc. How do I filter out "mango" in the query (search all fruits except mango)?
    I have tried the following method, but it does not work as expected.

    JSON:
    {"id": "1", "embedding": [0.002792, 0.000492], "restricts": [{"namespace": "fruit", "deny": ["mango"]}]}

    Query:
    deny_namespace = match_service_pb2.Namespace()
    deny_namespace.name = "fruit"
    deny_namespace.deny_tokens.append("mango")
    request.restricts.append(deny_namespace)

    • @ml-engineer
      @ml-engineer  1 year ago

      Hello Niladri,
      thanks a lot.
      (Anjana in the comments had a similar question about allow tokens.)
      Your JSON and query are definitely correct; I don't see any issues here.
      Did you make sure to update the index after adding the restricts filter to the JSON?

    • @niladrishekhardutt
      @niladrishekhardutt 1 year ago

      @@ml-engineer Hey,
      thanks for the quick reply. Yes, I have completely overwritten the index twice now (just to be sure), but it still doesn't seem to work. Is there any requirement for the token to be on the allow list as well?

    • @ml-engineer
      @ml-engineer  1 year ago +1

      Deny alone, without allow, is possible.
      See the documentation:
      cloud.google.com/vertex-ai/docs/matching-engine/filtering#denylist

      {}                 // empty set matches everything
      {red}              // only a 'red' token
      {blue}             // only a 'blue' token
      {orange}           // only an 'orange' token
      {red, blue}        // multiple tokens
      {red, !blue}       // deny the 'blue' token
      {red, blue, !blue} // a weird edge-case
      {!blue}            // deny-only (similar to empty-set)

      See the following description:
      "When a query denylists a token, matches are excluded for any datapoint that has the denylisted token. If a query namespace has only denylisted tokens, all points not explicitly denylisted match, in exactly the same way that an empty namespace matches with all points."
      So the issue has to be somewhere else.

    • @niladrishekhardutt
      @niladrishekhardutt 1 year ago

      @@ml-engineer Unfortunately, this does not seem to be working :(
      I have looked at my JSON multiple times now and tried different variations, but it still fails. Do you have any ideas?

    • @federicoph3407
      @federicoph3407 1 year ago +1

      Hi @Niladri, did you solve your problem?
      If yes, can you explain how, please?
      If not, I have the same problem with the allow-list tokens; I opened an issue on GitHub and on googlecloudcommunity.
      Thank you in advance!
      @ML Engineer