Embedding & Vector DB

Embeddings and the vector database are the core components of semantic search: they convert text into numeric vectors so that similarity can be compared.

🎯 Purpose

  • Semantic search: search by meaning, not just keywords
  • Context understanding: capture the context of the question
  • Scalability: handle large volumes of data
  • Performance: fast search in vector space
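The text-to-vector idea can be illustrated with a deliberately tiny bag-of-words vectorizer — a toy stand-in for a real embedding model, not how a semantic model actually works:

```python
import math
from collections import Counter

def toy_embed(text, vocab):
    """Toy 'embedding': term-frequency vector over a fixed vocabulary.
    A real model (e.g. a sentence transformer) produces a dense semantic
    vector instead of sparse counts."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vocab = ["law", "contract", "penalty", "weather", "rain"]
doc = toy_embed("the contract specifies a penalty clause", vocab)
q1 = toy_embed("penalty in a contract", vocab)
q2 = toy_embed("rain and weather", vocab)

# The contract query scores closer to the document than the weather query.
print(cosine(q1, doc) > cosine(q2, doc))  # True
```

Once text lives in vector space, "find relevant documents" becomes "find nearby vectors", which is what the rest of this page is about.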

🧮 Embedding Models

Sentence Transformers

  • all-MiniLM-L6-v2: fast and effective for English
  • paraphrase-multilingual-MiniLM-L12-v2: multilingual support
  • Vietnamese-specific models: fine-tuned for Vietnamese

Transformer-based

  • BERT variants: RoBERTa, DistilBERT
  • Domain adaptation: fine-tuned for legal text
  • Multilingual support: XLM-RoBERTa

🗄️ Vector Databases

Pinecone

  • Managed service: Fully managed vector database
  • Scalability: Auto-scaling, high availability
  • Features: Metadata filtering, hybrid search

Weaviate

  • Open source: Self-hosted option
  • GraphQL API: Rich query capabilities
  • Modules: Text2Vec, QnA transformers

Alternatives

  • Qdrant: Rust-based, high performance
  • Milvus: Distributed vector database
  • FAISS: library for similarity search

🔄 Indexing Process

📊 Indexing Strategies

Flat Index

  • Simple: No approximation
  • Accurate: Exact nearest neighbors
  • Slow: O(n) search time
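A flat index is just exhaustive comparison; a minimal sketch in plain Python with cosine similarity:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flat_search(query, vectors, k=2):
    """Exact nearest neighbors: score every stored vector -> O(n) per query."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(flat_search([1.0, 0.05], vectors, k=2))  # → [0, 1]
```

This is what FAISS's `IndexFlat` does conceptually: no approximation, perfect recall, linear cost.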

Approximate Index

  • ANN: Approximate nearest neighbors
  • Fast: Sub-linear search time
  • Trade-off: Slight accuracy loss

Hierarchical Index

  • HNSW: Hierarchical navigable small world
  • IVF: Inverted file index
  • Hybrid: Combine multiple strategies
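An IVF-style approximate search can be sketched by partitioning vectors into clusters and probing only the closest cluster(s); the centroids here are hand-picked toys, where a real index would train them with k-means:

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)

# Toy "trained" centroids partitioning the space into two inverted lists.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.1, 0.2], [0.3, 0.1], [9.8, 10.1], [10.2, 9.9]]

# Build: assign each vector to its nearest centroid's inverted list.
lists = {0: [], 1: []}
for i, v in enumerate(vectors):
    c = min(range(len(centroids)), key=lambda j: dist(v, centroids[j]))
    lists[c].append(i)

def ivf_search(query, nprobe=1):
    """Scan only the nprobe nearest inverted lists -> sub-linear work,
    at the cost of possibly missing neighbors in unprobed lists."""
    order = sorted(range(len(centroids)), key=lambda j: dist(query, centroids[j]))
    candidates = [i for j in order[:nprobe] for i in lists[j]]
    return min(candidates, key=lambda i: dist(query, vectors[i]))

print(ivf_search([9.9, 10.0]))  # → 2
```

Raising `nprobe` trades speed back for recall — the same accuracy/latency dial that HNSW exposes through `ef`.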

🔧 Configuration

Embedding Parameters

  • Dimension: 384-768 (depending on the model)
  • Normalization: L2 normalization
  • Pooling: mean pooling for sentence embeddings
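Mean pooling and L2 normalization, as used for sentence embeddings, look roughly like this (toy 4-dimensional token vectors stand in for real 384-768-dimensional ones):

```python
import math

def mean_pool(token_vectors):
    """Average token-level vectors into a single sentence vector."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]

def l2_normalize(v):
    """Scale to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

tokens = [[1.0, 0.0, 2.0, 1.0],
          [3.0, 0.0, 0.0, 1.0],
          [2.0, 0.0, 1.0, 1.0]]
sent = l2_normalize(mean_pool(tokens))
print(round(sum(x * x for x in sent), 6))  # unit norm → 1.0
```

Normalizing at index time is what lets the database use the cheaper dot product in place of full cosine similarity.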

Index Parameters

  • M: HNSW parameter (neighbors)
  • efConstruction: Build time parameter
  • ef: Search time parameter

📈 Performance Optimization

Batch Processing

  • Parallel embedding: Process multiple chunks together
  • GPU acceleration: use CUDA for embedding
  • Caching: Cache frequent embeddings
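Caching plus batching can be sketched as follows; `embed_batch` is a hypothetical stand-in for a real model call (which is where batching pays off):

```python
cache = {}

def embed_batch(texts):
    """Hypothetical model call that embeds a whole batch in one pass.
    Here it just returns character-length features as a stand-in."""
    return [[float(len(t))] for t in texts]

def embed_cached(texts):
    """Send only cache misses to the model, in a single batch."""
    misses = [t for t in texts if t not in cache]
    if misses:
        for t, v in zip(misses, embed_batch(misses)):
            cache[t] = v
    return [cache[t] for t in texts]

embed_cached(["chunk a", "chunk b"])
vecs = embed_cached(["chunk a", "chunk c"])  # "chunk a" served from cache
print(len(cache))  # → 3
```

In a real pipeline the cache would be keyed by a content hash and persisted, so re-indexing unchanged documents costs nothing.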

Index Optimization

  • Incremental updates: Add vectors without rebuild
  • Filtering: Metadata-based filtering
  • Compression: quantization for storage
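Scalar quantization compresses float vectors to int8 at a small precision cost; a minimal symmetric-scale sketch:

```python
def quantize(v):
    """Map floats in [-max, max] to the int8 range [-127, 127]."""
    scale = max(abs(x) for x in v) / 127 or 1.0
    return [round(x / scale) for x in v], scale

def dequantize(q, scale):
    return [x * scale for x in q]

v = [0.12, -0.5, 0.33]
q, s = quantize(v)
restored = dequantize(q, s)
# 4x smaller storage (int8 vs float32), small reconstruction error:
print(all(abs(a - b) < 0.01 for a, b in zip(v, restored)))  # True
```

Production indexes usually go further (product quantization, per-block scales), but the storage/accuracy trade is the same idea.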

🔍 Query Processing

Similarity Metrics

  • Cosine similarity: standard for text embeddings
  • Dot product: alternative similarity measure
  • Euclidean distance: useful in some use cases
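For unit-length vectors the three measures are closely related — cosine equals the dot product, and squared Euclidean distance is 2 − 2·cos. A quick comparison:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [0.6, 0.8]   # already unit length
b = [1.0, 0.0]   # unit length
print(round(cosine(a, b), 6) == round(dot(a, b), 6))  # True for unit vectors
print(round(euclidean(a, b) ** 2, 6))                 # → 0.8, i.e. 2 - 2*cos
```

This is why vector databases ask which metric the index should use: with normalized embeddings the choice barely matters, but mixing metrics across index and query gives wrong rankings.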

Hybrid Queries

  • Text + Vector: combine keyword and semantic search
  • Metadata filtering: Filter by document type, date
  • Re-ranking: Post-process results

📊 Monitoring & Maintenance

Quality Metrics

  • Embedding quality: Cosine similarity distributions
  • Retrieval accuracy: Precision@K, Recall@K
  • Index health: Query latency, throughput
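Precision@K and Recall@K can be computed directly from retrieved vs. relevant document IDs:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return len(set(retrieved[:k]) & set(relevant)) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

retrieved = ["d1", "d4", "d2", "d7"]   # ranked output of the index
relevant = ["d1", "d2", "d3"]          # labeled ground truth
print(round(precision_at_k(retrieved, relevant, 3), 3))  # → 0.667
print(round(recall_at_k(retrieved, relevant, 3), 3))     # → 0.667
```

Tracking these over a fixed evaluation set is what catches silent regressions after a model swap or index rebuild.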

Maintenance Tasks

  • Index rebuild: Periodic optimization
  • Vector updates: Handle document changes
  • Backup: Vector data backup strategies

Embedding and vector indexing provide the foundation for intelligent search and accurate question answering.