Reranking & Filtering
Reranking improves the ordering of retrieval results by re-scoring how relevant each result is to the query.
🎯 Purpose
- Improve ranking: Raise the quality of the result ordering
- Query relevance: Match results more closely to the intent behind the query
- Diversity: Reduce redundancy and increase variety among results
- Precision: Move the best results to the top
🔄 Process
🛠️ Techniques
Cross-Encoder Reranking
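- Pairwise scoring: Score the query against each document, one (query, document) pair at a time
- High accuracy: Typically more accurate than a bi-encoder, because query and document are encoded jointly
- Computational cost: Expensive for large candidate sets, since every pair needs its own model pass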
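The pairwise scoring loop above can be sketched as follows. This is a minimal illustration, not a real cross-encoder: `toy_pair_score` is a hypothetical stand-in based on term overlap, and in practice `score_fn` would wrap a trained model that reads the query and the document together.

```python
from typing import Callable, List, Tuple


def toy_pair_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query terms in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float] = toy_pair_score,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair, then sort by score, highest first."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    # sorted() is stable, so tied documents keep their original retrieval order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    "contract law basics",
    "termination of a lease contract",
    "weather report for Hanoi",
]
ranked = rerank("lease contract termination", candidates)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

The cost noted above is visible in the structure: `score_fn` runs once per candidate, so latency grows linearly with the size of the candidate set.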
Learning to Rank
- Feature engineering: Query-doc features
- ML models: Gradient boosting, neural networks
- Training data: Human-labeled relevance
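To make the three ingredients concrete, here is a minimal pairwise learning-to-rank sketch. It uses a plain linear model with perceptron-style updates rather than gradient boosting or a neural network, and the two features (term overlap, freshness) and the labels are invented for illustration.

```python
from typing import List, Sequence


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(
    features: List[Sequence[float]],
    labels: List[int],
    epochs: int = 20,
    learning_rate: float = 0.1,
) -> List[float]:
    """Learn weights so that higher-labeled documents score above lower-labeled ones."""
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        for i, x_i in enumerate(features):
            for j, x_j in enumerate(features):
                if labels[i] <= labels[j]:
                    continue  # only pairs where document i should outrank document j
                diff = [a - b for a, b in zip(x_i, x_j)]
                if dot(weights, diff) <= 0:  # pair is currently mis-ordered
                    weights = [w + learning_rate * d for w, d in zip(weights, diff)]
    return weights


# Hypothetical query-document features: [term_overlap, freshness]
doc_features = [[0.9, 0.1], [0.5, 0.9], [0.1, 0.5]]
# Hypothetical human-labeled relevance grades (higher is more relevant)
doc_labels = [2, 1, 0]

weights = train_pairwise(doc_features, doc_labels)
order = sorted(
    range(len(doc_features)),
    key=lambda i: dot(weights, doc_features[i]),
    reverse=True,
)
print(order)  # → [0, 1, 2]
```

On this toy data the learned weights favor term overlap over freshness, which is what the labels imply.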
Diversity-based Reranking
- Maximal marginal relevance: Balance relevance against diversity
- Query aspect coverage: Cover different query aspects
- Result clustering: Group similar results
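Maximal marginal relevance (MMR) picks results greedily: each step selects the candidate with the best trade-off between relevance to the query and similarity to what has already been selected. The sketch below assumes documents and query are already embedded as vectors; the vectors and the `lambda_` value are illustrative only.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int,
    lambda_: float = 0.7,
) -> List[int]:
    """Return indices of k documents chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(doc_vecs)))

    def mmr_score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy is the highest similarity to any already-selected document.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_ * relevance - (1 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0, 0.0]
docs = [
    [1.0, 0.01, 0.0],  # near-duplicate of document 1
    [1.0, 0.05, 0.0],  # most relevant document
    [0.0, 1.0, 0.0],   # covers a different aspect of the query
]
print(mmr(query, docs, k=2, lambda_=0.7))  # → [1, 2]
print(mmr(query, docs, k=2, lambda_=1.0))  # → [1, 0]
```

With `lambda_=1.0` the redundancy term vanishes and the result is a pure relevance ranking, so both near-duplicates are returned. Lowering `lambda_` swaps one of them for the document that covers the other query aspect.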
🤖 Models
Cross-Encoders
- MS MARCO models: Pre-trained for passage ranking
- Domain-specific: Fine-tuned for legal documents
- Multilingual: Supports Vietnamese
Neural Rankers
- BERT-based: Deep semantic understanding
- RoBERTa variants: Improved performance
- Ensemble methods: Combine multiple models
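One simple way to combine several rankers is to normalize each model's scores to a common range and take a weighted average. The sketch below assumes min-max normalization and equal weights; the score values are invented, and other combination schemes (rank-based voting, a learned meta-model) are equally plausible.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so models with different output ranges are comparable."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [0.0] * len(scores)
    return [(s - lowest) / (highest - lowest) for s in scores]


def ensemble_scores(
    model_scores: List[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of per-model normalized scores, one value per document."""
    normalized = [min_max(scores) for scores in model_scores]
    total_weight = sum(weights)
    num_docs = len(normalized[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalized)) / total_weight
        for i in range(num_docs)
    ]


# Hypothetical outputs for three documents from two different rankers
model_a_logits = [2.0, -1.0, 0.5]
model_b_probs = [0.2, 0.9, 0.6]

combined = ensemble_scores([model_a_logits, model_b_probs], weights=[0.5, 0.5])
best_doc = max(range(len(combined)), key=lambda i: combined[i])
print(best_doc)  # → 2
```

Here the two models disagree about documents 0 and 1, and the ensemble prefers document 2, which both rate moderately well.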
📊 Implementation
Batch Processing
- Top-K reranking: Rerank top candidates only
- Parallel execution: Process multiple documents together
- Caching: Cache reranking results
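The three ideas above fit together naturally: rerank only the head of the candidate list, score it in batches, and remember scores already computed. The sketch below is one possible arrangement; `CachedReranker` and `toy_batch_score` are hypothetical names, and the batch scorer stands in for a model that accepts several documents per call.

```python
from typing import Callable, Dict, List, Tuple


def toy_batch_score(query: str, documents: List[str]) -> List[float]:
    """Hypothetical batch scorer: number of query terms shared with each document."""
    query_terms = set(query.split())
    return [float(len(query_terms & set(doc.split()))) for doc in documents]


class CachedReranker:
    def __init__(
        self,
        batch_score_fn: Callable[[str, List[str]], List[float]],
        top_k: int = 3,
        batch_size: int = 2,
    ) -> None:
        self.batch_score_fn = batch_score_fn
        self.top_k = top_k
        self.batch_size = batch_size
        self.batches_run = 0
        self._cache: Dict[Tuple[str, str], float] = {}

    def rerank(self, query: str, candidates: List[str]) -> List[str]:
        # Only the top-K candidates are rescored; the tail keeps its retrieval order.
        head, tail = candidates[: self.top_k], candidates[self.top_k :]
        missing = [doc for doc in head if (query, doc) not in self._cache]
        for start in range(0, len(missing), self.batch_size):
            batch = missing[start : start + self.batch_size]
            scores = self.batch_score_fn(query, batch)
            self.batches_run += 1
            for doc, score in zip(batch, scores):
                self._cache[(query, doc)] = score
        head_sorted = sorted(
            head, key=lambda doc: self._cache[(query, doc)], reverse=True
        )
        return head_sorted + tail


retrieved = [
    "tax law overview",
    "lease contract law",
    "contract termination notice",
    "lease termination contract law",
    "cooking recipes",
]
reranker = CachedReranker(toy_batch_score, top_k=3, batch_size=2)
first = reranker.rerank("lease contract termination", retrieved)
second = reranker.rerank("lease contract termination", retrieved)  # served from cache
print(first[:3], reranker.batches_run)
```

Note the trade-off in this example: the fourth candidate matches the query best but is never rescored, because it falls outside the top-K cut. The value of K therefore bounds both cost and achievable quality.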
Integration
- Post-retrieval: After initial retrieval
- Hybrid search: Rerank fusion results
- Multi-stage: Multiple reranking passes
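As an illustration of reranking on top of hybrid search, the sketch below fuses a keyword ranking and a vector ranking, then reranks only the top of the fused list. Reciprocal rank fusion (RRF) with the commonly used constant `k=60` is an assumption here, since the fusion method is not specified above; the document IDs and second-stage scores are invented.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each document earns 1 / (k + rank) per list."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def rerank_top_n(
    doc_ids: List[str], relevance: Dict[str, float], n: int
) -> List[str]:
    """Second stage: reorder the first n documents by a (more expensive) score."""
    head = sorted(doc_ids[:n], key=lambda d: relevance[d], reverse=True)
    return head + doc_ids[n:]


keyword_results = ["a", "b", "c"]  # e.g. from a lexical index
vector_results = ["b", "d", "a"]   # e.g. from an embedding index

# Stage 1: fuse the two retrieval rankings.
fused = reciprocal_rank_fusion([keyword_results, vector_results])
print(fused)  # → ['b', 'a', 'd', 'c']

# Stage 2: rerank the top 3 with hypothetical reranker scores.
reranker_scores = {"a": 0.9, "b": 0.4, "c": 0.95, "d": 0.7}
final = rerank_top_n(fused, reranker_scores, n=3)
print(final)  # → ['a', 'd', 'b', 'c']
```

Further passes can be chained the same way, each one applying a costlier scorer to a shorter prefix of the list.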
🚀 Optimization
Performance
- Model compression: Smaller, faster models
- Approximation: Approximate reranking
- Early stopping: Stop if confidence high
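Early stopping can be read in several ways. One interpretation, sketched below under that assumption, is to score candidates in retrieval order and stop as soon as enough of them clear a confidence threshold. The threshold, the value of `k`, and the scores are illustrative.

```python
from typing import Callable, Dict, List, Tuple


def rerank_with_early_stop(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    k: int = 2,
    threshold: float = 0.8,
) -> Tuple[List[str], int]:
    """Score in retrieval order; stop once k candidates reach the threshold.

    Returns the reordered list and the number of scoring calls made.
    """
    scored: List[Tuple[str, float]] = []
    confident = 0
    for doc in candidates:
        score = score_fn(query, doc)
        scored.append((doc, score))
        if score >= threshold:
            confident += 1
            if confident >= k:
                break
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Candidates that were never scored keep their original order at the end.
    unscored = candidates[len(scored):]
    return [doc for doc, _ in scored] + unscored, len(scored)


# Hypothetical model confidences, looked up instead of computed
fake_scores: Dict[str, float] = {
    "d1": 0.85, "d2": 0.30, "d3": 0.90, "d4": 0.95, "d5": 0.10,
}
ordering, calls = rerank_with_early_stop(
    "any query",
    ["d1", "d2", "d3", "d4", "d5"],
    score_fn=lambda query, doc: fake_scores[doc],
)
print(ordering, calls)  # → ['d3', 'd1', 'd2', 'd4', 'd5'] 3
```

Only three of five candidates are scored, which saves model calls. The price is that `d4`, the highest-scoring document, is never examined, so this optimization suits cases where first-stage retrieval is already reasonably ordered.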
Quality
- A/B testing: Compare reranking strategies
- User feedback: Learn from interactions
- Continuous learning: Adapt to user preferences
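Comparing strategies needs a ranking metric. A common choice is NDCG@k, shown below in its linear-gain form (gain equal to the relevance grade; an exponential-gain variant also exists). The relevance grades for the two strategies are invented for illustration.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: relevance at position i is divided by log2(i + 1)."""
    return sum(
        rel / math.log2(position + 2)  # position is 0-based, hence + 2
        for position, rel in enumerate(relevances[:k])
    )


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal else 0.0


# Relevance grades of the results, in the order each strategy returned them
strategy_a = [3, 2, 0, 1]
strategy_b = [0, 3, 2, 1]

score_a = ndcg(strategy_a, k=4)
score_b = ndcg(strategy_b, k=4)
print(f"A: {score_a:.3f}  B: {score_b:.3f}")
```

Strategy A scores higher because it places the most relevant result first. In a live A/B test the same comparison would be averaged over many queries, with grades coming from labels or user feedback.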
Reranking raises the quality of the final results and, with it, the user experience.