[FIX] Replace the partner search merge strategy with RRF and improve keyword quality #55
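Fix the scale mismatch between vector scores (0 to 1) and BM25 text scores (unbounded), which broke normalization of the weighted sum and let the flat +0.3 keyword boost make ranking depend on whether a keyword matched rather than on relevance. RRF (K=60) fuses by rank instead of absolute score, which removes the scale problem at its root. Documents present in only one result set are assigned GHOST_RANK=500, a fixed penalty that does not depend on result-set size. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix irrelevant companies receiving text scores when generic LLM-generated keywords such as "솔루션", "Solution", or "Technology" are passed to MongoDB $text search. GENERIC_KEYWORD_BLOCKLIST filters generic Korean and English terms, and the separator is now comma-only (\s*,\s*) so that compound terms such as "Smart Farm" are not split. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Expose an aiKeywords field in the debug response so that the keywords actually used for the $text search, after filterGenericKeywords is applied, can be inspected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>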
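📝 Walkthrough
Generic tokens are removed from the HyDE-generated keywords, and the partner search logic now ranks the vector and text search results separately and computes the final score with Reciprocal Rank Fusion (RRF). The debug response now includes aiKeywords.
Changes: Hybrid Search implementation
Key logic review points
🎯 4 (Complex) | ⏱️ ~45 minutes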
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)
src/modules/partners/partners.service.ts (1)
392-428: 💤 Low value
Declare the GHOST_RANK constant alongside K.
GHOST_RANK is declared again on every invocation of the map callback. For readability and consistency, it is better to declare it outside the callback, next to K.
♻️ Suggested fix
  const K = 60;
+ const GHOST_RANK = 500;
  const vectorRankMap = new Map(
    [...vectorResults]
      .sort((a, b) => b.score - a.score)
      .map((r, i) => [r._id.toString(), i + 1]),
  );
  const textRankMap = new Map(
    [...textResults]
      .sort((a, b) => b.textScore - a.textScore)
      .map((r, i) => [r._id.toString(), i + 1]),
  );
  // ... allIds, vectorDocMap, textDocMap ...
  dbResults = [...allIds]
    .map((id) => {
-     const GHOST_RANK = 500;
      const vRank = vectorRankMap.get(id) ?? GHOST_RANK;
      const tRank = textRankMap.get(id) ?? GHOST_RANK;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modules/partners/partners.service.ts` around lines 392 - 428, The GHOST_RANK constant is being re-declared inside the map callback; move its declaration out with K (declare const GHOST_RANK = 500 alongside const K = 60) and then remove the inner declaration in the dbResults mapping so the code uses the outer GHOST_RANK; update references inside the map callback (vRank, tRank computations in the dbResults = [...allIds].map(...) block) to use the hoisted GHOST_RANK.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/modules/partners/partners.service.ts`:
- Line 220: The assignment aiKeywords =
filterGenericKeywords(aiAnalysis.keywords) can throw if aiAnalysis.keywords is
undefined; either guard before calling in the code that assigns aiKeywords (e.g.
only call filterGenericKeywords when aiAnalysis.keywords is a string and
otherwise set aiKeywords = [] or ''), or make filterGenericKeywords robust to
undefined/null inputs by treating non-string inputs as empty string/empty array.
Update the call site in the function that handles generateHyDEAndKeywords
(reference aiAnalysis and aiKeywords) or update filterGenericKeywords to safely
handle undefined to prevent undefined.split() runtime errors.
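One way to apply the guard described in this comment is to normalise the value before it is split. The sketch below is only an illustration: `AiAnalysisLike`, `toKeywordString`, and `splitKeywordsSafely` are hypothetical names, and it assumes `keywords` is a comma-separated string when present.

```typescript
// Hypothetical shape: only the field relevant to this comment is modelled.
interface AiAnalysisLike {
  keywords?: string | null;
}

// Treat any non-string input (undefined, null, or unexpected types) as empty.
function toKeywordString(value: unknown): string {
  return typeof value === "string" ? value : "";
}

// Split on commas only, after the input has been normalised,
// so undefined.split() can no longer occur.
function splitKeywordsSafely(aiAnalysis: AiAnalysisLike): string[] {
  return toKeywordString(aiAnalysis.keywords)
    .split(/\s*,\s*/)
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
}
```

With this shape, a missing `keywords` field yields an empty array rather than a runtime error, which matches either remediation the comment proposes.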
Update the expected score values, previously based on the weighted sum plus keyword boost, to the RRF formula 1/(K+ghostRank) + 1/(K+textRank). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
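As a worked instance of that formula, consider a document that is missing from the vector results, so its vector rank falls back to GHOST_RANK = 500 with K = 60 as stated in this PR. The text rank of 1 is an assumption chosen only for illustration:

```latex
\text{score} = \frac{1}{K + \text{ghostRank}} + \frac{1}{K + \text{textRank}}
             = \frac{1}{60 + 500} + \frac{1}{60 + 1}
             = \frac{1}{560} + \frac{1}{61}
             \approx 0.00179 + 0.01639
             \approx 0.01818
```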
Resolves #54
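Background
Previously, search combined the vector score (cosine, 0 to 1) and the BM25 text score (unbounded) as a weighted sum,
then applied a flat +0.3 boost to companies with a keyword match.
Problem:
Irrelevant companies ranked near the top because of the keyword boost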
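The failure mode can be reproduced with a few numbers. In this sketch only the flat +0.3 boost comes from the PR; the 0.7/0.3 weights, the candidate scores, and all names are hypothetical values picked to illustrate the scale mismatch.

```typescript
// Hypothetical weights: the PR does not state the actual values.
const VECTOR_WEIGHT = 0.7;
const TEXT_WEIGHT = 0.3;
// Flat boost for keyword-matched companies, as described in the PR.
const KEYWORD_BOOST = 0.3;

interface LegacyCandidate {
  name: string;
  vectorScore: number; // cosine similarity, 0 to 1
  textScore: number; // unbounded text score
  keywordMatched: boolean;
}

// Weighted sum plus flat boost, i.e. the strategy this PR replaces.
function legacyScore(c: LegacyCandidate): number {
  const base = VECTOR_WEIGHT * c.vectorScore + TEXT_WEIGHT * c.textScore;
  return c.keywordMatched ? base + KEYWORD_BOOST : base;
}

// Semantically close company with no keyword hit.
const relevantCompany: LegacyCandidate = {
  name: "relevant",
  vectorScore: 0.9,
  textScore: 0,
  keywordMatched: false,
};

// Semantically distant company that matched a generic keyword.
const genericMatchCompany: LegacyCandidate = {
  name: "generic-match",
  vectorScore: 0.3,
  textScore: 1.1,
  keywordMatched: true,
};
```

Under these assumed numbers the generic match outscores the relevant company, because the unbounded text score and the flat boost together outweigh a large gap in vector similarity.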
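Changes
1. Merge strategy → Reciprocal Rank Fusion (RRF)
- score = 1/(K+vectorRank) + 1/(K+textRank) (K=60)
- Documents that appear in only one result set are assigned GHOST_RANK=500 as a fixed penalty (the previous N×3 approach had a bug where the penalty varied with result-set size)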
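The fusion in item 1 can be sketched as follows. This is a minimal illustration and not the code in partners.service.ts: the `Hit` shape and the `toRankMap` and `fuseByRrf` names are assumptions, while K = 60 and GHOST_RANK = 500 are the values given in this PR.

```typescript
// Hypothetical result shape; the real documents carry more fields.
interface Hit {
  id: string;
  score: number;
}

const K = 60; // RRF constant from the PR
const GHOST_RANK = 500; // fixed rank for a document missing from one list

// Map each document id to its 1-based rank, best score first.
function toRankMap(hits: Hit[]): Map<string, number> {
  return new Map(
    [...hits]
      .sort((a, b) => b.score - a.score)
      .map((h, i): [string, number] => [h.id, i + 1]),
  );
}

// Fuse two ranked lists using ranks only, so score scales never mix.
function fuseByRrf(vectorHits: Hit[], textHits: Hit[]): Hit[] {
  const vectorRanks = toRankMap(vectorHits);
  const textRanks = toRankMap(textHits);
  const allIds = Array.from(
    new Set([...Array.from(vectorRanks.keys()), ...Array.from(textRanks.keys())]),
  );

  return allIds
    .map((id) => {
      const vRank = vectorRanks.get(id) ?? GHOST_RANK;
      const tRank = textRanks.get(id) ?? GHOST_RANK;
      return { id, score: 1 / (K + vRank) + 1 / (K + tRank) };
    })
    .sort((a, b) => b.score - a.score);
}
```

Because only ranks enter the formula, a text score of 7.1 and a cosine score of 0.9 contribute on the same footing, and a document found by both searches beats one found by only a single search at the same rank.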
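2. Generic-term filter for LLM keywords
- filterGenericKeywords(): removes terms such as 솔루션/기술/서비스/system/solution, using a Korean/English generic-term blocklist, before the MongoDB $text search
- Separator changed to \s*,\s* (commas only) → compound English terms such as "Smart Farm" are no longer split and are used in the search as a whole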
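A minimal sketch of the filter in item 2, under stated assumptions: the blocklist holds only the terms quoted in this PR (the full GENERIC_KEYWORD_BLOCKLIST is not shown here), matching is case-insensitive, and the function returns an array. The actual signature in partners.service.ts may differ.

```typescript
// Assumed subset of the blocklist: only terms quoted in the PR text.
const GENERIC_KEYWORD_BLOCKLIST = new Set<string>([
  "솔루션",
  "기술",
  "서비스",
  "system",
  "solution",
  "technology",
]);

// Split on commas only, so multi-word terms such as "Smart Farm" stay intact,
// then drop empty entries and generic terms (compared in lower case).
function filterGenericKeywords(keywords: string): string[] {
  return keywords
    .split(/\s*,\s*/)
    .map((k) => k.trim())
    .filter(
      (k) => k.length > 0 && !GENERIC_KEYWORD_BLOCKLIST.has(k.toLowerCase()),
    );
}
```

For example, an input of `"Smart Farm, Solution , 솔루션,IoT"` keeps `Smart Farm` and `IoT` and drops the two generic terms.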
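3. Expose aiKeywords in the debug response
- Makes it possible to confirm which keywords were used for the $text search
Scope of impact
- Single file, partners.service.ts; no API spec change
- aiKeywords field added to the debug object (backward compatible)
Summary by CodeRabbit
Release notes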