[FIX] Replace the partner search merge strategy with RRF and improve keyword quality by Takch02 · Pull Request #55 · K-Statra/backend

Takch02 · 2026-05-16T12:46:01Z

Resolves #54

Background

The previous search combined the vector score (cosine, 0 to 1) and the BM25 text score (unbounded)
as a weighted sum, then applied a flat +0.3 boost to companies that matched a keyword.

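Problems:

The two scores are on different scales, so normalization is unstable
Generic LLM-generated keywords such as "솔루션" ("solution") and "Technology" let unrelated
companies whose names contain them rank high through the keyword boost
Example: a search for "스마트팜 농기계" ("smart farm agricultural machinery") pushed a pet food company near the top because it matched the word "Farm"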
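To make the second problem concrete, here is a minimal sketch of a weighted sum with a flat keyword boost. The weights (0.7 and 0.3), the max-normalization of the text score, and every candidate score below are assumptions chosen for illustration; only the +0.3 boost comes from the description above.

```typescript
// Illustrative only: weights, normalization, and scores are assumed,
// not taken from partners.service.ts.
interface Candidate {
  name: string;
  vectorScore: number; // cosine similarity, 0..1
  textScore: number; // unbounded text score
  keywordMatch: boolean;
}

const VECTOR_WEIGHT = 0.7; // assumed
const TEXT_WEIGHT = 0.3; // assumed
const KEYWORD_BOOST = 0.3; // flat boost described in the PR

function legacyScore(c: Candidate, maxTextScore: number): number {
  // Assumed max-normalization: the result shifts with whichever
  // document happens to hold the maximum text score.
  const normalizedText = maxTextScore > 0 ? c.textScore / maxTextScore : 0;
  const boost = c.keywordMatch ? KEYWORD_BOOST : 0;
  return VECTOR_WEIGHT * c.vectorScore + TEXT_WEIGHT * normalizedText + boost;
}

// Hypothetical candidates for a "smart farm machinery" query.
const farmMachinery: Candidate = {
  name: "farm machinery maker",
  vectorScore: 0.82,
  textScore: 1.1,
  keywordMatch: false,
};
const petFood: Candidate = {
  name: "pet food company",
  vectorScore: 0.55,
  textScore: 2.4,
  keywordMatch: true,
};
const maxText = Math.max(farmMachinery.textScore, petFood.textScore);

console.log(farmMachinery.name, legacyScore(farmMachinery, maxText).toFixed(4));
console.log(petFood.name, legacyScore(petFood, maxText).toFixed(4));
```

With these made-up numbers the pet food company outscores the semantically closer candidate, because the boost and the normalized text score together outweigh the vector score gap.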

Changes

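1. Merge strategy → Reciprocal Rank Fusion (RRF)

score = 1/(K+vectorRank) + 1/(K+textRank) (K=60)
Rank-based fusion instead of absolute scores → removes the scale mismatch at its root
Documents that appear in only one result set receive a fixed penalty of GHOST_RANK=500
(the previous N×3 approach had a bug where the penalty varied with the size of the result set)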
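The fusion step described above can be sketched roughly as follows. This is a simplified, self-contained version under assumptions: the `Ranked` shape and the `toRankMap` and `rrfMerge` names are hypothetical, and the real code in partners.service.ts works on MongoDB documents rather than plain objects.

```typescript
const K = 60; // RRF constant from the PR
const GHOST_RANK = 500; // fixed rank for documents missing from one result set

// Hypothetical minimal result shape for illustration.
interface Ranked {
  id: string;
  score: number;
}

// Sort by score descending and map each id to its 1-based rank.
function toRankMap(results: Ranked[]): Map<string, number> {
  const sorted = results.slice().sort((a, b) => b.score - a.score);
  const rankMap = new Map<string, number>();
  sorted.forEach((r, i) => rankMap.set(r.id, i + 1));
  return rankMap;
}

function rrfMerge(vectorResults: Ranked[], textResults: Ranked[]): Ranked[] {
  const vectorRanks = toRankMap(vectorResults);
  const textRanks = toRankMap(textResults);
  const allIds = new Set<string>(
    Array.from(vectorRanks.keys()).concat(Array.from(textRanks.keys())),
  );

  return Array.from(allIds)
    .map((id) => {
      // A document absent from one list falls back to the ghost rank.
      const vRank = vectorRanks.get(id) ?? GHOST_RANK;
      const tRank = textRanks.get(id) ?? GHOST_RANK;
      return { id, score: 1 / (K + vRank) + 1 / (K + tRank) };
    })
    .sort((a, b) => b.score - a.score);
}

const merged = rrfMerge(
  [
    { id: "a", score: 0.9 },
    { id: "b", score: 0.8 },
  ],
  [
    { id: "b", score: 5.0 },
    { id: "c", score: 3.0 },
  ],
);
console.log(merged.map((r) => r.id)); // "b" appears in both lists, so it ranks first
```

Only the ranks enter the formula, so the unbounded text scores (5.0, 3.0) never need to be normalized against the cosine scores.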

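2. Generic-term filter for LLM keywords

filterGenericKeywords(): removes generic Korean and English terms (솔루션/기술/서비스/system/solution, etc.)
via a blocklist before the MongoDB $text search
The delimiter is changed to \s*,\s* (commas only) → compound English terms such as "Smart Farm"
are no longer split and are used in the search as a whole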
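A minimal sketch of such a filter is shown below. The blocklist contents beyond the terms named above, the case-insensitive comparison, and the `string[]` return type are assumptions; the actual `filterGenericKeywords()` in partners.service.ts may differ.

```typescript
// Assumed blocklist: the five terms listed in the PR plus "technology",
// which the PR cites as a generic keyword. The real list is likely longer.
const GENERIC_KEYWORD_BLOCKLIST = new Set<string>([
  "솔루션",
  "기술",
  "서비스",
  "system",
  "solution",
  "technology",
]);

function filterGenericKeywords(raw: string): string[] {
  return raw
    .split(/\s*,\s*/) // split on commas only, so "Smart Farm" stays one keyword
    .map((keyword) => keyword.trim())
    .filter(
      (keyword) =>
        keyword.length > 0 &&
        // Lowercasing is an assumption so that "Technology" matches "technology".
        !GENERIC_KEYWORD_BLOCKLIST.has(keyword.toLowerCase()),
    );
}

const kept = filterGenericKeywords("Smart Farm, 솔루션, Technology, 농기계");
console.log(kept); // [ 'Smart Farm', '농기계' ]
```

Splitting on whitespace as well would have turned "Smart Farm" into "Smart" and "Farm", which is how a bare "Farm" match could reach unrelated companies.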

3. Expose `aiKeywords` in the debug response

Shows the keywords actually used in the $text search

Impact

Single file, partners.service.ts; no API spec change
Adds an aiKeywords field to the response debug object (backward compatible)

Summary by CodeRabbit

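Release Notes

Improvements
- Improved the sorting and ranking logic for partner search results so that more relevant rankings are returned (by combining multiple rankings).
- Generic keywords in the search input are now filtered automatically, reducing unnecessary matches.
- AI keyword information has been added to the debug info of search results, making problems easier to analyze.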

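Fixes a problem where the scale mismatch between the vector score (0 to 1) and the BM25 text score (unbounded) broke normalization of the weighted sum, so that the flat +0.3 keyword boost let keyword matches dominate over relevance. RRF (K=60) fuses by rank instead of absolute score, which resolves the scale problem at its root. Documents present in only one result set are assigned GHOST_RANK=500, a fixed penalty independent of result set size. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>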

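Fixes a problem where generic LLM-generated keywords such as "솔루션", "Solution", and "Technology", when used in the MongoDB $text search, gave text scores to unrelated companies. GENERIC_KEYWORD_BLOCKLIST filters out generic Korean and English terms, and the delimiter is changed to commas only (\s*,\s*) so that compound terms such as "Smart Farm" are kept intact. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>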

Exposes the aiKeywords field so that the keywords actually used in the $text search, after filterGenericKeywords is applied, can be checked in the debug response. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
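The difference between the earlier N×3 penalty and the fixed ghost rank mentioned in the first commit message can be illustrated with a small sketch. It assumes N is the number of results in a list; the helper names are hypothetical.

```typescript
const K = 60;
const FIXED_GHOST_RANK = 500;

// RRF contribution of a document that is missing from one result list.
function ghostContribution(ghostRank: number): number {
  return 1 / (K + ghostRank);
}

// Assumed form of the earlier approach: ghost rank = result count × 3.
function legacyGhostRank(resultCount: number): number {
  return resultCount * 3;
}

for (const n of [10, 50, 100]) {
  const legacy = ghostContribution(legacyGhostRank(n));
  const fixed = ghostContribution(FIXED_GHOST_RANK);
  console.log(`N=${n} legacy=${legacy.toFixed(5)} fixed=${fixed.toFixed(5)}`);
}
```

Under the N×3 rule the same missing document contributes more when the result list is short and less when it is long; with the fixed rank the contribution is always 1/560.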

coderabbitai · 2026-05-16T12:46:11Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 592c413b-3b0b-466b-905c-85d2b4d8eaaa

📥 Commits

Reviewing files that changed from the base of the PR and between 79aad06 and 7470cca.

📒 Files selected for processing (2)

src/modules/partners/partners.service.spec.ts
src/modules/partners/partners.service.ts

📝 Walkthrough

The partner search logic now removes generic tokens from the HyDE-generated keywords, ranks the vector and text search results separately, and computes the final score with Reciprocal Rank Fusion (RRF). The debug response includes aiKeywords.

Changes

Hybrid Search implementation

Layer / File(s)	Summary
Generic Keyword Filtering and Application `src/modules/partners/partners.service.ts`	Adds `GENERIC_KEYWORD_BLOCKLIST` and `filterGenericKeywords()`, applies the filter to HyDE's `aiAnalysis.keywords`, and assigns the result to `aiKeywords`.
RRF-based Result Merging and Debug Output `src/modules/partners/partners.service.ts`	Converts the vector and text results into separate rank maps (applying the ghost rank when an entry is missing), recomputes `score` with the RRF formula (1/(K+vRank)+1/(K+tRank)), and includes `aiKeywords` in the `debug` response.

Key logic review points

🎯 4 (Complex) | ⏱️ ~45 minutes

Vector and text, two paths combined 🔍
Scores rebuilt with RRF
Generic keywords filtered out
Domain-specific terms will shine all the more ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
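Title check	✅ Passed	The PR title clearly summarizes the main changes: adopting the RRF merge strategy and improving keyword quality.
Linked Issues check	✅ Passed	All code changes in the PR (RRF-based merging, generic keyword filtering, hybrid search) meet the goal of `#54`, which is to resolve the missing domain-specific keywords problem.
Out of Scope Changes check	✅ Passed	All changes are confined to partners.service.ts, focus on improving the search logic, and match the requirements of `#54`.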


coderabbitai

🧹 Nitpick comments (1)

src/modules/partners/partners.service.ts (1)

392-428: 💤 Low value

GHOST_RANK should be declared alongside K.

GHOST_RANK is declared on every invocation inside the map callback. For readability and consistency, it is better to declare it outside the callback, next to K.

♻️ Suggested fix

     const K = 60;
+    const GHOST_RANK = 500;

     const vectorRankMap = new Map(
       [...vectorResults]
         .sort((a, b) => b.score - a.score)
         .map((r, i) => [r._id.toString(), i + 1]),
     );
     const textRankMap = new Map(
       [...textResults]
         .sort((a, b) => b.textScore - a.textScore)
         .map((r, i) => [r._id.toString(), i + 1]),
     );

     // ... allIds, vectorDocMap, textDocMap ...

     dbResults = [...allIds]
       .map((id) => {
-        const GHOST_RANK = 500;
         const vRank = vectorRankMap.get(id) ?? GHOST_RANK;
         const tRank = textRankMap.get(id) ?? GHOST_RANK;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modules/partners/partners.service.ts` around lines 392 - 428, The
GHOST_RANK constant is being re-declared inside the map callback; move its
declaration out with K (declare const GHOST_RANK = 500 alongside const K = 60)
and then remove the inner declaration in the dbResults mapping so the code uses
the outer GHOST_RANK; update references inside the map callback (vRank, tRank
computations in the dbResults = [...allIds].map(...) block) to use the hoisted
GHOST_RANK.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee3a2974-e4e7-4082-b5f2-1dcba4e9df2c

📥 Commits

Reviewing files that changed from the base of the PR and between 6a7d974 and 22417aa.

📒 Files selected for processing (1)

src/modules/partners/partners.service.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/modules/partners/partners.service.ts`:
- Line 220: The assignment aiKeywords =
filterGenericKeywords(aiAnalysis.keywords) can throw if aiAnalysis.keywords is
undefined; either guard before calling in the code that assigns aiKeywords (e.g.
only call filterGenericKeywords when aiAnalysis.keywords is a string and
otherwise set aiKeywords = [] or ''), or make filterGenericKeywords robust to
undefined/null inputs by treating non-string inputs as empty string/empty array.
Update the call site in the function that handles generateHyDEAndKeywords
(reference aiAnalysis and aiKeywords) or update filterGenericKeywords to safely
handle undefined to prevent undefined.split() runtime errors.
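One way to follow the second option in this comment is to make the filter itself tolerant of non-string input. The sketch below is an assumption about how that could look; `filterGenericKeywordsSafe`, the shortened blocklist, and the `string[]` return type are hypothetical and not taken from the merged code.

```typescript
// Shortened, assumed blocklist for illustration.
const BLOCKLIST = new Set<string>(["솔루션", "기술", "서비스", "system", "solution"]);

// Accepts unknown input so that undefined or null keywords cannot reach .split().
function filterGenericKeywordsSafe(raw: unknown): string[] {
  if (typeof raw !== "string") {
    return []; // treat missing or malformed keywords as "no keywords"
  }
  return raw
    .split(/\s*,\s*/)
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0 && !BLOCKLIST.has(keyword.toLowerCase()));
}

console.log(filterGenericKeywordsSafe(undefined)); // []
console.log(filterGenericKeywordsSafe("Smart Farm, solution")); // [ 'Smart Farm' ]
```

Guarding inside the function keeps every call site safe, at the cost of silently accepting bad input; guarding at the call site makes the missing value more visible.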


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c2f97443-7d0d-4316-9f48-7f77a0ca8b5a

📥 Commits

Reviewing files that changed from the base of the PR and between 22417aa and 79aad06.

📒 Files selected for processing (1)

src/modules/partners/partners.service.ts


Takch02 and others added 3 commits May 16, 2026 21:39

feat: add aiKeywords field to the search debug response

22417aa

Exposes the aiKeywords field so that the keywords actually used in the $text search, after filterGenericKeywords is applied, can be checked in the debug response. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai Bot reviewed May 16, 2026

fix: move the constant outside the callback

79aad06

coderabbitai Bot reviewed May 17, 2026

Comment thread src/modules/partners/partners.service.ts

Takch02 and others added 2 commits May 17, 2026 14:58

fix: update partners service unit tests for the RRF strategy change

ba19b17

Updates the expected score values, previously based on the weighted sum plus keyword boost, to the RRF formula 1/(K+ghostRank) + 1/(K+textRank). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
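As a rough illustration of what such an expectation looks like, the sketch below computes the RRF score for a document found only by the text search. The text rank of 1 and the helper names are assumptions; the actual spec file may use different fixtures and a test framework matcher instead of the hand-written tolerance check.

```typescript
const K = 60;
const GHOST_RANK = 500;

// Expected score for a document that appears only in the text results:
// the vector side falls back to the ghost rank.
function expectedTextOnlyScore(textRank: number): number {
  return 1 / (K + GHOST_RANK) + 1 / (K + textRank);
}

// Floating-point comparison with a small tolerance.
function isCloseTo(actual: number, expected: number, epsilon = 1e-9): boolean {
  return Math.abs(actual - expected) < epsilon;
}

const expected = expectedTextOnlyScore(1); // 1/560 + 1/61
console.log(expected.toFixed(5)); // 0.01818
console.log(isCloseTo(expected, 1 / 560 + 1 / 61)); // true
```

Because the score is a sum of reciprocals, comparing with a tolerance is safer than strict equality.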

Add an exception case for when aiKeywords is missing

7470cca

Takch02 merged commit 048f4a9 into main May 17, 2026
3 checks passed

Takch02 deleted the feat/search-rrf branch May 17, 2026 06:06

coderabbitai Bot mentioned this pull request May 20, 2026

[REFACTOR] Remove filtering from the partner search API and improve it #57

Merged

Takch02 merged 6 commits into main from feat/search-rrf
