Fix content errors #4

@clincolnoz


Fix Gold File Inaccuracies

Scope

Fix all inaccuracies that were verified against extractable PDF source documents. Flag image-only resume files that could not be verified for later manual review.

  1. Finance/10kq -- Company Names (6 files)

Fix the company field to use the exact legal name from each filing's cover page:

adp_10q_fy2025q2.gold.json: "Automatic Data Processing" -> "Automatic Data Processing, Inc."

csco_10q_fy2025q2.gold.json: "Cisco Systems" -> "Cisco Systems, Inc."

dell_10q_fy2025q2.gold.json: "Dell Technologies" -> "Dell Technologies Inc."

nke_10q_fy2025q2.gold.json: "Nike" -> "NIKE, Inc."

tho_10q_fy2025q2.gold.json: "Thor Industries" -> "Thor Industries, Inc."

wdc_10q_fy2025q2.gold.json: "Western Digital" -> "Western Digital Corporation"

  2. Finance/10kq -- Date and Period Fixes (3 fixes across 3 files)

mck_10q_fy2025q2.gold.json: report_period_end_date "2024-08-02" -> "2024-09-30" (PDF: "quarterly period ended September 30, 2024")

tho_10q_fy2025q2.gold.json: report_period_end_date "2025-01-2025" -> "2025-01-31" (invalid format; PDF: "January 31, 2025")

nke_10q_fy2025q2.gold.json: balance_sheet.retained_earnings[1].data_period "FY2025 Q2" -> "FY2024" (that entry is for May 31, 2024)
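A format check along these lines would have flagged the `tho` value before it reached the gold file. This is a minimal Python sketch: `is_valid_iso_date` is a hypothetical helper, not part of the dataset's existing tooling, and it assumes gold dates are meant to be ISO `YYYY-MM-DD` strings. It only catches malformed or impossible dates. A well-formed but wrong value, such as the original `mck` date "2024-08-02", still has to be checked against the PDF.

```python
import re
from datetime import datetime

# Strict ISO 8601 calendar date: exactly four digits, two digits, two digits.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_iso_date(value) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    # Reject non-strings and anything not shaped like YYYY-MM-DD
    # (e.g. "2025-01-2025" has too many digits in the day position).
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        # Rejects well-shaped but impossible dates such as month 13 or Feb 30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ["2025-01-2025", "2025-01-31", "2024-09-30", "2025-02-30"]:
        print(candidate, is_valid_iso_date(candidate))
```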

  3. Finance/credit_agreement (5 files, 7 fixes)

adbe_credit_agreement_2000_08_09.gold.json: Remove trailing comma from parties.administrative_agent; fix "setforth" -> "set forth" and "arrying" -> "carrying" in terms.use_of_proceeds

amzn_credit_agreement_2014_09_05.gold.json: terms.governing_law "New York" -> "State of New York"

ba_credit_agreement_2003_11_21.gold.json: Fix lender names: "ABN AmroBank" -> "ABN Amro Bank"; merge the split entries "Australia andNew Zealand" / "Banking GroupLimited" into a single entry, "Australia and New Zealand Banking Group Limited"; fix "Argerntaria" -> "Argentaria"

csco_credit_agreement_2007_08_17.gold.json: Remove trailing comma from parties.administrative_agent

ibm_credit_agreement_2019_07_18.gold.json: Replace the period with a comma in parties.lead_arranger[0]: "BANK. N.A." -> "BANK, N.A."; set terms.maturity_date to "2020-07-16"
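Several of the credit-agreement errors are extraction artifacts (dropped spaces, trailing commas) that a simple lint pass can surface. The sketch below is a heuristic that assumes the gold files are plain JSON; `lint_string` and `lint_json` are hypothetical names, not existing repository tooling. The lowercase-then-uppercase rule will miss some errors (e.g. "setforth", "arrying", "Argerntaria") and flag some legitimate names (e.g. "McDonald"), so every hit still needs a human check.

```python
import re

# Heuristic: a lowercase letter directly followed by an uppercase letter
# (e.g. "AmroBank", "andNew") often marks a space lost during PDF extraction.
GLUED = re.compile(r"\b\w*[a-z][A-Z]\w*\b")


def lint_string(value: str) -> list[str]:
    """Return warnings for one string value; heuristic, not authoritative."""
    warnings = [f"possible missing space: {word!r}" for word in GLUED.findall(value)]
    if value != value.strip():
        warnings.append("leading or trailing whitespace")
    if value.rstrip().endswith((",", ";")):
        warnings.append("trailing punctuation")
    return warnings


def lint_json(node, path="$"):
    """Yield (path, warning) pairs for every string in a parsed JSON document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from lint_json(child, f"{path}.{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from lint_json(child, f"{path}[{i}]")
    elif isinstance(node, str):
        for warning in lint_string(node):
            yield path, warning


if __name__ == "__main__":
    # Hypothetical record; values are illustrative, not copied from a gold file.
    doc = {"parties": {"administrative_agent": "Example Agent Bank,",
                       "lenders": ["ABN AmroBank", "Australia andNew Zealand"]}}
    for path, warning in lint_json(doc):
        print(path, warning)
```

Paths are reported in a `$.parties.lenders[0]` style so each warning can be traced back to a field named in this issue.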

  4. Hiring/resume -- Verified Fixes (2 files, 3 fixes)

These are the two PDFs with extractable text:

Resume-Academic01.gold.json: personalInfo.fullName "Isabella Rossi" -> "Dr. Isabella Rossi"

Resume-Academic02.gold.json: workExperience[2].jobTitle "Associate Professor" -> "Assistant Professor"; certificationsAndAwards[5].organization "NSF" -> "NRF"

  5. Sport/swimming (1 file, 1 fix)

ma_2023_sw_M-table2.gold.json: Fix "GUADALAJARRA" -> "GUADALAJARA"

  6. Academic/research (3 files, 8 fixes)

NIPS-1989...gold.json: authors[0].name "Le Cun" -> "Y. Le Cun"; venue null -> "Neural Information Processing Systems (NIPS 1989)"

[[fan24] rag survey.gold.json](dataset/academic/research/pdf+gold/[fan24] rag survey.gold.json): Fix missing spaces in title: "ASurvey on RAGMeetingLLMs" -> "A Survey on RAG Meeting LLMs"; set authors[5].affiliation to "Baidu Inc, China"; change publication_date from 2017 to the correct 2024 date

[[li25] vlm survey.gold.json](dataset/academic/research/pdf+gold/[li25] vlm survey.gold.json): Fix missing spaces in title: "ASurvey" -> "A Survey"; fix authors[2].name "HongyangDu" -> "Hongyang Du"; fix authors[5].name "GuangyaoShi" -> "Guangyao Shi"
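Most fixes in this issue are addressed by a dotted path with list indices (e.g. `authors[2].name`, `balance_sheet.retained_earnings[1].data_period`). One way to apply them consistently is a small path setter. This is a sketch assuming that path notation maps directly onto the JSON structure; `parse_path`, `set_path`, and `apply_fixes` are hypothetical helpers. `set_path` refuses to create fields that do not already exist, so a mistyped path fails loudly instead of adding a stray key.

```python
import json
import re

# Matches either a plain key ("authors") or a list index ("[2]").
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path: str) -> list:
    """Split "authors[2].name" into ["authors", 2, "name"]."""
    return [int(index) if index else key for key, index in TOKEN.findall(path)]


def set_path(doc, path: str, value):
    """Replace the value at an existing path and return the old value.

    Raises KeyError/IndexError if any step (including the last) is missing,
    so a typo in the path cannot silently create a new field.
    """
    *parents, last = parse_path(path)
    node = doc
    for step in parents:
        node = node[step]
    old = node[last]
    node[last] = value
    return old


def apply_fixes(filename: str, fixes: dict) -> None:
    """Apply {path: new_value} fixes to one gold JSON file in place."""
    with open(filename, encoding="utf-8") as f:
        doc = json.load(f)
    for path, value in fixes.items():
        set_path(doc, path, value)
    with open(filename, "w", encoding="utf-8") as f:
        # Assumes the gold files use two-space indentation.
        json.dump(doc, f, ensure_ascii=False, indent=2)
        f.write("\n")


if __name__ == "__main__":
    # Hypothetical, trimmed-down record; not copied from a real gold file.
    record = {"authors": [{"name": "A"}, {"name": "B"}, {"name": "HongyangDu"}]}
    old = set_path(record, "authors[2].name", "Hongyang Du")
    print(f"{old!r} -> {record['authors'][2]['name']!r}")
```

If the gold files are not actually formatted with two-space indentation, rewriting them this way will add formatting noise to the diff, so it is worth checking the diff before committing.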
