Unexpected output when running ripgrep with --multiline --pcre2 --json and a non-multiline regex #3364

@bfrg

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 15.1.0

features:+pcre2
simd(compile):+SSE2,-SSSE3,-AVX2
simd(runtime):+SSE2,+SSSE3,+AVX2

PCRE2 10.45 is available (JIT is available)

How did you install ripgrep?

pacman

What operating system are you using ripgrep on?

Arch Linux 6.19.11-arch1-1

Describe your bug.

When running rg --multiline --pcre2 --json with a non-multiline regex that matches on two consecutive lines, ripgrep outputs one match object with two submatches, instead of two separate match objects, each with one submatch.

This only happens with the --pcre2 option. When running rg --multiline --json with the same regex, the output will contain two separate match objects, each with one submatch.

In addition, this happens only when the matches are on two consecutive lines, not when a non-matching line lies between them.

What are the steps to reproduce the behavior?

Consider the following input file test.txt:

test test foobar test
test foobar test test

To reproduce, run:

rg --multiline --pcre2 --json foobar test.txt
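The reproduction can also be scripted. The following is a minimal sketch, assuming rg is on PATH and was built with PCRE2 support. It writes the two-line sample to a temporary test.txt, runs the command with and without --pcre2, and counts the objects whose "type" is "match". The helper names count_match_messages and run_variant are made up for this illustration.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path

# The two-line input file from the report.
SAMPLE = "test test foobar test\ntest foobar test test\n"


def count_match_messages(jsonl):
    """Count objects with "type": "match" in ripgrep's JSON Lines output."""
    count = 0
    for line in jsonl.splitlines():
        if line.strip() and json.loads(line).get("type") == "match":
            count += 1
    return count


def run_variant(path, extra_flags):
    """Run rg --multiline --json on path, optionally with extra flags."""
    cmd = ["rg", "--multiline", *extra_flags, "--json", "foobar", str(path)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return count_match_messages(result.stdout)


if __name__ == "__main__":
    if shutil.which("rg") is None:
        print("rg not found on PATH; skipping")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "test.txt"
            path.write_bytes(SAMPLE.encode("utf-8"))
            print("default engine:", run_variant(path, []))
            print("with --pcre2:  ", run_variant(path, ["--pcre2"]))
```

Going by the behavior described in this report for ripgrep 15.1.0, the default engine should report two match objects and the --pcre2 run one.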

What is the actual behavior?

Output (debug log lines included, JSON pretty-printed for readability):

rg: DEBUG|rg::flags::parse|crates/core/flags/parse.rs:97: no extra arguments found from configuration file
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:954: read CWD from environment: /tmp
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1092: number of paths given to search: 1
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1103: is_one_file? true
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1278: found hostname for hyperlink configuration: minipc
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1288: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:175: using 1 thread(s)
rg: DEBUG|ignore::gitignore|crates/ignore/src/gitignore.rs:398: opened gitignore file: /home/bfrg/.config/git/ignore
rg: DEBUG|globset|crates/globset/src/lib.rs:515: built glob set; 0 literals, 11 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
{
  "type": "begin",
  "data": {
    "path": {
      "text": "multi-test"
    }
  }
}
{
  "type": "match",
  "data": {
    "path": {
      "text": "multi-test"
    },
    "lines": {
      "text": "test test foobar test\ntest foobar test test\n"
    },
    "line_number": 1,
    "absolute_offset": 0,
    "submatches": [
      {
        "match": {
          "text": "foobar"
        },
        "start": 10,
        "end": 16
      },
      {
        "match": {
          "text": "foobar"
        },
        "start": 27,
        "end": 33
      }
    ]
  }
}
{
  "type": "end",
  "data": {
    "path": {
      "text": "multi-test"
    },
    "binary_offset": null,
    "stats": {
      "elapsed": {
        "secs": 0,
        "nanos": 32451,
        "human": "0.000032s"
      },
      "searches": 1,
      "searches_with_match": 1,
      "bytes_searched": 44,
      "bytes_printed": 323,
      "matched_lines": 2,
      "matches": 2
    }
  }
}
{
  "data": {
    "elapsed_total": {
      "human": "0.000662s",
      "nanos": 662190,
      "secs": 0
    },
    "stats": {
      "bytes_printed": 323,
      "bytes_searched": 44,
      "elapsed": {
        "human": "0.000032s",
        "nanos": 32451,
        "secs": 0
      },
      "matched_lines": 2,
      "matches": 2,
      "searches": 1,
      "searches_with_match": 1
    }
  },
  "type": "summary"
}

What is the expected behavior?

I would have expected the following output (which can be obtained by omitting --pcre2):

{
  "type": "begin",
  "data": {
    "path": {
      "text": "multi-test"
    }
  }
}
{
  "type": "match",
  "data": {
    "path": {
      "text": "multi-test"
    },
    "lines": {
      "text": "test test foobar test\n"
    },
    "line_number": 1,
    "absolute_offset": 0,
    "submatches": [
      {
        "match": {
          "text": "foobar"
        },
        "start": 10,
        "end": 16
      }
    ]
  }
}
{
  "type": "match",
  "data": {
    "path": {
      "text": "multi-test"
    },
    "lines": {
      "text": "test foobar test test\n"
    },
    "line_number": 2,
    "absolute_offset": 22,
    "submatches": [
      {
        "match": {
          "text": "foobar"
        },
        "start": 5,
        "end": 11
      }
    ]
  }
}
{
  "type": "end",
  "data": {
    "path": {
      "text": "multi-test"
    },
    "binary_offset": null,
    "stats": {
      "elapsed": {
        "secs": 0,
        "nanos": 41638,
        "human": "0.000042s"
      },
      "searches": 1,
      "searches_with_match": 1,
      "bytes_searched": 44,
      "bytes_printed": 449,
      "matched_lines": 2,
      "matches": 2
    }
  }
}
{
  "data": {
    "elapsed_total": {
      "human": "0.000554s",
      "nanos": 553546,
      "secs": 0
    },
    "stats": {
      "bytes_printed": 449,
      "bytes_searched": 44,
      "elapsed": {
        "human": "0.000042s",
        "nanos": 41638,
        "secs": 0
      },
      "matched_lines": 2,
      "matches": 2,
      "searches": 1,
      "searches_with_match": 1
    }
  },
  "type": "summary"
}
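For a consumer that needs one match object per line regardless of the regex engine, the merged object can be split after the fact. This is only a sketch of a possible client-side workaround, not ripgrep's own logic. The function name split_match_by_line is hypothetical. It assumes that "lines" carries a "text" field (not the base64 "bytes" form) and that the offsets are byte offsets into the UTF-8 encoded text. A message is returned unchanged if any submatch crosses a line boundary, since that would be a genuine multiline match.

```python
import copy


def split_match_by_line(message):
    """Split a multi-line "match" message into one message per matching line.

    Returns [message] unchanged when splitting is not applicable.
    """
    data = message.get("data", {})
    text = data.get("lines", {}).get("text")
    if message.get("type") != "match" or text is None:
        return [message]

    # Work on bytes, because the offsets are assumed to be byte offsets.
    raw = text.encode("utf-8")
    pieces = []  # (start, end) byte range of each line, newline included
    start = 0
    while start < len(raw):
        newline = raw.find(b"\n", start)
        end = len(raw) if newline == -1 else newline + 1
        pieces.append((start, end))
        start = end
    if len(pieces) < 2:
        return [message]

    # Assign each submatch to the line that fully contains it.
    grouped = [[] for _ in pieces]
    for sub in data.get("submatches", []):
        for index, (lo, hi) in enumerate(pieces):
            if lo <= sub["start"] and sub["end"] <= hi:
                shifted = copy.deepcopy(sub)
                shifted["start"] -= lo
                shifted["end"] -= lo
                grouped[index].append(shifted)
                break
        else:
            return [message]  # submatch crosses a line boundary

    result = []
    for index, (lo, hi) in enumerate(pieces):
        if not grouped[index]:
            continue  # lines without a submatch produce no message
        new_data = dict(data)
        new_data["lines"] = {"text": raw[lo:hi].decode("utf-8")}
        if data.get("line_number") is not None:
            new_data["line_number"] = data["line_number"] + index
        if data.get("absolute_offset") is not None:
            new_data["absolute_offset"] = data["absolute_offset"] + lo
        new_data["submatches"] = grouped[index]
        result.append({"type": "match", "data": new_data})
    return result or [message]
```

Applied to the merged match object from the actual output, this yields two messages whose line_number, absolute_offset and submatch offsets agree with the expected output shown above.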
