What version of ripgrep are you using?
ripgrep 15.1.0
features:+pcre2
simd(compile):+SSE2,-SSSE3,-AVX2
simd(runtime):+SSE2,+SSSE3,+AVX2
PCRE2 10.45 is available (JIT is available)
How did you install ripgrep?
pacman
What operating system are you using ripgrep on?
Arch Linux 6.19.11-arch1-1
Describe your bug.
When running rg --multiline --pcre2 --json with a non-multiline regex that matches on two consecutive lines, ripgrep outputs one match object with two submatches, instead of two separate match objects, each with one submatch.
This only happens with the --pcre2 option. When running rg --multiline --json with the same regex, the output contains two separate match objects, each with one submatch.
In addition, this happens only when the matches are on two consecutive lines, not when a non-matching line lies between them.
What are the steps to reproduce the behavior?
Consider the following input file test.txt:
test test foobar test
test foobar test test
To reproduce, run:
rg --multiline --pcre2 --json foobar test.txt
What is the actual behavior?
Output:
rg: DEBUG|rg::flags::parse|crates/core/flags/parse.rs:97: no extra arguments found from configuration file
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:954: read CWD from environment: /tmp
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1092: number of paths given to search: 1
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1103: is_one_file? true
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1278: found hostname for hyperlink configuration: minipc
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1288: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:175: using 1 thread(s)
rg: DEBUG|ignore::gitignore|crates/ignore/src/gitignore.rs:398: opened gitignore file: /home/bfrg/.config/git/ignore
rg: DEBUG|globset|crates/globset/src/lib.rs:515: built glob set; 0 literals, 11 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
{
"type": "begin",
"data": {
"path": {
"text": "multi-test"
}
}
}
{
"type": "match",
"data": {
"path": {
"text": "multi-test"
},
"lines": {
"text": "test test foobar test\ntest foobar test test\n"
},
"line_number": 1,
"absolute_offset": 0,
"submatches": [
{
"match": {
"text": "foobar"
},
"start": 10,
"end": 16
},
{
"match": {
"text": "foobar"
},
"start": 27,
"end": 33
}
]
}
}
{
"type": "end",
"data": {
"path": {
"text": "multi-test"
},
"binary_offset": null,
"stats": {
"elapsed": {
"secs": 0,
"nanos": 32451,
"human": "0.000032s"
},
"searches": 1,
"searches_with_match": 1,
"bytes_searched": 44,
"bytes_printed": 323,
"matched_lines": 2,
"matches": 2
}
}
}
{
"data": {
"elapsed_total": {
"human": "0.000662s",
"nanos": 662190,
"secs": 0
},
"stats": {
"bytes_printed": 323,
"bytes_searched": 44,
"elapsed": {
"human": "0.000032s",
"nanos": 32451,
"secs": 0
},
"matched_lines": 2,
"matches": 2,
"searches": 1,
"searches_with_match": 1
}
},
"type": "summary"
}
What is the expected behavior?
I would have expected the following output (which can be obtained by omitting --pcre2):
{
"type": "begin",
"data": {
"path": {
"text": "multi-test"
}
}
}
{
"type": "match",
"data": {
"path": {
"text": "multi-test"
},
"lines": {
"text": "test test foobar test\n"
},
"line_number": 1,
"absolute_offset": 0,
"submatches": [
{
"match": {
"text": "foobar"
},
"start": 10,
"end": 16
}
]
}
}
{
"type": "match",
"data": {
"path": {
"text": "multi-test"
},
"lines": {
"text": "test foobar test test\n"
},
"line_number": 2,
"absolute_offset": 22,
"submatches": [
{
"match": {
"text": "foobar"
},
"start": 5,
"end": 11
}
]
}
}
{
"type": "end",
"data": {
"path": {
"text": "multi-test"
},
"binary_offset": null,
"stats": {
"elapsed": {
"secs": 0,
"nanos": 41638,
"human": "0.000042s"
},
"searches": 1,
"searches_with_match": 1,
"bytes_searched": 44,
"bytes_printed": 449,
"matched_lines": 2,
"matches": 2
}
}
}
{
"data": {
"elapsed_total": {
"human": "0.000554s",
"nanos": 553546,
"secs": 0
},
"stats": {
"bytes_printed": 449,
"bytes_searched": 44,
"elapsed": {
"human": "0.000042s",
"nanos": 41638,
"secs": 0
},
"matched_lines": 2,
"matches": 2,
"searches": 1,
"searches_with_match": 1
}
},
"type": "summary"
}
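Until this is resolved, a consumer of the JSON output could split the merged match object back into per-line objects. The following is a minimal Python sketch of that idea, not a fix for ripgrep itself. It assumes that "lines" and every submatch carry a "text" key (valid UTF-8 input), that "line_number" is present, and that submatch offsets are byte offsets into "lines". The name split_match is a hypothetical helper, not part of ripgrep.

```python
def split_match(data):
    """Split one "match" payload into one payload per group of lines.

    Lines stay together only when a single submatch spans them, so a
    genuine multi-line match is left intact.
    """
    raw = data["lines"]["text"].encode("utf-8")  # offsets are in bytes

    # Byte span (start, end) of each line in the block, terminator included.
    spans, pos = [], 0
    while pos < len(raw):
        nl = raw.find(b"\n", pos)
        end = len(raw) if nl == -1 else nl + 1
        spans.append((pos, end))
        pos = end
    if not spans:
        return [data]

    def line_of(offset):
        # Index of the line containing this byte offset.
        for i, (start, end) in enumerate(spans):
            if start <= offset < end:
                return i
        return len(spans) - 1

    # Group submatches by the lines they touch: [first_line, last_line, submatches].
    groups = []
    for sm in sorted(data["submatches"], key=lambda s: s["start"]):
        first = line_of(sm["start"])
        last = line_of(max(sm["end"] - 1, sm["start"]))
        if groups and first <= groups[-1][1]:
            groups[-1][1] = max(groups[-1][1], last)
            groups[-1][2].append(sm)
        else:
            groups.append([first, last, [sm]])

    out = []
    for first, last, subs in groups:
        base = spans[first][0]  # byte offset of the group's first line
        out.append({
            "path": data["path"],
            "lines": {"text": raw[base:spans[last][1]].decode("utf-8")},
            "line_number": data["line_number"] + first,
            "absolute_offset": data["absolute_offset"] + base,
            "submatches": [
                {"match": sm["match"],
                 "start": sm["start"] - base,
                 "end": sm["end"] - base}
                for sm in subs
            ],
        })
    return out


# The "data" payload of the merged match object from the actual output above.
merged = {
    "path": {"text": "multi-test"},
    "lines": {"text": "test test foobar test\ntest foobar test test\n"},
    "line_number": 1,
    "absolute_offset": 0,
    "submatches": [
        {"match": {"text": "foobar"}, "start": 10, "end": 16},
        {"match": {"text": "foobar"}, "start": 27, "end": 33},
    ],
}

for m in split_match(merged):
    print(m["line_number"], m["absolute_offset"],
          [(s["start"], s["end"]) for s in m["submatches"]])
# 1 0 [(10, 16)]
# 2 22 [(5, 11)]
```

For the input above, the two resulting payloads have the same "lines", "line_number", "absolute_offset" and submatch offsets as the two match objects in the expected output: the second submatch's start of 27 in the merged block is the second line's offset 22 plus 5.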