
Commit 9f7b8d8

initial change
1 parent 1ac9d13 commit 9f7b8d8

3 files changed

Lines changed: 806 additions & 471 deletions


Doc/library/difflib.rst

Lines changed: 169 additions & 77 deletions
@@ -19,6 +19,26 @@ about file differences in various formats, including HTML and context and unifie
1919
diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
2020

2121

22+
.. class:: SequenceMatcherBase
23+
:noindex:
24+
25+
Base class for implementing sequence matchers.
26+
27+
At minimum, derived classes must implement the ``_get_matching_blocks`` method,
28+
which returns a list of blocks, each a tuple ``(start_in_a, start_in_b, length)``.
29+
See ``_get_matching_blocks`` and ``get_matching_blocks`` for more information.
30+
31+
Once implemented, the following methods make use of it and are available:
32+
:meth:`~SequenceMatcherBase.get_matching_blocks`
33+
:meth:`~SequenceMatcherBase.get_opcodes`
34+
:meth:`~SequenceMatcherBase.get_grouped_opcodes`
35+
:meth:`~SequenceMatcherBase.ratio`
36+
:meth:`~SequenceMatcherBase.quick_ratio`
37+
:meth:`~SequenceMatcherBase.real_quick_ratio`
38+
39+
See :class:`SequenceMatcher` for an example implementation.
40+
41+
2242
.. class:: SequenceMatcher
2343
:noindex:
2444

@@ -88,7 +108,8 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
88108
The constructor for this class is:
89109

90110

91-
.. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
111+
.. method:: __init__(tabsize=8, wrapcolumn=None,
112+
linejunk=None, charjunk=IS_CHARACTER_JUNK, differ=None)
92113

93114
Initializes instance of :class:`HtmlDiff`.
94115

@@ -98,9 +119,12 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
98119
*wrapcolumn* is an optional keyword to specify column number where lines are
99120
broken and wrapped, defaults to ``None`` where lines are not wrapped.
100121

101-
*linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
102-
(used by :class:`HtmlDiff` to generate the side by side HTML differences). See
103-
:func:`ndiff` documentation for argument default values and descriptions.
122+
*linejunk*, *charjunk* and *differ* are optional keyword arguments passed into
123+
:func:`ndiff` (used by :class:`HtmlDiff` to generate the side by side HTML differences).
124+
See :func:`ndiff` documentation for argument default values and descriptions.
125+
126+
.. versionchanged:: 3.15
127+
Added the *differ* argument.
104128

105129
The following methods are public:
106130

@@ -143,7 +167,8 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
143167

144168

145169

146-
.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
170+
.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='',
171+
n=3, lineterm='\n', matcher=None)
147172

148173
Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
149174
generating the delta lines) in context diff format.
@@ -161,6 +186,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
161186
For inputs that do not have trailing newlines, set the *lineterm* argument to
162187
``""`` so that the output will be uniformly newline free.
163188

189+
Optional argument *matcher* is a callable that takes three optional arguments and
190+
returns a :class:`SequenceMatcherBase` instance, i.e. ``matcher(isjunk=None, a='', b='')``.
191+
The default (if ``None``) is the :class:`SequenceMatcher` class.
192+
164193
The context diff format normally has a header for filenames and modification
165194
times. Any or all of these may be specified using strings for *fromfile*,
166195
*tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
@@ -189,8 +218,11 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
189218

190219
See :ref:`difflib-interface` for a more detailed example.
191220

221+
.. versionchanged:: 3.15
222+
Added the *matcher* argument.
223+
192224

193-
.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)
225+
.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6, matcher=None)
194226

195227
Return a list of the best "good enough" matches. *word* is a sequence for which
196228
close matches are desired (typically a string), and *possibilities* is a list of
@@ -202,6 +234,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
202234
Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
203235
Possibilities that don't score at least that similar to *word* are ignored.
204236

237+
Optional argument *matcher* is a callable that takes three optional arguments and
238+
returns a :class:`SequenceMatcherBase` instance, i.e. ``matcher(isjunk=None, a='', b='')``.
239+
The default (if ``None``) is the :class:`SequenceMatcher` class.
240+
205241
The best (no more than *n*) matches among the possibilities are returned in a
206242
list, sorted by similarity score, most similar first.
207243

@@ -215,8 +251,11 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
215251
>>> get_close_matches('accept', keyword.kwlist)
216252
['except']
217253

254+
.. versionchanged:: 3.15
255+
Added the *matcher* argument.
256+
218257

219-
.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
258+
.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK, differ=None)
220259

221260
Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
222261
delta (a :term:`generator` generating the delta lines).
@@ -237,6 +276,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
237276
function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
238277
blank or tab; it's a bad idea to include newline in this!).
239278

279+
*differ*: a callable that takes two optional arguments and returns a
280+
:class:`Differ` instance, i.e. ``differ(linejunk=None, charjunk=None)``.
281+
The default (if ``None``) is the :class:`Differ` class.
282+
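To illustrate the callable shape ``differ(linejunk=None, charjunk=None)``, the sketch below defines a hypothetical factory ``make_differ`` and checks that driving a :class:`Differ` by hand gives the same delta as :func:`ndiff`. It relies only on the existing API and does not pass the new *differ* argument.

```python
from difflib import Differ, IS_CHARACTER_JUNK, ndiff


def make_differ(linejunk=None, charjunk=None):
    # Hypothetical factory with the call shape differ(linejunk=None, charjunk=None).
    return Differ(linejunk, charjunk)


a = 'one\ntwo\nthree\n'.splitlines(keepends=True)
b = 'ore\ntree\nemu\n'.splitlines(keepends=True)

# Build the Differ with ndiff's default filters and compare directly.
via_factory = list(make_differ(None, IS_CHARACTER_JUNK).compare(a, b))
via_ndiff = list(ndiff(a, b))
```

A custom factory could return a :class:`Differ` subclass instead, as long as it accepts the same two arguments.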
240283
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
241284
... 'ore\ntree\nemu\n'.splitlines(keepends=True))
242285
>>> print(''.join(diff), end="")
@@ -250,6 +293,9 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
250293
+ tree
251294
+ emu
252295

296+
.. versionchanged:: 3.15
297+
Added the *differ* argument.
298+
253299

254300
.. function:: restore(sequence, which)
255301

@@ -274,7 +320,8 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
274320
emu
275321

276322

277-
.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n', *, color=False)
323+
.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='',
324+
n=3, lineterm='\n', *, color=False, matcher=None)
278325
279326
Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
280327
generating the delta lines) in unified diff format.
@@ -297,6 +344,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
297344
:program:`git diff --color`. Even if enabled, it can be
298345
:ref:`controlled using environment variables <using-on-controlling-color>`.
299346

347+
Optional argument *matcher* is a callable that takes three optional arguments and
348+
returns a :class:`SequenceMatcherBase` instance, i.e. ``matcher(isjunk=None, a='', b='')``.
349+
The default (if ``None``) is the :class:`SequenceMatcher` class.
350+
300351
The unified diff format normally has a header for filenames and modification
301352
times. Any or all of these may be specified using strings for *fromfile*,
302353
*tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
@@ -321,6 +372,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
321372

322373
.. versionchanged:: 3.15
323374
Added the *color* parameter.
375+
Added the *matcher* argument.
324376

325377

326378
.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')
@@ -360,15 +412,14 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
360412
was published in Dr. Dobb's Journal in July, 1988.
361413

362414

363-
.. _sequence-matcher:
364-
365-
SequenceMatcher Objects
366-
-----------------------
415+
.. _sequence-matcher-base:
367416

368-
The :class:`SequenceMatcher` class has this constructor:
417+
SequenceMatcherBase
418+
-------------------
369419

420+
The :class:`SequenceMatcherBase` class has this constructor:
370421

371-
.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
422+
.. class:: SequenceMatcherBase(isjunk=None, a='', b='')
372423

373424
Optional argument *isjunk* must be ``None`` (the default) or a one-argument
374425
function that takes a sequence element and returns true if and only if the
@@ -384,33 +435,19 @@ The :class:`SequenceMatcher` class has this constructor:
384435
The optional arguments *a* and *b* are sequences to be compared; both default to
385436
empty strings. The elements of both sequences must be :term:`hashable`.
386437

387-
The optional argument *autojunk* can be used to disable the automatic junk
388-
heuristic.
389-
390-
.. versionchanged:: 3.2
391-
Added the *autojunk* parameter.
392-
393-
SequenceMatcher objects get three data attributes: *bjunk* is the
394-
set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
395-
non-junk elements considered popular by the heuristic (if it is not
396-
disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
397-
of positions where they occur. All three are reset whenever *b* is reset
398-
with :meth:`set_seqs` or :meth:`set_seq2`.
399-
400438
.. versionadded:: 3.2
401439
The *bjunk* and *bpopular* attributes.
402440

403-
:class:`SequenceMatcher` objects have the following methods:
441+
:class:`SequenceMatcherBase` objects have the following methods:
404442

405443
.. method:: set_seqs(a, b)
406444

407445
Set the two sequences to be compared.
408446

409-
:class:`SequenceMatcher` computes and caches detailed information about the
410-
second sequence, so if you want to compare one sequence against many
411-
sequences, use :meth:`set_seq2` to set the commonly used sequence once and
412-
call :meth:`set_seq1` repeatedly, once for each of the other sequences.
413-
447+
:class:`SequenceMatcherBase` is intended to cache detailed information about the
448+
second sequence. :meth:`set_seq2` clears the cache used by :meth:`quick_ratio`.
449+
In addition, :meth:`_prepare_seq2`, which is called at the end of :meth:`set_seq2`,
450+
can be implemented by a derived class for alignment algorithm cache logic.
414451

415452
.. method:: set_seq1(a)
416453

@@ -423,49 +460,7 @@ The :class:`SequenceMatcher` class has this constructor:
423460
Set the second sequence to be compared. The first sequence to be compared
424461
is not changed.
425462

426-
427-
.. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)
428-
429-
Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
430-
431-
If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
432-
``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
433-
<= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
434-
k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
435-
<= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
436-
all maximal matching blocks, return one that starts earliest in *a*, and
437-
of all those maximal matching blocks that start earliest in *a*, return
438-
the one that starts earliest in *b*.
439-
440-
>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
441-
>>> s.find_longest_match(0, 5, 0, 9)
442-
Match(a=0, b=4, size=5)
443-
444-
If *isjunk* was provided, first the longest matching block is determined
445-
as above, but with the additional restriction that no junk element appears
446-
in the block. Then that block is extended as far as possible by matching
447-
(only) junk elements on both sides. So the resulting block never matches
448-
on junk except as identical junk happens to be adjacent to an interesting
449-
match.
450-
451-
Here's the same example as before, but considering blanks to be junk. That
452-
prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
453-
second sequence directly. Instead only the ``'abcd'`` can match, and
454-
matches the leftmost ``'abcd'`` in the second sequence:
455-
456-
>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
457-
>>> s.find_longest_match(0, 5, 0, 9)
458-
Match(a=1, b=0, size=4)
459-
460-
If no blocks match, this returns ``(alo, blo, 0)``.
461-
462-
This method returns a :term:`named tuple` ``Match(a, b, size)``.
463-
464-
.. versionchanged:: 3.9
465-
Added default arguments.
466-
467-
468-
.. method:: get_matching_blocks()
463+
.. method:: get_matching_blocks()
469464

470465
Return list of triples describing non-overlapping matching subsequences.
471466
Each triple is of the form ``(i, j, n)``,
@@ -487,6 +482,14 @@ The :class:`SequenceMatcher` class has this constructor:
487482
[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
488483

489484

485+
.. method:: _prepare_seq2()
486+
487+
Preparation method that is called at the end of :meth:`set_seq2`.
488+
489+
By default it does nothing, but it can be implemented by a derived class
490+
for alignment algorithm cache logic.
491+
492+
490493
.. method:: get_opcodes()
491494

492495
Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
@@ -588,6 +591,87 @@ are always at least as large as :meth:`~SequenceMatcher.ratio`:
588591
1.0
589592

590593

594+
.. _sequence-matcher:
595+
596+
SequenceMatcher Objects
597+
-----------------------
598+
599+
The :class:`SequenceMatcher` class has this constructor:
600+
601+
602+
.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
603+
604+
*isjunk*, *a* and *b* are passed on to the ``SequenceMatcherBase`` constructor.
605+
See :class:`SequenceMatcherBase` documentation.
606+
607+
The optional argument *autojunk* can be used to disable the automatic junk
608+
heuristic.
609+
610+
SequenceMatcher objects get three data attributes: *bjunk* is the
611+
set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
612+
non-junk elements considered popular by the heuristic (if it is not
613+
disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
614+
of positions where they occur. All three are reset whenever *b* is reset
615+
with :meth:`set_seqs` or :meth:`set_seq2`.
616+
617+
.. versionchanged:: 3.2
618+
Added the *autojunk* parameter.
619+
620+
:class:`SequenceMatcher` computes and caches detailed information about the
621+
second sequence, so if you want to compare one sequence against many
622+
sequences, use :meth:`set_seq2` to set the commonly used sequence once and
623+
call :meth:`set_seq1` repeatedly, once for each of the other sequences.
624+
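The reuse pattern described above can be sketched as follows. The word list is invented for illustration; only the existing :class:`SequenceMatcher` methods are used.

```python
from difflib import SequenceMatcher

target = "private"                       # the commonly used sequence
candidates = ["privet", "pirate", "primate"]

s = SequenceMatcher()
s.set_seq2(target)                       # analysed once; its index is cached

scores = {}
for word in candidates:
    s.set_seq1(word)                     # only the first sequence changes
    scores[word] = s.ratio()
```

Each score equals what a fresh ``SequenceMatcher(None, word, target)`` would report; the difference is that the analysis of ``target`` is not repeated.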
625+
In addition to methods implemented by :class:`SequenceMatcherBase`,
626+
:class:`SequenceMatcher` objects have the following methods:
627+
628+
629+
.. method:: _prepare_seq2()
630+
631+
Implemented to prepare *b2j*, *bjunk* and *bpopular* caches.
632+
633+
634+
.. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)
635+
636+
Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
637+
638+
If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
639+
``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
640+
<= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
641+
k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
642+
<= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
643+
all maximal matching blocks, return one that starts earliest in *a*, and
644+
of all those maximal matching blocks that start earliest in *a*, return
645+
the one that starts earliest in *b*.
646+
647+
>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
648+
>>> s.find_longest_match(0, 5, 0, 9)
649+
Match(a=0, b=4, size=5)
650+
651+
If *isjunk* was provided, first the longest matching block is determined
652+
as above, but with the additional restriction that no junk element appears
653+
in the block. Then that block is extended as far as possible by matching
654+
(only) junk elements on both sides. So the resulting block never matches
655+
on junk except as identical junk happens to be adjacent to an interesting
656+
match.
657+
658+
Here's the same example as before, but considering blanks to be junk. That
659+
prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
660+
second sequence directly. Instead only the ``'abcd'`` can match, and
661+
matches the leftmost ``'abcd'`` in the second sequence:
662+
663+
>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
664+
>>> s.find_longest_match(0, 5, 0, 9)
665+
Match(a=1, b=0, size=4)
666+
667+
If no blocks match, this returns ``(alo, blo, 0)``.
668+
669+
This method returns a :term:`named tuple` ``Match(a, b, size)``.
670+
671+
.. versionchanged:: 3.9
672+
Added default arguments.
673+
674+
591675
.. _sequencematcher-examples:
592676

593677
SequenceMatcher Examples
@@ -653,7 +737,7 @@ locality, at the occasional cost of producing a longer diff.
653737
The :class:`Differ` class has this constructor:
654738

655739

656-
.. class:: Differ(linejunk=None, charjunk=None)
740+
.. class:: Differ(linejunk=None, charjunk=None, linematcher=None, charmatcher=None)
657741
:noindex:
658742

659743
Optional keyword parameters *linejunk* and *charjunk* are for filter functions
@@ -673,6 +757,14 @@ The :class:`Differ` class has this constructor:
673757
:meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
674758
parameter for an explanation.
675759

760+
*linematcher*: a callable that takes three optional arguments and returns a
761+
:class:`~SequenceMatcherBase` instance, i.e. ``matcher(isjunk=None, a='', b='')``.
762+
The default (if ``None``) is the :class:`SequenceMatcher` class.
763+
764+
*charmatcher*: a callable that takes three optional arguments and returns a
765+
:class:`~SequenceMatcherBase` instance, i.e. ``matcher(isjunk=None, a='', b='')``.
766+
The default (if ``None``) is the :class:`SequenceMatcher` class.
767+
676768
:class:`Differ` objects are used (deltas generated) via a single method:
677769

678770
