@@ -19,6 +19,26 @@ about file differences in various formats, including HTML and context and unifie
 diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 
 
+.. class:: SequenceMatcherBase
+   :noindex:
+
+   Base class for implementing sequence matchers.  At minimum, a derived class
+   must implement the ``_get_matching_blocks`` method, which returns a list of
+   ``(i, j, n)`` triples (start in *a*, start in *b*, block length).  See
+   ``_get_matching_blocks`` and ``get_matching_blocks`` for more information.
+
+   Once it is implemented, the following methods are available:
+
+   * :meth:`~SequenceMatcherBase.get_matching_blocks`
+   * :meth:`~SequenceMatcherBase.get_opcodes`
+   * :meth:`~SequenceMatcherBase.get_grouped_opcodes`
+   * :meth:`~SequenceMatcherBase.ratio`
+   * :meth:`~SequenceMatcherBase.quick_ratio`
+   * :meth:`~SequenceMatcherBase.real_quick_ratio`
+
+   See :class:`SequenceMatcher` for an example implementation.
+
+
 .. class:: SequenceMatcher
    :noindex:
 
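A minimal sketch of the split described for ``SequenceMatcherBase``: a derived class supplies only ``_get_matching_blocks``, and the base class turns its ``(i, j, n)`` triples into ``get_matching_blocks``, ``get_opcodes`` and ``ratio`` results, following the rules documented for ``SequenceMatcher``. ``ToyMatcherBase`` and ``LCSMatcher`` are hypothetical names, not the classes added by this change, and the sketch assumes ``_get_matching_blocks`` returns the triples without the trailing ``(len(a), len(b), 0)`` sentinel. A longest-common-subsequence strategy is swapped in purely for illustration; it is not the algorithm ``SequenceMatcher`` uses.

```python
from collections import namedtuple

# Same field names as difflib's Match, redefined so the sketch is standalone.
Match = namedtuple("Match", "a b size")


class ToyMatcherBase:
    """Hypothetical base class: subclasses only supply _get_matching_blocks()."""

    def __init__(self, isjunk=None, a="", b=""):
        self.isjunk = isjunk  # accepted for signature compatibility, unused here
        self.a, self.b = a, b

    def _get_matching_blocks(self):
        raise NotImplementedError

    def get_matching_blocks(self):
        # Append the dummy triple (len(a), len(b), 0) that terminates the list.
        blocks = [Match(*t) for t in self._get_matching_blocks()]
        blocks.append(Match(len(self.a), len(self.b), 0))
        return blocks

    def get_opcodes(self):
        i = j = 0
        opcodes = []
        for ai, bj, size in self.get_matching_blocks():
            # Classify the gap in front of this block.
            if i < ai and j < bj:
                opcodes.append(("replace", i, ai, j, bj))
            elif i < ai:
                opcodes.append(("delete", i, ai, j, bj))
            elif j < bj:
                opcodes.append(("insert", i, ai, j, bj))
            if size:
                opcodes.append(("equal", ai, ai + size, bj, bj + size))
            i, j = ai + size, bj + size
        return opcodes

    def ratio(self):
        # 2.0 * M / T, where M = matched elements and T = total elements.
        matches = sum(block.size for block in self.get_matching_blocks())
        total = len(self.a) + len(self.b)
        return 2.0 * matches / total if total else 1.0


class LCSMatcher(ToyMatcherBase):
    """Hypothetical subclass: blocks taken from a longest common subsequence."""

    def _get_matching_blocks(self):
        a, b = self.a, self.b
        # dp[i][j] = length of the LCS of a[i:] and b[j:]
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) - 1, -1, -1):
            for j in range(len(b) - 1, -1, -1):
                if a[i] == b[j]:
                    dp[i][j] = dp[i + 1][j + 1] + 1
                else:
                    dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
        # Walk the table, merging runs of consecutive matches into blocks.
        blocks, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                last = blocks[-1] if blocks else None
                if last and last[0] + last[2] == i and last[1] + last[2] == j:
                    blocks[-1] = (last[0], last[1], last[2] + 1)
                else:
                    blocks.append((i, j, 1))
                i, j = i + 1, j + 1
            elif dp[i + 1][j] >= dp[i][j + 1]:
                i += 1
            else:
                j += 1
        return blocks


m = LCSMatcher(None, "abxcd", "abcd")
print(m.get_matching_blocks())
# [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
print(m.get_opcodes())
# [('equal', 0, 2, 0, 2), ('delete', 2, 3, 2, 2), ('equal', 3, 5, 2, 4)]
```

For these inputs the LCS blocks coincide with the ones ``SequenceMatcher`` reports in its own ``get_matching_blocks`` example, but the two strategies can disagree on other inputs.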
@@ -88,7 +108,8 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
    The constructor for this class is:
 
 
-   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
+   .. method:: __init__(tabsize=8, wrapcolumn=None, \
+                        linejunk=None, charjunk=IS_CHARACTER_JUNK, differ=None)
 
       Initializes instance of :class:`HtmlDiff`.
 
@@ -98,9 +119,12 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
       *wrapcolumn* is an optional keyword to specify column number where lines are
       broken and wrapped, defaults to ``None`` where lines are not wrapped.
 
-      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
-      (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
-      :func:`ndiff` documentation for argument default values and descriptions.
+      *linejunk*, *charjunk* and *differ* are optional keyword arguments passed into
+      :func:`ndiff` (used by :class:`HtmlDiff` to generate the side by side HTML
+      differences).  See the :func:`ndiff` documentation for default values and descriptions.
+
+      .. versionchanged:: 3.15
+         Added the *differ* parameter.
 
    The following methods are public:
 
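The keyword arguments described above end up in the ``ndiff`` call that ``HtmlDiff`` makes while building the table. A short example using only parameters that exist in released versions; *differ* is left out because it is introduced by this change:

```python
import difflib

before = ["spam eggs\n", "ham\n"]
after = ["spam  eggs\n", "ham\n", "toast\n"]

# tabsize/wrapcolumn shape the HTML; linejunk/charjunk are forwarded to ndiff().
html_diff = difflib.HtmlDiff(tabsize=4, wrapcolumn=40,
                             linejunk=None, charjunk=difflib.IS_CHARACTER_JUNK)
table = html_diff.make_table(before, after, fromdesc="before", todesc="after")

# make_table() returns an HTML <table> fragment rather than a full page.
print(table.strip().splitlines()[0])
```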
@@ -143,7 +167,8 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 
 
 
-.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
+.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', \
+                           n=3, lineterm='\n', matcher=None)
 
    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
    generating the delta lines) in context diff format.
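The *matcher* parameter in this signature expects a factory callable as ``matcher(isjunk=None, a='', b='')``. One way to build such a factory from today's standard library is ``functools.partial`` over ``SequenceMatcher``; the sketch below turns off the popularity heuristic and checks the effect through the ``bpopular`` attribute. The name ``no_autojunk`` is illustrative, and the commented-out ``context_diff(..., matcher=...)`` call assumes the parameter proposed here.

```python
import difflib
from functools import partial

# Factory with the matcher(isjunk=None, a='', b='') call shape; autojunk is off.
no_autojunk = partial(difflib.SequenceMatcher, autojunk=False)

# 300 elements (>= 200), and 'a' and 'b' each occur far more than 1% + 1 times,
# so the default heuristic marks both as popular.
b = "ab" * 150
default = difflib.SequenceMatcher(None, "", b)
custom = no_autojunk(None, "", b)

print(sorted(default.bpopular))  # ['a', 'b']
print(sorted(custom.bpopular))   # []

# Assuming the proposed parameter, the factory would be passed through as:
# difflib.context_diff(old_lines, new_lines, matcher=no_autojunk)
```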
@@ -161,6 +186,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
    For inputs that do not have trailing newlines, set the *lineterm* argument to
    ``""`` so that the output will be uniformly newline free.
 
+   Optional argument *matcher* is a callable that takes three optional arguments,
+   as in ``matcher(isjunk=None, a='', b='')``, and returns a :class:`SequenceMatcherBase`
+   instance.  The default (if ``None``) is the :class:`SequenceMatcher` class.
+
    The context diff format normally has a header for filenames and modification
    times.  Any or all of these may be specified using strings for *fromfile*,
    *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
@@ -189,8 +218,11 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 
    See :ref:`difflib-interface` for a more detailed example.
 
+   .. versionchanged:: 3.15
+      Added the *matcher* parameter.
+
 
-.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)
+.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6, matcher=None)
 
    Return a list of the best "good enough" matches.  *word* is a sequence for which
    close matches are desired (typically a string), and *possibilities* is a list of
@@ -202,6 +234,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
    Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
    Possibilities that don't score at least that similar to *word* are ignored.
 
+   Optional argument *matcher* is a callable that takes three optional arguments,
+   as in ``matcher(isjunk=None, a='', b='')``, and returns a :class:`SequenceMatcherBase`
+   instance.  The default (if ``None``) is the :class:`SequenceMatcher` class.
+
    The best (no more than *n*) matches among the possibilities are returned in a
    list, sorted by similarity score, most similar first.
 
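To show where a *matcher* factory plugs in, here is a sketch of ``get_close_matches`` written against that protocol. It mirrors the documented behaviour (cheap ``real_quick_ratio`` and ``quick_ratio`` filters before ``ratio``, the ``set_seq2``/``set_seq1`` caching pattern, then the best *n* by score); ``close_matches`` is a hypothetical name, not the patched standard-library function.

```python
import difflib
import heapq


def close_matches(word, possibilities, n=3, cutoff=0.6, matcher=None):
    """Hypothetical sketch of get_close_matches() with a matcher factory."""
    if not n > 0:
        raise ValueError(f"n must be > 0: {n!r}")
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError(f"cutoff must be in [0.0, 1.0]: {cutoff!r}")
    if matcher is None:
        matcher = difflib.SequenceMatcher
    s = matcher()        # all three factory arguments are optional
    s.set_seq2(word)     # the fixed sequence is set once, so its cache is reused
    scored = []
    for x in possibilities:
        s.set_seq1(x)
        # Cheap upper bounds first; ratio() only runs if both pass.
        if (s.real_quick_ratio() >= cutoff and s.quick_ratio() >= cutoff
                and s.ratio() >= cutoff):
            scored.append((s.ratio(), x))
    # Keep the n best scores, highest first, then drop the scores.
    return [x for score, x in heapq.nlargest(n, scored)]


print(close_matches("appel", ["ape", "apple", "peach", "puppy"]))
# ['apple', 'ape']
```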
@@ -215,8 +251,11 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
       >>> get_close_matches('accept', keyword.kwlist)
       ['except']
 
+   .. versionchanged:: 3.15
+      Added the *matcher* parameter.
+
 
-.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
+.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK, differ=None)
 
    Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
    delta (a :term:`generator` generating the delta lines).
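In current CPython, ``ndiff`` simply builds ``Differ(linejunk, charjunk)`` and returns its ``compare`` result, so a *differ* factory slots in at exactly that point. A sketch under that assumption; ``ndiff_with`` and the ``NoHintDiffer`` subclass (which drops the ``? `` guide lines) are hypothetical:

```python
import difflib


def ndiff_with(a, b, linejunk=None, charjunk=difflib.IS_CHARACTER_JUNK,
               differ=None):
    """Hypothetical sketch of ndiff() that accepts a Differ factory."""
    if differ is None:
        differ = difflib.Differ
    return differ(linejunk, charjunk).compare(a, b)


class NoHintDiffer(difflib.Differ):
    """Hypothetical Differ subclass that drops the '? ' intraline hint lines."""

    def compare(self, a, b):
        for line in super().compare(a, b):
            if not line.startswith("? "):
                yield line


old = "one\ntwo\nthree\n".splitlines(keepends=True)
new = "ore\ntree\nemu\n".splitlines(keepends=True)
print("".join(ndiff_with(old, new, differ=NoHintDiffer)), end="")
```

With the default factory the sketch produces exactly what ``ndiff`` returns today.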
@@ -237,6 +276,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
    function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
    blank or tab; it's a bad idea to include newline in this!).
 
+   *differ*: A callable that accepts two optional arguments, as in
+   ``differ(linejunk=None, charjunk=None)``, and returns a :class:`Differ` instance.
+   The default (if ``None``) is the :class:`Differ` class.
+
       >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
       ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
       >>> print(''.join(diff), end="")
@@ -250,6 +293,9 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
       + tree
       + emu
 
+   .. versionchanged:: 3.15
+      Added the *differ* parameter.
+
 
 .. function:: restore(sequence, which)
 
@@ -274,7 +320,8 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
       emu
 
 
-.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n', *, color=False)
+.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', \
+                           n=3, lineterm='\n', *, color=False, matcher=None)
 
    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
    generating the delta lines) in unified diff format.
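Today ``unified_diff`` builds one ``SequenceMatcher(None, a, b)`` and walks its ``get_grouped_opcodes(n)``; a *matcher* factory would replace that constructor call. The sketch below (hypothetical ``unified_hunks`` and ``_range``, with the ``---``/``+++`` header, *lineterm* and *color* handling left out) shows that the rest of the formatting depends only on the opcode groups:

```python
import difflib


def _range(start, stop):
    """Format a 0-based half-open range as a unified-diff 'start,length' field."""
    beginning, length = start + 1, stop - start
    if length == 1:
        return f"{beginning}"
    if not length:
        beginning -= 1  # empty ranges point at the line before the gap
    return f"{beginning},{length}"


def unified_hunks(a, b, n=3, matcher=None):
    """Yield unified-diff hunk lines (no ---/+++ header) via a matcher factory."""
    if matcher is None:
        matcher = difflib.SequenceMatcher
    for group in matcher(None, a, b).get_grouped_opcodes(n):
        first, last = group[0], group[-1]
        yield f"@@ -{_range(first[1], last[2])} +{_range(first[3], last[4])} @@\n"
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                yield from (" " + line for line in a[i1:i2])
                continue
            if tag in {"replace", "delete"}:
                yield from ("-" + line for line in a[i1:i2])
            if tag in {"replace", "insert"}:
                yield from ("+" + line for line in b[j1:j2])


old = ["one\n", "two\n", "three\n", "four\n"]
new = ["one\n", "2\n", "three\n", "four\n", "five\n"]
print("".join(unified_hunks(old, new)), end="")
```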
@@ -297,6 +344,10 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
    :program:`git diff --color`. Even if enabled, it can be
    :ref:`controlled using environment variables <using-on-controlling-color>`.
 
+   Optional argument *matcher* is a callable that takes three optional arguments,
+   as in ``matcher(isjunk=None, a='', b='')``, and returns a :class:`SequenceMatcherBase`
+   instance.  The default (if ``None``) is the :class:`SequenceMatcher` class.
+
    The unified diff format normally has a header for filenames and modification
    times.  Any or all of these may be specified using strings for *fromfile*,
    *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
@@ -321,6 +372,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 
    .. versionchanged:: 3.15
       Added the *color* parameter.
+      Added the *matcher* parameter.
 
 
 .. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')
@@ -360,15 +412,14 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
       was published in Dr. Dobb's Journal in July, 1988.
 
 
-.. _sequence-matcher:
-
-SequenceMatcher Objects
------------------------
+.. _sequence-matcher-base:
 
-The :class:`SequenceMatcher` class has this constructor:
+SequenceMatcherBase Objects
+---------------------------
 
+The :class:`SequenceMatcherBase` class has this constructor:
 
-.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
+.. class:: SequenceMatcherBase(isjunk=None, a='', b='')
 
    Optional argument *isjunk* must be ``None`` (the default) or a one-argument
    function that takes a sequence element and returns true if and only if the
@@ -384,33 +435,19 @@ The :class:`SequenceMatcher` class has this constructor:
    The optional arguments *a* and *b* are sequences to be compared; both default to
    empty strings.  The elements of both sequences must be :term:`hashable`.
 
-   The optional argument *autojunk* can be used to disable the automatic junk
-   heuristic.
-
-   .. versionchanged:: 3.2
-      Added the *autojunk* parameter.
-
-   SequenceMatcher objects get three data attributes: *bjunk* is the
-   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
-   non-junk elements considered popular by the heuristic (if it is not
-   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
-   of positions where they occur.  All three are reset whenever *b* is reset
-   with :meth:`set_seqs` or :meth:`set_seq2`.
-
    .. versionadded:: 3.2
       The *bjunk* and *bpopular* attributes.
 
-   :class:`SequenceMatcher` objects have the following methods:
+   :class:`SequenceMatcherBase` objects have the following methods:
 
    .. method:: set_seqs(a, b)
 
       Set the two sequences to be compared.
 
-   :class:`SequenceMatcher` computes and caches detailed information about the
-   second sequence, so if you want to compare one sequence against many
-   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
-   call :meth:`set_seq1` repeatedly, once for each of the other sequences.
-
+   :class:`SequenceMatcherBase` is designed to cache detailed information about the
+   second sequence.  :meth:`set_seq2` clears the cache used by :meth:`quick_ratio`.
+   In addition, :meth:`_prepare_seq2`, which is called at the end of :meth:`set_seq2`,
+   can be overridden by a derived class to build caches for its alignment algorithm.
 
    .. method:: set_seq1(a)
 
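The caching contract in the rewritten paragraph (``set_seq2`` clears the ``quick_ratio`` cache, then calls the ``_prepare_seq2`` hook) is a template-method pattern. A minimal sketch assuming exactly those semantics; ``HookedMatcherBase`` and ``IndexedMatcher`` are hypothetical, and the attribute names ``fullbcount`` and ``b2j`` are borrowed from today's ``SequenceMatcher`` for familiarity:

```python
class HookedMatcherBase:
    """Hypothetical base class showing when _prepare_seq2() runs."""

    def __init__(self, isjunk=None, a="", b=""):
        self.isjunk = isjunk
        self.a = self.b = None
        self.set_seqs(a, b)

    def set_seqs(self, a, b):
        self.set_seq1(a)
        self.set_seq2(b)

    def set_seq1(self, a):
        if a is self.a:
            return
        self.a = a
        self.matching_blocks = self.opcodes = None  # results depend on both sides

    def set_seq2(self, b):
        if b is self.b:
            return
        self.b = b
        self.matching_blocks = self.opcodes = None
        self.fullbcount = None  # cache used by quick_ratio()
        self._prepare_seq2()    # hook for algorithm-specific caches

    def _prepare_seq2(self):
        pass                    # default: nothing to precompute


class IndexedMatcher(HookedMatcherBase):
    """Hypothetical subclass that indexes the positions of each element of b."""

    prepare_calls = 0

    def _prepare_seq2(self):
        self.prepare_calls += 1
        self.b2j = {}
        for j, elt in enumerate(self.b):
            self.b2j.setdefault(elt, []).append(j)


m = IndexedMatcher(None, "x", "abca")
for other in ["a", "bc", "cab"]:
    m.set_seq1(other)           # leaves the b-side cache untouched
print(m.b2j, m.prepare_calls)   # {'a': [0, 3], 'b': [1], 'c': [2]} 1
```

Because only ``set_seq2`` calls the hook, comparing one fixed *b* against many *a* sequences pays the indexing cost once, which is the usage pattern the ``SequenceMatcher`` documentation recommends.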
@@ -423,49 +460,7 @@ The :class:`SequenceMatcher` class has this constructor:
       Set the second sequence to be compared.  The first sequence to be compared
       is not changed.
 
-
-   .. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)
-
-      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
-
-      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
-      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
-      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``.  For all ``(i', j',
-      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
-      <= i'``, and if ``i == i'``, ``j <= j'`` are also met.  In other words, of
-      all maximal matching blocks, return one that starts earliest in *a*, and
-      of all those maximal matching blocks that start earliest in *a*, return
-      the one that starts earliest in *b*.
-
-         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
-         >>> s.find_longest_match(0, 5, 0, 9)
-         Match(a=0, b=4, size=5)
-
-      If *isjunk* was provided, first the longest matching block is determined
-      as above, but with the additional restriction that no junk element appears
-      in the block.  Then that block is extended as far as possible by matching
-      (only) junk elements on both sides.  So the resulting block never matches
-      on junk except as identical junk happens to be adjacent to an interesting
-      match.
-
-      Here's the same example as before, but considering blanks to be junk.  That
-      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
-      second sequence directly.  Instead only the ``'abcd'`` can match, and
-      matches the leftmost ``'abcd'`` in the second sequence:
-
-         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
-         >>> s.find_longest_match(0, 5, 0, 9)
-         Match(a=1, b=0, size=4)
-
-      If no blocks match, this returns ``(alo, blo, 0)``.
-
-      This method returns a :term:`named tuple` ``Match(a, b, size)``.
-
-      .. versionchanged:: 3.9
-         Added default arguments.
-
-
-   .. method:: get_matching_blocks()
+   .. method:: get_matching_blocks()
 
       Return list of triples describing non-overlapping matching subsequences.
       Each triple is of the form ``(i, j, n)``,
@@ -487,6 +482,14 @@ The :class:`SequenceMatcher` class has this constructor:
          [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
 
 
+   .. method:: _prepare_seq2()
+
+      Preparation hook called at the end of :meth:`set_seq2`.
+
+      By default it does nothing; a derived class can override it to build the
+      caches its alignment algorithm needs.
+
+
    .. method:: get_opcodes()
 
       Return list of 5-tuples describing how to turn *a* into *b*.  Each tuple is
@@ -588,6 +591,87 @@ are always at least as large as :meth:`~SequenceMatcher.ratio`:
    1.0
 
 
+.. _sequence-matcher:
+
+SequenceMatcher Objects
+-----------------------
+
+The :class:`SequenceMatcher` class has this constructor:
+
+
+.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
+
+   *isjunk*, *a* and *b* are passed on to the :class:`SequenceMatcherBase`
+   constructor; see its documentation for details.
+
+   The optional argument *autojunk* can be used to disable the automatic junk
+   heuristic.
+
+   SequenceMatcher objects get three data attributes: *bjunk* is the
+   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
+   non-junk elements considered popular by the heuristic (if it is not
+   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
+   of positions where they occur.  All three are reset whenever *b* is reset
+   with :meth:`set_seqs` or :meth:`set_seq2`.
+
+   .. versionchanged:: 3.2
+      Added the *autojunk* parameter.
+
+   :class:`SequenceMatcher` computes and caches detailed information about the
+   second sequence, so if you want to compare one sequence against many
+   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
+   call :meth:`set_seq1` repeatedly, once for each of the other sequences.
+
+   In addition to the methods implemented by :class:`SequenceMatcherBase`,
+   :class:`SequenceMatcher` objects have the following methods:
+
+
+   .. method:: _prepare_seq2()
+
+      Builds the *b2j*, *bjunk* and *bpopular* caches described above.
+
+
+   .. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)
+
+      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
+
+      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
+      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
+      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``.  For all ``(i', j',
+      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
+      <= i'``, and if ``i == i'``, ``j <= j'`` are also met.  In other words, of
+      all maximal matching blocks, return one that starts earliest in *a*, and
+      of all those maximal matching blocks that start earliest in *a*, return
+      the one that starts earliest in *b*.
+
+         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
+         >>> s.find_longest_match(0, 5, 0, 9)
+         Match(a=0, b=4, size=5)
+
+      If *isjunk* was provided, first the longest matching block is determined
+      as above, but with the additional restriction that no junk element appears
+      in the block.  Then that block is extended as far as possible by matching
+      (only) junk elements on both sides.  So the resulting block never matches
+      on junk except as identical junk happens to be adjacent to an interesting
+      match.
+
+      Here's the same example as before, but considering blanks to be junk.  That
+      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
+      second sequence directly.  Instead only the ``'abcd'`` can match, and
+      matches the leftmost ``'abcd'`` in the second sequence:
+
+         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
+         >>> s.find_longest_match(0, 5, 0, 9)
+         Match(a=1, b=0, size=4)
+
+      If no blocks match, this returns ``(alo, blo, 0)``.
+
+      This method returns a :term:`named tuple` ``Match(a, b, size)``.
+
+      .. versionchanged:: 3.9
+         Added default arguments.
+
+
 .. _sequencematcher-examples:
 
 SequenceMatcher Examples
@@ -653,7 +737,7 @@ locality, at the occasional cost of producing a longer diff.
 The :class:`Differ` class has this constructor:
 
 
-.. class:: Differ(linejunk=None, charjunk=None)
+.. class:: Differ(linejunk=None, charjunk=None, linematcher=None, charmatcher=None)
    :noindex:
 
    Optional keyword parameters *linejunk* and *charjunk* are for filter functions
@@ -673,6 +757,14 @@ The :class:`Differ` class has this constructor:
    :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
    parameter for an explanation.
 
+   *linematcher*: A callable that accepts three optional arguments, as in
+   ``linematcher(isjunk=None, a='', b='')``, and returns a :class:`SequenceMatcherBase`
+   instance.  The default (if ``None``) is the :class:`SequenceMatcher` class.
+
+   *charmatcher*: A callable that accepts three optional arguments, as in
+   ``charmatcher(isjunk=None, a='', b='')``, and returns a :class:`SequenceMatcherBase`
+   instance.  The default (if ``None``) is the :class:`SequenceMatcher` class.
+
    :class:`Differ` objects are used (deltas generated) via a single method:
 
 
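``Differ.compare`` currently starts from ``SequenceMatcher(self.linejunk, a, b)``, which is the call *linematcher* would take over. A simplified, hypothetical line comparer with that shape is sketched below; it emits only ``'  '``, ``'- '`` and ``'+ '`` lines and leaves out the character-level refinement of similar line pairs (presumably the step *charmatcher* would govern).

```python
import difflib


def compare_lines(a, b, linejunk=None, linematcher=None):
    """Hypothetical, simplified Differ.compare(): no intraline '? ' hints."""
    if linematcher is None:
        linematcher = difflib.SequenceMatcher
    # Line-level alignment, built the same way Differ.compare builds its matcher.
    cruncher = linematcher(linejunk, a, b)
    for tag, i1, i2, j1, j2 in cruncher.get_opcodes():
        if tag == "equal":
            yield from ("  " + line for line in a[i1:i2])
            continue
        if tag in ("replace", "delete"):
            yield from ("- " + line for line in a[i1:i2])
        if tag in ("replace", "insert"):
            yield from ("+ " + line for line in b[j1:j2])


old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n", "four\n"]
print("".join(compare_lines(old, new)), end="")
```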