
Commit b4be13f

Ian Yu (ianyu93) and pre-commit-ci[bot] authored
Updated Anonymization (#406)
* Updated ac_dc
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update .gitmodules
  Updating submodule config
* updated test case
  Updated test case for faker address to be street_address rather than full address
* Updated apply_regex_anonymization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* updated anonymization
  By default, tag_type for anonymization is None to allow passing iterables as tag_type. This allows applying anonymization of specific types for specific situation. By default, if tag_type is None, all keys in regex_rulebase would be applied.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

Co-authored-by: Ian Yu <ianyu@MBPC02FC7Z5MD6R.phub.net.cable.rogers.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 452e3ac commit b4be13f

1 file changed

Lines changed: 17 additions & 2 deletions

ac_dc/anonymization.py

@@ -182,14 +182,29 @@
 
 
 def apply_regex_anonymization(
-    sentence: str, lang_id: str, context_window: int = 20, anonymize_condition=None
+    sentence: str,
+    lang_id: str,
+    context_window: int = 20,
+    anonymize_condition=None,
+    tag_type=None,
 ) -> str:
+    """
+    Params:
+    ==================
+    sentence: str, the sentence to be anonymized
+    lang_id: str, the language id of the sentence
+    context_window: int, the context window size
+    anonymize_condition: function, the anonymization condition
+    tag_type: iterable, the tag types of the anonymization. All keys in regex_rulebase by default
+    """
+    if tag_type == None:
+        tag_type = regex_rulebase.keys()
     lang_id = lang_id.split("_")[0]
     ner = detect_ner_with_regex_and_context(
         sentence=sentence,
         src_lang=lang_id,
         context_window=context_window,
-        tag_type=regex_rulebase.keys(),
+        tag_type=tag_type,
     )
     if anonymize_condition:
         for (ent, start, end, tag) in ner: