Elegant way to replace substring in a regex with optional groups in Python?
Given a string taken from the following set:

    strings = [
        "The sky is blue and I like it",
        "The tree is green and I love it",
        "A lemon is yellow"
    ]

I would like to construct a function which replaces the subject, the color and the optional verb in this string with other values.

All strings match a certain regex pattern, as follows:

    regex = r"(?:The|A) (?P<subject>\w+) is (?P<color>\w+)(?: and I (?P<verb>\w+) it)?"

The expected output of such a function would look like this:

    repl("The sea is blue", "moon", "white", "hate")
    # => "The moon is white"

Here is the solution I came up with (I can't use .replace() because there are edge cases, for example if the string contains the subject twice):

    import re

    def repl(sentence, subject, color, verb):
        m = re.match(regex, sentence)
        s = sentence
        # Rebuild the sentence piece by piece around the matched group spans.
        new_string = s[:m.start("subject")] + subject + s[m.end("subject"):m.start("color")] + color
        if m.group("verb") is None:
            new_string += s[m.end("color"):]
        else:
            new_string += s[m.end("color"):m.start("verb")] + verb + s[m.end("verb"):]
        return new_string

Do you think there is a more straightforward way to implement this?

python python-3.x strings regex
asked 15 hours ago by Delgan, edited 14 hours ago by Reinderien

Do you have to use a regex? If not, split(" ") the string into words, replace words 1, 3, and possibly 6, then " ".join(...) it back into a sentence.
– AJNeufeld, 14 hours ago

What do you mean by 'string contains subject twice'? That doesn't seem like it would match your regex.
– Reinderien, 14 hours ago

@AJNeufeld This is not possible; the sentences are actually even more dynamic than the examples here and may contain an indefinite number of spaces.
– Delgan, 13 hours ago

@Reinderien For example, repl("The meloon is orange", "orange", "great", "like") or simply repl("A letter is A", "letter", "B", "fail").
– Delgan, 13 hours ago
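
To make the two edge cases from the last comment concrete, here is a minimal sketch of what goes wrong with plain str.replace(). The helper name naive_repl is made up for illustration and is not part of the question; it simply swaps the matched words wherever they occur.

```python
import re

regex = r"(?:The|A) (?P<subject>\w+) is (?P<color>\w+)(?: and I (?P<verb>\w+) it)?"


def naive_repl(sentence, subject, color):
    """Hypothetical helper: swap the matched words with str.replace()."""
    m = re.match(regex, sentence)
    return sentence.replace(m["subject"], subject).replace(m["color"], color)


# "A" is both the article and the color, so both occurrences are rewritten.
print(naive_repl("A letter is A", "letter", "B"))  # B letter is B

# The new subject "orange" collides with the old color "orange".
print(naive_repl("The meloon is orange", "orange", "great"))  # The great is great
```

Both results differ from the intended "A letter is B" and "The orange is great", which is why the replacement has to be driven by match positions rather than by the matched text.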
3 Answers

    import re

    # The groups capture the text to keep; the words to replace are left uncaptured.
    regex = re.compile(
        r'(The|A) '
        r'\w+'
        r'( is )'
        r'\w+'
        r'(?:'
            r'( and I )'
            r'\w+'
            r'( it)'
        r')?'
    )


    def repl(sentence, subject, colour, verb=None):
        m = regex.match(sentence)
        new = m.expand(rf'\1 {subject}\2{colour}')
        if m[3]:
            new += m.expand(rf'\3{verb}\4')
        return new


    def test():
        assert repl('The sky is blue and I like it', 'bathroom', 'smelly', 'distrust') == \
            'The bathroom is smelly and I distrust it'
        assert repl('The tree is green and I love it', 'pinata', 'angry', 'fear') == \
            'The pinata is angry and I fear it'
        assert repl('A lemon is yellow', 'population', 'dumbfounded') == \
            'A population is dumbfounded'

Essentially, invert which sections of the regex you put groups around: capture the parts you want to keep rather than the parts you want to replace.
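
One thing to watch with this approach: the replacement words are pasted into an expand() template, so a word starting with a digit would merge with a preceding numeric group reference, and a backslash would be read as an escape. A possible variant, sketched here under the assumption that the words should always be inserted literally, uses the \g<n> reference form and doubles backslashes. The names repl_safe and _literal are made up for this sketch.

```python
import re

# Same inverted pattern as in the answer: groups capture the text to keep.
regex = re.compile(r'(The|A) \w+( is )\w+(?:( and I )\w+( it))?')


def _literal(word):
    # Double each backslash so Match.expand() copies the word verbatim.
    return word.replace('\\', '\\\\')


def repl_safe(sentence, subject, colour, verb=None):
    """Hypothetical variant of repl() that tolerates awkward replacement words."""
    m = regex.match(sentence)
    # \g<2> cannot run into a following digit the way a bare \2 would.
    new = m.expand(rf'\g<1> {_literal(subject)}\g<2>{_literal(colour)}')
    if m[3]:
        # Assumes the caller passes a verb whenever the sentence has the clause.
        new += m.expand(rf'\g<3>{_literal(verb)}\g<4>')
    return new


print(repl_safe('A lemon is yellow', 'cube', '3d'))  # A cube is 3d
```

With a bare numeric reference directly followed by the colour, a value such as '3d' would be parsed as part of the group number, which is why the explicit \g<2> form is used here.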
answered 13 hours ago by Reinderien

I did not know expand(), this seems very useful. Thanks!
– Delgan, 13 hours ago

You might want to experiment with NLTK, a leading platform for building Python programs to work with human language data.

You could import it, tag the words (NOUN, ADJ, ...) and replace words in the original sentence according to their tags:

    import nltk
    from collections import defaultdict
    from nltk.tag import pos_tag, map_tag

    def simple_tags(words):
        # see https://stackoverflow.com/a/5793083/6419007
        return [(word, map_tag('en-ptb', 'universal', tag)) for (word, tag) in nltk.pos_tag(words)]

    def repl(sentence, *new_words):
        # Group the replacement words by their part-of-speech tag.
        new_words_by_tag = defaultdict(list)
        for new_word, tag in simple_tags(new_words):
            new_words_by_tag[tag].append(new_word)

        new_sentence = []
        for word, tag in simple_tags(nltk.word_tokenize(sentence)):
            possible_replacements = new_words_by_tag.get(tag)
            if possible_replacements:
                new_sentence.append(possible_replacements.pop(0))
            else:
                new_sentence.append(word)
        return ' '.join(new_sentence)

    repl("The sea is blue", "moon", "white", "hate")
    # 'The moon is white'
    repl("The sea is blue", "yellow", "elephant")
    # 'The elephant is yellow'

This version is brittle though, because some verbs can be tagged as nouns or vice versa.
I guess someone with more NLTK experience could find a more robust way to replace the words.
answered 7 hours ago by Eric Duminil

Here is a solution using the original regex, instead of the inverted regex suggested by Reinderien.

Your difficulty comes from manually building up the parts of the original string from the group spans. If you maintain a list of the starting points (the start of the string and the end of every group) and a list of the ending points (the start of every group and the end of the string), you can use these to retrieve the parts of the original string you want to keep:

    start = [0] + [m.end(i+1) for i in range(m.lastindex)]
    end = [m.start(i+1) for i in range(m.lastindex)] + [None]

We can glue these parts together with a placeholder into which the desired values will be substituted:

    fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))

Using "{}" as the joiner creates a string like The {} is {} and I {} it, which makes a perfect .format() string for substituting in the desired replacements:

    def repl(sentence, subject, color, verb=None):
        m = re.match(regex, sentence)
        start = [0] + [m.end(i+1) for i in range(m.lastindex)]
        end = [m.start(i+1) for i in range(m.lastindex)] + [None]
        fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))
        return fmt.format(subject, color, verb)
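
To see what the intermediate values look like, here is a small worked sketch using the regex from the question and a "{}" joiner. The helper build_fmt is hypothetical and only exists to expose the three values for inspection.

```python
import re

regex = r"(?:The|A) (?P<subject>\w+) is (?P<color>\w+)(?: and I (?P<verb>\w+) it)?"


def build_fmt(sentence):
    """Hypothetical helper: return (start, end, fmt) for one sentence."""
    m = re.match(regex, sentence)
    # m.lastindex is the index of the last matched capturing group,
    # so it is 2 when the optional verb clause is absent.
    start = [0] + [m.end(i + 1) for i in range(m.lastindex)]
    end = [m.start(i + 1) for i in range(m.lastindex)] + [None]
    fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))
    return start, end, fmt


print(build_fmt("The sky is blue and I like it"))
# ([0, 7, 15, 26], [4, 11, 22, None], 'The {} is {} and I {} it')
print(build_fmt("A lemon is yellow"))
# ([0, 7, 17], [2, 11, None], 'A {} is {}')
```

Because str.format() ignores surplus positional arguments, passing verb to the two-placeholder string is harmless. One caveat: if the kept text itself contains literal { or } characters, .format() would try to interpret them, so this relies on sentences without braces.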

If you don't mind being a little cryptic, we can even make this into a shorter 3-line function:

    def repl(sentence, subject, color, verb=None):
        m = re.match(regex, sentence)
        idx = [0] + [pos for i in range(m.lastindex) for pos in m.span(i+1)] + [None]
        return "{}".join(sentence[s:e] for s, e in zip(*[iter(idx)]*2)).format(subject, color, verb)
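
The cryptic part is zip(*[iter(idx)]*2), which walks a flat list two items at a time. A short sketch of the idiom on its own, with the boundary list written out by hand for the sentence "The sky is blue and I like it" (the values are the group spans of the question's regex for that sentence):

```python
sentence = "The sky is blue and I like it"
# Flattened boundaries: 0, then each group's (start, end), then None.
idx = [0, 4, 7, 11, 15, 22, 26, None]

# The list holds the *same* iterator twice, so zip() consumes two consecutive
# items per tuple instead of pairing the list with itself.
pairs = list(zip(*[iter(idx)] * 2))
print(pairs)  # [(0, 4), (7, 11), (15, 22), (26, None)]

# Each pair is a slice of text to keep between the replaced words.
kept = [sentence[s:e] for s, e in pairs]
print(kept)   # ['The ', ' is ', ' and I ', ' it']
```

The two-list version is probably easier to read; the single flat list mostly saves a line.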
answered 6 hours ago by AJNeufeld