


Elegant way to replace substring in a regex with optional groups in Python?






Given a string taken from the following set:



strings = [
    "The sky is blue and I like it",
    "The tree is green and I love it",
    "A lemon is yellow"
]


I would like to construct a function which replaces the subject, the color and the optional verb in this string with other values.



All strings match a certain regex pattern, as follows:



regex = r"(?:The|A) (?P<subject>\w+) is (?P<color>\w+)(?: and I (?P<verb>\w+) it)?"


The expected output of such a function would look like this:



repl("The sea is blue", "moon", "white", "hate")
# => "The moon is white"


Here is the solution I came up with (I can't use .replace() because there are edge cases, for example if the string contains the subject twice):



import re

def repl(sentence, subject, color, verb):
    m = re.match(regex, sentence)
    s = sentence
    # Keep the text around each named group and splice in the new words.
    new_string = s[:m.start("subject")] + subject + s[m.end("subject"):m.start("color")] + color
    if m.group("verb") is None:
        # The optional "and I <verb> it" part is absent: keep the rest unchanged.
        new_string += s[m.end("color"):]
    else:
        new_string += s[m.end("color"):m.start("verb")] + verb + s[m.end("verb"):]
    return new_string


Do you think there is a more straightforward way to implement this?
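
To make the edge case concrete, here is a minimal sketch of why a plain str.replace() is not enough. The sentence below is a made-up example (it is not one of the strings listed above) in which the subject and the color happen to be the same word:

```python
# Hypothetical sentence: the subject and the color are the same word.
sentence = "The orange is orange"

# Replacing every occurrence also overwrites the color.
replace_all = sentence.replace("orange", "sky")
print(replace_all)    # The sky is sky

# Limiting the count only ever reaches the first occurrence,
# so the color cannot be targeted on its own.
replace_first = sentence.replace("orange", "blue", 1)
print(replace_first)  # The blue is orange
```

Working from the match positions, as the function above does, avoids this because each group is addressed by where it was found rather than by its text.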







python python-3.x strings regex














edited Mar 29 at 14:10 by Reinderien

asked Mar 29 at 13:18 by Delgan











  • Do you have to use a regex? If not, split(" ") the string into words, replace words 1, 3, and possibly 6, then " ".join(...) it back into a sentence. – AJNeufeld, Mar 29 at 14:06

  • What do you mean by 'string contains subject twice'? That doesn't seem like it would match your regex. – Reinderien, Mar 29 at 14:18

  • @AJNeufeld This is not possible; the sentences are actually even more dynamic than the examples here and may contain an indefinite number of spaces. – Delgan, Mar 29 at 14:35

  • @Reinderien For example, repl("The meloon is orange", "orange", "great", "like") or simply repl("A letter is A", "letter", "B", "fail") – Delgan, Mar 29 at 14:40


























3 Answers



import re

# Capture the parts to keep; the words to be replaced stay outside the groups.
regex = re.compile(
    r'(The|A) '
    r'\w+'
    r'( is )'
    r'\w+'
    r'(?:'
    r'( and I )'
    r'\w+'
    r'( it)'
    r')?'
)


def repl(sentence, subject, colour, verb=None):
    m = regex.match(sentence)
    # \1 to \4 refer to the kept groups; the new words come from the f-string.
    new = m.expand(rf'\1 {subject}\2{colour}')
    if m[3]:
        # Group 3 is only set when the optional " and I <verb> it" part matched.
        new += m.expand(rf'\3{verb}\4')
    return new


def test():
    assert repl('The sky is blue and I like it', 'bathroom', 'smelly', 'distrust') == \
        'The bathroom is smelly and I distrust it'
    assert repl('The tree is green and I love it', 'pinata', 'angry', 'fear') == \
        'The pinata is angry and I fear it'
    assert repl('A lemon is yellow', 'population', 'dumbfounded') == \
        'A population is dumbfounded'


      Essentially, invert the sections of the regex around which you put groups; they're the things you want to save.
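
One caveat worth sketching: Match.expand() parses its template the same way re.sub() does, so a replacement word that starts with a digit or contains a backslash may be read as part of a group reference or an escape. The variant below is only an illustration under that assumption, not part of the answer: the names KEEP and repl_safe are hypothetical, and the ValueError for non-matching input and the preserved trailing text are additions. It builds the result from the captured groups directly, so replacement words are always taken literally.

```python
import re

# Same idea as the answer: capture the parts to keep, not the parts to replace.
KEEP = re.compile(r'(The|A) \w+( is )\w+(?:( and I )\w+( it))?')


def repl_safe(sentence, subject, colour, verb=None):
    m = KEEP.match(sentence)
    if m is None:
        raise ValueError(f'sentence does not match the pattern: {sentence!r}')
    # Plain string formatting: no template parsing, so digits and
    # backslashes in the replacement words are kept as they are.
    new = f'{m[1]} {subject}{m[2]}{colour}'
    if m[3]:
        new += f'{m[3]}{verb}{m[4]}'
    # Keep whatever follows the matched part of the sentence.
    return new + sentence[m.end():]


print(repl_safe('A lemon is yellow', 'sign', '5 metres tall'))
# A sign is 5 metres tall
```

With the expand()-based version, the same call would place the digit 5 immediately after the \2 reference in the template, which changes which group the template refers to.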

















answered Mar 29 at 14:34 by Reinderien







  • I did not know expand(), this seems very useful. Thanks! – Delgan, Mar 29 at 15:12















          You might want to experiment with NLTK, a leading platform for building Python programs to work with human language data:



You could import it, tag the words (NOUN, ADJ, ...) and replace words in the original sentence according to their tags:



import nltk
from collections import defaultdict
from nltk.tag import pos_tag, map_tag

# Note: the NLTK tokenizer, tagger and tagset mapping need their data
# packages installed (via nltk.download) before this will run.

def simple_tags(words):
    # Map Penn Treebank tags to the coarse universal tagset (NOUN, ADJ, VERB, ...).
    # See https://stackoverflow.com/a/5793083/6419007
    return [(word, map_tag('en-ptb', 'universal', tag)) for (word, tag) in nltk.pos_tag(words)]

def repl(sentence, *new_words):
    # Group the replacement words by their part-of-speech tag.
    new_words_by_tag = defaultdict(list)

    for new_word, tag in simple_tags(new_words):
        new_words_by_tag[tag].append(new_word)

    new_sentence = []

    # Swap each word for the next unused replacement that has the same tag.
    for word, tag in simple_tags(nltk.word_tokenize(sentence)):
        possible_replacements = new_words_by_tag.get(tag)
        if possible_replacements:
            new_sentence.append(possible_replacements.pop(0))
        else:
            new_sentence.append(word)

    return ' '.join(new_sentence)

repl("The sea is blue", "moon", "white", "hate")
# 'The moon is white'
repl("The sea is blue", "yellow", "elephant")
# 'The elephant is yellow'


This version is brittle though, because the tagger sometimes classifies a verb as a noun, or vice versa.



          I guess someone with more NLTK experience could find a more robust way to replace the words.

















answered Mar 29 at 20:46 by Eric Duminil

































              $begingroup$

Here is a solution using the original regex pattern, instead of the inverted pattern suggested by Reinderien.




              Your difficulty come in manually building up the original string parts from the spans of the original string. If you maintained a list of the starting points (which is the start of the string and the end of every group), and a list of the ending points (which is the start of every group, and the end of the string), you could use these to retrieve the parts of the original string you want to keep:



              start = [0] + [m.end(i+1) for i in range(m.lastindex)]
              end = [m.start(i+1) for i in range(m.lastindex)] + [None]
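
              To make the two lists concrete, here is a small sketch for one sentence. The pattern below is only an assumed stand-in for the question's `regex`, which is not reproduced in this answer; the sentence is likewise just an illustration.

```python
import re

# Assumed stand-in for the question's `regex` (hypothetical pattern).
regex = re.compile(r"The (\w+) is (\w+)(?: and I (\w+) it)?")

sentence = "The sea is blue and I like it"
m = regex.match(sentence)

# Each kept piece starts at 0 or at the end of a group ...
start = [0] + [m.end(i + 1) for i in range(m.lastindex)]
# ... and stops at the start of the next group, or at the end of the string.
end = [m.start(i + 1) for i in range(m.lastindex)] + [None]

pieces = [sentence[s:e] for s, e in zip(start, end)]
print(start)   # [0, 7, 15, 26]
print(end)     # [4, 11, 22, None]
print(pieces)  # ['The ', ' is ', ' and I ', ' it']
```

              The trailing `None` makes the last slice run to the end of the string, so everything outside the groups is kept.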


              We can glue these parts together with a placeholder, into which we will substitute the desired values:

              fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))

              Using "{}" as the joiner will create a string like "The {} is {} and I {} it", which makes a perfect .format() string to substitute in the desired replacements:



              def repl(sentence, subject, color, verb=None):
                  m = re.match(regex, sentence)
                  # start/end delimit the pieces of the sentence that lie outside the groups
                  start = [0] + [m.end(i+1) for i in range(m.lastindex)]
                  end = [m.start(i+1) for i in range(m.lastindex)] + [None]
                  fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))
                  return fmt.format(subject, color, verb)
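
              As a quick check, here is how this could behave with and without the optional clause. This is a sketch under an assumption: the pattern is a hypothetical stand-in for the question's `regex`, with the optional group placed last.

```python
import re

# Hypothetical pattern; the question's actual `regex` is not shown here.
regex = re.compile(r"The (\w+) is (\w+)(?: and I (\w+) it)?")

def repl(sentence, subject, color, verb=None):
    m = regex.match(sentence)
    # m.lastindex is the index of the last group that actually matched,
    # so an unmatched trailing optional group adds no placeholder.
    start = [0] + [m.end(i + 1) for i in range(m.lastindex)]
    end = [m.start(i + 1) for i in range(m.lastindex)] + [None]
    fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))
    # str.format ignores surplus positional arguments, so `verb` is harmless
    # when there are only two placeholders.
    return fmt.format(subject, color, verb)

print(repl("The sea is blue and I like it", "moon", "white", "hate"))
# The moon is white and I hate it
print(repl("The sea is blue", "moon", "white"))
# The moon is white
```

              Two caveats apply to this sketch: an optional group that fails to match in the middle of the pattern reports a start and end of -1, which would break the slicing, and a sentence containing literal braces would confuse .format().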


              If you don't mind being a little cryptic, we can even make this into a shorter 3-line function:



              def repl(sentence, subject, color, verb=None):
                  m = re.match(regex, sentence)
                  idx = [0] + [pos for i in range(m.lastindex) for pos in m.span(i+1)] + [None]
                  return "{}".join(sentence[s:e] for s, e in zip(*[iter(idx)]*2)).format(subject, color, verb)
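
              The cryptic part is `zip(*[iter(idx)]*2)`, which walks a flat list two items at a time. A minimal sketch of that idiom on its own, using illustrative boundary values rather than anything from the question:

```python
# Flat list of boundaries: 0, then (start, end) of each group, then None.
idx = [0, 4, 7, 11, 15, 22, 26, None]

# zip() pulls alternately from the SAME iterator, so it yields consecutive pairs.
it = iter(idx)
pairs_explicit = list(zip(it, it))

# [iter(idx)] * 2 is a list holding that one iterator twice; * unpacks it into zip().
pairs_idiom = list(zip(*[iter(idx)] * 2))

print(pairs_explicit)  # [(0, 4), (7, 11), (15, 22), (26, None)]
print(pairs_idiom == pairs_explicit)  # True
```

              Each pair is then used as a slice `sentence[s:e]`, exactly like the separate start and end lists in the longer version.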





              $endgroup$

















                  answered Mar 29 at 22:07









                  AJNeufeld


























