Simulation of an alien population











up vote
3
down vote

favorite












Background



I've come across a puzzle (Problem 10 from this list of sample interview questions):




One day, an alien comes to Earth. Every day, each alien does one of four things, each with equal probability to:




  • Kill himself

  • Do nothing

  • Split himself into two aliens (while killing himself)

  • split himself into three aliens (while killing himself)


What is the probability that the alien species eventually dies out entirely?




Unfortunately, I haven't been able to solve the problem theoretically. Then I moved on to simulate it with a basic Markov Chain and Monte Carlo simulation in mind.



This was not asked to me in an interview. I learned the problem from a friend, then found the link above while searching for mathematical solutions.



Reinterpreting the question



We start with the number of aliens n = 1. n has a chance to not change, be decremented by 1, be incremented by 1, and be incremented by 2, %25 for each. If n is incremented, i.e. aliens multiplied, we repeat this procedure for n times again. This corresponds to each alien will do its thing again. I have to put an upper limit though, so that we stop simulating and avoid a crash. n is likely to increase and we're looping n times again and again.



If aliens somehow go extinct, we stop simulating again since there's nothing left to simulate.



After n reaches zero or the upper limit, we also record the population (it will be either zero or some number >= max_pop).



I repeat this many times and record every result. At the end, number of zeros divided by total number of results should give me an approximation.



The code



from random import randint
import numpy as np

pop_max = 100
iter_max = 100000

results = np.zeros(iter_max, dtype=int)

for i in range(iter_max):
n = 1
while n > 0 and n < pop_max:
for j in range(n):
x = randint(1, 4)
if x == 1:
n = n - 1
elif x == 2:
continue
elif x == 3:
n = n + 1
elif x == 4:
n = n + 2
results[i] = n

print( np.bincount(results)[0] / iter_max )


iter_max and pop_max can be changed indeed, but I thought if there is 100 aliens, the probability that they go extinct would be negligibly low. This is just a guess though, I haven't done anything to calculate a (more) proper upper limit for population.



This code gives promising results, fairly close to the real answer which is approximately %41.4.



Some outputs



> python aliens.py
0.41393
> python aliens.py
0.41808
> python aliens.py
0.41574
> python aliens.py
0.4149
> python aliens.py
0.41505
> python aliens.py
0.41277
> python aliens.py
0.41428
> python aliens.py
0.41407
> python aliens.py
0.41676


Aftermath



I'm okay with the results but I can't say the same for the time this code takes. It takes around 16-17 seconds :)



How can I improve the speed? How can I optimize loops (especially the while loop)? Maybe there's a much better approach or better models?










share|improve this question









New contributor




Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
























    up vote
    3
    down vote

    favorite












    Background



    I've come across a puzzle (Problem 10 from this list of sample interview questions):




    One day, an alien comes to Earth. Every day, each alien does one of four things, each with equal probability to:




    • Kill himself

    • Do nothing

    • Split himself into two aliens (while killing himself)

    • split himself into three aliens (while killing himself)


    What is the probability that the alien species eventually dies out entirely?




    Unfortunately, I haven't been able to solve the problem theoretically. Then I moved on to simulate it with a basic Markov Chain and Monte Carlo simulation in mind.



    This was not asked to me in an interview. I learned the problem from a friend, then found the link above while searching for mathematical solutions.



    Reinterpreting the question



    We start with the number of aliens n = 1. n has a chance to not change, be decremented by 1, be incremented by 1, and be incremented by 2, %25 for each. If n is incremented, i.e. aliens multiplied, we repeat this procedure for n times again. This corresponds to each alien will do its thing again. I have to put an upper limit though, so that we stop simulating and avoid a crash. n is likely to increase and we're looping n times again and again.



    If aliens somehow go extinct, we stop simulating again since there's nothing left to simulate.



    After n reaches zero or the upper limit, we also record the population (it will be either zero or some number >= max_pop).



    I repeat this many times and record every result. At the end, number of zeros divided by total number of results should give me an approximation.



    The code



    from random import randint
    import numpy as np

    pop_max = 100
    iter_max = 100000

    results = np.zeros(iter_max, dtype=int)

    for i in range(iter_max):
    n = 1
    while n > 0 and n < pop_max:
    for j in range(n):
    x = randint(1, 4)
    if x == 1:
    n = n - 1
    elif x == 2:
    continue
    elif x == 3:
    n = n + 1
    elif x == 4:
    n = n + 2
    results[i] = n

    print( np.bincount(results)[0] / iter_max )


    iter_max and pop_max can be changed indeed, but I thought if there is 100 aliens, the probability that they go extinct would be negligibly low. This is just a guess though, I haven't done anything to calculate a (more) proper upper limit for population.



    This code gives promising results, fairly close to the real answer which is approximately %41.4.



    Some outputs



    > python aliens.py
    0.41393
    > python aliens.py
    0.41808
    > python aliens.py
    0.41574
    > python aliens.py
    0.4149
    > python aliens.py
    0.41505
    > python aliens.py
    0.41277
    > python aliens.py
    0.41428
    > python aliens.py
    0.41407
    > python aliens.py
    0.41676


    Aftermath



    I'm okay with the results but I can't say the same for the time this code takes. It takes around 16-17 seconds :)



    How can I improve the speed? How can I optimize loops (especially the while loop)? Maybe there's a much better approach or better models?










    share|improve this question









    New contributor




    Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.






















      up vote
      3
      down vote

      favorite









      up vote
      3
      down vote

      favorite











      Background



      I've come across a puzzle (Problem 10 from this list of sample interview questions):




      One day, an alien comes to Earth. Every day, each alien does one of four things, each with equal probability to:




      • Kill himself

      • Do nothing

      • Split himself into two aliens (while killing himself)

      • split himself into three aliens (while killing himself)


      What is the probability that the alien species eventually dies out entirely?




      Unfortunately, I haven't been able to solve the problem theoretically. Then I moved on to simulate it with a basic Markov Chain and Monte Carlo simulation in mind.



      This was not asked to me in an interview. I learned the problem from a friend, then found the link above while searching for mathematical solutions.



      Reinterpreting the question



      We start with the number of aliens n = 1. n has a chance to not change, be decremented by 1, be incremented by 1, and be incremented by 2, %25 for each. If n is incremented, i.e. aliens multiplied, we repeat this procedure for n times again. This corresponds to each alien will do its thing again. I have to put an upper limit though, so that we stop simulating and avoid a crash. n is likely to increase and we're looping n times again and again.



      If aliens somehow go extinct, we stop simulating again since there's nothing left to simulate.



      After n reaches zero or the upper limit, we also record the population (it will be either zero or some number >= max_pop).



      I repeat this many times and record every result. At the end, number of zeros divided by total number of results should give me an approximation.



      The code



      from random import randint
      import numpy as np

      pop_max = 100
      iter_max = 100000

      results = np.zeros(iter_max, dtype=int)

      for i in range(iter_max):
      n = 1
      while n > 0 and n < pop_max:
      for j in range(n):
      x = randint(1, 4)
      if x == 1:
      n = n - 1
      elif x == 2:
      continue
      elif x == 3:
      n = n + 1
      elif x == 4:
      n = n + 2
      results[i] = n

      print( np.bincount(results)[0] / iter_max )


      iter_max and pop_max can be changed indeed, but I thought if there is 100 aliens, the probability that they go extinct would be negligibly low. This is just a guess though, I haven't done anything to calculate a (more) proper upper limit for population.



      This code gives promising results, fairly close to the real answer which is approximately %41.4.



      Some outputs



      > python aliens.py
      0.41393
      > python aliens.py
      0.41808
      > python aliens.py
      0.41574
      > python aliens.py
      0.4149
      > python aliens.py
      0.41505
      > python aliens.py
      0.41277
      > python aliens.py
      0.41428
      > python aliens.py
      0.41407
      > python aliens.py
      0.41676


      Aftermath



      I'm okay with the results but I can't say the same for the time this code takes. It takes around 16-17 seconds :)



      How can I improve the speed? How can I optimize loops (especially the while loop)? Maybe there's a much better approach or better models?










      share|improve this question









      New contributor




      Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      Background



      I've come across a puzzle (Problem 10 from this list of sample interview questions):




      One day, an alien comes to Earth. Every day, each alien does one of four things, each with equal probability to:




      • Kill himself

      • Do nothing

      • Split himself into two aliens (while killing himself)

      • split himself into three aliens (while killing himself)


      What is the probability that the alien species eventually dies out entirely?




      Unfortunately, I haven't been able to solve the problem theoretically. Then I moved on to simulate it with a basic Markov Chain and Monte Carlo simulation in mind.



      This was not asked to me in an interview. I learned the problem from a friend, then found the link above while searching for mathematical solutions.



      Reinterpreting the question



      We start with the number of aliens n = 1. n has a chance to not change, be decremented by 1, be incremented by 1, and be incremented by 2, %25 for each. If n is incremented, i.e. aliens multiplied, we repeat this procedure for n times again. This corresponds to each alien will do its thing again. I have to put an upper limit though, so that we stop simulating and avoid a crash. n is likely to increase and we're looping n times again and again.



      If aliens somehow go extinct, we stop simulating again since there's nothing left to simulate.



      After n reaches zero or the upper limit, we also record the population (it will be either zero or some number >= max_pop).



      I repeat this many times and record every result. At the end, number of zeros divided by total number of results should give me an approximation.



      The code



      from random import randint
      import numpy as np

      pop_max = 100
      iter_max = 100000

      results = np.zeros(iter_max, dtype=int)

      for i in range(iter_max):
      n = 1
      while n > 0 and n < pop_max:
      for j in range(n):
      x = randint(1, 4)
      if x == 1:
      n = n - 1
      elif x == 2:
      continue
      elif x == 3:
      n = n + 1
      elif x == 4:
      n = n + 2
      results[i] = n

      print( np.bincount(results)[0] / iter_max )


      iter_max and pop_max can be changed indeed, but I thought if there is 100 aliens, the probability that they go extinct would be negligibly low. This is just a guess though, I haven't done anything to calculate a (more) proper upper limit for population.



      This code gives promising results, fairly close to the real answer which is approximately %41.4.



      Some outputs



      > python aliens.py
      0.41393
      > python aliens.py
      0.41808
      > python aliens.py
      0.41574
      > python aliens.py
      0.4149
      > python aliens.py
      0.41505
      > python aliens.py
      0.41277
      > python aliens.py
      0.41428
      > python aliens.py
      0.41407
      > python aliens.py
      0.41676


      Aftermath



      I'm okay with the results but I can't say the same for the time this code takes. It takes around 16-17 seconds :)



      How can I improve the speed? How can I optimize loops (especially the while loop)? Maybe there's a much better approach or better models?







      python programming-challenge numpy simulation markov-chain






      share|improve this question









      New contributor




      Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited 1 hour ago









      200_success

      128k15149412




      128k15149412






      New contributor




      Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 15 hours ago









      Motun

      1184




      1184




      New contributor




      Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      Motun is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          2 Answers
          2






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          There are some obvious possible simplifications for elegance, if not necessarily for speed.



          The while condition should be written as a double-ended inequality:



          while 0 < n < pop_max:



          The variable j is unused. The convention is to use _ as the name of a "throwaway" variable.



          The if-elif chain can be eliminated with a smarter randint() call:



          for j in range(n):
          n += randint(-1, 2)


          NumPy is overkill here, when all you want to know whether the population went extinct. The built-in sum() function can do the counting for you.



          Each simulation run is independent. I'd put the code in a function for readability.



          from random import randint

          def population(pop_max=100):
          n = 1
          while 0 < n < pop_max:
          for _ in range(n):
          n += randint(-1, 2)
          return n

          iterations = 100000
          print(sum(population() == 0 for _ in range(iterations)) / iterations)





          share|improve this answer




























            up vote
            0
            down vote













            numpy has a function to generate a random array, this might be faster than generating a random number within the inner loop.
            https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html



            You can also try generating a larger 32 or 64-bit number, and shifting and masking the whole time to get 2 random bits. However, this is a bit far-fetched.






            share|improve this answer








            New contributor




            user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
            Check out our Code of Conduct.


















            • Depending on the platform (and especially in python) working with native machine words is faster even if it seems wasteful to only use 2-3 bits out of 64 or 32.
              – Aaron
              14 hours ago











            Your Answer





            StackExchange.ifUsing("editor", function () {
            return StackExchange.using("mathjaxEditing", function () {
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            });
            });
            }, "mathjax-editing");

            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "196"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });






            Motun is a new contributor. Be nice, and check out our Code of Conduct.










            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209365%2fsimulation-of-an-alien-population%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            1
            down vote



            accepted










            There are some obvious possible simplifications for elegance, if not necessarily for speed.



            The while condition should be written as a double-ended inequality:



            while 0 < n < pop_max:



            The variable j is unused. The convention is to use _ as the name of a "throwaway" variable.



            The if-elif chain can be eliminated with a smarter randint() call:



            for j in range(n):
            n += randint(-1, 2)


            NumPy is overkill here, when all you want to know whether the population went extinct. The built-in sum() function can do the counting for you.



            Each simulation run is independent. I'd put the code in a function for readability.



            from random import randint

            def population(pop_max=100):
            n = 1
            while 0 < n < pop_max:
            for _ in range(n):
            n += randint(-1, 2)
            return n

            iterations = 100000
            print(sum(population() == 0 for _ in range(iterations)) / iterations)





            share|improve this answer

























              up vote
              1
              down vote



              accepted










              There are some obvious possible simplifications for elegance, if not necessarily for speed.



              The while condition should be written as a double-ended inequality:



              while 0 < n < pop_max:



              The variable j is unused. The convention is to use _ as the name of a "throwaway" variable.



              The if-elif chain can be eliminated with a smarter randint() call:



              for j in range(n):
              n += randint(-1, 2)


              NumPy is overkill here, when all you want to know whether the population went extinct. The built-in sum() function can do the counting for you.



              Each simulation run is independent. I'd put the code in a function for readability.



              from random import randint

              def population(pop_max=100):
              n = 1
              while 0 < n < pop_max:
              for _ in range(n):
              n += randint(-1, 2)
              return n

              iterations = 100000
              print(sum(population() == 0 for _ in range(iterations)) / iterations)





              share|improve this answer























                up vote
                1
                down vote



                accepted







                up vote
                1
                down vote



                accepted






                There are some obvious possible simplifications for elegance, if not necessarily for speed.



                The while condition should be written as a double-ended inequality:



                while 0 < n < pop_max:



                The variable j is unused. The convention is to use _ as the name of a "throwaway" variable.



                The if-elif chain can be eliminated with a smarter randint() call:



                for j in range(n):
                n += randint(-1, 2)


                NumPy is overkill here, when all you want to know whether the population went extinct. The built-in sum() function can do the counting for you.



                Each simulation run is independent. I'd put the code in a function for readability.



                from random import randint

                def population(pop_max=100):
                n = 1
                while 0 < n < pop_max:
                for _ in range(n):
                n += randint(-1, 2)
                return n

                iterations = 100000
                print(sum(population() == 0 for _ in range(iterations)) / iterations)





                share|improve this answer












                There are some obvious possible simplifications for elegance, if not necessarily for speed.



                The while condition should be written as a double-ended inequality:



                while 0 < n < pop_max:



                The variable j is unused. The convention is to use _ as the name of a "throwaway" variable.



                The if-elif chain can be eliminated with a smarter randint() call:



                for j in range(n):
                n += randint(-1, 2)


                NumPy is overkill here, when all you want to know whether the population went extinct. The built-in sum() function can do the counting for you.



                Each simulation run is independent. I'd put the code in a function for readability.



                from random import randint

                def population(pop_max=100):
                n = 1
                while 0 < n < pop_max:
                for _ in range(n):
                n += randint(-1, 2)
                return n

                iterations = 100000
                print(sum(population() == 0 for _ in range(iterations)) / iterations)






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered 11 hours ago









                200_success

                128k15149412




                128k15149412
























                    up vote
                    0
                    down vote













                    numpy has a function to generate a random array, this might be faster than generating a random number within the inner loop.
                    https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html



                    You can also try generating a larger 32 or 64-bit number, and shifting and masking the whole time to get 2 random bits. However, this is a bit far-fetched.






                    share|improve this answer








                    New contributor




                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.


















                    • Depending on the platform (and especially in python) working with native machine words is faster even if it seems wasteful to only use 2-3 bits out of 64 or 32.
                      – Aaron
                      14 hours ago















                    up vote
                    0
                    down vote













                    numpy has a function to generate a random array, this might be faster than generating a random number within the inner loop.
                    https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html



                    You can also try generating a larger 32 or 64-bit number, and shifting and masking the whole time to get 2 random bits. However, this is a bit far-fetched.






                    share|improve this answer








                    New contributor




                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.


















                    • Depending on the platform (and especially in python) working with native machine words is faster even if it seems wasteful to only use 2-3 bits out of 64 or 32.
                      – Aaron
                      14 hours ago













                    up vote
                    0
                    down vote










                    up vote
                    0
                    down vote









                    numpy has a function to generate a random array, this might be faster than generating a random number within the inner loop.
                    https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html



                    You can also try generating a larger 32 or 64-bit number, and shifting and masking the whole time to get 2 random bits. However, this is a bit far-fetched.






                    share|improve this answer








                    New contributor




                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.









                    numpy has a function to generate a random array, this might be faster than generating a random number within the inner loop.
                    https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html



                    You can also try generating a larger 32 or 64-bit number, and shifting and masking the whole time to get 2 random bits. However, this is a bit far-fetched.







                    share|improve this answer








                    New contributor




                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.









                    share|improve this answer



                    share|improve this answer






                    New contributor




                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.









                    answered 14 hours ago









                    user2966394

                    233




                    233




                    New contributor




                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.





                    New contributor





                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.






                    user2966394 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                    Check out our Code of Conduct.












                    • Depending on the platform (and especially in python) working with native machine words is faster even if it seems wasteful to only use 2-3 bits out of 64 or 32.
                      – Aaron
                      14 hours ago


















                    • Depending on the platform (and especially in python) working with native machine words is faster even if it seems wasteful to only use 2-3 bits out of 64 or 32.
                      – Aaron
                      14 hours ago
















                    Depending on the platform (and especially in python) working with native machine words is faster even if it seems wasteful to only use 2-3 bits out of 64 or 32.
                    – Aaron
                    14 hours ago




                    Depending on the platform (and especially in python) working with native machine words is faster even if it seems wasteful to only use 2-3 bits out of 64 or 32.
                    – Aaron
                    14 hours ago










                    Motun is a new contributor. Be nice, and check out our Code of Conduct.










                    draft saved

                    draft discarded


















                    Motun is a new contributor. Be nice, and check out our Code of Conduct.













                    Motun is a new contributor. Be nice, and check out our Code of Conduct.












                    Motun is a new contributor. Be nice, and check out our Code of Conduct.
















                    Thanks for contributing an answer to Code Review Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209365%2fsimulation-of-an-alien-population%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    404 Error Contact Form 7 ajax form submitting

                    How to know if a Active Directory user can login interactively

                    TypeError: fit_transform() missing 1 required positional argument: 'X'