Data transformation involving dictionaries and lists












3














My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



Input:



[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]


Output:



[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]


To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



Here is a possible implementation, using a class to isolate the whole mess.



@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""

mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result

results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])

tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})

if len(tmp) == 0:
return [{}]
else:
return tmp


Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?










share|improve this question





























    3














    My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



    Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



    Input:



    [
    {
    ('B3G', 'B1'): [1.0, 2.0],
    ('B1G', 'B1'): [11.0, 12.0]
    },
    {
    ('B2G', 'B1'): [1.5, 2.5, 3.5]
    }
    ]


    Output:



    [
    {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
    {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
    {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
    {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
    {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
    {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
    ]


    To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



    Here is a possible implementation, using a class to isolate the whole mess.



    @dataclass
    class ParametricMapping:
    """Abstraction for multi-dimensional parametric mappings."""

    mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

    @property
    def combinations(self) -> List[Mapping[Tuple[str], float]]:
    """Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

    labels = [label for arg in self.mappings for label in tuple(arg.keys())]
    pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

    def cartesian_product(*args):
    """Cartesian product similar to `itertools.product`"""
    result = []
    for pool in args:
    result = [x + [y] for x in result for y in pool]
    return result

    results =
    for term in cartesian_product(*pools):
    results.append([pp for p in term for pp in p])

    tmp =
    for r in results:
    tmp.append({k: v for k, v in zip(labels, r)})

    if len(tmp) == 0:
    return [{}]
    else:
    return tmp


    Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?










    share|improve this question



























      3












      3








      3







      My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



      Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



      Input:



      [
      {
      ('B3G', 'B1'): [1.0, 2.0],
      ('B1G', 'B1'): [11.0, 12.0]
      },
      {
      ('B2G', 'B1'): [1.5, 2.5, 3.5]
      }
      ]


      Output:



      [
      {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
      {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
      {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
      {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
      {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
      {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
      ]


      To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



      Here is a possible implementation, using a class to isolate the whole mess.



      @dataclass
      class ParametricMapping:
      """Abstraction for multi-dimensional parametric mappings."""

      mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

      @property
      def combinations(self) -> List[Mapping[Tuple[str], float]]:
      """Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

      labels = [label for arg in self.mappings for label in tuple(arg.keys())]
      pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

      def cartesian_product(*args):
      """Cartesian product similar to `itertools.product`"""
      result = []
      for pool in args:
      result = [x + [y] for x in result for y in pool]
      return result

      results =
      for term in cartesian_product(*pools):
      results.append([pp for p in term for pp in p])

      tmp =
      for r in results:
      tmp.append({k: v for k, v in zip(labels, r)})

      if len(tmp) == 0:
      return [{}]
      else:
      return tmp


      Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?










      share|improve this question















      My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



      Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



      Input:



      [
      {
      ('B3G', 'B1'): [1.0, 2.0],
      ('B1G', 'B1'): [11.0, 12.0]
      },
      {
      ('B2G', 'B1'): [1.5, 2.5, 3.5]
      }
      ]


      Output:



      [
      {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
      {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
      {('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
      {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
      {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
      {('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
      ]


      To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



      Here is a possible implementation, using a class to isolate the whole mess.



      @dataclass
      class ParametricMapping:
      """Abstraction for multi-dimensional parametric mappings."""

      mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

      @property
      def combinations(self) -> List[Mapping[Tuple[str], float]]:
      """Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

      labels = [label for arg in self.mappings for label in tuple(arg.keys())]
      pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

      def cartesian_product(*args):
      """Cartesian product similar to `itertools.product`"""
      result = []
      for pool in args:
      result = [x + [y] for x in result for y in pool]
      return result

      results =
      for term in cartesian_product(*pools):
      results.append([pp for p in term for pp in p])

      tmp =
      for r in results:
      tmp.append({k: v for k, v in zip(labels, r)})

      if len(tmp) == 0:
      return [{}]
      else:
      return tmp


      Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?







      python python-3.x






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 18 mins ago









      Jamal

      30.3k11116226




      30.3k11116226










      asked 10 hours ago









      Cedric H.Cedric H.

      1465




      1465






















          1 Answer
          1






          active

          oldest

          votes


















          1














          TL;DR: Scroll to the end for the provided code with suggested improvements



          Suggestions



          1. Standalone cartesian_product helper function outside scope of combinations



          A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



          If you still don't want to use itertools.product:



          While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



          2. flatten helper function



          This helper function should be used thusly:



          for term in cartesian_product(*pools):
          results.append(flatten(term))


          This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



          3. Consolidate some code



          If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



          results = 
          for term in cartesian_product(*pools):
          results.append([pp for p in term for pp in p])


          should now be



          results = 
          for term in itertools.product(*pools):
          results.append(flatten(term))


          This block can be expressed clearly and concisely with the following list comprehension:



          results = [flatten(term) for term in itertools.product(*pools)]


          Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



          4. Check for empty mappings at top of combinations



          Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



          if self.mappings == [{}]:
          return [{}]


          The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



          if len(tmp) == 0:
          return [{}]
          else:
          return tmp


          with



          return tmp or [{}]


          This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



          5. Define labels and pools using helper methods or non-init fields



          Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



          See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



          TL;DR



          In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



          def flatten(l):
          return [item for sublist in l for item in sublist]


          With either ...



          @dataclass
          class ParametricMapping1:
          mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

          def _labels(self) -> List[Tuple[str]]:
          return [label for arg in self.mappings for label in arg.keys()]

          def _pools(self) -> List[List[Sequence[float]]]:
          return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

          @property
          def combinations(self) -> List[Mapping[Tuple[str], float]]:
          if self.mappings == [{}]:
          return [{}]

          pool_values = [flatten(term) for term in itertools.product(*self._pools())]
          return [dict(zip(self._labels(), v)) for v in pool_values]


          Or ...



          @dataclass
          class ParametricMapping2:
          mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
          labels: List[Tuple[str]] = field(init=False, repr=False)
          pools: List[List[Sequence[float]]] = field(init=False, repr=False)

          def __post_init__(self):
          self.labels = [label for arg in self.mappings for label in arg.keys()]
          self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

          @property
          def combinations(self) -> List[Mapping[Tuple[str], float]]:
          pool_values = [flatten(term) for term in itertools.product(*self.pools)]
          return [dict(zip(self.labels, v)) for v in pool_values] or [{}]





          share|improve this answer








          New contributor




          Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.


















            Your Answer





            StackExchange.ifUsing("editor", function () {
            return StackExchange.using("mathjaxEditing", function () {
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            });
            });
            }, "mathjax-editing");

            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "196"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            TL;DR: Scroll to the end for the provided code with suggested improvements



            Suggestions



            1. Standalone cartesian_product helper function outside scope of combinations



            A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



            If you still don't want to use itertools.product:



            While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



            2. flatten helper function



            This helper function should be used thusly:



            for term in cartesian_product(*pools):
            results.append(flatten(term))


            This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



            3. Consolidate some code



            If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



            results = 
            for term in cartesian_product(*pools):
            results.append([pp for p in term for pp in p])


            should now be



            results = 
            for term in itertools.product(*pools):
            results.append(flatten(term))


            This block can be expressed clearly and concisely with the following list comprehension:



            results = [flatten(term) for term in itertools.product(*pools)]


            Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



            4. Check for empty mappings at top of combinations



            Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



            if self.mappings == [{}]:
            return [{}]


            The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



            if len(tmp) == 0:
            return [{}]
            else:
            return tmp


            with



            return tmp or [{}]


            This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



            5. Define labels and pools using helper methods or non-init fields



            Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



            See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



            TL;DR



            In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



            def flatten(l):
            return [item for sublist in l for item in sublist]


            With either ...



            @dataclass
            class ParametricMapping1:
            mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

            def _labels(self) -> List[Tuple[str]]:
            return [label for arg in self.mappings for label in arg.keys()]

            def _pools(self) -> List[List[Sequence[float]]]:
            return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

            @property
            def combinations(self) -> List[Mapping[Tuple[str], float]]:
            if self.mappings == [{}]:
            return [{}]

            pool_values = [flatten(term) for term in itertools.product(*self._pools())]
            return [dict(zip(self._labels(), v)) for v in pool_values]


            Or ...



            @dataclass
            class ParametricMapping2:
            mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
            labels: List[Tuple[str]] = field(init=False, repr=False)
            pools: List[List[Sequence[float]]] = field(init=False, repr=False)

            def __post_init__(self):
            self.labels = [label for arg in self.mappings for label in arg.keys()]
            self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

            @property
            def combinations(self) -> List[Mapping[Tuple[str], float]]:
            pool_values = [flatten(term) for term in itertools.product(*self.pools)]
            return [dict(zip(self.labels, v)) for v in pool_values] or [{}]





            share|improve this answer








            New contributor




            Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
            Check out our Code of Conduct.























              1














              TL;DR: Scroll to the end for the provided code with suggested improvements



              Suggestions



              1. Standalone cartesian_product helper function outside scope of combinations



              A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



              If you still don't want to use itertools.product:



              While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



              2. flatten helper function



              This helper function should be used thusly:



              for term in cartesian_product(*pools):
              results.append(flatten(term))


              This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



              3. Consolidate some code



              If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



              results = 
              for term in cartesian_product(*pools):
              results.append([pp for p in term for pp in p])


              should now be



              results = 
              for term in itertools.product(*pools):
              results.append(flatten(term))


              This block can be expressed clearly and concisely with the following list comprehension:



              results = [flatten(term) for term in itertools.product(*pools)]


              Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



              4. Check for empty mappings at top of combinations



              Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



              if self.mappings == [{}]:
              return [{}]


              The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



              if len(tmp) == 0:
              return [{}]
              else:
              return tmp


              with



              return tmp or [{}]


              This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



              5. Define labels and pools using helper methods or non-init fields



              Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



              See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



              TL;DR



              In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



              def flatten(l):
              return [item for sublist in l for item in sublist]


              With either ...



              @dataclass
              class ParametricMapping1:
              mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

              def _labels(self) -> List[Tuple[str]]:
              return [label for arg in self.mappings for label in arg.keys()]

              def _pools(self) -> List[List[Sequence[float]]]:
              return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

              @property
              def combinations(self) -> List[Mapping[Tuple[str], float]]:
              if self.mappings == [{}]:
              return [{}]

              pool_values = [flatten(term) for term in itertools.product(*self._pools())]
              return [dict(zip(self._labels(), v)) for v in pool_values]


              Or ...



              @dataclass
              class ParametricMapping2:
              mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
              labels: List[Tuple[str]] = field(init=False, repr=False)
              pools: List[List[Sequence[float]]] = field(init=False, repr=False)

              def __post_init__(self):
              self.labels = [label for arg in self.mappings for label in arg.keys()]
              self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

              @property
              def combinations(self) -> List[Mapping[Tuple[str], float]]:
              pool_values = [flatten(term) for term in itertools.product(*self.pools)]
              return [dict(zip(self.labels, v)) for v in pool_values] or [{}]





              share|improve this answer








              New contributor




              Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.





















                1












                1








                1






                TL;DR: Scroll to the end for the provided code with suggested improvements



                Suggestions



                1. Standalone cartesian_product helper function outside scope of combinations



                A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



                If you still don't want to use itertools.product:



                While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



                2. flatten helper function



                This helper function should be used thusly:



                for term in cartesian_product(*pools):
                results.append(flatten(term))


                This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



                3. Consolidate some code



                If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



                results = 
                for term in cartesian_product(*pools):
                results.append([pp for p in term for pp in p])


                should now be



                results = 
                for term in itertools.product(*pools):
                results.append(flatten(term))


                This block can be expressed clearly and concisely with the following list comprehension:



                results = [flatten(term) for term in itertools.product(*pools)]


                Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



                4. Check for empty mappings at top of combinations



                Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



                if self.mappings == [{}]:
                return [{}]


                The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



                if len(tmp) == 0:
                return [{}]
                else:
                return tmp


                with



                return tmp or [{}]


                This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



                5. Define labels and pools using helper methods or non-init fields



                Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



                See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



                TL;DR



                In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



                def flatten(l):
                return [item for sublist in l for item in sublist]


                With either ...



                @dataclass
                class ParametricMapping1:
                mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

                def _labels(self) -> List[Tuple[str]]:
                return [label for arg in self.mappings for label in arg.keys()]

                def _pools(self) -> List[List[Sequence[float]]]:
                return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

                @property
                def combinations(self) -> List[Mapping[Tuple[str], float]]:
                if self.mappings == [{}]:
                return [{}]

                pool_values = [flatten(term) for term in itertools.product(*self._pools())]
                return [dict(zip(self._labels(), v)) for v in pool_values]


                Or ...



                @dataclass
                class ParametricMapping2:
                mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
                labels: List[Tuple[str]] = field(init=False, repr=False)
                pools: List[List[Sequence[float]]] = field(init=False, repr=False)

                def __post_init__(self):
                self.labels = [label for arg in self.mappings for label in arg.keys()]
                self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

                @property
                def combinations(self) -> List[Mapping[Tuple[str], float]]:
                pool_values = [flatten(term) for term in itertools.product(*self.pools)]
                return [dict(zip(self.labels, v)) for v in pool_values] or [{}]





                share|improve this answer








                New contributor




                Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.









                TL;DR: Scroll to the end for the provided code with suggested improvements



                Suggestions



                1. Standalone cartesian_product helper function outside scope of combinations



                A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



                If you still don't want to use itertools.product:



                While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



                2. flatten helper function



                This helper function should be used thusly:



                for term in cartesian_product(*pools):
                results.append(flatten(term))


                This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



                3. Consolidate some code



                If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



                results = 
                for term in cartesian_product(*pools):
                results.append([pp for p in term for pp in p])


                should now be



                results = 
                for term in itertools.product(*pools):
                results.append(flatten(term))


                This block can be expressed clearly and concisely with the following list comprehension:



                results = [flatten(term) for term in itertools.product(*pools)]


                Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



                4. Check for empty mappings at top of combinations



                Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



                if self.mappings == [{}]:
                return [{}]


                The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



                if len(tmp) == 0:
                return [{}]
                else:
                return tmp


                with



                return tmp or [{}]


                This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



                5. Define labels and pools using helper methods or non-init fields



                Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



                See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



                TL;DR



                In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



                def flatten(l):
                return [item for sublist in l for item in sublist]


                With either ...



                @dataclass
                class ParametricMapping1:
                mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

                def _labels(self) -> List[Tuple[str]]:
                return [label for arg in self.mappings for label in arg.keys()]

                def _pools(self) -> List[List[Sequence[float]]]:
                return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

                @property
                def combinations(self) -> List[Mapping[Tuple[str], float]]:
                if self.mappings == [{}]:
                return [{}]

                pool_values = [flatten(term) for term in itertools.product(*self._pools())]
                return [dict(zip(self._labels(), v)) for v in pool_values]


                Or ...



                @dataclass
                class ParametricMapping2:
                mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
                labels: List[Tuple[str]] = field(init=False, repr=False)
                pools: List[List[Sequence[float]]] = field(init=False, repr=False)

                def __post_init__(self):
                self.labels = [label for arg in self.mappings for label in arg.keys()]
                self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

                @property
                def combinations(self) -> List[Mapping[Tuple[str], float]]:
                pool_values = [flatten(term) for term in itertools.product(*self.pools)]
                return [dict(zip(self.labels, v)) for v in pool_values] or [{}]






                share|improve this answer








                New contributor




                Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.









                share|improve this answer



                share|improve this answer






                New contributor




                Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.









                answered 57 mins ago









                Hunter McGushionHunter McGushion

                113




                113




                New contributor




                Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.





                New contributor





                Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.






                Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Code Review Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    404 Error Contact Form 7 ajax form submitting

                    How to know if a Active Directory user can login interactively

                    TypeError: fit_transform() missing 1 required positional argument: 'X'