Data transformation involving dictionaries and lists
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
add a comment |
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
add a comment |
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
python python-3.x
edited 18 mins ago
Jamal♦
30.3k11116226
30.3k11116226
asked 10 hours ago
Cedric H.Cedric H.
1465
1465
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return [label for arg in self.mappings for label in arg.keys()]
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = [label for arg in self.mappings for label in arg.keys()]
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
New contributor
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return [label for arg in self.mappings for label in arg.keys()]
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = [label for arg in self.mappings for label in arg.keys()]
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
New contributor
add a comment |
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return [label for arg in self.mappings for label in arg.keys()]
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = [label for arg in self.mappings for label in arg.keys()]
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
New contributor
add a comment |
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return [label for arg in self.mappings for label in arg.keys()]
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = [label for arg in self.mappings for label in arg.keys()]
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
New contributor
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return [label for arg in self.mappings for label in arg.keys()]
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = [label for arg in self.mappings for label in arg.keys()]
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
New contributor
New contributor
answered 57 mins ago
Hunter McGushionHunter McGushion
113
113
New contributor
New contributor
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown