Choose dictionary keys only if their values don't have a certain number of duplicates












Given a dictionary and a limit on the number of keys in a new dictionary, I would like the new dictionary to contain the keys with the highest values.

The given dict is:

    dictation = {'apple':5, 'pears':4, 'orange':3, 'kiwi':3, 'banana':1}

I want a new dictionary with at most limit keys, holding the keys with the highest values.

For instance, for limit=1 the new dict is

    {'apple':5}

and for limit=2 it is

    {'apple':5, 'pears':4}

I tried this:

    return dict(sorted(dictation.items(), key=lambda x: -x[1])[:limit])

but when I try limit=3, I get

    {'apple':5, 'pears':4, 'orange':3}

It shouldn't include orange:3, because orange and kiwi have the same priority. Including both would exceed the limit, so neither should be included. It should return

    {'apple':5, 'pears':4}









Tags: python, dictionary

asked Nov 21 '18 at 18:43 by CompComp
edited Nov 21 '18 at 18:55 by Conner

  • I followed this question until you said "because I can't add orange. If I add it will be more than the limit.".
    – timgeb, Nov 21 '18 at 18:47

  • So you're saying if there are multiple items with the same count, you should only take any if all of them fit within the limit? Have you looked into whether Counter.most_common does what you need? I'd recommend not trying to fit it into one line.
    – jonrsharpe, Nov 21 '18 at 18:49














3 Answers
One way to go is to use a collections.Counter and most_common. Take one item more than the limit with most_common(n+1) and, if that extra item exists, keep popping from the end until the last value changes:

    from collections import Counter

    dictation = {'apple':5, 'pears':4, 'orange':3, 'kiwi':3, 'banana':1}
    n = 3

    # take one item more than the limit to see whether the cut splits a tie
    items = Counter(dictation).most_common(n+1)
    if len(items) > n:
        last_val = items[-1][1]
        # drop every item tied with the (n+1)-th value
        while items and items[-1][1] == last_val:
            items.pop()

    new = dict(items)
    # {'apple': 5, 'pears': 4}
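
The same idea can be wrapped in a function so that it can be tried with several limits. This is only a minimal sketch and not part of the original answer: the name top_without_split_ties is made up for illustration, and it assumes a non-negative limit and values that can be ordered.

```python
from collections import Counter

def top_without_split_ties(mapping, limit):
    """Return up to `limit` highest-valued items, dropping a tied group that does not fit."""
    # Hypothetical helper: look at one item beyond the limit.
    items = Counter(mapping).most_common(limit + 1)
    if len(items) > limit:
        cutoff = items[-1][1]
        # Remove every item tied with the first value that did not fit.
        while items and items[-1][1] == cutoff:
            items.pop()
    return dict(items)

dictation = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
for limit in range(6):
    print(limit, top_without_split_ties(dictation, limit))
```

The `while items and ...` guard matters when every remaining item is tied with the cut-off value, for example two equal values with a limit of 1, where the result is simply empty.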






answered Nov 21 '18 at 18:49 by schwobaseggl
edited Nov 21 '18 at 18:54

  • But OP says for limit=3 the dict should still have 2 keys, for reasons I don't understand.
    – timgeb, Nov 21 '18 at 18:50

  • @timgeb I added the necessary bumpiness. Lost all of its appeal :(
    – schwobaseggl, Nov 21 '18 at 18:57

  • still shorter than mine
    – Patrick Artner, Nov 21 '18 at 19:03


















This is not computationally efficient, but it works. It creates a Counter to get the items sorted by value, and an inverted defaultdict that maps each value to the list of keys sharing it. The result is built from both: a group of tied keys is added only if the whole group still fits within the limit, and keys whose group was already added are skipped so that a tied group is counted only once:

    from collections import defaultdict, Counter

    def gimme(d, n):
        c = Counter(d)
        # invert the mapping: value -> list of keys that share it
        grpd = defaultdict(list)
        for key, value in c.items():
            grpd[value].append(key)

        result = {}
        for key, value in c.most_common():
            if key in result:
                continue  # this key's group of tied values was already added
            if len(grpd[value]) + len(result) <= n:
                result.update({k: value for k in grpd[value]})
            else:
                break
        return result

Test:

    data = {'apple':5, 'pears':4, 'orange':3, 'kiwi':3, 'banana':1}

    for k in range(10):
        print(k, gimme(data, k))

Output:

    0 {}
    1 {'apple': 5}
    2 {'apple': 5, 'pears': 4}
    3 {'apple': 5, 'pears': 4}
    4 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3}
    5 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    6 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    7 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    8 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    9 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
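
The grouping step can also be done in a single pass over the sorted items with itertools.groupby, without building an inverted dictionary. The following is a sketch of that alternative formulation rather than the answer's own code; the function name take_whole_groups is hypothetical, and it assumes the values can be sorted.

```python
from itertools import groupby
from operator import itemgetter

def take_whole_groups(mapping, limit):
    """Add groups of equal values, highest first, while each whole group still fits."""
    result = {}
    # Sort by value, highest first, so equal values end up adjacent.
    ordered = sorted(mapping.items(), key=itemgetter(1), reverse=True)
    for _, group in groupby(ordered, key=itemgetter(1)):
        group = list(group)
        # Stop at the first tied group that would overflow the limit.
        if len(result) + len(group) > limit:
            break
        result.update(group)
    return result

data = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
for k in range(6):
    print(k, take_whole_groups(data, k))
```

Because each group is visited exactly once, no extra check for already-added keys is needed here.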






answered Nov 21 '18 at 19:01 by Patrick Artner
























As you note, taking the top n items does not by default exclude a group of equal values that would exceed the stated cap. This is by design.

The trick is to consider the (n+1)th highest value and ensure the values in your dictionary are all higher than this number:

    from heapq import nlargest

    dictation = {'apple':5, 'pears':4, 'orange':3, 'kiwi':3, 'banana':1}

    n = 3
    largest_items = nlargest(n+1, dictation.items(), key=lambda x: x[1])
    n_plus_one_value = largest_items[-1][1]

    # keep only values strictly above the (n+1)th highest value
    res = {k: v for k, v in largest_items if v > n_plus_one_value}

    print(res)
    # {'apple': 5, 'pears': 4}

We assume here that len(dictation) > n, so that largest_items holds n+1 entries; otherwise you can just take the input dictionary as the result.

The dictionary comprehension compares every item against the threshold. For larger inputs, you can use bisect to locate the cut-off position instead, something like:

    from heapq import nlargest
    from operator import itemgetter
    from bisect import bisect

    dictation = {'apple':5, 'pears':4, 'orange':3, 'kiwi':3, 'banana':1}

    n = 3
    largest_items = nlargest(n+1, dictation.items(), key=lambda x: x[1])
    n_plus_one_value = largest_items[-1][1]

    # reverse the values into ascending order, as bisect requires
    index = bisect(list(map(itemgetter(1), largest_items))[::-1], n_plus_one_value)

    res = dict(largest_items[:len(largest_items) - index])

    print(res)
    # {'apple': 5, 'pears': 4}
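
The case where the dictionary has no more than n items can also be handled explicitly. A minimal sketch, assuming a non-negative n; the function name top_n_strict is hypothetical and not part of the answer:

```python
from heapq import nlargest
from operator import itemgetter

def top_n_strict(mapping, n):
    """Return the top items whose values are strictly above the (n+1)-th highest value."""
    # Nothing to cut when everything fits within the limit.
    if len(mapping) <= n:
        return dict(mapping)
    largest_items = nlargest(n + 1, mapping.items(), key=itemgetter(1))
    threshold = largest_items[-1][1]  # the (n+1)-th highest value
    # Keep only values strictly above the threshold, so ties with it are dropped.
    return {k: v for k, v in largest_items if v > threshold}

dictation = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
for n in range(6):
    print(n, top_n_strict(dictation, n))
```

Returning a copy in the early exit keeps the caller's dictionary from being shared with the result.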






answered Nov 21 '18 at 19:03 by jpp
edited Nov 22 '18 at 2:28






























