optimal algorithm grouping data in javascript

up vote
10
down vote

favorite

The following (simplified) json type of data defines a Contact:

{

  id:   number;

  name: string;

  phone: string;

  email: string

}

There is the following group of data:

+---+----------+-------------+---------------------------+ 

|id | name     | phone       |email                      | 

+---+----------+-------------+---------------------------+

|1  | John     | 11111111    |aaaa@test.com              | 

|2  | Marc     | 22222222    |bbbb@test.com              | 

|3  | Ron      | 99999999    |aaaa@test.com              |

|4  | Andrew   | 55555555    |dddd@test.com              |

|5  | Wim      | 99999999    |gggg@test.com              |

|6  | Marc     | 33333333    |cccc@test.com              |

|7  | Dan      | 44444444    |cccc@test.com              |

+---+----------+-------------+---------------------------+

Goal is to find the groups that belong together using javascript(optionally in lodash, but main idea is to get the algorithm clear) according to the following constraint: a contact belongs to a group when any of the following criteria are the same: name, phone or email. The results shows the id's grouped as arrays in arrays. a contact in a group of 1 is ignored.

In the example above, it means that contacts with ids 1,3,5 belong together since 1,3 share the same email and 3 and 5 share the same phonenumber. Likewise 2,6,7: 2 and 6 have the same name and 6 and 7 have the same email. 5 does not have anything in common.
The expected result therefore is:
[[1,3,5], [2,6,7]]

Background:
One solution that works is iterating over every item and check for the remainder of the list if name, email, or phone is the same. If so, group them and take them out of the list (in the example we compare 1 with all the items in the list and only 3 is found). problem is that next items also need to be checked again to these groups, because in this case 5 is not yet detected as part of the group. This makes the algorithm complex, while I suspect there is a simple way of solving this in linear time. There might also be a name for this class of problems?
`

edited Nov 20 at 12:01

asked Nov 20 at 9:18

Han

535

That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #: 99999999? Would it be [[1, 2, 3, 5, 6, 7]]?
– Josep
Nov 20 at 9:27

1

correct, we would have one group.
– Han
Nov 20 at 9:39

I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 at 9:52

Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 at 4:53

add a comment |

up vote
10
down vote

favorite

The following (simplified) json type of data defines a Contact:

{

  id:   number;

  name: string;

  phone: string;

  email: string

}

There is the following group of data:

+---+----------+-------------+---------------------------+ 

|id | name     | phone       |email                      | 

+---+----------+-------------+---------------------------+

|1  | John     | 11111111    |aaaa@test.com              | 

|2  | Marc     | 22222222    |bbbb@test.com              | 

|3  | Ron      | 99999999    |aaaa@test.com              |

|4  | Andrew   | 55555555    |dddd@test.com              |

|5  | Wim      | 99999999    |gggg@test.com              |

|6  | Marc     | 33333333    |cccc@test.com              |

|7  | Dan      | 44444444    |cccc@test.com              |

+---+----------+-------------+---------------------------+

edited Nov 20 at 12:01

asked Nov 20 at 9:18

Han

535

That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #: 99999999? Would it be [[1, 2, 3, 5, 6, 7]]?
– Josep
Nov 20 at 9:27

1

correct, we would have one group.
– Han
Nov 20 at 9:39

I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 at 9:52

Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 at 4:53

add a comment |

up vote
10
down vote

favorite

The following (simplified) json type of data defines a Contact:

{

  id:   number;

  name: string;

  phone: string;

  email: string

}

There is the following group of data:

+---+----------+-------------+---------------------------+ 

|id | name     | phone       |email                      | 

+---+----------+-------------+---------------------------+

|1  | John     | 11111111    |aaaa@test.com              | 

|2  | Marc     | 22222222    |bbbb@test.com              | 

|3  | Ron      | 99999999    |aaaa@test.com              |

|4  | Andrew   | 55555555    |dddd@test.com              |

|5  | Wim      | 99999999    |gggg@test.com              |

|6  | Marc     | 33333333    |cccc@test.com              |

|7  | Dan      | 44444444    |cccc@test.com              |

+---+----------+-------------+---------------------------+

edited Nov 20 at 12:01

asked Nov 20 at 9:18

Han

535

The following (simplified) json type of data defines a Contact:

{

  id:   number;

  name: string;

  phone: string;

  email: string

}

There is the following group of data:

+---+----------+-------------+---------------------------+ 

|id | name     | phone       |email                      | 

+---+----------+-------------+---------------------------+

|1  | John     | 11111111    |aaaa@test.com              | 

|2  | Marc     | 22222222    |bbbb@test.com              | 

|3  | Ron      | 99999999    |aaaa@test.com              |

|4  | Andrew   | 55555555    |dddd@test.com              |

|5  | Wim      | 99999999    |gggg@test.com              |

|6  | Marc     | 33333333    |cccc@test.com              |

|7  | Dan      | 44444444    |cccc@test.com              |

+---+----------+-------------+---------------------------+

javascript algorithm grouping lodash

edited Nov 20 at 12:01

asked Nov 20 at 9:18

Han

535

edited Nov 20 at 12:01

asked Nov 20 at 9:18

Han

535

edited Nov 20 at 12:01

asked Nov 20 at 9:18

Han

535

asked Nov 20 at 9:18

Han

535

asked Nov 20 at 9:18

Han

535

That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #: 99999999? Would it be [[1, 2, 3, 5, 6, 7]]?
– Josep
Nov 20 at 9:27

1

correct, we would have one group.
– Han
Nov 20 at 9:39

I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 at 9:52

Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 at 4:53

add a comment |

That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #: 99999999? Would it be [[1, 2, 3, 5, 6, 7]]?
– Josep
Nov 20 at 9:27

1

correct, we would have one group.
– Han
Nov 20 at 9:39

I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 at 9:52

Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 at 4:53

That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #: 99999999? Would it be [[1, 2, 3, 5, 6, 7]]?
– Josep
Nov 20 at 9:27

correct, we would have one group.
– Han
Nov 20 at 9:39

I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 at 9:52

Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 at 4:53

add a comment |

4 Answers
4

active

oldest

votes

up vote
3
down vote

accepted

Idea:

Start with 0 groups

Iterate your list of contacts

Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.

Union find is an efficient structure to handle merging of disjoint sets. Code taken from here. As it uses path compression and union by rank, you can consider that the whole code is linear in the amount of contacts.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



// UNION-FIND structure, with path comression and union by rank



var UNIONFIND = (function () {

    

    function _find(n)

    {

        if(n.parent == n) return n;	

        n.parent = _find(n.parent);	

        return n.parent;

    }

    

    return {

        makeset:function(id){    

            var newnode = {

                parent: null,

                id: id,

                rank: 0

            };

            newnode.parent = newnode;            

            return newnode;

        },

    

        find: _find,

     

        combine: function(n1, n2) {                                    

            var n1 = _find(n1);

            var n2 = _find(n2);

            

            if (n1 == n2) return;

        

            if(n1.rank < n2.rank)

            {

                n2.parent = n2;

                return n2;

            }

            else if(n2.rank < n1.rank)

            {

                n2.parent = n1;

                return n1;

            }

            else

            {

                n2.parent = n1;

                n1.rank += 1;

                return n1;

            }

        }

    };

})();



var groupHash = {name: {}, phone: {}, email: {}}

var groupNodes = 



data.forEach(function(contact){

  var group = UNIONFIND.makeset(contact.id);

  var groups = new Set();

  ["name", "phone", "email"].forEach(function(attr){

    if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])

  });

  

  groups = Array.from(groups);

  groups.push(group);

  groupNodes.push(group);

  

  for(var i = 1; i < groups.length; i++) {

    UNIONFIND.combine(groups[0], groups[i]);

  }  

  

  ["name", "phone", "email"].forEach(function(attr){

      groupHash[attr][contact[attr]] = groups[0];

  });

  

})



var contactsInGroup = {}





groupNodes.forEach(function(group){

    var groupId = UNIONFIND.find(group).id;

    

    if (contactsInGroup.hasOwnProperty(groupId) == false) {

      contactsInGroup[groupId] = ;

    }

    

    contactsInGroup[groupId].push(group.id);

})



var result = Object.values(contactsInGroup).filter(function(list){

 return list.length > 1

})



console.log(result)

edited Nov 20 at 14:59

answered Nov 20 at 14:30

juvian

13.2k22026

Using Map instead of JS objects would allow you to avoid hasOwnProperty. You already use Set, which has a similar API and advantages.
– tucuxi
Nov 21 at 11:35

Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 at 11:39

1

@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 at 14:19

add a comment |

up vote
2
down vote

Any answer that iterates over each of n entries, and then over a growing list of m groups to match against is going to have worst-time performance of O(n*m) (found when there are no two entries that match on any term).

Any answer that iterates over each entry, and then over groups, and uses arrays to test for matching values among q options will further have to pay a penalty of O(q) per match. In the worst case, with say all e-mails the same and all phones different, this will mean O(n*m).

I believe this answer is O(n), because assuming that the number of fields to match with is a constant (in this case, 3: name, phone and email), all operations in the main loop, which is run once per entry, are O(1).

There is an extra complication to fix the fact that, late in the process, we may find a bridge between two (or even 3) groups, as entries can match on different fields with entries from different groups. This may happen several times. To avoid having to rebuild groups during the main loop, we leave merging to the very end, where we first build a map of what-group-ends-up-where, and then finally move all entry IDs to their final group. This can all be done in O(m), with m the number of groups; with an extra O(n) when actually copying entry IDs to the merged groups: overall, we are still in O(n) territory.

The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.

const data = [

    {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

    {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

    {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

    {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

    {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

    {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

    {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



const groups = function(inputs) {



    let valuesToGroups = new Map(

        ['name', 'phone', 'email'].map(key => [key, new Map()]));

    let groups = new Map();

    let pendingMerges = ;

    for (const entry of inputs) {

        let group = undefined;

        let found = ;

        for (const [key, valueMap] of valuesToGroups) {

            // look up value in values-index for current key

            group = valueMap.get(entry[key]);

            if (group !== undefined) {

                found.push(group);

                // not breaking allows groups to be merged

            }

        }

        if (found.length === 0) {

            // not found: create new group

            group = groups.size;

            groups.set(group, [entry.id]);

        } else {

            // found: add entry to chosen group

            group = found[0];

            groups.get(group).push(entry.id);

            if (found.length > 1) {

                pendingMerges.push(found);

            }

        }

        // add entry's values to index, pointing to group

        for (const [key, valueMap] of valuesToGroups) {

            valueMap.set(entry[key], group); 

        }        

    }

    // do pending merges; initially, all groups are stand-alone

    let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));

    for (const merge of pendingMerges) {

        // contents will go to the lowest-numbered group

        const sorted = merge.map(groupId => merges.get(groupId)).sort();

        sorted.forEach(groupId => merges.set(groupId, sorted[0]));

    }

    const cleanGroups = new Map();

    groups.forEach((value, key) => { 

        const k = merges.get(key); 

        if ( ! cleanGroups.has(k)) {

            cleanGroups.set(k, );

        }

        value.forEach(id => cleanGroups.get(k).push(id))

    })

    // return only non-empty groups

    return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);

}(data);



console.log(""+JSON.stringify(groups))

// output is [[1,2,3,5,6,7]]

answered Nov 21 at 11:25

tucuxi

11.5k22858

add a comment |

up vote
0
down vote

Here is another suggestion of a route you could take. The idea is to use one Array.reduce to group by id and keep all the values (vls) and combined results (ids) in that accumulator object.

This way you can easily compare the name/phone/email using Array.some + Array.includes (which is what the getGroupId function does).

Once you have grouped and have the almost final result just prettify it by removing the groups with length of one and picking only the ids array of the rest:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];



const getGroupId = (obj, vals) => Object.entries(obj)

   .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r, c) => {

   let values = Object.values(c), groupID = getGroupId(r, values)[0]

	

   if(!groupID)	

      r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })

   else {

      r[groupID] = ({

         vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]

      })

   }

   return r

}, {})



const prettify = grp => Object.values(grp).reduce((r,c) => {

   if(c.ids.length > 1)

     r.push(c.ids)

     return r

}, )



console.log(prettify(group(data)))

One thing to note is that we do not care about the number of properties since we do Object.values. So you can easily add another address or fax to that list and it would still work with zero code changes.

As per feedback here is another version which works slightly different:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];

var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }]; 



const getGroupId = (obj, vals) => Object.entries(obj)

  .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r,c,i,a) => {

  let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]



  if (!groupID) {		

    let hits = a.filter(x => 

       x.id != c.id && values.some(v => Object.values(x).includes(v)))

    hits.forEach(h => 

       r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))

  }

  else

    r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] : 

      ({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })      

  return r

}, {})



const prettify = grp => Object.values(grp).reduce((r, c) => {

  if (c.ids.length > 1)

    r.push(c.ids)

  return r

}, )



console.log(prettify(group(data)))      // OP data

console.log(prettify(group(testData)))  // Test data

The reason for this version is due to the testData provided by @Mark which has the 2nd element not matching the first but matching the 3rd which actually matches the 1st ... so they all should be hits.

To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.

The result is that once we get the first group with the first element we then find and push the 3rd as well and from there it is much easier to match the 2nd. The logic is slightly more complex and I would imagine less performant.

edited Nov 21 at 7:50

answered Nov 21 at 4:22

Akrion

9,15911224

I would expect these to all be in one group: {id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}, but it seems to leave 2 out.
– Mark Meyer
Nov 21 at 5:47

I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 at 7:50

The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in O(1). Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 at 11:29

In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 at 19:31

add a comment |

up vote
-1
down vote

One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names, phones and emails.

Then iterate through contacts, and see if the current contact falls into any of the groups. If not, create a new group and set its names/phones/emails so that next contacts might fall into the same on.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



var groups = ;



data.forEach(function(person){

  var phone = person.phone;

  var email = person.email;

  var name = person.name;

  var id = person.id;

  var found = false;

  groups.forEach(function(g){

    if(    g.names.indexOf(name) > -1 

        || g.phones.indexOf(phone)>-1 

        || g.emails.indexOf(email)>-1) {

      found = true;

      g.names.push(name);

      g.phones.push(phone);

      g.emails.push(email);

      g.people.push(id);

    }

  });

  if(!found) {

      groups.push({names:[name],phones:[phone],emails:[email],people:[id]});

  }

  

  

});

var output=;

groups.forEach(function(g){

  output.push(g.people);

});

console.log(output);   //[ [1,3,5] , [2,6,7] , [4] ]

answered Nov 20 at 9:51

Ahmad

8,11143463

2

O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 at 10:44

the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 at 13:47

Groups with only 1 element should not be part of the result as per OP. [4] should not be in the result.
– Akrion
Nov 21 at 6:11

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53389734%2foptimal-algorithm-grouping-data-in-javascript%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
3
down vote

accepted

Idea:

Start with 0 groups

Iterate your list of contacts

Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



// UNION-FIND structure, with path comression and union by rank



var UNIONFIND = (function () {

    

    function _find(n)

    {

        if(n.parent == n) return n;	

        n.parent = _find(n.parent);	

        return n.parent;

    }

    

    return {

        makeset:function(id){    

            var newnode = {

                parent: null,

                id: id,

                rank: 0

            };

            newnode.parent = newnode;            

            return newnode;

        },

    

        find: _find,

     

        combine: function(n1, n2) {                                    

            var n1 = _find(n1);

            var n2 = _find(n2);

            

            if (n1 == n2) return;

        

            if(n1.rank < n2.rank)

            {

                n2.parent = n2;

                return n2;

            }

            else if(n2.rank < n1.rank)

            {

                n2.parent = n1;

                return n1;

            }

            else

            {

                n2.parent = n1;

                n1.rank += 1;

                return n1;

            }

        }

    };

})();



var groupHash = {name: {}, phone: {}, email: {}}

var groupNodes = 



data.forEach(function(contact){

  var group = UNIONFIND.makeset(contact.id);

  var groups = new Set();

  ["name", "phone", "email"].forEach(function(attr){

    if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])

  });

  

  groups = Array.from(groups);

  groups.push(group);

  groupNodes.push(group);

  

  for(var i = 1; i < groups.length; i++) {

    UNIONFIND.combine(groups[0], groups[i]);

  }  

  

  ["name", "phone", "email"].forEach(function(attr){

      groupHash[attr][contact[attr]] = groups[0];

  });

  

})



var contactsInGroup = {}





groupNodes.forEach(function(group){

    var groupId = UNIONFIND.find(group).id;

    

    if (contactsInGroup.hasOwnProperty(groupId) == false) {

      contactsInGroup[groupId] = ;

    }

    

    contactsInGroup[groupId].push(group.id);

})



var result = Object.values(contactsInGroup).filter(function(list){

 return list.length > 1

})



console.log(result)

edited Nov 20 at 14:59

answered Nov 20 at 14:30

juvian

13.2k22026

Using Map instead of JS objects would allow you to avoid hasOwnProperty. You already use Set, which has a similar API and advantages.
– tucuxi
Nov 21 at 11:35

Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 at 11:39

1

@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 at 14:19

add a comment |

up vote
3
down vote

accepted

Idea:

Start with 0 groups

Iterate your list of contacts

Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



// UNION-FIND structure, with path comression and union by rank



var UNIONFIND = (function () {

    

    function _find(n)

    {

        if(n.parent == n) return n;	

        n.parent = _find(n.parent);	

        return n.parent;

    }

    

    return {

        makeset:function(id){    

            var newnode = {

                parent: null,

                id: id,

                rank: 0

            };

            newnode.parent = newnode;            

            return newnode;

        },

    

        find: _find,

     

        combine: function(n1, n2) {                                    

            var n1 = _find(n1);

            var n2 = _find(n2);

            

            if (n1 == n2) return;

        

            if(n1.rank < n2.rank)

            {

                n2.parent = n2;

                return n2;

            }

            else if(n2.rank < n1.rank)

            {

                n2.parent = n1;

                return n1;

            }

            else

            {

                n2.parent = n1;

                n1.rank += 1;

                return n1;

            }

        }

    };

})();



var groupHash = {name: {}, phone: {}, email: {}}

var groupNodes = 



data.forEach(function(contact){

  var group = UNIONFIND.makeset(contact.id);

  var groups = new Set();

  ["name", "phone", "email"].forEach(function(attr){

    if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])

  });

  

  groups = Array.from(groups);

  groups.push(group);

  groupNodes.push(group);

  

  for(var i = 1; i < groups.length; i++) {

    UNIONFIND.combine(groups[0], groups[i]);

  }  

  

  ["name", "phone", "email"].forEach(function(attr){

      groupHash[attr][contact[attr]] = groups[0];

  });

  

})



var contactsInGroup = {}





groupNodes.forEach(function(group){

    var groupId = UNIONFIND.find(group).id;

    

    if (contactsInGroup.hasOwnProperty(groupId) == false) {

      contactsInGroup[groupId] = ;

    }

    

    contactsInGroup[groupId].push(group.id);

})



var result = Object.values(contactsInGroup).filter(function(list){

 return list.length > 1

})



console.log(result)

edited Nov 20 at 14:59

answered Nov 20 at 14:30

juvian

13.2k22026

Using Map instead of JS objects would allow you to avoid hasOwnProperty. You already use Set, which has a similar API and advantages.
– tucuxi
Nov 21 at 11:35

Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 at 11:39

1

@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 at 14:19

add a comment |

up vote
3
down vote

accepted

Idea:

Start with 0 groups

Iterate your list of contacts

Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



// UNION-FIND structure, with path comression and union by rank



var UNIONFIND = (function () {

    

    function _find(n)

    {

        if(n.parent == n) return n;	

        n.parent = _find(n.parent);	

        return n.parent;

    }

    

    return {

        makeset:function(id){    

            var newnode = {

                parent: null,

                id: id,

                rank: 0

            };

            newnode.parent = newnode;            

            return newnode;

        },

    

        find: _find,

     

        combine: function(n1, n2) {                                    

            var n1 = _find(n1);

            var n2 = _find(n2);

            

            if (n1 == n2) return;

        

            if(n1.rank < n2.rank)

            {

                n2.parent = n2;

                return n2;

            }

            else if(n2.rank < n1.rank)

            {

                n2.parent = n1;

                return n1;

            }

            else

            {

                n2.parent = n1;

                n1.rank += 1;

                return n1;

            }

        }

    };

})();



var groupHash = {name: {}, phone: {}, email: {}}

var groupNodes = 



data.forEach(function(contact){

  var group = UNIONFIND.makeset(contact.id);

  var groups = new Set();

  ["name", "phone", "email"].forEach(function(attr){

    if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])

  });

  

  groups = Array.from(groups);

  groups.push(group);

  groupNodes.push(group);

  

  for(var i = 1; i < groups.length; i++) {

    UNIONFIND.combine(groups[0], groups[i]);

  }  

  

  ["name", "phone", "email"].forEach(function(attr){

      groupHash[attr][contact[attr]] = groups[0];

  });

  

})



var contactsInGroup = {}





groupNodes.forEach(function(group){

    var groupId = UNIONFIND.find(group).id;

    

    if (contactsInGroup.hasOwnProperty(groupId) == false) {

      contactsInGroup[groupId] = ;

    }

    

    contactsInGroup[groupId].push(group.id);

})



var result = Object.values(contactsInGroup).filter(function(list){

 return list.length > 1

})



console.log(result)

edited Nov 20 at 14:59

answered Nov 20 at 14:30

juvian

13.2k22026

Idea:

Start with 0 groups

Iterate your list of contacts

Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



// UNION-FIND structure, with path comression and union by rank



var UNIONFIND = (function () {

    

    function _find(n)

    {

        if(n.parent == n) return n;	

        n.parent = _find(n.parent);	

        return n.parent;

    }

    

    return {

        makeset:function(id){    

            var newnode = {

                parent: null,

                id: id,

                rank: 0

            };

            newnode.parent = newnode;            

            return newnode;

        },

    

        find: _find,

     

        combine: function(n1, n2) {                                    

            var n1 = _find(n1);

            var n2 = _find(n2);

            

            if (n1 == n2) return;

        

            if(n1.rank < n2.rank)

            {

                n2.parent = n2;

                return n2;

            }

            else if(n2.rank < n1.rank)

            {

                n2.parent = n1;

                return n1;

            }

            else

            {

                n2.parent = n1;

                n1.rank += 1;

                return n1;

            }

        }

    };

})();



var groupHash = {name: {}, phone: {}, email: {}}

var groupNodes = 



data.forEach(function(contact){

  var group = UNIONFIND.makeset(contact.id);

  var groups = new Set();

  ["name", "phone", "email"].forEach(function(attr){

    if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])

  });

  

  groups = Array.from(groups);

  groups.push(group);

  groupNodes.push(group);

  

  for(var i = 1; i < groups.length; i++) {

    UNIONFIND.combine(groups[0], groups[i]);

  }  

  

  ["name", "phone", "email"].forEach(function(attr){

      groupHash[attr][contact[attr]] = groups[0];

  });

  

})



var contactsInGroup = {}





groupNodes.forEach(function(group){

    var groupId = UNIONFIND.find(group).id;

    

    if (contactsInGroup.hasOwnProperty(groupId) == false) {

      contactsInGroup[groupId] = ;

    }

    

    contactsInGroup[groupId].push(group.id);

})



var result = Object.values(contactsInGroup).filter(function(list){

 return list.length > 1

})



console.log(result)

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



// UNION-FIND structure, with path comression and union by rank



var UNIONFIND = (function () {

    

    function _find(n)

    {

        if(n.parent == n) return n;	

        n.parent = _find(n.parent);	

        return n.parent;

    }

    

    return {

        makeset:function(id){    

            var newnode = {

                parent: null,

                id: id,

                rank: 0

            };

            newnode.parent = newnode;            

            return newnode;

        },

    

        find: _find,

     

        combine: function(n1, n2) {                                    

            var n1 = _find(n1);

            var n2 = _find(n2);

            

            if (n1 == n2) return;

        

            if(n1.rank < n2.rank)

            {

                n2.parent = n2;

                return n2;

            }

            else if(n2.rank < n1.rank)

            {

                n2.parent = n1;

                return n1;

            }

            else

            {

                n2.parent = n1;

                n1.rank += 1;

                return n1;

            }

        }

    };

})();



var groupHash = {name: {}, phone: {}, email: {}}

var groupNodes = 



data.forEach(function(contact){

  var group = UNIONFIND.makeset(contact.id);

  var groups = new Set();

  ["name", "phone", "email"].forEach(function(attr){

    if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])

  });

  

  groups = Array.from(groups);

  groups.push(group);

  groupNodes.push(group);

  

  for(var i = 1; i < groups.length; i++) {

    UNIONFIND.combine(groups[0], groups[i]);

  }  

  

  ["name", "phone", "email"].forEach(function(attr){

      groupHash[attr][contact[attr]] = groups[0];

  });

  

})



var contactsInGroup = {}





groupNodes.forEach(function(group){

    var groupId = UNIONFIND.find(group).id;

    

    if (contactsInGroup.hasOwnProperty(groupId) == false) {

      contactsInGroup[groupId] = ;

    }

    

    contactsInGroup[groupId].push(group.id);

})



var result = Object.values(contactsInGroup).filter(function(list){

 return list.length > 1

})



console.log(result)

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



// UNION-FIND structure, with path comression and union by rank



var UNIONFIND = (function () {

    

    function _find(n)

    {

        if(n.parent == n) return n;	

        n.parent = _find(n.parent);	

        return n.parent;

    }

    

    return {

        makeset:function(id){    

            var newnode = {

                parent: null,

                id: id,

                rank: 0

            };

            newnode.parent = newnode;            

            return newnode;

        },

    

        find: _find,

     

        combine: function(n1, n2) {                                    

            var n1 = _find(n1);

            var n2 = _find(n2);

            

            if (n1 == n2) return;

        

            if(n1.rank < n2.rank)

            {

                n2.parent = n2;

                return n2;

            }

            else if(n2.rank < n1.rank)

            {

                n2.parent = n1;

                return n1;

            }

            else

            {

                n2.parent = n1;

                n1.rank += 1;

                return n1;

            }

        }

    };

})();



var groupHash = {name: {}, phone: {}, email: {}}

var groupNodes = 



data.forEach(function(contact){

  var group = UNIONFIND.makeset(contact.id);

  var groups = new Set();

  ["name", "phone", "email"].forEach(function(attr){

    if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])

  });

  

  groups = Array.from(groups);

  groups.push(group);

  groupNodes.push(group);

  

  for(var i = 1; i < groups.length; i++) {

    UNIONFIND.combine(groups[0], groups[i]);

  }  

  

  ["name", "phone", "email"].forEach(function(attr){

      groupHash[attr][contact[attr]] = groups[0];

  });

  

})



var contactsInGroup = {}





groupNodes.forEach(function(group){

    var groupId = UNIONFIND.find(group).id;

    

    if (contactsInGroup.hasOwnProperty(groupId) == false) {

      contactsInGroup[groupId] = ;

    }

    

    contactsInGroup[groupId].push(group.id);

})



var result = Object.values(contactsInGroup).filter(function(list){

 return list.length > 1

})



console.log(result)

edited Nov 20 at 14:59

answered Nov 20 at 14:30

juvian

13.2k22026

edited Nov 20 at 14:59

answered Nov 20 at 14:30

juvian

13.2k22026

answered Nov 20 at 14:30

juvian

13.2k22026

answered Nov 20 at 14:30

juvian

13.2k22026

Using Map instead of JS objects would allow you to avoid hasOwnProperty. You already use Set, which has a similar API and advantages.
– tucuxi
Nov 21 at 11:35

Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 at 11:39

1

@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 at 14:19

add a comment |

Using Map instead of JS objects would allow you to avoid hasOwnProperty. You already use Set, which has a similar API and advantages.
– tucuxi
Nov 21 at 11:35

Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 at 11:39

1

@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 at 14:19

Using Map instead of JS objects would allow you to avoid hasOwnProperty. You already use Set, which has a similar API and advantages.
– tucuxi
Nov 21 at 11:35

Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 at 11:39

@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 at 14:19

add a comment |

up vote
2
down vote

The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.

const data = [

    {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

    {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

    {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

    {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

    {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

    {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

    {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



const groups = function(inputs) {



    let valuesToGroups = new Map(

        ['name', 'phone', 'email'].map(key => [key, new Map()]));

    let groups = new Map();

    let pendingMerges = ;

    for (const entry of inputs) {

        let group = undefined;

        let found = ;

        for (const [key, valueMap] of valuesToGroups) {

            // look up value in values-index for current key

            group = valueMap.get(entry[key]);

            if (group !== undefined) {

                found.push(group);

                // not breaking allows groups to be merged

            }

        }

        if (found.length === 0) {

            // not found: create new group

            group = groups.size;

            groups.set(group, [entry.id]);

        } else {

            // found: add entry to chosen group

            group = found[0];

            groups.get(group).push(entry.id);

            if (found.length > 1) {

                pendingMerges.push(found);

            }

        }

        // add entry's values to index, pointing to group

        for (const [key, valueMap] of valuesToGroups) {

            valueMap.set(entry[key], group); 

        }        

    }

    // do pending merges; initially, all groups are stand-alone

    let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));

    for (const merge of pendingMerges) {

        // contents will go to the lowest-numbered group

        const sorted = merge.map(groupId => merges.get(groupId)).sort();

        sorted.forEach(groupId => merges.set(groupId, sorted[0]));

    }

    const cleanGroups = new Map();

    groups.forEach((value, key) => { 

        const k = merges.get(key); 

        if ( ! cleanGroups.has(k)) {

            cleanGroups.set(k, );

        }

        value.forEach(id => cleanGroups.get(k).push(id))

    })

    // return only non-empty groups

    return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);

}(data);



console.log(""+JSON.stringify(groups))

// output is [[1,2,3,5,6,7]]

answered Nov 21 at 11:25

tucuxi

11.5k22858

add a comment |

up vote
2
down vote

The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.

const data = [

    {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

    {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

    {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

    {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

    {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

    {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

    {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



const groups = function(inputs) {



    let valuesToGroups = new Map(

        ['name', 'phone', 'email'].map(key => [key, new Map()]));

    let groups = new Map();

    let pendingMerges = ;

    for (const entry of inputs) {

        let group = undefined;

        let found = ;

        for (const [key, valueMap] of valuesToGroups) {

            // look up value in values-index for current key

            group = valueMap.get(entry[key]);

            if (group !== undefined) {

                found.push(group);

                // not breaking allows groups to be merged

            }

        }

        if (found.length === 0) {

            // not found: create new group

            group = groups.size;

            groups.set(group, [entry.id]);

        } else {

            // found: add entry to chosen group

            group = found[0];

            groups.get(group).push(entry.id);

            if (found.length > 1) {

                pendingMerges.push(found);

            }

        }

        // add entry's values to index, pointing to group

        for (const [key, valueMap] of valuesToGroups) {

            valueMap.set(entry[key], group); 

        }        

    }

    // do pending merges; initially, all groups are stand-alone

    let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));

    for (const merge of pendingMerges) {

        // contents will go to the lowest-numbered group

        const sorted = merge.map(groupId => merges.get(groupId)).sort();

        sorted.forEach(groupId => merges.set(groupId, sorted[0]));

    }

    const cleanGroups = new Map();

    groups.forEach((value, key) => { 

        const k = merges.get(key); 

        if ( ! cleanGroups.has(k)) {

            cleanGroups.set(k, );

        }

        value.forEach(id => cleanGroups.get(k).push(id))

    })

    // return only non-empty groups

    return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);

}(data);



console.log(""+JSON.stringify(groups))

// output is [[1,2,3,5,6,7]]

answered Nov 21 at 11:25

tucuxi

11.5k22858

add a comment |

up vote
2
down vote

The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.

const data = [

    {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

    {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

    {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

    {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

    {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

    {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

    {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



const groups = function(inputs) {



    let valuesToGroups = new Map(

        ['name', 'phone', 'email'].map(key => [key, new Map()]));

    let groups = new Map();

    let pendingMerges = ;

    for (const entry of inputs) {

        let group = undefined;

        let found = ;

        for (const [key, valueMap] of valuesToGroups) {

            // look up value in values-index for current key

            group = valueMap.get(entry[key]);

            if (group !== undefined) {

                found.push(group);

                // not breaking allows groups to be merged

            }

        }

        if (found.length === 0) {

            // not found: create new group

            group = groups.size;

            groups.set(group, [entry.id]);

        } else {

            // found: add entry to chosen group

            group = found[0];

            groups.get(group).push(entry.id);

            if (found.length > 1) {

                pendingMerges.push(found);

            }

        }

        // add entry's values to index, pointing to group

        for (const [key, valueMap] of valuesToGroups) {

            valueMap.set(entry[key], group); 

        }        

    }

    // do pending merges; initially, all groups are stand-alone

    let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));

    for (const merge of pendingMerges) {

        // contents will go to the lowest-numbered group

        const sorted = merge.map(groupId => merges.get(groupId)).sort();

        sorted.forEach(groupId => merges.set(groupId, sorted[0]));

    }

    const cleanGroups = new Map();

    groups.forEach((value, key) => { 

        const k = merges.get(key); 

        if ( ! cleanGroups.has(k)) {

            cleanGroups.set(k, );

        }

        value.forEach(id => cleanGroups.get(k).push(id))

    })

    // return only non-empty groups

    return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);

}(data);



console.log(""+JSON.stringify(groups))

// output is [[1,2,3,5,6,7]]

answered Nov 21 at 11:25

tucuxi

11.5k22858

The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.

const data = [

    {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

    {id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},

    {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

    {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

    {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

    {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

    {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



const groups = function(inputs) {



    let valuesToGroups = new Map(

        ['name', 'phone', 'email'].map(key => [key, new Map()]));

    let groups = new Map();

    let pendingMerges = ;

    for (const entry of inputs) {

        let group = undefined;

        let found = ;

        for (const [key, valueMap] of valuesToGroups) {

            // look up value in values-index for current key

            group = valueMap.get(entry[key]);

            if (group !== undefined) {

                found.push(group);

                // not breaking allows groups to be merged

            }

        }

        if (found.length === 0) {

            // not found: create new group

            group = groups.size;

            groups.set(group, [entry.id]);

        } else {

            // found: add entry to chosen group

            group = found[0];

            groups.get(group).push(entry.id);

            if (found.length > 1) {

                pendingMerges.push(found);

            }

        }

        // add entry's values to index, pointing to group

        for (const [key, valueMap] of valuesToGroups) {

            valueMap.set(entry[key], group); 

        }        

    }

    // do pending merges; initially, all groups are stand-alone

    let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));

    for (const merge of pendingMerges) {

        // contents will go to the lowest-numbered group

        const sorted = merge.map(groupId => merges.get(groupId)).sort();

        sorted.forEach(groupId => merges.set(groupId, sorted[0]));

    }

    const cleanGroups = new Map();

    groups.forEach((value, key) => { 

        const k = merges.get(key); 

        if ( ! cleanGroups.has(k)) {

            cleanGroups.set(k, );

        }

        value.forEach(id => cleanGroups.get(k).push(id))

    })

    // return only non-empty groups

    return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);

}(data);



console.log(""+JSON.stringify(groups))

// output is [[1,2,3,5,6,7]]

answered Nov 21 at 11:25

tucuxi

11.5k22858

answered Nov 21 at 11:25

tucuxi

11.5k22858

answered Nov 21 at 11:25

tucuxi

11.5k22858

answered Nov 21 at 11:25

tucuxi

11.5k22858

add a comment |

up vote
0
down vote

This way you can easily compare the name/phone/email using Array.some + Array.includes (which is what the getGroupId function does).

Once you have grouped and have the almost final result just prettify it by removing the groups with length of one and picking only the ids array of the rest:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];



const getGroupId = (obj, vals) => Object.entries(obj)

   .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r, c) => {

   let values = Object.values(c), groupID = getGroupId(r, values)[0]

	

   if(!groupID)	

      r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })

   else {

      r[groupID] = ({

         vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]

      })

   }

   return r

}, {})



const prettify = grp => Object.values(grp).reduce((r,c) => {

   if(c.ids.length > 1)

     r.push(c.ids)

     return r

}, )



console.log(prettify(group(data)))

As per feedback here is another version which works slightly different:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];

var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }]; 



const getGroupId = (obj, vals) => Object.entries(obj)

  .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r,c,i,a) => {

  let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]



  if (!groupID) {		

    let hits = a.filter(x => 

       x.id != c.id && values.some(v => Object.values(x).includes(v)))

    hits.forEach(h => 

       r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))

  }

  else

    r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] : 

      ({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })      

  return r

}, {})



const prettify = grp => Object.values(grp).reduce((r, c) => {

  if (c.ids.length > 1)

    r.push(c.ids)

  return r

}, )



console.log(prettify(group(data)))      // OP data

console.log(prettify(group(testData)))  // Test data

To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.

edited Nov 21 at 7:50

answered Nov 21 at 4:22

Akrion

9,15911224

I would expect these to all be in one group: {id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}, but it seems to leave 2 out.
– Mark Meyer
Nov 21 at 5:47

I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 at 7:50

The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in O(1). Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 at 11:29

In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 at 19:31

add a comment |

up vote
0
down vote

This way you can easily compare the name/phone/email using Array.some + Array.includes (which is what the getGroupId function does).

Once you have grouped and have the almost final result just prettify it by removing the groups with length of one and picking only the ids array of the rest:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];



const getGroupId = (obj, vals) => Object.entries(obj)

   .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r, c) => {

   let values = Object.values(c), groupID = getGroupId(r, values)[0]

	

   if(!groupID)	

      r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })

   else {

      r[groupID] = ({

         vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]

      })

   }

   return r

}, {})



const prettify = grp => Object.values(grp).reduce((r,c) => {

   if(c.ids.length > 1)

     r.push(c.ids)

     return r

}, )



console.log(prettify(group(data)))

As per feedback here is another version which works slightly different:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];

var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }]; 



const getGroupId = (obj, vals) => Object.entries(obj)

  .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r,c,i,a) => {

  let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]



  if (!groupID) {		

    let hits = a.filter(x => 

       x.id != c.id && values.some(v => Object.values(x).includes(v)))

    hits.forEach(h => 

       r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))

  }

  else

    r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] : 

      ({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })      

  return r

}, {})



const prettify = grp => Object.values(grp).reduce((r, c) => {

  if (c.ids.length > 1)

    r.push(c.ids)

  return r

}, )



console.log(prettify(group(data)))      // OP data

console.log(prettify(group(testData)))  // Test data

To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.

edited Nov 21 at 7:50

answered Nov 21 at 4:22

Akrion

9,15911224

I would expect these to all be in one group: {id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}, but it seems to leave 2 out.
– Mark Meyer
Nov 21 at 5:47

I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 at 7:50

The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in O(1). Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 at 11:29

In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 at 19:31

add a comment |

up vote
0
down vote

This way you can easily compare the name/phone/email using Array.some + Array.includes (which is what the getGroupId function does).

Once you have grouped and have the almost final result just prettify it by removing the groups with length of one and picking only the ids array of the rest:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];



const getGroupId = (obj, vals) => Object.entries(obj)

   .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r, c) => {

   let values = Object.values(c), groupID = getGroupId(r, values)[0]

	

   if(!groupID)	

      r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })

   else {

      r[groupID] = ({

         vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]

      })

   }

   return r

}, {})



const prettify = grp => Object.values(grp).reduce((r,c) => {

   if(c.ids.length > 1)

     r.push(c.ids)

     return r

}, )



console.log(prettify(group(data)))

As per feedback here is another version which works slightly different:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];

var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }]; 



const getGroupId = (obj, vals) => Object.entries(obj)

  .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r,c,i,a) => {

  let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]



  if (!groupID) {		

    let hits = a.filter(x => 

       x.id != c.id && values.some(v => Object.values(x).includes(v)))

    hits.forEach(h => 

       r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))

  }

  else

    r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] : 

      ({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })      

  return r

}, {})



const prettify = grp => Object.values(grp).reduce((r, c) => {

  if (c.ids.length > 1)

    r.push(c.ids)

  return r

}, )



console.log(prettify(group(data)))      // OP data

console.log(prettify(group(testData)))  // Test data

To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.

edited Nov 21 at 7:50

answered Nov 21 at 4:22

Akrion

9,15911224

This way you can easily compare the name/phone/email using Array.some + Array.includes (which is what the getGroupId function does).

Once you have grouped and have the almost final result just prettify it by removing the groups with length of one and picking only the ids array of the rest:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];



const getGroupId = (obj, vals) => Object.entries(obj)

   .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r, c) => {

   let values = Object.values(c), groupID = getGroupId(r, values)[0]

	

   if(!groupID)	

      r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })

   else {

      r[groupID] = ({

         vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]

      })

   }

   return r

}, {})



const prettify = grp => Object.values(grp).reduce((r,c) => {

   if(c.ids.length > 1)

     r.push(c.ids)

     return r

}, )



console.log(prettify(group(data)))

As per feedback here is another version which works slightly different:

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];

var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }]; 



const getGroupId = (obj, vals) => Object.entries(obj)

  .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r,c,i,a) => {

  let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]



  if (!groupID) {		

    let hits = a.filter(x => 

       x.id != c.id && values.some(v => Object.values(x).includes(v)))

    hits.forEach(h => 

       r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))

  }

  else

    r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] : 

      ({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })      

  return r

}, {})



const prettify = grp => Object.values(grp).reduce((r, c) => {

  if (c.ids.length > 1)

    r.push(c.ids)

  return r

}, )



console.log(prettify(group(data)))      // OP data

console.log(prettify(group(testData)))  // Test data

To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];



const getGroupId = (obj, vals) => Object.entries(obj)

   .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r, c) => {

   let values = Object.values(c), groupID = getGroupId(r, values)[0]

	

   if(!groupID)	

      r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })

   else {

      r[groupID] = ({

         vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]

      })

   }

   return r

}, {})



const prettify = grp => Object.values(grp).reduce((r,c) => {

   if(c.ids.length > 1)

     r.push(c.ids)

     return r

}, )



console.log(prettify(group(data)))

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];



const getGroupId = (obj, vals) => Object.entries(obj)

   .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r, c) => {

   let values = Object.values(c), groupID = getGroupId(r, values)[0]

	

   if(!groupID)	

      r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })

   else {

      r[groupID] = ({

         vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]

      })

   }

   return r

}, {})



const prettify = grp => Object.values(grp).reduce((r,c) => {

   if(c.ids.length > 1)

     r.push(c.ids)

     return r

}, )



console.log(prettify(group(data)))

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];

var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }]; 



const getGroupId = (obj, vals) => Object.entries(obj)

  .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r,c,i,a) => {

  let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]



  if (!groupID) {		

    let hits = a.filter(x => 

       x.id != c.id && values.some(v => Object.values(x).includes(v)))

    hits.forEach(h => 

       r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))

  }

  else

    r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] : 

      ({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })      

  return r

}, {})



const prettify = grp => Object.values(grp).reduce((r, c) => {

  if (c.ids.length > 1)

    r.push(c.ids)

  return r

}, )



console.log(prettify(group(data)))      // OP data

console.log(prettify(group(testData)))  // Test data

var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];

var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }]; 



const getGroupId = (obj, vals) => Object.entries(obj)

  .find(([k,v]) => v.vls.some(x => vals.includes(x))) || 



const group = d => d.reduce((r,c,i,a) => {

  let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]



  if (!groupID) {		

    let hits = a.filter(x => 

       x.id != c.id && values.some(v => Object.values(x).includes(v)))

    hits.forEach(h => 

       r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))

  }

  else

    r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] : 

      ({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })      

  return r

}, {})



const prettify = grp => Object.values(grp).reduce((r, c) => {

  if (c.ids.length > 1)

    r.push(c.ids)

  return r

}, )



console.log(prettify(group(data)))      // OP data

console.log(prettify(group(testData)))  // Test data

edited Nov 21 at 7:50

answered Nov 21 at 4:22

Akrion

9,15911224

edited Nov 21 at 7:50

answered Nov 21 at 4:22

Akrion

9,15911224

answered Nov 21 at 4:22

Akrion

9,15911224

answered Nov 21 at 4:22

Akrion

9,15911224

I would expect these to all be in one group: {id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}, but it seems to leave 2 out.
– Mark Meyer
Nov 21 at 5:47

I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 at 7:50

The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in O(1). Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 at 11:29

In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 at 19:31

add a comment |

I would expect these to all be in one group: {id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}, but it seems to leave 2 out.
– Mark Meyer
Nov 21 at 5:47

I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 at 7:50

The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in O(1). Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 at 11:29

In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 at 19:31

I would expect these to all be in one group: {id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}, but it seems to leave 2 out.
– Mark Meyer
Nov 21 at 5:47

I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 at 7:50

The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in O(1). Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 at 11:29

In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 at 19:31

add a comment |

up vote
-1
down vote

One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names, phones and emails.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



var groups = ;



data.forEach(function(person){

  var phone = person.phone;

  var email = person.email;

  var name = person.name;

  var id = person.id;

  var found = false;

  groups.forEach(function(g){

    if(    g.names.indexOf(name) > -1 

        || g.phones.indexOf(phone)>-1 

        || g.emails.indexOf(email)>-1) {

      found = true;

      g.names.push(name);

      g.phones.push(phone);

      g.emails.push(email);

      g.people.push(id);

    }

  });

  if(!found) {

      groups.push({names:[name],phones:[phone],emails:[email],people:[id]});

  }

  

  

});

var output=;

groups.forEach(function(g){

  output.push(g.people);

});

console.log(output);   //[ [1,3,5] , [2,6,7] , [4] ]

answered Nov 20 at 9:51

Ahmad

8,11143463

2

O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 at 10:44

the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 at 13:47

Groups with only 1 element should not be part of the result as per OP. [4] should not be in the result.
– Akrion
Nov 21 at 6:11

add a comment |

up vote
-1
down vote

One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names, phones and emails.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



var groups = ;



data.forEach(function(person){

  var phone = person.phone;

  var email = person.email;

  var name = person.name;

  var id = person.id;

  var found = false;

  groups.forEach(function(g){

    if(    g.names.indexOf(name) > -1 

        || g.phones.indexOf(phone)>-1 

        || g.emails.indexOf(email)>-1) {

      found = true;

      g.names.push(name);

      g.phones.push(phone);

      g.emails.push(email);

      g.people.push(id);

    }

  });

  if(!found) {

      groups.push({names:[name],phones:[phone],emails:[email],people:[id]});

  }

  

  

});

var output=;

groups.forEach(function(g){

  output.push(g.people);

});

console.log(output);   //[ [1,3,5] , [2,6,7] , [4] ]

answered Nov 20 at 9:51

Ahmad

8,11143463

2

O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 at 10:44

the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 at 13:47

Groups with only 1 element should not be part of the result as per OP. [4] should not be in the result.
– Akrion
Nov 21 at 6:11

add a comment |

up vote
-1
down vote

One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names, phones and emails.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



var groups = ;



data.forEach(function(person){

  var phone = person.phone;

  var email = person.email;

  var name = person.name;

  var id = person.id;

  var found = false;

  groups.forEach(function(g){

    if(    g.names.indexOf(name) > -1 

        || g.phones.indexOf(phone)>-1 

        || g.emails.indexOf(email)>-1) {

      found = true;

      g.names.push(name);

      g.phones.push(phone);

      g.emails.push(email);

      g.people.push(id);

    }

  });

  if(!found) {

      groups.push({names:[name],phones:[phone],emails:[email],people:[id]});

  }

  

  

});

var output=;

groups.forEach(function(g){

  output.push(g.people);

});

console.log(output);   //[ [1,3,5] , [2,6,7] , [4] ]

answered Nov 20 at 9:51

Ahmad

8,11143463

One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names, phones and emails.

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



var groups = ;



data.forEach(function(person){

  var phone = person.phone;

  var email = person.email;

  var name = person.name;

  var id = person.id;

  var found = false;

  groups.forEach(function(g){

    if(    g.names.indexOf(name) > -1 

        || g.phones.indexOf(phone)>-1 

        || g.emails.indexOf(email)>-1) {

      found = true;

      g.names.push(name);

      g.phones.push(phone);

      g.emails.push(email);

      g.people.push(id);

    }

  });

  if(!found) {

      groups.push({names:[name],phones:[phone],emails:[email],people:[id]});

  }

  

  

});

var output=;

groups.forEach(function(g){

  output.push(g.people);

});

console.log(output);   //[ [1,3,5] , [2,6,7] , [4] ]

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



var groups = ;



data.forEach(function(person){

  var phone = person.phone;

  var email = person.email;

  var name = person.name;

  var id = person.id;

  var found = false;

  groups.forEach(function(g){

    if(    g.names.indexOf(name) > -1 

        || g.phones.indexOf(phone)>-1 

        || g.emails.indexOf(email)>-1) {

      found = true;

      g.names.push(name);

      g.phones.push(phone);

      g.emails.push(email);

      g.people.push(id);

    }

  });

  if(!found) {

      groups.push({names:[name],phones:[phone],emails:[email],people:[id]});

  }

  

  

});

var output=;

groups.forEach(function(g){

  output.push(g.people);

});

console.log(output);   //[ [1,3,5] , [2,6,7] , [4] ]

var data = [

      {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},

      {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},

      {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},

      {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},

      {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},

      {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},

      {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}

];



var groups = ;



data.forEach(function(person){

  var phone = person.phone;

  var email = person.email;

  var name = person.name;

  var id = person.id;

  var found = false;

  groups.forEach(function(g){

    if(    g.names.indexOf(name) > -1 

        || g.phones.indexOf(phone)>-1 

        || g.emails.indexOf(email)>-1) {

      found = true;

      g.names.push(name);

      g.phones.push(phone);

      g.emails.push(email);

      g.people.push(id);

    }

  });

  if(!found) {

      groups.push({names:[name],phones:[phone],emails:[email],people:[id]});

  }

  

  

});

var output=;

groups.forEach(function(g){

  output.push(g.people);

});

console.log(output);   //[ [1,3,5] , [2,6,7] , [4] ]

answered Nov 20 at 9:51

Ahmad

8,11143463

answered Nov 20 at 9:51

Ahmad

8,11143463

answered Nov 20 at 9:51

Ahmad

8,11143463

answered Nov 20 at 9:51

Ahmad

8,11143463

2

O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 at 10:44

the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 at 13:47

Groups with only 1 element should not be part of the result as per OP. [4] should not be in the result.
– Akrion
Nov 21 at 6:11

add a comment |

2

O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 at 10:44

the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 at 13:47

Groups with only 1 element should not be part of the result as per OP. [4] should not be in the result.
– Akrion
Nov 21 at 6:11

O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 at 10:44

the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 at 13:47

Groups with only 1 element should not be part of the result as per OP. [4] should not be in the result.
– Akrion
Nov 21 at 6:11

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tukukkk