Replace multiple strings in XML using a key->value pair in a CSV file
I have a dump from our application server that contains XML with multiple strings. I am interested in the userID, which is embedded in the XML tags in a format like lasfir1, as in the XML examples below:
<row>
<string></string>
<integer>2177</integer>
<string>assignee =lasfir1 </string>
<string>Firstname Lastname</string>
<integer>10</integer>
<string xsi:nil="true"/>
<integer>450</integer>
</row>
<row>
<string>#ffd600</string>
<integer>2199</integer>
<integer>23</integer>
<integer>474</integer>
<string>assignee</string>
<string>lasfir1</string>
</row>
<row>
<integer>1536</integer>
<string>lasfir1</string>
<integer>235</integer>
<string>USER</string>
</row>
<row>
<string>#ffd610</string>
<integer>2200</integer>
<integer>25</integer>
<integer>464</integer>
<string>assignee</string>
<string>lisfar1</string>
</row>
The requirement is to replace only the userID string (e.g. "lasfir1") with its equivalent Email ID. The Email IDs are in another CSV (text) file that pairs each Email ID with its userID (Email ID first, userID second):
FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1
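Each CSV line lists the Email ID first and the userID second. A lookup table keyed by userID therefore has to swap the columns. Below is a minimal Python sketch. It assumes the file has no header row and at least two comma-separated fields per line. load_user_emails is a hypothetical helper name.

```python
import csv
import io


def load_user_emails(lines):
    """Build {userID: email} from CSV rows of the form 'email,userID'.

    Blank or malformed lines (fewer than two fields) are skipped, and
    surrounding whitespace in each field is stripped.
    """
    mapping = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        email, user_id = row[0].strip(), row[1].strip()
        mapping[user_id] = email
    return mapping


sample = io.StringIO(
    "FirstName.LastName@abc.com,lasfir1\n"
    "FarstName.ListName@abc.com,lisfar1\n"
    "LastName.FirstName@abc.com,firlas1\n"
)
user_emails = load_user_emails(sample)
print(user_emails["lasfir1"])  # FirstName.LastName@abc.com
```

For a real file, pass open("users.csv", newline="", encoding="utf-8") instead of the StringIO sample. The file name there is only an example.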
The XML structure may vary, so the search has to be for the userID string itself, not for a pattern of what comes before or after it.
Is there a simple way to read the key->value pairs from the CSV file, check whether each key (userID) exists in the XML file, and then replace it with its value (Email ID)?
This is required for a set of 300+ userID and Email ID combinations, not all of which may appear in the XML.
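Searching for the bare string also matches it inside longer tokens. For example, a plain substring replace of "lasfir1" would also rewrite part of a hypothetical ID such as "lasfir12". The examples do not show whether real IDs can be prefixes of one another; that is an assumption here. If they can, one option is to reject matches that are directly flanked by letters or digits. A minimal Python sketch (replace_user_id is a hypothetical name):

```python
import re


def replace_user_id(text, user_id, email):
    """Replace user_id only where it is not part of a longer alphanumeric token."""
    # The lookbehind/lookahead reject matches glued to letters or digits, so
    # "lasfir1" matches in "assignee =lasfir1 " but not inside "lasfir12".
    pattern = r"(?<![A-Za-z0-9])" + re.escape(user_id) + r"(?![A-Za-z0-9])"
    # A function replacement keeps any backslashes in email literal.
    return re.sub(pattern, lambda m: email, text)


print(replace_user_id("<string>assignee =lasfir1 </string>",
                      "lasfir1", "FirstName.LastName@abc.com"))
# <string>assignee =FirstName.LastName@abc.com </string>
print(replace_user_id("<string>lasfir12</string>", "lasfir1", "x@abc.com"))
# <string>lasfir12</string>
```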
python regex xml perl csv
edited Nov 20 at 22:09
asked Nov 20 at 5:41
gagneet
12.2k256194
Are you looking for a programming-language-specific answer? Please state the language and framework you are using for the rest of the application.
– Rohit Nandi
Nov 20 at 7:18
I am not sure this needs to be language-specific, as there might be different solutions in different languages. I have tried sed and Perl, but it is a tedious process: reading each value from the CSV and then searching the 17M-line file again for each value is very resource-intensive.
– gagneet
Nov 20 at 22:09
There are 3 <string> tags, and we cannot tell which one holds the ID you are referring to.
– stack0114106
Nov 21 at 4:03
@stack0114106 The userID can appear anywhere in the file, and as I mentioned, I have not found any particular pattern to search for; the only option is to take each string from the CSV file and find it in the XML file.
– gagneet
Nov 21 at 4:36
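One way to avoid rescanning the 17M-line file once per userID is to compile all 300+ IDs into a single alternation and look each match up in a dictionary. That way the file is read exactly once. Below is a minimal Python sketch. It assumes plain substring matching and UTF-8 input. build_pattern and replace_all are hypothetical names.

```python
import io
import re


def build_pattern(mapping):
    """Compile one regex that matches any userID key in mapping.

    Longer IDs are listed first so that, at a given position, "lasfir12"
    wins over its prefix "lasfir1" (Python tries alternatives left to right).
    """
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


def replace_all(lines, mapping):
    """Yield each input line with every userID replaced by its email."""
    if not mapping:
        # An empty alternation would match the empty string everywhere.
        yield from lines
        return
    pattern = build_pattern(mapping)
    for line in lines:
        yield pattern.sub(lambda m: mapping[m.group(0)], line)


mapping = {"lasfir1": "FirstName.LastName@abc.com",
           "lisfar1": "FarstName.ListName@abc.com"}
xml = io.StringIO("<string>lasfir1</string>\n<string>lisfar1</string>\n")
print("".join(replace_all(xml, mapping)), end="")
```

For the real dump, pass an open file object and write each yielded line to the output file. Memory use then stays proportional to a single line, and each line is scanned once regardless of how many IDs there are.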
2 Answers
Check out this Perl one-liner solution:
$ cat gagneet.csv
FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1
$ cat gagneet.xml
<row>
<string></string>
<integer>2177</integer>
<string>assignee =lasfir1 </string>
<string>Firstname Lastname</string>
<integer>10</integer>
<string xsi:nil="true"/>
<integer>450</integer>
</row>
. . . .
. . . .
$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/${y}/$kv{$y}/gm; } print "$1$xml$3\n"; } exit } '
<row>
<string></string>
<integer>2177</integer>
<string>assignee =FirstName.LastName@abc.com </string>
<string>Firstname Lastname</string>
<integer>10</integer>
<string xsi:nil="true"/>
<integer>450</integer>
</row>
<row>
<string>#ffd600</string>
<integer>2199</integer>
<integer>23</integer>
<integer>474</integer>
<string>assignee</string>
<string>FirstName.LastName@abc.com</string>
</row>
<row>
<integer>1536</integer>
<string>FirstName.LastName@abc.com</string>
<integer>235</integer>
<string>USER</string>
</row>
<row>
<string>#ffd610</string>
<integer>2200</integer>
<integer>25</integer>
<integer>464</integer>
<string>assignee</string>
<string>FarstName.ListName@abc.com</string>
</row>
If you want to replace the userID only when it is the entire content of a <string> element, then:
$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/<string>${y}<\/string>/<string>$kv{$y}<\/string>/gm; } print "$1$xml$3\n"; } exit } '
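The second variant rewrites only elements whose entire text is a userID. The same idea can be expressed with one regex and a dictionary lookup, instead of one substitution per key. Below is a minimal Python sketch. It assumes the elements are written exactly as <string>ID</string>, with no attributes and no surrounding whitespace. replace_string_elements is a hypothetical name.

```python
import re

# Matches a <string>...</string> element whose text contains no '<'.
STRING_ELEMENT = re.compile(r"<string>([^<]*)</string>")


def replace_string_elements(text, mapping):
    """Swap <string>ID</string> for <string>EMAIL</string> when ID is a known userID.

    Elements whose text is not exactly a key in mapping are left untouched,
    so "<string>assignee =lasfir1 </string>" is NOT rewritten.
    """
    def swap(match):
        value = match.group(1)
        if value in mapping:
            return "<string>" + mapping[value] + "</string>"
        return match.group(0)

    return STRING_ELEMENT.sub(swap, text)


mapping = {"lasfir1": "FirstName.LastName@abc.com"}
row = "<string>assignee</string>\n<string>lasfir1</string>\n"
print(replace_string_elements(row, mapping), end="")
```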
edited Nov 21 at 6:08
answered Nov 21 at 5:57
stack0114106
1,6521416
When I run this script, it sometimes gives me the error "The specified path is invalid." The command is: perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat users.csv) ; $content=qx(cat entities_email.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/${y}/$kv{$y}/gm; } print "$1$xml$3\n"; } exit }, and both files are in the folder in which I am running the command. This is on the Windows 10 command line.
– gagneet
Nov 27 at 22:14
Can you share a sample of the XML?
– stack0114106
Nov 28 at 3:31
it is a 1.7GB file. :-(
– gagneet
Nov 30 at 3:45
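The one-liner loads the whole XML into memory through qx(cat ...), which is heavy for a 1.7 GB file. Also, cat is not a built-in command of the Windows cmd.exe shell, and cmd.exe does not treat single quotes as quoting characters. Either of these may be involved in the error, but that is not confirmed. A portable alternative is to open both files directly and stream the XML line by line. Below is a minimal Python sketch; convert_file is a hypothetical name.

```python
import csv
import re


def convert_file(csv_path, xml_path, out_path):
    """Stream xml_path to out_path, replacing every userID listed in csv_path."""
    # CSV rows are "email,userID"; key the lookup table by userID.
    with open(csv_path, newline="", encoding="utf-8") as f:
        mapping = {row[1].strip(): row[0].strip()
                   for row in csv.reader(f) if len(row) >= 2}
    if not mapping:
        raise ValueError("no userID/email pairs found in " + csv_path)
    # One alternation, longest IDs first, so each line is scanned only once.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # newline="" keeps the original line endings byte-for-byte.
    with open(xml_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # only one line is held in memory at a time
            dst.write(pattern.sub(lambda m: mapping[m.group(0)], line))


# Example (input names from the comment above; the output name is hypothetical):
# convert_file("users.csv", "entities_email.xml", "entities_email_out.xml")
```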
I created a Python 3 script that takes the CSV and XML files as input and writes an XML file with the changes. The command is:
python xml_converter.py --csvfile file.csv --xmlfile file.xml --outfile output_file.xml
It is not as optimized as I would like and runs on a single thread; it also assumes that both files are UTF-8 encoded.
usage: Replace username to user email of a given xml file
[-h] --csvfile CSVFILE --xmlfile XMLFILE --outfile OUTFILE
optional arguments:
-h, --help show this help message and exit
--csvfile CSVFILE csv file that provides user name and email pairs
--xmlfile XMLFILE xml file to be searched and replaced
--outfile OUTFILE output file name
The basic script is:
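import argparse
import csv


def parse_args():
    # parse_args() was not shown in the answer; this version is an assumption
    # based on the usage text above.
    parser = argparse.ArgumentParser(
        prog='Replace username to user email of a given xml file')
    parser.add_argument('--csvfile', required=True,
                        help='csv file that provides user name and email pairs')
    parser.add_argument('--xmlfile', required=True,
                        help='xml file to be searched and replaced')
    parser.add_argument('--outfile', required=True, help='output file name')
    args = parser.parse_args()
    return args.csvfile, args.xmlfile, args.outfile


class XMLConvert:
    def __init__(self, csv_path, xml_path, out_path):
        self._csv = csv_path
        self._xml = xml_path
        self._out = out_path
        self._kv_dict = self.prepare_kv_dict()

    def prepare_kv_dict(self):
        # Each CSV row is "email,userID", so key the dict by userID (column 1)
        # and store the email (column 0).
        with open(self._csv, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            result = dict()
            for row in reader:
                if len(row) >= 2:
                    result[row[1].strip()] = row[0].strip()
            return result

    def convert(self):
        # Stream the XML line by line instead of loading it all into memory.
        with open(self._xml, 'r', encoding='utf-8') as f:
            for line in f:
                yield self.convert_line(line)

    def convert_line(self, line):
        # e.g. self._kv_dict = {'lasfir1': 'First.Name@abc.com'}
        for k, v in self._kv_dict.items():
            if k in line:
                # Returns after the first matching userID, so only one
                # userID per line is replaced.
                return line.replace(k, v)
        return line

    def start(self):
        with open(self._out, 'w', encoding='utf-8') as f:
            for line in self.convert():
                f.write(line)


if __name__ == '__main__':
    csv_file, xml_file, out_file = parse_args()
    converter = XMLConvert(csv_file, xml_file, out_file)
    converter.start()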
I am trying to add threads and modify the script accordingly to speed it up. If anyone has a better way, please let me know.
answered Nov 21 at 4:33
gagneet
The script above only replaces the first instance of the "string"; if there are more, it will need to be modified to handle them.
– gagneet
Nov 30 at 3:44
Are you looking for a programming-language-specific answer? Please state the language and framework you are using for the rest of the application.
– Rohit Nandi
Nov 20 at 7:18
I am not sure this needs to be language-specific, as there might be different solutions in different languages. I have tried sed and Perl, but it is a tedious process: reading each value from the CSV and then searching a 17M-line file multiple times is very resource-intensive.
– gagneet
Nov 20 at 22:09
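Another way to avoid searching the file once per CSV value is to walk each line's tokens once and look each token up in a dict, so the per-line cost depends on the number of tokens rather than on the 300+ IDs. This rests on an assumption drawn only from the samples in the question: that a userID is always a whole token delimited by whitespace, `=`, `/` or tag brackets (e.g. `<string>assignee =lasfir1 </string>`). `TOKEN` and `replace_tokens` are hypothetical names.

```python
import re

# Assumption: a userID never contains whitespace, '<', '>', '=' or '/',
# so every maximal run of other characters is a candidate token.
TOKEN = re.compile(r'[^\s<>=/]+')


def replace_tokens(line, mapping):
    """Replace each token that is exactly a known userID; keep all others unchanged."""
    # dict.get falls back to the original token, so unknown text passes through as-is.
    return TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)
```

Unlike a plain substring search, this leaves `lasfir10` alone when only `lasfir1` is in the CSV, but it will also miss an ID glued to other characters, so it only fits if the delimiter assumption holds for the real dump.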
There are 3 <string> tags, and we cannot tell which of them holds the ID you are referring to.
– stack0114106
Nov 21 at 4:03
@stack0114106 the userID can appear anywhere in the file, and as I mentioned, there is no particular pattern I have been able to search for, other than using each string from the CSV file to find the same string in the XML file.
– gagneet
Nov 21 at 4:36