How to count matches in a tokenized dataframe











I have a dataframe that contains 599 tokenized texts, one per row. I also have these lists:



growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']


I want to create a new column in my dataframe for each list and count how often the words from that list occur in each text.



I tried to add the counts while building my original dataframe (without tokenization), but this didn't work either. I had the following code:



%%time

growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']
the = 'Wire'

result_list = []
# Counters are initialised once, before looping over the files
count_growth = 0
count_human = 0
count_technology = 0
count_customers = 0
count_intangibles = 0
count_synergies = 0
count_the = 0
for file in file_list:
    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]
    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')

    for word in text.split():
        if word in growth:
            count_growth = count_growth + 1
        if word in synergies:
            count_synergies = count_synergies + 1
        if word in intangibles:
            count_intangibles = count_intangibles + 1
        if word in customers:
            count_customers = count_customers + 1
        if word in technology:
            count_technology = count_technology + 1
        if word in human:
            count_human = count_human + 1
        if word == 'The':
            count_the = count_the + 1
    length = len(text.split())

    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date, 'File_type': type_1, 'length': length, 'growth': count_growth, 'synergies': count_synergies, 'intangibles': count_intangibles, 'customers': count_customers, 'technology': count_technology, 'human': count_human}
    result_list.append(a)


The problem is that it produces a running total across all files rather than a separate count for each row, as it does for length.



Thanks in advance for any solutions!










  • What do you mean by "it creates a total sum but not a sum for each row as it does for length"? What would you like to see at the end?
    – FMarazzi
    Nov 20 at 16:49










  • In the end I would like to see, for example, in column customers: row 1: 5, row 2: 3, etc. Instead, it carries the count from the previous row over and adds the next row's count to it, so if there are 5 appearances in row 1 and 3 in row 2, the value in row 2 is 8. In addition, it counts the length of each text separately, e.g. row 1: 394, row 2: 569, and not row 2: 394 + 569.
    – user10395806
    Nov 20 at 16:52

















python python-3.x pandas






asked Nov 20 at 16:43









user10395806

1 Answer
accepted










You simply need to reset the counter variables inside the for loop. That way the counts are computed separately for each file, as the length already is.



I hope I understood correctly what you wanted to do.



Code below:



# The word lists, file_list, input_path and result_list are defined as in the question
for file in file_list:
    # Reset every counter at the start of each file
    count_growth = 0
    count_human = 0
    count_technology = 0
    count_customers = 0
    count_intangibles = 0
    count_synergies = 0
    count_the = 0
    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]
    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')

    for word in text.split():
        if word in growth:
            count_growth = count_growth + 1
        if word in synergies:
            count_synergies = count_synergies + 1
        if word in intangibles:
            count_intangibles = count_intangibles + 1
        if word in customers:
            count_customers = count_customers + 1
        if word in technology:
            count_technology = count_technology + 1
        if word in human:
            count_human = count_human + 1
        if word == 'The':
            count_the = count_the + 1
    length = len(text.split())
    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date, 'File_type': type_1, 'length': length, 'growth': count_growth, 'synergies': count_synergies, 'intangibles': count_intangibles, 'customers': count_customers, 'technology': count_technology, 'human': count_human}
    result_list.append(a)
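
The same per-file reset can be sketched more compactly by keeping the word lists in a dictionary and computing the counts from scratch for every text. This is only an illustration: the function name `count_categories`, the `categories` dictionary, the sample sentences, and the shortened word lists are hypothetical and not part of the original code.

```python
from collections import Counter

# Hypothetical, shortened word lists; sets give fast membership tests.
categories = {
    "growth": {"growth", "grow", "growing", "grows"},
    "customers": {"customer", "customers", "consumer", "consumers"},
}


def count_categories(text, categories):
    """Return one count per category for a single text."""
    # Word frequencies are computed from scratch for every text,
    # so nothing carries over from one file to the next.
    word_counts = Counter(text.split())
    return {
        name: sum(word_counts[word] for word in words)
        for name, words in categories.items()
    }


row_1 = count_categories("customers grow and customers return", categories)
row_2 = count_categories("growth for every consumer", categories)
print(row_1)
print(row_2)
```

Each returned dictionary could then be merged into the per-file dictionary before it is appended to `result_list`.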





  • Thanks, that solved it... so clumsy of me. But nevertheless, how would the approach work for the already tokenized text, if I don't want to integrate this into my original dataframe?
    – user10395806
    Nov 20 at 16:58










  • I think that should be another question. I am unsure what you are asking, and we should not discuss too much in the comment section. Please provide examples of the desired output when asking a question.
    – FMarazzi
    Nov 20 at 17:01
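
For texts that are already tokenized, one possible approach is to apply a counting function to the token column once per word list. The sketch below assumes the dataframe has a column named `tokens` holding a list of strings per row. That column name, the sample rows, and the shortened word lists are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data: one tokenized text per row.
df = pd.DataFrame({
    "tokens": [
        ["customers", "grow", "and", "customers", "return"],
        ["growth", "for", "every", "consumer"],
    ]
})

# Shortened, hypothetical word lists stored as sets.
categories = {
    "growth": {"growth", "grow", "growing", "grows"},
    "customers": {"customer", "customers", "consumer", "consumers"},
}

# One new column per word list; each row is counted independently.
for name, words in categories.items():
    df[name] = df["tokens"].apply(
        lambda tokens, words=words: sum(token in words for token in tokens)
    )

print(df[["growth", "customers"]])
```

The matching here is case-sensitive, as in the loop-based code, so tokens may need lowercasing first if the texts contain capitalised words.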













edited Nov 20 at 16:59

























answered Nov 20 at 16:55









FMarazzi
