How to count matches in a tokenized dataframe

I have a dataframe that contains 599 tokenized texts, one per row. I also have these lists:

growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']

I want to create a new column in my dataframe for each list, holding the number of times the words from that list occur in each text.

I tried to add them to my original dataframe (without tokenization), but this didn't work either. I had the following code:

%%time

growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']
the = 'Wire'

result_list = []

# The counters are initialised once, before the loop over the files
count_growth = 0
count_human = 0
count_technology = 0
count_customers = 0
count_intangibles = 0
count_synergies = 0
count_the = 0

for file in file_list:
    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]

    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')

    for word in text.split():
        if word in growth:
            count_growth = count_growth + 1
        if word in synergies:
            count_synergies = count_synergies + 1
        if word in intangibles:
            count_intangibles = count_intangibles + 1
        if word in customers:
            count_customers = count_customers + 1
        if word in technology:
            count_technology = count_technology + 1
        if word in human:
            count_human = count_human + 1
        if word == 'The':
            count_the = count_the + 1

    length = len(text.split())

    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date, 'File_type': type_1, 'length': length, 'growth': count_growth, 'synergies': count_synergies, 'intangibles': count_intangibles, 'customers': count_customers, 'technology': count_technology, 'human': count_human}
    result_list.append(a)

The problem is that this produces a running total across all files instead of a separate count for each row, whereas the length is computed separately for each row.

Thanks in advance for any solutions!

asked Nov 20 at 16:43

user10395806


What do you mean by "it creates a total sum but not a sum for each row as it does for length"? What would you like to see at the end?
– FMarazzi
Nov 20 at 16:49

At the end I would like to see, for example, in column customers: row 1: 5, row 2: 3, etc. Instead, it keeps the count from the previous row and simply adds the next row's count to it, so if there are 5 appearances in row 1 and 3 in row 2, the value in row 2 is 8. In addition, it counts the length for each text separately, e.g. row 1: 394, row 2: 569, and not row 2: 394 + 569.
– user10395806
Nov 20 at 16:52


python python-3.x pandas


1 Answer

accepted

You simply need to reset the counters inside the for loop, at the start of each iteration. That way the counts are computed per file, just as the length is.

I hope I understood correctly what you wanted to do.

Code below:

for file in file_list:
    # Reset every counter for each file so the counts do not accumulate
    count_growth = 0
    count_human = 0
    count_technology = 0
    count_customers = 0
    count_intangibles = 0
    count_synergies = 0
    count_the = 0

    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]

    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')

    for word in text.split():
        if word in growth:
            count_growth = count_growth + 1
        if word in synergies:
            count_synergies = count_synergies + 1
        if word in intangibles:
            count_intangibles = count_intangibles + 1
        if word in customers:
            count_customers = count_customers + 1
        if word in technology:
            count_technology = count_technology + 1
        if word in human:
            count_human = count_human + 1
        if word == 'The':
            count_the = count_the + 1

    length = len(text.split())

    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date, 'File_type': type_1, 'length': length, 'growth': count_growth, 'synergies': count_synergies, 'intangibles': count_intangibles, 'customers': count_customers, 'technology': count_technology, 'human': count_human}
    result_list.append(a)
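
As a side note, the same per-file reset can be had without seven separate counter variables. The following is only a minimal sketch, not the code from the question: the `CATEGORIES` mapping and the `count_categories` helper are hypothetical names, only two of the word lists are shown, and matching stays exact and case-sensitive on whitespace-split words, as in the loop above. Because the function builds a new dictionary on every call, nothing can carry over from one text to the next.

```python
from collections import Counter

# Hypothetical category map; the word lists are taken from the question.
CATEGORIES = {
    "growth": {"growth", "grow", "growing", "grows"},
    "customers": {"customer", "customers", "consumer", "consumers"},
}


def count_categories(text, categories=CATEGORIES):
    """Return a fresh {category: count} dict for one text."""
    # Count every whitespace-separated word once, then sum per category.
    word_counts = Counter(text.split())
    return {
        name: sum(word_counts[word] for word in words)
        for name, words in categories.items()
    }


# Two made-up texts standing in for the contents of two files.
texts = ["growth for every customer", "customers grow and consumers grow"]
rows = [count_categories(text) for text in texts]
print(rows)
```

Inside the original loop, one could presumably merge the returned dictionary into `a` before appending it to `result_list`.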

edited Nov 20 at 16:59

answered Nov 20 at 16:55

FMarazzi


Thanks, that solved it... so clumsy of me. But nevertheless, how would the approach work for the already tokenized text, if I don't want to integrate this into my original dataframe?
– user10395806
Nov 20 at 16:58

I think that should be a separate question. I am unsure what you are asking, and we should not discuss too much in the comment section. Please provide examples of the desired output when asking a question.
– FMarazzi
Nov 20 at 17:01
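
For the already tokenized case raised in the comment above, one possible reading is a dataframe with one list of tokens per row. The sketch below rests on that assumption; the column name `tokens`, the sample rows, and the `categories` mapping are hypothetical and not taken from the thread. It adds one count column per word list, computed row by row.

```python
import pandas as pd

# Hypothetical frame: one list of tokens per row in a "tokens" column.
df = pd.DataFrame({
    "tokens": [
        ["growth", "for", "every", "customer"],
        ["customers", "grow", "and", "consumers", "grow"],
    ]
})

# Two of the word lists from the question, stored as sets for fast lookup.
categories = {
    "growth": {"growth", "grow", "growing", "grows"},
    "customers": {"customer", "customers", "consumer", "consumers"},
}

for name, words in categories.items():
    # For each row, count how many of its tokens belong to this word list.
    df[name] = df["tokens"].apply(
        lambda tokens: sum(token in words for token in tokens)
    )

print(df[["growth", "customers"]])
```

Each count is derived from a single row's token list, so there is no shared counter that could accumulate across rows.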
