How to convert token list into wordnet lemma list using nltk?

I have a list of tokens extracted from a PDF source. I am able to preprocess the text and tokenize it, but I want to loop through the tokens and convert each token in the list to its lemma in the WordNet corpus. My token list looks like this:



['0000', 'Everyone', 'age', 'remembers', 'Þ', 'rst', 'heard', 'contest', 'I', 'sitting', 'hideout', 'watching', ...]


There are no lemmas for words like 'Everyone', '0000', 'Þ' and many more, which I need to eliminate. But for words like 'age', 'remembers', 'heard', etc., the token list is supposed to look like:



['age', 'remember', 'hear', ...]


I am checking the synonyms with this code:



syns = wn.synsets("heard")
print(syns[0].lemmas()[0].name())
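# For "heard", this should print 'hear' (the first lemma name of the first synset).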


So far I have created the function clean_text() in Python for preprocessing. It looks like this:



def clean_text(text):
    # Eliminating punctuations
    text = "".join([word for word in text if word not in string.punctuation])
    # tokenizing
    tokens = re.split(r"\W+", text)
    # lemmatizing and removing stopwords
    text = [wn.lemmatize(word) for word in tokens if word not in stopwords]
    # converting token list into synset
    syns = [text.lemmas()[0].name() for text in wn.synsets(text)]
    return text


I am getting this error:



syns = [text.lemmas()[0].name() for text in wn.synsets(text)]
AttributeError: 'list' object has no attribute 'lower'


How do I get the lemma list for the tokens?



The full code:



import string
import re
from wordcloud import WordCloud
import nltk
from nltk.tokenize.treebank import TreebankWordDetokenizer
from nltk.corpus import wordnet
import PyPDF4
import matplotlib
import numpy as np
from PIL import Image

stopwords = nltk.corpus.stopwords.words('english')
moreStopwords = ['clin97803078874365pallr1indd']  # additional stopwords to be removed manually.
wn = nltk.WordNetLemmatizer()

data = PyPDF4.PdfFileReader(open('ReadyPlayerOne.pdf', 'rb'))
pageData = ''
for page in data.pages:
    pageData += page.extractText()
# print(pageData)


def clean_text(text):
    text = "".join([word for word in text if word not in string.punctuation])
    tokens = re.split(r"\W+", text)
    text = [wn.lemmatize(word) for word in tokens if word not in stopwords]
    syns = [text.lemmas()[0].name() for text in wordnet.synsets(text)]
    return syns


print(clean_text(pageData))

Tags: python, nltk, wordnet

asked Nov 21 '18 at 16:41 by Tony

  • You should check your imports: in nltk, wordnet might refer to different objects, and some of them do not have a synsets attribute.
    – BlueSheepToken
    Nov 21 '18 at 16:48

  • Oh, yes, I was importing wordnet as wn and assigning the WordNetLemmatizer to the wn variable too. But now I am getting this error: AttributeError: 'list' object has no attribute 'lower'
    – Tony
    Nov 21 '18 at 16:52

  • Nice, you should edit your post; I might not be able to help you there.
    – BlueSheepToken
    Nov 21 '18 at 16:54

  • Hi Tony, it's best to create a Minimal, Complete, and Verifiable example. See here: stackoverflow.com/help/mcve. This means we should be able to copy/paste your code and run it in a REPL so that we can confirm the error you're seeing, and (ideally) point you in the right direction for a fix. Good luck!
    – Matt Messersmith
    Nov 21 '18 at 16:56

  • Done, thanks @Matt
    – Tony
    Nov 21 '18 at 17:04

1 Answer

You are calling wordnet.synsets(text) with a list of words (check what text is at that point) when you should call it with a single word.
The preprocessing in wordnet.synsets tries to apply .lower() to its argument, hence the error (AttributeError: 'list' object has no attribute 'lower').
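
A minimal sketch of the failure, assuming the standard from nltk.corpus import wordnet import (this snippet is not part of the original code):

from nltk.corpus import wordnet

wordnet.synsets("heard")    # fine: synsets() lowercases the single word internally
wordnet.synsets(["heard"])  # AttributeError: 'list' object has no attribute 'lower'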



Below is a working version of clean_text with a fix for this problem:



import string
import re
import nltk
from nltk.corpus import wordnet

stopwords = nltk.corpus.stopwords.words('english')
wn = nltk.WordNetLemmatizer()

def clean_text(text):
    # Strip punctuation characters
    text = "".join([word for word in text if word not in string.punctuation])
    # Tokenize on runs of non-word characters
    tokens = re.split(r"\W+", text)
    # Lemmatize and drop stopwords
    text = [wn.lemmatize(word) for word in tokens if word not in stopwords]
    # Look words up one at a time: wordnet.synsets() expects a single word
    lemmas = []
    for token in text:
        lemmas += [synset.lemmas()[0].name() for synset in wordnet.synsets(token)]
    return lemmas


text = "The grass was greener."

print(clean_text(text))


Returns:



['grass', 'Grass', 'supergrass', 'eatage', 'pot', 'grass', 'grass', 'grass', 'grass', 'grass', 'denounce', 'green', 'green', 'green', 'green', 'fleeceable']
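
Note that tokens with no WordNet synsets at all (such as '0000' or 'Þ' from the question) contribute nothing to lemmas, so they are eliminated as a side effect.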

answered Nov 21 '18 at 17:24 by Julian Peller

  • Hey Julian, is there a way to avoid the repetition of words like grass, which is repeated so many times?
    – Tony
    Nov 21 '18 at 17:37

  • @Tony sure, use a set instead of a list.
    – Matt Messersmith
    Nov 21 '18 at 17:49

  • @Tony, you can use a set as @Matt Messersmith suggested to remove duplicates from your final list. Replace return lemmas with return list(set(lemmas)).
    – Julian Peller
    Nov 21 '18 at 17:57
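
As a sketch of that set suggestion (not from the original thread): list(set(lemmas)) removes duplicates but loses the original order, while dict.fromkeys preserves first-seen order (Python 3.7+):

lemmas = ['grass', 'Grass', 'supergrass', 'grass', 'green', 'green']

print(list(set(lemmas)))             # unordered, e.g. ['green', 'Grass', 'supergrass', 'grass']
print(list(dict.fromkeys(lemmas)))   # ['grass', 'Grass', 'supergrass', 'green']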