Why does Keras halt at the first epoch when I attempt to train it using fit_generator?



























I'm using Keras to fine-tune an existing VGG16 model, using fit_generator to train the last 4 layers. Here's the relevant code I'm working with:



# Create the model
model = models.Sequential()

# Add the vgg convolutional base model
model.add(vgg_conv)

# Add new layers
model.add(layers.Flatten())
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(5, activation='softmax'))

# Show a summary of the model. Check the number of trainable params
model.summary()

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

# Change the batch size according to the system RAM
train_batchsize = 100
val_batchsize = 10

train_dir = 'training_data/train'
validation_dir = 'training_data/validation'

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(image_size1, image_size2),
    batch_size=train_batchsize,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(image_size1, image_size2),
    batch_size=val_batchsize,
    class_mode='categorical',
    shuffle=False)

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

# Train the model
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples/train_generator.batch_size,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples/validation_generator.batch_size,
    verbose=1)


The issue is that when I run my script to train the model, it works fine until the actual training begins. Here, it gets stuck at epoch 1/30.



Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 15, 20, 512)       14714688
_________________________________________________________________
flatten_1 (Flatten)          (None, 153600)            0
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              157287424
_________________________________________________________________
dropout_1 (Dropout)          (None, 1024)              0
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 5125
=================================================================
Total params: 172,007,237
Trainable params: 164,371,973
Non-trainable params: 7,635,264
_________________________________________________________________
Found 1989 images belonging to 5 classes.
Found 819 images belonging to 5 classes.
Epoch 1/30


Unfortunately this is no good. I looked around online, and I believe the problem is in using fit_generator; there are reports of the fit_generator code in Keras being buggy. However, most of the other people with epoch issues get stuck on later epochs (e.g. someone runs it for 20 epochs and it halts on epoch 19/20), not the first one.



How would I go about fixing this issue? This is my first time doing deep learning so I'm incredibly confused and would appreciate any help. Do I just need to move to using model.fit()?

































  • Use round() or int() on steps_per_epoch=train_generator.samples/train_generator.batch_size and validation_steps=validation_generator.samples/validation_generator.batch_size

    – Geeocode
    Nov 25 '18 at 21:05
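To illustrate the comment with the sample counts from the question's log (1989 training and 819 validation images; everything else here is a minimal standalone sketch): plain `/` division produces floats, while fit_generator expects integer step counts.

```python
# Sample counts from the question's log; batch sizes from the script.
train_samples, train_batch = 1989, 100
val_samples, val_batch = 819, 10

# Plain division gives floats, not the integer step counts Keras expects.
print(train_samples / train_batch)       # 19.89
print(val_samples / val_batch)           # 81.9

# int() truncates toward zero; round() rounds to the nearest integer.
print(int(train_samples / train_batch))  # 19
print(round(val_samples / val_batch))    # 82
```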
















python tensorflow keras






asked Nov 25 '18 at 20:52









sjgandhi2312













1 Answer
You have to pass a valid integer to fit_generator() for the steps_per_epoch and validation_steps parameters.
So you can use integer (floor) division as follows:



history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size,
    verbose=1)
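Note that floor division skips each epoch's final partial batch (1989 // 100 = 19 steps covers only 1900 of the training images). A small sketch of a ceil-based alternative that keeps the partial batch (steps_for is a hypothetical helper, not part of the answer's code):

```python
import math

def steps_for(samples, batch_size):
    # Number of batches needed to cover every sample once per epoch,
    # counting the final partial batch.
    return math.ceil(samples / batch_size)

# Counts from the question's log:
print(steps_for(1989, 100))  # 20 (19 full batches + 1 partial batch of 89)
print(steps_for(819, 10))    # 82 (81 full batches + 1 partial batch of 9)
```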


The second factor I can see is that your model has ~165M trainable parameters, which consumes a huge amount of memory, particularly when coupled with a high batch size.
You should use images with lower resolution; note that in many cases we can get better results with them.
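The parameter counts in the summary bear this out: nearly all of the trainable weights sit in the first Dense layer fed by the flattened 15×20×512 feature map, so lowering the input resolution shrinks the model dramatically. A quick check of the printed counts (plain arithmetic, no Keras required):

```python
# Flattened VGG16 feature map from the summary: (15, 20, 512)
flat = 15 * 20 * 512
print(flat)  # 153600, matching flatten_1's output shape

# First Dense layer: a 153600 x 1024 weight matrix plus 1024 biases
dense1 = flat * 1024 + 1024
print(dense1)  # 157287424, matching dense_1's param count

# Second Dense layer: a 1024 x 5 weight matrix plus 5 biases
dense2 = 1024 * 5 + 5
print(dense2)  # 5125, matching dense_2's param count
```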
































  • I tried doing that but it didn’t work unfortunately. Might the issue be with train_generator?

    – sjgandhi2312
    Nov 25 '18 at 23:56











    @sjgandhi2312 I don't think the issue is the train generator, as I've worked with it a lot with no problem. If you give me minimal access to some test data and all the modules you use here, I will solve the puzzle, if you want.

    – Geeocode
    Nov 26 '18 at 2:40











  • That would be fantastic! How should I send you the code and the data?

    – sjgandhi2312
    Nov 26 '18 at 4:48











  • @sjgandhi2312 There are a lot of cloud possibilities, like google drive, github, dropbox etc. and others and you can share it via a link with me.

    – Geeocode
    Nov 26 '18 at 10:00













    @sjgandhi2312 OK, I'll check it and will be back soon.

    – Geeocode
    Nov 27 '18 at 14:33











edited Nov 27 '18 at 22:51

























answered Nov 25 '18 at 21:13









Geeocode
