Why does Keras halt at the first epoch when I attempt to train it using fit_generator?

I'm using Keras to fine-tune an existing VGG16 model, and I'm using fit_generator to train the last 4 layers. Here's the relevant code that I'm working with:

# Imports (not shown in the question; added so the snippet is self-contained)
from keras import models, layers, optimizers
from keras.applications import VGG16
from keras.preprocessing.image import ImageDataGenerator

# Assumed setup, not shown in the question: the 480x640 input size is inferred
# from the (15, 20, 512) VGG16 output in the summary below, and all but the
# last 4 layers of the convolutional base are frozen.
image_size1, image_size2 = 480, 640
vgg_conv = VGG16(weights='imagenet', include_top=False,
                 input_shape=(image_size1, image_size2, 3))
for layer in vgg_conv.layers[:-4]:
    layer.trainable = False

# Create the model
model = models.Sequential()

# Add the vgg convolutional base model
model.add(vgg_conv)

# Add new layers
model.add(layers.Flatten())
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(5, activation='softmax'))

# Show a summary of the model. Check the number of trainable params
model.summary()

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

# Change the batch size according to the system RAM
train_batchsize = 100
val_batchsize = 10

train_dir = 'training_data/train'
validation_dir = 'training_data/validation'

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(image_size1, image_size2),
    batch_size=train_batchsize,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(image_size1, image_size2),
    batch_size=val_batchsize,
    class_mode='categorical',
    shuffle=False)

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

# Train the model
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples/train_generator.batch_size,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples/validation_generator.batch_size,
    verbose=1)

The issue is that when I run my script, everything works fine until the actual training begins. At that point it gets stuck at Epoch 1/30:

Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 15, 20, 512)       14714688
_________________________________________________________________
flatten_1 (Flatten)          (None, 153600)            0
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              157287424
_________________________________________________________________
dropout_1 (Dropout)          (None, 1024)              0
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 5125
=================================================================
Total params: 172,007,237
Trainable params: 164,371,973
Non-trainable params: 7,635,264
_________________________________________________________________
Found 1989 images belonging to 5 classes.
Found 819 images belonging to 5 classes.
Epoch 1/30

This is no good, unfortunately. I looked around online, and I believe the problem is in using fit_generator: there are reports of the fit_generator code in Keras being buggy. However, most of the other people having trouble with epochs get stuck on later epochs (e.g. somebody wants to run 20 epochs and it halts on epoch 19/20).

How would I go about fixing this issue? This is my first time doing deep learning, so I'm incredibly confused and would appreciate any help. Do I just need to move to using model.fit()?

asked Nov 25 '18 at 20:52 by sjgandhi2312

Wrap steps_per_epoch=train_generator.samples/train_generator.batch_size and validation_steps=validation_generator.samples/validation_generator.batch_size in round() or int().

– Geeocode
Nov 25 '18 at 21:05
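To see what the comment is getting at, here is the arithmetic with the image counts that flow_from_directory reported. This is a small sketch assuming Python 3, where `/` on two integers always returns a float; the variable names are illustrative only. Note that int() truncates while round() goes to the nearest integer, so the two suggestions give different step counts here.

```python
# Counts and batch sizes taken from the question's output and code
train_samples, train_batch = 1989, 100   # "Found 1989 images belonging to 5 classes."
val_samples, val_batch = 819, 10         # "Found 819 images belonging to 5 classes."

# True division, as in the question's steps_per_epoch / validation_steps
train_steps_float = train_samples / train_batch
val_steps_float = val_samples / val_batch

print(type(train_steps_float).__name__, train_steps_float)  # float 19.89
print(int(train_steps_float), round(train_steps_float))     # 19 20
print(int(val_steps_float), round(val_steps_float))         # 81 82
```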

python tensorflow keras

1 Answer

You have to pass integers to fit_generator() for the steps_per_epoch and validation_steps parameters. For example, using floor division (//):

history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples//train_generator.batch_size,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples//validation_generator.batch_size,
    verbose=1)
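Floor division gives an integer, but it rounds down, so the last partial batch is not drawn within an epoch. If you want every image drawn once per epoch, ceiling division is an alternative. The sketch below uses a hypothetical helper, `steps_for`, with the counts from the question. As far as I know, `len(train_generator)` on a `flow_from_directory` iterator already returns this ceiling value, so `steps_per_epoch=len(train_generator)` may be worth checking in your Keras version.

```python
def steps_for(samples, batch_size):
    """Hypothetical helper: ceiling division, i.e. batches needed to draw every sample once."""
    return -(-samples // batch_size)

# Image counts and batch sizes reported in the question
train_samples, train_batch = 1989, 100
val_samples, val_batch = 819, 10

floor_train = train_samples // train_batch          # 19 steps, as in the answer's code
ceil_train = steps_for(train_samples, train_batch)  # 20 steps
floor_val = val_samples // val_batch                # 81 steps
ceil_val = steps_for(val_samples, val_batch)        # 82 steps

# Images not drawn within one epoch when floor division is used
print(train_samples - floor_train * train_batch)  # 89
print(val_samples - floor_val * val_batch)        # 9
```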

A second factor I can see is that your model has about 164M trainable parameters, which consume a lot of memory, particularly combined with a large batch size.
You should use lower-resolution images; note that in many cases they can give better results.
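The parameter count in the summary can be reproduced by hand, which shows why resolution matters so much here: almost all of the trainable weights sit in dense_1, whose size grows with the flattened VGG16 output. This is a rough sketch under stated assumptions: VGG16 downsamples by 2**5 = 32 through five max-pool layers, the trainable VGG16 layers are block5_conv1..3 (consistent with "the last 4 layers" and the summary totals), and tensors are float32. The 224x224 size is just an illustrative lower resolution, and the byte count covers only the input batch, not the much larger intermediate activations, so it is a lower bound rather than a measurement.

```python
# Rough size arithmetic for the model in the question (no Keras needed).
VGG16_CONV_PARAMS = 14_714_688                      # "vgg16 (Model)" row of the summary
BLOCK5_CONV_PARAMS = 3 * (3 * 3 * 512 * 512 + 512)  # block5_conv1..3, assumed trainable

def head_params(height, width, hidden=1024, classes=5):
    """Parameters of Flatten -> Dense(hidden) -> Dense(classes) on top of VGG16."""
    flat = (height // 32) * (width // 32) * 512     # flattened VGG16 feature map
    return (flat * hidden + hidden) + (hidden * classes + classes)

def input_batch_bytes(batch, height, width, channels=3, bytes_per_value=4):
    """Bytes for one float32 input batch (images only, no activations)."""
    return batch * height * width * channels * bytes_per_value

trainable_480x640 = BLOCK5_CONV_PARAMS + head_params(480, 640)
trainable_224x224 = BLOCK5_CONV_PARAMS + head_params(224, 224)

print(VGG16_CONV_PARAMS + head_params(480, 640))  # 172007237, "Total params"
print(trainable_480x640)                          # 164371973, "Trainable params"
print(trainable_224x224)                          # 32775685
print(input_batch_bytes(100, 480, 640))           # 368640000
print(input_batch_bytes(100, 224, 224))           # 60211200
```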

edited Nov 27 '18 at 22:51

answered Nov 25 '18 at 21:13 by Geeocode

I tried doing that but it didn’t work unfortunately. Might the issue be with train_generator?

– sjgandhi2312
Nov 25 '18 at 23:56

@sjgandhi2312 I don't think the issue is train_generator; I have worked with it a lot without problems. If you give me access to some test data and all the modules you use here, I'll try to solve the puzzle, if you want.

– Geeocode
Nov 26 '18 at 2:40
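One way to test whether the generator is the bottleneck is to time a few batches outside of fit_generator. The helper below (`time_batches` is a hypothetical name, not a Keras API) works with any Python iterator, including a `flow_from_directory` iterator such as `train_generator`. If even the first batch is very slow, the time is going into loading and augmenting the images; if batches come back quickly, the stall is more likely in the training step itself, which you could check by calling `model.train_on_batch(x, y)` on a single batch. The demo uses a stand-in iterator so the sketch runs without Keras.

```python
import time

def time_batches(batches, n=3):
    """Hypothetical helper: seconds spent pulling each of the first n items from an iterator."""
    durations = []
    iterator = iter(batches)
    for _ in range(n):
        start = time.perf_counter()
        try:
            next(iterator)
        except StopIteration:
            break  # fewer than n batches were available
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in iterator; in the question's script you would pass train_generator instead.
dummy_batches = (i for i in range(5))
print(time_batches(dummy_batches, n=3))
```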

That would be fantastic! How should I send you the code and the data?

– sjgandhi2312
Nov 26 '18 at 4:48

@sjgandhi2312 There are many cloud options, such as Google Drive, GitHub or Dropbox; you can share it with me via a link.

– Geeocode
Nov 26 '18 at 10:00

@sjgandhi2312 OK, I'll check it and get back to you soon.

– Geeocode
Nov 27 '18 at 14:33
