
[Python] ValueError: Unexpected result of `train_function`

Discussion in 'Python' started by Stack, September 28, 2024 at 12:43.

  1. Stack

    Stack Participating Member

    I'm training a model to classify CAPTCHA labels, and I'm running into the following problem during model.fit():

    python3 train.py --width 128 --height 64 --length 4 --symbols symbols.txt --batch-size 32 --epochs 1 --output-model test --train-dataset training_data --validate-dataset validation_data

    Length of captcha symbols 36
    Metal device set to: Apple M3

    systemMemory: 16.00 GB
    maxCacheSize: 5.33 GB

    2024-09-28 15:26:54.529594: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2024-09-28 15:26:54.529683: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    input_shape: (64, 128, 3)
    Model: "model"
    __________________________________________________________________________________________________
    Layer (type) Output Shape Param # Connected to
    ==================================================================================================
    input_1 (InputLayer) [(None, 64, 128, 3)] 0 []

    conv2d (Conv2D) (None, 64, 128, 32) 896 ['input_1[0][0]']

    batch_normalization (BatchNormalization) (None, 64, 128, 32) 128 ['conv2d[0][0]']

    activation (Activation) (None, 64, 128, 32) 0 ['batch_normalization[0][0]']

    conv2d_1 (Conv2D) (None, 64, 128, 32) 9248 ['activation[0][0]']

    batch_normalization_1 (BatchNormalization) (None, 64, 128, 32) 128 ['conv2d_1[0][0]']

    activation_1 (Activation) (None, 64, 128, 32) 0 ['batch_normalization_1[0][0]']

    max_pooling2d (MaxPooling2D) (None, 32, 64, 32) 0 ['activation_1[0][0]']

    conv2d_2 (Conv2D) (None, 32, 64, 64) 18496 ['max_pooling2d[0][0]']

    batch_normalization_2 (BatchNormalization) (None, 32, 64, 64) 256 ['conv2d_2[0][0]']

    activation_2 (Activation) (None, 32, 64, 64) 0 ['batch_normalization_2[0][0]']

    conv2d_3 (Conv2D) (None, 32, 64, 64) 36928 ['activation_2[0][0]']

    batch_normalization_3 (BatchNormalization) (None, 32, 64, 64) 256 ['conv2d_3[0][0]']

    activation_3 (Activation) (None, 32, 64, 64) 0 ['batch_normalization_3[0][0]']

    max_pooling2d_1 (MaxPooling2D) (None, 16, 32, 64) 0 ['activation_3[0][0]']

    conv2d_4 (Conv2D) (None, 16, 32, 128) 73856 ['max_pooling2d_1[0][0]']

    batch_normalization_4 (BatchNormalization) (None, 16, 32, 128) 512 ['conv2d_4[0][0]']

    activation_4 (Activation) (None, 16, 32, 128) 0 ['batch_normalization_4[0][0]']

    conv2d_5 (Conv2D) (None, 16, 32, 128) 147584 ['activation_4[0][0]']

    batch_normalization_5 (BatchNormalization) (None, 16, 32, 128) 512 ['conv2d_5[0][0]']

    activation_5 (Activation) (None, 16, 32, 128) 0 ['batch_normalization_5[0][0]']

    max_pooling2d_2 (MaxPooling2D) (None, 8, 16, 128) 0 ['activation_5[0][0]']

    conv2d_6 (Conv2D) (None, 8, 16, 256) 295168 ['max_pooling2d_2[0][0]']

    batch_normalization_6 (BatchNormalization) (None, 8, 16, 256) 1024 ['conv2d_6[0][0]']

    activation_6 (Activation) (None, 8, 16, 256) 0 ['batch_normalization_6[0][0]']

    conv2d_7 (Conv2D) (None, 8, 16, 256) 590080 ['activation_6[0][0]']

    batch_normalization_7 (BatchNormalization) (None, 8, 16, 256) 1024 ['conv2d_7[0][0]']

    activation_7 (Activation) (None, 8, 16, 256) 0 ['batch_normalization_7[0][0]']

    max_pooling2d_3 (MaxPooling2D) (None, 4, 8, 256) 0 ['activation_7[0][0]']

    conv2d_8 (Conv2D) (None, 4, 8, 256) 590080 ['max_pooling2d_3[0][0]']

    batch_normalization_8 (BatchNormalization) (None, 4, 8, 256) 1024 ['conv2d_8[0][0]']

    activation_8 (Activation) (None, 4, 8, 256) 0 ['batch_normalization_8[0][0]']

    conv2d_9 (Conv2D) (None, 4, 8, 256) 590080 ['activation_8[0][0]']

    batch_normalization_9 (BatchNormalization) (None, 4, 8, 256) 1024 ['conv2d_9[0][0]']

    activation_9 (Activation) (None, 4, 8, 256) 0 ['batch_normalization_9[0][0]']

    max_pooling2d_4 (MaxPooling2D) (None, 2, 4, 256) 0 ['activation_9[0][0]']

    flatten (Flatten) (None, 2048) 0 ['max_pooling2d_4[0][0]']

    char_1 (Dense) (None, 36) 73764 ['flatten[0][0]']

    char_2 (Dense) (None, 36) 73764 ['flatten[0][0]']

    char_3 (Dense) (None, 36) 73764 ['flatten[0][0]']

    char_4 (Dense) (None, 36) 73764 ['flatten[0][0]']

    ==================================================================================================
    Total params: 2,653,360
    Trainable params: 2,650,416
    Non-trainable params: 2,944
    __________________________________________________________________________________________________
    Batch X shape: (32, 64, 128, 3)
    Batch y shape: [(32, 36), (32, 36), (32, 36), (32, 36)]
    Count 0 list(self.files.keys()) value ['N5R5', 'IZJO', 'I8CB', 'NZ9O']
    IZJO
    Count 1 list(self.files.keys()) value ['N5R5', 'I8CB', 'NZ9O']
    N5R5
    Count 2 list(self.files.keys()) value ['I8CB', 'NZ9O']
    I8CB
    Count 3 list(self.files.keys()) value ['NZ9O']
    NZ9O
    Count 4 list(self.files.keys()) value []
    2024-09-28 15:26:54.847798: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    Traceback (most recent call last):
    File "<>/train.py", line 194, in <module>
    main()
    File "<>/train.py", line 185, in main
    model.fit(training_data,
    File "/opt/anaconda3/envs/tf/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
    File "/opt/anaconda3/envs/tf/lib/python3.9/site-packages/keras/engine/training.py", line 1420, in fit
    raise ValueError('Unexpected result of `train_function` '
    ValueError: Unexpected result of `train_function` (Empty logs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.
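
    As the error text suggests, one way to get more detail is to force everything to run eagerly. A minimal sketch of that (my assumption: it would go near the top of main(), before the model is built and trained), in addition to the run_eagerly=True I already pass to compile() below:

    import tensorflow as tf

    # Debugging-only switch mentioned in the error message: run all
    # tf.functions (including Keras' train_function) eagerly so that, as the
    # message says, you get more information about where things went wrong.
    # This slows training down, so it should be removed afterwards.
    tf.config.run_functions_eagerly(True)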


    This is the corresponding code for training:

    import os
    import cv2
    import numpy
    import random
    import argparse
    import tensorflow as tf

    # Build a Keras model given some parameters
    def create_model(captcha_length, captcha_num_symbols, input_shape, model_depth=5, module_size=2):
        print(f"input_shape: {input_shape}")  # After loading the batch
        input_tensor = tf.keras.Input(input_shape)
        x = input_tensor
        for i, module_length in enumerate([module_size] * model_depth):
            for j in range(module_length):
                x = tf.keras.layers.Conv2D((32*2**min(i,3)), kernel_size=3, padding='same', kernel_initializer='he_uniform')(x)
                x = tf.keras.layers.BatchNormalization()(x)
                x = tf.keras.layers.Activation('relu')(x)
            x = tf.keras.layers.MaxPooling2D(2)(x)

        x = tf.keras.layers.Flatten()(x)
        x = [tf.keras.layers.Dense(captcha_num_symbols, activation='softmax', name='char_%d'%(i+1))(x) for i in range(captcha_length)]
        model = tf.keras.Model(inputs=input_tensor, outputs=x)

        return model

    class ImageSequence(tf.keras.utils.Sequence):
        def __init__(self, directory_name, batch_size, captcha_length, captcha_symbols, captcha_width, captcha_height):
            self.directory_name = directory_name
            self.batch_size = batch_size
            self.captcha_length = captcha_length
            self.captcha_symbols = captcha_symbols
            self.captcha_width = captcha_width
            self.captcha_height = captcha_height

            file_list = os.listdir(self.directory_name)
            self.files = dict(zip(map(lambda x: x.split('.')[0], file_list), file_list))
            self.used_files = []
            self.count = len(file_list)

        def __len__(self):
            return int(numpy.floor(self.count / self.batch_size))

        def __getitem__(self, idx):
            X = numpy.zeros((self.batch_size, self.captcha_height, self.captcha_width, 3), dtype=numpy.float32)
            y = [numpy.zeros((self.batch_size, len(self.captcha_symbols)), dtype=numpy.uint8) for i in range(self.captcha_length)]

            # Add print statements to verify the data
            print(f"Batch X shape: {X.shape}")
            print(f"Batch y shape: {[yi.shape for yi in y]}")

            for i in range(self.batch_size):
                print("Count", i, "list(self.files.keys()) value", list(self.files.keys()))
                if i == self.count:
                    break
                random_image_label = random.choice(list(self.files.keys()))
                print(random_image_label)
                random_image_file = self.files[random_image_label]

                # We've used this image now, so we can't repeat it in this iteration
                self.used_files.append(self.files.pop(random_image_label))

                # We have to scale the input pixel values to the range [0, 1] for
                # Keras so we divide by 255 since the image is 8-bit RGB
                raw_data = cv2.imread(os.path.join(self.directory_name, random_image_file))
                rgb_data = cv2.cvtColor(raw_data, cv2.COLOR_BGR2RGB)
                processed_data = numpy.array(rgb_data) / 255.0
                X = processed_data

                # We have a little hack here - we save captchas as TEXT_num.png if there is more than one captcha with the text "TEXT"
                # So the real label should have the "_num" stripped out.
                random_image_label = random_image_label.split('_')[0]
                if len(random_image_label) != self.captcha_length:
                    raise ValueError(f"Expected CAPTCHA length {self.captcha_length}, but got {len(random_image_label)} for image: {random_image_file}")

                for j, ch in enumerate(random_image_label):
                    symbol_index = self.captcha_symbols.find(ch)
                    if symbol_index == -1:
                        raise ValueError(f"Character '{ch}' in CAPTCHA not found in symbols: {self.captcha_symbols}")

                    y[j][i, :] = 0
                    y[j][i, self.captcha_symbols.find(ch)] = 1

            return X, y

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--width', help='Width of captcha image', type=int)
        parser.add_argument('--height', help='Height of captcha image', type=int)
        parser.add_argument('--length', help='Length of captchas in characters', type=int)
        parser.add_argument('--batch-size', help='How many images in training captcha batches', type=int)
        parser.add_argument('--train-dataset', help='Where to look for the training image dataset', type=str)
        parser.add_argument('--validate-dataset', help='Where to look for the validation image dataset', type=str)
        parser.add_argument('--output-model-name', help='Where to save the trained model', type=str)
        parser.add_argument('--input-model', help='Where to look for the input model to continue training', type=str)
        parser.add_argument('--epochs', help='How many training epochs to run', type=int)
        parser.add_argument('--symbols', help='File with the symbols to use in captchas', type=str)
        args = parser.parse_args()

        if args.width is None:
            print("Please specify the captcha image width")
            exit(1)

        if args.height is None:
            print("Please specify the captcha image height")
            exit(1)

        if args.length is None:
            print("Please specify the captcha length")
            exit(1)

        if args.batch_size is None:
            print("Please specify the training batch size")
            exit(1)

        if args.epochs is None:
            print("Please specify the number of training epochs to run")
            exit(1)

        if args.train_dataset is None:
            print("Please specify the path to the training data set")
            exit(1)

        if args.validate_dataset is None:
            print("Please specify the path to the validation data set")
            exit(1)

        if args.output_model_name is None:
            print("Please specify a name for the trained model")
            exit(1)

        if args.symbols is None:
            print("Please specify the captcha symbols file")
            exit(1)

        captcha_symbols = None
        with open(args.symbols) as symbols_file:
            captcha_symbols = symbols_file.readline()

        print("Length of captcha symbols", len(captcha_symbols))

        physical_devices = tf.config.experimental.list_physical_devices('GPU')
        assert len(physical_devices) > 0, "No GPU available!"
        tf.config.experimental.set_memory_growth(physical_devices[0], True)

        with tf.device('/device:GPU:0'):
        # with tf.device('/device:CPU:0'):
        # with tf.device('/device:XLA_CPU:0'):
            model = create_model(args.length, len(captcha_symbols), (args.height, args.width, 3))

            if args.input_model is not None:
                model.load_weights(args.input_model)

            model.compile(loss='sparse_categorical_crossentropy',
                          optimizer=tf.keras.optimizers.Adam(1e-3, amsgrad=True),
                          metrics=['accuracy'],
                          run_eagerly=True)

            model.summary()

            training_data = ImageSequence(args.train_dataset, args.batch_size, args.length, captcha_symbols, args.width, args.height)
            validation_data = ImageSequence(args.validate_dataset, args.batch_size, args.length, captcha_symbols, args.width, args.height)

            callbacks = [tf.keras.callbacks.EarlyStopping(patience=3),
                         tf.keras.callbacks.CSVLogger('log.csv'),
                         tf.keras.callbacks.ModelCheckpoint(args.output_model_name+'.keras', save_best_only=False)]

            with open(args.output_model_name+".json", "w") as json_file:
                json_file.write(model.to_json())

            try:
                model.fit(training_data,
                          validation_data=validation_data,
                          epochs=args.epochs,
                          verbose=1)
            except KeyboardInterrupt:
                print('KeyboardInterrupt caught, saving current weights as ' + args.output_model_name + '_resume.h5')
                model.save_weights(args.output_model_name + '_resume.h5')

    if __name__ == '__main__':
        main()


    Content of symbols.txt:

    ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789


    Why am I getting the ValueError? My input shape and output shape seem to be correct and as expected. Please let me know where I'm going wrong; I'm just beginning with TensorFlow and Keras, so forgive me if I have made a really stupid mistake. TIA!
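
    P.S. For completeness, here is a hypothetical snippet (not part of train.py) showing how I can probe the Sequence outside of fit(), reusing the ImageSequence class and the same arguments as in the command above:

    # Hypothetical debugging snippet: index the Sequence directly to see
    # exactly what model.fit() would be fed each step.
    symbols = open('symbols.txt').readline()
    seq = ImageSequence('training_data', 32, 4, symbols, 128, 64)
    print("Batches per epoch:", len(seq))  # fit() runs len(seq) batches per epoch
    if len(seq) > 0:
        X, y = seq[0]
        print("X:", X.shape, "y:", [yi.shape for yi in y])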

