
[Python] ValueError: Unexpected result of `train_function`

Discussion in 'Python' started by Stack, September 28, 2024 at 12:43.

  1. Stack

    Stack Participating Member

    I'm training a model to classify CAPTCHA labels, and I'm running into the following problem during model.fit():

    python3 train.py --width 128 --height 64 --length 4 --symbols symbols.txt --batch-size 32 --epochs 1 --output-model test --train-dataset training_data --validate-dataset validation_data

    Length of captcha symbols 36
    Metal device set to: Apple M3

    systemMemory: 16.00 GB
    maxCacheSize: 5.33 GB

    2024-09-28 15:26:54.529594: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2024-09-28 15:26:54.529683: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    input_shape: (64, 128, 3)
    Model: "model"
    __________________________________________________________________________________________________
    Layer (type) Output Shape Param # Connected to
    ==================================================================================================
    input_1 (InputLayer) [(None, 64, 128, 3)] 0 []

    conv2d (Conv2D) (None, 64, 128, 32) 896 ['input_1[0][0]']

    batch_normalization (BatchNormalization) (None, 64, 128, 32) 128 ['conv2d[0][0]']

    activation (Activation) (None, 64, 128, 32) 0 ['batch_normalization[0][0]']

    conv2d_1 (Conv2D) (None, 64, 128, 32) 9248 ['activation[0][0]']

    batch_normalization_1 (BatchNormalization) (None, 64, 128, 32) 128 ['conv2d_1[0][0]']

    activation_1 (Activation) (None, 64, 128, 32) 0 ['batch_normalization_1[0][0]']

    max_pooling2d (MaxPooling2D) (None, 32, 64, 32) 0 ['activation_1[0][0]']

    conv2d_2 (Conv2D) (None, 32, 64, 64) 18496 ['max_pooling2d[0][0]']

    batch_normalization_2 (BatchNormalization) (None, 32, 64, 64) 256 ['conv2d_2[0][0]']

    activation_2 (Activation) (None, 32, 64, 64) 0 ['batch_normalization_2[0][0]']

    conv2d_3 (Conv2D) (None, 32, 64, 64) 36928 ['activation_2[0][0]']

    batch_normalization_3 (BatchNormalization) (None, 32, 64, 64) 256 ['conv2d_3[0][0]']

    activation_3 (Activation) (None, 32, 64, 64) 0 ['batch_normalization_3[0][0]']

    max_pooling2d_1 (MaxPooling2D) (None, 16, 32, 64) 0 ['activation_3[0][0]']

    conv2d_4 (Conv2D) (None, 16, 32, 128) 73856 ['max_pooling2d_1[0][0]']

    batch_normalization_4 (BatchNormalization) (None, 16, 32, 128) 512 ['conv2d_4[0][0]']

    activation_4 (Activation) (None, 16, 32, 128) 0 ['batch_normalization_4[0][0]']

    conv2d_5 (Conv2D) (None, 16, 32, 128) 147584 ['activation_4[0][0]']

    batch_normalization_5 (BatchNormalization) (None, 16, 32, 128) 512 ['conv2d_5[0][0]']

    activation_5 (Activation) (None, 16, 32, 128) 0 ['batch_normalization_5[0][0]']

    max_pooling2d_2 (MaxPooling2D) (None, 8, 16, 128) 0 ['activation_5[0][0]']

    conv2d_6 (Conv2D) (None, 8, 16, 256) 295168 ['max_pooling2d_2[0][0]']

    batch_normalization_6 (BatchNormalization) (None, 8, 16, 256) 1024 ['conv2d_6[0][0]']

    activation_6 (Activation) (None, 8, 16, 256) 0 ['batch_normalization_6[0][0]']

    conv2d_7 (Conv2D) (None, 8, 16, 256) 590080 ['activation_6[0][0]']

    batch_normalization_7 (BatchNormalization) (None, 8, 16, 256) 1024 ['conv2d_7[0][0]']

    activation_7 (Activation) (None, 8, 16, 256) 0 ['batch_normalization_7[0][0]']

    max_pooling2d_3 (MaxPooling2D) (None, 4, 8, 256) 0 ['activation_7[0][0]']

    conv2d_8 (Conv2D) (None, 4, 8, 256) 590080 ['max_pooling2d_3[0][0]']

    batch_normalization_8 (BatchNormalization) (None, 4, 8, 256) 1024 ['conv2d_8[0][0]']

    activation_8 (Activation) (None, 4, 8, 256) 0 ['batch_normalization_8[0][0]']

    conv2d_9 (Conv2D) (None, 4, 8, 256) 590080 ['activation_8[0][0]']

    batch_normalization_9 (BatchNormalization) (None, 4, 8, 256) 1024 ['conv2d_9[0][0]']

    activation_9 (Activation) (None, 4, 8, 256) 0 ['batch_normalization_9[0][0]']

    max_pooling2d_4 (MaxPooling2D) (None, 2, 4, 256) 0 ['activation_9[0][0]']

    flatten (Flatten) (None, 2048) 0 ['max_pooling2d_4[0][0]']

    char_1 (Dense) (None, 36) 73764 ['flatten[0][0]']

    char_2 (Dense) (None, 36) 73764 ['flatten[0][0]']

    char_3 (Dense) (None, 36) 73764 ['flatten[0][0]']

    char_4 (Dense) (None, 36) 73764 ['flatten[0][0]']

    ==================================================================================================
    Total params: 2,653,360
    Trainable params: 2,650,416
    Non-trainable params: 2,944
    __________________________________________________________________________________________________
    Batch X shape: (32, 64, 128, 3)
    Batch y shape: [(32, 36), (32, 36), (32, 36), (32, 36)]
    Count 0 list(self.files.keys()) value ['N5R5', 'IZJO', 'I8CB', 'NZ9O']
    IZJO
    Count 1 list(self.files.keys()) value ['N5R5', 'I8CB', 'NZ9O']
    N5R5
    Count 2 list(self.files.keys()) value ['I8CB', 'NZ9O']
    I8CB
    Count 3 list(self.files.keys()) value ['NZ9O']
    NZ9O
    Count 4 list(self.files.keys()) value []
    2024-09-28 15:26:54.847798: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    Traceback (most recent call last):
    File "<>/train.py", line 194, in <module>
    main()
    File "<>/train.py", line 185, in main
    model.fit(training_data,
    File "/opt/anaconda3/envs/tf/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
    File "/opt/anaconda3/envs/tf/lib/python3.9/site-packages/keras/engine/training.py", line 1420, in fit
    raise ValueError('Unexpected result of `train_function` '
    ValueError: Unexpected result of `train_function` (Empty logs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.
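
    As the error text suggests, one way to get more detail is to force everything to run eagerly. A minimal sketch of that (my assumption: it would go near the top of main(), before the model is built and trained), in addition to the run_eagerly=True I already pass to compile() below:

    import tensorflow as tf

    # Debugging-only switch mentioned in the error message: run all
    # tf.functions (including Keras' train_function) eagerly so that, as the
    # message says, you get more information about where things went wrong.
    # This slows training down, so it should be removed afterwards.
    tf.config.run_functions_eagerly(True)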


    This is the corresponding code for training:

    import os
    import cv2
    import numpy
    import random
    import argparse
    import tensorflow as tf

    # Build a Keras model given some parameters
    def create_model(captcha_length, captcha_num_symbols, input_shape, model_depth=5, module_size=2):
        print(f"input_shape: {input_shape}")  # After loading the batch
        input_tensor = tf.keras.Input(input_shape)
        x = input_tensor
        for i, module_length in enumerate([module_size] * model_depth):
            for j in range(module_length):
                x = tf.keras.layers.Conv2D((32*2**min(i,3)), kernel_size=3, padding='same', kernel_initializer='he_uniform')(x)
                x = tf.keras.layers.BatchNormalization()(x)
                x = tf.keras.layers.Activation('relu')(x)
            x = tf.keras.layers.MaxPooling2D(2)(x)

        x = tf.keras.layers.Flatten()(x)
        x = [tf.keras.layers.Dense(captcha_num_symbols, activation='softmax', name='char_%d'%(i+1))(x) for i in range(captcha_length)]
        model = tf.keras.Model(inputs=input_tensor, outputs=x)

        return model

    class ImageSequence(tf.keras.utils.Sequence):
        def __init__(self, directory_name, batch_size, captcha_length, captcha_symbols, captcha_width, captcha_height):
            self.directory_name = directory_name
            self.batch_size = batch_size
            self.captcha_length = captcha_length
            self.captcha_symbols = captcha_symbols
            self.captcha_width = captcha_width
            self.captcha_height = captcha_height

            file_list = os.listdir(self.directory_name)
            self.files = dict(zip(map(lambda x: x.split('.')[0], file_list), file_list))
            self.used_files = []
            self.count = len(file_list)

        def __len__(self):
            return int(numpy.floor(self.count / self.batch_size))

        def __getitem__(self, idx):
            X = numpy.zeros((self.batch_size, self.captcha_height, self.captcha_width, 3), dtype=numpy.float32)
            y = [numpy.zeros((self.batch_size, len(self.captcha_symbols)), dtype=numpy.uint8) for i in range(self.captcha_length)]

            # Add print statements to verify the data
            print(f"Batch X shape: {X.shape}")
            print(f"Batch y shape: {[yi.shape for yi in y]}")

            for i in range(self.batch_size):
                print("Count", i, "list(self.files.keys()) value", list(self.files.keys()))
                if i == self.count:
                    break
                random_image_label = random.choice(list(self.files.keys()))
                print(random_image_label)
                random_image_file = self.files[random_image_label]

                # We've used this image now, so we can't repeat it in this iteration
                self.used_files.append(self.files.pop(random_image_label))

                # We have to scale the input pixel values to the range [0, 1] for
                # Keras so we divide by 255 since the image is 8-bit RGB
                raw_data = cv2.imread(os.path.join(self.directory_name, random_image_file))
                rgb_data = cv2.cvtColor(raw_data, cv2.COLOR_BGR2RGB)
                processed_data = numpy.array(rgb_data) / 255.0
                X = processed_data

                # We have a little hack here - we save captchas as TEXT_num.png if there is more than one captcha with the text "TEXT"
                # So the real label should have the "_num" stripped out.
                random_image_label = random_image_label.split('_')[0]
                if len(random_image_label) != self.captcha_length:
                    raise ValueError(f"Expected CAPTCHA length {self.captcha_length}, but got {len(random_image_label)} for image: {random_image_file}")

                for j, ch in enumerate(random_image_label):
                    symbol_index = self.captcha_symbols.find(ch)
                    if symbol_index == -1:
                        raise ValueError(f"Character '{ch}' in CAPTCHA not found in symbols: {self.captcha_symbols}")

                    y[j][i, :] = 0
                    y[j][i, self.captcha_symbols.find(ch)] = 1

            return X, y

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--width', help='Width of captcha image', type=int)
        parser.add_argument('--height', help='Height of captcha image', type=int)
        parser.add_argument('--length', help='Length of captchas in characters', type=int)
        parser.add_argument('--batch-size', help='How many images in training captcha batches', type=int)
        parser.add_argument('--train-dataset', help='Where to look for the training image dataset', type=str)
        parser.add_argument('--validate-dataset', help='Where to look for the validation image dataset', type=str)
        parser.add_argument('--output-model-name', help='Where to save the trained model', type=str)
        parser.add_argument('--input-model', help='Where to look for the input model to continue training', type=str)
        parser.add_argument('--epochs', help='How many training epochs to run', type=int)
        parser.add_argument('--symbols', help='File with the symbols to use in captchas', type=str)
        args = parser.parse_args()

        if args.width is None:
            print("Please specify the captcha image width")
            exit(1)

        if args.height is None:
            print("Please specify the captcha image height")
            exit(1)

        if args.length is None:
            print("Please specify the captcha length")
            exit(1)

        if args.batch_size is None:
            print("Please specify the training batch size")
            exit(1)

        if args.epochs is None:
            print("Please specify the number of training epochs to run")
            exit(1)

        if args.train_dataset is None:
            print("Please specify the path to the training data set")
            exit(1)

        if args.validate_dataset is None:
            print("Please specify the path to the validation data set")
            exit(1)

        if args.output_model_name is None:
            print("Please specify a name for the trained model")
            exit(1)

        if args.symbols is None:
            print("Please specify the captcha symbols file")
            exit(1)

        captcha_symbols = None
        with open(args.symbols) as symbols_file:
            captcha_symbols = symbols_file.readline()

        print("Length of captcha symbols", len(captcha_symbols))

        physical_devices = tf.config.experimental.list_physical_devices('GPU')
        assert len(physical_devices) > 0, "No GPU available!"
        tf.config.experimental.set_memory_growth(physical_devices[0], True)

        with tf.device('/device:GPU:0'):
        # with tf.device('/device:CPU:0'):
        # with tf.device('/device:XLA_CPU:0'):
            model = create_model(args.length, len(captcha_symbols), (args.height, args.width, 3))

            if args.input_model is not None:
                model.load_weights(args.input_model)

            model.compile(loss='sparse_categorical_crossentropy',
                          optimizer=tf.keras.optimizers.Adam(1e-3, amsgrad=True),
                          metrics=['accuracy'],
                          run_eagerly=True)

            model.summary()

            training_data = ImageSequence(args.train_dataset, args.batch_size, args.length, captcha_symbols, args.width, args.height)
            validation_data = ImageSequence(args.validate_dataset, args.batch_size, args.length, captcha_symbols, args.width, args.height)

            callbacks = [tf.keras.callbacks.EarlyStopping(patience=3),
                         tf.keras.callbacks.CSVLogger('log.csv'),
                         tf.keras.callbacks.ModelCheckpoint(args.output_model_name+'.keras', save_best_only=False)]

            with open(args.output_model_name+".json", "w") as json_file:
                json_file.write(model.to_json())

            try:
                model.fit(training_data,
                          validation_data=validation_data,
                          epochs=args.epochs,
                          verbose=1)
            except KeyboardInterrupt:
                print('KeyboardInterrupt caught, saving current weights as ' + args.output_model_name + '_resume.h5')
                model.save_weights(args.output_model_name + '_resume.h5')

    if __name__ == '__main__':
        main()


    Content of symbols.txt:

    ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789


    Why am I getting the ValueError? My input shape and output shape seem to be correct and as expected. Please let me know where I'm going wrong; I'm just beginning with TensorFlow and Keras, so forgive me if I have made a really stupid mistake. TIA!
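
    P.S. For completeness, here is a hypothetical snippet (not part of train.py) showing how I can probe the Sequence outside of fit(), reusing the ImageSequence class and the same arguments as in the command above:

    # Hypothetical debugging snippet: index the Sequence directly to see
    # exactly what model.fit() would be fed each step.
    symbols = open('symbols.txt').readline()
    seq = ImageSequence('training_data', 32, 4, symbols, 128, 64)
    print("Batches per epoch:", len(seq))  # fit() runs len(seq) batches per epoch
    if len(seq) > 0:
        X, y = seq[0]
        print("X:", X.shape, "y:", [yi.shape for yi in y])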

