Training the Model

Deepwave currently recommends TensorFlow 2 or PyTorch as the training framework. Legacy TensorFlow 1 is still supported, but the TensorFlow 2 interface is more user-friendly and is the version of TensorFlow that will be supported going forward.

Start AirPack Docker Container

The AirPack Docker container is started the same way for all training frameworks. Before continuing, ensure that all the steps in AirPack Installation have been completed.

Note: The AirPack directory is not contained within the Docker image. It must be mounted when the container is started via the -v option. Mounting it also makes the code and the training output accessible from the host machine. See below for details.

  • To start the docker container:

    docker run -it \
    -v <path_to_AirPack>:/AirPack \
    -v <path_to_AirPack_data>:/data \
    --gpus all \
    <docker-image-name>
    

    where:

    • <path_to_AirPack> - the path to the AirPack folder on the host

    • <path_to_AirPack_data> - the path to the AirPack data set on the host. See here for more information.

    • <docker-image-name> - the name assigned to the Docker image when it was created during the AirPack Installation.

  • After executing this command, you are in a Linux environment within the Docker container. If you are unfamiliar with Docker, a container behaves much like a virtual machine. The -v flags mount the AirPack toolbox and data at /AirPack and /data, respectively.

  • Set up your copy of the AirPack Python code for use. This will allow you to edit code from both inside and outside of the container, as well as import the airpack module and use it from your own custom code. Do this each time you start a new container.

    $ pip install -e /AirPack
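
A quick sanity check after these steps can catch a missing mount or a skipped pip install before a long training run starts. The sketch below is only an illustration and is not part of AirPack: the helper name check_environment is hypothetical, and the only things it assumes are the /AirPack and /data mount points and the airpack module name given above.

```python
import importlib.util
import os


def check_environment(mount_points, module_name):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for path in mount_points:
        # A host path passed to `docker run -v` should show up as a directory
        # at the chosen location inside the container.
        if not os.path.isdir(path):
            problems.append(f"missing mount point: {path}")
    # find_spec returns None when a top-level module is not on the import path.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"module not importable: {module_name}")
    return problems


if __name__ == "__main__":
    # Mount points and module name as described in the steps above.
    for problem in check_environment(["/AirPack", "/data"], "airpack"):
        print(problem)
```

If the script prints nothing, both mount points exist and the airpack module can be located by Python.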
    

TensorFlow


Train the Model

  • Run the training script from within the Docker container:

    $ python /AirPack/airpack_scripts/tf2/run_training.py
    
  • The script periodically prints terminal output similar to the following:

    $ python /AirPack/airpack_scripts/tf2/run_training.py
    ...
    Epoch 1/10
    610/610 [==============================] - 11s 17ms/step - loss: 0.8916 - categorical_accuracy: 0.6776 - val_loss: 0.4115 - val_categorical_accuracy: 0.8332
    Epoch 2/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.3341 - categorical_accuracy: 0.8670 - val_loss: 0.2753 - val_categorical_accuracy: 0.8781
    Epoch 3/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.2468 - categorical_accuracy: 0.9020 - val_loss: 0.2411 - val_categorical_accuracy: 0.8997
    Epoch 4/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.1669 - categorical_accuracy: 0.9376 - val_loss: 0.1700 - val_categorical_accuracy: 0.9510
    Epoch 5/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.1401 - categorical_accuracy: 0.9542 - val_loss: 0.2257 - val_categorical_accuracy: 0.9282
    Epoch 6/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.1013 - categorical_accuracy: 0.9691 - val_loss: 0.1021 - val_categorical_accuracy: 0.9682
    Epoch 7/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.0772 - categorical_accuracy: 0.9771 - val_loss: 0.1036 - val_categorical_accuracy: 0.9659
    Epoch 8/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.0691 - categorical_accuracy: 0.9800 - val_loss: 0.0849 - val_categorical_accuracy: 0.9773
    Epoch 9/10
    610/610 [==============================] - 9s 15ms/step - loss: 0.0446 - categorical_accuracy: 0.9881 - val_loss: 0.1164 - val_categorical_accuracy: 0.9678
    Epoch 10/10
    610/610 [==============================] - 9s 14ms/step - loss: 0.0616 - categorical_accuracy: 0.9834 - val_loss: 0.1057 - val_categorical_accuracy: 0.9731
    
  • Once the script has completed the training iterations, it will produce multiple files in the /AirPack/output directory including the following:

    • saved_model.onnx - File that will be used for deployment on the AIR-T
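
In the sample output above, the validation loss does not fall monotonically: the lowest val_loss (0.0849) occurs at epoch 8, not at the final epoch. If the terminal output is saved to a file, a small parser can locate that epoch. The following is a minimal sketch, assuming the log looks like the sample above with one completed progress line per epoch; the names parse_keras_log and best_epoch are hypothetical and are not part of AirPack or TensorFlow.

```python
import re

# Matches header lines such as "Epoch 8/10".
EPOCH_RE = re.compile(r"^\s*Epoch (\d+)/(\d+)\s*$")
# Matches "name: value" pairs such as "val_loss: 0.0849" or "loss: 1.2e-05".
METRIC_RE = re.compile(r"(\w+): (\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")

# Last three epochs copied from the sample output above.
SAMPLE = """\
Epoch 8/10
610/610 [==============================] - 9s 15ms/step - loss: 0.0691 - categorical_accuracy: 0.9800 - val_loss: 0.0849 - val_categorical_accuracy: 0.9773
Epoch 9/10
610/610 [==============================] - 9s 15ms/step - loss: 0.0446 - categorical_accuracy: 0.9881 - val_loss: 0.1164 - val_categorical_accuracy: 0.9678
Epoch 10/10
610/610 [==============================] - 9s 14ms/step - loss: 0.0616 - categorical_accuracy: 0.9834 - val_loss: 0.1057 - val_categorical_accuracy: 0.9731
"""


def parse_keras_log(text):
    """Return one dict per epoch, holding the epoch number and its metrics."""
    records = []
    current_epoch = None
    for line in text.splitlines():
        header = EPOCH_RE.match(line)
        if header:
            current_epoch = int(header.group(1))
            continue
        # Only the completed line of an epoch carries validation metrics.
        if current_epoch is not None and "val_loss" in line:
            metrics = {name: float(value) for name, value in METRIC_RE.findall(line)}
            records.append({"epoch": current_epoch, **metrics})
            current_epoch = None
    return records


def best_epoch(records, key="val_loss"):
    """Return the record with the lowest value of `key`."""
    return min(records, key=lambda record: record[key])


if __name__ == "__main__":
    print(best_epoch(parse_keras_log(SAMPLE)))
```

For the three sample epochs, best_epoch returns the epoch 8 record. Whether the training script exports the weights from the last epoch or from another epoch is not stated here, so treat the result as a diagnostic only.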

Perform Inference with Trained Model

  • You may use the run_inference.py script to evaluate the performance of the model and plot the result. This script will load the saved_model.onnx file produced during training, feed the test data through the network, and create a result plot.

  • Run the inference script:

    $ python /AirPack/airpack_scripts/tf2/run_inference.py
    
  • Running this script will produce an image in /AirPack/output/saved_model_plot.png demonstrating the inference performance for each signal type.


PyTorch

Train the Model

  • Run the training script from within the Docker container:

    $ python /AirPack/airpack_scripts/pytorch/run_training.py
    
  • The script periodically prints terminal output similar to the following:

    $ python /AirPack/airpack_scripts/pytorch/run_training.py
    ...
    Training Progress:   0%|                                                    | 0/10 [00:00<?, ?epoch/s]Epoch 1 of 10
    Train Loss: 0.0128, Train Accuracy: 0.6765
    Val Loss: 0.0068, Val Accuracy: 0.8188
    Training Progress:  10%|████▍                                               | 1/10 [00:28<04:20, 28.99s/epoch]Epoch 2 of 10
    Train Loss: 0.0057, Train Accuracy: 0.8443
    Val Loss: 0.0045, Val Accuracy: 0.8744
    Training Progress:  20%|████████████                                        | 2/10 [00:58<03:52, 29.07s/epoch]Epoch 3 of 10
    Train Loss: 0.0041, Train Accuracy: 0.8846
    Val Loss: 0.0035, Val Accuracy: 0.9019
    Training Progress:  30%|███████████████████                                 | 3/10 [01:26<03:20, 28.71s/epoch]Epoch 4 of 10
    Train Loss: 0.0033, Train Accuracy: 0.9102
    Val Loss: 0.0039, Val Accuracy: 0.8978
    Training Progress:  40%|█████████████████████████                           | 4/10 [01:54<02:51, 28.56s/epoch]Epoch 5 of 10
    Train Loss: 0.0025, Train Accuracy: 0.9328
    Val Loss: 0.0031, Val Accuracy: 0.9165
    Training Progress:  50%|██████████████████████████████                      | 5/10 [02:21<02:20, 28.15s/epoch]Epoch 6 of 10
    Train Loss: 0.0021, Train Accuracy: 0.9449
    Val Loss: 0.0022, Val Accuracy: 0.9429
    Training Progress:  60%|██████████████████████████████████                  | 6/10 [02:48<01:51, 27.77s/epoch]Epoch 7 of 10
    Train Loss: 0.0018, Train Accuracy: 0.9526
    Val Loss: 0.0018, Val Accuracy: 0.9542
    Training Progress:  70%|███████████████████████████████████████              | 7/10 [03:16<01:23, 27.74s/epoch]Epoch 8 of 10
    Train Loss: 0.0015, Train Accuracy: 0.9608
    Val Loss: 0.0016, Val Accuracy: 0.9599
    Training Progress:  80%|████████████████████████████████████████████         | 8/10 [03:43<00:55, 27.69s/epoch]Epoch 9 of 10
    Train Loss: 0.0013, Train Accuracy: 0.9666
    Val Loss: 0.0016, Val Accuracy: 0.9636
    Training Progress:  90%|███████████████████████████████████████████████      | 9/10 [04:11<00:27, 27.73s/epoch]Epoch 10 of 10
    Train Loss: 0.0012, Train Accuracy: 0.9701
    Val Loss: 0.0013, Val Accuracy: 0.9691
    Training Progress: 100%|█████████████████████████████████████████████████████| 10/10 [04:40<00:00, 28.06s/epoch]
    
  • Once the script has completed the training iterations, it will produce multiple files in the /AirPack/output/pytorch directory including the following:

    • saved_model.onnx - File that will be used for deployment on the AIR-T

Perform Inference with Trained Model

  • You may use the run_inference.py script to evaluate the performance of the model and plot the result. This script will load the saved_model.onnx file produced during training, feed the test data through the network, and create a result plot.

  • Run the inference script:

    $ python /AirPack/airpack_scripts/pytorch/run_inference.py
    
  • Running this script will produce an image in /AirPack/output/pytorch/saved_model_plot.png demonstrating the inference performance for each signal type.
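
Before copying the model to the AIR-T, it can be worth confirming that the files named above were actually written. The sketch below checks for them; the helper name missing_artifacts is hypothetical, and the directory and file names are taken from the steps above. The same check should apply to the TensorFlow 2 workflow if the directory is changed to /AirPack/output.

```python
from pathlib import Path


def missing_artifacts(output_dir, expected_names):
    """Return the expected file names that are absent or empty in output_dir."""
    output_dir = Path(output_dir)
    missing = []
    for name in expected_names:
        candidate = output_dir / name
        # An empty file is treated as missing, since it cannot hold a valid
        # model or image.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory and file names as listed in the PyTorch steps above.
    print(missing_artifacts("/AirPack/output/pytorch",
                            ["saved_model.onnx", "saved_model_plot.png"]))
```

An empty list means both files exist and are non-empty. This does not validate the contents of the ONNX file.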


TensorFlow 1 (Legacy)

Note: Deepwave strongly recommends transitioning to TensorFlow 2 as TensorFlow 1 is deprecated.

Train the Model

  • Run the training script from within the Docker container:

    $ python /AirPack/airpack_scripts/tf1/run_training.py
    
  • The script periodically prints terminal output similar to the following:

    $ python /AirPack/airpack_scripts/tf1/run_training.py
    ...
    (0 of 6094): Training Loss = 2.494922, Testing Accuracy = 0.109375
    (100 of 6094): Training Loss = 1.590902, Testing Accuracy = 0.445312
    (200 of 6094): Training Loss = 0.962753, Testing Accuracy = 0.664062
    (300 of 6094): Training Loss = 0.617013, Testing Accuracy = 0.812500
    (400 of 6094): Training Loss = 0.499497, Testing Accuracy = 0.773438
    (500 of 6094): Training Loss = 0.317061, Testing Accuracy = 0.890625
    (600 of 6094): Training Loss = 0.381197, Testing Accuracy = 0.867188
    (700 of 6094): Training Loss = 0.347956, Testing Accuracy = 0.843750
    (800 of 6094): Training Loss = 0.464664, Testing Accuracy = 0.796875
    (900 of 6094): Training Loss = 0.384519, Testing Accuracy = 0.820312
    ...
    ...
    (5800 of 6094): Training Loss = 0.025528, Testing Accuracy = 0.968750
    (5900 of 6094): Training Loss = 0.068839, Testing Accuracy = 0.960938
    (6000 of 6094): Training Loss = 0.017364, Testing Accuracy = 0.975000
    (6100 of 6094): Training Loss = 0.046915, Testing Accuracy = 0.976562
    
  • Once the script has completed the training iterations, it will produce multiple files in the /AirPack/data/output directory including the following:

    • checkpoint - file that defines the location of the saved model files

    • saved_model.meta - file that contains the graph and protocol buffer

    • saved_model.onnx - File that will be used for deployment on the AIR-T
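
The per-step testing accuracy in the TensorFlow 1 sample output fluctuates from line to line (for example, 0.890625 at step 500 and 0.796875 at step 800), so a trailing average gives a steadier view of progress. The following is a minimal sketch, assuming the terminal output was saved and matches the line format shown above; parse_tf1_log and moving_average are hypothetical helper names and are not part of AirPack.

```python
import re

# Matches lines such as
# "(500 of 6094): Training Loss = 0.317061, Testing Accuracy = 0.890625".
LINE_RE = re.compile(
    r"\((\d+) of (\d+)\): Training Loss = ([0-9.]+), Testing Accuracy = ([0-9.]+)"
)

# Five consecutive lines copied from the sample output above.
SAMPLE = """\
(500 of 6094): Training Loss = 0.317061, Testing Accuracy = 0.890625
(600 of 6094): Training Loss = 0.381197, Testing Accuracy = 0.867188
(700 of 6094): Training Loss = 0.347956, Testing Accuracy = 0.843750
(800 of 6094): Training Loss = 0.464664, Testing Accuracy = 0.796875
(900 of 6094): Training Loss = 0.384519, Testing Accuracy = 0.820312
"""


def parse_tf1_log(text):
    """Return (step, training_loss, testing_accuracy) tuples in log order."""
    return [
        (int(m.group(1)), float(m.group(3)), float(m.group(4)))
        for m in LINE_RE.finditer(text)
    ]


def moving_average(values, window):
    """Trailing mean over at most `window` of the most recent values."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


if __name__ == "__main__":
    rows = parse_tf1_log(SAMPLE)
    accuracies = [accuracy for _, _, accuracy in rows]
    for (step, _, _), smoothed in zip(rows, moving_average(accuracies, window=3)):
        print(step, round(smoothed, 4))
```

The window of 3 is an arbitrary choice for illustration; a wider window smooths more but reacts more slowly to real changes.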