Neural Network from scratch: Part 5; C++ Deep Learning Framework Implementation


The main goal of this article is to show how to develop a project in C++ by explaining key concepts of the language: abstract classes/interfaces and inheritance, memory management, smart pointers, iterators, constexpr, templates, std containers and Eigen matrices, static classes, namespaces, Makefile/CMake, and debugging (step-by-step debugging, memory leak detection, profiling). These concepts are applied to the development of a simple Deep Learning framework implementing the MSE loss, the ReLU and Softmax activation functions, a Linear layer, a feature/label generator and a mini-batch training function.

GitHub link: https://github.com/Apiquet/DeepLearningFrameworkFromScratchCpp

As explained above, this article focuses on C++ concepts rather than on the details of the layer implementations. To understand in depth how a neural network is implemented, a complete implementation of a Deep Learning framework is available in the first parts of the series:

  • Part 1 explains the principle of gradient descent and its implementation: using the differential approach with a 2D example, and using the perturbation approach with a 3D example.
  • Part 2 explains in detail: the forward and backward passes of the Sigmoid, ReLU, LeakyReLU, Softmax, Linear layers and MSE loss; how to build a sequential module to create a model and its training function.
  • Part 3 explains: Convolution, Flatten, Max and Mean Pooling layers and how to implement features such as: saving and loading a model to deploy it somewhere, getting its parameter count, drawing training curves, printing its description.

Table of contents

  1. Environment setup
    1. Compiler installation
    2. Code Editor
  2. Project architecture
    1. File organization – include/src/tests
    2. Compilation with g++ / Makefile
    3. CMake
  3. C++ concepts through examples
    1. Abstract class / Interface
    2. Namespace and includes
      • MSE Loss
    3. Eigen matrices
      • ReLU layer
      • Softmax
      • Linear layer
      • Data generator
    4. Smart-pointer, iterator and std containers
      • Sequential module
    5. Static class
      • Metrics
    6. Templates
      • Trainer
    7. Constexpr
      • Test framework
    8. Code format using Clang
  4. Debugging
    1. Static: CPPFLAGS
    2. Dynamic
      1. Manual: Unit / Integration / System / Acceptance tests
      2. Manual: step by step debugging
      3. Automatic: leak detection and profiling tools
        • Valgrind and kcachegrind
        • Perf and flamegraph

1) Environment setup

1-1) Compiler installation

This project can be done on Linux, Windows or Mac with a compiler installed.

To install GCC (GNU Compiler Collection) on Linux:

sudo apt install build-essential manpages-dev

Windows users have the option to switch to Linux or to install MSYS2: https://www.msys2.org/

1-2) Code editor

Any code editor will do, but the last section of this article on debugging will show how to debug “step by step” with Visual Studio Code as an example, a must-have feature for any developer.

2) Project architecture

2-1) File organization – include/src/tests

The architecture of a C++ project can be organized in different ways. Common structures often include 4 or 5 folders: src/, lib/, include/, tests/, doc/. The choice depends on the project, but for our light application, having src/, include/ and tests/ is sufficient.

The include/ folder contains the headers, with the documentation of each function and class. In an application, this folder is often the public-facing part, so all the implementation goes into the src/ folder.

The tests/ folder will contain all the network learning tests but also the unit tests which will be described in the Debugging part of this article.

2-2) Compilation with g++ / Makefile

Before moving on to the actual implementation of the project, one final configuration step is needed. As C++ is a compiled language, the code must be compiled into an executable before it can be run. This compilation can be done with the g++ command (installed in part 1 of this article).
The following examples show how the g++ command should be used depending on the situation:

tests/
-----|main.cpp

g++ tests/main.cpp
src/
-----|layer1.cpp
tests/
-----|main.cpp

g++ tests/main.cpp src/layer1.cpp
include/
-----|layer1.hpp
src/
-----|layer1.cpp
tests/
-----|main.cpp

g++ tests/main.cpp src/layer1.cpp -Iinclude/
include/
-----|layer1.hpp
-----|....hpp
-----|layerN.hpp
src/
-----|layer1.cpp
-----|...
-----|layerN.cpp
tests/
-----|main.cpp

g++ tests/main.cpp src/*.cpp -Iinclude/
include/
-----|folder1/
-----|-----|layer1..N.hpp
-----|folder2/
-----|-----|layer1..N.hpp
-----|folder3/
-----|-----|layer1..N.hpp
src/
-----|folder1/
-----|-----|layer1..N.cpp
-----|folder2/
-----|-----|layer1..N.cpp
-----|folder3/
-----|-----|layer1..N.cpp
tests/
-----|main.cpp

g++ tests/main.cpp src/folder1/*.cpp src/folder2/*.cpp src/folder3/*.cpp -Iinclude/folder1 -Iinclude/folder2 -Iinclude/folder3

As we have shown, the g++ command can become more and more complex as the project gets bigger. To solve this problem, a Makefile makes compilation much easier.

At the end of this article, the project will have the following architecture:

Project architecture

To build such a project, the g++ command must know all the include folders as well as all the source files contained in each folder of the src/ folder.

To loop over a folder and get all the include directories, the wildcard function can be used:

INCS = $(wildcard include/*)

This line adds all include folders to the g++ command. However, a ‘-I’ must be added in front of each item with the addprefix function:

INC_DIRS = $(addprefix -I, $(INCS))

For the source files, it is a bit more complex because every cpp file must be added to the command, not just the folders. To accomplish this, the foreach function can be used to loop through all the subfolders and pass each one to the wildcard function:

SRCS = $(foreach sdir, src/*, $(wildcard $(sdir)/*.cpp))

To debug a Makefile, each variable can be printed to the console if the following line is added:

print-% : ; @echo $* = $($*)

Then, by doing: make print-INCS, the INCS variable will be displayed on the console.

Finally, the file tests/main.cpp must be added and the g++ command can be assembled. Even if a new folder with new files is created, the Makefile will take care of it. The final Makefile:

PROG = tests/main
CC = g++
CPPFLAGS = -std=c++20

INCS = $(wildcard include/*) src/Trainers

INC_DIRS = $(addprefix -I, $(INCS))

SRCS = $(foreach sdir, src/*, $(wildcard $(sdir)/*.cpp))

print-% : ; @echo $* = $($*)

main.o :
	$(CC) $(CPPFLAGS) $(PROG).cpp $(SRCS) -Iinclude $(INC_DIRS) -o $(PROG)

Running the make command with this Makefile runs the following g++ command:

g++  tests/main.cpp  src/Activations/ReLU.cpp src/Activations/Softmax.cpp src/Data/DataBuilder2D.cpp src/Layers/Linear.cpp src/Losses/MSE.cpp src/Metrics/Metrics.cpp src/Sequential/Sequential.cpp src/Trainers/Trainer.cpp -Iinclude -Iinclude/Activations -Iinclude/Data -Iinclude/Eigen -Iinclude/Layers -Iinclude/Losses -Iinclude/Metrics -Iinclude/Module -Iinclude/Sequential -Iinclude/Trainers -Isrc/Trainers -o tests/main

This command does the job but does not yet use some useful compiler flags in CPPFLAGS. This will be discussed in section 4 on debugging.

2-3) CMake

CMake is a tool that generates Makefiles. It is widely used in industry.

To create the Makefile, CMake needs to know: the project name, the include directories, the executables, etc. This can be achieved with built-in functions:

cmake_minimum_required(VERSION 3.0.0)
project(TestDeepLearningFramework VERSION 0.1.0)

include_directories(
    ${PROJECT_SOURCE_DIR}/include
    ${PROJECT_SOURCE_DIR}/include/Activations
    ${PROJECT_SOURCE_DIR}/include/Data
    ${PROJECT_SOURCE_DIR}/include/Layers
    ${PROJECT_SOURCE_DIR}/include/Losses
    ${PROJECT_SOURCE_DIR}/include/Metrics
    ${PROJECT_SOURCE_DIR}/include/Module
    ${PROJECT_SOURCE_DIR}/include/Sequential
    ${PROJECT_SOURCE_DIR}/include/Trainers
    ${PROJECT_SOURCE_DIR}/src/Trainers
)

# source files
file(GLOB SOURCES "src/*/*.cpp" tests/main.cpp)

add_executable(TestDeepLearningFramework ${SOURCES})

As shown above, writing a CMakeLists.txt is much more convenient than writing a Makefile by hand. Moreover, the generated build displays a progress bar to follow the compilation.
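
Once the CMakeLists.txt is written, a typical out-of-source build looks like this (standard CMake workflow, shown here as an example):

mkdir build && cd build
cmake ..   # generates the Makefile from CMakeLists.txt
make       # compiles the TestDeepLearningFramework executable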

3) C++ concepts through examples

3-1) Abstract class / Interface

The first concepts discussed are abstract classes, interfaces and inheritance. These concepts are shared with other programming languages. The principle of an abstract class is to declare a class with methods that are not yet implemented. Other classes can then inherit from this class and must implement all the unimplemented methods (this is called overriding) in order not to be abstract themselves. The purpose of such an approach is to group several classes under the same parent class. An interface is an abstract class with only unimplemented methods.

This project is a good example of the usefulness of an abstract class, as many classes will be very similar, with forward and backward passes, a print-description method, a set-learning-rate method, etc. Another benefit will be shown in the section about std containers.

Our abstract class Module can then be declared simply as follows:

class Module
{
    public:
        virtual void forward(Eigen::MatrixXf& out, const Eigen::MatrixXf& x) = 0;

        virtual void backward(Eigen::MatrixXf& ddout, const Eigen::MatrixXf& dout) = 0;

        virtual void printDescription() = 0;
};

It is an interface because none of its methods are implemented. The unimplemented methods are declared virtual and set to = 0 (pure virtual), so the class cannot be instantiated directly by the user. The classes that inherit from this class will have to implement them.

A general rule for a class to inherit from another is that it must pass the “is-a” test. For example, a linear layer is a layer, so we could name the previous abstract class “Layer” and inherit from it in the linear layer. In our case, a more generic name, “Module”, was chosen to cover all the layers, activation functions and losses.

To inherit from a class, you must add “: public ParentClass” to the class declaration:

class Linear: public Module
{
    ...
};
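
Inside the derived class, each inherited pure virtual method is declared again and implemented; marking it with the override specifier lets the compiler verify the signature. Here is a sketch of what the Linear declaration could look like (member and method names are taken from snippets shown later in this article; the repository version may differ):

class Linear : public Module
{
    public:
        Linear(int inputFeaturesNumber, int outputFeaturesNumber);

        // Module interface implementations
        void forward(Eigen::MatrixXf& out, const Eigen::MatrixXf& x) override;
        void backward(Eigen::MatrixXf& ddout, const Eigen::MatrixXf& dout) override;
        void printDescription() override;

        // layer-specific method, used by the unit tests shown later
        void setWeightsAndBias(const Eigen::MatrixXf& weights, const Eigen::MatrixXf& bias);

    private:
        Eigen::MatrixXf mWeights;
        Eigen::MatrixXf mBias;
        Eigen::MatrixXf mForwardInput;
        float mLR; // learning rate used in the backward pass
};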

3-2) Namespace and includes

Using a namespace creates a scope for classes, functions and variables. Classes of the same “type” can thus belong to the same group. For example, in our case, we might want to group all losses under the same identifier “Losses”. This can be done for the MSE module:

namespace DeepLearningFramework
{
    namespace Losses
    {
        /**
         * Loss class: MSE.
         * 
         * forward: loss = mean((prediction - target)^2)
         * backward: gradient of the loss with respect to the prediction
         */
        class MSE
        {
            ...
        };
    }; // namespace Losses    
}; // namespace DeepLearningFramework

The MSE loss belongs to the Losses namespace which also belongs to the DeepLearningFramework namespace.
Once this module is implemented, we can instantiate it with the following line:

DeepLearningFramework::Losses::MSE mseLoss;

For example, the ReLU and Softmax activation functions will be DeepLearningFramework::Activations::ReLU and DeepLearningFramework::Activations::Softmax. This allows you to organize your classes well.

Note: to instantiate a class from another file, you must include its header with ‘#include “filename”‘ so that the compiler knows where to find the class in question. This inclusion can be done with a relative path to the file, but with a correct compilation setup (shown in the previous section), using the file name alone is sufficient. The file name must also be surrounded by double quotes or angle brackets depending on whether it is a local file or an installed library: <filename> for standard library or installed header files and "filename" for local files.
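
The body of the MSE class is elided above. For completeness, here is a minimal sketch of what its forward and backward passes could look like, using the signatures called by the Sequential module later in this article (the exact scaling and implementation in the repository may differ):

void MSE::forward(float& loss, const Eigen::MatrixXf& y, const Eigen::MatrixXf& yPred)
{
    // mean of the squared differences between predictions and targets
    loss = (yPred - y).array().square().mean();
}

void MSE::backward(Eigen::MatrixXf& dloss, const Eigen::MatrixXf& y, const Eigen::MatrixXf& yPred)
{
    // gradient of mean((yPred - y)^2) with respect to yPred
    dloss = 2.f * (yPred - y) / static_cast<float>(y.size());
}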

3-3) Eigen matrices

Some libraries can make the programmer’s life easier; the C++ standard library (STL) and Eigen are two of them. In this section, some useful matrix operations with Eigen will be shown through the implementation of the ReLU, Softmax and Linear layers and the data generator class.

As a first example, here are the ReLU forward and backward pass equations:

ReLU forward pass: out = max(0, x)
ReLU backward pass: dx = dout where the forward input was > 0, else 0

To implement this function, it is possible to use a for loop that goes through all the elements and sets the negative values to 0. Eigen's select() method can do the same thing. In addition, this introduces the array() and matrix() methods, which tell Eigen to perform an element-wise operation or a matrix operation. For a comparison of values, the conversion to array must be used:

void ReLU::forward(Eigen::MatrixXf& out, const Eigen::MatrixXf& x)
{
    mForwardInput = x;
    out = (x.array() < 0.f).select(0.f, x);
}

void ReLU::backward(Eigen::MatrixXf& ddout, const Eigen::MatrixXf& dout)
{
    ddout = (mForwardInput.array() < 0.f).select(0.f, dout);
}

The implementation of the Softmax activation function demonstrates how to apply the exponential function to all the elements of a matrix, how to subtract values and how to multiply matrices element-wise. Here is its equation:

Softmax forward pass: softmax(x)_i = exp(x_i) / sum_j exp(x_j), computed independently for each row (sample)

And the implementation:

void Softmax::forward(Eigen::MatrixXf& y, const Eigen::MatrixXf& x)
{
    Eigen::MatrixXf expX = x.array().exp();
    y = x;
    for (int row = 0; row < x.rows(); ++row) {
        y.row(row) = expX.row(row) / expX.row(row).sum();
    }
}

The exp() method is applied element-wise thanks to the array() method. Then, a loop over the rows divides each sample by the sum of its exponentials. This loop could be replaced by another Eigen feature, rowwise() broadcasting, to perform the same calculation.
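
For reference, a possible vectorized variant using Eigen's broadcasting (a sketch; the loop version above is the one shown in the project):

void Softmax::forward(Eigen::MatrixXf& y, const Eigen::MatrixXf& x)
{
    Eigen::MatrixXf expX = x.array().exp();
    Eigen::VectorXf rowSums = expX.rowwise().sum();
    // divide every row (sample) by the sum of its exponentials
    y = expX.array().colwise() / rowSums.array();
}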

The Linear module shows how to initialize an Eigen matrix randomly. This can be done using Random():

mWeights = Eigen::MatrixXf::Random(inputFeaturesNumber, outputFeaturesNumber);
mBias = Eigen::MatrixXf::Random(1, outputFeaturesNumber);

The classical matrix multiplication can be applied with the matrix() method for the weights:

out = x.matrix() * mWeights.matrix();

The bias can then be added with a rowwise() operation.
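
Put together, the Linear forward pass could look like this (a sketch using the member names from the surrounding snippets; the repository implementation may differ slightly):

void Linear::forward(Eigen::MatrixXf& out, const Eigen::MatrixXf& x)
{
    mForwardInput = x;              // cached for the backward pass
    out = x * mWeights;             // [N, in] x [in, out] -> [N, out]
    out.rowwise() += mBias.row(0);  // broadcast the bias over every sample
}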

Finally, the backward pass concludes the operations shown in this section, with the colwise().mean() operation that averages over each column and the transpose() operation used to multiply the two matrices:

mWeights = mWeights.array() - mLR * (mForwardInput.transpose() * dout).array();
mBias = mBias.array() - mLR * dout.colwise().mean().array();

We can also build a dataset in the form of Eigen matrices. Here we will build a 2D dataset where class 1 will correspond to the points inside a circle and class 0 outside this circle:

features = Eigen::MatrixXf::Random(samplesCount, 2);
labels = Eigen::MatrixXf(samplesCount, 2);

for(int i = 0; i < samplesCount; i++)
{
    if(pow(features(i, 0), 2) + pow(features(i, 1), 2) / M_PI < discRadius)
    {
        labels(i, 0) = 0.f;
        labels(i, 1) = 1.f;
    }
    else
    {
        labels(i, 0) = 1.f;
        labels(i, 1) = 0.f;
    }
}

3-4) Smart-pointer, iterator and std containers

Our Sequential module will be a bit more complex than the other modules because it has to assemble the neural network. To run the model, it must call each layer in sequence for the forward pass and then go through the layers in reverse order for backpropagation. It would be very convenient to store all the layers in a list and then loop through that list in both directions. This is where the standard library and its containers come in: array, vector, deque, list, set, map, etc. All the containers can be found in the official documentation here.

As it can be difficult to find the one that best suits our use case, some “cheat sheets” help us to make our choice (source):

Cheat sheet for std containers

For our example, the std::vector container will be used. This vector can be instantiated with or without initial elements, but the type of its elements must be specified. This shows another benefit of having all our classes inherit from the same abstract class: we can declare a vector of Module pointers and then store any type of layer in it:

std::vector<Module*> mModel;

Then any layer can be instantiated and pushed into the vector with the push_back() method. This is not optimal; a better practice is to use the emplace_back() method to construct the element directly inside the vector:

/* Model creation */
std::vector<Module*> layers;
int inputFeaturesNumber = 2, outputFeaturesNumber = 2, hiddenSize = 10;
layers.emplace_back(new Layers::Linear((int)inputFeaturesNumber, (int)hiddenSize));
layers.emplace_back(new Activations::ReLU());
layers.emplace_back(new Layers::Linear((int)hiddenSize, (int)hiddenSize));
layers.emplace_back(new Activations::ReLU());
layers.emplace_back(new Layers::Linear((int)hiddenSize, (int)outputFeaturesNumber));
layers.emplace_back(new Activations::Softmax());

This vector does not really contain the layers themselves but pointers to them, because Module* is used. This avoids copying a lot of data between functions because only pointers are passed around. However, the memory allocated behind these pointers must be freed! A well-known problem is the memory leak, which occurs when programmers allocate memory with the new keyword and forget to deallocate it with delete.

New pointer types, called smart pointers, have been introduced to allow automatic deletion. std::shared_ptr, for example, keeps an internal reference counter; when this count reaches 0 (no one is using the pointer anymore), the object is automatically deleted.
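
As an illustration, the layer vector could be declared with smart pointers so that deletion becomes automatic (a sketch, assuming the Module, Layers::Linear and Activations::ReLU classes above; the project itself uses raw pointers):

#include <memory>
#include <vector>

std::vector<std::shared_ptr<Module>> layers;
layers.emplace_back(std::make_shared<Layers::Linear>(2, 10));
layers.emplace_back(std::make_shared<Activations::ReLU>());
// No delete needed: when the vector goes out of scope and no other
// shared_ptr references a layer, its memory is released automatically.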

Finally, the sequential module must iterate the std::vector back and forth to perform the forward and backward passes. To do this, the std::iterator can be used as follows:

void Sequential::forward(Eigen::MatrixXf& x)
{
    std::vector<Module*>::iterator it;
    for(it = mModel.begin(); it != mModel.end(); it++)
        (*it)->forward(x, x);
}

void Sequential::backward(float& loss, const Eigen::MatrixXf& y, Eigen::MatrixXf& yPred)
{
    // calculate loss
    mLoss.forward(loss, y, yPred);

    // back propagation
    Eigen::MatrixXf lossDerivative;
    mLoss.backward(lossDerivative, y, yPred);
    for(auto it = mModel.rbegin(); it != mModel.rend(); it++)
        (*it)->backward(lossDerivative, lossDerivative);
}

The iterator can be declared manually (as in the forward function), but the backward pass introduces the auto keyword, which tells the compiler to deduce the type of it automatically. The begin() method of the vector container returns an iterator to the first element and end() returns an iterator one past the last element; rbegin() and rend() do the same in reverse order.

3-5) Static class

Another useful feature is the static keyword. In many circumstances a class is declared with methods, but a user may only need to call those methods without instantiating the class. The static keyword allows exactly this. For example, unlike the Linear layer, which carries internal state such as weights and bias, the Metrics class does not need to be instantiated; only its methods are needed. The accuracy metric can be declared static:

class Metrics
{
    public:
        Metrics() = delete;
        ~Metrics() = delete;
 
         /**
         * accuracy static method
         * 
         * accuracy: count of good predictions / number of predictions
         *
         * @param[out] accuracy accuracy in range [0.f, 1.f]
         * @param[in] labels one-hot encoded labels in format [N, 2]
         * @param[in] features prediction in format [N, 2]
         */
        static void accuracy(float& accuracy, const Eigen::MatrixXf& labels, const Eigen::MatrixXf& features);
};

Now the method accuracy can be used without instantiating the metric class:

DeepLearningFramework::Metrics::accuracy(accuracy, labels, tmpFeatures);
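
One possible implementation of this static method, based on the one-hot format described in the header comment (a sketch; the repository version may differ):

void Metrics::accuracy(float& accuracy, const Eigen::MatrixXf& labels, const Eigen::MatrixXf& features)
{
    int goodPredictionsCount = 0;
    for (int i = 0; i < labels.rows(); i++)
    {
        Eigen::Index predictedClass, trueClass;
        features.row(i).maxCoeff(&predictedClass); // argmax of the prediction
        labels.row(i).maxCoeff(&trueClass);        // argmax of the one-hot label
        if (predictedClass == trueClass)
            goodPredictionsCount++;
    }
    accuracy = static_cast<float>(goodPredictionsCount) / labels.rows();
}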

3-6) Templates

Templates can be used to create generic data structures or functions, and also for optimization.
A "function template" is a template from which the compiler generates functions; a "template function" is an instance of a function template.
As this can sound abstract without an example, let's imagine a function returning the minimum of two variables. If our code needs this function for int, uint, float, double, etc., a specific overload must be declared for each type:

void getMinimum(int& minValue, int a, int b){ ... }
void getMinimum(uint& minValue, uint a, uint b){ ... }
...
void getMinimum(float& minValue, float a, float b){ ... }

By using a template function, the compiler will automatically generate all the necessary functions:

template <class T>
void getMinimum(T& minValue, T a, T b){ ... }
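
A complete, minimal version of this function template and its use could look like this (illustrative sketch):

#include <iostream>

template <class T>
void getMinimum(T& minValue, T a, T b) { minValue = (a < b) ? a : b; }

int main()
{
    int intMin = 0;
    float floatMin = 0.f;
    getMinimum(intMin, 3, 7);         // the compiler generates getMinimum<int>
    getMinimum(floatMin, 2.5f, 1.5f); // the compiler generates getMinimum<float>
    std::cout << intMin << " " << floatMin << std::endl; // prints "3 1.5"
    return 0;
}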

Another example comes from this project. In the Trainer class, batches of data must be created to train by mini-batch. These batches can be created with the block() method of an Eigen matrix:

Eigen::MatrixXf batch = features.block<32, 2>(0, 0);

Thanks to the previous line of code, a batch of 32 rows and 2 columns, starting at index (0, 0) of the matrix, is created. However, the template arguments of block<32, 2>() (in particular the batch size) must be known at compile time, which is not the case for the index where we want to start the batch. The training function will look like:

for(int i = 0; i < epochsCount; i++)
{
    float loss = 0.f;
    for (int batch_idx = 0; batch_idx < batchesCount; batch_idx++)
    {
        float batchLoss = 0.f;
        Eigen::MatrixXf batchFeatures = trainFeatures.block<batchSize, 2>(batch_idx*batchSize, 0);
        Eigen::MatrixXf batchTarget = trainTarget.block<batchSize, 2>(batch_idx*batchSize, 0);
        model.forward(batchFeatures);
        model.backward(batchLoss, batchTarget, batchFeatures);
        loss += batchLoss;
    }
    addAccuracy(trainAccuracyHistory, model, trainTarget, trainFeatures);
    addAccuracy(testAccuracyHistory, model, testTarget, testFeatures);
}

However, the batchSize variable must be known at compile time to be able to declare the size of the block to instantiate with block<batchSize, 2>(). This means that it must be a constant expression: a value known by the compiler at compile time and that cannot change. Making batchSize a template parameter solves the problem:

template<uint32_t batchSize>
static void trainModel(
    std::vector<float> trainLossHistory,
    ...
    const Eigen::MatrixXf& testFeatures
);

Once this variable is declared as a template parameter, the function should be called the same way we called the block() method: with the value between angle brackets, e.g. trainModel<batchSize>(...). But this value must be a constant expression (constexpr), a specifier explained in the next section.

3-7) Constexpr

The constexpr specifier declares that the value of a variable or function can be evaluated at compile time. It optimizes the code because the value is substituted directly into the code instead of executing the constexpr function or accessing the constexpr variable at run time. In our case, declaring constexpr batchSize = 32 lets the compiler replace batchSize with 32 in the code:

constexpr auto batchSize = 32;

// Train model
Trainer2D::trainModel<batchSize>(
    trainLossHistory,
    ...
    testFeatures
);

A function can also be declared as constexpr, for instance:

constexpr float doubleValue(float a){ return a*2.f;}

int main(){
    ...
    float doubleNeeded = doubleValue(1.f);
    ...
}

In this scenario, the compiler can directly replace doubleValue(1.f) with the value 2.f. This avoids doing the multiplication at run time.
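
Because these values are constant expressions, they can even be checked at compile time with static_assert (an illustrative sketch):

constexpr float doubleValue(float a) { return a * 2.f; }

constexpr auto batchSize = 32;
static_assert(batchSize % 2 == 0, "batch size must be even");        // evaluated at compile time
static_assert(doubleValue(1.f) == 2.f, "evaluated at compile time"); // no run-time cost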

3-8) Format code with Clang

Writing clean code with a unified format between developers is essential. Keeping good coding practices in mind while developing is also essential, but some tools can correct what escapes us. Here, we will use clang-format as an example.
Its use is very simple:

clang-format -i filename

We could also run it on all the .hpp/.cpp files:

for file in $(find . -name "*.*pp"); do clang-format -i $file; done
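
The formatting rules themselves can be customized with a .clang-format file placed at the root of the repository, for example (the style options below are illustrative, not the project's actual configuration):

# .clang-format
BasedOnStyle: Google
IndentWidth: 2
ColumnLimit: 100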

The following screenshot, from this commit, shows how clang-format modifies a file with a lot of incorrect formatting:

Example of clang-format

4) Debugging

4-1) Static: CPPFLAGS

In section 2-2, we described how to compile a program. However, g++ also provides automatic checks that can help.

Using the -Wall flag enables a large set of g++ warnings. For example, this flag will raise a warning for an incompatible type comparison, a delete on a base class with a non-virtual destructor, etc. To enable additional warnings, such as unused function parameters, the -Wextra option must be used.

Many other flags exist; for instance, -Weffc++ checks for violations of the guidelines from the Effective C++ and More Effective C++ books written by Scott Meyers.

These flags can be added in the Makefile written in section 2-2 to the variable CPPFLAGS:

CPPFLAGS = -std=c++20 -Wall
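
As an illustration, the following snippet compiles without any warning by default but triggers warnings once -Wall (and -Wextra) is enabled:

int main()
{
    unsigned int u = 1;
    int i = -1;
    if (i < u) { }       // -Wall warns: signed/unsigned comparison
    int unusedValue = 0; // -Wall warns: unused variable
    return 0;
}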

4-2) Dynamic

4-2-a) Manual: Unit / Integration / System / Acceptance tests

4-2-a-i) Unit tests

Many types of tests must be performed in programming (in any language) to ensure that the code works as expected and to avoid regressions later, for example if a developer modifies the code to optimize it. These tests are also very useful because they allow you to quickly check that a change does not break any functionality.

Unit testing is a type of testing that verifies low-level functionality, such as the behavior of a single function. In our case, many unit tests are available under tests/unitTests to ensure that each layer behaves as expected.

For example, to check the behavior of the forward pass of the linear layer, the test should set the weights and bias to a certain value, then declare a specific input matrix and check that the output of the layer matches the expected one:

void linearLayerForwardPassTest() {
  std::cout << "Forward test:" << std::endl;

  int inputFeaturesNumber = 2, outputFeaturesNumber = 4;
  Layers::Linear linearLayer(inputFeaturesNumber, outputFeaturesNumber);

  Eigen::MatrixXf weights{
      {0.5f, 0.1f, -0.5f, 0.1f},
      {0.09f, -0.5f, 0.1f, 0.09f},
  };
  Eigen::MatrixXf bias{{-0.2f, 1.f, 0.f, -0.5f}};

  Eigen::MatrixXf x{
      {-9.f, -5.f},
      {1.f, -3.f},
      {-2.f, 7.f},
  };

  Eigen::MatrixXf target{
      {-4.75f, 0.6f, 4.f, -0.85f},
      {0.43f, 0.6f, -0.8f, 0.33f},
      {-0.17f, -4.7f, 1.7f, 0.93f},
  };

  linearLayer.setWeightsAndBias(weights, bias);

  Eigen::MatrixXf out;
  linearLayer.forward(out, x);

  if (out.rows() != x.rows()) {
    std::cout << "Output rows number KO" << std::endl;
    std::cout << "Expect: " << x.rows() << std::endl;
    std::cout << "Got: " << out.rows() << std::endl;
    return;
  }

  if (out.cols() != outputFeaturesNumber) {
    std::cout << "Output cols number KO" << std::endl;
    std::cout << "Expect: " << outputFeaturesNumber << std::endl;
    std::cout << "Got: " << out.cols() << std::endl;
    return;
  }

  if (!target.isApprox(out)) {
    std::cout << "Result KO" << std::endl;
    std::cout << "Expect: " << target << std::endl;
    std::cout << "Got: " << out << std::endl;
    return;
  }

  std::cout << "OK" << std::endl;
}

The above test first declares the weights, bias, input and expected output. Then it uses the setter method to set the weights and bias. It then calls the forward method to get the output of the layer. The test can now check the dimension of the output and its values. If a test fails, it displays the details for easy debugging. This unit test can be split into several tests, one to check the dimension of the output and one for its values. Each unit test can also work with different weights, biases and input values.

4-2-a-ii) Integration tests

Integration testing evaluates modules as a group. It occurs after unit testing and before system testing. Integration testing takes as its input modules that have been unit tested, groups them in larger aggregates, applies tests defined in an integration test plan to those aggregates, and delivers as its output the integrated system ready for system testing.

From wikipedia

An integration test could chain several layers in sequence and verify that, for a single forward pass with weights fixed in advance, the final model output is correct. This test verifies that the combination of several modules, which have been unit tested, works. It could also take the form of a unit test of the Sequential module.

4-2-a-iii) System tests

System testing takes, as its input, all of the integrated components that have passed integration testing. The purpose of integration testing is to detect any inconsistencies between the units that are integrated together (called assemblages). System testing seeks to detect defects both within the “inter-assemblages” and also within the system as a whole.

From wikipedia

A system test can be performed in our case by building a neural network and training it with data. If the training finishes with good accuracy (within a predefined range), the test is successful:

int main() {
  /* Model creation */
  std::vector<Module *> layers;
  int inputFeaturesNumber = 2, outputFeaturesNumber = 2, hiddenSize = 10;
  layers.emplace_back(
      new Layers::Linear((int)inputFeaturesNumber, (int)hiddenSize));
  layers.emplace_back(new Activations::ReLU());
  layers.emplace_back(new Layers::Linear((int)hiddenSize, (int)hiddenSize));
  layers.emplace_back(new Activations::ReLU());
  layers.emplace_back(
      new Layers::Linear((int)hiddenSize, (int)outputFeaturesNumber));
  layers.emplace_back(new Activations::Softmax());

  Losses::MSE mseLoss;

  Sequential model(layers, mseLoss);

  /* Train params */
  float learningRate = 0.03f;
  // number of train and test samples
  uint32_t samplesCount = 2000;
  std::vector<float> trainLossHistory, trainAccuracyHistory, testLossHistory,
      testAccuracyHistory;
  uint32_t epochsCount = 100, verboseFrequence = 1;
  constexpr auto batchSize = 64;

  // Update learning rate for model
  model.setLR(learningRate);

  /* Generate train and test sets */
  Eigen::MatrixXf trainTarget, trainFeatures;
  DataBuilder2D::generateDiscSet(trainTarget, trainFeatures, samplesCount, 0.3);
  Eigen::MatrixXf testTarget, testFeatures;
  DataBuilder2D::generateDiscSet(testTarget, testFeatures, samplesCount, 0.3);

  // Train model
  Trainer2D::trainModel<batchSize>(trainLossHistory, trainAccuracyHistory,
                                   testLossHistory, testAccuracyHistory, model,
                                   epochsCount, trainTarget, trainFeatures,
                                   testTarget, testFeatures, verboseFrequence);
}

At the end, this test could evaluate the variables trainLossHistory, trainAccuracyHistory, testLossHistory and testAccuracyHistory to check the model behavior and its final loss and accuracy.

4-2-a-iv) Acceptance tests

Acceptance testing is a test conducted to determine if the requirements of a specification or contract are met.

From wikipedia

For example, suppose we have a contract with a company to classify data with at least 98% accuracy. We could run an acceptance test that builds the agreed neural network architecture, trains it on the available data, and then computes its test accuracy. The acceptance test is successful if the model reaches an accuracy of at least 98%.

4-2-b) Manual: step by step debugging

To debug bad behavior in a program, a useful method is to debug it step by step. This type of debugging allows you to run the code line by line so that you can check the values of any variable at any time, the correct behavior of a state machine, etc.

This debugging can be done, for instance, with Visual Studio Code.

In Visual Studio Code, install the extension for C/C++.

As an example, the step-by-step debugging will be explained on the tests/main.cpp file which implements a neural network and trains it on a data set.

Under Terminal > Run Task > Configure Task, click on the g++ entry. This will create a tasks.json file that configures the build task. This JSON file should call the Makefile:

{
	"version": "2.0.0",
	"tasks": [
		{
			"type": "cppbuild",
			"label": "C/C++: g++-8 build active file",
			"command": "make",
			"args": [
				"-C",
				"${workspaceRoot}",
			],
			"options": {
				"cwd": "${fileDirname}"
			},
			"problemMatcher": [
				"$gcc"
			],
			"group": "build",
			"detail": "compiler: /bin/g++-8"
		}
	]
}

This tasks.json executes the “command” (make) with the specified arguments (args). Since we might want to use this task to debug any program in the project, we need to specify the location of the Makefile; the -C option does this by pointing make at the root of the workspace (workspaceRoot).

Then, to debug a program, this program should be compiled with the -g flag: it generates debugging information that is used by gdb-based debuggers. This flag can be added to the Makefile:

CPPFLAGS = -std=c++20 -Wall -g

Finally, we can open the tests/main.cpp file and start debugging with the F5 shortcut. This will create a launch.json file that specifies that the currently open file will be used for debugging:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "make - Build and debug active file",
            "type": "cppdbg",
            "request": "launch",
            "program": "${fileDirname}/${fileBasenameNoExtension}",
            "args": [],
            "stopAtEntry": true,
            "cwd": "${fileDirname}",
            "environment": [],
            "externalConsole": false,
            "MIMode": "gdb",
            "setupCommands": [
                {
                    "description": "Enable pretty-printing for gdb",
                    "text": "-enable-pretty-printing",
                    "ignoreFailures": true
                }
            ],
            "preLaunchTask": "C/C++: g++-8 build active file",
            "miDebuggerPath": "/usr/bin/gdb"
        }
    ]
}

With stopAtEntry set to true, the program will automatically stop at the first line of tests/main.cpp:

Step-by-step debugging: stopAtEntry demo

At this point, the program waits for user input. Several options are possible: the program can be executed line by line with the F10 shortcut, or a breakpoint can be placed somewhere in the code and the F5 shortcut will execute the code until the breakpoint is reached. In our case, a breakpoint is placed at line 49, just before the train function is executed:

Step-by-step debugging: variable inspection

As shown in the screenshot above, the values/contents of all variables can be inspected in the debugger. This helps find incorrect variable assignments. The yellow highlighting also shows, step by step, which lines are executed, which helps debug erroneous behavior in the code execution.

4-3) Automatic: leak detection and profiling tools

4-3-a) Memory leak

Powerful tools can be used to detect memory leaks in a program. Valgrind is one of them and can be installed under Linux with the following line:

sudo apt install valgrind

Then Valgrind can be launched with a complete leakage analysis:

valgrind --leak-check=full \
--show-leak-kinds=all \
--track-origins=yes \
--verbose \
--log-file=valgrind_log.txt \
./tests/main

This will save all results in the file valgrind_log.txt. A summary is available at the end of the file indicating that no memory leaks were found:

Valgrind: no memory leaks

However, we can create a memory leak in our program by forgetting to delete the six pointers (one per layer) from our model:

Create a model with six layers (pointers)
Disable destructor
Valgrind: 6 memory leak errors (one per pointer)

This kind of tool helps prevent memory leaks in our programs.
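
With raw pointers, the fix is to free each layer when the model is destroyed, typically in the Sequential destructor (a sketch; note that deleting through a Module* also requires the Module base class to declare a virtual destructor):

Sequential::~Sequential()
{
    for (Module* layer : mModel)
        delete layer;   // free each layer allocated with new
    mModel.clear();
}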

4-3-b) Profiling tools

4-3-b-i) Valgrind and kcachegrind

Valgrind can also be used for profiling, and other tools can make the analysis easier for the developer. A good combination is Valgrind with kcachegrind:

sudo apt install kcachegrind

Then, the following line will do the analysis:

valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes tests/main

Valgrind will create callgrind output files with all the information.
kcachegrind can then be used through its graphical interface to analyze the results:

kcachegrind GUI example

4-3-b-ii) Perf and flamegraph

A very useful feature for optimizing a program is to know how much time each function takes during execution. For example, in our training loop, we use Sequential::forward() to infer the model, Sequential::backward() to update its weights, and Trainer2D::addAccuracy() to calculate our metric. One may wonder how much each of these functions contributes, in terms of execution time, to the training function.

The tools perf and FlameGraph will let us find out:

sudo apt install linux-perf-4.19
git clone https://github.com/brendangregg/FlameGraph

perf is a tool that samples a process and records where the code is executing during a specific event. This tool is “probabilistic” because it counts how often execution is observed inside each function. If we run it long enough (a few seconds for a small project), it gives a very good representation of the execution time spent in each function.

To run perf, we must first run the tests/main program with a large number of epochs (to have a long execution time). Then, perf can attach to the running “main” process:

perf record -a -F 99 -g -p $(pgrep -x main)

After a few seconds, we can stop the perf command with Ctrl+C; a perf.data file should then be present. The next step is to parse the results:

perf script > perf.script

Finally, FlameGraph will create a .svg file to display the results:

FlameGraph/stackcollapse-perf.pl perf.script | FlameGraph/flamegraph.pl > flamegraph.svg

This creates the following flame graph:

Flame graph example

This flame graph should be read from bottom to top. In particular, you can see the “main” function under the cyan rectangle. Above it, a rectangle shows the DeepLearningFramework::Trainer2D::trainModel() function with a bar spanning from the far left to the far right. This means that the trainModel() function takes almost 100% of the execution time of the main function. Then, above the trainModel() function, we can see that Sequential::backward() takes ~20%, Sequential::forward() ~26%, and finally Trainer2D::addAccuracy() ~54% of Trainer2D::trainModel(). This result is understandable because the addAccuracy() function is called twice per epoch, on the training and test sets.

This type of visualization can be very useful in determining which part of a program should be optimized first.

Conclusion

Through this project, we learned how to implement a simple Deep Learning framework that can learn to classify data. More importantly, we also learned how to implement a project in C++ and its main key concepts, such as abstract classes/interfaces and inheritance, smart pointers, iterators, constexpr, templates, std containers and static classes. We saw how to compile a project with a Makefile and how to debug it with tests, step-by-step debugging, memory leak detection and profiling.


You can find the project at the following link:

https://github.com/Apiquet/DeepLearningFrameworkFromScratchCpp