Original Singleton Pattern
The singleton pattern needs to do the following:
- It cannot be constructed from outside the class, otherwise multiple instances could be created; the constructor therefore has to be declared private.
- It must guarantee that only one instance is ever produced.
Here is a simple implementation:
class Singleton
{
private:
static Singleton *local_instance;
Singleton(){};
public:
static Singleton *getInstance()
{
if (local_instance == nullptr)
{
local_instance = new Singleton();
}
return local_instance;
}
};
Singleton * Singleton::local_instance = nullptr;
int main()
{
Singleton * s = Singleton::getInstance();
return 0;
}
Using Local Static Objects to Solve Two Existing Problems
There are two problems in the code above. One is that in a multi-threaded program the new may run twice, because two threads can both observe local_instance as nullptr before either assigns it. The other is that the destructor is never called when the program exits, so the instance leaks. The following solution uses a local static object to solve both problems.
#include <iostream>
using namespace std;

class Singleton
{
private:
Singleton(){
cout << "Constructor" << endl;
};
~Singleton(){
cout << "Destructor" << endl;
}
public:
static Singleton *getInstance()
{
static Singleton local_s; // constructed on first call; destroyed automatically at program exit
return &local_s;
}
};
int main()
{
cout << "Before first access to singleton" << endl;
Singleton * s = Singleton::getInstance();
cout << "After first access to singleton" << endl;
cout << "Before second access to singleton" << endl;
Singleton * s2 = Singleton::getInstance();
cout << "After second access to singleton" << endl;
return 0;
}
Before C++11 the initialization of a local static object was not required to be thread-safe, so two threads entering getInstance at the same time could both run the constructor. C++11 guarantees thread-safe initialization of local statics, so this version should only be relied on with C++11 or newer compilers.
With Pre-C++11 Compilers, Local Static Objects Are Not Thread-Safe
The version below uses a pthread mutex for thread safety and a static helper member whose destructor deletes the singleton when the program exits. The disadvantage of this approach is that every call to getInstance takes the lock, which is slow and inefficient. But at least it is correct and works before C++11. Sample code is as follows:
#include <iostream>
#include <pthread.h>
using namespace std;

class Singleton
{
private:
static Singleton *local_instance;
static pthread_mutex_t mutex;
Singleton(){
cout << "Constructor" << endl;
};
~Singleton(){
cout << "Destructor" << endl;
}
class rememberFree{
public:
rememberFree(){
cout << "Member constructor" << endl;
}
~rememberFree(){
if(Singleton::local_instance != nullptr){
delete Singleton::local_instance;
}
}
};
static rememberFree remember;
public:
static Singleton *getInstance()
{
pthread_mutex_lock(&mutex);
if (local_instance == nullptr)
{
local_instance = new Singleton();
}
pthread_mutex_unlock(&mutex);
return local_instance;
}
};
Singleton * Singleton::local_instance = nullptr;
pthread_mutex_t Singleton::mutex = PTHREAD_MUTEX_INITIALIZER;
Singleton::rememberFree Singleton::remember;
Double-Checked Locking Causes Uninitialized Memory Access
The following code returns the already-initialized object without taking the lock on every call, which greatly improves the performance of the previous version. However, the same pattern has a well-known problem in Java, where out-of-order execution can let a thread observe a reference to a not-yet-initialized object. Does C++ have the same problem? See the following article: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf The conclusion is the same: C++ has the same issue, and the resulting undefined behavior can cause segmentation faults. An example of double-checked locking code is as follows:
static Singleton *getInstance()
{
if(local_instance == nullptr){
pthread_mutex_lock(&mutex);
if (local_instance == nullptr)
{
local_instance = new Singleton();
}
pthread_mutex_unlock(&mutex);
}
return local_instance;
}
If thread A enters the lock and allocates space for the object, instruction reordering may cause local_instance to be assigned the address of the allocated but not-yet-constructed memory before the constructor runs. In the window between the pointer assignment and the completion of construction, another thread B can obtain the pointer through getInstance and use an uninitialized object.
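Conceptually, the single statement local_instance = new Singleton(); consists of three steps, and nothing in the code above prevents the compiler or CPU from publishing the pointer before the constructor has run. A rough sketch of the expansion (illustrative only, not actual generated code):
void *raw = operator new(sizeof(Singleton));     // 1. allocate raw memory
new (raw) Singleton();                            // 2. construct the object in that memory
local_instance = static_cast<Singleton *>(raw);   // 3. publish the pointer
// If steps 2 and 3 are reordered, another thread can pass the outer nullptr
// check and return a pointer to memory whose constructor has not yet run.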
Attempting to Use Local Variables Cannot Guarantee Instruction Execution Order
Introducing a temporary variable to force the order of the stores does not help: the compiler may treat it as a redundant variable and optimize it away, so the assignment can still be reordered with the construction. The following code looks like a good idea but fails to achieve its purpose:
if(local_instance == nullptr){
static mutex mtx;
lock_guard<mutex> lock(mtx);
if (local_instance == nullptr)
{
auto tmp = new Singleton();
local_instance = tmp;
}
}
return local_instance;
Inelegant Use of volatile to Solve Instruction Reordering Problems in Double-Checked Locking
One attempt is to declare the internal pointer volatile; the code is as follows:
#include <iostream>
#include <mutex>
using namespace std;

class Singleton
{
private:
static Singleton * volatile local_instance;
Singleton(){
cout << "Constructor" << endl;
};
~Singleton(){
cout << "Destructor" << endl;
}
class rememberFree{
public:
rememberFree(){
cout << "Member constructor" << endl;
}
~rememberFree(){
if(Singleton::local_instance != nullptr){
delete Singleton::local_instance;
}
}
};
static rememberFree remember;
public:
static Singleton *getInstance()
{
if(local_instance == nullptr){
static mutex mtx;
lock_guard<mutex> lock(mtx);
if (local_instance == nullptr)
{
auto tmp = new Singleton();
local_instance = tmp;
}
}
return local_instance;
}
};
Singleton * volatile Singleton::local_instance = nullptr;
Singleton::rememberFree Singleton::remember;
int main()
{
cout << "Before first access to singleton" << endl;
Singleton * s = Singleton::getInstance();
cout << "After first access to singleton" << endl;
cout << "Before second access to singleton" << endl;
Singleton * s2 = Singleton::getInstance();
cout << "After second access to singleton" << endl;
return 0;
}
In this code, although the pointer local_instance is volatile, the object it points to is not, and neither are its members, so the compiler may still reorder the construction relative to the pointer store. If you declare the pointee volatile as well, you will find your code filling up with volatile. Even then it is only correct on compilers whose volatile carries extra ordering guarantees (such as MSVC); the ISO standard gives volatile no inter-thread synchronization semantics:
#include <iostream>
#include <mutex>
using namespace std;

class Singleton
{
private:
static volatile Singleton * volatile local_instance;
Singleton(){
cout << "Constructor" << endl;
};
~Singleton(){
cout << "Destructor" << endl;
}
class rememberFree{
public:
rememberFree(){
cout << "Member constructor" << endl;
}
~rememberFree(){
if(Singleton::local_instance != nullptr){
delete Singleton::local_instance;
}
}
};
static rememberFree remember;
public:
static volatile Singleton *getInstance()
{
if(local_instance == nullptr){
static mutex mtx;
lock_guard<mutex> lock(mtx);
if (local_instance == nullptr)
{
auto tmp = new Singleton();
local_instance = tmp;
}
}
return local_instance;
}
};
volatile Singleton * volatile Singleton::local_instance = nullptr;
Singleton::rememberFree Singleton::remember;
int main()
{
cout << "Before first access to singleton" << endl;
volatile Singleton * s = Singleton::getInstance();
cout << "After first access to singleton" << endl;
cout << "Before second access to singleton" << endl;
volatile Singleton * s2 = Singleton::getInstance();
cout << "After second access to singleton" << endl;
return 0;
}
The Ultimate Weapon — Memory Barrier
In the new standard, std::atomic together with memory fences makes the visibility order of memory accesses across cores controllable. The implementation below uses C++11's explicit memory ordering:
#include <iostream>
#include <atomic>
#include <mutex>
using namespace std;

class Singleton
{
private:
// static volatile Singleton * volatile local_instance;
static atomic<Singleton*> instance;
Singleton(){
cout << "Constructor" << endl;
};
~Singleton(){
cout << "Destructor" << endl;
}
class rememberFree{
public:
rememberFree(){
cout << "Member constructor" << endl;
}
~rememberFree(){
Singleton* local_instance = instance.load(std::memory_order_relaxed);
if(local_instance != nullptr){
delete local_instance;
}
}
};
static rememberFree remember;
public:
static Singleton *getInstance()
{
Singleton* tmp = instance.load(std::memory_order_relaxed);
atomic_thread_fence(memory_order_acquire);
if(tmp == nullptr){
static mutex mtx;
lock_guard<mutex> lock(mtx);
tmp = instance.load(memory_order_relaxed);
if (tmp == nullptr)
{
tmp = new Singleton();
atomic_thread_fence(memory_order_release);
instance.store(tmp, memory_order_relaxed);
}
}
return tmp;
}
};
atomic<Singleton*> Singleton::instance(nullptr); // start explicitly from nullptr
Singleton::rememberFree Singleton::remember;
int main()
{
cout << "Before first access to singleton" << endl;
Singleton * s = Singleton::getInstance();
cout << "After first access to singleton" << endl;
cout << "Before second access to singleton" << endl;
Singleton * s2 = Singleton::getInstance();
cout << "After second access to singleton" << endl;
return 0;
}
The above code may be hard to read. The key is the pairing of the two fences: the release fence before the store guarantees that the Singleton is fully constructed before its address is published, and the acquire fence after the load guarantees that a thread which sees the non-null pointer also sees the constructed object, so other cores can never observe the pointer without the construction that precedes it. In the Muduo book, memory barriers are likewise rated as the ultimate weapon.
Using Atomic Operation Memory Order
There are six memory order options that can be applied to operations on atomic types: memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, and memory_order_seq_cst. Unless you specify one explicitly, every operation on an atomic type defaults to memory_order_seq_cst. Although there are six options, they represent only three memory models: sequentially consistent (memory_order_seq_cst), acquire-release (memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_acq_rel), and relaxed (memory_order_relaxed).
The orderings applicable here are the default memory_order_seq_cst, which is sequentially consistent, or memory_order_acquire and memory_order_release, which give acquire-release ordering; the latter may perform better.
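As a sketch of that improvement (assuming the same class members as in the fence version above), getInstance could apply acquire/release ordering directly to the atomic load and store instead of using explicit fences:
static Singleton *getInstance()
{
    // acquire pairs with the release store below: a thread that sees a
    // non-null pointer also sees the fully constructed Singleton
    Singleton *tmp = instance.load(memory_order_acquire);
    if (tmp == nullptr) {
        static mutex mtx;
        lock_guard<mutex> lock(mtx);
        tmp = instance.load(memory_order_relaxed); // relaxed is enough under the lock
        if (tmp == nullptr) {
            tmp = new Singleton();
            instance.store(tmp, memory_order_release); // publish only after construction
        }
    }
    return tmp;
}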
To be improved
Using pthread_once or call_once
The former comes from the pthread library; the latter is std::call_once from the C++11 standard library, declared in <mutex> and used together with std::once_flag.
To be improved
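A minimal sketch of the std::call_once version (C++11; call_once and once_flag live in <mutex>; the member name init_flag is illustrative):
#include <iostream>
#include <mutex>
using namespace std;

class Singleton
{
private:
    static Singleton *local_instance;
    static once_flag init_flag;
    Singleton(){
        cout << "Constructor" << endl;
    }
public:
    static Singleton *getInstance()
    {
        // call_once runs the lambda exactly once, even if many threads race here
        call_once(init_flag, [](){
            local_instance = new Singleton();
        });
        return local_instance;
    }
};
Singleton * Singleton::local_instance = nullptr;
once_flag Singleton::init_flag;
As with the earlier pointer-based versions, a helper such as rememberFree would still be needed if the destructor must run at program exit.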