StackExchange.Redis Timeout performing timeout problem

Keywords: C# Database Redis

In a recent project, I used. net core 2.1, and then Redis for caching. The cache related package was written by colleagues, and the driver used was StackExchange.Redis version 2.0.571. I've always heard that there was a TimeOut bug when the driver was concurrent. After the project was almost developed, I tested it and simply simulated 30 users' continuous access to a query interface useful to the cache, As a result, the timeout exception occurred under such a small pressure:

Timeout performing GET my_141 (5000ms), inst: 30, qu: 0, qs: 20, in: 20320, serverEndpoint: 172.16.3.119:6379, mgr: 10 of 10 available, clientName: s-119, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=120,Free=32747,Min=1,Max=32767), v: 2.0.571.20511(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts))

Followed by stack information

The egg hurt for a long time. I searched many articles and got the following results

Solution

1. Replace the driver and do not use it (you can see the recommendation of. net core redis driver. Why not use StackExchange.Redis)

2. The redis operation is modified to be all asynchronous&& ThreadPool.SetMinThreads(200, 200);

The second one I used solves the problem. I may encounter a pit when I change the drive; There is also the problem of time cost;

Reason analysis

We see the following paragraph in the above exception information:

IOCP: (Busy=0,Free=1000,Min=1,Max=1000),

WORKER: (Busy=120,Free=32747,Min=1,Max=32767),

It means that there are currently 120 busy WORKER threads, and the system "needs to create the minimum number of new WORKER threads according to the needs of the thread pool". That is, the number of WORKER threads created by the system is not enough to meet the needs of the busy threads of redis's Get operation, resulting in some Get operation threads blocking and timeout;

Therefore, we modify the "minimum thread worker threads" to 200 to solve the problem;

200 is set by the production environment server. Unreasonable setting of this value may lead to performance problems;

StackExchange.Redis timeout

Recently, the company has a project. In the case of a large number of requests, there are a large number of error logs about redis timeout:

Timeout performing SET XXX, inst: 27, mgr: ProcessReadQueue, err: never, queue: 3, qu: 0, qs: 3, qc: 0, wr: 0, wq: 0, in: 15, ar: 1, clientName: XXX, serverEndpoint: 192.168.x.x:6379, keyHashSlot: 944, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=4,Free=4091,Min=4,Max=4095) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)

Find a solution online. Some people say that the memory set on the redis server is not enough, the client connection timeout is set longer, and the synchronization time is set longer. In the end, it didn't work, but it was found that it was a client problem, and there was no problem with the server. Then continue to find it on the Internet

StackExchange.Redis has a timeout bug. Finally, csredis was found to solve the problem. Replaced stackexchange. Redis.

Timeout of DevOps StackExchange connection to Redis

Colleagues reported that a large number of Redis connection Timeout errors occurred, because Session and many config cache data are stored on Redis. If Redis is abnormal, it will seriously affect the online service, so it is necessary to debug immediately

First, use Redis cli to confirm that the service is still running normally. Then, execute Redis Benchmark to check the server response. No exception is found in the data. The Redis data can also be obtained normally by connecting with Redis Desktop Manager. It is speculated that the Redis server itself should be normal

After confirming the usage situation of colleagues, it is found that Redis timeout is not comprehensive, and only accessing several specific keys will cause problems. After careful inspection, it is found that the key causing timeout error has the feature of large size. It is speculated that it may be caused by too much data

Error message

  • Error message
2017-06-28 18:04:56,025 [17] [ERROR] [RedisObjectStore`1] 
Inner exception number - 0
Timeout performing HGETALL faqscache:******.portal.bll.appdatamanager+cachedata, inst: 1, queue: 18, qu: 0, qs: 18, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName: TestAPP01, serverEndpoint: 127.0.0.1:8188, keyHashSlot: 3414, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=6,Free=8185,Min=2,Max=8191) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)
    at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
    at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
    at StackExchange.Redis.RedisDatabase.HashGetAll(RedisKey key, CommandFlags flags)
    at ******.Portal.BLL.Redis.RedisObjectStore`1.Get(String key)
Main exception & its inner exception Log end

Solution

Modify the data synchronization timeout setting - relax the syncTimeout time (default 1000 milliseconds)

  1. Original redis connection information
public static class RedisConnectionFactory
{
    
    private static readonly Lazy<ConnectionMultiplexer> Connection;
    public static IServer RedisServer;
        
    static RedisConnectionFactory()
    {
        var connectionString = "127.0.0.1:6379,127.0.0.1:6380,password=password";
        var options = ConfigurationOptions.Parse(connectionString);
        Connection = new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
        RedisServer = GetConnection.GetServer(options.EndPoints.First());
    }
    public static ConnectionMultiplexer GetConnection => Connection.Value;
    public static IDatabase RedisDB => GetConnection.GetDatabase();
}
  1. After modification
public static class RedisConnectionFactory
{
    private static readonly Lazy<ConnectionMultiplexer> Connection;
    public static IServer RedisServer;
    static RedisConnectionFactory()
    {
        var connectionString = "127.0.0.1:6379,127.0.0.1:6380,password=password,syncTimeout =3000";
        var options = ConfigurationOptions.Parse(connectionString);
        Connection = new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
        RedisServer = GetConnection.GetServer(options.EndPoints.First());
    }
    public static ConnectionMultiplexer GetConnection => Connection.Value;
    public static IDatabase RedisDB => GetConnection.GetDatabase();
}

Experience

After relaxing the data synchronization time, the timeout problem has indeed been solved, but I still suggest colleagues to reduce the amount of data in redis from the source, which can reduce the io transmitted by the network and increase the response speed of redis

StackExchange.Redis performance tuning

We often have the problem of timeout of synchronous call to Redis, but we find very few errors after changing to asynchronous, but we may find that the Redis command is very slow through pre and post recording logs.

PS: later codes will run in Windows bash. StackExchange.Redis version is 1.2.6

First, quickly reproduce and solve the problem. Let's run the following code first

public static async Task Main(string[] args)
{
    ThreadPool.SetMinThreads(8, 8);
    using (var connection = await ConnectionMultiplexer.ConnectAsync("localhost"))
    {
        connection.PreserveAsyncOrder = false;

        var db = connection.GetDatabase(0);
        var sw = Stopwatch.StartNew();

        await Task.WhenAll(Enumerable.Range(0, 10)
            .Select(_ => Task.Run(() =>
            {
                db.StringGet("aaa");

                Thread.Sleep(1000);
            })));

        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}

Running discovery throws StackExchange.Redis.RedisTimeoutException. Why? The reason is that the current worker thread is not enough, and the synchronization waiting has timed out. See the source code for details

What if you change the above ThreadPool.SetMinThreads(8, 8) to ThreadPool.SetMinThreads(100, 100)? Don't throw exceptions.

Let's talk about the slow down of asynchronous interfaces. Let's run the following code first:

public static async Task Main(string[] args)
{
    var tcs = new TaskCompletionSource<bool>();
    var sw = Stopwatch.StartNew();

    Console.WriteLine($"Main1: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");

    var task = Task.Run(() =>
    {
        Thread.Sleep(10);
        Console.WriteLine($"Run1: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");
        tcs.TrySetResult(true);
        Console.WriteLine($"Run2: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");
        Thread.Sleep(10000);
    });

    var a = tcs.Task.ContinueWith(_ => { Console.WriteLine($"a: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}"); });
    var b = tcs.Task.ContinueWith(_ => { Console.WriteLine($"b: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}"); });
    var c = tcs.Task.ContinueWith(_ => { Console.WriteLine($"c: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}"); });

    await tcs.Task;
    Console.WriteLine($"Main2: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");
    Thread.Sleep(100);
    await Task.Delay(10);
    Console.WriteLine($"Main3: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");
}

The final output shows that Run1 and Main2 use the same thread, and the elapsed milliseconds of Run2 basically adds 100 to Run1.

Then go back to calling Redis code

static async Task Main(string[] args)
{
   ThreadPool.SetMinThreads(100, 100);
    using (var connection = await ConnectionMultiplexer.ConnectAsync("localhost"))
    {
        var db = connection.GetDatabase(0);
        var sw = Stopwatch.StartNew();

        await Task.WhenAll(Enumerable.Range(0, 10)
            .Select(_ => Task.Run(async () =>
            {
                await db.StringGetAsync("aaa");

                Thread.Sleep(100);
            })));

        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}

Do you find that the output is more than 100 or more than 1000? Why? It turns out that there is a special setting in the sdk to protect the order of asynchronous code execution. Then we add a code connection.PreserveAsyncOrder = false before the GetDatabase line;

Then run it again to see what the result is? Through the above code, we can basically determine the relationship between asynchronous slowness and TaskCompletionSource. Please refer to the source code of sdk for details.

To sum up the above two points, most problems can be solved simply by SetMinThreads and connection. Preservesyncorder = false, but how to find more deep-seated problems?

The two artifact StackExchange.Redis, ConnectionCounters and IProfiler, are introduced below

  1. After the object obtained through connection.GetCounters().Interactive, three properties are very useful
public class ConnectionCounters
{
    /// <summary>
    /// Operations that have been requested, but which have not yet been sent to the server
    /// </summary>
    public int PendingUnsentItems { get; }

    /// <summary>
    /// Operations that have been sent to the server, but which are awaiting a response
    /// </summary>
    public int SentItemsAwaitingResponse { get; }

    /// <summary>
    /// Operations for which the response has been processed, but which are awaiting asynchronous completion
    /// </summary>
    public int ResponsesAwaitingAsyncCompletion { get; }
}
  1. Each attribute represents the current status of the pending commands of the current redis connection. You can know from the literal meaning that PendingUnsentItems indicates that the command has been sent and the queue to be sent has not been sent; SentItemsAwaitingResponse indicates the command that has been sent but has not received the response result; Responseawatingasynccompletion indicates that the response command has been received, but the command of taskcompletionsource < T > (). Trysetresult() has not been called.
    The reason why PendingUnsentItems and sentitemsaawaitingresponse are too large is basically because the network is blocked. You need to check whether the network bandwidth or redis value is large.
    Responseawatingasynccompletion is because the code after await, such as the code in the above example, takes a long synchronization time. It is necessary to optimize the code and set preservesyncorder to false.

  2. ConnectionCounters analyzes the instantaneous state of a thread, while IProfiler can track the total number of redis commands executed by a request and how long they have been used respectively. Please write code for details. Reference documents

If you find a problem, you need to solve it, and you need to learn deeply to solve it. I don't like writing articles, but I found several articles about redis timeout recently. Finally, I still want to share my experience of stepping on the pit with you.

A good news in this is that StackExchange.Redis 2.0 has reconstructed asynchronous queues from and used pipes to solve the problem of slow asynchrony

StackExchange.Redis operation Redis link string configuration (linking Redis clusters)

Redis link string can be put into Config file:

  <connectionStrings>
    <add name="Connection_Redis" connectionString="127.0.0.1:6379,127.0.0.1:6380,127.0.0.1:6381,127.0.0.1:6382,password=123456,abortConnect=false" />
  </connectionStrings>

When there are multiple Redis instances, you can set to link multiple instances, separated by commas (for example, using Redis clusters).

public class RedisCacheHelper
    {
        private readonly Logger _log = LogManager.GetCurrentClassLogger();

        /// <summary>
        ///Connection string
        /// </summary>
        private static string _connectionString;

        /// <summary>
        ///redis connection object
        /// </summary>
        private static IConnectionMultiplexer _connMultiplexer;

        /// <summary>
        ///Instantiate object
        /// </summary>
        private static RedisCacheHelper _cacheHelper;
        /// <summary>
        ///Instance
        /// </summary>
        private static RedisCacheHelper _instance;

        /// <summary>
        ///Lock
        /// </summary>
        private static readonly object Locker = new object();

        /// <summary>
        ///Database
        /// </summary>
        private IDatabase _db;


        /// <summary>
        ///Default link instance
        /// </summary>
        private RedisCacheHelper()
        {
            _connectionString = ConfigurationManager.ConnectionStrings["Connection_Redis"].ConnectionString;
            _connMultiplexer = ConnectionMultiplexer.Connect(_connectionString);
            //Add registration event
            AddRegisterEvent();
        }

        /// <summary>
        ///Get Redis connection object
        /// </summary>
        private IConnectionMultiplexer Connnection
        {
            get
            {
                if (_connMultiplexer == null || !_connMultiplexer.IsConnected)
                {
                    lock (Locker)
                    {
                        if (_connMultiplexer == null || !_connMultiplexer.IsConnected)
                        {
                            _connMultiplexer = ConnectionMultiplexer.Connect(_connectionString);
                        }
                    }
                }
                return _connMultiplexer;
            }
        }

        /// <summary>
        ///Gets the specified db, which is not specified by default
        /// </summary>
        /// <param name="db"></param>
        /// <returns></returns>
        private IDatabase GetDatabase(int db = -1)
        {
            return Connnection.GetDatabase(db);
        }

        /// <summary>
        ///Call the instance through which Redis is called
        /// </summary>
        public static RedisCacheHelper Instance
        {
            get
            {
                if (_cacheHelper != null) return _cacheHelper;
                lock (Locker)
                {
                    if (_cacheHelper != null) return _cacheHelper;
                    _cacheHelper = new RedisCacheHelper();
                }
                return _cacheHelper;
            }
        }

        #region registration event

        /// <summary>
        ///Add registration event
        /// </summary>
        private void AddRegisterEvent()
        {
            _connMultiplexer.ConnectionRestored += ConnMultiplexer_ConnectionRestored;
            _connMultiplexer.ConnectionFailed += ConnMultiplexer_ConnectionFailed;
            _connMultiplexer.ErrorMessage += ConnMultiplexer_ErrorMessage;
            _connMultiplexer.ConfigurationChanged += ConnMultiplexer_ConfigurationChanged;
            _connMultiplexer.HashSlotMoved += ConnMultiplexer_HashSlotMoved;
            _connMultiplexer.InternalError += ConnMultiplexer_InternalError;
            _connMultiplexer.ConfigurationChangedBroadcast += ConnMultiplexer_ConfigurationChangedBroadcast;
        }

        /// <summary>
        ///When reconfiguring broadcast (usually means master-slave synchronization changes)
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void ConnMultiplexer_ConfigurationChangedBroadcast(object sender, EndPointEventArgs e)
        {
            _log.Info($"{nameof(ConnMultiplexer_ConfigurationChangedBroadcast)}: {e.EndPoint}");
        }

        /// <summary>
        ///When an internal error occurs (mainly for debugging)
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void ConnMultiplexer_InternalError(object sender, InternalErrorEventArgs e)
        {
            _log.Error($"{nameof(ConnMultiplexer_InternalError)}: {e.Exception}");
        }

        /// <summary>
        ///When changing the cluster
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void ConnMultiplexer_HashSlotMoved(object sender, HashSlotMovedEventArgs e)
        {
            _log.Info(
                $"{nameof(ConnMultiplexer_HashSlotMoved)}: {nameof(e.OldEndPoint)}-{e.OldEndPoint} To {nameof(e.NewEndPoint)}-{e.NewEndPoint}");
        }

        /// <summary>
        ///When configuration changes
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void ConnMultiplexer_ConfigurationChanged(object sender, EndPointEventArgs e)
        {
            _log.Info($"{nameof(ConnMultiplexer_ConfigurationChanged)}: {e.EndPoint}");
        }

        /// <summary>
        ///When an error occurs
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void ConnMultiplexer_ErrorMessage(object sender, RedisErrorEventArgs e)
        {
            _log.Error($"{nameof(ConnMultiplexer_ErrorMessage)}: {e.Message}");
        }

        /// <summary>
        ///When physical connection fails
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void ConnMultiplexer_ConnectionFailed(object sender, ConnectionFailedEventArgs e)
        {
            _log.Fatal($"{nameof(ConnMultiplexer_ConnectionFailed)}: {e.Exception}");
        }

        /// <summary>
        ///When establishing a physical connection
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void ConnMultiplexer_ConnectionRestored(object sender, ConnectionFailedEventArgs e)
        {
            _log.Info($"{nameof(ConnMultiplexer_ConnectionRestored)}: {e.Exception}");
        }

        #endregion
    }

The above Redis links are configured and used as follows: first, start the master-slave services in the cluster, 3 master and 3 slave

Then, perform operation verification in the code, insert a piece of data into the cache, and then cycle to read and write data. During the cycle, manually shut down 1 ~ 2 servers in the Redis service

while (true)
            {
                //Set age
                RedisCacheHelper.Instance.Set("age", "10", new TimeSpan(0, 5, 0));

                //Get age
                var getage = RedisCacheHelper.Instance.Get("age");
                Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff") + ": " + getage);

                Thread.Sleep(1000); // Wait for 1s
            }

The cache is successfully stored in the server. The operation effect is as follows:

At this time, check the Redis cluster and find that the master and slave instances have data. Next, we turn off any Redis instance (do not close the circular code program running above)

Found that the program is running normally:

However, I have tested the above steps for many times. When closing an instance arbitrarily, I occasionally report the following error: the cluster hangs

We comment out the written methods and only keep the read code. Exceptions will also be reported occasionally.

However, in most cases, if one of the instances is closed, the program will run normally:

Use StackExchange.Redis client to troubleshoot Timeout exceptions in Redis access

Problem generation

During the use of redis in these two days, when the number of parallel clients reached 200 +, a large number of timeout exceptions were generated. The typical exception information is as follows:

Timeout performing HVALS Parser2#Hash#VersionState, inst: 1, mgr: ExecuteSelect, err: never, queue: 2, qu: 0, qs: 2, qc: 0, wr: 0, wq:> 0, in: 0, ar: 0, clientName: GS-SERVER-2894, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=0,Free=32767,Min=8,Max=32767), Local-CPU: 0% (Please take a look> at this article for some common client-side issues that can cause timeouts: https://github.com/StackExchange/StackExchange.Redis/tree/master/Docs/Timeouts.md)

No connection is available to service this operation: HVALS Parser2#Hash#VersionState

SocketFailure on HVALS

Operating environment:

Redis server version: 2.8.19
. net Framework version: 4.5.2
StackExchange.Redis version: 1.1.603

problem analysis

First, locate the problem, eliminate the problem of redis basic components, and use the benchmark tool provided by redis to test:

redis-benchmark -h 10.200.XX.XX -p 30301 -q -n 100000 -c 300 -d 10240

Because redis uses docker to provide services, the port is mapped to 30301, simulating 100000 real scene requests, 300 concurrent requests, and the data size of each request is 10K.

After testing, the TPS for common short-time operations such as hset, hget, Sadd and spop is kept between 25K and 30K, so the problem of redis service provided by docker is preliminarily eliminated.

Therefore, the problem is roughly located in the business system code, that is, the redis client tool (StackExchange.Redis).

The common causes of timeout exceptions include (but are not limited to):

1: Insufficient server resources;

2: Instructions that take too long (the read-only property of StackExchange client connection TimeoutMilliseconds=1000);

3: StackExchange has too many blocking tasks (the qs value in the exception information continues to grow). In this case, StackExchange gives suggestions to implement a connection multiplexer pool and dynamically select the multiplexer with the lowest load to connect, so as to avoid the blocking of all links caused by a link timeout;

4: The minimum number of threads created on demand in the Thread Pool in the CLR is too small, resulting in the waiting cost (500ms). It can be seen that the ConnectionMultiplexer of stackeexchange uses the Thread Pool;

In addition, in the topics related to Time out in the Issue of stackeexchange.redis project, some people also mentioned that the exception no longer occurs after updating the client version of stackeexchange.redis (1.1.605 +).

Problem solving

According to the above possible causes of problems, the following modifications are made to the business code:

1: No longer use the master / slave thread pool encapsulated in StackExchange.Redis, and specify the master / slave ConnectionMultiplexer object;

2: Modify the minimum threshold number of threads created in the Thread Pool of the CLR;

3: Add the judgment of its IsConnected attribute to the line of sight of the ConnectionMultiplexer singleton, and manually release resources and reconnect after its connection is disconnected;

4: Avoid instructions such as HVALS that may cause slow operation;

5: Upgrade StackExchange.Redis to 1.1.605 + (the latest version is 1.1.608).

Test after modification. The time out exception has been eliminated.

The StackExchange.Redis client (V1.1.608) has been tested to have such a characteristic, that is, the IDatabase object created by the complexer singleton object will cause the complexer singleton object to time out after a Timeout exception is generated, that is, the new IDatabase object created by it will continue to time out, which is quite painful...

During actual use, due to various network reasons, Timeout exceptions are inevitable, but it is really troublesome that they cannot be recovered. The test complexer object cannot be disposed. Rebuild it after closing..

Posted by RobOgden on Thu, 07 Oct 2021 01:12:31 -0700