Evaluation of million-row data migration schemes

Keywords: MySQL Database SQL Windows

Preface

Recently, while the company was using ABP to rebuild an old project, the database was also switched from SQL Server to MySQL. A quick gripe: the previous product ran on Windows Server 2008, SQL Server 2008 R2, and .NET Framework 4.5, and now we finally embrace .NET Core. Back to the point. At present, single tables hold 100,000+ (10w) and 1,000,000+ (100w) rows, which we will test below. Switching databases, together with changes to the table structure, makes data migration unavoidable. There are not many migration schemes; here are the two I tried and tested.

Multithreaded bulk write

private static async Task BatchInsertTestUsers(List<TestUser> testUsers)
{
    const string prefix = "INSERT INTO users (Id,Name,Age) VALUES";
    using (IDbConnection conn = new MySqlConnection(DataMigrationConfig.MySqlConstr))
    {
        var sqlText = new StringBuilder();
        sqlText.Append(prefix);

        // Splice each user into a multi-row VALUES list. Note: concatenating values
        // like this is only safe for trusted, generated data; real user input should
        // be parameterized to avoid SQL injection.
        foreach (var testUser in testUsers)
        {
            sqlText.Append($"({testUser.Id},'{testUser.Name}',{testUser.Age}),");
        }

        // Drop the trailing comma after the last value tuple.
        var insertSql = sqlText.ToString(0, sqlText.Length - 1);

        // ExecuteAsync comes from Dapper.
        await conn.ExecuteAsync(insertSql);
    }
}
  • BatchInsertTestUsers splices the incoming collection into a single multi-row INSERT statement and executes it; the generated SQL looks like the example below.
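
For a batch of three users, the spliced statement would look roughly like this (illustrative values, not from the original post):

INSERT INTO users (Id,Name,Age) VALUES (1,'user1',20),(2,'user2',25),(3,'user3',30)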
public static async Task RunMultiTasks(List<TestUser> users)
{
    var tasks = new List<Task>();
    var pageSize = 10000;
    // Round up so the final partial batch is included without producing an empty batch.
    var batchCount = (users.Count + pageSize - 1) / pageSize;

    var sw = Stopwatch.StartNew();
    for (var i = 0; i < batchCount; i++)
    {
        var batchInsertList = users.Skip(i * pageSize).Take(pageSize).ToList();

        // Return the inner task so WhenAll really waits for each insert to finish;
        // wrapping the call in a void lambda would fire-and-forget the async work.
        tasks.Add(Task.Run(() => BatchInsertTestUsers(batchInsertList)));
    }

    await Task.WhenAll(tasks);
    sw.Stop();
    Console.WriteLine($"Multi-thread batch insert time: {sw.ElapsedMilliseconds} ms");
}
  • RunMultiTasks splits the data into batches of 10,000 (1w) rows and inserts the batches concurrently.

MySqlBulkLoader scheme

I learned about MySqlBulkLoader because of SqlBulkCopy in SQL Server. MySqlBulkLoader cannot import an in-memory collection directly: you first export the data to a .csv file and then have the loader read the .csv file and import it.

public static async Task Export(string filePath, List<TestUser> items)
{
    // CsvExporter comes from the dotnetcore/Magicodes.IE library.
    IExporter exporter = new CsvExporter();
    await exporter.Export(filePath, items);
}
  • The data is exported here with the open-source dotnetcore/Magicodes.IE library. Looking at this export code, you should get it right away; a really simple operation! A sample of the exported file follows.
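
With the TestUser model above, the exported users.csv starts with a header row followed by one comma-separated line per record, roughly like this (assuming Magicodes.IE's default CSV settings; exact quoting may differ):

Id,Name,Age
1,user1,20
2,user2,25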
public static void Load(string filePath, string tableName)
{
    using var conn = new MySqlConnection(DataMigrationConfig.MySqlConstr);
    conn.Open(); // make sure the connection is open before loading

    var bulk = new MySqlBulkLoader(conn)
    {
        FieldTerminator = ",",
        FieldQuotationCharacter = '"',
        EscapeCharacter = '"',
        LineTerminator = "\r\n",
        FileName = filePath,
        Local = true,
        NumberOfLinesToSkip = 1, // skip the CSV header row
        TableName = tableName,
        CharacterSet = "utf8mb4",
    };

    bulk.Load();
}
  • Because the database is not on the local machine, Local = true is set so that the loader reads the local file and streams it to the server.
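
Note that LOAD DATA LOCAL is usually disabled by default on both the client and the server. With the MySqlConnector driver, for example, the connection string must opt in, and the server needs local_infile enabled. A hypothetical connection string (host and credentials are placeholders, not from the original post):

// AllowLoadLocalInfile=true opts the client into LOAD DATA LOCAL (MySqlConnector).
var constr = "Server=192.168.1.100;Database=test;Uid=root;Pwd=***;AllowLoadLocalInfile=true";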

Test instructions

  • The test was run locally. The database runs in a Docker container on an intranet machine backed by a mechanical hard disk; an SSD would give better results.
  • The test mainly inserts simple user data, defined as follows:
public class TestUser
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
    }
  • The test measures the performance of inserting 10,000 (1w), 100,000 (10w), and 1,000,000 (100w) rows, and the impact of having indexes enabled or disabled (one way to toggle them is sketched after the code below).
  • The test entry point is as follows:
class Program
{
    static async Task Main(string[] args)
    {
        // Generate 1,000,000 (100w) test users, then run both schemes.
        var testData = DataGen.Run(100 * 10000);
        await RunMultiTasks(testData);
        await RunMySqlLoaderTask(testData);
    }

    public static async Task RunMultiTasks(List<TestUser> users)
    {
        await DataMigrateTask.RunMultiTasks(users);
    }

    public static async Task RunMySqlLoaderTask(List<TestUser> users)
    {
        var fileName = "users";
        // Path.Combine keeps the path valid on both Windows and Linux.
        var filePath = Path.Combine(Directory.GetCurrentDirectory(), fileName + ".csv");
        await DataMigrateTask.Export(filePath, users);

        var sw = new Stopwatch();
        sw.Start();
        DataMigrateTask.Load(filePath, "users");
        sw.Stop();
        Console.WriteLine($"MySqlBulkLoader duration: {sw.ElapsedMilliseconds} ms");
    }
}
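
The post does not show how indexes were switched off for the "indexes disabled" rows in the results below. One common approach on InnoDB is to drop secondary indexes before the bulk load and recreate them afterwards. A minimal sketch using Dapper, assuming a hypothetical secondary index idx_users_name (the index name is not from the original post), run inside an async method:

using (var conn = new MySqlConnection(DataMigrationConfig.MySqlConstr))
{
    // Drop the secondary index so the bulk load does not maintain it row by row.
    await conn.ExecuteAsync("ALTER TABLE users DROP INDEX idx_users_name;");

    // ... run RunMultiTasks or the MySqlBulkLoader load here ...

    // Rebuild the index once, after all rows are in place.
    await conn.ExecuteAsync("ALTER TABLE users ADD INDEX idx_users_name (Name);");
}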

Test results

Now for the key part.

Scheme                                  1w        10w       100w
RunMultiTasks                           367 ms    3548 ms   91263 ms
RunMySqlLoaderTask                      2031 ms   1597 ms   13105 ms
RunMultiTasks (indexes disabled)        233 ms    3230 ms   67040 ms
RunMySqlLoaderTask (indexes disabled)   1785 ms   1367 ms   12456 ms

Finally

The tests above are for reference only. In this simple scenario, MySqlBulkLoader has a clear advantage once the data volume grows large, while multithreaded batch insertion performs better below roughly 10,000 (1w) rows. Interested readers can download the code and play with it themselves. If you have a better scheme, please share it.

Pitfalls

  • MySqlBulkLoader imports NULL data using the literal NULL rather than the \N escape given in the MySQL documentation; one workaround is sketched below.
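
If the exported file represents missing values as empty fields, one way to turn them into database NULLs is to route the column through a user variable and a NULLIF expression. A minimal sketch, assuming a nullable Age column (the column list and expression are illustrative, not from the original post):

// Read Age into a user variable and convert empty strings to NULL during the load.
var bulk = new MySqlBulkLoader(conn)
{
    FileName = filePath,
    TableName = tableName,
    Local = true,
};
bulk.Columns.AddRange(new[] { "Id", "Name", "@age" });
bulk.Expressions.Add("Age = NULLIF(@age, '')");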
