On Shallow Copy and Deep Copy

Keywords: JavaScript JSON Attribute jQuery GitHub

Preface: how to implement a deep copy in JS

This is an old question, but it is also a high-frequency interview question when job hunting, and it tests a rich set of knowledge points. This article works from shallow to deep through the differences between shallow and deep copies, their implementations, and so on.

The Difference between Assignment, Shallow Copy and Deep Copy

In JS, variable types are divided into primitive types and reference types. When a variable is copied by direct assignment:

  • For primitive types, the value stored on the stack is copied.
  • For reference types, what is copied is the pointer stored on the stack, which points to the actual address of the data on the heap.

Directly assigning a reference type variable therefore copies only the pointer. Both variables point to the same data, so a modification made through one of them is visible through the other.
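A minimal sketch of this difference (the variable names are illustrative only and do not come from the original test cases):

```javascript
// Primitive type: the value itself is copied, so the variables are independent.
let a = 1
let b = a
b = 2
console.log(a) // 1

// Reference type: only the reference is copied, so both names point to one object.
const original = { foo: 'test' }
const alias = original
alias.foo = 'changed'
console.log(original.foo)       // 'changed'
console.log(alias === original) // true
```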

On shallow and deep copies:

  • A shallow copy creates a new object and copies the original's top-level fields into it. If a field is a primitive type, its value is copied; if a field is a reference type, its address is copied, so the nested data is shared and modifying it through one copy affects the other.
  • A deep copy allocates new memory and replicates the original data completely, including nested data.

Therefore, the fundamental difference between shallow copy and deep copy is whether nested data shares memory with the original. Simply put, a deep copy is a shallow copy applied recursively to the original data.

A simple comparison of the three is as follows:

Operation    | Points to the original data? | Fields of primitive type                 | Fields of reference type
-------------|------------------------------|------------------------------------------|------------------------------------------
assignment   | yes                          | changing them changes the original       | changing them changes the original
shallow copy | no                           | changing them leaves the original as is  | changing them changes the original
deep copy    | no                           | changing them leaves the original as is  | changing them leaves the original as is
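The three behaviours can be sketched side by side. This is only an illustration: the spread syntax stands in for any shallow copy method and JSON serialization stands in for any deep copy method, both of which are discussed later in the article.

```javascript
const source = { str: 'test', nested: { foo: 'test' } }

const assigned = source                          // same object as source
const shallow = { ...source }                    // new outer object, shared nested object
const deep = JSON.parse(JSON.stringify(source))  // fully independent (fine for plain JSON data)

source.str = 'changed'
source.nested.foo = 'changed'

console.log(assigned.str, assigned.nested.foo) // changed changed
console.log(shallow.str, shallow.nested.foo)   // test changed
console.log(deep.str, deep.nested.foo)         // test test
```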

Common shallow copy methods

There are several common shallow copy methods for arrays and objects:

  • Array.prototype.slice
  • Array.prototype.concat
  • Array.from
  • Object.assign
  • ES6 spread syntax

Test them with the following test case, 1.test.js:

const arr = ['test', { foo: 'test' }]
const obj = {
  str: 'test',
  obj: {
    foo: 'test'
  }
}

const arr1 = arr.slice()
const arr2 = arr.concat()
const arr3 = Array.from(arr)
const arr4 = [...arr]

const obj1 = Object.assign({}, obj)
const obj2 = {...obj}

// Modify arr
arr[0] = 'test1'
arr[1].foo = 'test1'

// Modify obj
obj.str = 'test1'
obj.obj.foo = 'test1'

The results are as follows:

You can see that after a shallow copy, modifying primitive data in the original object or array leaves the copy unchanged. But modifying reference type data in the original also changes the copy, because the two share the same memory.
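A self-contained sketch of these checks, using only two of the methods listed above (the names arrCopy and objCopy are illustrative):

```javascript
const arr = ['test', { foo: 'test' }]
const obj = { str: 'test', obj: { foo: 'test' } }

const arrCopy = arr.slice()
const objCopy = Object.assign({}, obj)

// Modify the originals after copying
arr[0] = 'test1'
arr[1].foo = 'test1'
obj.str = 'test1'
obj.obj.foo = 'test1'

console.log(arrCopy[0])      // 'test'  (primitive value: unaffected)
console.log(arrCopy[1].foo)  // 'test1' (nested object is shared)
console.log(objCopy.str)     // 'test'
console.log(objCopy.obj.foo) // 'test1'
console.log(arrCopy[1] === arr[1], objCopy.obj === obj.obj) // true true
```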

Deep copy implementation

Here we list common deep copy methods and try to implement them manually. Finally, we summarize and compare them.

1. Quick implementation with JSON serialization

JSON.parse(JSON.stringify(data)) is a quick way to achieve a deep copy and covers roughly 90% of use cases, but it has shortcomings. In the following situations we need to consider other ways to achieve a deep copy:

  • JSON.stringify can only serialize data that can be represented in JSON, so the following cannot be handled:

    • Special values such as undefined, NaN, Infinity, etc.
    • Special objects such as Date objects, regular expressions, functions, Set, Map, etc.
    • Circular references, which cause an error to be thrown directly
  • JSON.stringify only serializes the object's own enumerable properties, so the object's constructor and prototype are lost

Use the following test case, 2.test.js, to verify primitive types:

const data = {
  a: 1,
  b: 'str',
  c: true,
  d: null,
  e: undefined,
  f: NaN,
  g: Infinity,
}

const dataCopy = JSON.parse(JSON.stringify(data))

You can see that NaN and Infinity are converted to null during serialization, while the property holding undefined is dropped.
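The intermediate JSON string makes this visible. A small sketch repeating the data from the test case above:

```javascript
const data = { a: 1, b: 'str', c: true, d: null, e: undefined, f: NaN, g: Infinity }

const json = JSON.stringify(data)
console.log(json) // {"a":1,"b":"str","c":true,"d":null,"f":null,"g":null}

const dataCopy = JSON.parse(json)
console.log('e' in dataCopy)        // false (the undefined property is gone)
console.log(dataCopy.f, dataCopy.g) // null null
```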

Next, use test case 3.test.js to test reference types:

const data = {
  a: [1, 2, 3],
  b: {foo: 'obj'},
  c: new Date('2019-08-28'),
  d: /^abc$/g,
  e: function() {},
  f: new Set([1, 2, 3]),
  g: new Map([['foo', 'map']]),
}

const dataCopy = JSON.parse(JSON.stringify(data))

For reference type data, only arrays and plain objects survive serialization and deserialization intact. Date objects are converted to strings, functions are lost, and the other types (RegExp, Set, Map) are converted to empty objects.
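A sketch that checks each of these outcomes, using the special-type fields of the test data above:

```javascript
const data = {
  c: new Date('2019-08-28'),
  d: /^abc$/g,
  e: function () {},
  f: new Set([1, 2, 3]),
  g: new Map([['foo', 'map']]),
}

const dataCopy = JSON.parse(JSON.stringify(data))

console.log(typeof dataCopy.c) // 'string'
console.log(dataCopy.c)        // '2019-08-28T00:00:00.000Z'
console.log('e' in dataCopy)   // false (the function is dropped)
console.log(dataCopy.d, dataCopy.f, dataCopy.g) // {} {} {}
```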

Use test case 4.test.js to verify the constructor:

function Person(name) {
  // Constructor instance attribute name
  this.name = name
  // Constructor instance method getName
  this.getName = function () {
    return this.name
  }
}
// Constructor prototype attribute age
Person.prototype.age = 18

const person = new Person('xxx')
const personCopy = JSON.parse(JSON.stringify(person))

During copying, only the object's own enumerable properties are serialized, so the prototype property age on Person is not copied, and the instance method getName is lost because functions cannot be serialized. Because the constructor is lost during serialization, the constructor of personCopy points to the native top-level constructor Object rather than the custom constructor Person.
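These observations can be checked directly. A compact sketch of the same test:

```javascript
function Person(name) {
  this.name = name
  this.getName = function () { return this.name }
}
Person.prototype.age = 18

const person = new Person('xxx')
const personCopy = JSON.parse(JSON.stringify(person))

console.log(Object.keys(personCopy))           // [ 'name' ]
console.log(personCopy.age)                    // undefined
console.log(personCopy.constructor === Object) // true
console.log(personCopy instanceof Person)      // false
```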

2. Manual implementation of deep copy

Simple version

We first implement a simple version of deep copy. The idea is to check the data type: if it is not a reference type, return it directly; if it is, determine whether the data is an array or an object and recursively traverse it, as follows:

function cloneDeep(data) {
  // typeof null is 'object', so null must be returned explicitly
  if (data === null || typeof data !== 'object') return data
  const retVal = Array.isArray(data) ? [] : {}
  for (let key in data) {
    retVal[key] = cloneDeep(data[key])
  }
  return retVal
}

Run test case clone1.test.js:

const data = {
  str: 'test',
  obj: {
    foo: 'test'
  },
  arr: ['test', {foo: 'test'}]
}

const dataCopy = cloneDeep(data)

You can see that objects and arrays are copied correctly. However, two problems remain.

First, only objects and arrays are considered; other reference types are not copied correctly (a Date or a Set, for example, comes out as an empty plain object, and functions are still shared with the original), which needs to be improved. Second, for an instance of a custom constructor, the constructor is lost during copying, so the copy's constructor becomes the default Object.
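Both limitations can be reproduced with a short sketch. The function is renamed simpleCloneDeep here purely to keep the example self-contained, and the sample data is hypothetical:

```javascript
// Simple version: only distinguishes arrays from everything else that is an object.
function simpleCloneDeep(data) {
  if (data === null || typeof data !== 'object') return data
  const retVal = Array.isArray(data) ? [] : {}
  for (const key in data) {
    retVal[key] = simpleCloneDeep(data[key])
  }
  return retVal
}

function Person(name) {
  this.name = name
}

const copy = simpleCloneDeep({
  date: new Date(0),
  set: new Set([1]),
  person: new Person('xxx'),
})

console.log(copy.date instanceof Date)          // false
console.log(Object.keys(copy.date).length)      // 0 (an empty plain object)
console.log(copy.set instanceof Set)            // false
console.log(copy.person.name)                   // 'xxx'
console.log(copy.person.constructor === Object) // true (Person is lost)
```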

Processing other data types

In the previous step, we implemented a simple deep copy that only considers two reference types, object and array. Next, we handle other commonly used data structures.

Define helper methods

First, we define a method that reliably gets the type of a piece of data. Here we use the toString method on Object.prototype, which returns a string of the form [object Type], from which we can slice out the type. Then we define constants for the set of data types we handle:

const getType = (data) => {
  return Object.prototype.toString.call(data).slice(8, -1)
}

const TYPE = {
  Object: 'Object',
  Array: 'Array',
  Date: 'Date',
  RegExp: 'RegExp',
  Set: 'Set',
  Map: 'Map',
}
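To see why typeof is not enough here, this sketch prints what getType returns for a few values (getType is repeated so the snippet runs on its own):

```javascript
const getType = (data) => Object.prototype.toString.call(data).slice(8, -1)

console.log(getType({}))         // 'Object'
console.log(getType([]))         // 'Array'
console.log(getType(new Date())) // 'Date'
console.log(getType(/reg/))      // 'RegExp'
console.log(getType(new Set()))  // 'Set'
console.log(getType(new Map()))  // 'Map'
console.log(getType(null))       // 'Null'
console.log(getType(1))          // 'Number'

// typeof cannot tell these apart
console.log(typeof [], typeof new Date(), typeof null) // object object object
```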

Next we refine the handling of the other types. The deep copy can be divided into two steps: first initialize the copy, then traverse the data if it is a traversable type.

Initialization

According to different data types, the copied values are initialized accordingly.

function dataInit(data, type) {
  const Constructor = data.constructor
  switch (type) {
    case TYPE.Object:
      // Create the copy with the same prototype as the original object
      return Object.create(Object.getPrototypeOf(data))
    case TYPE.Array:
      return []
    case TYPE.Date:
      // Dates are rebuilt from their timestamp
      return new Constructor(data.getTime())
    case TYPE.RegExp: {
      // Rebuild the regexp from its source and flags; lastIndex is not
      // carried over by the constructor, so copy it explicitly
      const reg = new Constructor(data.source, data.flags)
      reg.lastIndex = data.lastIndex
      return reg
    }
    case TYPE.Set:
    case TYPE.Map:
      return new Constructor()
    default:
      return data
  }
}
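The RegExp and Date cases are the least obvious ones. The following sketch shows, outside of dataInit, what is and is not carried over when they are rebuilt (variable names are illustrative):

```javascript
// RegExp: source and flags go through the constructor, lastIndex does not.
const source = /abc/g
source.exec('abcabc')
console.log(source.lastIndex) // 3

const regCopy = new RegExp(source.source, source.flags)
console.log(regCopy.lastIndex) // 0
regCopy.lastIndex = source.lastIndex
console.log(regCopy.source, regCopy.flags, regCopy.lastIndex) // abc g 3

// Date: a new object with the same timestamp.
const date = new Date('2019-08-28')
const dateCopy = new Date(date.getTime())
console.log(dateCopy !== date, dateCopy.getTime() === date.getTime()) // true true
```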

Traversing the data

In the main function, the traversable data types are traversed recursively. Note that when a Symbol is used as a property key (for example, as the key of an object), the property is not visited by a for...in loop and is not returned by Object.getOwnPropertyNames. It can only be accessed through the original symbol value or retrieved with Object.getOwnPropertySymbols.
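A short sketch of this behaviour of Symbol keys (the object here is a made-up example):

```javascript
const sym = Symbol('sym')
const obj = { str: 'test', [sym]: 'symbol' }

const forInKeys = []
for (const key in obj) {
  forInKeys.push(key)
}

console.log(forInKeys)                                // [ 'str' ]
console.log(Object.getOwnPropertyNames(obj))          // [ 'str' ]
console.log(Object.getOwnPropertySymbols(obj).length) // 1
console.log(obj[sym])                                 // 'symbol'
```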

function cloneDeep(data) {
  const dataType = getType(data)
  // If it is of other types, return directly
  if(!TYPE[dataType]) return data
  // Initialize data
  const retVal = dataInit(data, dataType)
  // Traverse the traversable types
  switch (dataType) {
    case TYPE.Array:
      data.forEach(value => retVal.push(cloneDeep(value)))
      break
    case TYPE.Object:
      for (let key in data) {
        // Skip inherited properties
        if (data.hasOwnProperty(key)) {
          retVal[key] = cloneDeep(data[key])
        }
      }
      // Copy the Symbol-keyed properties of the object
      Object.getOwnPropertySymbols(data).forEach(symbol => {
        retVal[symbol] = cloneDeep(data[symbol])
      })
      break
    case TYPE.Set:
      data.forEach(value => retVal.add(cloneDeep(value)))
      break
    case TYPE.Map:
      for (let [mapKey, mapValue] of data) {
        // Map keys and values can all be reference types, so they all need to be copied
        retVal.set(cloneDeep(mapKey), cloneDeep(mapValue))
      }
      break
  }
  return retVal
}

The complete version of the above code can be found in clone2.js. Next, verify it with test case clone2.test.js:

const symbol = Symbol('sym')

const data = {
  obj: {},
  arr: [],
  reg: /reg/g,
  date: new Date('2019'),
  person: new Person('lixx'),
  [symbol]: 'symbol',
  set: new Set([{test: 'set'}]),
  map: new Map([[{key: 'map'}, {value: 'map'}]])
}

function Person(name) {
  this.name = name
}

const dataClone = cloneDeep(data)

As the test shows, the different reference types are copied correctly.

About functions

Copying functions is not implemented here; there is no problem with functions in two objects sharing the same memory. In fact, the related implementation in lodash/cloneDeep also returns the function directly.

At this point, our deep copy method has taken shape. In practice, far more data types need special handling than these, such as Error, Buffer, Element and so on. Interested readers can continue to explore and implement them.

Handling circular references

So far, our deep copy can handle most commonly used data structures, but it fails when the data contains circular references.

const a = {}
a.a = a

cloneDeep(a)

As you can see, with a circular reference the recursion never terminates, which leads to a stack overflow.

So how do we solve it?

Setting circular references aside for a moment, let's first look at a more basic problem with references. Both the deep copy method implemented in the previous section and the JSON serialization copy break the reference relationships between pieces of data inside the original. Consider this example:

const temp = {}
const data = {
  a: temp,
  b: temp,
}
const dataJson = JSON.parse(JSON.stringify(data))
const dataClone = cloneDeep(data)

Verify the reference relationship:
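The check itself can be sketched as follows, shown here only for the JSON copy so that the snippet stays self-contained:

```javascript
const temp = {}
const data = { a: temp, b: temp }

// In the original, both fields point to the same object.
console.log(data.a === data.b) // true

// After a JSON round trip, each field gets its own separate object.
const dataJson = JSON.parse(JSON.stringify(data))
console.log(dataJson.a === dataJson.b) // false
```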

If you want to break this reference relationship, that is perfectly fine. But if you want to preserve the reference relationships within the data, how do you implement it?

One approach is to record what has already been copied in a data structure and look it up before each copy. If the data has already been copied, returning the stored copy preserves the original reference relationship.

Because all the data that needs this treatment is of reference type, we need a key-value data structure whose keys can be reference types. Map and WeakMap naturally come to mind.

Here we use a WeakMap to record what has been copied. The biggest difference between WeakMap and Map is that WeakMap keys are weakly referenced: the WeakMap's reference to a key object is not counted by the garbage collector. In other words, once all other references to the key object are released, the garbage collector can reclaim its memory. With a strongly referencing Map, that memory is not released for as long as the Map itself is reachable unless the entry is deleted manually, which can easily lead to memory leaks.
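Garbage collection cannot be observed directly from a script, but the lookup side of the idea can be sketched. The names hash, source and copy below are illustrative:

```javascript
const hash = new WeakMap()
const source = { foo: 'test' }
const copy = { foo: 'test' }

// Record that `source` has already been copied to `copy`.
hash.set(source, copy)

console.log(hash.has(source))          // true
console.log(hash.get(source) === copy) // true
// Keys are compared by identity, so a look-alike object is not found.
console.log(hash.has({ foo: 'test' })) // false

// A WeakMap does not accept primitive strings as keys.
let threw = false
try {
  hash.set('str', 1)
} catch (e) {
  threw = e instanceof TypeError
}
console.log(threw) // true
```

Since only reference types are ever looked up during cloning, the restriction on primitive keys is not a problem for this use.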

The concrete implementation is as follows:

function cloneDeep(data, hash = new WeakMap()) {
  const dataType = getType(data)
  // If it is of other types, return directly
  if(!TYPE[dataType]) return data
  // Query whether it has been copied
  if(hash.has(data)) return hash.get(data)
  const retVal = dataInit(data, dataType)
  // For circular references, the copy must be stored in hash before recursing, otherwise the stack will still overflow
  hash.set(data, retVal)
  switch (dataType) {
    case TYPE.Array:
      data.forEach(value => retVal.push(cloneDeep(value, hash)))
      break
    case TYPE.Object:
      for (let key in data) {
        // Skip inherited properties
        if (data.hasOwnProperty(key)) {
          retVal[key] = cloneDeep(data[key], hash)
        }
      }
      // Copy the Symbol-keyed properties of the object
      Object.getOwnPropertySymbols(data).forEach(symbol => {
        retVal[symbol] = cloneDeep(data[symbol], hash)
      })
      break
    case TYPE.Set:
      data.forEach(value => retVal.add(cloneDeep(value, hash)))
      break
    case TYPE.Map:
      for (let [mapKey, mapValue] of data) {
        // Map keys and values can all be reference types, so they all need to be copied
        retVal.set(cloneDeep(mapKey, hash), cloneDeep(mapValue, hash))
      }
      break
  }
  return retVal
}

After this change, the deep copy function preserves the reference relationships of the original data and also handles circular references in different reference types correctly. Verify it with test case clone3.test.js:

const temp = {}
const data = {
  a: temp,
  b: temp,
}
const dataClone = cloneDeep(data)

const obj = {}
obj.obj = obj

const arr = []
arr[0] = arr

const set = new Set()
set.add(set)

const map = new Map()
map.set(map, map)

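In each case the clone keeps the same reference structure as the original: dataClone.a and dataClone.b are the same object, and the clone of each circular structure refers to itself rather than to the original.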
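A self-contained sketch of these checks follows. To keep it short it uses a condensed stand-in for cloneDeep that only handles plain objects, arrays, Set and Map; it is not the article's clone3.js.

```javascript
// Condensed stand-in for cloneDeep (assumption: plain objects, arrays, Set, Map only).
function cloneDeep(data, hash = new WeakMap()) {
  if (data === null || typeof data !== 'object') return data
  if (hash.has(data)) return hash.get(data)
  let retVal
  if (Array.isArray(data)) {
    retVal = []
    hash.set(data, retVal) // record before recursing
    data.forEach(value => retVal.push(cloneDeep(value, hash)))
  } else if (data instanceof Set) {
    retVal = new Set()
    hash.set(data, retVal)
    data.forEach(value => retVal.add(cloneDeep(value, hash)))
  } else if (data instanceof Map) {
    retVal = new Map()
    hash.set(data, retVal)
    data.forEach((value, key) => retVal.set(cloneDeep(key, hash), cloneDeep(value, hash)))
  } else {
    retVal = {}
    hash.set(data, retVal)
    Object.keys(data).forEach(key => { retVal[key] = cloneDeep(data[key], hash) })
  }
  return retVal
}

const temp = {}
const dataClone = cloneDeep({ a: temp, b: temp })
console.log(dataClone.a === dataClone.b, dataClone.a === temp) // true false

const obj = {}
obj.obj = obj
const objClone = cloneDeep(obj)
console.log(objClone.obj === objClone, objClone === obj) // true false

const arr = []
arr[0] = arr
const arrClone = cloneDeep(arr)
console.log(arrClone[0] === arrClone) // true

const set = new Set()
set.add(set)
const setClone = cloneDeep(set)
console.log(setClone.has(setClone), setClone.has(set)) // true false

const map = new Map()
map.set(map, map)
const mapClone = cloneDeep(map)
console.log(mapClone.get(mapClone) === mapClone) // true
```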

Food for thought: a non-recursive implementation

In the previous deep copy implementations, the traversal is recursive. When the recursion goes too deep, the stack overflows. We use the following create method to create sample data with a depth of 10000 and a breadth of 100:

function create(depth, breadth) {
  const data = {}
  let temp = data
  for (let i = 0; i < depth; i++) {
    // Go one level deeper
    temp = temp['data'] = {}
    // Fill the current level with `breadth` primitive values
    for (let j = 0; j < breadth; j++) {
      temp[j] = j
    }
  }
  return data
}

const data = create(10000, 100)
cloneDeep(data)

The result is a stack overflow.

So how do we implement this without recursion?

Taking objects as an example, suppose we have the following data structure:

const data = {
  left: 1,
  right: {
    left: 1,
    right: 2,
  }
}

In other words, it is a tree-like structure.


Traversing the object is therefore equivalent to traversing a tree. Tree traversal is mainly divided into depth-first traversal and breadth-first traversal. The former is generally implemented with a stack, the latter with a queue.
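The only difference between the two traversals is which pending node is taken next. A minimal sketch, using a hypothetical tree whose keys name the nodes:

```javascript
// Hypothetical tree: every key is a node name, every value holds that node's children.
const tree = { a: { c: {}, d: {} }, b: { e: {} } }

// `takeNext` decides whether the pending list behaves as a stack or as a queue.
function traverse(root, takeNext) {
  const pending = Object.entries(root)
  const order = []
  while (pending.length > 0) {
    const [name, node] = takeNext(pending)
    order.push(name)
    pending.push(...Object.entries(node))
  }
  return order
}

// Stack (last in, first out): depth-first
const depthFirst = traverse(tree, list => list.pop())
// Queue (first in, first out): breadth-first
const breadthFirst = traverse(tree, list => list.shift())

console.log(depthFirst)   // [ 'b', 'e', 'a', 'd', 'c' ]
console.log(breadthFirst) // [ 'a', 'b', 'c', 'd', 'e' ]
```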

Here we simulate depth-first traversal of a tree and, for simplicity, only distinguish between objects and non-objects. We use a stack to implement a simple deep copy method without recursion.

function cloneDeep(data) {
  const retVal = {}
  const stack = [{
    target: retVal,
    source: data,
  }]
  // Loop until the stack is empty
  while (stack.length > 0) {
    // Pop the node on top of the stack
    const node = stack.pop()
    const { target, source } = node
    // Traverse the current node
    for (let item in source) {
      if (source.hasOwnProperty(item)) {
        if (Object.prototype.toString.call(source[item]) === '[object Object]') {
          target[item] = {}
          // If the child is an object, push it onto the stack
          stack.push({
            target: target[item],
            source: source[item],
          })
        } else {
          // If the child is not an object, copy it directly
          target[item] = source[item]
        }
      }
    }
  }
  return retVal
}
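As a rough check that the loop-based approach is not limited by the call stack, the sketch below clones a chain 100000 levels deep. cloneDeepLoop is a condensed copy of the function above, and createChain is a hypothetical helper written only for this example.

```javascript
// Stack-based clone for plain objects, condensed from the implementation above.
function cloneDeepLoop(data) {
  const retVal = {}
  const stack = [{ target: retVal, source: data }]
  while (stack.length > 0) {
    const { target, source } = stack.pop()
    for (const key of Object.keys(source)) {
      const value = source[key]
      if (Object.prototype.toString.call(value) === '[object Object]') {
        target[key] = {}
        stack.push({ target: target[key], source: value })
      } else {
        target[key] = value
      }
    }
  }
  return retVal
}

// Hypothetical helper: build a chain of nested objects `depth` levels deep.
function createChain(depth) {
  const root = {}
  let node = root
  for (let i = 0; i < depth; i++) {
    node.value = i
    node = node.data = {}
  }
  return root
}

const deepData = createChain(100000)
const deepCopy = cloneDeepLoop(deepData)

// Walk both chains with a loop and count the levels of the copy.
let levels = 0
let shared = false
let originalNode = deepData
let copyNode = deepCopy
while (copyNode.data) {
  if (originalNode.data === copyNode.data) shared = true
  originalNode = originalNode.data
  copyNode = copyNode.data
  levels++
}
console.log(levels) // 100000
console.log(shared) // false (no nested object is shared with the original)
```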

For the complete non-recursive deep copy implementation, see clone4.js. The corresponding test case is clone4.test.js; it is not reproduced here.

3. Comparison of deep copy methods

Several common deep copy methods are listed and compared below:

  • JSON.parse(JSON.stringify(data))
  • $.extend in jQuery
  • cloneDeep, our own implementation in clone3.js
  • cloneDeep in lodash

For the timing comparison, data with a breadth and depth of 1000 is created with the create method shown earlier. Under Node v10.14.2, each of the methods is executed 10,000 times; the times listed are the average over ten runs of the test case:

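Method      | Primitive types                        | Arrays, objects                                    | Special reference types   | Circular references | Time
------------|----------------------------------------|----------------------------------------------------|---------------------------|---------------------|---------
JSON        | cannot handle NaN, Infinity, undefined | object prototype lost                              | ✘                         | ✘                   | 7280.6ms
$.extend    | cannot handle undefined                | object prototype lost; prototype properties copied | (uses the same reference) |                     | 5550.6ms
cloneDeep   | ✔️                                     | ✔️                                                 | (to be improved)          | ✔️                  | 5035.3ms
_.cloneDeep | ✔️                                     | ✔️                                                 | ✔️                        | ✔️                  | 5854.5ms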

In daily use, if you are sure that your data contains only arrays, plain objects and other common JSON-compatible types, you can safely use JSON serialization for deep copy; otherwise, it is recommended to use lodash/cloneDeep.

Summary

Deep copy runs deep, and shallow copy is not so shallow either. The small topic of deep copy contains a rich set of knowledge points:

  • Whether you think through a problem comprehensively and rigorously
  • Fundamentals and API proficiency
  • Understanding of data types
  • Recursive and non-recursive (loop) implementations
  • Tree traversal (honestly, there is a lot to discuss here and many possible follow-up questions)
  • Symbol, Set, Map/WeakMap, etc.

I believe that if the interviewer is willing to dig, there are far more knowledge points to examine than these. That is when the depth of your fundamentals and the breadth of your knowledge are tested. In short, an interview is a two-way selection process and an opportunity to show yourself. Where you can demonstrate something in code, do that rather than just talking.

If there are any mistakes in this article, corrections are welcome.

Reference resources

Links to the original text: On Shallow Copy and Deep Copy 

Posted by mherr170 on Sun, 15 Sep 2019 21:30:56 -0700