V8數(shù)組排序方法sort淺析

instein 發(fā)布于2019-08-23 12:34 / 2333人閱讀

摘要：出于性能優(yōu)化的目的，當(dāng)數(shù)組排序區(qū)間長(zhǎng)度在之內(nèi)時(shí)，實(shí)際的排序方法是插入排序，其余時(shí)候使用快速排序。大體上，這個(gè)排序方法的思想是對(duì)數(shù)組進(jìn)行區(qū)間劃分，當(dāng)排序區(qū)間大于時(shí)，使用快排，使局部有序，當(dāng)區(qū)間小于等于時(shí)使用插入排序，使數(shù)組整體有序。

數(shù)組排序方法sort淺析

數(shù)組提供了排序方法，使用時(shí)傳入一個(gè)比較函數(shù)，根據(jù)比較函數(shù)的返回來確定元素最終在數(shù)組中的位置。默認(rèn)排序順序是根據(jù)字符串Unicode碼點(diǎn)。

var a = [12,4,6,8,9,54,11,13];
a.sort(); // [11, 12, 13, 4, 54, 6, 8, 9]

如果指明了比較函數(shù)，那么數(shù)組會(huì)按照調(diào)用該函數(shù)的返回值排序。比較函數(shù)接受兩個(gè)參數(shù)(x,y)，表示數(shù)組中待排序的元素，根據(jù)返回結(jié)果來決定如何排序：

返回結(jié)果小于0，表示x在前y在后。

返回結(jié)果等于0，則x和y位置不改變。（備注： ECMAScript 標(biāo)準(zhǔn)并不要求這一行為，說明sort排序不一定是穩(wěn)定的）

返回結(jié)果大于0，表示x應(yīng)該在y之后。

在MDN中還特別指出，無法保證排序的時(shí)間和空間復(fù)雜性。這是因?yàn)椴煌鎸?shí)現(xiàn)排序方法的方式不一定相同。

V8的排序方法

函數(shù)的整體結(jié)構(gòu)如下，當(dāng)參數(shù)不是可執(zhí)行的比較函數(shù)時(shí)，內(nèi)部定義默認(rèn)的比較函數(shù)。

出于性能優(yōu)化的目的，當(dāng)數(shù)組排序區(qū)間長(zhǎng)度在10之內(nèi)時(shí)，實(shí)際的排序方法是插入排序，其余時(shí)候使用快速排序。所以定義了內(nèi)部函數(shù)InsertionSort和QuickSort，同時(shí)還有函數(shù)GetThirdIndex，用于輔助快排中支點(diǎn)的選擇。

// TODO(pwong): Remove once TypedArray.prototype.join() is ported to Torque.
function InnerArraySort(array, length, comparefn) {
  // In-place QuickSort algorithm.
  // For short (length <= 10) arrays, insertion sort is used for efficiency.

  if (!IS_CALLABLE(comparefn)) {
    comparefn = function (x, y) {
        // ...
    };
  }
  function InsertionSort(a, from, to) {
    // ...
  };

  function GetThirdIndex(a, from, to) {
    // ...
  }

  function QuickSort(a, from, to) {
    // ...
  };

  if (length < 2) return array;

  var num_non_undefined = %PrepareElementsForSort(array, length);

  QuickSort(array, 0, num_non_undefined);

  return array;
}

下面來逐段分析代碼。

第一個(gè)if處理默認(rèn)排序，內(nèi)部會(huì)將x、y轉(zhuǎn)化成字符串再進(jìn)行比較。字符串比較是使用基于標(biāo)準(zhǔn)字典的 Unicode 值來進(jìn)行比較的，這也是第一個(gè)例子中13在4之前的原因。

    if (!IS_CALLABLE(comparefn)) {
    comparefn = function (x, y) {
      if (x === y) return 0;
      if (%_IsSmi(x) && %_IsSmi(y)) {
        return %SmiLexicographicCompare(x, y);
      }
      x = TO_STRING(x); // 轉(zhuǎn)化成字符串
      y = TO_STRING(y);
      if (x == y) return 0;
      else return x < y ? -1 : 1;
    };
  }

接著實(shí)現(xiàn)了插入排序。模擬將新元素插入到數(shù)組中間的過程，從第二個(gè)元素from + 1開始，根據(jù)大小關(guān)系確定插入的位置。當(dāng)確定插入位置在j的時(shí)候，原在j上以及后面的元素都要向右移一，索引加一。這就是插入排序。

插入排序每次排序過程會(huì)將當(dāng)前元素與前面的元素進(jìn)行比較。以升序排序?yàn)槔h(huán)將當(dāng)前元素與前一元素比較，當(dāng)前元素較小時(shí)交換兩個(gè)元素的位置，直至當(dāng)前元素大于前一元素或到達(dá)排序區(qū)間的第一位時(shí)結(jié)束循環(huán)，完成當(dāng)前元素的排序。更新新元素位置時(shí)的遍歷區(qū)間是[from, i]，i的取值是[from + 1, to]。

使用element緩存插入排序中要插入的值，每次迭代中，使用tmp緩存a[j]的值，執(zhí)行comparefn(tmp, element)。

返回結(jié)果order大于0的時(shí)候，說明element仍需向前，所以要將a[j]向后移動(dòng)。a[j + 1] = tmp便完成了這樣的工作；直至order大于等于0，說明則找到element應(yīng)插入的位置，執(zhí)行a[j + 1] = element插入a[i]。

  function InsertionSort(a, from, to) {
    for (var i = from + 1; i < to; i++) {
      var element = a[i];
      for (var j = i - 1; j >= from; j--) {
        var tmp = a[j];
        var order = comparefn(tmp, element);
        if (order > 0) {
          a[j + 1] = tmp;
        } else {
          break;
        }
      }
      a[j + 1] = element;
    }
  };

如圖示，使用插入排序使數(shù)組升序，上箭頭表示當(dāng)前循環(huán)中的i，當(dāng)前緩存值element是19，下箭頭是j，從i - 1開始想前遍歷，如果element應(yīng)在a[j]之前，則a[j + 1] = tmp，確定插入位置時(shí)，a[j + 1] = element。這樣就完成了一個(gè)元素的插入過程。

當(dāng)數(shù)組長(zhǎng)度較長(zhǎng)時(shí)，內(nèi)部使用快速排序。快排的思想是選取某一個(gè)值作為支點(diǎn)值，先從頭遍歷，找出第一個(gè)應(yīng)該在支點(diǎn)值右邊的元素，再?gòu)奈蚕蝾^遍歷，找出第一個(gè)應(yīng)該在支點(diǎn)值左邊的元素，交換兩個(gè)元素，直至左邊與右邊重疊。重疊的位置即是支點(diǎn)應(yīng)在的位置。以升序排序?yàn)槔c(diǎn)左邊的值均小于等于支點(diǎn)值，右邊的值均大于支點(diǎn)值。

在V8引擎的實(shí)現(xiàn)中，支點(diǎn)值的選取是確定第三個(gè)值，再取其與a[from]、a[to]的中值作為支點(diǎn)值。當(dāng)排序區(qū)間的長(zhǎng)度在1000以內(nèi)時(shí)，第三個(gè)值的位置是from + ((to - from) >> 1)，接近區(qū)間的中值點(diǎn)。當(dāng)排序區(qū)間較大時(shí)(大于1000)，第三個(gè)值的索引是通過GetThirdIndex來獲取。GetThirdIndex的選取思想是將區(qū)間分成多段，每段用一個(gè)值代表，然后從這些值去選取一個(gè)接近中值的值作為支點(diǎn)。

increment是區(qū)間分段后每段的長(zhǎng)度，取值區(qū)間是[200, 215]。分段的范圍是[from + 1, to - 1]，每一段用起點(diǎn)值代表。將代表值及其在原數(shù)組a中的索引保存在數(shù)組中作為內(nèi)部數(shù)組t_array的元素，并根據(jù)代表值進(jìn)行排序。

最后的返回結(jié)果是t_array[t_array.length >> 1][0]。t_array.length >> 1是將t_array.length的二進(jìn)制形式左移一位，取值接近t_array的中值，t_array[t_array.length >> 1][0]則是這個(gè)中值在數(shù)組a中的索引。

  function GetThirdIndex(a, from, to) {
    var t_array = new InternalArray();
    // Use both "from" and "to" to determine the pivot candidates.
    var increment = 200 + ((to - from) & 15);
    var j = 0;
    from += 1;
    to -= 1;
    for (var i = from; i < to; i += increment) {
      t_array[j] = [i, a[i]];
      j++;
    }
    t_array.sort(function(a, b) {
      return comparefn(a[1], b[1]);
    });
    var third_index = t_array[t_array.length >> 1][0];
    return third_index;
  }

快排的實(shí)現(xiàn)如下。

內(nèi)部使用一個(gè)while(true)循環(huán)，只有當(dāng)to - from <= 10才會(huì)結(jié)束無限循環(huán)。在函數(shù)內(nèi)末尾有修改from/to的代碼，避免無限循環(huán)，同時(shí)遞歸調(diào)用自身。大體上，這個(gè)排序方法的思想是對(duì)數(shù)組進(jìn)行區(qū)間劃分，當(dāng)排序區(qū)間大于10時(shí)，使用快排，使局部有序，當(dāng)區(qū)間小于等于10時(shí)使用插入排序，使數(shù)組整體有序。

function QuickSort(a, from, to) {
    var third_index = 0;
    while (true) {
      // Insertion sort is faster for short arrays.
      if (to - from <= 10) {
        InsertionSort(a, from, to);
        return;
      }
      // ...
      if (to - high_start < low_end - from) {
        QuickSort(a, high_start, to);
        to = low_end;
      } else {
        QuickSort(a, from, low_end);
        from = high_start;
      }
    }
    
  }

第三個(gè)點(diǎn)的位置會(huì)根據(jù)排序區(qū)間的長(zhǎng)度來選取。

      if (to - from > 1000) {
        third_index = GetThirdIndex(a, from, to);
      } else {
        third_index = from + ((to - from) >> 1);
      }

將a[from]/a[to-1]和上面選取的第三個(gè)點(diǎn)的值記為v0、v1、v2。交換這些值，使其按v0 <= v1 <= v2的順序排列。

      var v0 = a[from];
      var v1 = a[to - 1];
      var v2 = a[third_index];
      var c01 = comparefn(v0, v1);
      if (c01 > 0) {
        // v1 < v0, so swap them.
        var tmp = v0;
        v0 = v1;
        v1 = tmp;
      } // v0 <= v1.
      var c02 = comparefn(v0, v2);
      if (c02 >= 0) {
        // v2 <= v0 <= v1.
        var tmp = v0;
        v0 = v2;
        v2 = v1;
        v1 = tmp;
      } else {
        // v0 <= v1 && v0 < v2
        var c12 = comparefn(v1, v2);
        if (c12 > 0) {
          // v0 <= v2 < v1
          var tmp = v1;
          v1 = v2;
          v2 = tmp;
        }
      }
      // v0 <= v1 <= v2
      a[from] = v0;
      a[to - 1] = v2;
      var pivot = v1;

如果忽略from之前與to之后的元素，當(dāng)前的排序區(qū)間可以表示成

隨后，交換low_end與third_index的值。

      var low_end = from + 1;   // Upper bound of elements lower than pivot.
      var high_start = to - 1;  // Lower bound of elements greater than pivot.
      a[third_index] = a[low_end];
      a[low_end] = pivot;

現(xiàn)在排序區(qū)間的結(jié)構(gòu)如下:

接著從low_end + 1開始向右遍歷，令element = a[i]，比較當(dāng)前元素element與pivot。若comparefn(element, pivot) < 0，說明element應(yīng)該在pivot前，將element與a[low_end](即pivot)交換，low_end++表示pivot位置向右移一位，因?yàn)樵瓉淼奈恢靡炎兂?b>element。

如果comparefn(element, pivot) > 0說明element應(yīng)該在pivot后，從high_start開始向左查找第一個(gè)應(yīng)該在pivot前或與pivot相等的元素top_elem，交換top_elem與element。如果top_elem應(yīng)該在pivot之前，二者互換。如果comparefn(element, pivot) == 0，則element/pivot取值相同，無需交換，同時(shí)也無需移動(dòng)low_end。

直至i == high_start時(shí)退出循環(huán)。下面是上述流程完整的代碼。

      // From low_end to i are elements equal to pivot.
      // From i to high_start are elements that haven"t been compared yet.
      partition: for (var i = low_end + 1; i < high_start; i++) {
        var element = a[i];
        var order = comparefn(element, pivot);
        if (order < 0) {
          a[i] = a[low_end];
          a[low_end] = element;
          low_end++;
        } else if (order > 0) {
          do {
            high_start--;
            if (high_start == i) break partition;
            var top_elem = a[high_start];
            order = comparefn(top_elem, pivot);
          } while (order > 0);
          a[i] = a[high_start];
          a[high_start] = element;
          if (order < 0) {
            element = a[i];
            a[i] = a[low_end];
            a[low_end] = element;
            low_end++;
          }
        }
      }

完成整個(gè)快排的數(shù)組區(qū)間以a[i]/pivot為分界點(diǎn)，a[i]左邊的元素全小于或等于pivot，a[i]右邊的元素全大于pivot。然后從[from, low_end]和[high_start, to]中選出區(qū)間較小的一組遞歸調(diào)用QuickSort；同時(shí)將更新一個(gè)端點(diǎn)，縮小區(qū)間。

      if (to - high_start < low_end - from) {
        QuickSort(a, high_start, to);
        to = low_end;
      } else {
        QuickSort(a, from, low_end);
        from = high_start;
      }

這是完整的快排算法。

  function QuickSort(a, from, to) {
    var third_index = 0;
    while (true) {
      // Insertion sort is faster for short arrays.
      if (to - from <= 10) {
        InsertionSort(a, from, to);
        return;
      }
      if (to - from > 1000) {
        third_index = GetThirdIndex(a, from, to);
      } else {
        third_index = from + ((to - from) >> 1);
      }
      // Find a pivot as the median of first, last and middle element.
      var v0 = a[from];
      var v1 = a[to - 1];
      var v2 = a[third_index];
      var c01 = comparefn(v0, v1);
      if (c01 > 0) {
        // v1 < v0, so swap them.
        var tmp = v0;
        v0 = v1;
        v1 = tmp;
      } // v0 <= v1.
      var c02 = comparefn(v0, v2);
      if (c02 >= 0) {
        // v2 <= v0 <= v1.
        var tmp = v0;
        v0 = v2;
        v2 = v1;
        v1 = tmp;
      } else {
        // v0 <= v1 && v0 < v2
        var c12 = comparefn(v1, v2);
        if (c12 > 0) {
          // v0 <= v2 < v1
          var tmp = v1;
          v1 = v2;
          v2 = tmp;
        }
      }
      // v0 <= v1 <= v2
      a[from] = v0;
      a[to - 1] = v2;
      var pivot = v1;
      var low_end = from + 1;   // Upper bound of elements lower than pivot.
      var high_start = to - 1;  // Lower bound of elements greater than pivot.
      a[third_index] = a[low_end];
      a[low_end] = pivot;

      // From low_end to i are elements equal to pivot.
      // From i to high_start are elements that haven"t been compared yet.
      partition: for (var i = low_end + 1; i < high_start; i++) {
        var element = a[i];
        var order = comparefn(element, pivot);
        if (order < 0) {
          a[i] = a[low_end];
          a[low_end] = element;
          low_end++;
        } else if (order > 0) {
          do {
            high_start--;
            if (high_start == i) break partition;
            var top_elem = a[high_start];
            order = comparefn(top_elem, pivot);
          } while (order > 0);
          a[i] = a[high_start];
          a[high_start] = element;
          if (order < 0) {
            element = a[i];
            a[i] = a[low_end];
            a[low_end] = element;
            low_end++;
          }
        }
      }
      if (to - high_start < low_end - from) {
        QuickSort(a, high_start, to);
        to = low_end;
      } else {
        QuickSort(a, from, low_end);
        from = high_start;
      }
    }
  };

參考鏈接

Array.prototype.sort() - MDN

V8的排序方法實(shí)現(xiàn)