Netty's Zero-copy Implementation (Part 1)

Posted by sf_wangchong on 2019-08-15 12:23 / 3177 reads

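Wikipedia describes zero-copy as follows:

Zero-copy refers to computer operations in which the CPU does not need to first copy data from one memory area to another specific area. The technique is commonly used to save CPU cycles and memory bandwidth when transferring files over a network.

The zero-copy that Wikipedia describes operates at the hardware and operating-system level, whereas this article focuses on Netty's optimizations at the application level. Note that zero-copy does not literally mean that no memory is copied at all; it means avoiding redundant copies. Even OS-level zero-copy still copies data from device to memory and from memory to device.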

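Netty's zero-copy shows up in the following areas:

ByteBuf's slice operation does not copy the data into a new ByteBuf memory region; it borrows the original ByteBuf directly and only keeps its own reader and writer indices.

Netty provides the CompositeByteBuf class, which combines multiple ByteBufs into one logical ByteBuf.

Netty's FileRegion wraps NIO's FileChannel.transferTo() method, which uses sendfile when the underlying system supports it, so file transfers avoid copying data through user-space memory.

Classes such as PooledDirectByteBuf wrap NIO's DirectByteBuffer, which is allocated directly outside the JVM heap, saving the cost of copying between off-heap memory and the heap.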
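The JDK calls behind the last two items can be tried without Netty. The following is a minimal sketch under stated assumptions: TransferToSketch and its transfer method are hypothetical names made up for illustration, and the target here is another file rather than a socket, so it only demonstrates the FileChannel.transferTo() and ByteBuffer.allocateDirect() calls, not Netty's FileRegion or its pooled buffers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Netty.
class TransferToSketch {

    // Copies the whole file at src into dst with FileChannel.transferTo()
    // and returns the number of bytes transferred.
    static long transfer(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("zero-copy-src", ".txt");
        Path dst = Files.createTempFile("zero-copy-dst", ".txt");
        Files.write(src, "hello zero-copy".getBytes(StandardCharsets.UTF_8));
        System.out.println(transfer(src, dst) + " bytes transferred");

        // A direct buffer is allocated outside the JVM heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        System.out.println("direct = " + direct.isDirect());
    }
}
```

Whether transferTo actually maps to sendfile depends on the operating system and on the kind of target channel; the code never touches the file's bytes in user space either way.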

The sections below briefly introduce these approaches.

slice

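The following uses AbstractUnpooledSlicedByteBuf to explain how slice achieves zero-copy. The pooled implementation, PooledSlicedByteBuf, has to control memory release through reference counting, so its code contains a lot of logic unrelated to this article's topic and is not used as the example here.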

// Constructor of the sliced ByteBuf. The adjustment field is the offset of the slice relative to
// the ByteBuf being sliced; both share one memory region, and the buffer field is the ByteBuf that actually stores the data
AbstractUnpooledSlicedByteBuf(ByteBuf buffer, int index, int length) {
    super(length);
    checkSliceOutOfBounds(index, length, buffer); // check that the slice is within bounds

    if (buffer instanceof AbstractUnpooledSlicedByteBuf) {
        // The ByteBuf being sliced is itself an AbstractUnpooledSlicedByteBuf:
        // reuse its underlying buffer and add the two offsets together
        this.buffer = ((AbstractUnpooledSlicedByteBuf) buffer).buffer;
        adjustment = ((AbstractUnpooledSlicedByteBuf) buffer).adjustment + index;
    } else if (buffer instanceof DuplicatedByteBuf) {
        // The ByteBuf being sliced is a DuplicatedByteBuf:
        // use unwrap to get the ByteBuf that actually stores the data and assign it to buffer
        this.buffer = buffer.unwrap();
        adjustment = index;
    } else {
        // The ByteBuf being sliced is an ordinary ByteBuf: assign it to buffer directly
        this.buffer = buffer;
        adjustment = index;
    }

    initLength(length);
    writerIndex(length);
}

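The above is the constructor of AbstractUnpooledSlicedByteBuf. It is fairly simple, so it is not covered in more detail.

Next, let's look at how AbstractUnpooledSlicedByteBuf implements the ByteBuf interface, taking the getBytes method as an example: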

@Override
public ByteBuf getBytes(int index, ByteBuffer dst) {
    checkIndex0(index, dst.remaining()); // check that the read is within bounds
    unwrap().getBytes(idx(index), dst);
    return this;
}

@Override
public ByteBuf unwrap() {
    return buffer;
}

private int idx(int index) {
    return index + adjustment;
}

This is the getBytes method overridden by AbstractUnpooledSlicedByteBuf. As shown, AbstractUnpooledSlicedByteBuf reads the bytes directly from the wrapped ByteBuf, but recomputes the index by adding the relative offset.
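The same idea can be reproduced in a few lines of plain Java. This is a simplified sketch, not Netty's code: SliceSketch is a hypothetical class, a byte array stands in for the underlying ByteBuf, and reference counting and reader/writer indices are left out. It keeps only the two points from the listings above: a slice of a slice still points at the same root storage, and every access goes through idx(index).

```java
// Hypothetical, simplified stand-in for a sliced buffer; not Netty's class.
final class SliceSketch {
    private final byte[] root;     // shared backing storage, never copied
    private final int adjustment;  // offset of this view inside root
    private final int length;      // number of bytes visible through this view

    SliceSketch(byte[] root) {
        this(root, 0, root.length);
    }

    private SliceSketch(byte[] root, int adjustment, int length) {
        this.root = root;
        this.adjustment = adjustment;
        this.length = length;
    }

    // Creates a view over [index, index + length) without copying any bytes.
    // Slicing a slice keeps the same root; only the offsets are added up.
    SliceSketch slice(int index, int length) {
        if (index < 0 || length < 0 || index > this.length - length) {
            throw new IndexOutOfBoundsException("index=" + index + ", length=" + length);
        }
        return new SliceSketch(root, adjustment + index, length);
    }

    byte getByte(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return root[idx(index)];
    }

    int length() {
        return length;
    }

    // Translates an index of this view into an index of the root storage.
    private int idx(int index) {
        return index + adjustment;
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        SliceSketch inner = new SliceSketch(data).slice(2, 5).slice(1, 3); // covers data[3..5]
        data[3] = 42;                         // a write to the shared storage ...
        System.out.println(inner.getByte(0)); // ... is visible through the slice: prints 42
    }
}
```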

CompositeByteBuf

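In some scenarios the data is scattered across multiple ByteBufs, yet we want to handle them as a single ByteBuf. The most intuitive approach is to copy the data of every ByteBuf into one ByteBuf, but that involves a large number of memory copies and incurs significant CPU overhead.

CompositeByteBuf solves this problem well. As the name suggests, it is a composite ByteBuf made up of many ByteBufs internally; CompositeByteBuf wraps them in one layer so that they can be operated on directly through the ByteBuf interface.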

/**
 * Precondition is that {@code buffer != null}.
 */
private int addComponent0(boolean increaseWriterIndex, int cIndex, ByteBuf buffer) {
    assert buffer != null;
    boolean wasAdded = false;
    try {
        // Check that the index of the new component is valid
        checkComponentIndex(cIndex);

        // Length of the buffer
        int readableBytes = buffer.readableBytes();

        // No need to consolidate - just add a component to the list.
        @SuppressWarnings("deprecation")
        // Normalize to a big-endian ByteBuf
        Component c = new Component(buffer.order(ByteOrder.BIG_ENDIAN).slice());
        if (cIndex == components.size()) {
            // The index equals the size of components, so append at the tail
            wasAdded = components.add(c);
            if (cIndex == 0) {
                // components was empty, so this is now its only element
                c.endOffset = readableBytes;
            } else {
                // components already holds other elements; continue from the previous one
                Component prev = components.get(cIndex - 1);
                c.offset = prev.endOffset;
                c.endOffset = c.offset + readableBytes;
            }
        } else {
            // The new ByteBuf is inserted in the middle of components
            components.add(cIndex, c);
            wasAdded = true;
            if (readableBytes != 0) {
                // The new buffer has readable bytes, so update the offset and
                // endOffset of every component from cIndex onward
                updateComponentOffsets(cIndex);
            }
        }
        if (increaseWriterIndex) {
            // The caller asked for writerIndex to be advanced
            writerIndex(writerIndex() + buffer.readableBytes());
        }
        return cIndex;
    } finally {
        if (!wasAdded) {
            // The component was not added, so release the ByteBuf
            buffer.release();
        }
    }
}

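This is the logic for adding a new ByteBuf. The key is offset and endOffset, which denote the start and end indices of a ByteBuf within the CompositeByteBuf; together they uniquely identify that ByteBuf's position in the CompositeByteBuf.

With that clear, the code above boils down to two things:

Wrap the ByteBuf in a Component and add it to components at the right position

Make sure the offset and endOffset of every Component in components are correct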
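Those two steps can be isolated in a small model. The sketch below is an illustration only: CompositeOffsetsSketch is a hypothetical class, byte arrays stand in for the component ByteBufs, and it always recomputes offsets from the insertion point instead of special-casing the append path as the real method does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the bookkeeping; not Netty's CompositeByteBuf.
final class CompositeOffsetsSketch {
    static final class Component {
        final byte[] buf;
        int offset;     // index of the first byte within the composite
        int endOffset;  // index one past the last byte within the composite

        Component(byte[] buf) {
            this.buf = buf;
        }
    }

    final List<Component> components = new ArrayList<>();

    // Step 1: wrap buf in a Component and insert it at cIndex (no bytes are copied).
    // Step 2: repair the offsets of everything from cIndex onward.
    void addComponent(int cIndex, byte[] buf) {
        if (cIndex < 0 || cIndex > components.size()) {
            throw new IndexOutOfBoundsException("cIndex=" + cIndex);
        }
        components.add(cIndex, new Component(buf));
        updateComponentOffsets(cIndex);
    }

    // Recomputes offset and endOffset for every component from cIndex onward.
    private void updateComponentOffsets(int cIndex) {
        int next = cIndex == 0 ? 0 : components.get(cIndex - 1).endOffset;
        for (int i = cIndex; i < components.size(); i++) {
            Component c = components.get(i);
            c.offset = next;
            c.endOffset = next + c.buf.length;
            next = c.endOffset;
        }
    }

    // Total number of bytes addressable through the composite.
    int capacity() {
        return components.isEmpty() ? 0 : components.get(components.size() - 1).endOffset;
    }

    public static void main(String[] args) {
        CompositeOffsetsSketch composite = new CompositeOffsetsSketch();
        composite.addComponent(0, new byte[] {1, 2, 3});
        composite.addComponent(1, new byte[] {4, 5});
        composite.addComponent(1, new byte[] {9}); // inserted in the middle
        for (Component c : composite.components) {
            System.out.println("[" + c.offset + ", " + c.endOffset + ")");
        }
    }
}
```

After the middle insert, the component that used to start at index 3 is shifted to start at index 4, which is the job updateComponentOffsets does in the listing above.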

Next, let's look at how CompositeByteBuf implements the ByteBuf interface, again taking the getBytes method as an example:

@Override
public CompositeByteBuf getBytes(int index, ByteBuf dst, int dstIndex, int length) {
    // Check that the indices are within bounds
    checkDstIndex(index, length, dstIndex, dst.capacity());
    if (length == 0) {
        return this;
    }

    // Binary-search for the position in components of the Component that contains index
    int i = toComponentIndex(index);
    // Keep reading until length reaches 0
    while (length > 0) {
        Component c = components.get(i);
        ByteBuf s = c.buf;
        int adjustment = c.offset;
        // Take the smaller of length and the bytes remaining in this ByteBuf
        int localLength = Math.min(length, s.capacity() - (index - adjustment));
        // The start index is index - c.offset, not 0
        s.getBytes(index - adjustment, dst, dstIndex, localLength);
        index += localLength;
        dstIndex += localLength;
        length -= localLength;
        i ++;
    }
    return this;
}

/**
 * Return the index for the given offset
 */
public int toComponentIndex(int offset) {
    checkIndex(offset);

    for (int low = 0, high = components.size(); low <= high;) {
        int mid = low + high >>> 1;
        Component c = components.get(mid);
        if (offset >= c.endOffset) {
            low = mid + 1;
        } else if (offset < c.offset) {
            high = mid - 1;
        } else {
            return mid;
        }
    }

    throw new Error("should not reach here");
}

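As shown, CompositeByteBuf handles index by first converting it into the position of the corresponding Component in components plus an offset within that Component. It then reads bytes in a loop, starting from that offset of that Component and moving on to the following Components, until everything has been read.

NOTE: there is a small trick here. Because components is kept in order, toComponentIndex does not scan the list linearly when converting the index; it uses binary search.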
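The lookup and the read loop can be exercised together in a stand-alone sketch. Everything below is an assumption made for illustration: CompositeReadSketch is a hypothetical class, byte arrays replace the component ByteBufs, and the offsets are stored in one prefix-sum array (offsets[i] is the start of component i, offsets[i + 1] its end) rather than in per-component fields.

```java
// Hypothetical sketch of index translation in a composite buffer; not Netty's code.
final class CompositeReadSketch {
    private final byte[][] bufs;
    private final int[] offsets; // offsets[i] = start of bufs[i]; offsets[n] = total length

    CompositeReadSketch(byte[]... bufs) {
        this.bufs = bufs;
        this.offsets = new int[bufs.length + 1];
        for (int i = 0; i < bufs.length; i++) {
            offsets[i + 1] = offsets[i] + bufs[i].length;
        }
    }

    // Binary search for the component whose range [offset, endOffset) contains index.
    int toComponentIndex(int index) {
        if (index < 0 || index >= offsets[bufs.length]) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        int low = 0;
        int high = bufs.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (index >= offsets[mid + 1]) {
                low = mid + 1;
            } else if (index < offsets[mid]) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        throw new AssertionError("unreachable: index was bounds-checked");
    }

    // Copies length bytes starting at index into dst, walking across components.
    void getBytes(int index, byte[] dst, int dstIndex, int length) {
        if (index < 0 || length < 0 || index > offsets[bufs.length] - length
                || dstIndex < 0 || dstIndex > dst.length - length) {
            throw new IndexOutOfBoundsException("index=" + index + ", length=" + length);
        }
        if (length == 0) {
            return;
        }
        int i = toComponentIndex(index);
        while (length > 0) {
            byte[] s = bufs[i];
            int local = index - offsets[i]; // position inside this component
            int localLength = Math.min(length, s.length - local);
            System.arraycopy(s, local, dst, dstIndex, localLength);
            index += localLength;
            dstIndex += localLength;
            length -= localLength;
            i++;
        }
    }

    public static void main(String[] args) {
        CompositeReadSketch composite = new CompositeReadSketch(
                new byte[] {1, 2, 3}, new byte[] {4, 5}, new byte[] {6, 7, 8, 9});
        byte[] dst = new byte[4];
        composite.getBytes(2, dst, 0, 4); // spans all three components
        System.out.println(java.util.Arrays.toString(dst)); // prints [3, 4, 5, 6]
    }
}
```

Only the destination copy in getBytes moves bytes; assembling the composite itself copies nothing.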

I'm a bit tired from writing today, so I'll leave the rest open here and fill it in with the next post.