首页 > 编程语言> 文章详细

Go语言map底层源码理解(map的扩容)

2022-01-03 10:34:46 阅读：196 来源： 互联网

标签：map unsafe hmap 源码 maptype key Go Pointer

开心一刻

放假，送室友坐高铁回家，临上车前，我说：“我去买几个橘子，你就站在此地，不要走动。”
室友愣了一下，说：“你TM什么时侯都不忘占我便宜。”

写在前面

最近在看Go map底层源码，看到go map的扩容机制，产生几个疑问，通过看源码能够理解，但是总感觉不够透彻，也容易忘记，在这里记录一下，以后碰到更好的解释再来记录。

扩容时获取map中的值的过程？
扩容时更改map中的值的过程？
扩容时删除map中的值的过程？
扩容因子6.5如何确定的？

扩容时获取map中的值的过程

map中获取元素的函数原型主要有以下几种：

mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)

mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer)

mapaccess1_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) unsafe.Pointer
mapaccess2_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) (unsafe.Pointer, bool)

mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool)
mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer

mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
...

mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer
...

之所以有这么多的函数原型还是主要因为go针对各种数据类型进行了优化，加快代码的执行效率，最基本的还是要看前两个，然后与扩容时想关的代码部分如下：

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ···
    // 计算key在属于哪个常规桶
    hash := t.hasher(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    // 扩容时重新定位在旧桶的位置，如果旧桶没有迁移，那么直接将旧桶赋值给b，接下来就只在旧桶中查找
    if c := h.oldbuckets; c != nil {
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {
            b = oldb
        }
    }
    // 计算tophash，开始在桶中查找元素
    top := tophash(hash)
bucketloop:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            ···
        }
    }
    return unsafe.Pointer(&zeroVal[0])
}

注意点：
map中查找元素的时候，如果处在扩容阶段时，不涉及迁移；
先定位到常规桶，再判断定位到的旧桶是否迁移，如果还没有迁移，直接转到旧桶中查找。

扩容时更改map中的值的过程？

map的赋值涉及的函数原型包括：

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {}
func mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {}
func mapassign_fast64ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer{}
func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer {}

Go同样对于不同类型的key值进行了优化，代码逻辑大致相同。与map赋值涉及的代码如下：

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ···
again:
    bucket := hash & bucketMask(h.B)
    //如果正在迁移，先将key所在的旧桶迁移，然后顺便再迁移一个桶，然后将nevacuate指向下一个待迁移的桶
    //可以查看growWork查看具体逻辑
    if h.growing() {
        growWork(t, h, bucket)
    }
    //在常规桶中查找key的位置，如果存在key，则更新value即可，否则寻找一个空位，然后插入key、value
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    top := tophash(hash)

    var inserti *uint8
    var insertk unsafe.Pointer
    var elem unsafe.Pointer
bucketloop:
    ···

    // Did not find mapping for key. Allocate new cell & add entry.
    //没有找到key值的时候，就需要增加寻找一个空位，不过会先预判是否要进入扩容阶段。
    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }

    if inserti == nil {
        // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/elem at insert position
    ···
    return elem
}

注意点：
渐进式扩容的扩容就体现在插入值的时候，在扩容的时候不仅会将key所在的旧桶迁移，并且顺带迁移一个桶，并指向下一个待迁移的桶；
该函数返回的是一个内存地址，也就是key对应的value的内存地址，将value的值放入到该内存地址中即可。应该是由其他编译代码完成

扩容时删除map中的值的过程？

map的删除涉及的函数原型包括：

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer)

具体的代码逻辑如下：

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    //判断是否有其他协程在操作该map对象，如果有，则报错。这也是标准map不支持并发的原因
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }

    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write (delete).
    h.flags ^= hashWriting
    // 如果正在迁移，则先迁移。与map赋值操作逻辑相同
    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    bOrig := b
    top := tophash(hash)
search:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            ····
            //删除操作仅将标志位设置为emptyOne，并不清空内存，这也是需要等量扩容(伪缩容)的原因
            b.tophash[i] = emptyOne
            // If the bucket now ends in a bunch of emptyOne states,
            // change those to emptyRest states.
            // It would be nice to make this a separate function, but
            // for loops are not currently inlineable.
            ····
        notLast:
            h.count--
            // Reset the hash seed to make it more difficult for attackers to
            // repeatedly trigger hash collisions. See issue 25237.
            if h.count == 0 {
                h.hash0 = fastrand()
            }
            break search
        }
    }

    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
}

注意点：
map删除元素时的扩容也体现在插入值的时候，在扩容的时候不仅会将key所在的旧桶迁移，并且顺带迁移一个桶，并指向下一个待迁移的桶；
删除操作仅将标志位设置为emptyOne，并不清空内存，这也是需要等量扩容(伪缩容)的原因。

扩容因子6.5如何确定的？

这个值太大会导致溢出桶（overflow buckets）过多，查找效率降低，过小则会浪费存储空间。
据 Google 开发人员称，这个值是一个测试程序测量出来的一个经验值。在map.go中有数据解释。

总结

这部分写的扩容的一些相关问题如果没理解map的访问、赋值，可能不太好理解这部分内容。我自己也参考了很多优秀博客，下面把一些我觉得比较好的博客链接写出来，以后常看看。
其他相关问题：

桶和key的定位：我觉得map中定位key所在的桶依靠的是hash值的低B位（B就是hmap结构体中的B），有的博客中写的低8位我觉得不太准确。源码中相关代码是这样写的:

// bucketShift returns 1<<b, optimized for code generation.
func bucketShift(b uint8) uintptr {
    // Masking the shift amount allows overflow checks to be elided.
    return uintptr(1) << (b & (sys.PtrSize*8 - 1))
}

// bucketMask returns 1<<b - 1, optimized for code generation.
func bucketMask(b uint8) uintptr {
    return bucketShift(b) - 1
}

明显不是靠低八位来定位桶的。这篇深入理解 Golang Map我觉得讲解的对。靠高8位来定位key是正确的，源码中就是靠高8位来计算的tophash，从而去找到key的位置。

参考

标签：map,unsafe,hmap,源码,maptype,key,Go,Pointer
来源： https://www.cnblogs.com/xingyu666/p/15757960.html

本站声明： 1. iCode9 技术分享网（下文简称本站）提供的所有内容，仅供技术学习、探讨和分享；
2. 关于本站的所有留言、评论、转载及引用，纯属内容发起人的个人观点，与本站观点和立场无关；
3. 关于本站的所有言论和文字，纯属内容发起人的个人观点，与本站观点和立场无关；
4. 本站文章均是网友提供，不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属；如您发现该文章侵犯了您的权益，可联系我们第一时间进行删除；
5. 本站为非盈利性的个人网站，所有内容不会用来进行牟利，也不会利用任何形式的广告来间接获益，纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

ICode9