golang

性能分析实战：pprof 从入门到精通

深入掌握 Go 的性能分析工具 pprof，学会定位 CPU、内存、Goroutine 等性能瓶颈

Leeting Yan 2022-03-25 5 分钟阅读 2459 字

性能分析实战：pprof 从入门到精通

当你的 Go 程序运行缓慢或内存占用过高时，你需要一个强大的工具来找出问题所在。pprof 是 Go 标准库提供的性能分析工具，它能帮你精确定位 CPU、内存、Goroutine 等性能瓶颈。

本文将通过实战案例，带你掌握 pprof 的使用方法。

pprof 基础

什么是 pprof？

pprof 是 Go 的性能分析工具集，可以收集和分析以下类型的 profile：

CPU Profile：哪些函数占用了最多的 CPU 时间
Memory Profile：哪些函数分配了最多的内存
Goroutine Profile：当前所有 Goroutine 的堆栈
Block Profile：导致阻塞的操作
Mutex Profile：互斥锁的竞争情况

在程序中使用 pprof

package main

import (
    "net/http"
    _ "net/http/pprof"  // 自动注册 pprof handlers
    "log"
)

func main() {
    // 启动 pprof HTTP 服务
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    
    // 你的应用逻辑
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}

func handler(w http.ResponseWriter, r *http.Request) {
    // 处理请求
}

访问 http://localhost:6060/debug/pprof/ 可以看到所有可用的 profile。

CPU 分析

收集 CPU Profile

# 收集 30 秒的 CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

或者在代码中手动收集：

package main

import (
    "os"
    "runtime/pprof"
    "log"
)

func main() {
    // 开始 CPU profiling
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    
    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()
    
    // 你的程序逻辑
    doWork()
}

func doWork() {
    // 模拟工作
    for i := 0; i < 1000000; i++ {
        _ = i * i
    }
}

分析 CPU Profile

# 交互式分析
go tool pprof cpu.prof

# 常用命令
(pprof) top          # 显示最耗 CPU 的函数
(pprof) top -cum     # 按累积时间排序
(pprof) list FuncName  # 查看函数源码
(pprof) web          # 生成 SVG 调用图（需要 graphviz）
(pprof) png          # 生成 PNG 图片

实战案例：找出 CPU 热点

package main

import (
    "crypto/md5"
    "fmt"
    "math"
    "time"
)

// 模拟一个计算密集型任务
func computeIntensive() {
    for i := 0; i < 100000; i++ {
        // 计算 MD5
        data := []byte(fmt.Sprintf("data-%d", i))
        md5.Sum(data)
        
        // 数学计算
        math.Sqrt(float64(i))
        math.Sin(float64(i))
    }
}

// 模拟一个 IO 密集型任务
func ioIntensive() {
    time.Sleep(100 * time.Millisecond)
}

func main() {
    start := time.Now()
    
    for i := 0; i < 100; i++ {
        computeIntensive()
        ioIntensive()
    }
    
    fmt.Printf("Total time: %v\n", time.Since(start))
}

收集并分析：

go run main.go &
sleep 1
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=10

分析结果：

(pprof) top
Showing nodes accounting for 8.50s, 95.29% of 8.92s total
      flat  flat%   sum%        cum   cum%
    5.20s 58.30% 58.30%     5.20s 58.30%  crypto/md5.block
    2.10s 23.54% 81.84%     7.30s 81.84%  main.computeIntensive
    0.80s  8.97% 90.81%     0.80s  8.97%  math.sqrt
    0.40s  4.48% 95.29%     0.40s  4.48%  math.sin

(pprof) list computeIntensive
Total: 8.92s
ROUTINE ======================== main.computeIntensive
    2.10s     7.30s (flat, cum) 81.84% of Total
         .          .      1:func computeIntensive() {
         .          .      2:    for i := 0; i < 100000; i++ {
    1.50s     6.70s      3:        data := []byte(fmt.Sprintf("data-%d", i))
         .          .      4:        md5.Sum(data)
         .          .      5:        
         .     0.40s      6:        math.Sqrt(float64(i))
    0.60s     0.60s      7:        math.Sin(float64(i))
         .          .      8:    }
         .          .      9:}

从结果可以看出，crypto/md5.block 占用了 58% 的 CPU 时间，这是优化的重点。

内存分析

收集 Memory Profile

# 收集堆内存分配
go tool pprof http://localhost:6060/debug/pprof/heap

# 收集分配的内存（包括已释放的）
go tool pprof http://localhost:6060/debug/pprof/allocs

或者在代码中：

package main

import (
    "os"
    "runtime/pprof"
    "log"
)

func main() {
    // 执行一些分配内存的操作
    data := make([][]byte, 1000)
    for i := range data {
        data[i] = make([]byte, 1024*1024) // 1MB
    }
    
    // 收集 heap profile
    f, err := os.Create("heap.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    
    pprof.WriteHeapProfile(f)
}

分析 Memory Profile

go tool pprof heap.prof

(pprof) top
(pprof) top -inuse_space   # 按当前占用内存排序
(pprof) top -inuse_objects # 按当前对象数排序
(pprof) top -alloc_space   # 按分配内存排序
(pprof) top -alloc_objects # 按分配对象数排序

实战案例：内存泄漏检测

package main

import (
    "fmt"
    "runtime"
    "time"
)

var cache = make(map[string][]byte)

func leakMemory() {
    for i := 0; i < 100; i++ {
        key := fmt.Sprintf("key-%d", i)
        // 每次都分配新的 1MB 数据，但从不释放
        cache[key] = make([]byte, 1024*1024)
    }
}

func printMemStats() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("Alloc = %v MB, Sys = %v MB, NumGC = %v\n",
        m.Alloc/1024/1024,
        m.Sys/1024/1024,
        m.NumGC)
}

func main() {
    printMemStats()
    
    for i := 0; i < 10; i++ {
        leakMemory()
        time.Sleep(1 * time.Second)
        printMemStats()
    }
}

运行结果：

Alloc = 0 MB, Sys = 72 MB, NumGC = 0
Alloc = 100 MB, Sys = 174 MB, NumGC = 2
Alloc = 200 MB, Sys = 276 MB, NumGC = 3
Alloc = 300 MB, Sys = 378 MB, NumGC = 4
...

收集 heap profile：

go tool pprof heap.prof

(pprof) top -inuse_space
Showing nodes accounting for 1024MB, 99.90% of 1025MB total
      flat  flat%   sum%        cum   cum%
  1000MB 97.56% 97.56%   1000MB 97.56%  main.leakMemory
    24MB  2.34% 99.90%     24MB  2.34%  fmt.Sprintf

(pprof) list leakMemory
Total: 1025MB
ROUTINE ======================== main.leakMemory
  1000MB   1000MB (flat, cum) 97.56% of Total
         .          .     10:    for i := 0; i < 100; i++ {
         .          .     11:        key := fmt.Sprintf("key-%d", i)
  1000MB   1000MB     12:        cache[key] = make([]byte, 1024*1024)
         .          .     13:    }
         .          .     14:}

问题一目了然：make([]byte, 1024*1024) 分配了大量内存且没有释放。

Goroutine 分析

收集 Goroutine Profile

go tool pprof http://localhost:6060/debug/pprof/goroutine

实战案例：Goroutine 泄漏

package main

import (
    "fmt"
    "runtime"
    "time"
)

func leakyGoroutine() {
    ch := make(chan int)
    go func() {
        // 这个 goroutine 永远不会退出
        for {
            <-ch // 阻塞等待，但没有人发送数据
        }
    }()
}

func main() {
    fmt.Println("Initial goroutines:", runtime.NumGoroutine())
    
    for i := 0; i < 100; i++ {
        leakyGoroutine()
    }
    
    time.Sleep(1 * time.Second)
    fmt.Println("Final goroutines:", runtime.NumGoroutine())
}

输出：

Initial goroutines: 1
Final goroutines: 101

分析 goroutine profile：

go tool pprof goroutine.prof

(pprof) top
Showing nodes accounting for 100, 99.01% of 101 total
      flat  flat%   sum%        cum   cum%
       100 99.01% 99.01%        100 99.01%  main.leakyGoroutine.func1

(pprof) list leakyGoroutine
Total: 101
ROUTINE ======================== main.leakyGoroutine.func1
       100      100 (flat, cum) 99.01% of Total
         .          .      8:    go func() {
         .          .      9:        for {
       100      100     10:            <-ch
         .          .     11:        }
         .          .     12:    }()

100 个 goroutine 都阻塞在 <-ch，这就是泄漏的根源。

Block Profile

收集 Block Profile

package main

import (
    "runtime"
    "time"
)

func init() {
    // 启用 block profiling
    runtime.SetBlockProfileRate(1)
}

func main() {
    ch := make(chan int)
    
    // 模拟阻塞
    go func() {
        time.Sleep(2 * time.Second)
        ch <- 1
    }()
    
    <-ch // 阻塞 2 秒
}

go tool pprof http://localhost:6060/debug/pprof/block

分析 Block Profile

(pprof) top
(pprof) list FuncName

Block profile 会显示哪些操作导致了阻塞，以及阻塞的时间。

Mutex Profile

收集 Mutex Profile

package main

import (
    "runtime"
    "sync"
)

func init() {
    // 启用 mutex profiling
    runtime.SetMutexProfileFraction(1)
}

var mu sync.Mutex

func contention() {
    for i := 0; i < 1000; i++ {
        go func() {
            mu.Lock()
            time.Sleep(time.Millisecond)
            mu.Unlock()
        }()
    }
}

go tool pprof http://localhost:6060/debug/pprof/mutex

分析 Mutex Profile

Mutex profile 会显示互斥锁的竞争情况，帮你找出锁竞争严重的地方。

Web UI 分析

使用 Web UI

# 启动 Web UI
go tool pprof -http=:8080 cpu.prof

# 或者
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30

浏览器会打开一个交互式的 Web 界面，包含：

Graph：调用关系图
Flame Graph：火焰图
Top：Top 列表
Source：源码视图

火焰图解读

火焰图是性能分析的利器：

宽度：表示函数占用的时间比例
高度：表示调用栈深度
颜色：无特殊含义（随机）

如何阅读：

找到最宽的"平台"——这是最耗时的函数
向上看调用栈——找出谁调用了它
向下看——找出它调用了什么

实战：优化真实项目

案例：HTTP 服务器性能优化

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "net/http"
    _ "net/http/pprof"
    "sync"
)

type Request struct {
    Data string `json:"data"`
}

type Response struct {
    Hash string `json:"hash"`
}

var cache = make(map[string]string)
var mu sync.Mutex

func hashHandler(w http.ResponseWriter, r *http.Request) {
    var req Request
    json.NewDecoder(r.Body).Decode(&req)
    
    // 检查缓存（有锁竞争）
    mu.Lock()
    if hash, ok := cache[req.Data]; ok {
        mu.Unlock()
        json.NewEncoder(w).Encode(Response{Hash: hash})
        return
    }
    mu.Unlock()
    
    // 计算 SHA256（CPU 密集）
    h := sha256.New()
    h.Write([]byte(req.Data))
    hash := hex.EncodeToString(h.Sum(nil))
    
    // 写入缓存
    mu.Lock()
    cache[req.Data] = hash
    mu.Unlock()
    
    json.NewEncoder(w).Encode(Response{Hash: hash})
}

func main() {
    http.HandleFunc("/hash", hashHandler)
    http.ListenAndServe(":8080", nil)
}

性能测试

# 使用 hey 进行压力测试
hey -n 10000 -c 100 -m POST -d '{"data":"test"}' http://localhost:8080/hash

# 同时收集 profile
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

分析结果

(pprof) top
Showing nodes accounting for 12.50s, 85.62% of 14.60s total
      flat  flat%   sum%        cum   cum%
    5.20s 35.62% 35.62%     5.20s 35.62%  crypto/sha256.block
    3.80s 26.03% 61.64%     9.00s 61.64%  sync.(*Mutex).Lock
    2.10s 14.38% 76.03%     2.10s 14.38%  runtime.lock
    1.40s  9.59% 85.62%     1.40s  9.59%  encoding/json.Marshal

问题：

SHA256 计算占用 35% CPU
Mutex 锁竞争占用 26% CPU
JSON 序列化占用 9% CPU

优化方案

// 优化 1：使用 sync.Map 减少锁竞争
var cache sync.Map

func hashHandler(w http.ResponseWriter, r *http.Request) {
    var req Request
    json.NewDecoder(r.Body).Decode(&req)
    
    // 无锁读取缓存
    if hash, ok := cache.Load(req.Data); ok {
        json.NewEncoder(w).Encode(Response{Hash: hash.(string)})
        return
    }
    
    // 计算 SHA256
    h := sha256.New()
    h.Write([]byte(req.Data))
    hash := hex.EncodeToString(h.Sum(nil))
    
    // 无锁写入缓存
    cache.Store(req.Data, hash)
    
    json.NewEncoder(w).Encode(Response{Hash: hash})
}

// 优化 2：使用 sync.Pool 复用对象
var hashPool = sync.Pool{
    New: func() interface{} {
        return sha256.New()
    },
}

func hashHandler(w http.ResponseWriter, r *http.Request) {
    // ...
    
    // 复用 hasher
    h := hashPool.Get().(hash.Hash)
    defer hashPool.Put(h)
    h.Reset()
    
    h.Write([]byte(req.Data))
    hash := hex.EncodeToString(h.Sum(nil))
    
    // ...
}

pprof 技巧

对比两个 Profile

# 对比优化前后的 CPU profile
go tool pprof -diff_base=cpu_before.prof cpu_after.prof

过滤和聚焦

(pprof) focus=main      # 只显示包含 main 的路径
(pprof) ignore=runtime  # 忽略 runtime 包
(pprof) trim_prefix=github.com/user/project  # 去除前缀

标签过滤

// 在代码中添加标签
ctx := pprof.WithLabels(context.Background(), 
    pprof.Labels("endpoint", "/api/users"))
pprof.SetGoroutineLabels(ctx)

(pprof) tagfocus=endpoint:/api/users

总结

pprof 是 Go 性能优化的利器：

常用 Profile：

CPU：找出耗时函数
Heap：找出内存分配热点
Goroutine：检测 goroutine 泄漏
Block：找出阻塞操作
Mutex：找出锁竞争

分析流程：

收集 profile 数据
使用 top 找出热点
使用 list 查看源码
使用 Web UI 可视化分析
优化代码
对比优化效果

最佳实践：

在生产环境启用 pprof（注意安全性）
定期收集和分析 profile
建立性能基准线
使用火焰图快速定位问题
优化后验证效果

记住：不要猜测性能问题，用数据说话。pprof 会告诉你真相。

继续阅读

探索更多技术文章

浏览归档，发现更多关于系统设计、工具链和工程实践的内容。

全部文章返回首页