<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>枫叶居</title>
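<subtitle>A cup of wine amid peach and plum blossoms in the spring breeze; a lamp burning through ten years of rainy nights on the rivers and lakes</subtitle>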
<link href="/atom.xml" rel="self"/>
<link href="https://keepalive555.github.io/"/>
<updated>2021-04-10T17:59:55.735Z</updated>
<id>https://keepalive555.github.io/</id>
<author>
<name>Lan Wei, Wang</name>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>Service Avalanche Adventures (Part 1)</title>
<link href="https://keepalive555.github.io/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/"/>
<id>https://keepalive555.github.io/2021/04/10/服务雪崩/</id>
<published>2021-04-10T14:09:03.000Z</published>
<updated>2021-04-10T17:59:55.735Z</updated>
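<content type="html"><![CDATA[<h1><span id="服务雪崩历险记上">Service Avalanche Adventures (Part 1)</span></h1><p>The service avalanche is a classic problem in microservice architectures. I had only ever read about it in technical blogs and never expected to hit one myself. Since this was my first time dealing with such a problem, I lacked experience and took a few detours. This post records my <strong>troubleshooting approach</strong> and the <strong>solution</strong>; for the concept of a <strong>service avalanche</strong> itself, see the many technical articles online, so I will not repeat it here.</p><h1><span id="线上现象">Production Symptoms</span></h1><p>One morning, just after arriving at the office, I received a <font color="#ff0000">[Weather Service] CPU usage over limit alert</font>. Mornings are usually off-peak for Baidu App traffic, so this struck me as odd. I opened the alert link and found that instances in the <strong>Beijing data center</strong> had reached an astonishing <strong>245%</strong> CPU usage, far above the <strong>70%</strong> threshold. (Looking back, I was lucky: as of this writing all containers have hard resource limits, but when the avalanche happened hard CPU limits were not yet enabled; otherwise service availability might have dropped into the single digits =_=.) See the figure below:</p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/cpu.png" alt="CPU usage"></p><p>The first thing I did was open the business monitoring dashboard and check the API metrics. As shown below, interface traffic had several peaks, the highest up about 80% (roughly 8000 QPS). <font color="#ff0000">Understandable in hindsight but surprising at the time: interface availability and average response time showed several peaks as well, and the availability of every downstream dependency dropped sharply</font>. The [Weather Service] was clearly showing signs of an avalanche, but with every metric moving at once, the root cause was hidden and there was no obvious place to start. <strong>In an avalanche, no single snowflake is innocent.</strong></p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/qps.png" alt="Traffic"></p><h1><span id="问题定位">Locating the Problem</span></h1><h2><span id="初步排查">Initial Investigation</span></h2><p>With no prior experience investigating or handling a service avalanche, I could not go straight to the crux. Following the principle of <strong>[mitigation first]</strong>, I relied on intuition (and experience) to list possible causes and attempt mitigation:</p><ol><li>An unknown bug in the Go module was triggered (e.g. a goroutine leak or full GC), causing CPU usage to spike, traffic to surge, and downstream availability to degrade sharply;</li><li>The client app was crashing at scale and restarting repeatedly, increasing traffic and overloading the CPU;</li><li>Downstream services were unstable, causing the [Weather API] to time out en masse, which triggered upstream retries and drove CPU usage up;</li></ol><p>First, I picked three problematic instances (containers) and checked their <strong>instance CPU usage</strong>. The CPU peaks of all three occurred at exactly the same time, and memory usage, open file descriptors and other metrics looked normal. This largely ruled out possibility 1, an unknown bug inside the Go module driving CPU up and setting off a chain reaction. Digging further, I checked CPU usage on the physical machines hosting the three instances (hard limits were not enabled, so co-located services could have been competing for resources): CPU idle was high and compute resources were ample, which ruled out other co-located service instances hogging resources. (<strong><em>Note: rising CPU usage and rising traffic are a chicken-and-egg problem. Higher CPU usage delays some requests, which triggers upstream retries, which raises traffic and lowers availability, and so on, so one should not jump to conclusions.</em></strong>) Taking instance and physical machine CPU usage together, a CPU bottleneck was essentially ruled out (follow-up question: how would you rule it out if containers had hard limits enabled?).</p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/shili_cpu.png" alt="Instance CPU usage"></p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/jiqi.png" 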
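alt="Physical machine CPU usage"></p><p>If the client app were crashing at scale, traffic to the [Weather Service] would rise, and so would traffic to every other interface the app calls. But the other interfaces looked normal, and client crash-rate monitoring was normal too, so possibility 2 was ruled out.</p><p>With possibilities 1 and 2 essentially ruled out, <strong><em>I was almost certain that fluctuating downstream availability was dragging the Weather Service into timeouts and triggering retries at the access layer</em></strong>. However, the monitored availability of all downstream services (the success rate of upstream calls to them) had dropped sharply, so it offered no useful signal. Analyzing the production RPC request logs, every failure was "request canceled (Client.Timeout exceeded while awaiting headers)", i.e. a <strong>read timeout</strong>. Normally, when requests to downstream services fail because of high CPU usage, they fail already at the connection-establishment stage; here the errors were read timeouts, which made me even more confident in my judgment. <font color="#ff0000">The culprit seemed about to reveal itself, and then came the baffling twist: when I asked every downstream team, they all reported that their traffic had increased, but their availability, average response time and CPU usage were all normal</font>.</p><h2><span id="尝试止损">Attempting Mitigation</span></h2><p>Although the cause was not yet fully located, it was fairly clear that the traffic increase came from retries by services upstream of the [Weather Service]. Retries were adding load to the system, so removing upstream retries should help. The [Weather Service] architecture is shown below (details are masked for security reasons; the actual architecture differs somewhat, which does not affect understanding):</p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/xuebeng.png" alt="Architecture diagram"></p><p>As you can see, there are 3 layers of retries along [Traffic Gateway] => [Business Gateway] => [Weather Service] => [Downstream Services]. If each layer retries once when its request to the layer below times out, traffic reaching the [Downstream Services] can be amplified to as much as 8 times normal (2^3 = 8), which makes a <strong>service avalanche</strong> very easy to trigger. Because the Traffic Gateway serves many product lines, removing its retries was too risky, so the retry logic was removed at the [Business Gateway] and [Weather Service] layers. <strong>Traffic to the [Weather Service] and [Downstream Services] gradually fell and CPU usage began to drop: the mitigation had taken effect</strong>.</p><h2><span id="问题定位">Locating the Problem</span></h2><p>After removing the retries in the [Business Gateway] and [Weather Service], traffic to the [Weather Service] returned to near normal. Because retries in the [Traffic Gateway] had not been removed, the traffic peaks and CPU peaks of the [Weather Service] still existed, so they remained of little use as signals. Downstream services were more telling: they recovered one after another, except for the <strong>Push location sync service</strong>. <font color="#ff0000">Clearly this service was the real culprit of the avalanche. So why did its team observe normal availability, average response time and CPU usage? How strange</font>.</p><h3><span id="长尾请求">Long-Tail Requests</span></h3><p>While lowering the [Weather Service]'s timeout for requests to the Push location sync service as a mitigation, I started analyzing the cause. Since most RPC error logs were read timeouts, my first thought was long-tail requests (for long-tail requests and percentile latency, see my previous post: <a href="https://keepalive555.github.io/2020/09/24/%E9%95%BF%E5%B0%BE%E8%AF%B7%E6%B1%82/">https://keepalive555.github.io/2020/09/24/%E9%95%BF%E5%B0%BE%E8%AF%B7%E6%B1%82/</a>). The 85th percentile latency computed from the RPC logs is shown below (in fact, even the 50th percentile latency of requests to this service in the RPC logs was nearly 400ms):</p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/fenweishi.png" 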
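alt="85th percentile latency"></p><p>Yet the service team's monitoring showed peak average response time within 5ms (see the figure). Since the 50th percentile in the RPC logs was nearly 400ms, even if the fastest 50% of requests had taken 0ms, the average would still be about 200ms, 40 times the 5ms the team observed. So the service team's RD engineers and I started tracing the full request path for clues.</p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/mapisss.png" alt="!mapi average response time"></p><h3><span id="请求链路梳理">Tracing the Request Path</span></h3><p>After talking with the service team's RD engineers, I learned that the downstream service was deployed as "Nginx + PHP": each instance runs an Nginx in front as a reverse proxy, with a number of PHP processes (threads) handling requests. The request path was [Weather Service] => [instance Nginx] => [service PHP processes]. The RD engineers found a large number of requests with HTTP status 499 in the Nginx logs. HTTP 499 is Nginx's code for the client closing the connection before a response was sent (here, because the client timed out). This matched the [Weather Service] RPC logs, which showed a duration of 500ms. The problem was essentially located: <strong><em>the 495ms gap in average response time arose while [instance Nginx] was forwarding requests to [service PHP processes]</em></strong>.</p><p>Since the Nginx logs offered limited information, with help from the OP (operations) team we finally found the cause: the PHP VM worker threads had reached their processing capacity, so later requests queued up waiting to be processed until they timed out, <strong><em>much like the leaky bucket algorithm used in rate limiting</em></strong>. <font color="#ff0000">This also explains the strange observation that the service's availability, average response time and CPU usage all looked normal</font>: the service's own metrics evidently covered only the time after a worker picked up a request, so time spent queuing was invisible to them.</p><p><img src="/2021/04/10/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/queuss.png" alt="Request queue buildup"></p><h3><span id="导火索">The Trigger</span></h3><p>With the cause located, what set off the request backlog in the first place? Push notifications. At the end of the year there were many marketing campaigns, more users entered Baidu App through push notifications, and the downstream service's throughput hit its ceiling.</p><h3><span id="事件复盘">Post-Mortem</span></h3><p>"Retry" is a form of "positive feedback" in the control-theory sense: it keeps amplifying the activity that triggered it. An "avalanche" triggers "retries", and "retries" deepen the "avalanche", so when a "service avalanche" occurs, adjusting the retry policy can and should be the first thing to consider. The chain of events in this [service avalanche] was as follows:</p><ol><li>With more year-end campaigns from various teams, push notifications became frequent, and traffic to the "Push cluster" rose steadily</li><li>All PHP VM worker threads on the "Push cluster" instances were occupied, and concurrent processing capacity hit its limit</li><li>The "Weather Service" sent a request to the "Push service"; with the PHP VM busy with other requests, the request queued, hit a read timeout, and failed</li><li>The timeout on the "Weather Service" request to the "Push service" triggered an RPC retry, and the "Weather Service" requested the "Push service" again</li><li>The "weather request" as a whole timed out, triggering the retry policy of the "Business Gateway" upstream of the "Weather Service", which sent the weather request again</li><li>The "Weather Service" again sent requests to all "downstream services", amplifying traffic up to 4x</li><li>Because load on all downstream services increased, the "Business Gateway" timed out processing the "weather request", triggering a retry from the "Traffic Gateway"</li><li>The "Weather Service" again sent requests to all "downstream services", amplifying traffic up to 8x</li><li>Traffic to all downstream services of the "Weather Service" rose, availability fell, and average response time increased</li><li>"Service avalanche" ಥ_ಥ</li></ol><h1><span id="服务雪崩解决方案">Service Avalanche Solutions</span></h1><p>Because the [Weather Service] was migrated from a PHP module and had not yet been onboarded to Baidu App's Service Mesh, mitigation options were limited. Service avalanches are a common problem in microservice architectures and the solutions are fairly mature; I will cover them in the next post. Common approaches include:</p><ul><li>Rate limiting at the entry point</li><li>Circuit breakers for calls to downstream services</li><li>Asynchronous requests to weakly-dependent services</li><li>Locality-aware load balancing routing</li><li>……</li></ul>]]></content>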
<summary type="html">
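<h1><span id="服务雪崩历险记上">Service Avalanche Adventures (Part 1)</span></h1><p>The service avalanche is a classic problem in microservice architectures. I had only ever read about it in technical blogs and never expected to hit one myself. Since this was my first time dealing with such a problem, I lacked experience and took a few detours. This post records my <strong>troubleshooting approach</s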
</summary>
<category term="系统架构" scheme="https://keepalive555.github.io/categories/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84/"/>
<category term="微服务" scheme="https://keepalive555.github.io/tags/%E5%BE%AE%E6%9C%8D%E5%8A%A1/"/>
<category term="服务雪崩" scheme="https://keepalive555.github.io/tags/%E6%9C%8D%E5%8A%A1%E9%9B%AA%E5%B4%A9/"/>
<category term="重试" scheme="https://keepalive555.github.io/tags/%E9%87%8D%E8%AF%95/"/>
</entry>
<entry>
<title>Long-Tail Requests and Percentile Latency</title>
<link href="https://keepalive555.github.io/2020/09/24/%E9%95%BF%E5%B0%BE%E8%AF%B7%E6%B1%82/"/>
<id>https://keepalive555.github.io/2020/09/24/长尾请求/</id>
<published>2020-09-23T16:44:18.000Z</published>
<updated>2020-10-21T16:07:00.081Z</updated>
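<content type="html"><![CDATA[<h1><span id="写在前面">Preface</span></h1><p>This article is my original work; please credit the source when reposting. Contact the author: <a href="mailto:luckydreamcatcher@163.com">luckydreamcatcher@163.com</a> | <a href="mailto:the.matrix.vvv@gmail.com">the.matrix.vvv@gmail.com</a></p><p>While testing the refactored <strong>golang module</strong> in production, our QA colleague found that backend responses occasionally <strong>timed out</strong>. In earlier load tests, the monitored <strong>response time</strong> of the interface had stayed stable at around 10ms, so I suspected <strong>long-tail requests</strong>.</p><h1><span id="目前问题">Current Problems</span></h1><h2><span id="监控指标">Monitoring Metrics</span></h2><p>In our current business monitoring system, the metric that reflects interface latency is the <strong>average response time</strong>, i.e. average response time = <strong>total latency of all requests in a time window / number of requests</strong>.</p><p>An average cannot reflect how the data fluctuates. For example, if request a takes 10ms (written cost(a)=10ms) and request b takes 300ms (cost(b)=300ms), their average response time is cost(a, b) = (cost(a) + cost(b)) / 2 = <strong>155ms</strong>. An average of 155ms (<=200ms) meets the target, yet request b, at 300ms, clearly does not.</p><p>Among app backend engineers, <strong>200ms</strong> is a commonly used threshold for client-facing request latency: below 200ms, users barely notice network delay and the experience is good; above 200ms, the delay is noticeable and the experience suffers. So whether request latency is <=200ms is often one of the criteria for interface performance optimization. In practice, we often see large latency gaps between <strong>cache hits</strong> and <strong>cache misses</strong>, <strong>so average response time alone cannot fully measure system performance</strong>.</p><h2><span id="长尾请求">Long-Tail Requests</span></h2><p>The industry commonly uses a <a href="https://stackoverflow.com/questions/12808934/what-is-p99-latency">P99 standard</a> for latency: 99% of requests should be faster than a specified latency, and only 1% are allowed to exceed it; these 1% are the "long-tail requests". As an analogy, even if 99% of the students in a class do very well, there are always a few who drag down the class "average"; those few students are the "long-tail requests".</p><p>The causes of long-tail requests are varied and complex, spanning implementation, system and hardware factors. Common causes in distributed systems include:</p><ul><li>fluctuations in downstream dependencies;</li><li>resource contention (files, locks, hardware resources);</li><li>network jitter;</li><li>high machine load, OS scheduling and queuing;</li><li>full GC;</li><li>CPU throttling to control temperature;</li></ul><p>For more on long-tail requests and optimization ideas, see the paper by Google's Jeff Dean: <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.732.6087&rep=rep1&type=pdf">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.732.6087&rep=rep1&type=pdf</a>.</p><p>In a sense, long-tail requests cannot be eliminated, but engineering techniques can keep them within a certain proportion, <strong>which is why long-tail requests are a focus of much performance optimization work</strong>. Because of long-tail requests, <strong>average response time</strong> does not reflect the latency of the vast majority of requests well, hence the concept of <strong>percentile latency</strong>: put simply, xx% of requests complete within a given time.</p><h2><span id="分位时">Percentile Latency</span></h2><h3><span 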
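id="概念介绍">Concept</span></h3><p>Percentile is a statistical term, defined as follows:</p><blockquote><p>A percentile (also called a percentile score) is a measure of relative standing: it is a point in a frequency distribution. If a frequency distribution is sorted and divided into 100 units, a percentile is the raw score corresponding to a particular percentage point, indicating that a given percentage of cases in the distribution fall below that score.</p></blockquote><p>Put simply, <strong>sort the data in ascending (or descending) order, divide it into 100 equal parts, and find the value at position P (for example P = 0.9, i.e. 90%)</strong>. For example, for 800 students in a school, the 80th percentile here means the score that 80% of students reach or exceed, which we can compute as follows:</p><ol><li>Sort the 800 students' scores in descending order, from highest to lowest;</li><li>The rank at 80% of 800 students is 800 * 80% = 640;</li><li>The score of the student ranked 640th is the 80th percentile we want;</li></ol><p>In practice, <code>total * percent</code> can be a non-integer. For example, 9 students scored 100, 88, 89, 90, 95, 70, 65, 78, 79; to find the 90th percentile using the approach above:</p><ol><li>Sort the 9 scores in descending order, from highest to lowest: 100, 95, 90, 89, 88, 79, 78, 70, 65;</li><li>The rank at 90% of 9 students is 9 * 90% = 8.1;</li></ol><p>Here is the problem: what is the score of student number 8.1? Obviously there is no 8.1th student, but if there were, their score would lie between those of the 8th and 9th students. Broken down, the 8.1th student's score equals the 8th student's score plus 10% of the difference between the 9th and 8th scores: score(8) + (score(9) - score(8)) * 10% = 70 + (65 - 70) * 10% = 69.50. So the 90th percentile of these 9 students is 69.50 (note: this assumes scores are uniformly distributed between the 8th and 9th; with few samples the fluctuation is large, and it approaches uniform as the sample size grows).</p><p>The percentile calculation rule can be summarized as:</p><ol><li>Sort the input array in ascending or descending order; let its length be n;</li><li>Compute the 0-based index corresponding to P% within [0, n): m = n*P% - 1 = i + j, where i is the integer part and j the fractional part;</li><li>The value at index m is f(m) = f(i) + (f(i+1) - f(i)) * j;</li></ol><p>Percentile latency applies the same rule with all request latencies sorted in ascending order, so the P-th percentile latency is the value that P% of requests do not exceed.</p><h3><span id="计算工具">Calculation Tool</span></h3><p>For a tool to compute percentile latency, see the simple <a href="https://github.com/keepalive555/victorinox/blob/main/src/percentile.py">Python script</a> I wrote:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl -L -O https://raw.githubusercontent.com/keepalive555/victorinox/main/src/percentile.py</span><br></pre></td></tr></table></figure><p>To compute the 99th percentile latency of a batch of requests on Linux, an example command is shown below (the downloaded script must first be made executable, e.g. with <code>chmod +x percentile.py</code>):</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cat service.log|grep -o -P <span class="string">"cost\[\d+(\.|\])?"</span>|grep -o -P <span 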
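class="string">"\d+"</span>|./percentile.py</span><br></pre></td></tr></table></figure><p>In my case, I took about <strong>100,000</strong> production log entries and found that for the refactored <code>golang</code> interface the 99.9th percentile latency was 200ms while the average response time was 10ms, a much larger gap than I had imagined. So when watching system performance, look not only at <strong>average response time</strong> but also at <strong>percentile latency</strong>.</p><h2><span id="优化思路">Optimization Ideas</span></h2><p>Long-tail requests have many causes; in distributed systems the <strong>most common</strong> one is being dragged down by downstream services, e.g. MySQL slow queries, distributed cache expiry, or overloaded downstream services, so setting sensible timeouts for downstream calls is essential.</p><p>Many popular RPC frameworks, such as Baidu's BRPC framework, offer a solution for long-tail requests: <code>Backup Request</code>. The client first sends the RPC request to one downstream server; if no response arrives within <code>backup_request_ms</code> (usually less than the timeout), it sends the RPC request again to another downstream server and uses whichever response arrives first. With a sensible <code>backup_request_ms</code>, most of the time only one request is sent, so the extra load on the downstream service is negligible.</p><p>As far as I know, the Baidu Mini Program client-side team is adopting <code>BackupRequest</code>, and we plan to learn from it ^_^.</p><h1><span id="参考资料">References</span></h1><p><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.732.6087&rep=rep1&type=pdf">The tail at scale</a></p><p><a href="https://juejin.im/post/6844903904371539975">Classic distributed systems paper reading: The Tail at Scale</a></p><p><a href="https://baike.baidu.com/item/%E7%99%BE%E5%88%86%E4%BD%8D%E6%95%B0/10064171?fr=aladdin">Percentile (Baidu Baike)</a></p><p><a href="https://www.cnblogs.com/liuning8023/p/3531900.html">Quantiles</a></p>]]></content>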
<summary type="html">
<h1><span id="写在前面">Preface</span></h1><p>This article is my original work; please credit the source when reposting. Contact the author: <a href="mailto:&#x6c;&#117;&#99;&#107;&#121;&#100;&#x72;&#x65;&#97;&#109;&#99
</summary>
<category term="性能优化" scheme="https://keepalive555.github.io/categories/%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/"/>
<category term="Golang" scheme="https://keepalive555.github.io/tags/Golang/"/>
<category term="性能优化" scheme="https://keepalive555.github.io/tags/%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/"/>
<category term="长尾请求" scheme="https://keepalive555.github.io/tags/%E9%95%BF%E5%B0%BE%E8%AF%B7%E6%B1%82/"/>
<category term="分位时" scheme="https://keepalive555.github.io/tags/%E5%88%86%E4%BD%8D%E6%97%B6/"/>
</entry>
<entry>
<title>Golang Performance Analysis</title>
<link href="https://keepalive555.github.io/2020/09/20/Golang%E6%80%A7%E8%83%BD%E5%88%86%E6%9E%90/"/>
<id>https://keepalive555.github.io/2020/09/20/Golang性能分析/</id>
<published>2020-09-20T08:47:49.000Z</published>
<updated>2020-10-21T15:58:03.518Z</updated>
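<content type="html"><![CDATA[<h1><span id="写在前面">Preface</span></h1><p>This article is my original work; please credit the source when reposting. Contact the author: <a href="mailto:luckydreamcatcher@163.com">luckydreamcatcher@163.com</a> | <a href="mailto:the.matrix.vvv@gmail.com">the.matrix.vvv@gmail.com</a></p><p>I have recently been <strong>rewriting an old PHP module in golang</strong>. The PHP module peaks at about 15,000 QPS, a typical high-concurrency scenario. During a rewrite, problems that developers tend to <strong>"selectively ignore"</strong>, such as memory leaks and full GC, get magnified dramatically, so load testing is mandatory <strong>before launch/ramp-up</strong>.</p><p>To stay close to the production environment, I worked with QA to deploy the refactored golang module to the production cluster, picked an instance with 20 standard CPU cores, removed it from the service list, and load-tested it. The single instance was expected to handle 400 QPS, but during the test, at 400 QPS the instance's CPU usage reached 100% and became the bottleneck.</p><p><strong>Saving machine resources is one of the key benefits of rewriting the old PHP module in golang</strong>. To achieve that, I had to remove the golang module's CPU bottleneck and reach the expected performance.</p><h1><span id="cpu性能分析">CPU Performance Analysis</span></h1><p>CPU performance analysis is also called CPU Profiling. Here are the three approaches I use most often:</p><ul><li>the go tool pprof command-line tool</li><li>go tool pprof visualization</li><li>FlameGraph flame graphs</li></ul><p>go tool pprof is very powerful and supports more than these three approaches; choose according to your scenario. In practice, I recommend combining <code>go tool pprof</code> with <code>Flame Graph</code>.</p><p>To keep the service secure without hurting readability, sensitive information in this article has been replaced with "xxx".</p><h2><span id="cpu-profiling原理">How CPU Profiling Works</span></h2><p><strong>Before using tools for CPU Profiling, we need to understand its basic principle so that we can interpret the data more accurately</strong>. Yet most technical blogs that turn up in a Google search do little more than describe how to use golang's pprof tool. Based on reading the source of golang's <code>runtime/pprof</code>, and drawing on how the Linux <code>perf</code> tool works, here is a brief explanation.</p><p>By default, Golang pprof samples goroutine call stacks at 100Hz (100 times per second). If function <code>foo</code> is at the top of the call stack when a sample is taken, the goroutine can be considered to be executing <code>foo</code> at that moment; if <code>foo</code> is at the top of the stack in 30 of 100 samples, <code>foo</code> can be considered to account for 30% of the sampled CPU time. With this principle in mind, we can move on to the tools.</p><h2><span id="golang模块开启profiling">Enabling Profiling in a Golang Module</span></h2><p>Golang provides the powerful <code>runtime/pprof</code> package for profiling Golang programs, but it is inconvenient for long-running services. In production, I recommend enabling <code>http pprof</code>, which serves profiling data over HTTP so you can view it directly in a browser or pull it with other analysis tools.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span 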
class="keyword">import</span> (</span><br><span class="line"> _ <span class="string">"net/http/pprof"</span></span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>The <code>init</code> function of the <code>net/http/pprof</code> package registers several routes under <code>/debug/pprof/</code> on the default HTTP mux (<code>http.DefaultServeMux</code>), wrapping the output of <code>runtime/pprof</code> as <code>http</code> responses. The logic is simple; see the <code>net/http/pprof</code> source for details. Note that these routes are only reachable if the program serves <code>http.DefaultServeMux</code>, for example via <code>http.ListenAndServe(addr, nil)</code>.</p><h2><span id="go-tool-pprof命令行工具">The go tool pprof Command-Line Tool</span></h2><p>Use Golang's built-in pprof command-line tool for CPU performance analysis:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">go tool pprof http://xxxx.baidu.com:2021/debug/pprof/profile?seconds=120</span><br></pre></td></tr></table></figure><p>go tool pprof writes the server's HTTP response data to a local file (by default under <code>$HOME/pprof</code>, i.e. <code>/root/pprof</code> when running as root; run <code>go tool pprof &lt;filepath&gt;</code> to analyze a saved file). After sampling for 2 minutes it enters an interactive shell, where the <code>top</code> command ranks functions by CPU time:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">(pprof) top</span><br><span class="line">Showing nodes accounting <span class="keyword">for</span> 18.09s, 35.20% of 51.39s total</span><br><span class="line">Dropped 856 nodes (cum <= 0.26s)</span><br><span class="line">Showing top 10 nodes out of 246</span><br><span class="line"> flat flat% sum% cum cum%</span><br><span class="line"> 3.20s 6.23% 6.23% 3.58s 6.97% syscall.Syscall</span><br><span class="line"> 2.43s 4.73% 10.96% 10.65s 20.72% runtime.mallocgc</span><br><span 
class="line"> 2.06s 4.01% 14.96% 2.10s 4.09% encoding/json.stateInString</span><br><span class="line"> 2.02s 3.93% 18.89% 3.97s 7.73% runtime.scanobject</span><br><span class="line"> 1.61s 3.13% 22.03% 4.76s 9.26% encoding/json.checkValid</span><br><span class="line"> 1.43s 2.78% 24.81% 1.43s 2.78% runtime.usleep</span><br><span class="line"> 1.39s 2.70% 27.52% 5.18s 10.08% runtime.mapassign_faststr</span><br><span class="line"> 1.36s 2.65% 30.16% 1.67s 3.25% encoding/json.unquoteBytes</span><br><span class="line"> 1.30s 2.53% 32.69% 1.45s 2.82% net/url.unescape</span><br><span class="line"> 1.29s 2.51% 35.20% 1.86s 3.62% encoding/json.(*decodeState).rescanLiteral</span><br></pre></td></tr></table></figure><p>Notes:</p><ul><li>flat: execution time of the function itself (excluding the functions it calls);</li><li>flat%: flat time as a percentage of total sampled time;</li><li>sum%: running total of flat% for this <strong>row</strong> and all rows above it;</li><li>cum: execution time of the function including the functions it calls;</li><li>cum%: cum time as a percentage of total sampled time;</li></ul><p>By default the top command shows the first 10 entries, sorted by flat in descending order. <strong>It pinpoints the functions with the highest CPU time, but at a fine granularity that does not directly reveal the code causing the bottleneck</strong>. Adding the <code>-cum</code> flag ranks functions by cumulative time; type <code>top20 -cum</code>:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">(pprof) top20 
-cum</span><br><span class="line">Showing nodes accounting <span class="keyword">for</span> 0.61s, 1.19% of 51.39s total</span><br><span class="line">Dropped 856 nodes (cum <= 0.26s)</span><br><span class="line">Showing top 20 nodes out of 246</span><br><span class="line"> flat flat% sum% cum cum%</span><br><span class="line"> 0.04s 0.078% 0.078% 41.26s 80.29% net/http.(*conn).serve</span><br><span class="line"> 0.04s 0.078% 0.16% 38.25s 74.43% github.com/gin-gonic/gin.(*Context).Next</span><br><span class="line"> 0.03s 0.058% 0.21% 38.24s 74.41% icode.baidu.com/baidu/gdp/gdp.WebHandlerFunc.toGinHandlerFunc.func1</span><br><span class="line"> 0 0% 0.21% 38.17s 74.28% github.com/gin-gonic/gin.(*Engine).handleHTTPRequest</span><br><span class="line"> 0 0% 0.21% 38.06s 74.06% github.com/gin-gonic/gin.(*Engine).ServeHTTP</span><br><span class="line"> 0 0% 0.21% 37.96s 73.87% net/http.serverHandler.ServeHTTP</span><br><span class="line"> 0.01s 0.019% 0.23% 36.86s 71.73% github.com/gin-gonic/gin.RecoveryWithWriter.func1</span><br><span class="line"> 0 0% 0.23% 36.86s 71.73% icode.baidu.com/baidu/gdp/gdp.ginHandler2WebHandler.func1</span><br><span class="line"> 0.07s 0.14% 0.37% 36.80s 71.61% icode.baidu.com/baidu/gdp/gdp.ShoubaiTowerLogware</span><br><span class="line"> 0 0% 0.37% 32.46s 63.16% icode.baidu.com/baidu/gdp/gdp.recovery</span><br><span class="line"> 0.01s 0.019% 0.39% 32.45s 63.14% icode.baidu.com/baidu/xxxxxx/xxxxloc/middlewares.Recovery</span><br><span class="line"> 0.03s 0.058% 0.45% 32.41s 63.07% icode.baidu.com/baidu/xxxxxx/xxxxloc/middlewares.PProfAuth</span><br><span class="line"> 0 0% 0.45% 31.95s 62.17% icode.baidu.com/baidu/xxxxxx/xxxxloc/middlewares.ParseParams</span><br><span class="line"> 0.04s 0.078% 0.53% 30.18s 58.73% icode.baidu.com/baidu/xxxxxx/xxxxloc/controllers.(*WeatherController).GetIndexWeather</span><br><span class="line"> 0.04s 0.078% 0.6% 27.44s 53.40% 
icode.baidu.com/baidu/xxxxxx/xxxxloc/models/service/page/weather.(*WeatherIndexIphone).GetData</span><br><span class="line"> 0.01s 0.019% 0.62% 25.48s 49.58% icode.baidu.com/baidu/xxxxxx/xxxxloc/models/service/data/weather.(*DataWeatherCommon).GetWeatherData</span><br><span class="line"> 0.02s 0.039% 0.66% 21.03s 40.92% encoding/json.Unmarshal</span><br><span class="line"> 0 0% 0.66% 16.16s 31.45% encoding/json.(*decodeState).unmarshal</span><br><span class="line"> 0.04s 0.078% 0.74% 16.16s 31.45% encoding/json.(*decodeState).value</span><br><span class="line"> 0.23s 0.45% 1.19% 16.15s 31.43% encoding/json.(*decodeState).object</span><br></pre></td></tr></table></figure><p>From the output above we can see that the execution flow roughly passes through the HTTP framework, the controller layer, the page layer and the data layer, matching the call stack. encoding/json.Unmarshal accounts for <strong>40.92%</strong> of the total samples cumulatively, which is clearly unreasonable: it is the system's <strong>performance bottleneck</strong>. The go tool pprof command line is simple and convenient and needs no extra tools, but its output is not very intuitive; the two approaches below present the data as a graph or as a flame graph instead.</p><h2><span id="go-tool-pprof可视化工具">go tool pprof Visualization</span></h2><p>The go tool pprof command line supports options such as -png, -svg and -pdf to output PNG images, SVG images and PDF documents. This feature depends on graphviz.</p><h3><span id="安装graphviz">Installing graphviz</span></h3><p>graphviz has many dependencies, so installing it with your Linux distribution's package manager is recommended; for building from source, see the official page: <a href="http://www.graphviz.org/download/source/">http://www.graphviz.org/download/source/</a>.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># debian</span></span><br><span class="line">apt-get install -y graphviz</span><br><span class="line"><span class="comment"># macos</span></span><br><span class="line">brew install graphviz</span><br></pre></td></tr></table></figure><h3><span id="导出图片">Exporting an Image</span></h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">go tool pprof -png 
http://xxxx.baidu.com:2021/debug/pprof/profile?seconds=120 > profile.png</span><br></pre></td></tr></table></figure><h2><span id="火焰图">Flame Graphs</span></h2><p>A flame graph (FlameGraph) shows what a system is executing at a glance and is a powerful performance analysis tool. The <code>go tool pprof</code> command line does not export flame graphs directly (since Go 1.11, the web UI started by <code>go tool pprof -http</code> does include a flame graph view), so here I use a third-party tool, go-torch, open-sourced by Uber.</p><h3><span id="安装flamegraph分析工具">Installing the FlameGraph Tools</span></h3><p>Install go-torch:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">go install github.com/uber/go-torch</span><br></pre></td></tr></table></figure><p>Install go-torch's dependency, the FlameGraph scripts:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Download FlameGraph</span></span><br><span class="line">wget https://github.com/brendangregg/FlameGraph/archive/master.zip</span><br><span class="line"><span class="comment"># Unzip</span></span><br><span class="line">unzip master.zip</span><br><span class="line"><span class="comment"># Move to /opt</span></span><br><span class="line">sudo mv FlameGraph-master /opt/FlameGraph</span><br><span class="line"><span class="comment"># Add to the system PATH</span></span><br><span class="line"><span class="built_in">echo</span> <span class="string">'export PATH=$PATH:/opt/FlameGraph'</span> |sudo tee -a /etc/profile && <span class="built_in">source</span> /etc/profile</span><br></pre></td></tr></table></figure><p>Once go-torch is installed, run <code>go-torch --help</code> to see the help message:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">Usage:</span><br><span class="line"> go-torch [options] [binary] &lt;profile <span class="built_in">source</span>&gt;</span><br><span class="line"></span><br><span class="line">pprof Options:</span><br><span class="line"> -u, --url= Base URL of your Go program (default: http://localhost:8080)</span><br><span class="line"> --suffix= URL path of pprof profile (default: /debug/pprof/profile)</span><br><span class="line"> -b, --binaryinput= File path of previously saved binary profile. 
(binary profile is anything accepted by</span><br><span class="line"> https://golang.org/cmd/pprof)</span><br><span class="line"> --binaryname= File path of the binary that the binaryinput is <span class="keyword">for</span>, used <span class="keyword">for</span> pprof inputs</span><br><span class="line"> -t, --seconds= Number of seconds to profile <span class="keyword">for</span> (default: 30)</span><br><span class="line"> --pprofArgs= Extra arguments <span class="keyword">for</span> pprof</span><br><span class="line"></span><br><span class="line">Output Options:</span><br><span class="line"> -f, --file= Output file name (must be .svg) (default: torch.svg)</span><br><span class="line"> -p, --<span class="built_in">print</span> Print the generated svg to stdout instead of writing to file</span><br><span class="line"> -r, --raw Print the raw call graph output to stdout instead of creating a flame graph; use with Brendan Gregg<span class="string">'s flame</span></span><br><span class="line"><span class="string"> graph perl script (see https://github.com/brendangregg/FlameGraph)</span></span><br><span class="line"><span class="string"> --title= Graph title to display in the output file (default: Flame Graph)</span></span><br><span class="line"><span class="string"> --width= Generated graph width (default: 1200)</span></span><br><span class="line"><span class="string"> --hash Colors are keyed by function name hash</span></span><br><span class="line"><span class="string"> --colors= set color palette. 
choices are: hot (default), mem, io, wakeup, chain, java, js, perl, red, green, blue,</span></span><br><span class="line"><span class="string"> aqua, yellow, purple, orange</span></span><br><span class="line"><span class="string"> --cp Use consistent palette (palette.map)</span></span><br><span class="line"><span class="string"> --reverse Generate stack-reversed flame graph</span></span><br><span class="line"><span class="string"> --inverted icicle graph</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">Help Options:</span></span><br><span class="line"><span class="string"> -h, --help Show this help message</span></span><br></pre></td></tr></table></figure><p>Usually we only need the -u and -f flags. Run the following command to take a CPU sample and output an SVG flame graph:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">go-torch -u http://xxxx.baidu.com:2021/debug/pprof/profile?seconds=120</span><br></pre></td></tr></table></figure><p>go-torch prints the following:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">INFO[10:57:29] Run pprof command: go tool pprof -raw -seconds 30 http://xxxx.baidu.com:2021/debug/pprof/profile?seconds=120</span><br><span class="line">INFO[10:58:00] Writing svg to torch.svg</span><br></pre></td></tr></table></figure><p>Note that, as the log shows, go-torch ran pprof with <code>-seconds 30</code> (the default of its <code>-t</code> flag), and the timestamps suggest sampling took about 30 seconds rather than the 120 in the URL; pass <code>-t 120</code> to sample for longer.</p><p>FlameGraph file: <a href="torch.svg">torch.svg</a></p><h3><span id="flamegraph-火焰图分析">Reading the FlameGraph</span></h3><p>Many people misread flame graphs: half-understanding them, they interpret them according to their own intuition and are led astray. To use a flame graph for performance analysis, you first need to be clear about exactly what the <strong>x axis</strong> and <strong>y axis</strong> mean.</p><p><strong><em>The y axis represents the call stack: each level is a function, the deeper the call stack the taller the flame, and the top is the function that was executing when the sample was taken.</em></strong></p><p><strong><em>The x axis represents the number of samples: the wider a function is along the x axis, the more often it was sampled and the longer it ran (the x axis is not a time axis; all call stacks are merged and sorted alphabetically by function name). So any "plateau" at the top of the flame graph indicates that the function may have a <font color="#ff0000" 
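size="3">performance problem</font>.</em></strong></p><p>Tip: in the SVG, hovering over a bar shows a tooltip with the sample count, its percentage of total samples, and so on. For more detail on flame graphs, see <a href="http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html">cpu flame graph</a>.</p><p><img src="/2020/09/20/Golang%E6%80%A7%E8%83%BD%E5%88%86%E6%9E%90/torch.svg" alt="Example flame graph"></p><p>In the graph above, it is easy to see that <strong>encoding/json.Unmarshal</strong> <strong>consumes about 40% of CPU time</strong> and is the system's performance bottleneck. Having located the bottleneck, the next step is to think about how to optimize it.</p><h2><span id="性能优化">Performance Optimization</span></h2><p>JSON deserialization becoming the bottleneck was understandable in hindsight yet unexpected. From the business perspective, a downstream service our Golang module depends on heavily returns a large JSON payload (a few tens of KB) with deeply nested fields, so it is reasonable for deserialization to take time; that it consumed a staggering 40% of CPU time was unexpected.</p><p>There are several ways to address this:</p><ul><li>"Slim down" the JSON returned by the downstream service and skip parsing unused fields;</li><li>Use an LRU cache to keep already-deserialized structs in memory;</li><li>Use a faster open-source JSON serialization library;</li></ul><p>How to tune performance and solve the case in this article will be covered in a separate post; I will not go into it further here.</p><h1><span id="联系作者">Contact the Author</span></h1><p>If you know better ways to do "performance tuning", you are welcome to get in touch (email: <a href="mailto:luckydreamcatcher@163.com">luckydreamcatcher@163.com</a>, WeChat: 15210466756).</p><h1><span id="参考资料">References</span></h1><p><a href="https://zhuanlan.zhihu.com/p/51559344">Go Pprof</a></p><p><a href="http://www.graphviz.org/download/source/">graphviz</a></p><p><a href="http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html">cpu flame graph</a></p><p><a href="https://www.ruanyifeng.com/blog/2017/09/flame-graph.html">How to read flame graphs</a></p><p><a href="https://ichrisking.github.io/2018/03/08/FlameGraph/">FlameGraph installation guide</a></p>]]></content>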
<summary type="html">
<h1><span id="写在前面">Preface</span></h1><p>This article is the author's original work; please credit the source when reposting. Contact the author: …</p>
</summary>
<category term="性能优化" scheme="https://keepalive555.github.io/categories/%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/"/>
<category term="Golang" scheme="https://keepalive555.github.io/tags/Golang/"/>
<category term="性能优化" scheme="https://keepalive555.github.io/tags/%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/"/>
<category term="PProf" scheme="https://keepalive555.github.io/tags/PProf/"/>
</entry>
<entry>
<title>Bloom filter</title>
<link href="https://keepalive555.github.io/2018/02/03/bloom-filter/"/>
<id>https://keepalive555.github.io/2018/02/03/bloom-filter/</id>
<published>2018-02-02T16:21:14.000Z</published>
<updated>2020-09-23T16:50:05.929Z</updated>
<content type="html"><![CDATA[<h2><span id="应用场景">Use cases</span></h2><p>In back-end development for internet services, I frequently run into all kinds of <strong><em>whitelist</em></strong> scenarios, such as the following typical ones:</p><ol><li>Given 100 million user IDs (<code>user_id</code>), how can we quickly check whether a <code>user_id</code> is on the whitelist?</li><li>A web crawler extracts the list of <code>url</code>s on a page; how can we quickly check whether a <code>url</code> has already been crawled?</li><li>Given 100 million <code>user_id</code>s, how can we quickly find which <code>user_id</code>s occur more than once?</li><li>The server receives a request from some <code>ip</code> address; how can we quickly check whether that <code>ip</code> address is on a blacklist?</li><li>…</li></ol><p>Readers familiar with data structures will see after a moment's thought that the core requirement behind all of these is: <strong><em>design a lookup algorithm/data structure that uses little memory and is fast.</em></strong> Taking scenario 1 as an example, the first data structure most readers think of is the <strong><em>hash table</em></strong>, which can look up any element in <code>O(1)</code> (expected) time.</p><p>Assuming a load factor of 0.5 (a common choice in practice), a rough estimate of the memory taken by 100 million 4-byte <code>int</code> <code>user_id</code>s is 10^8 × 4 B / 0.5 = 8×10^8 B, or about <code>763MB</code>. Spending that much memory on a single whitelist is clearly unacceptable. How can we achieve our goal while using much less memory?</p><p>Whether a <code>user_id</code> is on the whitelist has only two possible answers, yes or no. From the standpoint of <strong><em>Shannon's information theory</em></strong>, a single <code>bit</code> is enough to represent two values. An <code>int</code> variable can hold <code>2^32</code> distinct values, whereas here we only need the two states <code>0</code> and <code>1</code> (4 states need 2 <code>bit</code>s, 8 states need 3 <code>bit</code>s, and so on). Storing 100 million <code>bit</code>s takes 1.25×10^7 bytes, or about <code>12MB</code>, a drastic reduction in memory. This is the <code>Bitmap</code> data structure.</p>
<h2><span id="bitmap">Bitmap</span></h2><p>A <code>Bitmap</code> is a compact data structure. For scenario 1, we first allocate 100 million contiguous <code>bit</code>s in memory. To check whether the user whose <code>user_id</code> is <code>1000</code> is on the whitelist, we only need to read bit number <code>1000</code> of the bit sequence (1: the <code>user_id</code> is on the whitelist; 0: it is not). Below is example code in <code>c</code> (also available on my <a href="https://github.com/keepalive555/study/blob/master/bitmap/bitmap.c"><code>github</code></a>):</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;stdio.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MAXSIZE 1024</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> SHIFT 5 <span class="comment">/* n >> SHIFT == n / 32: index of the 32-bit word */</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MASK 0x1F <span class="comment">/* n & MASK == n % 32: bit position within the word */</span></span></span><br><span class="line"><span class="keyword">static</span> <span class="keyword">unsigned</span> <span class="keyword">int</span> bitmap[MAXSIZE / (<span class="keyword">sizeof</span>(<span class="keyword">unsigned</span> <span class="keyword">int</span>) * <span class="number">8</span>) + <span class="number">1</span>];</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">set</span><span class="params">(<span class="keyword">int</span> n)</span> </span>{</span><br><span class="line"> <span class="comment">// Set bit n</span></span><br><span class="line"> bitmap[n >> SHIFT] |= <span class="number">1u</span> << (n & MASK);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">clr</span><span class="params">(<span class="keyword">int</span> n)</span> </span>{</span><br><span class="line"> <span class="comment">// Clear bit n</span></span><br><span class="line"> bitmap[n >> SHIFT] &= ~(<span class="number">1u</span> << (n & MASK));</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">test</span><span class="params">(<span class="keyword">int</span> n)</span> </span>{</span><br><span class="line"> <span class="comment">// Return 1 if bit n is set, 0 otherwise</span></span><br><span class="line"> <span class="keyword">int</span> i = n >> SHIFT;</span><br><span class="line"> <span class="keyword">if</span>(bitmap[i] & (<span class="number">1u</span> << (n & MASK)))</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">void</span>)</span> </span>{</span><br><span class="line"> <span class="keyword">int</span> n = <span class="number">1023</span>;</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"space: %zu\n"</span>, <span class="keyword">sizeof</span>(bitmap) / <span class="keyword">sizeof</span>(<span class="keyword">unsigned</span> <span class="keyword">int</span>));</span><br><span class="line"> <span class="built_in">set</span>(n);</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"has set flag: %d\n"</span>, test(n));</span><br><span class="line"> clr(n);</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"has set flag: %d\n"</span>, test(n));</span><br><span class="line"> <span class="built_in">set</span>(n);</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"has set flag: %d\n"</span>, test(n));</span><br><span class="line"> clr(n);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The code assumes a 32-bit <code>unsigned int</code>: <code>n >> SHIFT</code> (that is, <code>n / 32</code>) selects the word, and <code>n & MASK</code> (that is, <code>n % 32</code>) selects the bit within that word. A <code>Bitmap</code> resembles a hash table whose hash rule maps the number <code>n</code> to bit number <code>n</code>. This causes a problem in practice: the size of a <code>Bitmap</code> is determined by the largest possible value of <code>n</code>, not by the number of elements stored, so when <code>n</code> can be very large the <code>Bitmap</code> becomes large as well. In such scenarios a plain <code>Bitmap</code> is unsuitable, which is where the <code>Bloom Filter</code> comes in.</p>
<h2><span id="bloom-filter">Bloom Filter</span></h2><p>The <code>Bloom Filter</code> was proposed by Burton H. Bloom in 1970. It tests whether an element belongs to a set. Put simply, a <code>BloomFilter</code> extends the <code>Bitmap</code> from one <code>Hash</code> function to several: inserting an element sets the bits at all of its hash positions, and checking membership only requires testing whether all of those bits are set. Its advantages are <strong><em>space and time efficiency</em></strong> far better than typical approaches; its drawbacks are <strong><em>a certain false-positive rate</em></strong> (it can report that an element is present when it is not, but never the reverse) and the difficulty of deleting elements.</p><p><img src="/2018/02/03/bloom-filter/bloom-filter.png" alt="Bloom filter"></p>
<h3><span id="算法原理">How it works</span></h3><p>Assume the chosen <code>Hash</code> functions distribute uniformly over the hash space, i.e. every position is hit with equal probability (the core requirement for a hash function). Let <code>m</code> be the size of the <code>Bit</code> array and <code>k</code> the number of <code>Hash</code> functions.</p><p>The probability that a given position in the <code>Bit</code> array is not set to <code>1</code> by one <code>Hash</code> operation during an insertion is:</p><p><img src="/2018/02/03/bloom-filter/1.png" alt="1 - 1/m"></p><p>The probability that the position is still not set to <code>1</code> after hashing with all <code>k</code> <code>Hash</code> functions is:</p><p><img src="/2018/02/03/bloom-filter/2.png" alt="(1 - 1/m)^k"></p><p>After inserting <code>n</code> elements, the probability that the position is still not set to <code>1</code> is:</p><p><img src="/2018/02/03/bloom-filter/3.png" alt="(1 - 1/m)^(kn)"></p><p>By the complementary event, the probability that the bit is <code>1</code> is:</p><p><img src="/2018/02/03/bloom-filter/4.png" alt="1 - (1 - 1/m)^(kn)"></p><p>A query for an element that was never inserted returns a false positive only when all <code>k</code> of its bits are already <code>1</code>, so the false-positive rate is approximately <code>(1 - (1 - 1/m)^(kn))^k ≈ (1 - e^(-kn/m))^k</code>.</p>
<h3><span id="代码实现">Implementation</span></h3><p>For a <code>C</code> implementation, see my <code>Github</code>: <a href="https://github.com/keepalive555/DataStructure/blob/master/bitmap/bloomfilter.c">bloomfilter.c</a></p><h2><span
id="参考资料">References</span></h2><p><a href="http://matthias.vallentin.net/course-work/cs270-s11.pdf">Bloom Filter Paper</a></p><p><a href="https://www.cnblogs.com/liyulong1982/p/6013002.html">Bloom Filter</a></p>]]></content>
<summary type="html">
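<h2><span id="应用场景">Use cases</span></h2><p>In back-end development for internet services, I frequently run into all kinds of <strong><em>whitelist</em></strong> scenarios, such as the following typical ones:</p>
<ol>
<li>Given 100 million user IDs (<code>user_id</code>), how can we quickly …</li></ol>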
</summary>
<category term="数据结构" scheme="https://keepalive555.github.io/categories/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/"/>
<category term="BloomFilter" scheme="https://keepalive555.github.io/tags/BloomFilter/"/>
<category term="白名单" scheme="https://keepalive555.github.io/tags/%E7%99%BD%E5%90%8D%E5%8D%95/"/>
<category term="假阳性" scheme="https://keepalive555.github.io/tags/%E5%81%87%E9%98%B3%E6%80%A7/"/>
</entry>
<entry>
<title>Python list implementation</title>
<link href="https://keepalive555.github.io/2018/01/08/Python-list%E5%AE%9E%E7%8E%B0/"/>
<id>https://keepalive555.github.io/2018/01/08/Python-list实现/</id>
<published>2018-01-08T13:57:35.000Z</published>
<updated>2020-09-23T16:52:40.493Z</updated>
<content type="html"><![CDATA[<h2><span id="前言">Preface</span></h2><p>The <code>Python</code> implementation discussed here is <code>CPython</code>, and readers need to be able to read some <code>C</code>. This post draws on <strong><em>Python Source Code Analysis (《Python源码剖析》)</em></strong> and the <code>Python2.7</code> source code. <code>PyListObject</code> uses sequential (array) storage rather than linked storage, so readers familiar with data structures will find this post easy to follow.</p><h2><span id="介绍">Introduction</span></h2><p><code>PyListObject</code> is <code>Python</code>'s implementation of the <code>List</code> container and works much like <code>vector</code> in the <code>C++ STL</code>. A <code>PyListObject</code> is both a variable-size object and a mutable object (clearly, a <code>List</code> can hold different numbers of elements at different times).</p><p><code>PyListObject</code> is defined as follows:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> {</span></span><br><span class="line">PyObject_VAR_HEAD <span class="comment">/* includes ob_size: the number of elements */</span></span><br><span class="line">PyObject **ob_item; <span class="comment">/* array of pointers to the elements */</span></span><br><span class="line"><span class="keyword">int</span> allocated; <span class="comment">/* capacity of ob_item */</span></span><br><span class="line">} PyListObject;</span><br></pre></td></tr></table></figure>
<p>The <code>ob_size</code> field in <code>PyObject_VAR_HEAD</code> and the <code>allocated</code> field in <code>PyListObject</code> hold the container's current <strong><em>number of elements (size)</em></strong> and its <strong><em>capacity</em></strong>, respectively. <code>ob_item</code> is a pointer to <code>PyObject *</code> (that is, an array of <code>PyObject *</code>): it is the array through which <code>PyListObject</code> implements sequential storage. Slots <code>0</code> to <code>ob_size - 1</code> are in use, and <code>ob_size <= allocated</code> always holds.</p><h2><span id="实现">Implementation</span></h2><h3><span id="1-创建对象">1. Creating a list</span></h3><p>At the C level, lists are created with <code>PyList_New</code>. Below is a simplified version of how <code>Python</code> creates a <code>PyListObject</code>.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MAXFREELISTS 80</span></span><br><span class="line"><span class="keyword">static</span> PyListObject *free_lists[MAXFREELISTS];</span><br><span class="line"><span class="keyword">static</span> <span class="keyword">int</span> num_free_lists = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"><span class="function">PyObject *<span class="title">PyList_New</span><span class="params">(<span class="keyword">int</span> size)</span> </span>{</span><br><span class="line">PyListObject *op;</span><br><span class="line"><span class="keyword">size_t</span> nbytes;</span><br><span class="line"><span class="comment">// Check size * sizeof(PyObject *) for overflow; on overflow, report an allocation failure</span></span><br><span class="line">nbytes = size * <span class="keyword">sizeof</span>(PyObject *);</span><br><span class="line"><span class="keyword">if</span>(nbytes / <span class="keyword">sizeof</span>(PyObject *) != (<span class="keyword">size_t</span>)size) {</span><br><span class="line"><span class="keyword">return</span> PyErr_NoMemory();</span><br><span class="line">}</span><br><span class="line"><span class="comment">// Obtain the PyListObject itself</span></span><br><span class="line"><span class="keyword">if</span>(num_free_lists) {</span><br><span class="line"><span class="comment">// The pool is not empty: reuse a cached list object</span></span><br><span class="line">num_free_lists--;</span><br><span class="line">op = free_lists[num_free_lists];</span><br><span class="line">_Py_NewReference((PyObject *)op);</span><br><span class="line">} <span class="keyword">else</span> {</span><br><span class="line"><span class="comment">// The pool is empty: allocate a new object with the header needed by Python's garbage collector</span></span><br><span class="line">op = PyObject_GC_New(PyListObject, &PyList_Type);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span>(size <= <span class="number">0</span>) {</span><br><span class="line">op->ob_item = <span class="literal">NULL</span>;</span><br><span class="line">} <span class="keyword">else</span> {</span><br><span class="line">op->ob_item = (PyObject **)PyMem_MALLOC(nbytes);</span><br><span class="line"><span class="built_in">memset</span>(op->ob_item, <span class="number">0</span>, nbytes);</span><br><span class="line">}</span><br><span class="line">op->ob_size = size;</span><br><span class="line">op->allocated = size;</span><br><span class="line"><span class="keyword">return</span> (PyObject *)op;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>A list object thus consists of two separately allocated parts: (1) the <code>PyListObject</code> itself and (2) the <code>ob_item</code> array of pointers to the <code>PyObject</code> elements it holds.</p><h3><span id="2-设置元素">2. Setting an element</span></h3><p>As mentioned above, <code>PyListObject</code> uses sequential storage and therefore supports <strong><em>random access</em></strong>. Setting an element of a <code>List</code> by index is implemented by <code>PyList_SetItem</code>.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">PyList_SetItem</span><span class="params">(<span class="keyword">register</span> PyObject *op, <span class="keyword">register</span> <span class="keyword">int</span> i, <span class="keyword">register</span> PyObject *new_item)</span> </span>{</span><br><span class="line"> <span class="comment">// Keep a pointer to the old element so its reference count can be decremented</span></span><br><span class="line"><span class="keyword">register</span> PyObject *olditem;</span><br><span class="line"><span class="keyword">register</span> PyObject **p;</span><br><span class="line"><span class="comment">// Check that the index lies in [0, ob_size)</span></span><br><span class="line"><span class="keyword">if</span>(i < <span class="number">0</span> || i >= ((PyListObject *)op)->ob_size) {</span><br><span class="line"><span class="comment">// Report an index error (IndexError)</span></span><br><span class="line"><span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">}</span><br><span class="line"><span class="comment">// Store the new element in slot i, then release the old one</span></span><br><span class="line">p = ((PyListObject*)op)->ob_item + i;</span><br><span class="line">olditem = *p;</span><br><span class="line">*p = new_item;</span><br><span class="line">Py_XDECREF(olditem);</span><br><span class="line"><span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Note that <code>PyList_SetItem</code> “steals” the reference to <code>new_item</code>: it does not call <code>Py_INCREF</code> on the new element, and only the reference count of the element being replaced is decremented.</p><h3><span id="3-插入元素">3. Inserting an element</span></h3><p>Readers who know sequential storage will expect that inserting a new element requires moving existing elements. <code>PyListObject</code> is no exception, and insertion also involves <strong><em>growing</em></strong> and <strong><em>shrinking</em></strong> <code>PyListObject.ob_item</code> (<code>Redis</code> and many other projects use a similar mechanism).</p>
<figure class="highlight c"><table><tr><td
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">insl</span><span class="params">(PyListObject *self, Py_ssize_t where, PyObject *v)</span></span>;</span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">PyList_Insert</span><span class="params">(PyObject *op, Py_ssize_t where, PyObject *new_item)</span> </span>{</span><br><span class="line"><span class="keyword">return</span> insl((PyListObject *)op, where, new_item);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">insl</span><span class="params">(PyListObject *self, Py_ssize_t where, PyObject *v)</span> </span>{</span><br><span class="line">Py_ssize_t i, n = self->ob_size;</span><br><span class="line">PyObject **items;</span><br><span class="line"><span class="comment">// Resize the list to make room for one more element</span></span><br><span class="line"><span class="keyword">if</span>(list_resize(self, n+<span class="number">1</span>) == <span class="number">-1</span>)</span><br><span class="line"><span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"><span class="comment">// Determine the insertion point</span></span><br><span class="line"><span class="keyword">if</span>(where < <span class="number">0</span>) {</span><br><span class="line"><span class="comment">// Negative index: count from the end</span></span><br><span class="line">where += n;</span><br><span class="line"><span class="keyword">if</span>(where < <span class="number">0</span>)</span><br><span class="line">where = <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"><span class="keyword">if</span>(where > n)</span><br><span class="line">where = n;</span><br><span class="line"><span class="comment">// Insert the element</span></span><br><span class="line">items = self->ob_item;</span><br><span class="line"><span class="comment">// Shift elements one slot to the right, from back to front, to open slot `where`</span></span><br><span class="line"><span class="keyword">for</span>(i = n; --i >= where; )</span><br><span class="line">items[i+<span class="number">1</span>] = items[i];</span><br><span class="line"><span class="comment">// The list now holds a reference to v, so increment its count with Py_INCREF</span></span><br><span class="line">Py_INCREF(v);</span><br><span class="line">items[where] = v;</span><br><span class="line"><span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The function <code>list_resize</code> is the key to <strong><em>growing</em></strong> and <strong><em>shrinking</em></strong> a <code>PyListObject</code>. It is implemented as follows:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">list_resize</span><span class="params">(PyListObject *self, <span class="keyword">int</span> newsize)</span> </span>{</span><br><span class="line">PyObject **items;</span><br><span class="line"><span class="keyword">size_t</span> new_allocated;</span><br><span class="line"><span class="keyword">int</span> allocated = self->allocated;</span><br><span class="line"><span class="comment">// No reallocation needed: allocated/2 <= newsize <= allocated</span></span><br><span class="line"><span class="keyword">if</span>(allocated >= newsize && newsize >= (allocated >> <span class="number">1</span>)) {</span><br><span class="line">self->ob_size = newsize;</span><br><span class="line"><span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"><span class="comment">// Compute the new capacity: newsize, plus about newsize/8, plus 3 or 6</span></span><br><span class="line">new_allocated = (newsize >> <span class="number">3</span>) + (newsize < <span class="number">9</span> ? <span class="number">3</span> : <span class="number">6</span>) + newsize;</span><br><span class="line"><span class="keyword">if</span>(newsize == <span class="number">0</span>)</span><br><span class="line">new_allocated = <span class="number">0</span>;</span><br><span class="line"><span class="comment">// Grow or shrink the element array</span></span><br><span class="line">items = self->ob_item;</span><br><span class="line"><span class="comment">// Ultimately calls C's realloc; on failure items becomes NULL</span></span><br><span class="line">PyMem_RESIZE(items, PyObject *, new_allocated);</span><br><span class="line"><span class="keyword">if</span>(items == <span class="literal">NULL</span>) {</span><br><span class="line">PyErr_NoMemory();</span><br><span class="line"><span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">}</span><br><span class="line">self->ob_item = items;</span><br><span class="line">self->ob_size = newsize;</span><br><span class="line">self->allocated = new_allocated;</span><br><span class="line"><span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>When the new number of elements <code>newsize</code> satisfies <code>allocated/2 <= newsize <= allocated</code>, no <code>realloc</code> is needed. When <code>newsize > allocated</code>, the <code>PyListObject</code> is <strong><em>grown</em></strong>; when <code>newsize < allocated/2</code>, it is <strong><em>shrunk</em></strong>. Because each growth over-allocates by roughly one eighth, repeated appends to an empty list grow its capacity through 0, 4, 8, 16, 25, 35, …, which keeps <code>append</code> amortized <code>O(1)</code>.</p>
<h2><span id="对象池">Object pool</span></h2><p>To avoid the performance cost of creating objects frequently (most performance-sensitive <code>C</code> programs use a similar mechanism), <code>CPython</code> makes extensive use of object pools, and <code>PyListObject</code> is no exception. Readers unfamiliar with this technique can look up the <strong><em>object pool</em></strong> design pattern.</p><p>In the <code>PyList_New</code> code above, the <code>free_lists</code> pointer array is the object pool used when creating <code>PyListObject</code>s: if a cached <code>PyListObject</code> is available, <code>Python</code> takes one from the pool and returns it. So <strong><em>when and how</em></strong> are <code>PyListObject</code>s returned to the pool? The answer lies in <code>list_dealloc</code>, the function that destroys a <code>PyListObject</code>.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> <span class="title">list_dealloc</span><span class="params">(PyListObject *op)</span> </span>{</span><br><span class="line"><span class="keyword">int</span> i;</span><br><span class="line"><span class="keyword">if</span>(op->ob_item != <span class="literal">NULL</span>) {</span><br><span class="line">i = op->ob_size;</span><br><span class="line"><span class="keyword">while</span>(--i >= <span class="number">0</span>) { <span class="comment">// drop the list's reference to each element</span></span><br><span class="line">Py_XDECREF(op->ob_item[i]);</span><br><span class="line">}</span><br><span class="line">PyMem_FREE(op->ob_item);</span><br><span class="line">}</span><br><span class="line"><span class="comment">// Release the PyListObject itself: cache it in the pool if there is room, otherwise free it</span></span><br><span class="line"><span class="keyword">if</span>(num_free_lists < MAXFREELISTS && PyList_CheckExact(op))</span><br><span class="line">free_lists[num_free_lists++] = op;</span><br><span class="line"><span class="keyword">else</span></span><br><span class="line">op->ob_type->tp_free((PyObject *)op);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Only the <code>PyListObject</code> header is recycled: its <code>ob_item</code> array is always released with <code>PyMem_FREE</code>, and at most <code>MAXFREELISTS</code> (80) list objects are kept in the pool.</p>]]></content>
<summary type="html">
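<h2><span id="前言">Preface</span></h2><p>The <code>Python</code> implementation discussed here is <code>CPython</code>, and readers need to be able to read some <code>C</code>. This post draws on <strong><em>Python Source Code Analysis (《Python源码剖析》)</em></strong> …</p>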
</summary>
<category term="编程语言" scheme="https://keepalive555.github.io/categories/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80/"/>
<category term="源码分析" scheme="https://keepalive555.github.io/categories/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/"/>
<category term="Python" scheme="https://keepalive555.github.io/tags/Python/"/>
<category term="list" scheme="https://keepalive555.github.io/tags/list/"/>
</entry>
<entry>
<title>A study of SkipList</title>
<link href="https://keepalive555.github.io/2017/12/29/SkipList/"/>
<id>https://keepalive555.github.io/2017/12/29/SkipList/</id>
<published>2017-12-28T16:37:10.000Z</published>
<updated>2020-09-23T16:51:16.117Z</updated>
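<content type="html"><![CDATA[<h2><span id="吐槽">A rant</span></h2><p>As a less-than-stellar student, every time I go back over the fundamentals from my university courses I see them differently, and I keep wanting to write up what I have learned at work. Search algorithms can, broadly speaking, be grouped into three families (experts may skip this):</p><ul><li>Sequential search</li><li>Binary search (and variants such as interpolation search and Fibonacci search)</li><li>Hash-based search</li></ul><p>Sequential search is plain iteration. When performance matters we need something better (for example, databases use <code>B+ Tree</code> indexes to avoid full table scans). Hash-based search, with <code>O(1)</code> expected time complexity, is a common and widely used approach. The rest of this post takes a critical look at binary search.</p>
<h2><span id="思考">Thinking it through</span></h2><p>Wait, isn't this post about <code>SkipList</code>? Why bring up binary search? Let me explain. In real engineering, algorithms and data structures go hand in hand and depend on each other; an algorithm without a supporting <code>data structure</code> is a castle in the air. So let us try to find a suitable <strong><em>data structure</em></strong> for binary search (or similar ideas).</p><p>We usually judge whether a data structure fits a given application from the four <code>CRUD</code> angles (create, read, update, delete). Data can be stored in two ways: (1) sequential storage and (2) linked storage. With <strong>sequential storage</strong>, the elements of a sorted list sit next to each other in memory and support <strong><em>random access</em></strong> (direct indexing in <code>O(1)</code> time), so binary search can locate an element quickly. But sequential storage handles <code>inserts and deletes</code> poorly: deleting an element requires shifting all later elements forward to fill the gap, and inserting requires shifting them back.</p><p>Since sequential storage does not suit workloads with frequent <code>inserts and deletes</code>, consider <strong><em>linked storage</em></strong>. A <strong><em>linked list</em></strong> handles frequent <code>inserts and deletes</code> well, but it generally supports only <strong><em>sequential access</em></strong> (you must read the first element before the second, and so on), so a traditional linked list cannot be searched with the binary-search idea.</p><p>By building on the <code>binary tree</code>, people invented the <strong><em>binary search tree</em></strong>, a data structure that supports binary search as well as fast <code>insertion and deletion</code>. Is this the perfect structure for binary search? Unfortunately not: in the worst case a binary search tree degenerates into a linked list. So, building on the binary search tree, the <code>AVL</code> balanced tree appeared. When inserting or deleting nodes, an <code>AVL</code> tree performs left and right rotations to stay balanced, which makes <code>inserts and deletes</code> more complex. Other balanced search trees such as the <code>B-Tree</code>, <code>B+ Tree</code>, and <code>red-black tree</code> followed, but they are all fairly complex to implement, and their rebalancing operations are hard to understand.</p><p>That is why the <code>SkipList</code> exists.</p>
<h2><span id="实现">Implementation</span></h2><p>The <code>SkipList</code> implementations I found online (via Baidu) all have flaws of one kind or another. Based on my own understanding of <code>SkipList</code> and some of those implementations, I put together a <code>C</code> version. See my <code>GitHub</code>, source file: <a href="https://github.com/keepalive555/study/blob/master/skiplist/skiplist.c">https://github.com/keepalive555/study/blob/master/skiplist/skiplist.c</a>.</p><p>The <code>random_level</code> function, which picks a random <code>level</code> for each new <code>Node</code> in the <code>SkipList</code> (source below), is taken from the <code>Redis</code> source. <strong><em>This function is the key to the expected <code>O(logN)</code> time of <code>SkipList</code> CRUD operations</em></strong>: a node reaches each additional level with probability <code>P</code> = 0.25, so each level holds roughly a quarter of the nodes of the level below it.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;stdlib.h&gt;</span> <span class="comment">/* POSIX random() */</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MAX_LEVEL 32</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> P 0.25</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">random_level</span><span class="params">(<span class="keyword">void</span>)</span> </span>{ </span><br><span class="line"> <span class="keyword">int</span> level = <span class="number">1</span>; </span><br><span class="line"> <span class="keyword">while</span> ((random() & <span class="number">0xFFFF</span>) < (P * <span class="number">0xFFFF</span>)) <span class="comment">/* add one more level with probability P */</span></span><br><span class="line"> level += <span class="number">1</span>; </span><br><span class="line"> <span class="keyword">return</span> (level < MAX_LEVEL) ? level : MAX_LEVEL; </span><br><span class="line">}</span><br></pre></td></tr></table></figure>]]></content>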
<summary type="html">
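<h2><span id="吐槽">A rant</span></h2><p>As a less-than-stellar student, every time I go back over the fundamentals from my university courses I see them differently, and I keep wanting to write up what I have learned at work. Search algorithms can, broadly speaking, be grouped into three families (experts may skip this):</p>
<ul>
<li>Sequential search</li>
</ul>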
</summary>
<category term="数据结构" scheme="https://keepalive555.github.io/categories/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/"/>
<category term="跳表" scheme="https://keepalive555.github.io/tags/%E8%B7%B3%E8%A1%A8/"/>
<category term="时间复杂度O(nlogn)" scheme="https://keepalive555.github.io/tags/%E6%97%B6%E9%97%B4%E5%A4%8D%E6%9D%82%E5%BA%A6O-nlogn/"/>
<category term="哈希表" scheme="https://keepalive555.github.io/tags/%E5%93%88%E5%B8%8C%E8%A1%A8/"/>
</entry>
<entry>
<title>The Raft protocol</title>
<link href="https://keepalive555.github.io/2017/12/23/raft%E5%8D%8F%E8%AE%AE/"/>
<id>https://keepalive555.github.io/2017/12/23/raft协议/</id>
<published>2017-12-23T13:14:30.000Z</published>
<updated>2017-12-29T16:49:33.000Z</updated>
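<content type="html"><![CDATA[<h2><span id="前言">Preface</span></h2><p>For many beginners, the word "distributed" first suggests something lofty and highly technical. Setting the technical details aside, the history of backend technology follows a broadly applicable pattern: each new technology exists to solve problems that current architectures cannot. Readers who study distributed systems in the abstract easily get lost in the clouds. These are my own study notes. They explain my understanding of the <code>Raft</code> protocol by starting from a concrete problem. The post does not describe how <code>Raft</code> is implemented in detail. It only explains, from a newcomer's problem-solving perspective, what <code>Raft</code> does. Please point out anything incorrect (email: <a href="mailto:[email protected]">[email protected]</a>).</p><h2><span id="思考">Analysis</span></h2><p>Take the classic single-database-instance architecture, which is also typical of many enterprise applications. All business data is stored on one database server, so when that instance crashes, the business suffers. In most enterprise settings, the impact of such a crash stays within a tolerable range. For Internet services, however, even a one-minute outage can be fatal. For example, some time ago Meituan's food-delivery system crashed and the whole service was down for several hours, which drove a large number of users to the rival platform Ele.me.</p><p>Let me try to solve this problem from my own experience. The obvious way to keep the whole system available after the single database instance crashes is redundancy (just as when you take leave from work, someone has to cover for you so the business is not disrupted). We add a second database instance <code>B</code> (calling the original one <code>A</code>). We expect <code>B</code> to take over and keep serving requests once <code>A</code> goes down, <strong><em>so <code>B</code> must hold exactly the same data as <code>A</code></em></strong>. In distributed systems this property is called <strong><em>consistency</em></strong>. <code>Raft</code> is a <strong><em>distributed consensus protocol</em></strong>, and its main goal is to solve exactly this kind of problem.</p><p>Set aside the existing high-availability solutions of <code>MySQL</code>, <code>Redis</code>, <code>Kafka</code> and similar systems, since they make some trade-offs for performance. If we design a highly available storage system from our own requirements, what must we watch out for? Suppose the system uses three nodes, <code>A</code>, <code>B</code> and <code>C</code>, for high availability. How do we keep the data on these three nodes consistent?</p><ul><li>Should consistency be guaranteed by the client or by the server?</li><li>How do we keep the data on <code>A,B,C</code> (or more nodes) consistent?</li></ul><p>Start with the first question. Suppose the client is responsible for consistency. Whenever it writes data to <code>A</code>, it also writes the same data to <code>B</code> and <code>C</code>. To keep <code>A,B,C</code> consistent, it treats a write as successful only if all three nodes accept it. We may then run into the following situations:</p><ul><li><code>B</code> goes offline for a while and then comes back. The client did not keep the data written while <code>B</code> was down, so <code>B</code> is missing that data and no longer matches <code>A</code> and <code>C</code>.</li><li>The client writes to <code>A</code> and <code>C</code> successfully, but the write to <code>B</code> fails. This write should count as a failure, because <code>A</code>, <code>B</code> and <code>C</code> now disagree and there is no other way to bring them back into agreement. We want the whole system to behave like a <strong><em>transaction</em></strong>: either every write succeeds, or they all fail and the changes are rolled back. The client cannot provide such a mechanism.</li></ul><p>In short, <strong><em>leaving data consistency to the client is not viable</em></strong>.</p><p>So we implement the consistency guarantee on the server side. How, then, do we keep the three nodes <code>A,B,C</code> consistent? First consider this: <strong><em>we cannot predict which of <code>A,B,C</code> will unexpectedly go down, so a client should not depend on any single node</em></strong>. In other words, the three nodes should appear to the outside as a single whole, a <code>Cluster</code>. How should a client then write data to the cluster formed by <code>A,B,C</code>? Here are the approaches I came up with:</p><ul><li>All clients write to one particular node of <code>A,B,C</code> (say <code>A</code>), and that node copies the data to the other nodes to keep them consistent.</li><li>Each client writes to the node it is connected to. For example, <code>Client 1</code> connects to <code>A</code> and writes to <code>A</code>, <code>Client 2</code> connects to <code>B</code> and writes to <code>B</code>, and so on.</li></ul><p>Do these two approaches look familiar? The situation resembles concurrent modification of a shared variable in <strong><em>concurrent programming</em></strong>. Experience there tells us that operations on shared state are best executed <strong><em>serially and in order</em></strong>. Likewise, if many clients write to the cluster through many nodes, the nodes must communicate and merge data so that every node ends up with a complete copy. That raises data ordering and many other problems, and it makes the engineering quite complex.<br>(That is not to say it cannot be done; I just have not seen it done this way.)</p><p>Approach 1 is the currently popular way to achieve consistency, and it is the approach <code>Raft</code> adopts. <code>Raft</code> solves the two main problems that approach 1 faces:</p><ul><li>When the cluster starts (or the write node goes offline), how is one node elected as the write node?</li><li>How does the write node communicate with the other nodes and replicate data so that the data stays consistent across all nodes?</li></ul><p>These two problems correspond to the two main functions of <code>Raft</code>:</p><ul><li><code>Leader Election</code></li><li><code>Log Replication</code></li></ul><p>Every part of a distributed system is unreliable, and the real problems are far more complex than what I have described here. With the questions above clearly in mind, however, readers should be able to master <code>Raft</code> quickly when they go on to study the <code>Raft Paper</code>.</p><p>I recommend watching the <code>Raft</code> animation, which is simple, clear and vivid: <a href="http://thesecretlivesofdata.com/raft/">http://thesecretlivesofdata.com/raft/</a></p><h2><span id="参考">References</span></h2><p>[1] <a href="https://raft.github.io/raft.pdf">Raft Paper</a></p>]]></content>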
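The client-side scheme above fails because every node must accept every write. Raft avoids this by committing a log entry once it is stored on a majority of the cluster, so a single lagging or offline node such as `B` does not block writes. The sketch below illustrates only that majority arithmetic, and it rests on several assumptions. `majority_commit_index` is a hypothetical helper. Its input stands in for the leader's per-node `matchIndex` bookkeeping described in the Raft paper. It also deliberately omits Raft's further rule that a leader commits entries this way only if they come from its own current term.

```python
def majority_commit_index(match_index):
    """Return the highest log index stored on a strict majority of nodes.

    match_index holds, for every node in the cluster (leader included),
    the highest log index known to be replicated on that node.
    Hypothetical helper: Raft's current-term check is intentionally omitted.
    """
    if not match_index:
        raise ValueError("cluster must contain at least one node")
    ranked = sorted(match_index, reverse=True)
    # In a descending list of n values, the value at position n // 2 is
    # reached or exceeded by n // 2 + 1 nodes, i.e. a strict majority.
    return ranked[len(ranked) // 2]


# Three nodes A, B, C: A and C hold entry 7, while B lags behind at entry 4.
print(majority_commit_index([7, 4, 7]))  # 7
```

The same rule lets a five-node cluster keep committing with two nodes down. This is why clusters of 2f + 1 nodes are commonly used to tolerate f failures.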
<summary type="html">
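<h2><span id="前言">Preface</span></h2><p>For many beginners, the word "distributed" first suggests something lofty and highly technical. Setting the technical details aside, the history of backend technology follows a broadly applicable pattern: each new technology exists to solve problems that current architectures cannot. Readers who study distributed systems in the abstract easily get lost in the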
</summary>
<category term="algorithm" scheme="https://keepalive555.github.io/tags/algorithm/"/>
</entry>
<entry>
<title>Segment Tree Applications (Work in Progress)</title>
<link href="https://keepalive555.github.io/2017/12/23/%E7%BA%BF%E6%AE%B5%E6%A0%91%E5%BA%94%E7%94%A8/"/>
<id>https://keepalive555.github.io/2017/12/23/线段树应用/</id>
<published>2017-12-22T18:25:15.000Z</published>
<updated>2017-12-25T14:08:09.000Z</updated>
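<content type="html"><![CDATA[<h2><span id="前言">Preface</span></h2><p>After years of study and work, I have gradually come to see the attitude an excellent programmer should have: <strong><em>solve problems elegantly</em></strong>. Put simply, solving a problem is not the end point. An excellent programmer should look for the best solution to it. The way my colleagues at <code>360</code> approach problems has made me feel this ever more strongly.</p><h2><span id="思考">Analysis</span></h2><p>Recently I have been considering the following requirement:</p><blockquote><p>A corporate group has divided its internal network into N subnets (N &gt; 100), and no two of these subnet ranges overlap. A batch of alert data carries the <code>IPv4</code> address of the device that raised each alert. I need to use that <code>IPv4</code> address to classify the alerts by subnet.</p></blockquote><p>This problem can be abstracted as follows:</p><p>Problem Q: the interval [1, 100] is made up of the subintervals [1, 10], [11, 20], [21, 30] … [91, 100]. Given a positive integer N (1 &lt;= N &lt;= 100), find the subinterval that contains N.</p><blockquote><p>Note: an <code>IPv4</code> address is a 32-bit unsigned integer, so each subnet can be written as an interval. For example, the subnet <code>10.95.12.0/24</code> covers the addresses <code>10.95.12.0</code> ~ <code>10.95.12.255</code> (i.e. <code>0xa5f0c00</code> ~ <code>0xa5f0cff</code>), and the other subnets work the same way. The requirement is therefore an instance of Problem Q.</p></blockquote><p>Let us try to solve Problem Q. The first idea that comes to mind, and the simplest to implement, is to <strong><em>iterate over</em></strong> all the subintervals [1, 10], [11, 20] … [91, 100]. We compare N with each subinterval's left and right endpoints until we find the one that contains N. This method is clearly simple and straightforward. Its time complexity is O(n), where n is the number of subintervals.</p><p>The more subintervals there are, the more each scan costs, which is clearly unacceptable when processing massive amounts of data. The subintervals [1, 10], [11, 20], [21, 30] … [91, 100] are ordered and do not overlap, so binary search and binary search trees come to mind naturally. The difference is that we usually search for a single key, whereas here we search for a range, namely a subinterval. This brings us to today's protagonist: the segment tree (also called an interval tree).</p><h2><span id="定义">Definition</span></h2>]]></content>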
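Before we get to the segment tree itself, note that Problem Q with sorted, non-overlapping intervals can already be answered in O(log n) per query by binary search over the interval start points. The sketch below uses that swapped-in technique, not a segment tree. It relies on Python's standard `bisect` and `ipaddress` modules. `build_index` and `find_network` are hypothetical names, and the subnets are made-up examples.

```python
import bisect
import ipaddress


def build_index(networks):
    """Sort non-overlapping CIDR blocks by their first address."""
    nets = sorted((ipaddress.ip_network(n) for n in networks),
                  key=lambda net: int(net.network_address))
    starts = [int(net.network_address) for net in nets]   # left endpoints
    ends = [int(net.broadcast_address) for net in nets]   # right endpoints
    return nets, starts, ends


def find_network(index, ip):
    """Return the block that contains ip, or None, in O(log n) per lookup."""
    nets, starts, ends = index
    n = int(ipaddress.ip_address(ip))  # an IPv4 address as a 32-bit integer
    # Position of the rightmost block whose first address is not above n.
    i = bisect.bisect_right(starts, n) - 1
    if i >= 0 and ends[i] >= n:
        return nets[i]
    return None


index = build_index(["10.95.12.0/24", "10.95.13.0/24", "192.168.0.0/16"])
print(find_network(index, "10.95.12.77"))  # 10.95.12.0/24
print(find_network(index, "10.95.14.1"))   # None
print(hex(int(ipaddress.ip_address("10.95.12.255"))))  # 0xa5f0cff
```

The lookup relies on the subnets never overlapping, which the requirement guarantees. If two blocks overlapped, the block with the nearest start point would not necessarily be the right answer.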
<summary type="html">
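<h2><span id="前言">Preface</span></h2><p>After years of study and work, I have gradually come to see the attitude an excellent programmer should have: <strong><em>solve problems elegantly</em></strong>. Put simply, solving a problem is not the end point. An excellent programmer should look for the best solution to it. <code>360</co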
</summary>
<category term="algorithms" scheme="https://keepalive555.github.io/tags/algorithms/"/>
</entry>
<entry>
<title>Tornado IOLoop Illustrated</title>
<link href="https://keepalive555.github.io/2017/12/22/IOLoop%E7%AE%80%E8%BF%B0/"/>
<id>https://keepalive555.github.io/2017/12/22/IOLoop简述/</id>
<published>2017-12-22T05:26:57.000Z</published>
<updated>2017-12-29T17:06:39.000Z</updated>
<content type="html"><![CDATA[<h1><span id="tornado-ioloop简述">Tornado IOLoop Overview</span></h1><h2><span id="前言">Preface</span></h2><p>I believe in the philosophy of "first read a book from thin to thick, then from thick to thin". The <code>Tornado</code> source code reads like a literary work. It gathers the collective wisdom of many excellent <code>Python</code> engineers, and its clever ideas keep drawing applause. A good book feels different every time you read it, and so does good code. I decided to record my understanding of <code>Tornado</code> in diagrams so that I can revisit and savor it later in my work and study.</p><p>I recommend a powerful online diagramming tool: <a href="https://www.draw.io/">https://www.draw.io/</a>. Readers who want the professional results of <code>Visio</code> without its bulk will find it good enough to make them <code>Cry</code>.</p><h2><span id="ioloop图示">IOLoop Diagram</span></h2><p><strong><em>Note:</em></strong> readers unfamiliar with <code>IO</code> multiplexing may want to first read up on <strong><em>blocking/non-blocking, synchronous/asynchronous, select, epoll</em></strong> and related concepts.</p><p>I have noticed that event loops in every language and runtime (e.g. <code>libev</code>, <code>nodejs</code>, …) share the same core <code>Feature</code>s. Only the implementation and the level of abstraction differ. I summarize these core features as follows:</p><ul><li>File <code>IO</code> events (e.g. <code>READ</code>, <code>WRITE</code> and <code>HUP</code> events on a <code>socket</code> or <code>pipe</code>, …)</li><li>System signals (e.g. <code>SIGINT</code>, <code>SIGHUP</code>, …)</li><li>Timers</li></ul><p>The <code>Tornado IOLoop</code> implementation is no exception, as the figure below shows:</p><p><img src="/2017/12/22/IOLoop%E7%AE%80%E8%BF%B0/ioloop.jpg" alt="IOLoop diagram"></p><h2><span id="ioloop代码随笔">IOLoop Code Notes</span></h2><p>Below are my annotations on the source of <code>IOLoop</code>'s core method <code>start</code>. Read them together with the diagram to deepen your understanding.</p><figure class="highlight py"><table><tr><td class="code"><pre><span
class="line">def start(self):</span><br><span
class="line">    if self._running:</span><br><span
class="line">        # The IOLoop is already running.</span><br><span
class="line">        raise RuntimeError("IOLoop is already running")</span><br><span
class="line">    self._setup_logging()  # Set up the logging module.</span><br><span
class="line">    if self._stopped:</span><br><span
class="line">        self._stopped = False  # stop() was called before start(): clear the flag and return at once.</span><br><span
class="line">        return</span><br><span
class="line">    # Remember the IOLoop instance currently registered for this thread.</span><br><span
class="line">    old_current = getattr(IOLoop._current, "instance", None)</span><br><span
class="line">    IOLoop._current.instance = self  # Make self the current IOLoop instance.</span><br><span
class="line">    self._thread_ident = thread.get_ident()  # Id of the thread running the loop.</span><br><span
class="line">    self._running = True</span><br><span
class="line"></span><br><span
class="line">    # signal.set_wakeup_fd closes a race condition in event loops:</span><br><span
class="line">    # a signal may arrive at the beginning of select/poll/etc</span><br><span
class="line">    # before it goes into its interruptible sleep, so the signal</span><br><span
class="line">    # will be consumed without waking the select. The solution is</span><br><span
class="line">    # for the (C, synchronous) signal handler to write to a pipe,</span><br><span
class="line">    # which will then be seen by select.</span><br><span
class="line">    #</span><br><span
class="line">    # In python's signal handling semantics, this only matters on the</span><br><span
class="line">    # main thread (fortunately, set_wakeup_fd only works on the main</span><br><span
class="line">    # thread and will raise a ValueError otherwise).</span><br><span
class="line">    #</span><br><span
class="line">    # If someone has already set a wakeup fd, we don't want to</span><br><span
class="line">    # disturb it. This is an issue for twisted, which does its</span><br><span
class="line">    # SIGCHLD processing in response to its own wakeup fd being</span><br><span
class="line">    # written to. As long as the wakeup fd is registered on the IOLoop,</span><br><span
class="line">    # the loop will still wake up and everything should work.</span><br><span
class="line"></span><br><span
class="line">    # The wakeup fd wakes up the event loop (on a signal, or from another thread).</span><br><span
class="line">    old_wakeup_fd = None</span><br><span
class="line">    if hasattr(signal, 'set_wakeup_fd') and os.name == 'posix':</span><br><span
class="line">        # requires python 2.6+, unix. set_wakeup_fd exists but crashes</span><br><span
class="line">        # the python process on windows.</span><br><span
class="line"></span><br><span
class="line">        # On Python 2.6+ and Unix-like systems the signal module provides set_wakeup_fd;</span><br><span
class="line">        # on Windows the function exists but crashes the process.</span><br><span
class="line">        try:</span><br><span
class="line">            old_wakeup_fd = signal.set_wakeup_fd(self._waker.write_fileno())</span><br><span
class="line">            if old_wakeup_fd != -1:</span><br><span
class="line">                # Already set, restore previous value. This is a little racy,</span><br><span
class="line">                # but there's no clean get_wakeup_fd and in real use the</span><br><span
class="line">                # IOLoop is just started once at the beginning.</span><br><span
class="line">                signal.set_wakeup_fd(old_wakeup_fd)</span><br><span
class="line">                old_wakeup_fd = None</span><br><span
class="line">        except ValueError:</span><br><span
class="line">            # Non-main thread, or the previous value of wakeup_fd</span><br><span
class="line">            # is no longer valid.</span><br><span
class="line">            # According to the signal module docs, set_wakeup_fd can only be called from the main thread.</span><br><span
class="line">            old_wakeup_fd = None</span><br><span
class="line"></span><br><span
class="line">    try:</span><br><span
class="line">        while True:</span><br><span
class="line">            # Prevent IO event starvation by delaying new callbacks</span><br><span
class="line">            # to the next iteration of the event loop.</span><br><span
class="line">            # ncallbacks is the number of callbacks run in this iteration;</span><br><span
class="line">            # callbacks added in the meantime run in the next iteration.</span><br><span
class="line">            ncallbacks = len(self._callbacks)</span><br><span
class="line"></span><br><span
class="line">            # Add any timeouts that have come due to the callback list.</span><br><span
class="line">            # Do not run anything until we have determined which ones</span><br><span
class="line">            # are ready, so timeouts that call add_timeout cannot</span><br><span
class="line">            # schedule anything in this iteration.</span><br><span
class="line">            due_timeouts = []  # Timeouts whose deadline has already passed.</span><br><span
class="line">            if self._timeouts:</span><br><span
class="line">                now = self.time()</span><br><span
class="line">                while self._timeouts:</span><br><span
class="line">                    if self._timeouts[0].callback is None:</span><br><span
class="line">                        # The timeout was cancelled. Note that the</span><br><span
class="line">                        # cancellation check is repeated below for timeouts</span><br><span
class="line">                        # that are cancelled by another timeout or callback.</span><br><span
class="line">                        heapq.heappop(self._timeouts)</span><br><span
class="line">                        self._cancellations -= 1</span><br><span
class="line">                    elif self._timeouts[0].deadline &lt;= now:</span><br><span
class="line">                        due_timeouts.append(heapq.heappop(self._timeouts))</span><br><span
class="line">                    else:</span><br><span
class="line">                        break</span><br><span
class="line">                if (self._cancellations &gt; 512 and</span><br><span
class="line">                        self._cancellations &gt; (len(self._timeouts) &gt;&gt; 1)):</span><br><span
class="line">                    # Clean up the timeout queue when it gets large and it's</span><br><span
class="line">                    # more than half cancellations.</span><br><span
class="line">                    # More than 512 cancelled timeouts making up over half the heap:</span><br><span
class="line">                    # drop them from self._timeouts and rebuild the heap.</span><br><span
class="line">                    self._cancellations = 0</span><br><span
class="line">                    self._timeouts = [x for x in self._timeouts</span><br><span
class="line">                                      if x.callback is not None]</span><br><span
class="line">                    heapq.heapify(self._timeouts)</span><br><span
class="line"></span><br><span
class="line">            for i in range(ncallbacks):</span><br><span
class="line">                # Run the callbacks.</span><br><span
class="line">                self._run_callback(self._callbacks.popleft())</span><br><span
class="line">            for timeout in due_timeouts:</span><br><span
class="line">                # Run the due timeouts.</span><br><span
class="line">                if timeout.callback is not None:</span><br><span
class="line">                    self._run_callback(timeout.callback)</span><br><span
class="line">            # Closures may be holding on to a lot of memory, so allow</span><br><span
class="line">            # them to be freed before we go into our poll wait.</span><br><span
class="line">            due_timeouts = timeout = None  # Drop the references so the memory can be freed.</span><br><span
class="line"></span><br><span
class="line">            if self._callbacks:</span><br><span
class="line">                # If any callbacks or timeouts called add_callback,</span><br><span
class="line">                # we don't want to wait in poll() before we run them.</span><br><span
class="line">                # New callbacks were added while running callbacks or timeouts.</span><br><span
class="line">                poll_timeout = 0.0</span><br><span
class="line">            elif self._timeouts:</span><br><span
class="line">                # If there are any timeouts, schedule the first one.</span><br><span
class="line">                # Use self.time() instead of 'now' to account for time</span><br><span
class="line">                # spent running callbacks.</span><br><span
class="line">                poll_timeout = self._timeouts[0].deadline - self.time()  # Wait until the nearest timeout is due.</span><br><span
class="line">                poll_timeout = max(0, min(poll_timeout, _POLL_TIMEOUT))</span><br><span
class="line">            else:</span><br><span
class="line">                # No timeouts and no callbacks, so use the default.</span><br><span
class="line">                # Nothing scheduled: wait in poll for IO events for up to _POLL_TIMEOUT (3600 seconds).</span><br><span
class="line">                poll_timeout = _POLL_TIMEOUT</span><br><span
class="line"></span><br><span
class="line">            if not self._running:</span><br><span
class="line">                # A callback called stop(): leave the event loop.</span><br><span
class="line">                break</span><br><span
class="line"></span><br><span
class="line">            if self._blocking_signal_threshold is not None:</span><br><span
class="line">                # clear alarm so it doesn't fire while poll is waiting for</span><br><span
class="line">                # events.</span><br><span
class="line">                signal.setitimer(signal.ITIMER_REAL, 0, 0)</span><br><span
class="line"></span><br><span
class="line">            try:</span><br><span
class="line">                # Wait for IO events; event_pairs looks like [(fd, events), (fd, events), ...].</span><br><span
class="line">                event_pairs = self._impl.poll(poll_timeout)</span><br><span
class="line">            except Exception as e:</span><br><span
class="line">                # Depending on python version and IOLoop implementation,</span><br><span
class="line">                # different exception types may be thrown and there are</span><br><span
class="line">                # two ways EINTR might be signaled:</span><br><span
class="line">                # * e.errno == errno.EINTR</span><br><span
class="line">                # * e.args is like (errno.EINTR, 'Interrupted system call')</span><br><span
class="line"></span><br><span
class="line">                # A signal caught while the process is blocked in poll ends the wait with errno EINTR.</span><br><span
class="line">                if errno_from_exception(e) == errno.EINTR:</span><br><span
class="line">                    continue</span><br><span
class="line">                else:</span><br><span
class="line">                    raise</span><br><span
class="line"></span><br><span
class="line">            if self._blocking_signal_threshold is not None:</span><br><span
class="line">                signal.setitimer(signal.ITIMER_REAL,</span><br><span
class="line">                                 self._blocking_signal_threshold, 0)</span><br><span
class="line"></span><br><span
class="line">            # Pop one fd at a time from the set of pending fds and run</span><br><span
class="line">            # its handler. Since that handler may perform actions on</span><br><span
class="line">            # other file descriptors, there may be reentrant calls to</span><br><span
class="line">            # this IOLoop that modify self._events</span><br><span
class="line">            self._events.update(event_pairs)</span><br><span
class="line">            while self._events:</span><br><span
class="line">                fd, events = self._events.popitem()</span><br><span
class="line">                try:</span><br><span
class="line">                    # Look up the file-like object and its IO event handler.</span><br><span
class="line">                    fd_obj, handler_func = self._handlers[fd]</span><br><span
class="line">                    # Call the handler for the events that occurred on fd_obj;</span><br><span
class="line">                    # add_handler stored a wrapped version of the handler.</span><br><span
class="line">                    handler_func(fd_obj, events)</span><br><span
class="line">                except (OSError, IOError) as e:</span><br><span
class="line">                    if errno_from_exception(e) == errno.EPIPE:</span><br><span
class="line">                        # Happens when the client closes the connection</span><br><span
class="line">                        pass</span><br><span
class="line">                    else:</span><br><span
class="line">                        # Handle the exception.</span><br><span
class="line">                        self.handle_callback_exception(self._handlers.get(fd))</span><br><span
class="line">                except Exception:</span><br><span
class="line">                    self.handle_callback_exception(self._handlers.get(fd))</span><br><span
class="line">            fd_obj = handler_func = None</span><br><span
class="line"></span><br><span
class="line">    finally:</span><br><span
class="line">        # reset the stopped flag so another start/stop pair can be issued</span><br><span
class="line">        self._stopped = False</span><br><span
class="line">        if self._blocking_signal_threshold is not None:</span><br><span
class="line">            signal.setitimer(signal.ITIMER_REAL, 0, 0)</span><br><span
class="line">        # Restore the previous IOLoop instance (per the Tornado authors, this case hardly ever happens).</span><br><span
class="line">        IOLoop._current.instance = old_current</span><br><span
class="line">        if old_wakeup_fd is not None:</span><br><span
class="line">            signal.set_wakeup_fd(old_wakeup_fd)</span><br></pre></td></tr></table></figure>]]></content>
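The core features listed above can be seen in isolation in a toy loop built on Python 3's standard `selectors`, `heapq` and `time` modules. The sketch covers IO readiness, timers and callbacks, but not signals. It follows the shape of `start()`: run only the callbacks queued before this iteration, pop the due timers from a heap, work out the poll timeout, then dispatch the ready file objects. It is a hypothetical sketch, not Tornado's API. `MiniLoop` is a made-up class, its method names only echo Tornado's, and it omits thread wake-ups and all error handling.

```python
import heapq
import itertools
import selectors
import time

_POLL_TIMEOUT = 3600.0  # default wait when nothing is scheduled (same value as noted above)


class MiniLoop:
    """Toy event loop: a callback queue, a timer heap, and fd readiness via selectors."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._callbacks = []
        self._timeouts = []            # heap of (deadline, seq, callback)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared
        self._running = False

    def add_callback(self, callback):
        self._callbacks.append(callback)

    def call_later(self, delay, callback):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timeouts, (deadline, next(self._seq), callback))

    def add_reader(self, fileobj, handler):
        # handler(fileobj) is called whenever fileobj becomes readable.
        self._selector.register(fileobj, selectors.EVENT_READ, handler)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while True:
            # 1. Take only the callbacks queued so far; new ones wait one turn.
            callbacks, self._callbacks = self._callbacks, []
            # 2. Pop every timer whose deadline has passed.
            now = time.monotonic()
            due = []
            while self._timeouts and now >= self._timeouts[0][0]:
                due.append(heapq.heappop(self._timeouts)[2])
            for callback in callbacks + due:
                callback()
            # 3. Decide how long the poll may block.
            if self._callbacks:
                poll_timeout = 0.0
            elif self._timeouts:
                poll_timeout = max(0.0, self._timeouts[0][0] - time.monotonic())
                poll_timeout = min(poll_timeout, _POLL_TIMEOUT)
            else:
                poll_timeout = _POLL_TIMEOUT
            if not self._running:
                break
            # 4. Wait for IO and hand each ready file object to its handler.
            for key, _events in self._selector.select(poll_timeout):
                key.data(key.fileobj)
```

For example, register one end of a `socket.socketpair()` with `add_reader`. Then schedule a write to the other end with `call_later`, and have the reader call `stop()`. The callback runs first, the timer fires on a later iteration, and the readable event ends the loop.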
<summary type="html">
<h1><span id="tornado-ioloop简述">Tornado IOLoop Overview</span></h1><h2><span id="前言">Preface</span></h2><p>I believe in the philosophy of "first read a book from thin to thick, then from thick to thin". The <code>Tornado</co
</summary>
<category term="tornado" scheme="https://keepalive555.github.io/tags/tornado/"/>
</entry>
<entry>
<title>Python Memory Reclamation</title>
<link href="https://keepalive555.github.io/2017/11/13/Python%E5%86%85%E5%AD%98%E5%9B%9E%E6%94%B6%E6%9C%BA%E5%88%B6/"/>
<id>https://keepalive555.github.io/2017/11/13/Python内存回收机制/</id>
<published>2017-11-13T15:28:54.000Z</published>
<updated>2020-09-23T16:53:36.519Z</updated>
<content type="html"><![CDATA[<pre><code>… Passing an object as the argument to `sys.getrefcount` adds one temporary reference to that object's count.</code></pre>]]></content>
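The note about `sys.getrefcount` is easy to check in an interpreter. The call itself holds a temporary reference, so absolute values are inflated, and they also vary between CPython versions. For that reason, the sketch below compares counts before and after adding a reference instead of asserting an absolute number. It is written for Python 3, whereas this post discusses Python 2.7, and `refcount_demo` is a hypothetical name.

```python
import sys


def refcount_demo():
    obj = object()
    before = sys.getrefcount(obj)   # already includes the call's temporary reference
    alias = obj                     # bind a second name: one more strong reference
    with_alias = sys.getrefcount(obj)
    del alias                       # drop that reference again
    after_del = sys.getrefcount(obj)
    return before, with_alias, after_del


before, with_alias, after_del = refcount_demo()
print(with_alias - before, after_del - before)  # 1 0
```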
<summary type="html">
<h1><span id="python27内存回收机制一">Python 2.7 Memory Reclamation (Part 1)</span></h1><h2><span id="写在前面">Foreword</span></h2><p><code>Python</code>'s memory reclamation uses the <em>reference counting</em> mechanism
</summary>
<category term="编程语言" scheme="https://keepalive555.github.io/categories/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80/"/>
<category term="源码分析" scheme="https://keepalive555.github.io/categories/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/"/>
<category term="Python" scheme="https://keepalive555.github.io/tags/Python/"/>
<category term="内存回收" scheme="https://keepalive555.github.io/tags/%E5%86%85%E5%AD%98%E5%9B%9E%E6%94%B6/"/>
</entry>
<entry>
<title>Huffman Coding</title>
<link href="https://keepalive555.github.io/2017/08/31/Huffman%E7%BC%96%E7%A0%81/"/>
<id>https://keepalive555.github.io/2017/08/31/Huffman编码/</id>
<published>2017-08-30T16:45:00.000Z</published>
<updated>2017-08-31T15:53:15.000Z</updated>
<content type="html"><![CDATA[<h1><span id="huffman编码">Huffman Coding</span></h1><p><strong>Note:</strong> <a href="">This is an original technical blog post; please credit the original URL when reposting</a>.</p><h2><span id="huffman编码简介">Introduction to Huffman Coding</span></h2><p>I still remember first encountering <code>Huffman</code> coding in the "Computer Organization" course during my freshman year. The teacher used <code>Huffman</code> coding to build a variable-length instruction set for a <code>CPU</code> (the same idea applies to virtual-machine bytecode). It seemed almost magical at the time. Later, in the "Data Structures" course, I met the <code>Huffman Tree</code>, and only then did I gain a fairly complete understanding of <code>Huffman</code> coding.</p><figure class="highlight python"><table><tr><td class="code"><pre><span
class="line">#!/usr/bin/env python</span><br><span
class="line"># -*- coding: utf-8 -*-</span><br><span
class="line"></span><br><span
class="line">"""Huffman coding implemented in Python."""</span><br><span
class="line"></span><br><span
class="line">import heapq</span><br><span
class="line">from collections import Counter</span><br><span
class="line"></span><br><span
class="line"></span><br><span
class="line">class Node(object):</span><br><span
class="line"></span><br><span
class="line">    def __init__(self, score, value=None, lchild=None, rchild=None):</span><br><span
class="line">        self.score = score    # Weight: total frequency of the symbols in this subtree.</span><br><span
class="line">        self.value = value    # Symbol for a leaf; None for internal nodes.</span><br><span
class="line">        self.lchild = lchild</span><br><span
class="line">        self.rchild = rchild</span><br><span
class="line"></span><br><span
class="line"></span><br><span
class="line">def generate_huffman_tree(sequence):</span><br><span
class="line">    """Build a Huffman tree by repeatedly merging the two lightest nodes."""</span><br><span
class="line">    char2frequency = Counter(sequence)</span><br><span
class="line">    # Heap entries are (score, order, node). The order field breaks ties</span><br><span
class="line">    # deterministically (first appearance in the input) and keeps heapq</span><br><span
class="line">    # from ever comparing two Node objects.</span><br><span
class="line">    queue = [(char2frequency[c], order, Node(char2frequency[c], c))</span><br><span
class="line">             for order, c in enumerate(sorted(char2frequency, key=sequence.index))]</span><br><span
class="line">    heapq.heapify(queue)</span><br><span
class="line">    order = len(queue)</span><br><span
class="line">    while len(queue) &gt; 1:</span><br><span
class="line">        _, _, lchild = heapq.heappop(queue)  # lightest node</span><br><span
class="line">        _, _, rchild = heapq.heappop(queue)  # second-lightest node</span><br><span
class="line">        parent_node = Node(lchild.score + rchild.score, lchild=lchild, rchild=rchild)</span><br><span
class="line">        heapq.heappush(queue, (parent_node.score, order, parent_node))</span><br><span
class="line">        order += 1</span><br><span
class="line">    return queue[0][2]</span><br><span
class="line"></span><br><span
class="line"></span><br><span
class="line">def generate_check_list(node):</span><br><span
class="line">    """Map each symbol to its code string: left edge = '0', right edge = '1'."""</span><br><span
class="line">    check_list = dict()</span><br><span
class="line">    stack = [(node, '')]</span><br><span
class="line">    while stack:</span><br><span
class="line">        node, code = stack.pop()</span><br><span
class="line">        if node.value is not None:</span><br><span
class="line">            # Leaf. Codes stay strings so leading zeros are kept;</span><br><span
class="line">            # an input with a single distinct symbol gets the code '0'.</span><br><span
class="line">            check_list[node.value] = code or '0'</span><br><span
class="line">        else:</span><br><span
class="line">            stack.append((node.rchild, code + '1'))</span><br><span
class="line">            stack.append((node.lchild, code + '0'))</span><br><span
class="line">    return check_list</span><br><span
class="line"></span><br><span
class="line"></span><br><span
class="line">if __name__ == '__main__':</span><br><span
class="line">    tree = generate_huffman_tree('shootsheetjobwork')</span><br><span
class="line">    check_list = generate_check_list(tree)</span><br><span
class="line">    print(check_list)</span><br></pre></td></tr></table></figure>]]></content>
<summary type="html">
<h1><span id="huffman编码">Huffman Coding</span></h1><p><strong>Note:</strong> <a href="">This is an original technical blog post; please credit the original URL when reposting</a>.</p>
<h2><span id="huffman编码简介">Huffma
</summary>
<category term="Alg" scheme="https://keepalive555.github.io/categories/Alg/"/>
<category term="Alg" scheme="https://keepalive555.github.io/tags/Alg/"/>
</entry>
<entry>
<title>MySQL Prefix Indexes</title>
<link href="https://keepalive555.github.io/2017/07/31/MySQL%E5%89%8D%E7%BC%80%E7%B4%A2%E5%BC%95/"/>
<id>https://keepalive555.github.io/2017/07/31/MySQL前缀索引/</id>
<published>2017-07-31T14:42:18.000Z</published>
<updated>2020-09-23T16:54:36.948Z</updated>
<content type="html"><![CDATA[<h2><span id="mysql前缀索引">MySQL Prefix Indexes</span></h2><p><strong>Note:</strong> All optimizations in this article apply only to the InnoDB storage engine.</p><p>MySQL indexes can speed up queries, but more indexes are not always better. An index speeds up data lookups, but the index file itself takes up storage space, and every insert, delete, and update must keep the index in sync. A rule of thumb from experienced front-line DBAs is that indexing roughly 30% of a table's columns is reasonable.</p><h4><span id="查看表存在的索引">List the indexes on a table</span></h4><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">show</span> <span class="keyword">index</span> <span class="keyword">from</span> <span class="string">`table`</span>;</span><br></pre></td></tr></table></figure>
<h4><span id="查看索引占用磁盘空间大小">Check how much disk space the indexes use</span></h4><blockquote><p>MySQL <code>information_schema</code> database, <code>TABLES</code> table: <a href="https://dev.mysql.com/doc/refman/5.7/en/tables-table.html">official manual</a></p></blockquote><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="keyword">concat</span>(<span class="keyword">round</span>(<span class="keyword">sum</span>(INDEX_LENGTH/(<span class="number">1024</span>*<span class="number">1024</span>)), <span class="number">2</span>), <span class="string">'MB'</span>) <span class="keyword">as</span> <span class="string">'Total Index Size'</span> <span class="keyword">from</span> <span class="string">`information_schema`</span>.<span class="string">`TABLES`</span> <span class="keyword">where</span> <span class="string">`table_schema`</span> = <span class="string">'order'</span> <span class="keyword">and</span> <span class="string">`table_name`</span> = <span class="string">'order_base'</span>;</span><br></pre></td></tr></table></figure>
<p><strong>Note:</strong> an index is not recommended in the following two cases.</p><ul><li>The table has few rows. The leaf nodes of an InnoDB secondary index store the row's primary key value, not its physical address. A lookup through a secondary index therefore usually needs a second lookup in the primary key index, unless the index covers every column the query needs. On a small table this extra work can make the index slower than a plain full table scan. The exact cutoff depends on your workload.</li><li>The index has low selectivity. Index selectivity is the ratio of distinct index values to the total number of rows, so it ranges over (0, 1]. The higher the selectivity, the better. Intuitively, a highly selective index rules out more rows per lookup, so fewer table rows have to be read and the index does more useful work. For example, to evaluate an index on column <code>column</code> of table <code>table</code>, compute its selectivity with the following <code>SQL</code> statement:</li></ul><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="keyword">count</span>(<span class="keyword">distinct</span> <span class="string">`column`</span>) / <span class="keyword">count</span>(*) <span class="keyword">from</span> <span class="string">`table`</span>;</span><br></pre></td></tr></table></figure>
<h4><span id="前缀索引">Prefix Indexes</span></h4><p>A prefix index is an index optimization technique closely tied to selectivity. As the name suggests, it uses a prefix of the column as the key instead of the whole column. With a well-chosen prefix length, the prefix index's selectivity stays close to that of a full-column index. At the same time, the shorter keys shrink the index file and reduce maintenance overhead.</p><p>Consider the following business scenario. A base table <code>order</code> stores order information, and order numbers are long digit strings (for example, 61700123215824). The table structure is as follows:</p><table><thead><tr><th>id</th><th>order_id</th></tr></thead><tbody><tr><td>1</td><td>61700123215824</td></tr><tr><td>2</td><td>61500280698102</td></tr><tr><td>3</td><td>61500280756582</td></tr></tbody></table>
<p>First, compute the selectivity of a full-column index:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="keyword">round</span>(<span class="keyword">count</span>(<span class="keyword">distinct</span> <span class="string">`order_id`</span>) / <span class="keyword">count</span>(*), <span class="number">2</span>) <span class="keyword">from</span> <span class="string">`order`</span>;</span><br></pre></td></tr></table></figure><p>The result is 1.00. Now try a prefix length of 5 and check the selectivity:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="keyword">round</span>(<span class="keyword">count</span>(<span class="keyword">distinct</span> <span 
class="keyword">left</span>(<span class="string">`order_id`</span>, <span class="number">5</span>)) / <span class="keyword">count</span>(*), <span class="number">2</span>) <span class="keyword">from</span> <span class="string">`order`</span>;</span><br></pre></td></tr></table></figure><p>The result is 0.00, so a 5-character prefix clearly won't do. Increase it to 10 and check the selectivity again:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="keyword">round</span>(<span class="keyword">count</span>(<span class="keyword">distinct</span> <span class="keyword">left</span>(<span class="string">`order_id`</span>, <span class="number">10</span>)) / <span class="keyword">count</span>(*), <span class="number">2</span>) <span class="keyword">from</span> <span class="string">`order`</span>;</span><br></pre></td></tr></table></figure>
<p>The result is 0.63, which is only barely acceptable. Keep increasing the length in steps of 5 and check the selectivity:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="keyword">round</span>(<span class="keyword">count</span>(<span class="keyword">distinct</span> <span class="keyword">left</span>(<span class="string">`order_id`</span>, <span class="number">15</span>)) / <span class="keyword">count</span>(*), <span class="number">2</span>) <span class="keyword">from</span> <span class="string">`order`</span>;</span><br></pre></td></tr></table></figure>
<p>Because <code>round</code> rounds to two decimal places, the result is 1.00. Can the prefix be shorter? I tested on our company's production system: a prefix length of 11 gives a selectivity of 0.9999, which also rounds to 1.00. Narrowing the range from both sides (😆 which quietly reminded me of the squeeze theorem from calculus), I found the best prefix length to be 11. Next, create the prefix index:</p><blockquote><p>Create the prefix index</p></blockquote><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">alter</span> <span class="keyword">table</span> <span class="string">`order`</span> <span 
class="keyword">add</span> <span class="keyword">index</span> <span class="string">`idx_order_id`</span>(<span class="string">`order_id`</span>(<span class="number">11</span>));</span><br></pre></td></tr></table></figure><p><strong>Note:</strong> drawbacks of prefix indexes</p><ul><li>A prefix index cannot be used for <code>ORDER BY</code> or <code>GROUP BY</code> operations.</li><li>A prefix index cannot serve as a covering index. It does not hold the full value of the <code>order_id</code> column, so MySQL must still read the row itself to get that value.</li></ul>]]></content>
<summary type="html">
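<h2><span id="mysql前缀索引">MySQL Prefix Indexes</span></h2><p><strong>Note:</strong> All optimizations in this article apply only to the InnoDB storage engine.</p>
<p>MySQL indexes can speed up queries, but more indexes are not always better. An index speeds up data lookups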
</summary>
<category term="MySQL" scheme="https://keepalive555.github.io/categories/MySQL/"/>
<category term="MySQL" scheme="https://keepalive555.github.io/tags/MySQL/"/>
<category term="索引优化" scheme="https://keepalive555.github.io/tags/%E7%B4%A2%E5%BC%95%E4%BC%98%E5%8C%96/"/>
</entry>
<entry>
<title>Efficient Shell</title>
<link href="https://keepalive555.github.io/2017/07/06/%E9%AB%98%E6%95%88%E7%8E%87Shell/"/>
<id>https://keepalive555.github.io/2017/07/06/高效率Shell/</id>
<published>2017-07-06T05:52:12.000Z</published>
<updated>2017-08-05T08:12:57.000Z</updated>
<content type="html"><![CDATA[<p>Quickly merge the current development branch into the test branch. The test branch comes from the <code>DEV</code> variable, or from the first argument when <code>DEV</code> is unset, and defaults to <code>dev</code>.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">export</span> DEV=dev</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">function</span> <span class="function"><span class="title">merge_dev</span></span>() {</span><br><span class="line">    <span class="keyword">if</span> [ -z <span class="string">"<span class="variable">${DEV}</span>"</span> ]; <span class="keyword">then</span></span><br><span class="line">        <span class="keyword">if</span> [ -z <span class="string">"<span class="variable">$1</span>"</span> ]; <span class="keyword">then</span></span><br><span class="line">            DEV=<span class="string">"dev"</span></span><br><span class="line">        <span class="keyword">else</span></span><br><span class="line">            DEV=<span class="string">"<span class="variable">$1</span>"</span></span><br><span class="line">        <span class="keyword">fi</span></span><br><span class="line">    <span class="keyword">fi</span></span><br><span
class="line">    BRANCH=$(git rev-parse --abbrev-ref HEAD)</span><br><span class="line">    git push origin <span class="string">"<span class="variable">${BRANCH}</span>"</span></span><br><span class="line">    git checkout <span class="string">"<span class="variable">${DEV}</span>"</span> \</span><br><span class="line">        && git pull origin <span class="string">"<span class="variable">${DEV}</span>"</span> \</span><br><span class="line">        && git merge <span class="string">"<span class="variable">${BRANCH}</span>"</span> -m <span class="string">"merge code automatically"</span> \</span><br><span class="line">        && git push origin <span class="string">"<span class="variable">${DEV}</span>"</span> \</span><br><span class="line">        && git checkout <span class="string">"<span class="variable">${BRANCH}</span>"</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Count the lines of code in a project:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">find ./ -name <span class="string">"*.py"</span> -<span class="built_in">exec</span> wc -l {} \;|awk <span class="string">'BEGIN{total=0}{print $1"\t"$2; total+=$1}END{print "total lines: " total}'</span></span><br></pre></td></tr></table></figure><p>List files by line count, largest first:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">find ./ -name <span class="string">"*.py"</span> | xargs -n 1 wc -l|sort -nr</span><br></pre></td></tr></table></figure><p>Count <code>TCP</code> sockets in each state (such as <code>TIME_WAIT</code>), most common first:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">netstat -n|awk <span class="string">'/^tcp/{a[$6]++}END{for (j in a) print a[j], j}'</span>|sort -nr|head -n 10</span><br></pre></td></tr></table></figure>]]></content>
<summary type="html">
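<p>Quickly merge the current development branch into the test branch. The test branch comes from the <code>DEV</code> variable, or from the first argument when <code>DEV</code> is unset, and defaults to <code>dev</code>.</p>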
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td cla
</summary>
<category term="Bash" scheme="https://keepalive555.github.io/categories/Bash/"/>
<category term="Shell" scheme="https://keepalive555.github.io/categories/Bash/Shell/"/>
<category term="Shell" scheme="https://keepalive555.github.io/tags/Shell/"/>
<category term="Bash" scheme="https://keepalive555.github.io/tags/Bash/"/>
</entry>
<entry>
<title>Efficient Bash Shell Keyboard Shortcuts</title>
<link href="https://keepalive555.github.io/2017/04/15/%E9%AB%98%E6%95%88%E7%8E%87Bash-Shell%E5%BF%AB%E6%8D%B7%E9%94%AE/"/>
<id>https://keepalive555.github.io/2017/04/15/高效率Bash-Shell快捷键/</id>
<published>2017-04-14T16:47:59.000Z</published>
<updated>2017-07-06T10:20:58.000Z</updated>
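<content type="html"><![CDATA[<h2><span id="简介">Introduction</span></h2><p><a href="https://keepalive555.github.io/2017/04/15/%E9%AB%98%E6%95%88%E7%8E%87Bash-Shell%E5%BF%AB%E6%8D%B7%E9%94%AE/"><em>Reposting note: this is an original article by the author; please cite the original URL when reposting</em></a></p><p>Bash's command-line editing uses Emacs mode by default, with key bindings compatible with the Emacs editor. Run <code>set -o vi</code> to switch to vi mode and <code>set -o emacs</code> to switch back to the default Emacs mode. All shortcuts listed in this article are Emacs-mode bindings.</p><h5><span id="特别注意"><em>Important:</em></span></h5><blockquote><p>I use Emacs mode at work all the time. In Emacs mode, the Alt key does not act as the Meta key by default, and Meta combinations are used very heavily in Emacs. You therefore need to enable that option in your terminal's settings.</p></blockquote>
<h3><span id="1移动命令">1. Movement</span></h3><ul><li>Ctrl + a: move to the beginning of the line</li><li>Ctrl + e: move to the end of the line</li><li>Ctrl + f: move forward one character</li><li>Ctrl + b: move back one character</li><li>Alt + f: move forward one word</li><li>Alt + b: move back one word</li></ul><h3><span id="2删除命令">2. Deletion</span></h3><ul><li>Ctrl + u: delete everything from the beginning of the line up to the cursor</li><li>Ctrl + k: delete everything from the cursor to the end of the line</li><li>Ctrl + d: delete the character under the cursor</li><li>Ctrl + h: delete the character before the cursor</li><li>Ctrl + w: delete from the cursor back to the start of the word</li><li>Alt + d: delete from the cursor to the end of the word</li></ul>
<h3><span id="3编辑命令">3. Editing</span></h3><ul><li>Ctrl + y: paste (yank) the most recently deleted text at the cursor</li><li>Ctrl + x Ctrl + u: undo the last edit</li><li>Ctrl + x Ctrl + e: open the current command in your editor (<code>$VISUAL</code> or <code>$EDITOR</code>) and run it when you save and quit (<strong>Note: handy for long commands</strong>)</li></ul><h3><span id="4搜索历史命令">4. Searching History</span></h3><ul><li>Ctrl + r: search backward through the command history (<strong>Note: press Ctrl + r repeatedly to keep searching further back</strong>)</li><li>Ctrl + g: exit search mode</li><li>Ctrl + p: previous command</li><li>Ctrl + n: next command</li></ul>
<h3><span id="5控制命令">5. Control</span></h3><ul><li>Ctrl + l: clear the screen (<strong>Note: equivalent to the clear command</strong>)</li><li>Ctrl + c: interrupt the running command (sends SIGINT)</li><li>Ctrl + z: suspend the running command (sends SIGTSTP; resume it with <code>fg</code> or <code>bg</code>)</li></ul><h3><span id="6命令">6. ! Commands</span></h3><ul><li>!!: re-run the previous command (<strong>Note: after running <code>ls</code>, typing <code>!!</code> runs <code>ls</code> again</strong>)</li><li>!&lt;prefix&gt;: re-run the most recent command that starts with &lt;prefix&gt;</li></ul><h3><span id="参考文档">References</span></h3><p>[1]: <a href="https://www.gnu.org/software/bash/manual/bash.html#Command-Line-Editing">Bash Reference Manual</a></p>]]></content>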
<summary type="html">
<h2><span id="简介">简介</span></h2><p><a href="https://keepalive555.github.io/2017/04/15/%E9%AB%98%E6%95%88%E7%8E%87Bash-Shell%E5%BF%AB%E6%8D%B
</summary>
<category term="Shell" scheme="https://keepalive555.github.io/categories/Shell/"/>
<category term="Bash" scheme="https://keepalive555.github.io/categories/Shell/Bash/"/>
<category term="Shell" scheme="https://keepalive555.github.io/tags/Shell/"/>
<category term="Bash" scheme="https://keepalive555.github.io/tags/Bash/"/>
</entry>
<entry>
<title>MySQL Notes</title>
<link href="https://keepalive555.github.io/2017/04/04/MySQL%E7%AC%94%E8%AE%B0/"/>
<id>https://keepalive555.github.io/2017/04/04/MySQL笔记/</id>
<published>2017-04-04T06:24:23.000Z</published>
<updated>2020-09-23T16:54:20.607Z</updated>
<content type="html"><![CDATA[<figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">show</span> 
<span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'%innodb_old_blocks_pct%'</span>; <span class="comment">-- 37 (%): share of the buffer pool LRU list reserved for the old sublist</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'%innodb_buffer_pool_size%'</span>; <span class="comment">-- size of the InnoDB buffer pool; 134217728 bytes = 128MB (the default)</span></span><br><span class="line"><span class="comment">-- hot data</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'%innodb_old_blocks_time%'</span>; <span class="comment">-- 1000ms: how long a page inserted at the midpoint must stay in the old sublist before it can be moved to the new (hot) sublist</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">engine</span> <span class="keyword">innodb</span> <span class="keyword">status</span>; <span class="comment">-- show the status of the InnoDB engine</span></span><br><span class="line"><span class="comment">-- "Modified db pages" is the number of dirty pages in the flush list</span></span><br><span class="line"><span class="comment">-- buffer pool hit rate</span></span><br><span class="line"><span class="keyword">select</span> pool_id, hit_rate, pages_made_young <span class="keyword">from</span> <span class="string">`information_schema`</span>.innodb_buffer_pool_stats; <span class="comment">-- one row per buffer pool instance</span></span><br><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`information_schema`</span>.innodb_buffer_page_lru; <span class="comment">-- details of each page in the LRU list of the InnoDB buffer pool</span></span><br><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`information_schema`</span>.innodb_buffer_page_lru <span class="keyword">where</span> oldest_modification > <span class="number">0</span>; <span class="comment">-- list dirty pages</span></span><br><span class="line"><span 
class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'%innodb_log_buffer_size%'</span>; <span class="comment">-- size of the redo log buffer: 8MB by default before MySQL 5.7.6, 16MB since; it need not be large because it is flushed to the redo log files regularly</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'%innodb_fast_shutdown%'</span>; <span class="comment">-- Sharp Checkpoint: when the database shuts down, all dirty pages are flushed back to disk; this is the default behavior</span></span><br><span class="line"><span class="comment">-- Fuzzy Checkpoint: only some of the dirty pages are flushed (used while the server is running)</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">engine</span> <span class="keyword">innodb</span> <span class="keyword">status</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'datadir'</span>; <span class="comment">-- datadir is the directory where the database files are stored</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">master</span> <span class="keyword">status</span>; <span class="comment">-- show master (binary log) status: current binlog file and position</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'binlog_cache_size'</span>; <span class="comment">-- binlog_cache_size: MySQL allocates a binlog cache for each session; the default is 32KB</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">global</span> <span class="keyword">status</span> <span class="keyword">like</span> <span class="string">'binlog_cache%'</span>; <span class="comment">-- Binlog_cache_use: transactions that used the binlog cache; Binlog_cache_disk_use: transactions that overflowed it and used a temporary file on disk</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> 
<span class="string">'sync_binlog'</span>; <span class="comment">-- unless sync_binlog = 1, the binary log is not synced to disk on every write (think of it as buffered writes), so if the operating system crashes, some events may never reach the binary log file</span></span><br><span class="line"><span class="comment">-- sync_binlog = N: sync the binary log to disk after every N writes; sync_binlog = 1 writes the binary log to disk synchronously</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'innodb_support_xa'</span>; <span class="comment">-- enable InnoDB support for XA (two-phase commit) transactions; deprecated in MySQL 5.7.10 and removed in 8.0</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'binlog_format'</span>; <span class="comment">-- binary log format: STATEMENT, ROW, or MIXED</span></span><br><span class="line"><span class="keyword">set</span> @@session.binlog_format = <span class="string">'ROW'</span>; <span class="comment">-- change binlog_format for the current session</span></span><br><span class="line"><span class="keyword">select</span> @@session.binlog_format; <span class="comment">-- ROW</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'socket'</span>; <span class="comment">-- Unix socket file that mysqld listens on</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'pid_file'</span>; <span class="comment">-- path of the mysqld PID file</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'innodb_data_file_path'</span>; <span class="comment">-- ibdata1:12M:autoextend; format: innodb_data_file_path=datafile_spec1[; datafile_spec2]...</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span 
class="string">'innodb_file_per_table'</span>; <span class="comment">-- give each InnoDB table its own tablespace, stored in a file named after the table with an .ibd extension</span></span><br><span class="line"> </span><br><span class="line"><span class="comment">-- InnoDB's default redo log files are ib_logfile0 and ib_logfile1</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'innodb_log_file_size'</span>; <span class="comment">-- size of each redo log file: 48MB by default since MySQL 5.6.8 (5MB before); all redo log files combined may not exceed 512GB</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'innodb_log_files_in_group'</span>; <span class="comment">-- number of log files in the redo log group</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'innodb_log_group_home_dir'</span>;</span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'innodb_mirrored_log_groups'</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'innodb_flush_log_at_trx_commit'</span>; <span class="comment">-- valid values: 0, 1, 2. 0: at commit, the redo log is not written to the log file on disk; the master thread writes and flushes it once per second</span></span><br><span class="line"><span class="comment">-- 1: at commit, the redo log buffer is written and synced to disk (with an fsync call); 2: at commit, the redo log is only written to the log file in the OS file system cache, and flushed to disk about once per second</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">variables</span> <span class="keyword">like</span> <span class="string">'%partition%'</span>;</span><br><span class="line"><span class="keyword">show</span> plugins;</span><br><span class="line"></span><br><span class="line"><span class="comment">-- 
MySQL supports the following partition types: RANGE, LIST, HASH, and KEY.</span></span><br><span class="line"><span class="comment">-- RANGE: rows are assigned to a partition according to which contiguous range the column value falls in</span></span><br><span class="line"><span class="comment">-- LIST: like RANGE, but each partition is defined by a discrete list of column values</span></span><br><span class="line"><span class="comment">-- HASH: the partition is chosen by a user-defined expression that returns an integer</span></span><br><span class="line"><span class="comment">-- KEY: like HASH, but MySQL supplies the hashing function</span></span><br><span class="line"></span><br><span class="line"><span class="comment">-- innodb_file_per_table</span></span><br><span class="line"><span class="comment">-- contents of a per-table tablespace:</span></span><br><span class="line"><span class="comment">-- ① data ② indexes ③ insert buffer bitmap pages</span></span><br><span class="line"></span><br><span class="line"><span class="comment">-- system ls /;</span></span><br></pre></td></tr></table></figure>]]></content>
<summary type="html">
<figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="
</summary>
<category term="MySQL" scheme="https://keepalive555.github.io/categories/MySQL/"/>
<category term="MySQL" scheme="https://keepalive555.github.io/tags/MySQL/"/>
<category term="语句" scheme="https://keepalive555.github.io/tags/%E8%AF%AD%E5%8F%A5/"/>
</entry>
<entry>
<title>Thrift Tutorial</title>
<link href="https://keepalive555.github.io/2017/04/01/Thrift%E6%95%99%E7%A8%8B/"/>
<id>https://keepalive555.github.io/2017/04/01/Thrift教程/</id>
<published>2017-03-31T16:21:27.000Z</published>
<updated>2020-09-23T16:51:41.424Z</updated>
<content type="html"><![CDATA[<h2><span id="1thrift简介">1. Introduction to Thrift</span></h2><p><a href="https://keepalive555.github.io/2017/04/01/Thrift%E6%95%99%E7%A8%8B/"><em>Reposting note: this is an original article by the author; please cite the original URL when reposting</em></a></p><p>Thrift was developed at Facebook to make it possible for services written in different languages to call one another. Facebook open-sourced it and donated it to the Apache Software Foundation, where it entered the Apache Incubator in 2008. Thrift follows a typical client/server architecture and defines interfaces in an IDL (Interface Description Language).</p><p>Thrift provides a command-line tool, <code>thrift</code>, that generates code in different languages from a given IDL file. Thrift has quite a few dependencies, so I recommend installing it with your platform's package manager. I develop on macOS, where <code>brew install thrift</code> handles the whole installation automatically.</p>
<h2><span id="2thrift-idl介绍">2. Thrift IDL Overview</span></h2><p>Thrift IDL syntax is close to C++, so readers familiar with C++ will pick it up quickly. Because of this, Thrift IDL leans toward statically typed languages. Developers using dynamic languages such as Python or Ruby should take extra care when defining interfaces in IDL.</p><p>Thrift IDL supports C/C++-style <code>//</code> and <code>/* */</code> comments, as well as Python-style <code>#</code> comments.</p><h3><span id="1基本数据类型">1. Basic Types</span></h3><ul><li>bool: A boolean value, true or false</li><li>byte: A signed byte</li><li>i16: A 16-bit signed integer</li><li>i32: A 32-bit signed integer</li><li>i64: A 64-bit signed integer</li><li>double: A 64-bit floating point number</li><li>string: An encoding-agnostic text or binary string</li><li>void: Void (<strong>Note: the return type of a function that returns nothing</strong>)</li></ul>
<h3><span id="2复合类型">2. Composite Types</span></h3><h4><span id="1struct">1. struct</span></h4><p>struct: a structure type, similar to structs in C/C++, that groups data of different types together (<strong>Note: it plays the role of a class in an object-oriented language</strong>).</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">User</span> </span>{</span><br><span class="line"><span 
class="number">1</span>: <span class="built_in">i32</span> id,</span><br><span class="line"><span class="number">2</span>: <span class="built_in">string</span> user_name,</span><br><span class="line"><span class="number">3</span>: <span class="keyword">required</span> <span class="built_in">bool</span> is_admin,</span><br><span class="line"><span class="number">4</span>: <span class="keyword">optional</span> <span class="built_in">string</span> nick_name,</span><br><span class="line"><span class="number">5</span>: <span class="keyword">optional</span> <span class="built_in">string</span> sex = <span class="string">"male"</span>,</span><br><span class="line"><span class="number">20</span>: <span class="keyword">optional</span> <span class="built_in">string</span> city,</span><br><span class="line">} </span><br></pre></td></tr></table></figure><p><strong>Note:</strong> </p><ol><li>The <code>optional</code> keyword marks a field as optional.</li><li>The <code>required</code> keyword marks a field as mandatory.</li><li>Field IDs in a definition must be unique, but they do not have to be contiguous.</li></ol>
<h4><span id="2enum">2. enum</span></h4><p>enum: an enumeration type</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">enum</span> <span class="title">UserType</span> </span>{</span><br><span class="line">BLOCKED = <span class="number">0</span>,</span><br><span class="line">NORMAL = <span class="number">1</span>,</span><br><span class="line">ADMIN = <span class="number">2</span>,</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h3><span id="3容器类型">3. Container Types</span></h3><p>Thrift's container types resemble the containers of the C++ STL. Thrift provides three of them: list, map, and set.</p><ul><li>list<T>: an ordered list of elements of type T; Thrift compiles it to <code>vector</code> in the C++ STL, <code>ArrayList</code> in Java, <code>list</code> in Python, and so on.</li><li>set<T>: an unordered set of unique elements; Thrift compiles it to <code>set</code> in the C++ STL, <code>HashSet</code> in Java, <code>set</code> in Python, and so on.</li><li>map<T1, T2>: 
a key-value map; Thrift generates it as a map in the C++ STL, a HashMap in Java, a dict in Python, and so on.</li></ul><h3><span id="4自定义类型">4. Custom types</span></h3><p>Thrift supports the C/C++-style typedef keyword for declaring custom type aliases:</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="built_in">i32</span> Integer</span><br></pre></td></tr></table></figure><h3><span id="5异常">5. Exceptions</span></h3><p>Thrift supports user-defined exceptions. The syntax is similar to that of a struct:</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">exception</span> <span class="title">NotFound</span> </span>{</span><br><span class="line"><span class="number">1</span>: <span class="built_in">string</span> message,</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><strong>Note: when writing the server code, it is not enough to throw the exception in your implementation. The interface definition must also declare, with throws, which exceptions the method may raise.</strong></p><h3><span id="6服务">6. Services</span></h3><p>A service in the Thrift IDL plays much the same role as an interface in Java. The syntax for defining a service is:</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">service</span> &lt;<span class="title">name</span>&gt; </span>{</span><br><span class="line">&lt;returntype&gt; &lt;name&gt;(&lt;arguments&gt;)</span><br><span class="line">[throws (&lt;exceptions&gt;)]</span><br><span class="line">...</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>An example service for fetching users:</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span
class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">service</span> <span class="title">UserService</span> </span>{</span><br><span class="line">User query_user(<span class="number">1</span>: <span class="built_in">i32</span> user_id) throws (</span><br><span class="line"><span class="number">1</span>: NotFound not_found),</span><br><span class="line">User add_user(<span class="number">1</span>: User user);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><strong>Note: method definitions inside a service can be separated by either <code>,</code> or <code>;</code>.</strong></p><h3><span id="7命名空间">7. Namespaces</span></h3><p>Thrift supports C++-style namespaces, equivalent to the package concept in Java and Python. Namespaces are declared per target language:</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">namespace</span> cpp com.example.project</span><br><span class="line"><span class="keyword">namespace</span> java com.example.project</span><br><span class="line"><span class="keyword">namespace</span> py com.example.project</span><br></pre></td></tr></table></figure><h2><span id="3thrift使用方法">3. Using Thrift</span></h2><p>Example file:</p><figure class="highlight thrift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// filename: user.thrift</span></span><br><span class="line"><span class="keyword">namespace</span> py thrift.rpc.user</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">enum</span> <span class="title">Sex</span> </span>{</span><br><span class="line">FEMALE = <span class="number">0</span>,</span><br><span class="line">MALE = <span class="number">1</span>,</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">User</span> </span>{</span><br><span class="line"><span class="number">1</span>: <span class="built_in">string</span> user_name,</span><br><span class="line"><span class="number">2</span>: Sex sex,</span><br><span class="line"><span class="number">3</span>: <span class="keyword">required</span> <span class="built_in">i32</span> age,</span><br><span class="line"><span class="number">4</span>: <span class="keyword">optional</span> <span class="built_in">string</span> city,</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">exception</span> <span class="title">NotFound</span> </span>{</span><br><span class="line"><span class="number">1</span>: <span class="built_in">string</span> message,</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">service</span> <span class="title">UserService</span> </span>{</span><br><span class="line">User query_user(<span class="number">1</span>: <span class="built_in">i32</span> id) throws (</span><br><span class="line"><span class="number">1</span>: NotFound not_found),</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><p>Run the following command in a shell to generate the Python code:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">thrift -out py --gen py:new_style,utf8strings,coding=utf-8 user.thrift</span><br></pre></td></tr></table></figure><p>The --gen option specifies the target language. The comma-separated values after the colon are options that thrift provides for that language's generator.</p><p>The -out option (<strong>note: a single dash</strong>) specifies where thrift writes the generated code. Unlike -o, it does not create a gen-py subdirectory, and the directory must already exist.</p><p>For more detailed usage, run <code>thrift --help</code>.</p><h2><span id="参考资料">References</span></h2><p>[1]<a href="https://thrift.apache.org/static/files/thrift-20070401.pdf">Official Thrift whitepaper</a></p><p>[2]<a href="https://diwakergupta.github.io/thrift-missing-guide/">Thrift: The Missing Guide</a></p><p>[3]<a href="https://git-wip-us.apache.org/repos/asf?p=thrift.git;a=blob_plain;f=tutorial/tutorial.thrift">Official Thrift tutorial example</a></p>]]></content>
<summary type="html">
<h2><span id="1thrift简介">1. Introduction to Thrift</span></h2><p><a href="https://keepalive555.github.io/2017/04/01/Thrift%E6%95%99%E7%A8%8B/"><em>Reprint notice: this article
</summary>
<category term="RPC" scheme="https://keepalive555.github.io/categories/RPC/"/>
<category term="编程语言" scheme="https://keepalive555.github.io/categories/RPC/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80/"/>
<category term="Python" scheme="https://keepalive555.github.io/tags/Python/"/>
<category term="Thrift" scheme="https://keepalive555.github.io/tags/Thrift/"/>
<category term="IDL" scheme="https://keepalive555.github.io/tags/IDL/"/>
</entry>
<entry>
<title>MySQL Pagination</title>
<link href="https://keepalive555.github.io/2017/03/29/MySQL%E5%88%86%E9%A1%B5/"/>
<id>https://keepalive555.github.io/2017/03/29/MySQL分页/</id>
<published>2017-03-29T14:49:42.000Z</published>
<updated>2020-09-23T16:54:10.036Z</updated>
<content type="html"><![CDATA[<p><a href="http://keepalive555.github.io/2017/03/29/MySQL%E5%88%86%E9%A1%B5/"><em>Reprint notice: this is an original article by the author; if you repost it, please credit the original URL.</em></a></p><p>Our company recently launched an "order management system" that contains quite a lot of paginated listings, so I set aside some time to summarize how to paginate in MySQL.</p><p>When it comes to MySQL pagination, the first approach that usually comes to mind is offset + limit. Take the following goal as an example:<br><strong>query the order table <code>order</code>, sorted by the order creation time <code>create_time</code> in descending order, with <code>50</code> records per page</strong><br>The first SQL statement that comes to mind is:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`order`</span> <span class="keyword">order</span> <span class="keyword">by</span> <span class="string">`create_time`</span> <span class="keyword">desc</span> <span class="keyword">limit</span> <span class="number">0</span>, <span class="number">50</span>;</span><br></pre></td></tr></table></figure><p>As the page number grows (<em>i.e. as the offset increases</em>), this query becomes noticeably slower. Under high concurrency, the load it puts on MySQL can be crippling. To show page 200, for example, MySQL has to read <code>200 * 50</code> rows and then return only the last <code>50</code>. The first <code>(200 - 1) * 50</code> rows are thrown away.</p><p>Orders have many attributes, so we cannot index every column (indexing every column would be a crude, brute-force fix anyway). One of the simplest ways to optimize this kind of paginated query is a covering index scan. First fetch only the primary keys of the requested page, then join back to the table to retrieve the remaining columns, instead of reading every column of every skipped row. This assumes an index on <code>create_time</code>. The optimized SQL statement is:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`order`</span> <span class="keyword">inner</span> <span class="keyword">join</span> (<span class="keyword">select</span> <span class="keyword">id</span> <span class="keyword">from</span> <span class="string">`order`</span> <span class="keyword">order</span> <span class="keyword">by</span> <span class="string">`create_time`</span> <span class="keyword">desc</span> <span class="keyword">limit</span> <span class="number">10000</span>, <span class="number">50</span>) <span class="keyword">as</span> <span
class="string">`tmp`</span> <span class="keyword">using</span>(<span class="string">`id`</span>) <span class="keyword">order</span> <span class="keyword">by</span> <span class="string">`create_time`</span> <span class="keyword">desc</span>;</span><br></pre></td></tr></table></figure><p>Here MySQL scans as few pages as possible. It first finds the rows it needs through the index, then joins back to fetch the required columns. The outer <code>order by</code> restores the sort order, because the join itself does not guarantee it. Several variants of this technique exist.</p><p>Finally, here is the approach we settled on. For each request, the front end passes us the max_id or min_id of the previously returned page, and we use the index on id to make the query cheap. (<strong>Note: this only works when id increases in the same order as the sort column, here create_time.</strong>) On the product side, we do not show a total page count. We only offer the current page plus jumps of 10 pages forward or backward, which solves the problem.</p><p>Next page</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`order`</span> <span class="keyword">where</span> <span class="string">`id`</span> &lt; min_id <span class="keyword">order</span> <span class="keyword">by</span> <span class="string">`create_time`</span> <span class="keyword">desc</span> <span class="keyword">limit</span> <span class="number">50</span>;</span><br></pre></td></tr></table></figure><p>Jump forward 10 pages</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`order`</span> <span class="keyword">where</span> <span class="string">`id`</span> &lt; min_id</span><br><span class="line"><span class="keyword">order</span> <span class="keyword">by</span> <span class="string">`create_time`</span> <span class="keyword">desc</span> <span class="keyword">limit</span> <span class="number">450</span>, <span class="number">50</span>;</span><br></pre></td></tr></table></figure><p>Previous page (the rows come back in ascending order, so reverse them in the application)</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`order`</span> <span class="keyword">where</span> <span class="string">`id`</span> &gt; max_id <span class="keyword">order</span> <span class="keyword">by</span> <span class="string">`create_time`</span> <span class="keyword">asc</span> <span class="keyword">limit</span> <span class="number">50</span>;</span><br></pre></td></tr></table></figure><p>Jump back 10 pages (likewise, reverse the rows in the application)</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> <span class="string">`order`</span> <span class="keyword">where</span> <span class="string">`id`</span> &gt; max_id <span class="keyword">order</span> <span class="keyword">by</span> <span class="string">`create_time`</span> <span class="keyword">asc</span> <span class="keyword">limit</span> <span class="number">450</span>, <span class="number">50</span>;</span><br></pre></td></tr></table></figure><p>Pagination is more complicated when the database is sharded across multiple databases and tables, or sits behind middleware. The right approach depends on the specific setup, so it is not covered here.</p>]]></content>
<summary type="html">
<p><a href="http://keepalive555.github.io/2017/03/29/MySQL%E5%88%86%E9%A1%B5/"><em>Reprint notice: this is an original article by the author; if you repost it, please credit the original URL.</em></a></p>
<p>Our company recently launched an "order management system"; the sys
</summary>
<category term="MySQL" scheme="https://keepalive555.github.io/categories/MySQL/"/>
<category term="MySQL" scheme="https://keepalive555.github.io/tags/MySQL/"/>
<category term="分页" scheme="https://keepalive555.github.io/tags/%E5%88%86%E9%A1%B5/"/>
</entry>
<entry>
<title>Practical Vim Tips</title>
<link href="https://keepalive555.github.io/2017/03/20/Vim%E5%AE%9E%E7%94%A8%E6%8A%80%E5%B7%A7/"/>
<id>https://keepalive555.github.io/2017/03/20/Vim实用技巧/</id>
<published>2017-03-20T02:06:09.000Z</published>
<updated>2017-04-20T17:47:29.000Z</updated>
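<content type="html"><![CDATA[<h2><span id="介绍">Introduction</span></h2><p><a href="https://keepalive555.github.io/2017/03/20/Vim%E5%AE%9E%E7%94%A8%E6%8A%80%E5%B7%A7/"><em>Reprint notice: this is an original article by the author; if you repost it, please credit the original URL.</em></a></p><p>I am a Vim person: almost all of my development work happens in Vim. But Vim's official manual alone runs to more than 2,000 pages, so I record the interesting plugins and settings I come across in my configuration file. This partly shares some practical tips with others and partly builds up my own knowledge over time.</p><h2><span id="实用技巧">Practical tips</span></h2><h3><span id="1显示代码中的tab字符">1. Show Tab characters in code</span></h3><blockquote><p>Mixing spaces and tabs in Python indentation trips up many beginners: the indentation looks right, but the interpreter still reports an error. Python 3 raises TabError (a subclass of IndentationError) for inconsistent mixing. Python 2.7 treats a tab as advancing to the next multiple of 8 columns and only raises TabError when run with -tt, so code that looks aligned in an editor with narrower tabs can still fail with an IndentationError.</p></blockquote><ul><li>Replace the tabs using a regular-expression substitution.</li><li>Turn on the list option with :set list to make tabs and trailing spaces visible.</li></ul><figure class="highlight vim"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">set</span> lcs=tab:&gt;-,trail:-</span><br><span class="line"><span class="comment">" Toggle list mode with Ctrl-L (you can also run :set list)</span></span><br><span class="line"><span class="keyword">nnoremap</span> <span class="symbol">&lt;silent&gt;</span> <span class="symbol">&lt;C-l&gt;</span> :<span class="keyword">set</span> list!<span class="symbol">&lt;CR&gt;</span></span><br></pre></td></tr></table></figure><h3><span id="2在代码中插入表达式值">2. Insert the value of an expression</span></h3><blockquote><p>During development you often need to type constant values that are large and awkward to compute by hand, so many people type a constant expression instead. For example, given a timestamp t = int(time.time()), you can get the timestamp from 24 hours earlier like this:</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> time</span><br><span class="line">_t = int(time.time()) - <span class="number">24</span> * <span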
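class="number">3600</span></span><br></pre></td></tr></table></figure><blockquote><p>It is tempting to think that for an interpreted language like Python this arithmetic is paid for at run time. In fact, CPython folds constant expressions such as 24 * 3600 at compile time. The benefit of inserting the value directly in Vim is therefore mainly convenience: you get the literal without reaching for a calculator.<br>In insert mode (note: insert mode), press &lt;C-r&gt;, type =, enter the expression to evaluate after the =, and press Enter. Vim inserts the value of the expression at the cursor.</p></blockquote><h3><span id="3未配置插件注释代码块">3. Comment out a block of code without plugins</span></h3><blockquote><p>Sometimes you need to edit a configuration file or script on a server whose Vim has no configuration and no commenting plugin. How do you comment out code there? Press &lt;C-v&gt; to enter visual block mode and select the first column(s) of the lines to comment out (o and O adjust the corners of the block). Then press I, type the comment character (# in Python), and press &lt;Esc&gt;. The text is inserted on every selected line. (Note: pressing A instead appends text after the block.)</p></blockquote><h3><span id="4快速关闭文件">4. Close files quickly</span></h3><blockquote><p>Most people quit Vim by typing :wq on the command line. Because of where the keys are, I don't find that very convenient, and Vim offers a faster way.<br>To close a file, type ZZ in normal mode (Shift+Z twice). It is equivalent to :x. If the file has been modified, Vim saves it and quits, updating the modification time. If not, Vim simply quits and leaves the modification time unchanged.</p></blockquote><blockquote><p>To discard your changes, most people force-quit with :q!. In normal mode you can type ZQ instead, which does the same.</p></blockquote><h3><span id="5强制保存文件">5. Force-save a file</span></h3><blockquote><p>Note: this requires the current user to be authorized in the sudoers file.</p></blockquote><blockquote><p>A common situation at work: root's Vim has no configuration, so you edit a root-owned file as a regular user, only to find you lack permission to save it. The following Vim command writes the buffer through sudo, which is very handy:</p></blockquote><figure class="highlight vim"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">:<span class="keyword">w</span> !sudo tee %</span><br></pre></td></tr></table></figure><h3><span id="6快速选择代码段落">6. Quickly select a paragraph of code</span></h3><blockquote><p>Common selection modes: v selects characters, V selects lines, &lt;C-v&gt; selects a column block.</p></blockquote><blockquote><p>To select several lines of code, most people press V to start a linewise selection and then extend it with j, k and o, or jump with g and similar commands. That is not slow, but it is not very elegant either.</p></blockquote><blockquote><p>Vim has a notion of paragraphs. Run :tab help paragraphs to read the description, or see this <a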
href="http://blog.carbonfive.com/2011/10/17/vim-text-objects-the-definitive-guide/">blog post</a>.<br>In my own work I separate small pieces of functionality into paragraphs with blank lines (which PEP 8 also recommends for Python). I then move between paragraphs with the paragraph motions and select a whole paragraph with vip. It is very convenient.</p></blockquote><table><thead><tr><th>Command</th><th>Effect</th></tr></thead><tbody><tr><td>]]</td><td>Move forward to the next section</td></tr><tr><td>[[</td><td>Move backward to the previous section</td></tr><tr><td>)</td><td>Move forward one sentence</td></tr><tr><td>(</td><td>Move backward one sentence</td></tr><tr><td>{</td><td>Move backward to the previous paragraph</td></tr><tr><td>}</td><td>Move forward to the next paragraph</td></tr><tr><td>vap</td><td>Select a paragraph, including the blank line(s) after it</td></tr><tr><td>vip</td><td>Select the inner paragraph, without the surrounding blank lines</td></tr></tbody></table><h3><span id="7设置vim分词">7. Control what Vim treats as a word</span></h3><blockquote><p>In Vim, w, b, e, ge and similar commands move between words, which makes editing code much easier, but they are not always ideal. In the function foo below, which has three parameters a, b and c, moving the cursor from a to b takes 2w, because Vim treats the comma as a word of its own. That is not very convenient (unless you are used to it). Vim's iskeyword option defines which characters make up a "word". If you add "," and "." to it, a single w moves the cursor from parameter a to parameter b. (This is just my personal habit. ^_^)</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">foo</span>(<span class="params">a, b, c</span>):</span></span><br><span class="line">    <span class="keyword">pass</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Demo</span>(<span class="params">object</span>):</span></span><br><span class="line">    <span class="keyword">pass</span></span><br><span class="line">d = Demo()</span><br><span class="line">d.__class__</span><br></pre></td></tr></table></figure><figure class="highlight vim"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">:<span class="keyword">set</span> 
iskeyword+=\,</span><br><span class="line">:<span class="keyword">set</span> iskeyword+=\.</span><br></pre></td></tr></table></figure><p>or</p><figure class="highlight vim"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">:<span class="keyword">set</span> iskeyword+=\,\.</span><br></pre></td></tr></table></figure>]]></content>
<summary type="html">
<h2><span id="介绍">Introduction</span></h2><p><a href="https://keepalive555.github.io/2017/03/20/Vim%E5%AE%9E%E7%94%A8%E6%8A%80%E5%B7%A7/"><em>Reprint notice: this article
</summary>
<category term="Vim" scheme="https://keepalive555.github.io/categories/Vim/"/>
<category term="Vim" scheme="https://keepalive555.github.io/tags/Vim/"/>
</entry>
<entry>
<title>Practical Shell Tips</title>
<link href="https://keepalive555.github.io/2017/03/19/Shell%E5%AE%9E%E7%94%A8%E6%8A%80%E5%B7%A7/"/>
<id>https://keepalive555.github.io/2017/03/19/Shell实用技巧/</id>
<published>2017-03-19T15:55:16.000Z</published>
<updated>2017-04-22T16:08:53.000Z</updated>
<content type="html"><![CDATA[<p><a href="https://keepalive555.github.io/2017/03/19/Shell%E5%AE%9E%E7%94%A8%E6%8A%80%E5%B7%A7/"><em>Reprint notice: this is an original article by the author; if you repost it, please credit the original URL.</em></a></p><h3><span id="1获取当前shell脚本文件所在目录">1. Get the directory of the current shell script</span></h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">PREFIX=$(<span class="built_in">cd</span> <span class="string">"$(dirname "<span class="variable">$0</span>")"</span> &amp;&amp; <span class="built_in">pwd</span>)</span><br></pre></td></tr></table></figure><h3><span id="2ssh开启密钥登录">2. Enable SSH key-based login</span></h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh user@host <span class="string">'mkdir -p .ssh &amp;&amp; cat &gt;&gt; .ssh/authorized_keys'</span> &lt; ~/.ssh/id_rsa.pub</span><br></pre></td></tr></table></figure><h3><span id="3ssh端口转发">3. SSH port forwarding</span></h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nohup /usr/bin/ssh -N -L 0.0.0.0:8000:10.10.10.10:12000 sa.demo.com &amp;</span><br></pre></td></tr></table></figure><h3><span id="4ios证书转换">4. Convert an iOS certificate (.p12 to .pem)</span></h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">openssl pkcs12 -<span class="keyword">in</span> demo.p12 -out demo.pem -nodes</span><br></pre></td></tr></table></figure><h3><span id="5统计代码行数">5. Count lines of code</span></h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ls *.py | xargs -n 1 wc -l | awk <span class="string">'{total += $1} END {print "total lines:\t" total}'</span></span><br></pre></td></tr></table></figure>]]></content>
<summary type="html">
<p><a href="https://keepalive555.github.io/2017/03/19/Shell%E5%AE%9E%E7%94%A8%E6%8A%80%E5%B7%A7/"><em>Reprint notice: this is an original article by the author; if you repost it, please credit the original URL.</em></a></p>
<
</summary>
<category term="Shell" scheme="https://keepalive555.github.io/categories/Shell/"/>
<category term="Shell" scheme="https://keepalive555.github.io/tags/Shell/"/>
</entry>
<entry>
<title>Common Kafka Operations</title>
<link href="https://keepalive555.github.io/2017/01/05/Kafka%E5%B8%B8%E8%A7%81%E6%93%8D%E4%BD%9C/"/>
<id>https://keepalive555.github.io/2017/01/05/Kafka常见操作/</id>
<published>2017-01-04T16:35:26.000Z</published>
<updated>2018-01-04T16:44:52.000Z</updated>
<content type="html"><![CDATA[<h2><span id="kafka常见操作命令行集锦">Common <code>Kafka</code> command-line operations</span></h2><p><strong><em>Exported variable:</em></strong> <code>$ZK_HOSTS</code> is an exported variable defined in <code>~/.bashrc</code> that holds the address of the <code>ZooKeeper</code> cluster.</p><blockquote><p>List the topics in the Kafka cluster</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --list --zookeeper <span class="variable">$ZK_HOSTS</span></span><br></pre></td></tr></table></figure><blockquote><p>Show the details of a topic</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --describe --topic benchmark --zookeeper <span class="variable">$ZK_HOSTS</span></span><br></pre></td></tr></table></figure><blockquote><p>Create a topic named readbench with 1 partition</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --create --topic readbench --partitions 1 --replication-factor 1 --zookeeper <span class="variable">$ZK_HOSTS</span></span><br></pre></td></tr></table></figure><blockquote><p>Create a topic with 1 partition and a replication factor of 1</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --create --topic name --partitions 1 --replication-factor 1 --zookeeper <span class="variable">$ZK_HOSTS</span></span><br></pre></td></tr></table></figure><blockquote><p>Create a topic with 5 partitions and a replication factor of 1</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --create --topic 
name --partitions 5 --replication-factor 1 --zookeeper <span class="variable">$ZK_HOSTS</span></span><br></pre></td></tr></table></figure><blockquote><p>Consume data from the topic name as consumer group group_name</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-console-consumer.sh --topic name --partition 0 --consumer-property group.id=group_name --from-beginning --bootstrap-server 10.95.134.86:9092,10.95.134.86:9093,10.95.134.86:9094</span><br></pre></td></tr></table></figure><blockquote><p>Produce data to the topic name</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-console-producer.sh --broker-list 10.95.134.86:9092,10.95.134.86:9093,10.95.134.86:9094 --sync --topic name</span><br></pre></td></tr></table></figure><blockquote><p>Physically delete a topic from Kafka. When <code>delete.topic.enable</code> is false (the default before Kafka 1.0), the delete command only marks the topic for deletion.</p></blockquote><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --delete --topic name --zookeeper $ZK_HOSTS</span><br></pre></td></tr></table></figure><ul><li>Then remove the <code>topic</code> identified by <code>name</code> from the <code>ZooKeeper</code> cluster:</li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/zkCli.sh rmr /brokers/topics/name</span><br></pre></td></tr></table></figure><blockquote><p>Add partitions to a topic in <code>Kafka</code> (partitions can only be added, never removed)</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --alter --topic name --zookeeper <span class="variable">$ZK_HOSTS</span> --partitions 3</span><br></pre></td></tr></table></figure><blockquote><p> 
Update a topic's configuration in Kafka</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-topics.sh --alter --topic readbench --zookeeper <span class="variable">$ZK_HOSTS</span> --config cleanup.policy=delete</span><br></pre></td></tr></table></figure><blockquote><p>When you add a new broker to a Kafka cluster, you need to spread some topics' storage load onto it. This is done with Kafka's partition reassignment tool in three steps, followed by a progress check:</p></blockquote><ul><li>1. Specify the topics to move, in JSON format, in the file topics-to-move.json.</li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">cat &gt; topics-to-move.json &lt;&lt; EOF</span><br><span class="line">{</span><br><span class="line"> <span class="string">"topics"</span>: [</span><br><span class="line"> {<span class="string">"topic"</span>: <span class="string">"name1"</span>},</span><br><span class="line"> {<span class="string">"topic"</span>: <span class="string">"name2"</span>}</span><br><span class="line"> ],</span><br><span class="line"> <span class="string">"version"</span>: 1</span><br><span class="line">}</span><br><span class="line">EOF</span><br></pre></td></tr></table></figure><ul><li>2. Generate the reassignment plan</li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-reassign-partitions.sh --zookeeper <span class="variable">$ZK_HOSTS</span> --topics-to-move-json-file topics-to-move.json --broker-list <span class="string">"1,2"</span> --generate</span><br></pre></td></tr></table></figure><p>Save the JSON string shown under "Proposed partition reassignment configuration" in the output as the file reassign.json, for example:</p><figure 
class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> <span class="string">'{"version":1,"partitions":[{"topic":"readbench","partition":0,"replicas":[3]}]}'</span> &gt; reassign.json</span><br></pre></td></tr></table></figure><ul><li>3. Execute the reassignment</li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-reassign-partitions.sh --zookeeper <span class="variable">$ZK_HOSTS</span> --reassignment-json-file reassign.json --execute</span><br></pre></td></tr></table></figure><ul><li>4. Check the progress of the reassignment</li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./bin/kafka-reassign-partitions.sh --zookeeper <span class="variable">$ZK_HOSTS</span> --reassignment-json-file reassign.json --verify</span><br></pre></td></tr></table></figure>]]></content>
<summary type="html">
<h2><span id="kafka常见操作命令行集锦">Common <code>Kafka</code> command-line operations</span></h2><p><strong><em>Exported variable:</em></strong> <code>$ZK_HOSTS</code> is an exported variable defined in <code>~/.
</summary>
</entry>
</feed>