Consul是由HashiCorp开发的服务发现和配置管理框架,这公司还有一个耳熟能详的工具:Vargrant,是不是想说,“哦,原来是这个公司。”
本文是 consul getting started学习笔记。
Consul主要有以下几个组件。
- 服务发现(Service Discovery):服务提供者可以注册自己到Consul,服务使用者可以通过Consul查询到提供者,有http和dns两种方式。
- 健康检查(Health Checking):Consul客户端提供了几中不同的健康检查方案,比如:服务本身是不是500了?;服务所在机器的内存使用率是不是超过90%了,等等,这些信息提供给运维监控整个集群的健康状态,也可以被服务发现组件使用一次来将流量导入到健康的服务中。
- KV存储(Key/Value Store):应用可以使用Consul的分层kv存储干任何事情,比如:动态配置,特征标记,协调,leader选举等。KV存储的API是基于http的。
- 多数据中心(Multi Datacenter):Consul创造性的支持多数据中心,Consul的使用者无需多做任何工作。
Consul是分布式,高可用系统,基本架构如下:
- Consul agent:所有提供服务的节点都需要运行Consul agent(使用和查询服务的不需要),这个Agent主要负责服务本身和这个服务所在节点的
健康检查
。 - Consul servers:负责数据的存储和复制,servers之间选举出一个leader,虽然consul一个也可以工作,但是一般需要提供3至5个servers组成的servers集群。
查询服务的时候可以通过Agent也可以通过Server,agent会将查询路由给server然后返回结果。
Mac Os X
$ brew cask install consul
如果你没有cast插件,这样安装:
$ brew install caskroom/cask/brew-cask
验证是否安装成功
➜ consul
usage: consul [--version] [--help] <command> [<args>]
Available commands are:
agent Runs a Consul agent
configtest Validate config file
event Fire a new event
exec Executes a command on Consul nodes
force-leave Forces a member of the cluster to enter the "left" state
info Provides debugging information for operators
join Tell Consul agent to join cluster
keygen Generates a new encryption key
keyring Manages gossip layer encryption keys
leave Gracefully leaves the Consul cluster and shuts down
lock Execute a command holding a lock
maint Controls node or service maintenance mode
members Lists the members of a Consul cluster
monitor Stream logs from a Consul agent
reload Triggers the agent to reload configuration files
rtt Estimates network round trip time between nodes
version Prints the Consul version
watch Watch for changes in Consul
每一个consul都可以运行在客户端(client)和服务器(server)模式,生成环境建议部署3-5台consul服务器,否则数据丢失是不可恢复的。agent 客户端模式则非常轻量级,主要用来:注册服务,健康检查,路由查询到服务器。Agent必须在集群的每一台机器上部署。
可以用dev模式快速启动一个consul,主要用做本地测试,千万不要用于生产环境。
$ consul agent -dev
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'zhuoyikangdeMBP.lan'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 192.168.199.171 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2016/07/25 21:38:02 [INFO] raft: Node at 192.168.199.171:8300 [Follower] entering Follower state
2016/07/25 21:38:02 [INFO] serf: EventMemberJoin: zhuoyikangdeMBP.lan 192.168.199.171
2016/07/25 21:38:02 [INFO] serf: EventMemberJoin: zhuoyikangdeMBP.lan.dc1 192.168.199.171
2016/07/25 21:38:02 [INFO] consul: adding LAN server zhuoyikangdeMBP.lan (Addr: 192.168.199.171:8300) (DC: dc1)
2016/07/25 21:38:02 [INFO] consul: adding WAN server zhuoyikangdeMBP.lan.dc1 (Addr: 192.168.199.171:8300) (DC: dc1)
2016/07/25 21:38:02 [ERR] agent: failed to sync remote state: No cluster leader
2016/07/25 21:38:03 [WARN] raft: Heartbeat timeout reached, starting election
2016/07/25 21:38:03 [INFO] raft: Node at 192.168.199.171:8300 [Candidate] entering Candidate state
2016/07/25 21:38:03 [DEBUG] raft: Votes needed: 1
2016/07/25 21:38:03 [DEBUG] raft: Vote granted from 192.168.199.171:8300. Tally: 1
2016/07/25 21:38:03 [INFO] raft: Election won. Tally: 1
2016/07/25 21:38:03 [INFO] raft: Node at 192.168.199.171:8300 [Leader] entering Leader state
2016/07/25 21:38:03 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2016/07/25 21:38:03 [DEBUG] raft: Node 192.168.199.171:8300 updated peer set (2): [192.168.199.171:8300]
2016/07/25 21:38:03 [INFO] consul: cluster leadership acquired
2016/07/25 21:38:03 [INFO] consul: New leader elected: zhuoyikangdeMBP.lan
2016/07/25 21:38:03 [DEBUG] consul: reset tombstone GC to index 2
2016/07/25 21:38:03 [INFO] consul: member 'zhuoyikangdeMBP.lan' joined, marking health alive
2016/07/25 21:38:04 [INFO] agent: Synced service 'consul'
==> Failed to check for updates: Get https://checkpoint-api.hashicorp.com/v1/check/consul?arch=amd64&os=darwin&signature=&version=0.6.4: dial tcp: lookup checkpoint-api.hashicorp.com on 192.168.199.1:53: read udp 192.168.199.171:65100->192.168.199.1:53: i/o timeout
从日志可以看出两件事:
- consul运行在服务器模式。
- consul把自己选择为leader节点,进入leader state.
集群成员查看
✗ consul members
Node Address Status Type Build Protocol DC
zhuoyikangdeMBP.lan 192.168.199.171:8301 alive server 0.6.4 2 dc1
还可以加上-detailed
获取更详细的信息,
使用consl memebers是基于gossip协议
,并且是最终一致
的,也就是说输出的结果可能和实际的网络环境并不一样,如果你需要强一致性
的结果,用http接口:
curl localhost:8500/v1/catalog/nodes
[{"Node":"zhuoyikangdeMBP.lan","Address":"192.168.199.171","TaggedAddresses":{"wan":"192.168.199.171"},"CreateIndex":3,"ModifyIndex":4}]
另外consul还可以使用dns
接口,注意你必须确定把你的dns服务器指向为consul agent的dns服务,端口默认为8600
# dns的玩意儿没怎么看懂
#
dig @127.0.0.1 -p 8600 zhuoyikangdeMBP.lan
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 zhuoyikangdeMBP.lan
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 45459
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;zhuoyikangdeMBP.lan. IN A
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Jul 25 21:48:01 2016
;; MSG SIZE rcvd: 37
安全停止
Ctrl-C
可以让Consul平稳的退出,退出的时候该结点会通知集群中的其他节点它离开了。如果你粗暴的kill掉consul agent,集群中的其他节点会检查到该节点fail了。
- 当一个节点平稳退出:从它登记的服务和安全检查将从目录中移除。
- 当一个节点fail:它的健康状态被标记为
critial
,但是并不从目录中移除。consul会自动尝试重连fail的的节点,并且允许其恢复。
另外,当一个节点是服务节点时,平稳退出
将可以有效的避免潜在问题 CONSENSUS PROTOCOL。
安全运维请查看:CONSUL GUIDES/Adding/Removing Servers -。
服务可以通过两种形式注册:
- 提供一个
服务定义
。 - 通过HTTP APi。
实际上服务定义这种方式比较常用,操作如下:
# 新建一个服务定义文件夹
sudo mkdir /etc/consul.d
# 新建一个服务
echo '{"service": {"name": "web", "tags": ["rails"], "port": 80}}' \
>/etc/consul.d/web.json
# 启动
consul agent -dev -config-dir /etc/consul.d
# 从日志可以看出同步启动了一个web服务。
....
2016/07/25 22:09:14 [INFO] agent: Synced service 'web'
...
dns 方法
$ dig @127.0.0.1 -p 8600 web.service.consul
...
;; QUESTION SECTION:
;web.service.consul. IN A
;; ANSWER SECTION:
web.service.consul. 0 IN A 172.20.20.11
可以看出找到一条A
记录,可以通过SRV
获取到主机信息
dig @127.0.0.1 -p 8600 web.service.consul SRV
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 web.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19045
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;web.service.consul. IN SRV
;; ANSWER SECTION:
web.service.consul. 0 IN SRV 1 1 80 209-160-124-64.fwd.paradisenetworks.net.node.dc1.consul.
;; ADDITIONAL SECTION:
209-160-124-64.fwd.paradisenetworks.net.node.dc1.consul. 0 IN A 192.168.199.171
;; Query time: 2 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Jul 25 22:12:40 2016
;; MSG SIZE rcvd: 200
可以看出输出了该服务运行在209-160-124-64.fwd.paradisenetworks.net.node.dc1.consul.
节点。
最后,可以用tags来过滤记录:TAG.NAME.service.consul
dig @127.0.0.1 -p 8600 rails.web.service.consul
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 rails.web.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41235
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;rails.web.service.consul. IN A
;; ANSWER SECTION:
rails.web.service.consul. 0 IN A 192.168.199.171
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Jul 25 22:15:06 2016
;; MSG SIZE rcvd: 82
可以用http api查询。
curl http://localhost:8500/v1/catalog/service/web
[{"Node":"209-160-124-64.fwd.paradisenetworks.net","Address":"192.168.199.171","ServiceID":"web","ServiceName":"web","ServiceTags":["rails"],"ServiceAddress":"","ServicePort":80,"ServiceEnableTagOverride":false,"CreateIndex":5,"ModifyIndex":5}]%
查询其健康状况:
curl 'http://localhost:8500/v1/health/service/web?passing'
[{"Node":{"Node":"209-160-124-64.fwd.paradisenetworks.net","Address":"192.168.199.171","TaggedAddresses":{"wan":"192.168.199.171"},"CreateIndex":3,"ModifyIndex":5},"Service":{"ID":"web","Service":"web","Tags":["rails"],"Address":"","Port":80,"EnableTagOverride":false,"CreateIndex":5,"ModifyIndex":5},"Checks":[{"Node":"209-160-124-64.fwd.paradisenetworks.net","CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":"","CreateIndex":3,"ModifyIndex":3}]}]%
修改配置文件后发送SIGHUP
消息给Agent,Agent会自动更新配置。
本章将通过Vagrant说明下consul集群的搭建。
首先你要去下载Vagrant配置。
启动两个节点:
➜ vagrant up
.......
......
➜ vagrant status
Current machine states:
n1 running (virtualbox)
n2 running (virtualbox)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
➜ vagrant
启动成功后,ssh登录到节点1。
vagrant ssh n1
上文中使用-dev
用于测试,这个参数在集群环境中直接干掉。
- 任何consul集群中的都必须有个名字,默认使用主机名,可以使用
-node command-line option
覆盖。 - -bootstrap-expect 这个参数用来提示consul需要加入的剩余节点个数,直到所有的节点都加入后,consul才会开始进行某些集群工作。
- -bind address:指定监听地址,如果不指定,consul会默认监听系统中的第一个私有IP,但是你最好指定一个。
vagrant@n1:~$ consul agent -server -bootstrap-expect 1 \
-data-dir /tmp/consul -node=agent-one -bind=172.20.20.10 \
-config-dir /etc/consul.d
...
然后
vagrant ssh n2
consul agent -data-dir /tmp/consul -node=agent-two \
-bind=172.20.20.11 -config-dir /etc/consul.d
但是这个时候两个节点彼此不认识,必须把它们连在一起:
$ vagrant ssh n1
...
vagrant@n1:~$ consul join 172.20.20.11
Successfully joined cluster by contacting 1 nodes.
vagrant@n2:~$ consul members
Node Address Status Type Build Protocol
agent-two 172.20.20.11:8301 alive client 0.5.0 2
agent-one 172.20.20.10:8301 alive server 0.5.0 2
To join a cluster, a Consul agent only needs to learn about one existing member. After joining the cluster, the agents gossip with each other to propagate full membership information.
上面是手动加入,比较麻烦,使用Atlas by HashiCorp可以自动加入:
$ consul agent -atlas-join \
-atlas=ATLAS_USERNAME/infrastructure \
-atlas-token="YOUR_ATLAS_TOKEN"
你需要创建一个atlas用户,并且替换上面的参数。
可以像查询服务一样查询节点:
vagrant@n1:~$ dig @127.0.0.1 -p 8600 agent-two.node.consul
...
;; QUESTION SECTION:
;agent-two.node.consul. IN A
;; ANSWER SECTION:
agent-two.node.consul. 0 IN A 172.20.20.11
两种方式,定义文件和http.
vagrant@n2:~$ echo '{"check": {"name": "ping",
"script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' \
>/etc/consul.d/ping.json
vagrant@n2:~$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
"check": {"script": "curl localhost >/dev/null 2>&1", "interval": "10s"}}}' \
>/etc/consul.d/web.json
vagrant@n1:~$ curl http://localhost:8500/v1/health/state/critical
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
Consul还提供了简单的Key/Value存储,这个功能可以用来存储动态配置,帮助服务协调,建立leader选举等。
本地需要先启动一个agent哟。
consul agent -dev -config-dir /etc/consul.d -bind 127.0.0.1
curl -v http://127.0.0.1:8500/v1/kv/\?recurse
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8500 (#0)
> GET /v1/kv/?recurse HTTP/1.1
> Host: 127.0.0.1:8500
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< X-Consul-Index: 1
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Tue, 26 Jul 2016 01:06:35 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
Put 一个数据
$ curl -X PUT -d 'test' http://127.0.0.1:8500/v1/kv/web/key1
true
$ curl -X PUT -d 'test' http://127.0.0.1:8500/v1/kv/web/key2?flags=42
true
$ curl -X PUT -d 'test' http://127.0.0.1:8500/v1/kv/web/sub/key3
true
$ curl http://127.0.0.1:8500/v1/kv/?recurse
[{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1","Flags":0,"Value":"dGVzdA=="},
{"CreateIndex":98,"ModifyIndex":98,"Key":"web/key2","Flags":42,"Value":"dGVzdA=="},
{"CreateIndex":99,"ModifyIndex":99,"Key":"web/sub/key3","Flags":0,"Value":"dGVzdA=="}]
上面建立了3个Key,所有的数据都是test,注意数据是Base64编码的。可以通过Get来获取。
curl http://127.0.0.1:8500/v1/kv/web/key1
[{"LockIndex":0,"Key":"web/key1","Flags":0,"Value":"dGVzdA==","CreateIndex":8,"ModifyIndex":8}]%
也可以通过Delete删除
$ curl -X DELETE http://127.0.0.1:8500/v1/kv/web/sub?recurse
$ curl http://127.0.0.1:8500/v1/kv/web?recurse
[{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1","Flags":0,"Value":"dGVzdA=="},
{"CreateIndex":98,"ModifyIndex":98,"Key":"web/key2","Flags":42,"Value":"dGVzdA=="}]
还可以通过Put修改:
$ curl -X PUT -d 'newval' http://localhost:8500/v1/kv/web/key1?cas=8
true
$ curl -X PUT -d 'newval' http://localhost:8500/v1/kv/web/key1?cas=8
false
?cas=
表示使用consul的Check-And-Set操作,这个操作是原子的。cas的值为这条数据的ModifyIndex
。
第1条操作更新成功,因为ModifyIndex
为8,第2条操作更新失败,因为ModifyIndex
已经不是8了。
也可以通过index
参数指定到ModifyIndex
大于某个值为止:
$ curl "http://localhost:8500/v1/kv/web/key2?index=101&wait=5s"
[{"CreateIndex":98,"ModifyIndex":101,"Key":"web/key2","Flags":42,"Value":"dGVzdA=="}]
上面的数据会等待ModifyIndex
大于101,最多等待5s,这对于等待Key修改非常有用。
Consul提供漂亮的UI,可以用来观察所有的服务和节点,UI也自动支持'多数据中心'。
有两种方式使用UI:
启动时候要指定Atls相关参数
consul agent -atlas=ATLAS_USERNAME/demo -atlas-token="ATLAS_TOKEN"
consul agent -dev -config-dir /etc/consul.d -bind 127.0.0.1 -ui