
Ansible Automation for Operations: Getting Started and Practice

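Why Automate

Managing 100 servers by hand is a disaster; managing 1000 servers with automation is routine.

Ansible is the Swiss Army knife of operations automation: it is agentless, works over SSH, and uses YAML syntax, so the learning curve is gentle while the tool remains powerful.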

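Core Concepts at a Glance

Concept     Description
Inventory   Host list that defines which servers are managed
Module      Atomic unit of work (e.g. copy, service, yum)
Playbook    YAML file that orchestrates tasks
Role        Reusable unit for organizing Playbook content
Facts       Host information gathered automatically (CPU, memory, OS, etc.)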

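Quick Installation and Configuration

# Install Ansible
pip install ansible

# Verify the installation
ansible --version

# Basic configuration (this overwrites any existing ~/.ansible.cfg)
cat > ~/.ansible.cfg << 'EOF'
[defaults]
# Skips SSH host key verification: convenient in a lab, risky in production
host_key_checking = False
# Relative paths in this file resolve against the directory that contains it
inventory = ./hosts
forks = 20
timeout = 10
gathering = smart
EOF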

Inventory

# hosts: static inventory
[webservers]
web01 ansible_host=10.0.1.11 ansible_user=root
web02 ansible_host=10.0.1.12 ansible_user=root
web03 ansible_host=10.0.1.13 ansible_user=root

[dbservers]
db01 ansible_host=10.0.2.11 ansible_user=root
db02 ansible_host=10.0.2.12 ansible_user=root

[prod:children]
webservers
dbservers

[prod:vars]
ansible_python_interpreter=/usr/bin/python3
monitoring_enabled=true
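
The same inventory can also be written in YAML. The sketch below mirrors the INI file above; the file name `inventory/hosts.yml` is an assumption, not something the original setup prescribes. One practical difference is that YAML values keep their native types, so `monitoring_enabled` is a real boolean here, whereas values in an INI `[group:vars]` section are read as strings.

```yaml
# inventory/hosts.yml (hypothetical file name): YAML form of the static inventory
all:
  children:
    prod:
      children:
        webservers:
          hosts:
            web01: {ansible_host: 10.0.1.11, ansible_user: root}
            web02: {ansible_host: 10.0.1.12, ansible_user: root}
            web03: {ansible_host: 10.0.1.13, ansible_user: root}
        dbservers:
          hosts:
            db01: {ansible_host: 10.0.2.11, ansible_user: root}
            db02: {ansible_host: 10.0.2.12, ansible_user: root}
      vars:
        ansible_python_interpreter: /usr/bin/python3
        monitoring_enabled: true   # a boolean, not the string "true"
```

Running `ansible-inventory -i inventory/hosts.yml --graph` should print the resulting group tree, which is a quick way to confirm that the groups nest as intended.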

Dynamic Inventory (Essential in Production)

# inventory/aliyun.yml: fetch hosts dynamically from the Alibaba Cloud API
plugin: alibaba.alicloud.ali
regions:
  - cn-hangzhou
filters:
  vpc_id: vpc-xxx
keyed_groups:
  - key: tags.Role
    prefix: role

Ad-Hoc Commands (One-Off Operations)

# Run uptime on all web servers
ansible webservers -m command -a "uptime"

# Run in parallel (-f sets the number of forks)
ansible webservers -f 20 -m shell -a "free -m"

# Copy a file to all hosts
ansible prod -m copy -a "src=/etc/nginx/nginx.conf dest=/etc/nginx/nginx.conf backup=yes"

# Restart a service
ansible webservers -m systemd -a "name=nginx state=restarted"

# View the gathered Facts
ansible web01 -m setup | less

# Show only specific facts
ansible web01 -m setup -a "filter=ansible_memory_mb"
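
The facts shown by the `setup` module are also available as variables inside a play. As a minimal sketch (the playbook file name is hypothetical), the following play prints a few of them for each web server:

```yaml
---
# playbooks/show-facts.yml (hypothetical file name)
- name: Report memory and CPU facts
  hosts: webservers
  gather_facts: true        # facts must be gathered for the variables below to exist
  tasks:
    - name: Print selected facts
      ansible.builtin.debug:
        msg: >-
          {{ inventory_hostname }}:
          {{ ansible_memory_mb.real.total }} MB RAM,
          {{ ansible_processor_vcpus }} vCPUs,
          {{ ansible_distribution }} {{ ansible_distribution_version }}
```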

Playbooks in Practice

Basic Structure

---
# playbooks/nginx-setup.yml
- name: Install and configure Nginx
  hosts: webservers
  become: yes
  vars:
    nginx_port: 80
    nginx_worker_processes: auto

  tasks:
    - name: Install Nginx
      yum:
        name: nginx
        state: latest

    - name: Deploy the configuration template
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        backup: yes
      notify: restart nginx

    - name: Ensure the service is running
      systemd:
        name: nginx
        state: started
        enabled: yes

  handlers:
    - name: restart nginx
      systemd:
        name: nginx
        state: restarted
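
One possible refinement of the template task, sketched here as a standalone play: the `template` module accepts a `validate` command that runs against the rendered file before it replaces the destination, with `%s` standing for the temporary file path. This assumes `nginx` is already installed on the target and that the rendered file is a complete, self-contained configuration.

```yaml
---
- name: Deploy Nginx config with a syntax check (sketch)
  hosts: webservers
  become: true
  tasks:
    - name: Deploy the configuration template only if it passes nginx -t
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        backup: true
        validate: nginx -t -c %s   # %s is replaced with the temporary rendered file
      notify: restart nginx

  handlers:
    - name: restart nginx
      ansible.builtin.systemd:
        name: nginx
        state: restarted
```

If validation fails, the task fails and the existing `/etc/nginx/nginx.conf` is left in place, so the handler is never notified with a broken configuration.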

Using Jinja2 Templates

# templates/nginx.conf.j2
user nginx;
worker_processes {{ nginx_worker_processes }};
error_log /var/log/nginx/error.log;

events {
    worker_connections {{ ansible_processor_vcpus * 1024 }};
}

http {
    server {
        listen {{ nginx_port }};
        server_name {{ inventory_hostname }};

        location / {
            root /usr/share/nginx/html;
        }

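        # Conditional rendering. The bool filter is needed because values set in
        # an INI [group:vars] section arrive as strings, and "false" is truthy.
{% if monitoring_enabled | default(false) | bool %}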
        location /nginx_status {
            stub_status on;
            allow 127.0.0.1;
            deny all;
        }
{% endif %}
    }
}

Batch Updates: Zero-Downtime Rolling Deployment

---
# playbooks/rolling-update.yml
- name: Nginx rolling update
  hosts: webservers
  serial: 1                     # Operate on only 1 host at a time
  become: yes
  vars:
    deploy_version: "v2.1.0"

  pre_tasks:
    # Assumes a 'loadbalancers' group is defined in the inventory
    - name: Remove the host from the load balancer
      haproxy:
        state: disabled
        host: "{{ inventory_hostname }}"
        socket: /var/run/haproxy.sock
        backend: web_backend
      delegate_to: "{{ item }}"
      with_items: "{{ groups['loadbalancers'] }}"

  tasks:
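    - name: Deploy the new version
      copy:
        src: "/data/builds/{{ deploy_version }}/"
        dest: /usr/share/nginx/html/
      notify: reload nginx

    # Run the notified handler now, so the health check sees the reloaded service
    - meta: flush_handlers

    - name: Health check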
      uri:
        url: "http://{{ inventory_hostname }}/health"
        status_code: 200
      register: health_result
      retries: 10
      delay: 3
      until: health_result.status == 200

  post_tasks:
    - name: Re-enable the host in the load balancer
      haproxy:
        state: enabled
        host: "{{ inventory_hostname }}"
        socket: /var/run/haproxy.sock
        backend: web_backend
      delegate_to: "{{ item }}"
      with_items: "{{ groups['loadbalancers'] }}"

  handlers:
    - name: reload nginx
      systemd:
        name: nginx
        state: reloaded
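
`serial` also accepts a list, which allows a canary-style rollout: one host first, then larger batches. The play below is a sketch of that idea with a placeholder task standing in for the real deploy steps; `any_errors_fatal` is added on the assumption that a failure in any batch should stop the whole rollout.

```yaml
---
- name: Rolling update with growing batches (sketch)
  hosts: webservers
  become: true
  serial:
    - 1        # canary: a single host first
    - "50%"    # then half of the group per batch
  any_errors_fatal: true   # a failure stops the play, including later batches
  tasks:
    - name: Placeholder for the deploy steps
      ansible.builtin.debug:
        msg: >-
          Updating {{ inventory_hostname }}
          (batch size {{ ansible_play_batch | length }})
```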

Role Directory Structure

roles/
└── common/
    ├── tasks/
    │   └── main.yml          # Entry-point tasks
    ├── handlers/
    │   └── main.yml          # Handlers
    ├── templates/
    │   └── sysctl.conf.j2    # Jinja2 template
    ├── files/
    │   └── rpm-gpg-keys/     # Static files
    ├── vars/
    │   └── main.yml          # Variables (high precedence)
    ├── defaults/
    │   └── main.yml          # Default variables (low precedence)
    └── meta/
        └── main.yml          # Dependencies and metadata
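
To show how the precedence levels play out, here is a minimal sketch of a role default being overridden from a playbook. The variable `common_timezone` and the file `site.yml` are hypothetical names chosen for illustration; `site.yml` is assumed to sit next to the `roles/` directory so the role is found without extra configuration.

```yaml
# roles/common/defaults/main.yml
# Low-precedence default; anything set by the caller wins over this value.
common_timezone: UTC
---
# site.yml (hypothetical, placed alongside roles/)
- name: Apply the common role to production
  hosts: prod
  become: true
  roles:
    - role: common
      vars:
        common_timezone: Asia/Shanghai   # overrides the role default
```

Values that callers are expected to tune belong in `defaults/main.yml`; values placed in `vars/main.yml` are much harder to override and suit internal constants of the role.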

Common Modules Quick Reference

# Package management
- yum: name=nginx state=latest         # RHEL/CentOS
- apt: name=nginx state=latest         # Debian/Ubuntu
- pip: name=ansible state=latest       # Python packages

# File operations
- copy: src=/local/file dest=/remote/file backup=yes
- template: src=config.j2 dest=/etc/app.conf
- lineinfile: path=/etc/hosts line="10.0.1.11 db01"
- blockinfile: path=/etc/ssh/sshd_config block="{{ lookup('file', 'sshd_block') }}"

# Command execution
- command: uptime                     # Bypasses the shell; safer
- shell: "ps aux | grep nginx"       # Runs through a shell; supports pipes
- script: /local/scripts/setup.sh    # Copies the local script to the host, then runs it

# System management
- user: name=deploy groups=wheel shell=/bin/bash
- group: name=deploy state=present
- cron: name="backup" minute=0 hour=2 job="/opt/backup.sh"
- systemd: name=nginx state=started enabled=yes

# File attributes
- file: path=/opt/app state=directory owner=deploy mode=0755
- stat: path=/etc/nginx/nginx.conf
  register: nginx_conf

# Conditional execution
- debug: msg="Upgrade needed"
  when: ansible_memory_mb.real.total < 2048

# Loops
- user: name="{{ item }}" state=present
  loop:
    - alice
    - bob
    - charlie
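
The one-line `key=value` form is compact for a cheat sheet. In longer playbooks the block form with fully qualified module names tends to be easier to read and review. The following is a sketch of a few of the tasks above rewritten that way; it could live in a tasks file such as `roles/common/tasks/main.yml`, and the final `debug` task is an added illustration of using a registered result.

```yaml
- name: Ensure the application directory exists
  ansible.builtin.file:
    path: /opt/app
    state: directory
    owner: deploy
    mode: "0755"          # quoted so YAML does not parse it as a number

- name: Create user accounts
  ansible.builtin.user:
    name: "{{ item }}"
    state: present
  loop:
    - alice
    - bob
    - charlie

- name: Check whether nginx.conf exists
  ansible.builtin.stat:
    path: /etc/nginx/nginx.conf
  register: nginx_conf

- name: Report when the config file is missing
  ansible.builtin.debug:
    msg: "nginx.conf not found on {{ inventory_hostname }}"
  when: not nginx_conf.stat.exists
```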

Production Best Practices

# 1. Encrypt sensitive data
ansible-vault encrypt secrets.yml
ansible-vault edit secrets.yml
ansible-playbook playbook.yml --ask-vault-pass

# 2. Check before you run (dry run)
ansible-playbook playbook.yml --check --diff

# 3. Limit the scope of a run
ansible-playbook playbook.yml --limit web01

# 4. Skip tags
ansible-playbook playbook.yml --skip-tags "restart,reboot"

# 5. Step through tasks one at a time
ansible-playbook playbook.yml --step

# 6. Readable output (not truncated)
ANSIBLE_STDOUT_CALLBACK=yaml ansible-playbook playbook.yml
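
For these flags to have something to act on, the playbook itself needs to load the vaulted file and tag its tasks. The play below is a sketch under several assumptions: `secrets.yml` defines a variable named `db_password`, the directory `/etc/app` already exists on the target, and the tag names match those passed to `--skip-tags`.

```yaml
---
# playbook.yml (sketch)
- name: Configure the application with vaulted secrets
  hosts: webservers
  become: true
  vars_files:
    - secrets.yml            # encrypted with ansible-vault; assumed to define db_password
  tasks:
    - name: Write database credentials
      ansible.builtin.copy:
        content: "DB_PASSWORD={{ db_password }}\n"
        dest: /etc/app/db.env   # hypothetical path
        owner: root
        mode: "0600"
      no_log: true           # keep the secret out of task output and logs
      tags: [config]

    - name: Restart nginx
      ansible.builtin.systemd:
        name: nginx
        state: restarted
      tags: [restart]
```

With this layout, `--skip-tags "restart,reboot"` would still write the credentials file but leave the service untouched.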

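Automation philosophy: any operation you need to repeat more than twice is worth writing as a Playbook. Do not rely on manual operations: people make mistakes, while a Playbook runs the same steps the same way every time.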
