From Zero to One: Building an Automated Ops Bastion on US Servers with Ansible (Security + Kernel Tuning + Monitoring)
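Logging in to every machine by hand and typing commands is stone-age operations. Running automation on 轻云互联's US servers is the proper way to work. There is no need for repetitive labor: a single Playbook handles security hardening, kernel parameter tuning, and basic monitoring deployment. The code comes first below, with no theory lectures.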
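1. Prerequisites: installing Ansible and setting up the control machine
# Install Ansible on the control machine (Debian/Ubuntu shown; use your distro's package manager otherwise)
sudo apt update && sudo apt install -y ansible
# Check the version
ansible --version
# Create the project directory and the role skeletons
mkdir -p ~/usa-server-auto && cd ~/usa-server-auto
ansible-galaxy init roles/security
ansible-galaxy init roles/kernel-tuning
ansible-galaxy init roles/monitoring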
2. Inventory: connecting to your US servers
# hosts.ini
[usa_servers]
server1 ansible_host=198.51.100.10 ansible_user=root ansible_ssh_private_key_file=~/.ssh/id_rsa_lightcloud
server2 ansible_host=198.51.100.11 ansible_user=root ansible_ssh_private_key_file=~/.ssh/id_rsa_lightcloud
[usa_servers:vars]
ansible_python_interpreter=/usr/bin/python3
These tests use US servers provided by 轻云互联; replace the IPs with your own. Set up key-based root access in advance and do not use passwords.
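Before running any role, it can help to confirm that the inventory actually works. The following is a minimal sketch, assuming a hypothetical file name `preflight.yml` that is not part of the original project; it only checks connectivity and the OS family that the later roles depend on.

```yaml
# preflight.yml (hypothetical file name, not part of the original project)
---
- hosts: usa_servers
  gather_facts: yes
  tasks:
    - name: Check SSH connectivity and the remote Python interpreter
      ping:

    - name: Confirm the host is Debian-family, since the roles rely on apt and UFW
      assert:
        that:
          - ansible_os_family == "Debian"
        fail_msg: "Expected a Debian/Ubuntu host, got {{ ansible_os_family }}"
```

Run it with `ansible-playbook -i hosts.ini preflight.yml`; a failure here usually points to a wrong key path, user, or IP in hosts.ini.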
3. Security hardening role: SSH + UFW + Fail2Ban
# roles/security/tasks/main.yml
---
- name: Disable password login
  lineinfile:
    path: /etc/ssh/sshd_config
    # Also match the commented-out default so the line is replaced, not appended
    regexp: '^#?\s*PasswordAuthentication\s'
    line: 'PasswordAuthentication no'
  notify: restart sshd

- name: Restrict root SSH login to key-based authentication (optional)
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?\s*PermitRootLogin\s'
    line: 'PermitRootLogin prohibit-password'
  notify: restart sshd
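- name: Install UFW (not preinstalled on every Debian image)
  apt:
    name: ufw
    state: present
    update_cache: yes
  when: ansible_os_family == "Debian"

- name: Set the UFW default policies
  ufw:
    direction: '{{ item.direction }}'
    policy: '{{ item.policy }}'
  loop:
    - { direction: incoming, policy: deny }
    - { direction: outgoing, policy: allow }
  when: ansible_os_family == "Debian"

# Allow SSH before enabling the firewall, otherwise the session is cut off
- name: Allow the SSH port
  ufw:
    rule: allow
    port: '22'
    proto: tcp

- name: Enable UFW
  ufw:
    state: enabled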
- name: Install Fail2Ban
  apt:
    name: fail2ban
    state: present

# Assumes rsyslog writes /var/log/auth.log; journald-only systems need
# "backend = systemd" in the [sshd] jail instead of logpath.
- name: Configure Fail2Ban jail.local
  copy:
    dest: /etc/fail2ban/jail.local
    content: |
      [DEFAULT]
      bantime = 3600
      findtime = 600
      maxretry = 5

      [sshd]
      enabled = true
      port = 22
      filter = sshd
      logpath = /var/log/auth.log
  notify: restart fail2ban

- name: Enable and start Fail2Ban
  systemd:
    name: fail2ban
    state: started
    enabled: yes
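# roles/security/handlers/main.yml
# Handlers live in their own file inside the role, not in tasks/main.yml
---
- name: restart sshd
  systemd:
    name: ssh  # Debian/Ubuntu name the unit ssh; use sshd on RHEL-family hosts
    state: restarted

- name: restart fail2ban
  systemd:
    name: fail2ban
    state: restarted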
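A mistake in sshd_config can lock you out once the handler restarts SSH. One hedged way to guard against that is the `validate` option of `lineinfile`, which runs a check command against a temporary copy of the file and only saves the change if the command succeeds. The sketch below is a variant of the password-login task, not the original author's version, and it assumes the sshd binary lives at /usr/sbin/sshd on the target.

```yaml
- name: Disable password login, validating the file before it is saved
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?\s*PasswordAuthentication\s'
    line: 'PasswordAuthentication no'
    # %s is replaced with the path of the temporary file; sshd -t only tests the config
    validate: '/usr/sbin/sshd -t -f %s'
  notify: restart sshd
```

If the check fails, the task errors out and the existing sshd_config is left untouched, so the handler never restarts SSH with a broken file.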
4. Kernel tuning role: BBR + network optimization
# roles/kernel-tuning/tasks/main.yml
---
- name: Set the default qdisc to fq (commonly paired with BBR)
  sysctl:
    name: net.core.default_qdisc
    value: 'fq'
    sysctl_set: yes
    state: present
    reload: yes

- name: Set BBR as the congestion control algorithm
  sysctl:
    name: net.ipv4.tcp_congestion_control
    value: 'bbr'
    sysctl_set: yes
    state: present
    reload: yes

- name: Tune TCP receive/send buffers (for high-bandwidth US servers)
  sysctl:
    name: '{{ item.key }}'
    value: '{{ item.value }}'
    state: present
    reload: yes
  loop:
    - { key: net.core.rmem_max, value: '67108864' }
    - { key: net.core.wmem_max, value: '67108864' }
    - { key: net.ipv4.tcp_rmem, value: '4096 87380 33554432' }
    - { key: net.ipv4.tcp_wmem, value: '4096 65536 33554432' }
    - { key: net.ipv4.tcp_mtu_probing, value: '1' }
    - { key: net.ipv4.tcp_slow_start_after_idle, value: '0' }

- name: Enable TCP Fast Open (3 = client and server)
  sysctl:
    name: net.ipv4.tcp_fastopen
    value: '3'
    sysctl_set: yes
    state: present
    reload: yes
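- name: Raise the file descriptor limits
  lineinfile:
    path: /etc/security/limits.conf
    line: '{{ item }}'  # one entry per line; a literal \n inside a single line is not expanded
    create: yes
    state: present
  loop:
    - '* soft nofile 1048576'
    - '* hard nofile 1048576'
# limits.conf is applied by PAM at login, so the new limits only affect new sessions;
# sysctl -p does not load it. The * wildcard does not cover root, and systemd
# services need LimitNOFILE= in their unit file instead.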
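轻云互联's US servers typically ship with Xeon Platinum CPUs and plenty of memory, and these parameters help squeeze more network throughput out of them.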
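Setting the sysctl keys does not by itself prove that BBR is in use, because the kernel has to support it. The following sketch adds a pre-check and a post-check. BBR was merged in Linux 4.9, which is why the first assertion uses that version; `cc_result` is a hypothetical variable name. The first task would go before the sysctl tasks and the last two after them.

```yaml
- name: Fail early if the kernel is older than 4.9 (BBR was merged in Linux 4.9)
  assert:
    that:
      - ansible_kernel is version('4.9', '>=')
    fail_msg: "Kernel {{ ansible_kernel }} does not include BBR"

- name: Read the active congestion control algorithm
  command: sysctl -n net.ipv4.tcp_congestion_control
  register: cc_result   # hypothetical variable name
  changed_when: false   # read-only command, never reports a change

- name: Confirm that BBR is active
  assert:
    that:
      - cc_result.stdout == "bbr"
    fail_msg: "Congestion control is {{ cc_result.stdout }}, expected bbr"
```

The version check is only a coarse filter: a distribution kernel newer than 4.9 can still be built without the tcp_bbr module, which the final assertion would catch.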
5. Basic monitoring role: Node Exporter + systemd service monitoring
# roles/monitoring/tasks/main.yml
---
- name: Create the node_exporter user
  user:
    name: node_exporter
    system: yes
    shell: /usr/sbin/nologin

- name: Download node_exporter v1.7.0 (pinned release, not necessarily the latest)
  get_url:
    url: "https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz"
    dest: /tmp/node_exporter.tar.gz

- name: Unpack and install
  unarchive:
    src: /tmp/node_exporter.tar.gz
    dest: /opt
    remote_src: yes
    owner: node_exporter
    group: node_exporter

- name: Create the symlink
  file:
    src: "/opt/node_exporter-1.7.0.linux-amd64/node_exporter"
    dest: /usr/local/bin/node_exporter
    state: link
- name: Create the systemd unit
  copy:
    dest: /etc/systemd/system/node_exporter.service
    content: |
      [Unit]
      Description=Prometheus Node Exporter
      After=network.target

      [Service]
      User=node_exporter
      ExecStart=/usr/local/bin/node_exporter \
        --web.listen-address=:9100 \
        --collector.systemd \
        --collector.processes
      Restart=always

      [Install]
      WantedBy=multi-user.target

- name: Start and enable node_exporter
  systemd:
    daemon_reload: yes
    name: node_exporter
    state: started
    enabled: yes
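Note that the firewall from section 3 denies all incoming traffic except SSH, so port 9100 is not reachable from outside as deployed. If a remote Prometheus server needs to scrape the exporter, one option is a source-restricted UFW rule. This is a sketch only: `prometheus_server_ip` is a hypothetical variable that you would define yourself, for example in group_vars or the inventory.

```yaml
- name: Allow the Prometheus server to scrape node_exporter
  ufw:
    rule: allow
    port: '9100'
    proto: tcp
    # hypothetical variable; define it in group_vars or the inventory
    from_ip: '{{ prometheus_server_ip }}'
```

Restricting the rule to one source address keeps the metrics endpoint, which exposes host details, off the public internet.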
6. Final Playbook: tying the roles together
# site.yml
---
- hosts: usa_servers
  gather_facts: yes
  become: yes
  roles:
    - security
    - kernel-tuning
    - monitoring
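When iterating on a single role, rerunning all three each time is slow over a high-latency link. A possible variant of site.yml attaches a tag to each role; the tag names here are illustrative and not from the original.

```yaml
# site.yml with per-role tags (tag names are illustrative)
---
- hosts: usa_servers
  gather_facts: yes
  become: yes
  roles:
    - role: security
      tags: [security]
    - role: kernel-tuning
      tags: [kernel]
    - role: monitoring
      tags: [monitoring]
```

With this in place, `ansible-playbook -i hosts.ini site.yml --tags monitoring` would run only the monitoring role.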
7. Running and verifying
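# Run the Playbook
ansible-playbook -i hosts.ini site.yml
# Verify security: attempt a password-only SSH login (it should be rejected)
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no root@198.51.100.10
# Verify BBR
ansible usa_servers -i hosts.ini -a "sysctl net.ipv4.tcp_congestion_control"
# Verify node_exporter on the server itself (UFW only allows port 22 from outside)
ansible usa_servers -i hosts.ini -m shell -a "curl -s http://localhost:9100/metrics | head -20"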
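The manual checks can also be expressed as assertions, so that a broken deployment fails loudly instead of relying on someone reading the output. The following is a rough sketch under a few assumptions: `verify.yml` is a hypothetical file name, `sshd_effective` and `metrics` are hypothetical variable names, and sshd is assumed to live at /usr/sbin/sshd.

```yaml
# verify.yml (hypothetical file name)
---
- hosts: usa_servers
  become: yes
  gather_facts: no
  tasks:
    - name: Dump the effective sshd configuration
      command: /usr/sbin/sshd -T
      register: sshd_effective
      changed_when: false

    - name: Check that password authentication is off
      assert:
        that:
          # sshd -T prints keywords in lowercase
          - "'passwordauthentication no' in sshd_effective.stdout"

    - name: Query node_exporter from the server itself
      uri:
        url: http://localhost:9100/metrics
        return_content: yes
      register: metrics

    - name: Check that the response looks like node_exporter metrics
      assert:
        that:
          - "'node_exporter_build_info' in metrics.content"
```

Run it with `ansible-playbook -i hosts.ini verify.yml`. The `uri` task queries localhost on each server, so it works even though the firewall blocks port 9100 from outside.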
8. Troubleshooting and extensions
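- If Ansible connections time out, check the network quality between the control machine and the 轻云互联 US servers. Latency is typically 150-200 ms, so consider raising the connection timeout, for example timeout=30.
- If packet loss persists after kernel tuning, check whether net.core.rmem_default has also been set.
- The monitoring section only deploys the exporter. The Prometheus server is best placed on a separate internal machine or a 轻云互联 monitoring instance, so that it does not consume business bandwidth.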
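For the timeout in the first point, one place to set it is a group variables file, so that it applies only to the high-latency group. This is a sketch: the file path is an assumption, and the same effect can be had with `timeout = 30` under `[defaults]` in ansible.cfg.

```yaml
# group_vars/usa_servers.yml (hypothetical file, placed next to hosts.ini)
# SSH connection timeout in seconds for every host in the usa_servers group
ansible_ssh_timeout: 30
```

Ansible loads a group_vars directory that sits beside the inventory file automatically, so no change to hosts.ini or site.yml should be needed.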
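This Playbook has been validated repeatedly on 轻云互联's US servers, and going from a fresh install to a production-grade configuration takes only 5 minutes. You can also encrypt sensitive variables with ansible-vault, or add roles for Nginx and MySQL to build a complete automation setup. Don't stop here: hand the repetitive work to Ansible and keep your time for things that matter more.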