Upgrading Ceph from 14.2.20 to 14.2.22
· ☕ 2 min · ✍️ starifly
Note: this document applies to minor-version upgrades within Ceph Nautilus (community Ceph 14.2.x as well as Red Hat Ceph Storage 4.x). It does not apply to major-version upgrades (for example, from Ceph Luminous to Ceph Nautilus).
Ceph cluster overview
The Ceph Nautilus cluster (deployed with Docker) includes the following roles:
Creating a new region for Ceph RGW
· ☕ 2 min · ✍️ starifly
Ceph RGW bucket index sharding
· ☕ 4 min · ✍️ starifly
Background
RGW maintains an index for every bucket that stores the metadata of all objects in that bucket. RGW itself has no efficient way to enumerate objects, so this index data is essential when serving requests such as listing every object in a bucket. The bucket index is also used for other purposes, such as keeping a journal for versioned objects, bucket quota metadata, and multi-site sync logs. The bucket index does not affect object reads, but writes and modifications do add some extra operations.
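To gauge whether a bucket's index needs more shards, radosgw-admin can report per-bucket and per-shard statistics. A minimal sketch, assuming a bucket named mybucket and that radosgw-admin is reachable (directly or through the mon container alias defined later in this blog):
# Show object counts and the current shard layout for one bucket
radosgw-admin bucket stats --bucket=mybucket
# Check every bucket against the objects-per-shard limit
radosgw-admin bucket limit check
# Queue a manual reshard to 64 index shards (the target count here is arbitrary)
radosgw-admin reshard add --bucket=mybucket --num-shards=64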
Monitoring a Ceph cluster with Grafana + Prometheus and alerting via DingTalk
· ☕ 3 min · ✍️ starifly
Before Ceph Luminous, you could use the third-party Prometheus exporter ceph_exporter. Starting with Ceph Luminous 12.2.1, the mgr ships with a Prometheus module that provides a built-in Prometheus ceph exporter, so the mgr's built-in exporter can be used directly as a Prometheus target.
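A minimal sketch of enabling that exporter and pointing Prometheus at it (the node address below is hypothetical):
# Enable the built-in exporter in ceph-mgr; it listens on port 9283 by default
ceph mgr module enable prometheus
# Then add the mgr node, e.g. 192.168.5.203:9283, as a scrape target in prometheus.yml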
Ceph three-node failure recovery
· ☕ 2 min · ✍️ starifly
The Ceph cluster has three nodes, each of which runs the mon, osd, mgr, rgw, and mds services in Docker containers. Suppose the cluster's configuration and authentication data (the /etc/ceph and /var/lib/ceph directories) have been backed up to another machine, and then all three nodes suffer system failures that take the whole cluster down. How do we recover the Ceph cluster?
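A minimal sketch of taking such a backup ahead of time (the destination path is an assumption):
# Archive the cluster configuration and daemon state on each node
tar czf /backup/ceph-$(hostname)-$(date +%F).tar.gz /etc/ceph /var/lib/ceph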
Handling common Ceph OSD failures
· ☕ 1 min · ✍️ starifly
The correct way to remove an OSD
· ☕ 1 min · ✍️ starifly
Basic usage of Ceph RBD
· ☕ 3 min · ✍️ starifly
Creating an RBD
Server-side operations
Create a pool
[root@ceph-node1 ~/mycluster]#ceph osd pool create rbd 64
pool 'rbd' created
Create a client account
# Create the client user
[root@ceph-node1 ~/mycluster]#ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children,allow rwx pool=rbd'
# View the user and its capabilities
[root@ceph-node1 ~/mycluster]#ceph auth get client.rbd
exported keyring for client.rbd
[client.rbd]
key = AQB6OAhfMN4jFhAAPmO17m5Z5gP5YC11JOJcTA==
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children,allow rwx pool=rbd"
# Export the client keyring
[root@ceph-node1 ~/mycluster]#ceph auth get client.rbd -o ./ceph.client.rbd.keyring
exported keyring for client.rbd
Enable the RBD application on the pool
[root@ceph-node1 ~/mycluster]#ceph osd pool application enable rbd rbd
enabled application 'rbd' on pool 'rbd'
Client-side operations
Install ceph-common
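The excerpt stops here. As a minimal sketch of the remaining client-side steps (the image name disk01 and its size are arbitrary; ceph.conf and the keyring exported above are assumed to have been copied to the client's /etc/ceph first):
# Install the Ceph client tools (CentOS)
yum install -y ceph-common
# Create a 1 GiB image and map it using the client.rbd user
rbd create rbd/disk01 --size 1024 --name client.rbd --keyring /etc/ceph/ceph.client.rbd.keyring
rbd map rbd/disk01 --name client.rbd --keyring /etc/ceph/ceph.client.rbd.keyring
# On older kernels the map may require disabling newer image features first:
# rbd feature disable rbd/disk01 object-map fast-diff deep-flatten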
Installing Ceph Nautilus with Docker
· ☕ 4 min · ✍️ starifly
Basic operating system configuration
- Create directories on all three nodes:
mkdir -p /etc/ceph /var/lib/ceph /var/log/ceph
- Configure scheduled time synchronization
systemctl start ntpd && systemctl enable ntpd
Resync the clock automatically every hour (crontab entry):
0 */1 * * * ntpdate ntp1.aliyun.com > /dev/null 2>&1; /sbin/hwclock -w
- Kernel tuning
# Tune kernel parameters
[root@CENTOS7-1 ~]# cat >> /etc/sysctl.conf << EOF
> kernel.pid_max=4194303
> vm.swappiness = 0
> EOF
[root@CENTOS7-1 ~]# sysctl -p
# read_ahead: improves disk reads by prefetching data into memory; 8192 is a reasonable value
[root@CENTOS7-1 ~]# echo "8192" > /sys/block/sda/queue/read_ahead_kb
# I/O scheduler tuning: use noop for SSDs and deadline for SATA/SAS devices.
[root@CENTOS7-1 ~]#echo "deadline" > /sys/block/sda/queue/scheduler
[root@CENTOS7-1 ~]#echo "noop" > /sys/block/sda/queue/scheduler
- Disable SELinux
# Edit /etc/selinux/config and set SELINUX to disabled to make the change permanent.
SELINUX=disabled
# Takes effect immediately (temporary):
setenforce 0
- Set the hostnames and update /etc/hosts
hostnamectl set-hostname ceph001
hostnamectl set-hostname ceph002
hostnamectl set-hostname ceph003
# vim /etc/hosts
192.168.5.203 ceph001
192.168.5.204 ceph002
192.168.5.205 ceph003
- Set up a command alias
echo 'alias ceph="docker exec mon ceph"' >> /etc/profile
source /etc/profile
- Prepare the disks
Each of the three nodes needs one spare disk; do not partition it (see the sketch below).
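A minimal sketch of wiping such a disk before handing it to Ceph (the device name /dev/sdb is an assumption):
# Remove any existing filesystem and partition-table signatures
wipefs --all /dev/sdb
sgdisk --zap-all /dev/sdb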
Using the Ceph librados library from C++
· ☕ 2 min · ✍️ starifly
Dependencies
# centos
yum install librados2-devel
Source code
#include <iostream>
#include <string>
#include <rados/librados.hpp>

int main(int argc, const char **argv)
{
    int ret = 0;

    /* Declare the cluster handle and required variables. */
    librados::Rados cluster;
    char cluster_name[] = "ceph";
    char user_name[] = "client.admin";
    uint64_t flags = 0;

    /* Initialize the cluster handle with the "ceph" cluster name and "client.admin" user */
    {
        ret = cluster.init2(user_name, cluster_name, flags);
        if (ret < 0) {
            std::cerr << "Couldn't initialize the cluster handle! error " << ret << std::endl;
            return EXIT_FAILURE;
        } else {
            std::cout << "Created a cluster handle." << std::endl;
        }
    }

    /* Read a Ceph configuration file to configure the cluster handle. */
    {
        ret = cluster.conf_read_file("/etc/ceph/ceph.conf");
        if (ret < 0) {
            std::cerr << "Couldn't read the Ceph configuration file! error " << ret << std::endl;
            return EXIT_FAILURE;
        } else {
            std::cout << "Read the Ceph configuration file." << std::endl;
        }
    }

    /* Read command line arguments */
    {
        ret = cluster.conf_parse_argv(argc, argv);
        if (ret < 0) {
            std::cerr << "Couldn't parse command line options! error " << ret << std::endl;
            return EXIT_FAILURE;
        } else {
            std::cout << "Parsed command line options." << std::endl;
        }
    }

    /* Connect to the cluster */
    {
        ret = cluster.connect();
        if (ret < 0) {
            std::cerr << "Couldn't connect to cluster! error " << ret << std::endl;
            return EXIT_FAILURE;
        } else {
            std::cout << "Connected to the cluster." << std::endl;
        }
    }

    librados::IoCtx io_ctx;
    const char *pool_name = "testpool";
    {
        ret = cluster.ioctx_create(pool_name, io_ctx);
        if (ret < 0) {
            std::cerr << "Couldn't set up ioctx! error " << ret << std::endl;
            exit(EXIT_FAILURE);
        } else {
            std::cout << "Created an ioctx for the pool." << std::endl;
        }
    }

    /* Write an object synchronously. */
    {
        librados::bufferlist bl;
        bl.append("Hello World!");
        ret = io_ctx.write_full("hw", bl);
        if (ret < 0) {
            std::cerr << "Couldn't write object! error " << ret << std::endl;
            exit(EXIT_FAILURE);
        } else {
            std::cout << "Wrote new object 'hw' " << std::endl;
        }
    }

    /*
     * Add an xattr to the object.
     */
    {
        librados::bufferlist lang_bl;
        lang_bl.append("en_US");
        ret = io_ctx.setxattr("hw", "lang", lang_bl);
        if (ret < 0) {
            std::cerr << "failed to set xattr version entry! error "
                      << ret << std::endl;
            exit(EXIT_FAILURE);
        } else {
            std::cout << "Set the xattr 'lang' on our object!" << std::endl;
        }
    }

    /*
     * Read the object back asynchronously.
     */
    {
        librados::bufferlist read_buf;
        int read_len = 4194304;
        // Create I/O Completion.
        librados::AioCompletion *read_completion = librados::Rados::aio_create_completion();
        // Send read request.
        ret = io_ctx.aio_read("hw", read_completion, &read_buf, read_len, 0);
        if (ret < 0) {
            std::cerr << "Couldn't start read object! error " << ret << std::endl;
            exit(EXIT_FAILURE);
        }
        // Wait for the request to complete, and check that it succeeded.
        read_completion->wait_for_complete();
        ret = read_completion->get_return_value();
        if (ret < 0) {
            std::cerr << "Couldn't read object! error " << ret << std::endl;
            exit(EXIT_FAILURE);
        } else {
            std::cout << "Read object hw asynchronously with contents.\n"
                      << read_buf.c_str() << std::endl;
        }
    }

    /*
     * Read the xattr.
     */
    {
        librados::bufferlist lang_res;
        ret = io_ctx.getxattr("hw", "lang", lang_res);
        if (ret < 0) {
            std::cerr << "failed to get xattr version entry! error "
                      << ret << std::endl;
            exit(EXIT_FAILURE);
        } else {
            std::cout << "Got the xattr 'lang' from object hw!"
                      << lang_res.c_str() << std::endl;
        }
    }

    /*
     * Remove the xattr.
     */
    {
        ret = io_ctx.rmxattr("hw", "lang");
        if (ret < 0) {
            std::cerr << "Failed to remove xattr! error "
                      << ret << std::endl;
            exit(EXIT_FAILURE);
        } else {
            std::cout << "Removed the xattr 'lang' from our object!" << std::endl;
        }
    }

    /*
     * Remove the object.
     */
    {
        ret = io_ctx.remove("hw");
        if (ret < 0) {
            std::cerr << "Couldn't remove object! error " << ret << std::endl;
            exit(EXIT_FAILURE);
        } else {
            std::cout << "Removed object 'hw'." << std::endl;
        }
    }

    io_ctx.close();
    cluster.shutdown();

    return 0;
}
Compile
g++ -g -c cephclient.cc -o cephclient.o -std=c++11
g++ -g cephclient.o -lrados -o cephclient
Run
Note: create the "testpool" pool before running, for example as shown below.
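A minimal sketch (the PG count of 64 is arbitrary; the pool name matches the pool_name hard-coded in the source):
ceph osd pool create testpool 64
./cephclient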
The relationship between PG and PGP in Ceph
· ☕ 4 min · ✍️ starifly
Preface
Creating a Ceph pool requires two parameters: the number of PGs and the number of PGPs. We covered PGs earlier, so what is the relationship between PGP and PG?
A PG (Placement Group) is a virtual grouping used to hold objects, while PGP (Placement Group for Placement purpose) determines the set of OSD combinations that PGs can be placed on. For example, assume a cluster with 3 OSDs (osd1, osd2, osd3) and a replica count of 2. If pgp_num = 1, there is only one possible OSD combination, say [osd1, osd2], so the primary and replica of every PG land on osd1 and osd2. If pgp_num = 2, there can be two combinations, say [osd1, osd2] and [osd1, osd3], and each PG's primary and replica will land on one of those two sets. This is much like permutations and combinations in mathematics: a PG is the virtual group that objects belong to, and PGP is the number of OSD combinations available to those PGs. In general, a pool's pg_num and pgp_num are set to the same value.
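A minimal sketch of setting both values when creating a pool and raising them together later (the pool name and counts are arbitrary):
# pg_num and pgp_num are the second and third arguments
ceph osd pool create mypool 128 128
# When increasing pg_num, raise pgp_num to match so the data actually rebalances
ceph osd pool set mypool pg_num 256
ceph osd pool set mypool pgp_num 256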
Installing Ceph Luminous with Docker
· ☕ 4 min · ✍️ starifly
This article installs Ceph 12.2.13 with Docker on CentOS 7.
Basic operating system configuration
- Create directories on all three nodes:
mkdir -p /etc/ceph/ /var/lib/ceph/ /var/log/ceph/
chown -R 167:167 /var/log/ceph/
- Configure scheduled time synchronization
systemctl start ntpd && systemctl enable ntpd
# Resync the clock automatically every hour
0 */1 * * * ntpdate ntp1.aliyun.com > /dev/null 2>&1; /sbin/hwclock -w
- Kernel tuning
# Tune kernel parameters
[root@CENTOS7-1 ~]# cat >> /etc/sysctl.conf << EOF
> kernel.pid_max=4194303
> vm.swappiness = 0
> EOF
[root@CENTOS7-1 ~]# sysctl -p
# read_ahead: improves disk reads by prefetching data into memory; 8192 is a reasonable value
[root@CENTOS7-1 ~]# echo "8192" > /sys/block/sda/queue/read_ahead_kb
# I/O scheduler tuning: use noop for SSDs and deadline for SATA/SAS devices.
[root@CENTOS7-1 ~]#echo "deadline" > /sys/block/sda/queue/scheduler
[root@CENTOS7-1 ~]#echo "noop" > /sys/block/sda/queue/scheduler
- Disable SELinux
# Edit /etc/selinux/config and set SELINUX to disabled to make the change permanent.
SELINUX=disabled
# Takes effect immediately (temporary):
setenforce 0
- Set up command aliases
echo 'alias ceph="docker exec mon ceph"' >> /etc/profile
echo 'alias ceph-fuse="docker exec mon ceph-fuse"' >> /etc/profile
echo 'alias ceph-mon="docker exec mon ceph-mon"' >> /etc/profile
echo 'alias ceph-osd="docker exec mon ceph-osd"' >> /etc/profile
echo 'alias radosgw="docker exec mon radosgw"' >> /etc/profile
echo 'alias radosgw-admin="docker exec mon radosgw-admin"' >> /etc/profile
echo 'alias rados="docker exec mon rados"' >> /etc/profile
source /etc/profile
Start the mon
Start the mon on the primary node:
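The excerpt is cut off here. As a rough sketch only (not the article's actual command; the image tag, monitor IP, and network below are assumptions), starting a mon from the ceph/daemon image typically looks like:
docker run -d --net=host --name mon \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph:/var/lib/ceph \
  -e MON_IP=192.168.5.203 \
  -e CEPH_PUBLIC_NETWORK=192.168.5.0/24 \
  ceph/daemon:latest-luminous mon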