
Question about Swift Storage Nodes CPU usage

asked 2012-10-02 13:32:46 -0500

gucluakkaya

Hi all,

For 3 months we have been using OpenStack Swift in our test environment, and while monitoring CPU usage and I/O traffic on the storage nodes we noticed that even at night, when no one is making requests, CPU usage sits at 40 to 50% on the servers and I/O throughput is higher than expected.

Test environment:

1 node (swift-proxy + keystone)
5 storage nodes (account, container, and object servers)

Server specifications:

Virtual machine: VMware
OS: Ubuntu 10.04 LTS, 64-bit
CPU: 4 cores
RAM: 6 GB

On the storage nodes, the auditor, replicator, updater, and server processes are running for each module (account, container, and object).

Configuration files:

account-server.conf

[DEFAULT]
bind_ip = 10.1.1.152
workers = 2
log_facility = LOG_LOCAL1

[pipeline:main]
pipeline = account-server

[app:account-server]
use = egg:swift#account

[account-replicator]

[account-auditor]

[account-reaper]

container-server.conf

[DEFAULT]
bind_ip = 10.1.1.152
workers = 2
log_facility = LOG_LOCAL2

[pipeline:main]
pipeline = container-server

[app:container-server]
use = egg:swift#container

[container-replicator]

[container-updater]

[container-auditor]

object-server.conf

[DEFAULT]
bind_ip = 10.1.1.152
workers = 2
log_facility = LOG_LOCAL3

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]

[object-updater]

[object-auditor]

rsyncd.conf

uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = 10.1.1.152

[account]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/account.lock

[container]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/container.lock

[object]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/object.lock

Since we have not run a performance test yet, I think this amount of resource usage is too high and will cause problems during our tests. From the logs I can only tell that the replicator keeps running for short periods of time. What can you recommend for improving our environment?
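In case it helps, a quick way to confirm which Swift daemons are actually using the CPU is plain ps sorted by CPU, plus pidstat for per-process disk I/O (standard tools, nothing Swift-specific; pidstat needs the sysstat package):

# Swift processes sorted by CPU usage, highest first
ps -eo pcpu,pid,comm --sort=-pcpu | grep '[s]wift' | head -20
# Per-process disk I/O, sampled every 5 seconds
pidstat -d 5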

Thanks


3 answers


answered 2012-10-18 08:52:48 -0500

This question was expired because it remained in the 'Open' state without activity for the last 15 days.


answered 2012-10-18 10:53:11 -0500

gucluakkaya

Update:

After increasing run_pause from 30 to 900 in the [account-replicator], [container-replicator], and [object-replicator] sections of the configuration files, CPU usage has decreased considerably. However, increasing run_pause means the replication process sleeps for that long between passes. What are the drawbacks of increasing this value with respect to reliability, and what value would you recommend for run_pause?

Updated configuration:

account-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = account-server

[app:account-server]
use = egg:swift#account

[account-replicator]
run_pause = 900

[account-auditor]

[account-reaper]

container-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = container-server

[app:container-server]
use = egg:swift#container

[container-replicator]
run_pause = 900

[container-updater]

[container-auditor]

object-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]
run_pause = 1500
ring_check_interval = 900

[object-updater]

[object-auditor]
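For reference, run_pause is the sleep between replication passes, so the main risk seems to be that data written just before a disk or node failure stays under-replicated for longer before the next pass picks it up. To see how long a full pass actually takes, the replicators log a completion message at the end of each pass (exact wording varies by Swift version):

# Assuming syslog routes the LOCAL1-LOCAL3 facilities to /var/log/syslog
grep -i "replication complete" /var/log/syslog | tail -5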


answered 2012-10-19 12:54:11 -0500

dafridgie

Hi, I've also seen this with Swift. Other values you may want to play with are the object-auditor settings, as follows:

/etc/swift/object-server/1.conf

[DEFAULT]
devices = /srv/node/
bind_port = 6066
user = swift
workers = 2
log_facility = LOG_LOCAL2
mount_check = true

[pipeline:main]
pipeline = recon object-server

[app:object-server]
use = egg:swift#object

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
recon_lock_path = /var/lock

[object-replicator]
concurrency = 4
vm_test_mode = no
run_pause = 60
recon_enable = yes
recon_cache_path = /var/cache/swift

[object-updater]
concurrency = 4
recon_enable = yes
recon_cache_path = /var/cache/swift

[object-auditor]
files_per_second = 5        # adjusted down from the default of 20
bytes_per_second = 2500000  # adjusted down from the default of 10000000
concurrency = 25
recon_enable = yes
recon_cache_path = /var/cache/swift
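To put those auditor throttles in perspective: bytes_per_second caps the audit read rate, so a full integrity pass takes proportionally longer. A rough back-of-envelope, assuming the auditor actually sustains the cap:

# Days to audit 1 TB of object data at the throttled vs. default rate
echo '10^12 / 2500000 / 86400' | bc -l     # ~4.6 days at 2,500,000 B/s
echo '10^12 / 10000000 / 86400' | bc -l    # ~1.2 days at 10,000,000 B/s

So the lower settings trade a longer window before corruption is detected for less background read load.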

I had experienced the occasional write failure as well as the odd chunk write timeout. Adjusting the settings above reduced the overall read workload the auditor processes placed on the storage nodes.

I've also found that increasing the run_pause setting from its default of 30 on the account, container, and object replicators reduces both CPU and I/O load, particularly when large chunked file uploads are in progress.

Here are all my account/container storage node config files if they are of help:

/etc/swift/account-server/1.conf

[DEFAULT]
devices = /srv/node/
bind_port = 6068
user = swift
workers = 2
log_facility = LOG_LOCAL2
mount_check = true

[pipeline:main]
pipeline = recon account-server

[app:account-server]
use = egg:swift#account
set log_address = /dev/log

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift

[account-replicator]
concurrency = 4
run_pause = 45
recon_cache_path = /var/cache/swift
vm_test_mode = no

[account-auditor]
recon_cache_path = /var/cache/swift

[account-reaper]
concurrency = 25

/etc/swift/container-server/1.conf

[DEFAULT]
devices = /srv/node/
bind_port = 6067
user = swift
workers = 2
log_facility = LOG_LOCAL2
mount_check = true

[pipeline:main]
pipeline = recon container-server

[app:container-server]
use = egg:swift#container

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift

[container-replicator]
concurrency = 8
run_pause = 60
recon_cache_path = /var/cache/swift
vm_test_mode = no

[container-updater]
concurrency = 4
recon_cache_path = /var/cache/swift

[container-auditor]
recon_cache_path = /var/cache/swift

[container-sync]
interval = 300
container_time = 60

Overall, I've found that spending time tuning the rings and then testing with swift-bench plus a script of sequential uploads provides the performance feedback needed to get the I/O and CPU load on the storage nodes to a more manageable level.
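If it helps, a minimal swift-bench configuration along the lines of the sample shipped with Swift looks like this (the auth URL and credentials below are placeholders for your own environment):

[bench]
auth = http://127.0.0.1:8080/auth/v1.0
user = test:tester
key = testing
concurrency = 10
object_size = 4096
num_objects = 1000
num_gets = 10000
delete = yes

Run it with swift-bench <conf_file> and vary concurrency and object_size to approximate your real workload.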

Hope this helps

Steve a


Stats

Asked: 2012-10-02 13:32:46 -0500

Seen: 358 times

Last updated: Oct 19 '12