What causes the meta-data service to time out on some nodes?
Hey guys, has anyone seen this sort of thing and have any idea what is going on? The weird thing is that it only happens on some of the compute nodes in this cluster, not all of them. Thanks!
cloud-init start-local running: Fri, 10 May 2013 23:29:23 +0000. up 2.99 seconds
no instance data found in start-local
ci-info: lo     : 1 127.0.0.1    255.0.0.0     .
ci-info: eth0   : 1 10.160.4.152 255.255.255.0 fa:16:3e:eb:d9:13
ci-info: route-0: 0.0.0.0    10.160.4.150 0.0.0.0       eth0 UG
ci-info: route-1: 10.160.4.0 0.0.0.0      255.255.255.0 eth0 U
cloud-init start running: Fri, 10 May 2013 23:29:24 +0000. up 3.44 seconds
2013-05-10 23:30:15,787 util.py[WARNING]: 'http://169.254.169.254/20090404/metadata/instanceid' failed [51/120s]: socket timeout [timed out]
2013-05-10 23:31:06,841 util.py[WARNING]: 'http://169.254.169.254/20090404/metadata/instanceid' failed [102/120s]: socket timeout [timed out]
2013-05-10 23:31:23,860 util.py[WARNING]: 'http://169.254.169.254/20090404/metadata/instanceid' failed [119/120s]: socket timeout [timed out]
2013-05-10 23:31:24,862 DataSourceEc2.py[CRITICAL]: giving up on md after 120 seconds
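For anyone curious what cloud-init is doing when it prints those warnings: it polls the metadata URL with a per-attempt socket timeout until an overall deadline, then gives up. A rough Python sketch of that retry loop (the function name and the injectable fetch/clock/sleep hooks are made up for illustration, not cloud-init's real API):

```python
import time
import urllib.request
import urllib.error

METADATA_URL = "http://169.254.169.254/2009-04-04/meta-data/instance-id"

def wait_for_metadata(url=METADATA_URL, max_wait=120, attempt_timeout=10,
                      fetch=None, clock=time.monotonic, sleep=time.sleep):
    """Poll `url` until it answers or `max_wait` seconds have elapsed.

    Returns the response body on success, or None after giving up
    (the "giving up on md after 120 seconds" case in the log above).
    `fetch`, `clock` and `sleep` are injectable so the loop can be
    exercised without a real metadata service.
    """
    if fetch is None:
        def fetch(u):
            # Per-attempt socket timeout; an unreachable 169.254.169.254
            # surfaces here as a timeout, like the WARNING lines above.
            with urllib.request.urlopen(u, timeout=attempt_timeout) as resp:
                return resp.read().decode()
    start = clock()
    while clock() - start < max_wait:
        try:
            return fetch(url)
        except (urllib.error.URLError, OSError) as exc:
            print("'%s' failed [%d/%ds]: %s" % (url, clock() - start, max_wait, exc))
            sleep(1)
    return None  # give up, as cloud-init does after 120 seconds
```

Since the address is link-local, a node that consistently hits this loop usually has a routing or overload problem between it and whatever answers 169.254.169.254, not a problem in the instance itself.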
Which version of OpenStack are you running? Earlier releases (e.g. Diablo, Essex) had scalability problems with the metadata service that were fixed in Folsom and later.
I'm getting the same error all the time, but if I curl http://169.254.169.254/2009-04-04/meta-data/instance-id it works, with the dashes. Why they are missing at instance boot I haven't figured out.
About the missing dashes: AFAIK that is just a bug in the log output, and they are not really missing. I used to have problems because the controller node was overloaded. After the instance boots, can you curl http://169.254.169.254/2009-04-04/meta-data/instance-id ?
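If it helps, here is a small Python check along the same lines as that curl, with a short timeout so it fails fast (the helper name and the injectable `opener` hook are made up for illustration; plain curl from inside the instance works just as well):

```python
import urllib.request
import urllib.error

def fetch_instance_id(opener=urllib.request.urlopen):
    """Fetch the instance-id from the EC2 metadata endpoint.

    Returns the instance id string, or None if the endpoint does not
    answer within 5 seconds (suspect routing to 169.254.169.254 or an
    overloaded controller node in that case).
    """
    url = "http://169.254.169.254/2009-04-04/meta-data/instance-id"
    try:
        with opener(url, timeout=5) as resp:
            return resp.read().decode().strip()
    except (urllib.error.URLError, OSError):
        return None
```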