
Restart Sahara Spark Services

asked 2015-02-05 17:52:41 -0500 by Nastooh

After rebooting my worker nodes, I am no longer able to reach them from the cluster master. Here is what I do:

cd /opt/spark/sbin
ubuntu@vid-c-001:/opt/spark/sbin$ ./stop-all.sh 
vid-w-001: no org.apache.spark.deploy.worker.Worker to stop
vid-w-002: no org.apache.spark.deploy.worker.Worker to stop
no org.apache.spark.deploy.master.Master to stop
ubuntu@vid-c-001:/opt/spark/sbin$ ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/sbin/../logs/spark-ubuntu-org.apache.spark.deploy.master.Master-1-vid-c-001.novalocal.out
vid-w-001: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/sbin/../logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-vid-w-001.out
vid-w-002: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/sbin/../logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-vid-w-002.out
ubuntu@vid-c-001:/opt/spark/sbin$

Looking at the processes on the master and worker nodes, I can see the relevant services running.
On a worker node:

  ubuntu@vid-w-002:/opt/spark/sbin$ ps -aefww | grep java
    hdfs      1207     1  0 18:55 ?        00:00:33 /usr/java/jdk1.7.0_51/bin/java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-vid-w-002.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
    ubuntu    3885     1 28 23:43 ?        00:00:02 java -cp ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.0-hadoop2.0.0-mr1-cdh4.2.0.jar -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://vid-c-001:7077

On the master node:

ubuntu@vid-c-001:~$ ps -aefww | grep java
hdfs      1999     1  0 Feb03 ?        00:04:10 /usr/java/jdk1.7.0_51/bin/java -Dproc_namenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-vid-c-001.novalocal.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
ubuntu   27005     1  1 23:35 pts/2    00:00:10 /usr/java/jdk1.7.0_51/bin/java -cp ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.1.jar:/opt/spark/lib/datanucleus-core-3.2.2.jar:/opt/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip vid-c-001 --port 7077 --webui-port 8080

However, shortly after I submit a job, the Spark services on the worker nodes terminate and I get the following: ...

15/02/05 23:40:26 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/02/05 23:40:41 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure ...
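One way to confirm what the master actually sees at this point is to query its standalone web UI (port 8080, per the master command line above) and look for registered workers. A rough check, assuming the UI is still reachable; the grep pattern is only an approximate filter for the worker IDs:

ubuntu@vid-c-001:~$ curl -s http://vid-c-001:8080 | grep -o 'worker-[0-9][^<"]*'

An empty or shrinking list here matches the scheduler warning above.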

1 answer


answered 2015-03-06 18:27:06 -0500 by Nastooh

The problem is that when a worker node is rebooted, its /etc/hosts file gets wiped out, which in turn breaks the Spark scripts that address the nodes by hostname. The workaround is to re-populate /etc/hosts after each restart.
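As a rough sketch of what that re-population can look like, run from /etc/rc.local or another boot hook (the hostnames are the ones from the cluster above; the IP addresses are placeholders for the instances' real fixed IPs):

# Re-add the cluster hostnames to /etc/hosts after a reboot.
# The IP addresses below are placeholders; substitute the real instance IPs.
grep -q 'vid-c-001' /etc/hosts || cat >> /etc/hosts <<'EOF'
10.0.0.10  vid-c-001 vid-c-001.novalocal
10.0.0.11  vid-w-001 vid-w-001.novalocal
10.0.0.12  vid-w-002 vid-w-002.novalocal
EOF

With name resolution back in place, stop-all.sh and start-all.sh from the master can reach the workers by hostname again.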

