Channel: Hortonworks » All Topics
Viewing all articles
Browse latest Browse all 5121

Services stuck in Ambari

Replies: 4

Hello,

After some successful work with my Hadoop cluster, I’m having some trouble managing services.

I tried to stop all the Hadoop services completely in order to modify some log4j log rotation parameters, and since then some services have been stuck (they are in state STOP_FAILED, although they were shut down successfully).

The symptoms are the following:

I’m able to launch actions both from the Ambari server UI and directly through the API via curl. But none of the actions I launch are ever picked up; they stay in state QUEUED or PENDING.

Also, the Puppet site files corresponding to those actions are no longer generated in /var/lib/ambari-agent/data, as they were before.

(I succeeded in stopping and starting all those services manually, using custom Puppet manifests, so the problem does not seem to be at the service level.)

It looks like the Ambari agents are doing nothing and never pick up the Ambari server actions, which end in the TIMEDOUT state.

I see nothing in particular in either the ambari-agent or ambari-server logs that could explain this behavior. I have already tried restarting all of them, and even rebooting the servers that make up my HDP cluster, but the issue is still there.

Below is an example, for the Nagios service, of what I am describing:

{
  "href" : "http://obench20s:8080/api/v1/clusters/hadoop_poc/requests/90/tasks/361",
  "Tasks" : {
    "exit_code" : 999,
    "stdout" : "",
    "status" : "QUEUED",
    "stderr" : "",
    "host_name" : "obench20s****",
    "id" : 361,
    "cluster_name" : "hadoop_poc",
    "attempt_cnt" : 1,
    "request_id" : 90,
    "command" : "STOP",
    "role" : "NAGIOS_SERVER",
    "start_time" : 1361895724078,
    "stage_id" : 1
  }
}

A short time later:

{
  "href" : "http://obench20s:8080/api/v1/clusters/hadoop_poc/requests/90/tasks/361",
  "Tasks" : {
    "exit_code" : 999,
    "stdout" : "",
    "status" : "TIMEDOUT",
    "stderr" : "",
    "host_name" : "obench20s****",
    "id" : 361,
    "cluster_name" : "hadoop_poc",
    "attempt_cnt" : 2,
    "request_id" : 90,
    "command" : "STOP",
    "role" : "NAGIOS_SERVER",
    "start_time" : 1361895724078,
    "stage_id" : 1
  }
}
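
For anyone comparing task responses like the two above, here is a minimal Python sketch (not part of Ambari) that pulls out the relevant fields and flags a task that apparently never ran. The helper names `summarize_task` and `fetch_task`, the `NOT_PICKED_UP` set, and the `admin`/`admin` credentials are assumptions for illustration; only the URL and the field names come from the responses above.

```python
import base64
import json
import urllib.request

# Statuses mentioned in this thread for tasks that produced no result.
# Assumption: illustrative only, not an exhaustive list of Ambari task states.
NOT_PICKED_UP = {"QUEUED", "PENDING", "TIMEDOUT"}


def summarize_task(payload):
    """Extract the fields of interest from a parsed task response (a dict)."""
    task = payload["Tasks"]
    return {
        "id": task["id"],
        "role": task["role"],
        "command": task["command"],
        "status": task["status"],
        "attempt_cnt": task["attempt_cnt"],
        # Empty stdout/stderr combined with one of the statuses above
        # suggests the agent never executed the command.
        "never_ran": task["status"] in NOT_PICKED_UP
        and not task["stdout"]
        and not task["stderr"],
    }


def fetch_task(url, user="admin", password="admin"):
    """GET a task resource using HTTP basic auth (placeholder credentials)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `summarize_task(fetch_task(url))` with the task URL shown above at intervals would show `attempt_cnt` rising while `never_ran` stays true, which is the pattern described here.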

In the ambari-server log, I get:

17:28:37,403 DEBUG ResourceProviderImpl:271 - Setting property for resource, resourceType=HostComponent, propertyId=HostRoles/host_name, value=obench20s*****
17:28:37,403 DEBUG ResourceProviderImpl:271 - Setting property for resource, resourceType=HostComponent, propertyId=HostRoles/state, value=STOPPING
17:28:37,404 DEBUG ResourceProviderImpl:271 - Setting property for resource, resourceType=HostComponent, propertyId=HostRoles/desired_state, value=INSTALLED

Ambari-agent logs (from the server where Nagios normally runs):

INFO 2013-02-26 17:36:30,487 Heartbeat.py:68 - Heartbeat dump: {'componentStatus': [],
 'hostname': 'obench20s****',
 'nodeStatus': {'cause': 'NONE', 'status': 'HEALTHY'},
 'reports': [],
 'responseId': 260,
 'timestamp': 1361896590486}
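
The heartbeat dump above looks like a Python dict literal, so it can be checked mechanically. Below is a minimal sketch, assuming that format, which reports whether a heartbeat carries no command reports and no component statuses. The function name `heartbeat_is_idle` is hypothetical, and reading an empty heartbeat as "the agent has nothing to report for the queued command" is an interpretation, not something the Ambari logs state.

```python
import ast


def heartbeat_is_idle(dump_text):
    """Return True when the dump has no command reports and no component statuses."""
    # literal_eval safely parses a Python literal; it does not execute code.
    heartbeat = ast.literal_eval(dump_text)
    return not heartbeat["reports"] and not heartbeat["componentStatus"]


# The dict portion of the log line above, copied after "Heartbeat dump: ".
dump = """{'componentStatus': [],
 'hostname': 'obench20s****',
 'nodeStatus': {'cause': 'NONE', 'status': 'HEALTHY'},
 'reports': [],
 'responseId': 260,
 'timestamp': 1361896590486}"""

print(heartbeat_is_idle(dump))  # prints True
```

An idle heartbeat while a STOP task for the same host sits in QUEUED is consistent with the agent never receiving the command.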

Many thanks for your help.

