
GC Error/OOM w/ Hive Query


Hi all,

I have two tables:

tbl1: 81m rows
tbl2: 4m rows

tbl1 is partitioned on one column; tbl2 is unpartitioned.

I’m attempting the following query:

SELECT
  tbl1.col_pk,
  tbl2.col1,
  tbl2.col2,
  SUM(tbl1.col4),
  SUM(tbl1.col5),
  SUM(tbl1.col4 + tbl1.col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011', '2012', '2013')
GROUP BY
  tbl1.col_pk,
  tbl2.col1,
  tbl2.col2;

I get this error:

OutOfMemoryError: GC overhead limit exceeded

So I followed the suggestion at the end of the error output ("Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'") through several iterations, eventually getting hive.map.aggr.hash.percentmemory down to roughly 0.0165, and it still failed. The sequence looked something like the sketch below.
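
Roughly what I ran between retries (the 0.25 and 0.0165 values are the real endpoints; the middle step is just illustrative of the way I kept cutting it down):

set hive.map.aggr.hash.percentmemory = 0.25;    -- re-ran: same GC overhead error
set hive.map.aggr.hash.percentmemory = 0.1;     -- illustrative intermediate attempt: same error
set hive.map.aggr.hash.percentmemory = 0.0165;  -- re-ran: still failed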

I did some searching and found some convoluted recommendations for what to try next: some mentioned upping my heap size, some mentioned rewriting the query (a map-join rewrite, if I understand them; sketch below), etc. I upped my Hadoop maximum Java heap size to 4096 MB, re-ran, and got the same results.
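
For reference, here is what I take the rewrite suggestions to mean: hint Hive to load the smaller tbl2 into memory and stream tbl1 past it. This is only my sketch of the recommendation; I haven't verified that tbl2's 4m rows would actually fit in a task's heap:

SELECT /*+ MAPJOIN(tbl2) */
  tbl1.col_pk,
  tbl2.col1,
  tbl2.col2,
  SUM(tbl1.col4),
  SUM(tbl1.col5),
  SUM(tbl1.col4 + tbl1.col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011', '2012', '2013')
GROUP BY
  tbl1.col_pk,
  tbl2.col1,
  tbl2.col2;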

Currently, some relevant settings are:

NameNode Heap Size: 4096 MB
DataNode maximum Java heap size: 4096 MB
Hadoop maximum Java heap size: 4096 MB
Java Options for MapReduce tasks: 768 MB (see the note below)
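
A note on that last item: I'm assuming "Java Options for MapReduce tasks" corresponds to mapred.child.java.opts (that's my reading of the config, not something I've confirmed), i.e. each task JVM is capped at roughly:

set mapred.child.java.opts=-Xmx768m;

So each individual map or reduce task only has 768 MB to work with, regardless of the 4 GB daemon heaps.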

I have 16 map slots and 8 reduce slots available (5-node cluster: 4 data nodes and one name node); a rough per-node memory breakdown is below.
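
If the slots are spread evenly across the 4 data nodes (4 map + 2 reduce slots each) and every task JVM can grow to 768 MB:

4 map slots    x 768 MB = 3072 MB
2 reduce slots x 768 MB = 1536 MB
DataNode daemon heap    = 4096 MB
---------------------------------
                          8704 MB (~8.5 GB) per data node under full load

which makes me wonder whether the 768 MB per-task cap, rather than the 4 GB daemon heaps, is where the GC overhead limit is being hit.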

Thanks in advance for the help,
Nick

