
Optimising Hive Queries with Tez Query Engine

Tuning configuration parameters for a better performing Hive

Jun 13, 2022


Hive performance (Race)
Photo by Nicolas Hoizey on Unsplash

Hive lets us execute SQL queries with a few different query engines. It ships with the native MapReduce engine, but we can switch to Tez, which has gained popularity since its launch, or to Apache Spark. Most production deployments of Hive today use the Tez query engine.
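Hive selects the engine per session through the `hive.execution.engine` property. It accepts `mr`, `tez` or `spark` in Hive versions that support all three. In HiveQL this is a single statement, such as `SET hive.execution.engine=tez;`.

The Python sketch below only builds and validates that statement. The helper name `engine_statement` is hypothetical, and sending the statement to Hive (for example through Beeline or a JDBC client) is deliberately left out.

```python
# Values accepted by Hive's hive.execution.engine property
# (assumption: a Hive version that still supports all three engines).
VALID_ENGINES = {"mr", "tez", "spark"}


def engine_statement(engine: str) -> str:
    """Hypothetical helper: return the HiveQL that switches the session's engine."""
    engine = engine.strip().lower()
    if engine not in VALID_ENGINES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(VALID_ENGINES)}"
        )
    return f"SET hive.execution.engine={engine};"


if __name__ == "__main__":
    # Prints the statement to run at the start of a Hive session.
    print(engine_statement("tez"))
```

The setting applies only to the current session. To make an engine the default, set the same property in `hive-site.xml` instead.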

In this post, we look at a few Hive configuration parameters that we can use to tune the performance of our Hive queries. The default configuration is not always the best option: customising some of these parameters can sometimes improve query performance by as much as 50%. Let’s look at some of them.

Number of Reducers

By default, Hive on Tez estimates how many reducers a query needs from the number of bytes of data it processes. We can override this with a constant number, but no single constant is optimal for every query. We would have to tune the number of reducers for each query individually, measuring performance by trial and error. And as the amount of data changes, the best value changes as well.
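Here is a rough sketch of the size-based estimate. It assumes Hive's documented defaults for Hive 0.14.0 and later: `hive.exec.reducers.bytes.per.reducer` is 256,000,000 bytes and `hive.exec.reducers.max` is 1009. Under those defaults, Hive plans roughly one reducer per 256 MB of input, with at least one and at most the cap.

This is a simplification. It ignores Tez's auto reducer parallelism and any other adjustments Hive may apply.

The constant override is the `mapred.reduce.tasks` property, where `-1` (the default) means "estimate automatically". The helper names `estimate_reducers` and `reducer_override` are hypothetical.

```python
import math

# Assumed defaults (Hive 0.14.0+); your cluster may override them.
DEFAULT_BYTES_PER_REDUCER = 256_000_000  # hive.exec.reducers.bytes.per.reducer
DEFAULT_MAX_REDUCERS = 1009              # hive.exec.reducers.max


def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int = DEFAULT_BYTES_PER_REDUCER,
                      max_reducers: int = DEFAULT_MAX_REDUCERS) -> int:
    """Simplified size-based estimate: one reducer per bytes_per_reducer
    of input, never fewer than 1 and never more than max_reducers."""
    reducers = math.ceil(max(total_input_bytes, 1) / bytes_per_reducer)
    return max(1, min(max_reducers, reducers))


def reducer_override(n: int) -> str:
    """HiveQL that forces a constant reducer count; -1 restores automatic estimation."""
    if n != -1 and n < 1:
        raise ValueError("use a positive reducer count, or -1 for automatic")
    return f"SET mapred.reduce.tasks={n};"


if __name__ == "__main__":
    for size in (10 * 1024**3, 10**12):
        print(size, "bytes ->", estimate_reducers(size), "reducers")
    print(reducer_override(-1))
```

Under these assumptions, 10 GiB of input yields 42 reducers, while about 1 TB hits the 1009 cap. Comparing such estimates with a few fixed overrides is one way to organise the trial-and-error tuning, but any fixed value has to be revisited as data volumes grow.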


Sunny Srinidhi