Querying Hive Tables From a Spring Boot App
Originally published at https://blog.contactsunny.com on June 30, 2021.
In this post, we’ll see how we can query tables that reside in Hive using a Spring Boot application. As always, I’m going to use a Spring Boot web app with a few GET APIs to show how we can query data from Hive. The assumption here is that you already have a Hive instance (either remote or local) running in the HTTP transport mode. Because that’s a complete tutorial in itself, I’m not going to talk about that here.
What you need to do is make sure your Hive server is up and running, and it is listening for HTTP requests, which means the transport mode is set to http
. That's pretty much the only prerequisite for this post.
The Dependencies
Before we can start writing any code, we first need to make sure we have all the dependencies sorted out. Apart from the usual Spring Boot dependencies, we’ll need the following Hive-specific ones as well:
<dependency><groupId>org.apache.hive</groupId><artifactId>hive-jdbc</artifactId><version>1.2.1000.2.4.2.12-1</version><exclusions><exclusion><artifactId>slf4j-log4j12</artifactId><groupId>org.slf4j</groupId></exclusion><exclusion><artifactId>log4j</artifactId><groupId>log4j</groupId></exclusion><exclusion><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId></exclusion><exclusion><groupId>org.eclipse.jetty.aggregate</groupId><artifactId>jetty-all</artifactId></exclusion></exclusions></dependency><dependency><groupId>org.springframework</groupId><artifactId>spring-jdbc</artifactId></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-core</artifactId><version>1.2.1</version></dependency></dependencies>
And because not all of these are available in Maven’s central repository, we’ll have to fetch some of them from the Hortonworks repository. For this, make sure you add the following <repository>
section as well to your pom.xml
file:
<repositories><repository><id>hortonworks</id><name>Horton works Repo</name><url>http://repo.hortonworks.com/content/repositories/releases/</url></repository></repositories>
That’s all the dependencies we’ll need.
The Properties
We’ll be using JDBC to connect to Hive, as that’s the interface that Hive supports. Before we start, let’s create an application.properties
file so that we can save our connection string and auth details for the connection. For now, we only need three properties:
hive.connectionURL=jdbc:hive2://localhost:10001/default;transportMode=http;httpPath=cliservice
hive.username=
hive.password=
As you can see, we’re using HiveServer2 for the connection, and using all default parameters in the connection string. The username and password fields are blank as I don’t have any authentication setup for my local Hive instance. Make sure to change these there parameters if you are connecting to a remote Hive server.
The Configuration
Now that we have the properties ready, let’s create a config class in our Spring Boot application. I call this the HiveConfig
class, with the @Configuration
annotation. In this class, we'll read the properties that we created in the previous step:
@Value("${hive.connectionURL}")
private String hiveConnectionURL; @Value("${hive.username}")
private String userName; @Value("${hive.password}")
private String password;
This was pretty straight forward. Next, we’ll create a DataSource
instance for Hive using these properties:
public DataSource getHiveDataSource() throws IOException {BasicDataSource dataSource = new BasicDataSource();dataSource.setUrl(this.hiveConnectionURL);dataSource.setDriverClassName("org.apache.hive.jdbc.HiveDriver");dataSource.setUsername(this.userName);dataSource.setPassword(this.password);return dataSource;}
Now we need a JDBC Template that can use this data source. We’ll use the JDBC template as an interface for all our queries to Hive. Creating a JDBC template is pretty simple. We’ll define this as a bean that can be auto-wired anywhere in the project.
@Bean(name = "jdbcTemplate")public JdbcTemplate getJDBCTemplate() throws IOException {return new JdbcTemplate(getHiveDataSource());}
The Queries
Now that we have all that we need to query Hive from our Spring Boot app, let’s look at how we can actually run some simple queries. First, we’ll need to auto-wire the JDBC template that we defined in the previous step. After that, we need to write a SQL query that we want to run. Next, we’ll run the query to get some results.
@Autowiredprivate JdbcTemplate jdbcTemplate;private static final String SHOW_DATABASES_QUERY = "show databases";List<Map<String, Object>> databases = jdbcTemplate.queryForList(SHOW_DATABASES_QUERY);
That’s pretty much it. Once you print the output of this query, you should see something similar to the following JSON:
[ { "database_name": "default" } ]
You’ll of course have more items in the list depending on how many databases or schemes you have in your Hive instance.
And that’s pretty much it. Instead of the simple query to list all databases, you can have any query you’d want and get the result. And once you have the result, you can perform any operation that the Collections framework provides.
This should be a good enough starting point for you to start exploring Spring Boot and Hive connection. You can work your way up from here. Hope this helps! If you want the ready-to-use project instead of creating one from scratch, feel free to clone the one that I created here on Github. And if you think something can be better in the code, do let me know by raising a PR. And make sure to checkout other data related posts that might interest you.
And if you like what you see here, or on my Medium blog, and would like to see more of such helpful technical posts in the future, consider supporting me on Patreon and Github.