Jun 13, 2022
Optimising Hive Queries with Tez Query Engine
Tuning configuration parameters for a better-performing Hive
Hive provides us the option of executing SQL queries with a few different query engines. It ships with the native MapReduce engine, but we can switch to Tez, which has gained popularity since its launch, or use Apache Spark. …
Hive · 4 min read
Published in Towards Data Science · Jan 17, 2022
Cleaning and Normalizing Data Using AWS Glue DataBrew
Automate data cleaning with AWS DataBrew without writing any code
A major part of any data pipeline is the cleaning of data. Depending on the project, cleaning data could mean a lot of things. But in most cases, it means normalizing data and bringing data into a format that is accepted within the project. …
AWS · 9 min read
Published in CodeX · Nov 28, 2021
The Dunning-Kruger Effect In Tech
This is not the kind of post I usually write on my blog. This is more of a psychology lecture than a how-to tech tutorial. …
The Dunning Kruger Effect · 6 min read
Published in Towards Data Science · Nov 18, 2021
Understanding Apache Hive LLAP
Apache Hive is a complex system when you look at it, but once you go looking for more info, it’s more interesting than complex. There are multiple query engines available for Hive, and then there’s LLAP on top of the query engines to make real-time, interactive queries more workable. …
Hive · 8 min read
Published in DataSeries · Nov 5, 2021
Installing Hadoop on the New M1 Pro and M1 Max MacBook Pro
We’ll see how to install and configure Hadoop and its components on macOS running on the new M1 Pro and M1 Max chips by Apple
In the previous series of posts, I wrote about how to install the complete Hadoop stack on Windows 11 using WSL 2. And now that the new MacBook Pro laptops are available with the brand new M1 Pro and M1 Max SoCs, here’s a guide on how to install the…
Hadoop · 8 min read
Published in Towards Data Science · Nov 1, 2021
Installing Hadoop on Windows 11 with WSL2
How to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.
In the previous post, we saw how to install a Linux distro on Windows 11 using WSL2 and then how to install Zsh and Oh-my-zsh to make the terminal more customizable. …
Big Data · 8 min read
Oct 27, 2021
Installing Zsh and Oh-my-zsh on Windows 11 with WSL2
Originally published at https://blog.contactsunny.com on October 27, 2021. Before we begin, you might ask, why am I writing on something this trivial? I sold off my old MacBook Pro because I’m super excited about the new M1 Pro MacBook Pros. I have pre-ordered one of those and am waiting for…
Windows 11 · 5 min read
Published in DataDrivenInvestor · Oct 11, 2021
Getting Started With Apache Airflow
Apache Airflow is another awesome tool that I discovered just recently. Just a couple of months after discovering it, I can’t imagine not using it now. It’s reliable, configurable, and dynamic. Because it’s all driven by code, you can version control it too. It’s just awesome! …
Airflow · 11 min read
Published in Towards Data Science · Sep 30, 2021
Fake (almost) everything with Faker
I was recently tasked with creating some random customer data, with names, phone numbers, addresses, and the usual other stuff. At first, I thought I’d just generate random strings and numbers (some gibberish) and call it a day. But then I remembered my colleagues using a package for that. …
Python · 4 min read
Jun 30, 2021
Querying Hive Tables From a Spring Boot App
Originally published at https://blog.contactsunny.com on June 30, 2021. In this post, we’ll see how we can query tables that reside in Hive using a Spring Boot application. As always, I’m going to use a Spring Boot web app with a few GET APIs to show how we can query data…
Hive · 4 min read