Originally published on March 3, 2021.

If you are new to JanusGraph and the Gremlin query language, as I am, you may be confused by the out(), outE(), in(), and inE() methods. Looking at examples of these functions doesn’t make the difference easy to see. Or is it just me?

Anyway, I got confused and it took me a while to understand there is a difference, and there isn’t. Let me explain.
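To make the distinction concrete before we get to the graph, here is a rough sketch in plain Python rather than Gremlin. It is only a toy model, not JanusGraph's implementation: the vertex names, the edge labels, and the helper functions out_e, out, in_e, and in_ are all made up for illustration. The idea it shows is that outE() and inE() land on edges, while out() and in() land on the vertices at the other end of those edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    label: str
    out_v: str  # vertex the edge leaves
    in_v: str   # vertex the edge enters


# Hypothetical toy graph, not the sample graph from this post.
EDGES = [
    Edge("knows", "alice", "bob"),
    Edge("created", "alice", "project"),
    Edge("created", "bob", "project"),
]


def out_e(vertex):
    """Like outE(): the outgoing edges of a vertex."""
    return [e for e in EDGES if e.out_v == vertex]


def out(vertex):
    """Like out(): the vertices reached by following outgoing edges."""
    return [e.in_v for e in out_e(vertex)]


def in_e(vertex):
    """Like inE(): the incoming edges of a vertex."""
    return [e for e in EDGES if e.in_v == vertex]


def in_(vertex):
    """Like in(): the vertices that incoming edges come from."""
    return [e.out_v for e in in_e(vertex)]


print(out("alice"))                       # adjacent vertices
print([e.label for e in out_e("alice")])  # edge labels
print(in_("project"))                     # vertices pointing at "project"
```

In this sketch out() is built on top of out_e(), which mirrors how out() in Gremlin gives the same vertices as outE().inV(). That is the sense in which there is a difference, and there isn’t.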

The Sample Graph

Before we look at the differences, let’s look at a sample graph.

As you can see from the graph above, we have…

Originally published on February 25, 2021.

JanusGraph is a graph processing tool that can process graphs stored on clusters with multiple nodes. JanusGraph is designed for massive clusters and for real-time traversals and analytics queries.

In this post, we’ll look at a few queries that you would want to run the very first time you install JanusGraph and start playing with the Gremlin console. I did this just yesterday, so it’s still fresh in my memory. I saw the need to write this as I didn’t find a few of my questions answered in the official documentation, and…

First published on my personal blog.

Disclaimer: I use both an iPhone and an Android phone. I have used all three major desktop operating systems: Windows, various Linux distros, and macOS. I have also used Android tablets and iPads.

A lot of non-tech-savvy people don’t understand the difference between free and paid apps. And many people who do understand it just ignore it. There’s one thing you need to understand: there ain’t no such thing as a free lunch. We’ll start this post with that adage.

When something is being given out for free, there’s always a catch. For example, right…

Parquet is an open source file format by Apache for the Hadoop infrastructure. Well, it started as a file format for Hadoop, but it has since become very popular, and even cloud service providers such as AWS have started supporting it. This can only mean that Parquet must be doing something right. In this post, we’ll see what exactly the Parquet file format is, and then we’ll see a simple Java example to create or write Parquet files.

Intro to Parquet File Format

We store data as rows in the traditional approach. But Parquet takes a different approach, where it flattens the data…

Originally published on April 1, 2020.

If you have been working in the software development industry for the last few years, you have probably heard about both microservices and serverless applications. Especially serverless, as most companies are riding this wave, and for good reason. But when you’re architecting a whole system as a bunch of microservices, serverless or not, how do you make sure that all transactions are taken care of properly?

When your application is broken down into microservices and is distributed, you can’t have ACID transactions on your databases. Because each microservice will have its own database and…

Originally published on March 24, 2020.

Docker is everywhere today. A lot of projects in most big companies are deployed in production as Docker containers. Once you realize how easy and useful this approach is, you’ll want to Dockerize everything. A lot of tools and services today provide official Docker images so that you don’t have to worry about downloading all the required dependencies, fixing version conflicts between those dependencies, and other compatibility issues. Docker is just peace of mind, to put it simply.

But how do we get started with this? I get this question a lot…

Originally published on March 18, 2020.

Most of you probably already know about SonarQube, and most of you are probably already using it. I never got the opportunity to use it, even though I had heard about it. But recently, I decided to make it a part of my development pipeline for my personal projects. So I set it up as a Docker service on my laptop and installed the SonarLint IntelliJ IDEA plugin. And the analysis results shocked me.

In this post, I’ll talk about what SonarQube is, how to install it, and how to use it…

As the data generated by IoT devices, mobile devices, applications, and so on grows by the hour, creating a data lake to store all that data is becoming crucial for almost any application at scale. There are many tools and services that you could use to create a data lake. But sometimes, you overlook the simplest and the easiest of them all, the AWS stack. In this post, we’ll see how we can create a very simple, yet highly scalable data lake using Amazon’s Kinesis Data Firehose and Amazon’s S3.

Amazon Kinesis Data Firehose

Kinesis Data Firehose is a tool / service that Amazon offers…

Lemmatization is one of the most common text pre-processing techniques used in Natural Language Processing (NLP) and machine learning in general. If you’ve already read my post about stemming of words in NLP, you’ll already know that lemmatization is not that much different. In both stemming and lemmatization, we try to reduce a given word to its root word. The root word is called a stem in the stemming process, and it is called a lemma in the lemmatization process. But there are a few more differences between the two than that. Let’s see what those are.
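As a quick illustration of "reducing a word to its root word", here is a toy sketch in plain Python. It is not how any real NLP library does it: the suffix list, the tiny lemma dictionary, and the function names toy_stem and toy_lemma are all made up for this example. It only shows the general flavour: a stemmer chops off endings by rule, while a lemmatizer looks up a proper dictionary form.

```python
# Hypothetical suffix list for the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Hypothetical hand-written lemma dictionary; real lemmatizers use a
# full vocabulary and usually the word's part of speech as well.
LEMMAS = {"studies": "study", "running": "run", "better": "good"}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Look the word up in the dictionary, falling back to the word itself."""
    return LEMMAS.get(word, word)


for word in ("studies", "running", "better"):
    print(word, "-> stem:", toy_stem(word), "| lemma:", toy_lemma(word))
```

With this toy data, the stems are not always real words, while the lemmas are. Treat that as a hint about the difference, not as a description of any particular stemmer or lemmatizer.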

How is Lemmatization different from Stemming

In stemming…

Stemming is one of the most common data pre-processing operations we do in almost all Natural Language Processing (NLP) projects. If you’re new to this space, it is possible that you don’t know exactly what it is, even though you have come across the word. You might also confuse stemming with lemmatization, as the two are similar operations. In this post, we’ll see what exactly stemming is, with a few examples here and there. I hope I’ll be able to explain this process in simple words for you.


To put it simply, stemming is the process of removing a part…

Sunny Srinidhi

