Sunny Srinidhi

First published on my personal blog.

Photo by Joshua Earle on Unsplash

Disclaimer: I use both an iPhone and an Android phone. I have used all three major desktop operating systems: Windows, various Linux distros, and macOS. And I have used both Android tablets and iPads.

A lot of non-tech-savvy people don’t understand the difference between free and paid apps, and many who do simply ignore it. There’s one thing you need to understand: there ain’t no such thing as a free lunch. We’ll start this post with that adage.

When something is being given out for free, there’s always a catch. For example, right now, my personal money management service is free for all (shameless plug). But as soon as I fix all the bugs, and implement all the requested features, the service is going behind a paywall. The reason? …


Photo by Maarten van den Heuvel on Unsplash

Parquet is an open source file format from Apache, originally built for the Hadoop ecosystem. It has since become very popular well beyond Hadoop, and even cloud providers such as AWS now support the format. That can only mean Parquet is doing something right. In this post, we’ll see what exactly the Parquet file format is, and then we’ll walk through a simple Java example to create or write Parquet files.

Intro to Parquet File Format

In the traditional approach, we store data as rows. Parquet takes a different approach: it flattens the data into columns before storing it. This allows for better compression on disk, and also for better query performance. …
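The row-versus-column idea can be sketched in a few lines of plain Python. This is a toy illustration of the layout only, not Parquet’s actual on-disk encoding, and the records are made up:

```python
# The same records, stored two ways: one record after another (rows),
# versus all values of each field together (columns).

rows = [
    {"id": 1, "country": "IN", "amount": 100},
    {"id": 2, "country": "IN", "amount": 250},
    {"id": 3, "country": "US", "amount": 175},
]

# Row layout: good when you fetch whole records at a time.
row_layout = [list(r.values()) for r in rows]

# Columnar layout: good when a query scans only one or two fields.
column_layout = {key: [r[key] for r in rows] for key in rows[0]}

# A query like SUM(amount) only has to touch a single column...
total = sum(column_layout["amount"])
# ...and repeated values within a column (e.g. "IN", "IN") compress well.
```

That locality, one column’s values sitting together, is where both the compression and the query-performance wins come from.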


Originally published at https://blog.contactsunny.com on April 1, 2020.


If you have been working in the software development industry for the last few years, you have heard about both microservices and serverless applications. Serverless especially, as most companies are riding this wave, and for good reason. But when you’re architecting a whole system as a bunch of microservices, serverless or not, how do you make sure that all transactions are handled properly?

When your application is broken down into microservices and is distributed, you can’t have ACID transactions across your databases, because each microservice has its own database and a single feature or function usually involves multiple such microservices. …
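One common answer to this problem is to pair each local step with a compensating action that undoes it if a later step fails (the saga pattern). A minimal sketch, with hypothetical order and payment steps standing in for real services:

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs; returns an event log."""
    done, log = [], []
    for action, compensate in steps:
        try:
            log.append(action())
            done.append(compensate)
        except Exception:
            # A step failed: compensate completed steps, newest first.
            for comp in reversed(done):
                log.append(comp())
            return log  # saga aborted after rollback
    return log

def reserve_fails():
    raise RuntimeError("inventory service down")  # simulated failure

order_saga = [
    (lambda: "order created", lambda: "order cancelled"),
    (lambda: "payment charged", lambda: "payment refunded"),
]

happy = run_saga(order_saga)
rolled_back = run_saga(order_saga + [(reserve_fails, lambda: "noop")])
```

In a real system each action and compensation would be a call to a different microservice, coordinated either by an orchestrator like this or through events.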


Originally published at https://blog.contactsunny.com on March 24, 2020.


Docker is everywhere today. A lot of projects in most big companies are deployed in production as Docker containers. Once you realize how easy and useful this approach is, you’d want to Dockerize everything. A lot of tools and services today provide official Docker images so that you don’t have to worry about downloading all the required dependencies, fixing version conflicts between those dependencies, and other compatibility issues. Docker is, to put it simply, peace of mind.

But how do we get started with this? I get this question a lot. And that’s the reason I’m writing this post. Getting started is easy. You just create a Dockerfile, write a command to download all the dependencies, copy your project into the Docker image, build it, and run it. In this post, we’ll take a simple Spring Boot web application as an example for this and see how we can get this done. …
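To make the steps above concrete, here is a minimal sketch of such a Dockerfile, assuming a standard Maven layout whose build produces a single executable Spring Boot jar. The image tags and paths are illustrative; adjust them to your project:

```dockerfile
# Build stage: compile the Spring Boot jar inside a Maven image.
FROM maven:3-openjdk-11 AS build
WORKDIR /app
COPY pom.xml .
COPY src ./src
RUN mvn -q package -DskipTests

# Run stage: copy only the built jar into a slim JRE image.
FROM openjdk:11-jre-slim
WORKDIR /app
# The jar name depends on your pom.xml; adjust the glob if needed.
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Then `docker build -t my-spring-app .` followed by `docker run -p 8080:8080 my-spring-app` gets the application running, with no local JDK or Maven required.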


Originally published at https://blog.contactsunny.com on March 18, 2020.


Most of you already know about SonarQube, and many of you are already using it. I never got the opportunity to use it myself, even though I had heard about it. But recently, I decided to make it a part of the development pipeline for my personal projects. So I set it up as a Docker service on my laptop and installed the SonarLint IntelliJ IDEA plugin. And the analysis results shocked me.

In this post, I’ll talk about what SonarQube is, how to install it, and how to use it. Hopefully, by the end of this post, I will convince you enough to include it in your pipeline. …
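In case it helps, spinning up SonarQube locally with Docker is a single command. The image name below is the official one on Docker Hub; the tag is illustrative, and you need Docker installed and running:

```shell
# Start SonarQube on http://localhost:9000 (default login: admin/admin).
docker run -d --name sonarqube -p 9000:9000 sonarqube:latest
```

Once the container is up, the SonarQube web UI is available at http://localhost:9000.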



As the data generated by IoT devices, mobile devices, applications, etc. grows by the hour, creating a data lake to store all that data is becoming crucial for almost any application at scale. There are many tools and services you could use to build a data lake. But sometimes, you overlook the simplest and easiest of them all: the AWS stack. In this post, we’ll see how to create a very simple, yet highly scalable, data lake using Amazon’s Kinesis Data Firehose and Amazon’s S3.

Amazon Kinesis Data Firehose

Kinesis Data Firehose is a service Amazon offers as part of AWS, built for handling large-scale streaming data from various sources and dumping that data into a data lake. Not just that: Firehose can even transform the streaming data before it reaches the data lake. The best part is that the transformation is completely serverless, with no complex pipeline setup. You only need to create a Lambda function that takes the incoming raw data as input and returns the transformed data as output. …
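A Firehose transformation Lambda follows a fixed contract: each incoming record carries base64-encoded data, and the function must return every record with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. A minimal sketch in Python, where the upper-casing step is just a placeholder for your real transformation:

```python
import base64

def lambda_handler(event, context):
    """Firehose transformation handler: decode, transform, re-encode."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # placeholder transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Records marked `Dropped` are silently discarded, and `ProcessingFailed` ones can be redirected to an error prefix in S3, so the same contract also covers filtering and dead-lettering.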


Source: Unsplash

Lemmatization is one of the most common text pre-processing techniques used in Natural Language Processing (NLP) and machine learning in general. If you’ve already read my post about stemming of words in NLP, you’ll know that lemmatization is not that different. In both stemming and lemmatization, we try to reduce a given word to its root word. The root word is called a stem in the stemming process, and a lemma in the lemmatization process. But there are a few more differences between the two than that. Let’s see what those are.

How is Lemmatization Different from Stemming?

In stemming, part of the word is simply chopped off at the tail end to arrive at the stem. There are different algorithms for deciding how many characters to chop off, but those algorithms don’t actually know the meaning of the word in its language. In lemmatization, on the other hand, the algorithms do have this knowledge. In fact, you can say that these algorithms consult a dictionary to understand the meaning of the word before reducing it to its root word, or lemma. …
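The contrast can be illustrated with a toy example in plain Python. Real lemmatizers (such as WordNet-based ones) are far more sophisticated; the tiny lookup table here is purely illustrative:

```python
# Suffix-chopping (stemming) versus dictionary lookup (lemmatization).

LEMMAS = {"better": "good", "ran": "run", "studies": "study", "feet": "foot"}

def toy_stem(word):
    """Chop a common suffix with no knowledge of the language."""
    for suffix in ("ies", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in a dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)
```

For "studies", the stemmer blindly chops to "stud", while the lemmatizer knows the dictionary form is "study"; and a word like "better" is untouched by the stemmer but mapped to "good" by the lemmatizer.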



Stemming is one of the most common data pre-processing operations in almost all Natural Language Processing (NLP) projects. If you’re new to this space, you may have come across the word without knowing exactly what it means. You might also confuse stemming with lemmatization, a similar operation. In this post, we’ll see what exactly stemming is, with a few examples here and there. I hope I’ll be able to explain the process in simple words for you.

Stemming

To put it simply, stemming is the process of removing part of a word to reduce it to its stem or root. This doesn’t necessarily mean reducing the word to its dictionary root. We use a few algorithms to decide how to chop a word off. This, for the most part, is how stemming differs from lemmatization: lemmatization reduces a word to its dictionary root, which is more complex and needs a very high degree of knowledge of the language. We’ll talk about lemmatization in another post, maybe. For this post, we’ll stick to stemming and look at a few examples. …
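A naive suffix-stripping stemmer can be sketched in a few lines. Real stemmers like Porter’s apply much more careful rules; this sketch only shows the chopping idea, and the suffix list is made up:

```python
def naive_stem(word):
    """Strip the first matching suffix; no dictionary, no grammar."""
    for suffix in ("ation", "ion", "ing", "ed", "es", "s"):
        # Keep at least 3 characters so we don't destroy short words.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# All three inflected forms collapse to the same stem:
words = ["connected", "connecting", "connection"]
stems = [naive_stem(w) for w in words]
```

Note that the stem need not be a real word at all; collapsing related forms to one token is the whole point, which is why stemming is cheap but crude compared to lemmatization.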


Originally published on my personal blog on February 11, 2020.


I want to start this post by bluntly saying that using a Spring Boot project as your AWS Lambda function is a bad, bad idea, for so many reasons. I don’t want to get into those in this post. But sometimes you can’t stop it from happening, because you’re not the one calling the shots. Anyway, now that you are deploying a Spring Boot app to your Lambda function, let’s see how we can maintain environment-specific properties in the project.

Reading properties in a Spring Boot app

If you have written any Spring Boot app, you’d know that you can maintain an application.properties or application.yml file, from which you can read all your properties inside a Java class using the @Value annotation. For example, let’s suppose you have the following properties in your .properties
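For illustration, a minimal application.properties might look like this. All keys and values below are hypothetical; in a Java class, each would be read with the matching @Value placeholder, e.g. @Value("${db.url}"):

```properties
# Hypothetical environment-specific values.
db.url=jdbc:mysql://localhost:3306/mydb
db.username=dev_user
spring.profiles.active=dev
```

The environment-specific part comes from keeping one such file per environment (for example application-dev.properties and application-prod.properties) and selecting between them with the active profile.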



More in The fastText Series.

Working with text datasets is very common in data science problems. A good example of this is sentiment analysis, where you get social network posts as data sets and, based on the content of these posts, you need to estimate the sentiment around a topic of interest. When we’re working with text as the data, there are a number of pre-processing steps we apply to “clean” it, such as normalising the text, removing stop words, stemming, lemmatizing, etc. …
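Stop-word removal, one of those cleaning steps, is straightforward. The stop-word list below is a tiny illustrative subset; libraries such as NLTK ship full, language-specific lists:

```python
# A tiny illustrative stop-word set; real lists are much larger.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}

def remove_stop_words(text):
    """Lower-case the text, split on whitespace, drop filler words."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

cleaned = remove_stop_words("The sentiment of the post is positive")
```

After cleaning, only the content-bearing tokens remain, which is usually what you want before feeding text into a model like fastText.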

About

Sunny Srinidhi

Coding, machine learning, reading, sleeping, listening, potato. https://blog.contactsunny.com and https://www.linkedin.com/in/sunnysrinidhi/
