Luigi – data pipelines for batch processing

Luigi is a data pipeline library written in Python for batch processing jobs. It handles dependency management, workflow management, visualization, and failure handling in a few lines of code.

Common Issues faced in Spark


Almost everyone runs into the same handful of issues when they start using Spark, whether at work or for fun. These issues come up again and again, and easy solutions can be hard to find. I decided to collect solutions to the common issues I have faced and will probably face again. Thanks to my teammates who helped me solve them, and thanks to the internet, of course.

Three things every Computer Science Project Group must do!

Are you a student in a computer science related field? If so, this article will guide you in doing college projects in a way that helps you get a job. There are several things students usually don't do, either because they aren't aware of them or because they don't care. One example is using the professional tools found at top companies; those tools are the basics at the majority of tech companies.

Renaming Part-NNNN Files on S3 from Spark

A big issue we have seen with Spark jobs is that they write their output files with part-nnnn names because of Spark's distributed behavior. It is not possible to rename the files directly before writing, and modifying the underlying functions is not easy.

The only way to carry out this task is to do it on S3 directly once the files have been written. If you have a few files, you can rename them manually in the S3 web interface, but if you have many files in different folders, you can use the S3 SDK.
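As a rough sketch of the SDK approach, assuming the Python SDK (boto3) is the one in use: S3 has no rename operation, so each object is copied to its new key and the original is then deleted. The helper names `plan_renames` and `rename_part_files`, the `new_base_name` parameter, and the resulting `<base>-<number><extension>` naming scheme are illustrative assumptions, not part of Spark or boto3.

```python
import re

# Matches Spark output names such as "part-00000" or "part-00000-<uuid>-c000.csv".
PART_FILE = re.compile(r"^part-(\d+)[^.]*(\..*)?$")


def plan_renames(keys, new_base_name):
    """Map each part-NNNN key to a readable key in the same folder (hypothetical naming scheme)."""
    plan = {}
    for key in keys:
        folder, _, filename = key.rpartition("/")
        match = PART_FILE.match(filename)
        if match is None:
            continue  # leave _SUCCESS markers and other files alone
        number, extension = match.group(1), match.group(2) or ""
        new_name = f"{new_base_name}-{number}{extension}"
        plan[key] = f"{folder}/{new_name}" if folder else new_name
    return plan


def rename_part_files(s3_client, bucket, prefix, new_base_name):
    """Copy every part file under `prefix` to its new key, then delete the original."""
    # List all keys first so the listing is not affected by the copies made below.
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    ]
    plan = plan_renames(keys, new_base_name)
    for old_key, new_key in plan.items():
        s3_client.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        s3_client.delete_object(Bucket=bucket, Key=old_key)
    return plan
```

With boto3 installed and credentials configured, a call could look like `rename_part_files(boto3.client("s3"), "my-bucket", "output/", "report")`, where the bucket, prefix, and base name are placeholders. A single `copy_object` call is limited to objects up to 5 GB, so larger files would need a multipart copy instead.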

Job guide for fresh graduates in AI and Data Science domain!


Hey guys, today I will share some tips for people looking to start a career in Artificial Intelligence, Data Science, and Data Engineering. The purpose of this article is to share some quick tips that help when looking for a job in this domain; there are a few things that really count in this area.

These fields, especially the non-engineering ones, require a lot of experience, and big established companies tend to look only for PhDs, so you are unlikely to get those jobs at the start of your career. But you can easily look for these jobs at startups: they have lower requirements and a lot of room for learning, which makes them a great start to a career.

Finding relationships among stores using Apriori Algorithm

As seen in our previous article, Association Rule Mining is a great way to solve problems, but it is computationally expensive. To reduce this expense, there is a simple solution known as the Apriori Algorithm. In this article, we will see how this algorithm works with an example.
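A minimal sketch of the idea, under the assumption that each transaction is the set of stores one customer visited: Apriori saves work by only building larger candidate sets from sets that are already frequent, since any set with an infrequent subset cannot itself be frequent. The function name `apriori`, the `min_support` parameter, and the store names are made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / total

    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)

        size += 1
        # Join step: merge frequent sets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: drop a candidate if any of its smaller subsets is not frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
    return frequent


# Hypothetical visits: which stores each customer went to.
visits = [
    {"grocery", "pharmacy", "bakery"},
    {"grocery", "pharmacy"},
    {"grocery", "bakery"},
    {"pharmacy", "bakery"},
    {"grocery", "pharmacy", "bakery"},
]
frequent_sets = apriori(visits, min_support=0.5)
```

In this toy data every single store and every pair of stores is frequent, while the set of all three stores falls below the threshold and is dropped.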

Finding relationship among items using Association Rule

Association Rule is a rule-based machine learning algorithm. It is a great way to find how things relate to each other; the easiest example is food items like milk, bread, and eggs. That example answers a simple question for a business: how to increase sales while making life easier for customers.

This solution helps in a lot of ways across different industries. Let us see how it answers the above question, which will help us understand how to implement this methodology in our business.
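To make the milk, bread, and eggs example concrete, here is a small sketch of the three measures commonly used to score a rule such as "milk implies bread": support, confidence, and lift. The basket data and the function name `rule_metrics` are invented for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    total = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= set(t))
    count_c = sum(1 for t in transactions if consequent <= set(t))
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each appear in at least one transaction")

    support = count_both / total        # how often the items are bought together
    confidence = count_both / count_a   # how often the consequent follows the antecedent
    lift = confidence / (count_c / total)  # above 1 suggests a positive association
    return support, confidence, lift


# Hypothetical shopping baskets.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
support, confidence, lift = rule_metrics(baskets, {"milk"}, {"bread"})
```

A business could rank rules by these scores to decide which items to place together or bundle in offers.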
