
Sadanand Singh

Principal Research Scientist @whiterabbit.ai, Amateur Photographer, and a Cook

A Practical Guide to Autoencoders Using Keras

Usually, in a conventional neural network, one tries to predict a target vector y from input vectors x. In an autoencoder network, one instead tries to predict x from x itself. Learning a mapping from x to x is trivial if the network has no constraints, but once the network is constrained, the learning process becomes much more interesting. In this article, we take a detailed look at the mathematics of different types of autoencoders (with different constraints), along with sample implementations using Keras with a TensorFlow backend.
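As a flavor of what the post builds up to, here is a minimal sketch of an undercomplete autoencoder in Keras; the layer sizes and activations are illustrative assumptions, not taken from the post itself:

```python
# Minimal undercomplete autoencoder sketch in Keras (TensorFlow backend).
# The bottleneck layer is the constraint that makes predicting x from x non-trivial.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_dim = 784      # e.g. a flattened 28x28 image
encoding_dim = 32    # bottleneck dimension (illustrative choice)

inputs = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation="relu")(inputs)   # encoder
decoded = Dense(input_dim, activation="sigmoid")(encoded)  # decoder reconstructs x

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```

Training would then call `autoencoder.fit(x, x, ...)`, with the input serving as its own target.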

My Arch Linux Setup with GNOME 3

If you have been following me on this space, you will know by now that I am very particular about my computers: their operating systems, looks, software, etc. Before you get any wrong ideas, my love for Arch Linux is still going strong. However, I have moved to GNOME 3 as my desktop of choice. This post is an updated version of my previous post, with the latest configuration of my machine.

Sublime Text Setup

I have been using Sublime Text as my primary editor for some time now. Here I want to share my current setup for the editor, including all settings, packages, shortcut keys, and themes. Note: this post has been updated with my latest Sublime Text settings. Packages: the first thing you will need to install is Package Control. This can be done easily by following the directions in its installation instructions.

Understanding Boosted Tree Models

In the previous post, we learned about tree-based learning methods: the basics of tree-based models and the use of bagging to reduce variance. We also looked at one of the most famous learning algorithms based on the idea of bagging: random forests. In this post, we look into the details of yet another type of tree-based learning algorithm: boosted trees. Boosting, like bagging, is a general class of learning algorithms in which a set of weak learners is combined to produce a strong learner.
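The combine-weak-learners idea can be sketched in a few lines of plain Python. Below is a toy AdaBoost-style loop over decision stumps on a 1-D dataset; the data and number of rounds are made up for illustration, and a real boosted-trees library would of course do far more:

```python
import math

def best_stump(X, y, w):
    """Pick the threshold/polarity decision stump with the lowest weighted error."""
    best = None
    for t in sorted(set(X)):
        for p in (1, -1):
            preds = [p if x > t else -p for x in X]
            err = sum(wi for wi, pi, yi in zip(w, preds, y) if pi != yi)
            if best is None or err < best[0]:
                best = (err, t, p)
    return best

def adaboost(X, y, rounds=3):
    """Combine weak stumps into a weighted ensemble (the boosting idea)."""
    n = len(X)
    w = [1.0 / n] * n                     # start with uniform example weights
    ensemble = []                         # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, t, p = best_stump(X, y, w)
        err = max(err, 1e-10)             # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, p))
        # Re-weight: misclassified points get more attention next round.
        preds = [p if x > t else -p for x in X]
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (p if x > t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: +1 for large x, -1 for small x.
X = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
y = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost(X, y)
```

Each round fits a weak learner to the re-weighted data and adds it to the ensemble with a weight proportional to its accuracy, which is exactly the "weak learners combined into a strong learner" recipe described above.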

A Practical Guide to Tree Based Learning Algorithms

Tree-based learning algorithms are quite common in data science competitions. These algorithms give predictive models high accuracy, stability, and ease of interpretation. Unlike linear models, they map non-linear relationships quite well. Common examples of tree-based models are decision trees, random forests, and boosted trees. In this post, we look at the mathematical details of decision trees (along with various Python examples), their advantages, and their drawbacks. We will find that they are simple and very useful for interpretation.
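The core step of a decision tree is choosing a split that makes the resulting groups as pure as possible. A minimal stdlib-only sketch of that step, using Gini impurity on a made-up 1-D dataset (the data and function names are illustrative, not from the post):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Exhaustively find the threshold minimizing the weighted Gini impurity."""
    n = len(xs)
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data: two well-separated classes along one feature.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
```

A full tree simply applies this split search recursively to each resulting group until a stopping criterion is met.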

Understanding Support Vector Machine via Examples

In the previous post on Support Vector Machines (SVM), we looked at the mathematical details of the algorithm. In this post, I discuss practical implementations of SVM for classification as well as regression, using the iris dataset as an example for the classification problem and randomly generated data as an example for the regression problem. In Python, scikit-learn is a widely used library for machine learning algorithms; SVM is also available in scikit-learn and follows the usual structure: import the library, create the object, fit the model, and predict.
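The import / create / fit / predict workflow mentioned above looks roughly like this with scikit-learn's `SVC` on the iris dataset (the kernel and C value here are illustrative defaults, not choices from the post):

```python
# Sketch of the usual scikit-learn workflow with an SVM classifier.
from sklearn import datasets
from sklearn.svm import SVC

# Import/load the data: 150 iris samples, 4 features, 3 classes.
iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = SVC(kernel="rbf", C=1.0)   # object creation (illustrative hyperparameters)
clf.fit(X, y)                    # fitting the model
preds = clf.predict(X[:5])       # prediction on a few samples
```

For regression, the same four steps apply with `sklearn.svm.SVR` in place of `SVC`.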

Switching to Hugo from Nikola

I have been using Nikola to build this blog. It's a great static site builder based on Python. However, it has a crazy number of dependencies (to get a reasonable-looking site), and it uses reStructuredText (rst) as the primary language for content creation. Personally, I use Markdown for almost everything else: taking notes, keeping a diary, code documentation, etc. Furthermore, since Nikola tries to support almost everything a static site builder can do, it has lately been getting more and more bloated.

An Overview of Descriptive Statistics

One of the first tasks in any data science project is to understand the data. This can be extremely beneficial for several reasons: catching mistakes in the data, seeing patterns, finding violations of statistical assumptions, generating hypotheses, etc. We can think of this task as an exercise in summarizing the data. To summarize the main characteristics of the data, two methods are commonly used: numerical and graphical.
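The numerical side of such a summary is available directly in Python's standard library. A small sketch on a made-up sample with one outlier, showing why it pays to look at more than one summary statistic:

```python
import statistics

# Made-up sample with a single outlier (9.9) to illustrate robustness.
data = [2.1, 2.5, 2.5, 2.7, 3.0, 3.1, 3.4, 9.9]

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # robust to the outlier
mode = statistics.mode(data)      # most frequent value
stdev = statistics.stdev(data)    # sample standard deviation
```

Here the mean is noticeably larger than the median, a quick numerical hint that the data are skewed by the outlier, before any plot is drawn.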