StatOfMind


A melting pot of statistics, machine learning and data visualization.

Building a Kafka and Spark Streaming pipeline - Part I

Many companies across a multitude of industries are currently maintaining data pipelines used to ingest and analyze large data streams. In effect, the proper implementation of such pipelines belongs to the realm of “data engineering”, and represents a gateway to interesting data science-related problems. Traditional machine learning methods have been...
Read more...


A map of elevators in NYC

Not too long ago, I came across a random tweet pointing to a GitHub repository full of miscellaneous datasets. I was immediately excited by the various data science and visualization problems that I could take on and decided to get my hands dirty. One of the first datasets that...
Read more...


The biggest liars in US politics

Who lies the most in US politics? Most Americans, and anyone who follows US politics, will be aware of the tremendous changes and volatility that have struck the US political landscape in the past year. The ascent of Donald Trump from a billionaire entertainer to a fully fledged presidential candidate,...
Read more...


Data science with Docker

Using Docker to facilitate your data science pipelines

Until recently, and like many other fellow data scientists I have talked to, I built data science pipelines on my local machine or a remote host while relying on virtual environments. In doing so, I ensured some degree of replicability by keeping...
Read more...


Player and roster similarity in the NBA

Recently, professional sports associations and teams have made big strides towards leveraging data to inform both personnel and on-the-field decision making. While the four major leagues (NBA, NFL, MLB, NHL) vary in terms of where they are in that process, most people would argue that the NBA is at the...
Read more...