This year my project, Effect Plot and PCA Visualizer, was selected for GSoC’19 under NumFOCUS: Yellowbrick. I will be spending my summer interacting with the awesome Yellowbrick community, making new friends, coding on open-source projects, and helping the community as a whole. This blog will take you through my journey to becoming a student developer with Google Summer of Code!

After spending numerous nights looking for the perfect organization, I shortlisted a few organizations that matched my skills. Choosing an organization that fits your skills well is key to getting selected for GSoC. I went through the plethora of projects offered by these organizations and settled on a project with NumFOCUS: Yellowbrick. I spent a few days understanding the nuts and bolts of the library and the project I was dealing with. Once I had gained enough knowledge, I began working on my proposal. After toiling hard for a month, resolving issues with the mentors, and working on every minute detail to present my proposal well, I managed to submit it a day before the final deadline. The most important thing I learned from the mentors was that they were not looking for a perfectionist. A proposal needs to express your understanding of the project, along with a well-planned timeline to guide you through the GSoC period.

I waited for a month to reap what I had sown. My spirits were high, and I had a gut feeling that I had a good chance of being selected for GSoC. Working on a GSoC project had been my dream since my first year, and it finally came true when I received a mail from Google congratulating me on being selected for GSoC’19. I plan to design and optimize new visualizers for the Yellowbrick library with the help of my mentor, Mr. Adam Morris. I will use this series of blog posts to share my experience throughout the GSoC coding period.

My GSoC Organisation

I will be working with NumFOCUS for my GSoC’19 project. NumFOCUS is an umbrella organization supporting open practices in research, data, and scientific computing. Most of the major libraries in the fields of data science and Python are affiliated with NumFOCUS.

Among the 15 organizations that participated in GSoC’19 under the NumFOCUS umbrella, I will be working for Yellowbrick. Yellowbrick is an open-source library that provides visual steering and visual diagnostics for machine learning models. It helps you analyze model performance visually, which is what makes Yellowbrick unique. It achieves this by wrapping scikit-learn ML models with Matplotlib. It also proves quite useful in Exploratory Data Analysis (EDA). With coverage that ranges from EDA to standing in for the ML model itself while visualizing its performance, it will surely become a part of every data scientist’s day-to-day ML workflow.

GSoC Project

My area of focus during this GSoC period will be to help my mentor and the community maintain and upgrade the Yellowbrick library. As stated earlier, I will be working on Effect Plots and strengthening the PCA visualizer. The former helps in understanding feature importance for linear models, while the latter helps in reducing the dimensionality of the data while retaining its trends. You will learn more about them in my upcoming posts.
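To give a rough idea of what an effect plot is built from, here is a minimal sketch in plain Python. It assumes the usual definition for linear models, where the effect of a feature on one sample is the model coefficient multiplied by that sample's feature value. The data, the feature names, the coefficients, and the helper functions are all made up for illustration; this is not Yellowbrick's API, and the visualizer I end up building may look different.

```python
from statistics import mean

# Hypothetical toy data: 4 samples, 3 features (names invented for illustration).
feature_names = ["rooms", "age", "distance"]
X = [
    [3.0, 10.0, 2.0],
    [4.0, 5.0, 1.0],
    [2.0, 30.0, 8.0],
    [5.0, 1.0, 4.0],
]

# Hypothetical coefficients of an already-fitted linear model.
coefficients = [2.0, -0.1, -0.5]


def feature_effects(X, coefficients):
    """Return effects[i][j] = coefficients[j] * X[i][j] for every sample i."""
    return [[w * x for w, x in zip(coefficients, row)] for row in X]


def mean_absolute_effects(effects, names):
    """Summarize each feature by the mean absolute value of its effects."""
    columns = list(zip(*effects))  # one tuple of effects per feature
    return {name: mean(abs(v) for v in col) for name, col in zip(names, columns)}


effects = feature_effects(X, coefficients)
summary = mean_absolute_effects(effects, feature_names)

# Print features from most to least influential on this toy data.
for name, value in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

An effect plot would draw the per-feature distribution of these values (for example, one box per feature) instead of collapsing them into a single number as the summary here does.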

Community Bonding

The community bonding period ends tomorrow. It is a month-long period during which a student spends time learning more about the project and interacting with the community. By the end of it, a student is expected to have a clear picture of the project's workflow.

I got in contact with my mentor through an initial mail followed by a Slack call. He introduced me to the Yellowbrick team, and I received a warm welcome from them. I worked on some of the existing issues to get familiar with the coding style. The maintainers and mentors were kind enough to guide me and correct me when I made a mistake. My GSoC project will be divided into two major parts, with the effect plot being the primary focus, followed by strengthening the PCA visualizer.

I am looking forward to an awesome summer, coding my way through and contributing to Yellowbrick to make visualization easier and more efficient!