The primary reason LinkedIn users are active on the platform is job recruitment. With more than 20 million companies listed on the site and 14 million open jobs, it’s no surprise that 90% of recruiters regularly use LinkedIn.
Natural Language Processing (NLP) is a large area of research with many relevant applications for businesses. Taking in arbitrary text and extracting sentiment, performing translation, or offering auto-suggest and auto-correct are some typical use cases, but the applications are of course endless.
Wrapper methods evaluate multiple models using procedures that add and/or remove predictors to find the optimal combination that maximizes model performance.
Understanding political ad campaigns through the Illuminating project: new technologies, architecture, and processes applied to the 2020 electoral landscape through a project started in 2014 for the gubernatorial state elections.
Thousands of companies around the world, from small startups to global corporations, find great value in improving the performance of their supervised or unsupervised ML models. However, all of them seem to… focus mainly on two things…
Created using Deepart.io by blending The Great Wave off Kanagawa with one of my paintings.
“How did you become a Data Scientist?”
I get asked this quite often and thought I’d finally put my story into writing. My aim is to give you a few lessons I learned along my path and my leap into data science. Overall, I think taking that leap is one of the best decisions I’ve ever made.
Studying econometrics in university, I analyzed which variables predict a person’s income, and the largest factor was… parents’ income. Wait, what? Born into poverty to 18-year-old parents, I began unraveling my conceptions of the American Dream. Does this mean I’m also destined for the same struggle as my parents? Digging further, I found that only 12% of people like me graduated university.¹ Makes sense, as I also worked full-time and didn’t really sleep. Also, I estimated that approximately 5% of other Data Scientists were economics undergraduates like me.² These estimates lead me to believe about 1% of Data Scientists are similar in this way.
I highlight these external factors because I think I’m incredibly lucky in a multitude of ways. I’ve known about these base rates and have actively worked to move beyond the unfavorable ones.
This is the first lesson: Understand your base rates. Given your background and specific challenges, what are the odds? Are they in your favor or against you? If the odds are against you, investigate the ways you can improve your chances of success. Even if success looks “improbable,” that doesn’t necessarily mean impossible.
I always loved mathematics, logic, and critical thinking. Although I didn’t know it at the time, those were the prerequisites to data science. However, my excitement for data science was met with a sales role in the San Francisco Bay Area. My sales role allowed me to explore some Diet Analytics in the Consumer Packaged Goods space. But, I needed out.
After the sales rotation program, I moved into underwriting commercial loans. At the time I thought “Banks pay more” but found the path to analytics too long in banking. I’d have to underwrite loans for 2+ years before transitioning into an Analyst role — not even data science. After being out of university for 2 years, I made the leap: Data Science Bootcamp.
Bootstrapped into Bootcamp
Aced the initial take-home analysis on Kickstarter campaigns. Prepared my nascent coding skills with the Bootcamp’s online prerequisite coursework via Dataquest. I was ready for the Data Science Immersive: a full-time, 3-month program.
The sixth Data Science Immersive cohort [DSI-6] was a group of 9 scrappy professionals looking to cut their teeth in an industry projected to grow by 15% over 10 years.³ Even though I’m writing this years later, most of us stay in touch with one another. Lessons were 9–5, but we stayed until the campus closed. Weekends were a thing of the past, too. It was intense. It was fun. I took out a private student loan.
This was a significant risk since I had only some savings, no safety net from my parents, and had taken on additional debt. I planned for a runway of 8 months, including the bootcamp.
This is the second lesson: Understand your finances during and after your program. I strongly, strongly recommend bootcamps that have an incentive mechanism for you to get hired. That could be a percentage of your future salary, like Lambda School, or paying $0 if you don’t find a job within 6 months, like Springboard. Ultimately, you want your bootcamp program to be directly invested in your initial success.
The 80 hours per week of coding, debugging, and constant learning concluded. Since only a small trickle of the positions you want to apply for gets posted, most of the job search is apply/network with a lot of “wait & see.” During my free time, I decided to try out a Kaggle competition with a friend from DSI-6. Kaggle is a competition platform where a company crowdsources a problem and data scientists compete to achieve an optimal score (usually the lowest error in some flavor). After a couple of weeks of “apply/network, wait & see,” I met the Director of Engineering for a commercial real estate startup. Luckily, the Kaggle competition focused on real estate: The Zillow $1 Million Prize.
On the day of my interview, I cranked out several models, lowering my error rate to 0.0745 (vs 1st place at 0.0732). Then I sprinted from Philz on Market Street for several blocks, wishing XGBoost didn’t take so long. Right after the presentation with the CTO, the Director, and the Principal Architect, I was offered a job on the spot. Worth it.
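For readers new to leaderboard scores like the ones above: a lower number means the predictions sit closer to the true values. The post doesn’t name the metric, so the sketch below assumes mean absolute error, and every number in it is made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values (lower is better)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one value")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented values, not competition data.
actual = [0.5, -0.25, 0.0, 0.75]
model_a = [0.25, -0.25, 0.25, 0.5]  # predictions from a hypothetical first model
model_b = [0.5, 0.0, 0.0, 0.5]      # predictions from a hypothetical second model

score_a = mean_absolute_error(actual, model_a)
score_b = mean_absolute_error(actual, model_b)

# The model with the smaller score would rank higher on the leaderboard.
best = "model_b" if score_b < score_a else "model_a"
print(score_a, score_b, best)
```

Under this assumption, moving up the leaderboard simply means shaving that average gap down, even by a small fraction.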
This is the third lesson: Build out your portfolio. Find data sets where you have a genuine interest and want to learn more about that specific problem. What sort of assumptions do you have to make? How did you clean and reformat the data? How would you productionize your code? My Bootcamp Capstone predicted venture capital funding (purposefully general); then I predicted real estate values. Aim to make luck and opportunity meet.
Caveat to Lesson 3: I’d recommend against Kaggle generally since the work involved doesn’t fully apply to “real world” problems. Optimizing a single metric severely “overfits” the data with extremely slow models. In my Zillow example, I blended 3 computationally intensive models’ outputs into a final regression. It’s not clear you’d ever need to do this in your job.
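To make the blending step concrete, here is a minimal sketch of feeding three models’ predictions into a final regression. It is not the code from the competition: the prediction lists are invented, `solve`, `fit_blender`, `blend`, and `sse` are hypothetical helper names, and the final regression is assumed to be ordinary least squares with an intercept.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_blender(base_preds, target):
    """Least-squares weights (intercept first) over the base models' predictions."""
    rows = [[1.0] + list(p) for p in zip(*base_preds)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) w = X^T y


def blend(weights, base_preds):
    """Combine base predictions into one prediction per row using the fitted weights."""
    return [
        weights[0] + sum(w * p for w, p in zip(weights[1:], preds))
        for preds in zip(*base_preds)
    ]


def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Invented target values and predictions from three hypothetical base models.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
base_preds = [
    [1.2, 1.9, 3.4, 3.8, 5.3, 5.7],
    [0.7, 2.3, 2.8, 4.4, 4.6, 6.5],
    [1.5, 1.5, 3.5, 3.5, 5.5, 5.5],
]

weights = fit_blender(base_preds, target)
blended = blend(weights, base_preds)
print("blended SSE:", sse(target, blended))
print("base SSEs:", [sse(target, p) for p in base_preds])
```

On the rows it was fit on, the blend can never do worse than any single base model, which is part of why it helps a leaderboard score. In practice the blender should be fit on out-of-fold predictions rather than on the same rows the base models trained on, and a library such as scikit-learn’s `StackingRegressor` handles that bookkeeping.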
These are the three most important lessons I can give for transitioning from Bootcamp to Data Scientist.
Know your base rates and actively improve your odds
Choose a Bootcamp that is directly incentivized to get you hired
Build your portfolio by making luck and opportunity meet
Next, I’m going to write about my initial roller coaster ride of being a Data Scientist. Sneak peek: The commercial real estate startup failed. Until then, I’m curious to hear about your experience looking into Data Science Bootcamps. What principles and lessons did you learn along the way?
Go is an open source programming language that makes it easy to build simple, reliable, and efficient software. Golang, or simply ‘Go’, first appeared 10 years ago. It was developed at Google as a general-purpose language.
Papers with Code today announced that preprint paper archive arXiv will now allow researchers to submit code alongside research papers, giving computer scientists an easy way to analyze, scrutinize, or reproduce claims of state-of-the-art AI or novel advances in what’s possible.
A step-by-step guide for keeping project dependencies clean and reproducible. Virtual environments are a must when developing software projects. They allow you to create self-contained, isolated Python installations that prevent your projects from clashing with each other and let other people reproduce your setup.
Let’s be clear from the start. According to Google: Discrimination is the unjust or prejudicial treatment of different categories of people, especially on the grounds of race, age, or sex. Bias is inclination or prejudice for or against one person or group, especially in a way considered to be unfair.
Considering the Data Science Life Cycle as a life cycle enables a natural consideration of crucial overarching factors such as reproducibility, documentation and metadata, ethics, and archiving of research artefacts such as data and code.
In his own words, Witkowski says, “artificial intelligence means that you are trying to copy human intelligence as best as possible. Artificial life says, okay, that’s good, but let’s try to understand human intelligence and recreate it from the fundamental knowledge we have acquired. It’s more constructive. It’s a bit like the Richard Feynman quote: what I cannot create, I do not understand.”
Hypothesis tests are significant for evaluating answers to questions concerning samples of data. What is the value of hypothesis testing to AI models?
One good thing to come from the internet is the emergence of e-commerce websites so popular that millions of people visit them and order their products. The huge volume of data created by all these people can no longer be analyzed by employees alone. These companies need the help of data science.
There are many tasks in NLP, from text classification to question answering, but whatever you do, the amount of data you have to train your model heavily impacts its performance.
Companies sell their shares on the stock market, putting the company squarely in the public domain. While the impact on stock value has various causes and effects, a big factor in price change is the way a company is perceived. Sentiment from news can be used as a predictive indicator of trend. In this article we give a brief overview of how we analyze sentiment.
As investment management firms evolve, what is the role of Artificial Intelligence? How is AI used? Let’s find out how AI is transforming investment management.
AI research is making ever greater and ever faster advances. Is a good AI more important than data protection?
Data analytics does not always require complicated programming. Some applications can be built in a simpler way.
Quick guide to creating professional plots using Matplotlib.