InduNet v1.0 — An industry predictor using company descriptions

Natural Language Processing (NLP) is a large area of research with many relevant applications for businesses. Taking in arbitrary text and extracting sentiment, translating it, or powering auto-suggest and auto-correct are some typical use cases, but the applications are nearly endless.


Bootcamp & beyond

Image created by combining The Great Wave off Kanagawa with one of my paintings.

“How did you become a Data Scientist?”

I get asked this quite often and thought I’d finally put my story into writing. My aim is to share a few lessons I learned along my path and my leap into data science. Overall, I think taking that leap is one of the best decisions I’ve ever made.

Studying econometrics in university, I analyzed which variables predict a person’s income, and the largest factor was… parents’ income. Wait, what? Born into poverty to 18-year-old parents, I began unraveling my conceptions of the American Dream. Does this mean I’m also destined for the same struggle as my parents? Digging further, I found that only 12% of people like me graduated university.¹ Makes sense, as I also worked full-time and didn’t really sleep. I also estimated that approximately 5% of Data Scientists have an Economics undergraduate degree like me.² These estimates led me to believe that about 1% of Data Scientists share both traits.
I highlight these external factors because I think I’m incredibly lucky in a multitude of ways. I’ve known about these base rates and have actively worked to beat them.
This is the first lesson: Understand your base rates. Given your background and specific challenges, what are the odds? Are they in your favor or against you? If the odds are against you, investigate how you can improve your chances of success. Even if the odds say “improbable,” that doesn’t necessarily mean impossible.
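The "about 1%" figure comes from combining two base rates. A minimal sketch of that arithmetic in Python, assuming the two rates are independent so they can simply be multiplied (that independence is a simplification, and `combined_base_rate` is a hypothetical helper, not something from the original analysis):

```python
def combined_base_rate(*rates: float) -> float:
    """Estimate the share of people meeting every condition at once.

    Assumes the rates are independent, so the joint share is their product.
    """
    result = 1.0
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate must be in [0, 1], got {rate}")
        result *= rate
    return result


# 12% university graduation rate (footnote 1) and ~5% Economics undergrads (footnote 2)
share = combined_base_rate(0.12, 0.05)
print(f"{share:.1%}")  # prints 0.6%
```

The product comes out a little under 1%, the same order of magnitude as the estimate above. If the two traits are correlated, the true share could differ in either direction.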

I always loved mathematics, logic, and critical thinking. Although I didn’t know it at the time, those were the prerequisites to data science. However, despite my excitement for data science, I started out in a sales role in the San Francisco Bay Area. My sales role allowed me to explore some Diet Analytics in the Consumer Packaged Goods space. But I needed out.
After the sales rotation program, I moved into underwriting commercial loans. At the time I thought, “Banks pay more,” but I found the path to analytics in banking too long. I’d have to underwrite loans for 2+ years before transitioning into an Analyst role, and not even a data science one. After being out of university for 2 years, I made the leap: Data Science Bootcamp.
Bootstrapped into Bootcamp
I aced the initial take-home analysis on Kickstarter campaigns and sharpened my nascent coding skills with the Bootcamp’s online prerequisite coursework via Dataquest. I was ready for the Data Science Immersive, a full-time, 3-month program.
The sixth Data Science Immersive cohort [DSI-6] was a group of 9 scrappy professionals looking to cut their teeth in an industry projected to grow by 15% over 10 years.³ Even though I’m writing this years later, most of us still keep in touch. Lessons ran 9–5, but we stayed until the campus closed. Weekends were a thing of the past, too. It was intense. It was fun. I took out a private student loan.
This was a significant risk: I had some savings but no safety net from my parents, and I took on additional debt. I planned for 8 months of runway, including the bootcamp.
This is the second lesson: Understand your finances during and after your program. I strongly, strongly recommend bootcamps that have an incentive mechanism for getting you hired, whether that means paying a percentage of your future salary, as with Lambda School, or paying $0 if you don’t find a job within 6 months, as with Springboard. Ultimately, you want your bootcamp program to be directly invested in your initial success.
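Planning how long your money lasts is a runway calculation: cash available after one-time costs, divided by net monthly spending (the burn rate). A minimal sketch in Python; the `runway_months` helper and every number in the example are hypothetical, not the author’s actual finances:

```python
def runway_months(savings: float, loan: float, upfront_costs: float,
                  monthly_costs: float, monthly_income: float = 0.0) -> float:
    """Months until cash runs out at a constant monthly net burn.

    upfront_costs covers one-time expenses such as tuition.
    All amounts must be in the same currency.
    """
    cash = savings + loan - upfront_costs
    if cash <= 0:
        return 0.0  # one-time costs already exceed available cash
    net_burn = monthly_costs - monthly_income
    if net_burn <= 0:
        return float("inf")  # income covers expenses; not burning cash
    return cash / net_burn


# Made-up example: $15k saved, $18k loan, $15k tuition, $2,250/month living costs
print(runway_months(15_000, 18_000, 15_000, 2_250))  # prints 8.0
```

Running the numbers before enrolling shows how much of the runway the bootcamp itself consumes and how many months remain for the job search.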

The 80-hour weeks of coding, debugging, and constant learning concluded. Since only a small trickle of the positions you want actually get posted, most of the job search is applying and networking, followed by a lot of “wait & see.” During my free time, I decided to try out a Kaggle competition with a friend from DSI-6. Kaggle is a competition platform where companies crowdsource a problem and data scientists compete for the best score (usually the lowest error by some metric). After a couple weeks of “apply/network, wait & see,” I met the Director of Engineering for a commercial real estate startup. Luckily, the Kaggle competition focused on real estate: the Zillow $1 Million Prize.
On the day of my interview, I cranked out several models, lowering my error to 0.0745 (vs. 0.0732 for 1st place). Then I sprinted several blocks from Philz on Market Street, wishing XGBoost didn’t take so long. Right after my presentation to the CTO, the Director, and the Principal Architect, I was offered a job on the spot. Worth it.
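Leaderboard scores like 0.0745 and 0.0732 are errors, so lower is better. The scoring metric isn’t named here; as an assumption, this Python sketch uses mean absolute error (MAE), a common lower-is-better regression metric, on made-up values to show how two models would be ranked:

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and two candidate models' predictions
actual = [0.02, -0.05, 0.10, 0.00]
model_a = [0.01, -0.02, 0.05, 0.01]
model_b = [0.02, -0.04, 0.08, 0.00]

print(mean_absolute_error(actual, model_a))
print(mean_absolute_error(actual, model_b))  # smaller error, so model_b ranks higher
```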
This is the third lesson: Build out your portfolio. Find data sets you’re genuinely interested in, where you want to learn more about that specific problem. What sort of assumptions do you have to make? How did you clean and reformat the data? How would you productionize your code? My Bootcamp Capstone predicted venture capital funding (purposefully general), and then I predicted real estate values. Aim to make luck and opportunity meet.
Caveat to Lesson 3: I’d generally recommend against Kaggle, since the work involved doesn’t fully translate to “real world” problems. Optimizing a single leaderboard metric encourages severely “overfitting” the data with extremely slow models. In my Zillow example, I blended the outputs of 3 computationally intensive models into a final regression. It’s not clear you’d ever need to do this in your job.
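Blending means feeding several models’ predictions into one more model. A minimal NumPy sketch, assuming the three base models have already produced predictions on a held-out set. The data here is synthetic, and `fit_blender`/`blend` are hypothetical helpers. A least-squares linear regression learns how much weight to give each model’s output:

```python
import numpy as np


def fit_blender(base_preds: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fit linear-regression weights (plus intercept) over base-model predictions.

    base_preds has shape (n_samples, n_models): one column per base model.
    """
    X = np.column_stack([np.ones(len(base_preds)), base_preds])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, w_model1, w_model2, ...]


def blend(base_preds: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Combine base-model predictions using the fitted weights."""
    return coef[0] + base_preds @ coef[1:]


rng = np.random.default_rng(0)
y = rng.normal(size=200)
# Three synthetic "models": the true target plus different amounts of noise
base = np.column_stack([y + rng.normal(scale=s, size=200) for s in (0.3, 0.5, 0.8)])

coef = fit_blender(base, y)
blended = blend(base, coef)

# Evaluated in-sample for brevity; in practice, fit and score on separate folds
print("best single-model MSE:", ((base - y[:, None]) ** 2).mean(axis=0).min())
print("blended MSE:", ((blended - y) ** 2).mean())
```

Even this small version hints at the cost: every base model has to be trained and run before the blender can, which is the extra machinery the caveat above suggests you rarely need on the job.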

These are the three most important lessons I can give for transitioning from Bootcamp to Data Scientist.
Know your base rates and actively improve your odds
Choose a Bootcamp that is directly incentivized to get you hired
Build your portfolio by making luck and opportunity meet
Next, I’m going to write about my initial roller coaster ride as a Data Scientist. Sneak peek: The commercial real estate startup failed. Until then, I’m curious to hear about your experience exploring Data Science Bootcamps. What principles and lessons did you learn along the way?


arXiv now allows researchers to submit code with their manuscripts

Papers with Code today announced that preprint paper archive arXiv will now allow researchers to submit code alongside research papers, giving computer scientists an easy way to analyze, scrutinize, or reproduce claims of state-of-the-art AI or novel advances in what’s possible.


Deep Learning is already dead: Towards Artificial Life with Olaf Witkowski

In his own words, Witkowski says, “artificial intelligence means that you are trying to copy human intelligence as best as possible. Artificial life says, okay, that’s good, but let’s try to understand human intelligence and recreate it from the fundamental knowledge we have acquired. It’s more constructive. It’s a bit like the Richard Feynman quote: what I cannot create, I do not understand.”


Stock trend prediction from News Sentiment

Companies sell their shares on the stock market, putting the company squarely in the public domain. While stock prices move for many reasons, a big factor in price change is the way a company is perceived. Sentiment from news can be used as a predictive indicator of trend. In this article, we give a brief overview of how we analyze sentiment.
