Former Facebook Data Scientist Explains How to Wrangle Your Data

While the deep technical expertise of a data scientist is necessary on some projects, you don’t need one to build a culture of data-driven decision making. Here’s how to empower your team.

Please allow me to geek out for a minute, and I promise to give you the tools you need to put Big Data to work for you and your business. This week I got the chance to meet Tye Rattenbury, a veteran of Facebook’s core data science team. For a nerd like me, that’s more exciting than meeting Mark Zuckerberg himself. Sure, Mark Zuckerberg is the celebrity, but Tye was the data guy in the trenches who built the data pipelines and analyses that fuel Facebook’s data-driven culture.

So you can imagine how awesome it was to meet a Silicon Valley legend who has joined an incredible company called Trifacta, whose mission is to democratize Big Data for people like you. For far too long, Big Data has been the exclusive tool of elite enterprises with deep pockets. So this article is going to show you how you can get started FOR FREE with easy-to-use tools like Trifacta Wrangler.

Pepsi Eliminated 90% of Their Big Data Wrangling Time And So Can You!
If you talk to any data scientist worth their salt, they will tell you that the first challenge of putting data to work for your business is getting it into a structured format so that you can analyze it, interpret it and make decisions based on it. This is lovingly referred to as “Data Wrangling,” and it eats up the bulk of a data team’s time (4 out of 5 days, by most accounts). Instead of spending time understanding the data, you’re wasting time trying to pull it all together in a usable format. This is what usually creates the bottleneck in any organization.

Think about it. You want to combine your customer data from your CRM (think Salesforce.com) and your Marketing Automation (think Marketo or Hubspot) with your social data (think Facebook and Twitter) and ideally your point of sale and/or ecommerce sales data (think Magento). Then you want to pull in third-party data (think Experian and Infosys) to round out your insights. Each of these data sets has a different format, so the first order of business is to combine them into a single source where you can begin to query the data.
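To make that concrete, here is a minimal sketch in plain Python of what “combining into a single source” involves for just two of those systems. It is not how Trifacta or any of the products above work internally. The field names (“Email”, “Name”, “customer_email”, “total”) and the function names are hypothetical, and it assumes an email address is a good enough key to match customers across systems.

```python
def normalize_email(raw):
    """Trim and lower-case an email so the same customer matches across sources."""
    return raw.strip().lower()


def combine_sources(crm_rows, order_rows):
    """Merge CRM contacts and ecommerce orders into one table keyed by email.

    All field names here are hypothetical stand-ins for whatever your
    CRM and store exports actually contain.
    """
    combined = {}

    # Start with one record per CRM contact.
    for row in crm_rows:
        key = normalize_email(row["Email"])
        combined[key] = {
            "email": key,
            "name": row["Name"].strip(),
            "order_count": 0,
            "revenue": 0.0,
        }

    # Fold each order into the matching contact, or create a
    # name-less record when the buyer is not in the CRM.
    for row in order_rows:
        key = normalize_email(row["customer_email"])
        record = combined.setdefault(
            key,
            {"email": key, "name": None, "order_count": 0, "revenue": 0.0},
        )
        record["order_count"] += 1
        record["revenue"] += float(row["total"])

    return sorted(combined.values(), key=lambda r: r["email"])


# Made-up sample rows showing the kind of mismatches wrangling cleans up.
crm_rows = [
    {"Email": " Ana@Example.com ", "Name": "Ana Diaz"},
    {"Email": "bo@example.com", "Name": "Bo Chen"},
]
order_rows = [
    {"customer_email": "ana@example.com", "total": "19.50"},
    {"customer_email": "ANA@example.com", "total": "5.50"},
    {"customer_email": "cy@example.com", "total": "12.00"},
]

for customer in combine_sources(crm_rows, order_rows):
    print(customer)
```

Even in this toy version, most of the code deals with inconsistent spacing, capitalization and number formats rather than analysis, which is exactly the time sink described above.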

“Trifacta was founded on the principle of enabling analysts to simply and easily pull in disparate data sources and to structure and cleanse them,” explains Joe Scheuermann, Vice President of Marketing for Trifacta. “We launched Trifacta for just this purpose and have even distributed a free version so that everyone can take advantage of our technology.”

“Pepsi uses our paid version of Trifacta Wrangler Enterprise,” says Will Davis, Director of Product Marketing, “and in doing so they have cut their data wrangling time by 90%. You can imagine what their team of data scientists and analysts can do now that they don’t have to waste their time wrangling the data they want to analyze.”

Learn From Facebook: Build a Culture of Data-Driven Decision Making
“Facebook has a culture that demands data-driven decisions at every turn,” explains Tye Rattenbury, Director of Data Science at Trifacta. “Most of those decisions were fueled by on-demand analyses queued up by product managers and answered by a huge variety of people: analysts, engineers and even the product managers themselves. They were fast and rough, but had enough validity to stand on. To support the breadth and speed of these analyses, Facebook built a wide array of home-grown tools (some built from scratch, others extended from open source projects). Today, forward-looking companies are finding off-the-shelf tools that give them the necessary breadth and speed to be data driven.”

Said another way, while the deep technical expertise of a data scientist is necessary on some projects, you don’t need one to build a culture of data-driven decision making. Companies like Trifacta mask the complexity so that business analysts and strategic thinkers are empowered to ask better questions and probe for insights in their (near) real-time data sets.

Once you have used a tool like Trifacta to get all of your data into one location and in a usable format, you now have a plethora of other free or cheap tools you can use to empower your team to make data-driven decisions. These include:

  1. DataRobot: An invaluable service that is chock-full of incredibly powerful algorithms. You no longer need to even decide which algorithm to use. Simply import your data and the machine learning takes over. Specifically, it suggests which algorithms might work best to analyze the data you have.

  2. Qlik: If you simply want to visualize your data, this is the best place to start. Data visualization tools such as Qlik help you create charts and graphs so that you can see what you might have otherwise missed buried deep in your data.

  3. RapidMiner and H2O: More “off the shelf” machine learning models and algorithms in a box that you can use to dig into your data without being a data scientist.

The Cost of NOT Making Data-Driven Decisions
“Statistically minded data scientists often argue that the data outputs in many of these quick analyses are wrong because the data is low quality or your assumptions about the data are faulty,” says Dr. Rattenbury. “The key is to wrangle your data to improve its quality, and, in the process, improve your working assumptions. That’s what data scientists do.”
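As a rough illustration of what checking data quality and assumptions can look like, the sketch below counts missing values and duplicate IDs in a small set of records. It uses plain Python; the function name `profile_quality` and the fields (“id”, “email”, “zip”) are hypothetical, and real tools run far more checks than these two.

```python
def profile_quality(rows, required_fields, key_field):
    """Report missing required values and repeated keys in a list of records.

    A hypothetical helper for illustration; the field names passed in
    are whatever your own data uses.
    """
    missing = {field: 0 for field in required_fields}
    seen_keys = set()
    duplicate_keys = 0

    for row in rows:
        # Treat None and blank strings as missing.
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1

        # Count every repeat of a key after its first appearance.
        key = row.get(key_field)
        if key in seen_keys:
            duplicate_keys += 1
        else:
            seen_keys.add(key)

    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicate_keys}


# Made-up sample records with one blank email, one missing zip and one repeated id.
sample_rows = [
    {"id": 1, "email": "ana@example.com", "zip": "94105"},
    {"id": 2, "email": "", "zip": None},
    {"id": 1, "email": "ana@example.com", "zip": "94105"},
]

print(profile_quality(sample_rows, ["email", "zip"], "id"))
```

If an analysis assumed one row per customer, a non-zero duplicate count is the signal to revisit that assumption before trusting the numbers.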

And here’s the thing. If you’re not using data to drive your decisions today, these insights are a quantum leap beyond the gut check that most entrepreneurs use to guide their businesses. Can you get better insights with the help of those highly sought-after data scientists? Absolutely. But if you wait until you can afford a data scientist, you will lose opportunities and market share to every competitor who decides to dive in and begin experimenting immediately.

Therefore, the cost of NOT making data-driven decisions is that you are flying blind and will eventually either hit a wall or simply crash and burn. By contrast, the ROI of making data-driven decisions is that you are well informed and empowered to know, rather than guess, what’s happening in your business. This, in turn, helps you actually stay in business. Today, there simply is no excuse for making uninformed decisions. The tools are freely available to you and your team, and you no longer have to be a data scientist to take advantage of them.