The hard things about being a data scientist in marketing

Hint: Not the code.

Most Data Scientist job descriptions are increasingly loaded with technical requirements spanning machine learning, programming, tools and statistical knowledge. Candidates are constantly playing catch-up with these requirements, loading their resumes with every known term in the data science vocabulary to improve their chances of being matched, ironically enough, by a machine-learning-driven program.

In my admittedly limited experience as a practitioner in this line of work, the hardest tasks have not been choosing a modeling technique, writing the code to fit a model, or visualizing the results. It is a different set of competencies that can make or break a project's chances of success.

So what are these hard things about being a data scientist in marketing I refer to?

Framing the business questions

For those who’ve been in this trade for a while, or for enlightened souls in general, this probably sounds obvious. And yet, it is very easy to kick off projects with business owners without asking questions like:

  • What’s the hypothesis?
  • What is the final use case?
  • What recommendation can we make based on this analysis?

Brand managers, planners and strategists usually come in with specific requests. For them, the context, the business objective and the critical path to reaching the answer may be clear. Or maybe not. In any case, extracting that context, reiterating and capturing the business questions, and using them as a guidepost throughout the execution of a project are by far the most critical steps for the success of a data science project. At the very least, doing so minimizes the chances of producing something irrelevant. At its best, it greatly improves the impact of the work. It makes the difference between an “Oh thanks, good to know” and a “This is fantastic stuff! We are learning new things here” type of response.

Assembling the data

Freshly minted data scientists sometimes believe that the perfect data to build models with is one SQL query away, or simply exists in the ionosphere and can be downloaded via an API, or is something the professor will share by email before the assignment is due. On the scale of data quality, marketing and consumer insights is one of the most challenging fields for an analyst to work in. I am pretty sure I am self-victimizing a little when I say this, but I have good reasons.

Reason 1: Where is the data?

Unlike controlled research projects and surveys, the owners of the data pipeline are usually not the data users in the marketing function. If you are lucky, you may be sitting very close to them (as I do) but still have different priorities. If you are unlucky, you could be working with syndicated data aggregated at a monthly level, which gives you 12 data points for a brand in one year! Good luck churning out awe-inspiring insights with that. If you are really unlucky, you’ll hear a floating myth that the perfect data for your project exists but is buried deep in a forgotten part of the client-side corporate universe (usually with the first-party data capturers) and will take 200 emails and a lot of lobbying to obtain. No, I’m not being cynical. When you add territoriality, corporate silos and the natural human tendency to be possessive about precious stuff, this problem becomes very real.

Reason 2: Of Hierarchies & Taxonomies

Marketing use cases entail bringing together data from various sources (first-party vs. third-party, real-time vs. historical, surveys vs. ad servers, to name a few) which tend to have no universal key. Nor is there a universal taxonomy in the marketing world that allows us to join across datasets: from sector to top-level categories to subcategories to brands, variants and sub-brands, every data vendor and internal team follows different conventions, none of which are designed for use cases that require combining these datasets. There is no single purveyor of hierarchies of categories, brands and their products, and as a consequence, there is no single definition of how two things can be compared. The context within which two brands can be compared on a performance metric can only be agreed upon with the users of your work. This ambiguity is avoidable by making sure that “What’s comparable?” is documented carefully. That means a lot of project organization work and time spent re-aggregating datasets to compute metrics; it can take days, weeks or sometimes even months. The only thing you can do is stay practical from a timeline standpoint. This brings me to the next hard thing about being a data scientist in general, but more so in marketing.
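One practical way to make "What's comparable?" explicit is to maintain a crosswalk table that maps each vendor's labels onto an agreed internal brand key, aggregate each dataset to that key, and only then join. The sketch below illustrates the pattern with entirely hypothetical vendor names, labels and metrics; nothing here is a real dataset or vendor convention.

```python
import pandas as pd

# Hypothetical extracts from two vendors with incompatible brand naming.
vendor_a = pd.DataFrame({
    "brand": ["Acme Cola Zero", "Acme Cola"],
    "sales": [120, 340],
})
vendor_b = pd.DataFrame({
    "brand_name": ["ACME COLA ZERO SUGAR", "ACME COLA CLASSIC"],
    "impressions": [5000, 9000],
})

# The crosswalk IS the documented answer to "what's comparable?":
# each vendor-specific label maps to exactly one agreed internal key.
crosswalk = pd.DataFrame({
    "vendor_label": ["Acme Cola Zero", "Acme Cola",
                     "ACME COLA ZERO SUGAR", "ACME COLA CLASSIC"],
    "brand_key": ["acme_cola_zero", "acme_cola_classic",
                  "acme_cola_zero", "acme_cola_classic"],
})

a = vendor_a.merge(crosswalk, left_on="brand", right_on="vendor_label")
b = vendor_b.merge(crosswalk, left_on="brand_name", right_on="vendor_label")

# Re-aggregate each side to the shared key before joining, so the
# metrics are computed over the same brand definition.
combined = (
    a.groupby("brand_key")["sales"].sum().to_frame()
     .join(b.groupby("brand_key")["impressions"].sum())
)
print(combined)
```

The payoff of keeping the crosswalk as a standalone, versioned table (rather than ad-hoc renaming in code) is that the "what's comparable" decision is visible to, and reviewable by, the business owners who have to sign off on it.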

Right Timing & Scope

I call this the “quicksand of analysis”. It goes back to the conflict between perfect vs. practical! Trying to produce business-relevant insights, running against the clock, using not-so-perfect datasets, in a charged office atmosphere where data scientists most often have the lowest-decibel voice at the table is in itself a significant challenge. In these circumstances, using judgement and communicating with other stakeholders to know how far you need to take your work to make it impactful is of utmost importance. Back to the adage about a reasonable approximation in time being worth a lot more than the perfect answer which comes too late. A few tips to help avoid the quicksand of analysis:

  • Always keep sight of the original questions and use cases throughout the chase, and reduce the size of the data based on what is required to get to the answers.
  • Define a drill-down structure for the project, i.e. a decision tree based on the findings at each stage of the analysis. Though this sounds like a complete no-brainer, it’s something we often skip. The consequence: an inordinate amount of time is spent exploring threads and branches which don’t help us answer the business questions any better. A simple thought experiment goes something like: (a) What do you expect to find at the top level? (b) What should you drill into based on the top-level finding? (c) What happens if the finding does not fit the hypothesis? (d) When do you change tracks or pull the plug on the project?
  • If not at the very start, then somewhere after initial data exploration, put together a skeleton presentation or storyboard for the analysis, in line with the scope discussed with business owners.
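The thought experiment above can be sketched as a tiny decision table agreed before the analysis starts. The stage names, findings and next steps below are hypothetical placeholders; the point is simply that each finding decides the next step in advance, so you never wander into branches the plan never called for.

```python
# A minimal sketch of a pre-agreed drill-down plan for an analysis.
# All stage/finding names are illustrative, not a real methodology.

def next_step(stage: str, finding: str) -> str:
    plan = {
        # (a) Top-level check: does performance match the hypothesis?
        ("top_level", "matches_hypothesis"): "drill into segments",
        # (c) If it contradicts the hypothesis, go back to stakeholders.
        ("top_level", "contradicts_hypothesis"): "revisit hypothesis with business owners",
        # (d) Know in advance when to stop rather than keep digging.
        ("segments", "no_meaningful_variation"): "pull the plug and report top-level result",
        # (b) Drill further only where a finding warrants it.
        ("segments", "clear_driver_found"): "quantify driver and build recommendation",
    }
    return plan.get((stage, finding), "escalate: finding not covered by the plan")

print(next_step("top_level", "matches_hypothesis"))  # drill into segments
```

Even this toy version makes the skipped step concrete: any finding that falls outside the agreed plan triggers a conversation, not another week of unscoped exploration.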

Of course, there are other hard things: choosing the right analytical framework, models and statistical techniques, parameter tuning, interpreting results, coding challenges along the way, using myriad APIs and complicated SQL, scaling to large datasets that don’t fit in memory, dashboarding the output, writing reusable code, working on the Linux command line, and so on. For all these and many more, there are reference books, blogs, public git repos, mentors, ML APIs and bootcamps. But for the things I mentioned earlier, experience, judgement and good managers are key. My advice to data scientists who want to make a career in the marketing industry: keep updating your coding and technical skills, but more importantly, start speaking the marketing language, learn to ask the right business questions, and practice working through the organization structure to identify and obtain the right data. These often-overlooked skills will serve us better in the present and, more importantly, in the long run. Agree?