12 Jul

The 5 Tips and Tricks for Communicating in Data Science


If you search for the top skills that data scientists need, you will find communication in the top five or ten in every post, right up there with technical skills like using R or SQL. In my experience working at Salford Systems, a very large part of my time is spent understanding which pieces of information my team or our customers can really connect with, and finding ways to show how data can be used with different tools to gain insights. I enjoy writing blog posts and contributing to presentations and customer interactions: in short, turning my work into usable content for others. I work with statisticians and marketing professionals, and together we look at problems from finance to academia and everything in between. Although other data scientists describe these activities as important aspects of their jobs, many forums for industry knowledge offer little advice on how to use communication to work with diverse teams and to develop insights. Offering insights and bridging gaps between business and technology is what we are ultimately paid to do. These abilities add value by turning data and technology into actionable knowledge that people can apply to their daily operations, and they would not be possible without good communication. Here are a few tips to take your communication skills to the next level and hone them for the dynamic day-to-day of a data scientist.

  1. Jump Right In!

Sometimes you just need a quick and dirty solution to get an output to plot and make rough figures. Other times you really can’t tell if you are on the right track until you get something to run and you start looking at how your data and models are behaving. Beyond clearing the initial path to fine-tuning a model, getting something that works gives you a starting point for discussions, which can lead to useful findings more quickly. Recently I read a blog post about applying data science to quality assurance from an economic point of view. It describes a simple classification, or filtering, problem applied to the inspection of parts as they are manufactured. The final model sets thresholds for obviously good parts and obviously bad parts, and then leaves space between the “good” and “bad” cut-offs to select parts for further inspection by real people. This proposed solution is relatively simple, but even in this hypothetical case it would be hard to arrive at it without using the initial model to discuss real factors like the costs of misclassifying parts or spending man-hours on human oversight.
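To make the thresholding idea concrete, here is a minimal Python sketch of a two-cut-off triage rule. Everything in it is an assumption for illustration: the function names, the cut-off values, the made-up scores, and the idea that the model outputs a probability that a part is good. It is not the model from the blog post mentioned above.

```python
# Hypothetical cut-offs. Scores are assumed to be the model's estimated
# probability that a part is good; none of these numbers come from the post.
REJECT_BELOW = 0.2
ACCEPT_ABOVE = 0.8


def triage(score, reject_below=REJECT_BELOW, accept_above=ACCEPT_ABOVE):
    """Route one part to 'accept', 'reject', or 'inspect' based on its score."""
    if score >= accept_above:
        return "accept"
    if score <= reject_below:
        return "reject"
    return "inspect"  # the band between the cut-offs goes to a person


def route_counts(scores, reject_below=REJECT_BELOW, accept_above=ACCEPT_ABOVE):
    """Count how many parts land in each bin for a given pair of cut-offs."""
    counts = {"accept": 0, "reject": 0, "inspect": 0}
    for score in scores:
        counts[triage(score, reject_below, accept_above)] += 1
    return counts


def inspection_cost(counts, cost_per_inspection):
    """Cost of the human review implied by the current cut-offs."""
    return counts["inspect"] * cost_per_inspection


if __name__ == "__main__":
    scores = [0.05, 0.15, 0.40, 0.55, 0.85, 0.95, 0.99]  # made-up scores
    counts = route_counts(scores)
    print(counts)
    print(inspection_cost(counts, cost_per_inspection=1.5))
```

Even a rough version like this gives a team something to argue about: widening or narrowing the band between the cut-offs changes how many parts go to human inspection, which is exactly the cost conversation a first working model makes possible.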

Starting with the simplest case and a clear idea of how your work will be used will help you quickly understand your data and make progress by actually working with it. This will ensure that you reach useful and presentable conclusions faster, and will increase the returns of subsequent work with the data. Initial results will allow you to demonstrate the value of improvements you incorporate by making direct comparisons with a baseline solution. Having a response or performance metric that you can explain to others also opens up opportunities for you to work with technical and non-technical professionals as opposed to trying a battery of modeling approaches and digging into the idiosyncrasies of the project on your own. Overall, getting some sort of results, or even errors, will inform your next step, and give you a better frame of reference on the problem for yourself and others. 
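As a rough illustration of comparing against a baseline, the sketch below scores a trivial "always predict the training mean" baseline and a candidate model with the same metric. The data, the choice of mean absolute error, and the function names are all hypothetical and chosen only to show the shape of the comparison.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def baseline_predictions(train_targets, n):
    """Simplest possible baseline: predict the training mean for every case."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * n


def improvement_over_baseline(actual, model_preds, train_targets):
    """How much the model reduces error relative to the mean baseline."""
    base_error = mean_absolute_error(
        actual, baseline_predictions(train_targets, len(actual))
    )
    model_error = mean_absolute_error(actual, model_preds)
    return base_error - model_error


if __name__ == "__main__":
    train_targets = [9, 11, 13, 15]   # made-up training responses
    actual = [10, 12, 14, 16]         # made-up held-out responses
    model_preds = [11, 12, 13, 17]    # made-up model predictions
    print(improvement_over_baseline(actual, model_preds, train_targets))
```

A single number like this, the reduction in error relative to the simplest thing that could work, is easy to explain to both technical and non-technical colleagues.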

  2. Actively Manage Context

Delineating relationships and incorporating them into your approaches is key to implementing viable solutions to real problems. Without facilitating communication among groups, your project can quickly unravel. Even challenges that seem to be purely technical always exist within the context of real business needs, real people, and real time constraints.

I have worked on scientific projects with collaborators who each had their own motives, skills, and workloads. Research projects always appear highly technical on the surface, but I often had to manage context to work with other research groups and to use resources from various laboratories. In one situation, we were having trouble with the quality of DNA samples. Whenever someone had to run quality control experiments, they were directed to use an instrument in another laboratory. As a result, quality control experiments were not done consistently, because the laboratory manager in charge of the equipment was not comfortable with everyone using it.

To fix the problem, I had to perceive and communicate these relational issues to take control of the context. I framed the problem from the point of view that we want reliable sources of data and do not want to cause undue stress for our interns who may be directed to use the equipment. Doing quality control really meant negotiating terms of use with a laboratory manager. He trusted a few people in our laboratory, but did not appreciate it when new people came over to use equipment unannounced. Sending new interns without notifying him communicated a lack of respect. Additionally, he had no way to know whether new people were trained in the proper use and care of his instruments. By understanding the laboratory manager’s point of view, and our need for consistent access to instruments, I was able to work on a solution to what seemed like a technical problem with sample quality. What we really needed was to clarify our relationship with the other laboratory, or to source another instrument, so that our researchers would not feel hesitant about using equipment for quality control.

As a data scientist, I luckily haven’t had to tread lightly around relationships to use million-dollar equipment. However, I do work with people in marketing, statisticians, and professionals in research and business. For consulting projects and technical support, we often have to manage context to get the information we need to help our customers. It is important to ascertain the relationships clients need to navigate to get approval to share data or to schedule conference calls with the real influencers on their project. For internal tasks like scraping data, we need to keep each other informed about the intended end users for new datasets and models. We also present our requests within the context of the time we spend getting results for each other in other areas, or of balancing the needs of other departments and customers. Keeping track of relationships gives you the full picture of stumbling blocks to success and allows you to reframe questions and communicate from a place of understanding. Empathizing with people puts you on the same team with them, so they can help you understand the best approach and you can work with them to achieve their best results.

  3. Keep the Details Lean

A key to generating actionable knowledge is making sure the intended parties can understand and process it. I first learned this lesson in academic research where you often present a review of the problem, the motivation, and the key technical concepts before going on to discuss your methods. Even for a highly technical audience, concise descriptions are necessary to keep presentations and publications within length constraints and to keep the audience on track. In my graduate program our seminar adviser, a seasoned cancer biologist, would often chastise us for including “alphabet soup,” a jumble of pathways and protein or gene names, in our slides. Covering the things that interact with a protein or gene is important, but proteins and genes have a funny way of interacting with an insane number of other factors as shown below:

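[Figure: pathway diagram showing a very large number of interacting proteins and genes (image not reproduced)]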

The entire scope of a project can usually fit into a more “simplified” figure like this:

[Figure: simplified schematic of the pathway (image not reproduced)]

Even for a well-studied and intricate pathway like the one above, the simpler schematic could still be too complex if you only did experiments with one or two of the proteins in the pathway represented. Even trained molecular biologists would miss the proteins you want to focus on in the jumble while wondering about another protein in the corner that they used to research. Giving a review of a technical concept should quickly bring those unfamiliar with the material up to speed while cueing audience members with more expertise to focus on particular aspects central to your work.

My projects now involve several statistical methods, sophisticated software, some of my own scripting, and domain-specific data. Data science involves technical details from multiple disciplines, and reports are usually prepared for a more results-oriented audience. For an upcoming presentation, I am discussing how to use several machine learning engines for regression with poll data. In my first drafts I strove to give complete, technical descriptions of each learning machine as well as ordinary least squares regression. I detailed the derivation of ordinary least squares, leading the audience through the equations behind the method as an important part of the talk. However, upon reviewing what I really wanted to do with my presentation, I realized that this approach gave the wrong signals and distracted from the bigger themes. People with statistical backgrounds would have scrutinized how the methods were presented and paid less attention to the bigger lessons about applying and comparing different methods in a new area and gaining insights on voting trends. Less technical audience members would either tune out at the first sighting of equations or fret over copying and understanding the mathematical underpinnings of each approach.

Diving into details that are outside the scope of the immediate issues can confuse your audience about what the main topic is. An audience member with more technical expertise may have questions about finer details that have important effects on your models and analysis, but you really don’t do them any favors by inviting those discussions before providing all the information about how the details play into the final results. Bringing up details that are not directly involved in interpreting your findings draws attention away from the insights that both technical and non-technical audience members came to see. Springing fine details on people without first showing how they fit into the bigger picture also denies the audience the chance to develop and frame their questions within the context of how they will actually use your work. Because data science is complex to begin with, data scientists and analysts often have to adjust the presentation of technical approaches to lead with conclusions and emphasize the big picture more heavily. In the business world you are usually being paid to get results quickly and not to teach. You have to lead with your findings and any big conclusions or problems, and then follow up with details and questions once you have laid out the situation for a room or auditorium full of people who work on different projects and come from different backgrounds.

  4. Always Be Summarizing

Finding places to insert summaries helps keep your audience on track. When I delivered my first attempt at a webinar here at Salford, our senior scientist really wanted to see more milestones and summaries as we progressed through different methods and concepts. I had designed my slides according to my experience in research, where I’m used to a technical audience paying close attention and waiting to draw conclusions or to tear the presentation apart. Our senior scientist pointed out that you can’t count on attention spans when presenting at large meetings, conference talks, or webinars. Not only will these people have a lot on their minds, but they may take a break to get coffee, nod off due to jet lag and long hours, or open other windows over your webinar to catch up on email. In some work environments you may get cut off with questions or asked to speed things up and get to the point. Constantly summarizing the motivation, the previous steps, and the big take-aways for every part of your work leaves an entry point open for members of your audience who missed a particular section because they did not understand it or were physically or mentally absent.

The more you practice making summaries, the better you will understand your project. In the end it will strengthen your grasp of your own work while building your flexibility. I like to see making summaries as practice for myself. Summaries are a chance to ask myself, “What am I really doing? How is it important and what’s the best way to communicate that?” Anytime someone asks what I do for work, or anytime I have to write something up or make slides, I view it as an opportunity to look at my projects from a different perspective. With constant practice like this, you will be able to give a quick summary, two minutes or less, of what you are trying to do at work, whether you’re at a bar with friends, in the hotel lobby at a conference, or giving a presentation to any audience. Whether a CEO cuts you off 5 minutes into your presentation to “get to the point,” or you need to reel back in the attention of a sleepy conference-goer, you will be ready with a strong understanding of why your project is important as well as a practiced and engaging delivery, tested in every professional and candid situation possible.

  5. Bring it Back Home

Whether you are talking about a particular model or analysis, or describing how a more general approach works, always bring your points back to your main topic and application. If you open any textbook about machine learning or statistical methods you will see descriptions accompanied by examples. Even professors with highly motivated and educated students provide examples and datasets to work with. Keeping methods and analyses rooted in concrete examples and real situations is not hand-holding or patronizing—it is an effective way for both you and your audience to develop a concept through framing discussions and helping to visualize properties of systems.

Recently I generated synthetic datasets to give examples of dealing with non-linear and local trends. My colleagues suggested tying all examples back to the topic of my talk, and I realized that some of the most interesting findings from my datasets were changes in local trends that my different learning engines capture in different ways. Instead of showing that a particular form in the data can affect performance of learning engines differently, I now had an example that both provides incentive for following the talk and keeps the material relatable. By presenting real scenarios related to my topic and dataset, I could point back to the motivation for the presentation and connect a transitional part of the webinar directly with intriguing findings and overall conclusions.

Examples help the audience stay engaged. Keeping your discussions anchored with concrete examples that relate to your topic allows you to point back to the reasoning and motivation for your project continuously. Inevitably you will have to describe the interpretation of a plot, a schematic of a particular process, or the meaning of a derived metric. Describing the effects of a treatment you have analyzed with a Cox survival model or a Kaplan-Meier curve is much easier if you continually tie your description of how to derive the plots back to real patient numbers, real time scales, and the practical aspects you considered in your trial and treatments. Using examples not only helps the audience understand abstract concepts, but also keeps them involved with your topic throughout descriptions of many steps, and it really pays off when putting everything together and discussing results and conclusions.
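For instance, a Kaplan-Meier curve is far easier to explain when each step of the calculation is tied to a handful of patients. The sketch below is a minimal, dependency-free version of the standard product-limit estimate. The patient data are invented for illustration, and the function name and return format are assumptions rather than any particular library's API.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time for each patient.
    events: True if the event was observed at that time, False if the
            patient was censored (left the study without the event).
    Returns a list of (time, survival_probability) pairs, one for each
    distinct time at which at least one event was observed.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Patients who had the event exactly at time t.
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Five made-up patients: follow-up in months, and whether the event occurred.
    durations = [2, 3, 3, 5, 8]
    events = [True, True, False, True, False]
    for time, prob in kaplan_meier(durations, events):
        print(f"month {time}: {prob:.2f}")
```

Walking an audience through a case like this ("five patients start, one has the event at month 2, so four of five remain") keeps the derivation attached to real patient counts and time scales instead of abstract notation.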

Christian Kendall

Christian Kendall is a Data Scientist at Salford Systems. He brings more than 4 years of research expertise, with a background in physical and life science emphasizing informatics and software development. Christian graduated with a Bachelor’s degree in Chemistry from Occidental College in Eagle Rock, CA, starting with a focus on biochemistry and bioinformatics that later turned into a passion for statistical data analytics and data science. As a researcher, Christian first saw and understood the need for practical modeling applications while working on automatic target recognition at NASA and then developing code for identifying proteins in high-throughput experiments later that same year in the Yates Laboratory at The Scripps Research Institute. At NASA, Christian fixed and optimized instruments while developing analytical methods for detecting bio-interest molecules. Christian also helped to design nanometer-scale structures for the study of photovoltaics while at the California Institute of Technology, using 3D modeling and finite-difference time-domain solutions to simulate light absorption. His research continued at both the Mason Laboratory at Weill Cornell Medical College in New York and The Scripps Research Institute in California, both with a focus on analysis and preparation of DNA sequencing libraries for genomics and metagenomics. Christian’s continued interest in data, automation, and software development led him to Salford Systems as a Data Scientist, where he implements machine learning and data mining techniques with our proprietary software to create practical applications for real-world problems. When he’s not crunching numbers, Christian enjoys cooking and baking, brewing kombucha, and trying to keep a lot of cacti and flowers alive.
