Basic Data Analysis for Non-Researchers: What You Need to Know and Pitfalls to Avoid

If it’s been a while since your last statistics course, you might feel a bit intimidated by some of the terminology used in marketing research. After all, great examples of data misinterpretation and misuse are all around us these days! This blog should help you understand some of the most common terms and explain common pitfalls for each.

  1. Mean. The arithmetic mean, or as most of us say, “the average,” is the total of the individual responses divided by the number of responses. Means describe the overall trend of a data set and give a quick snapshot of your results. Anyone can quickly and easily calculate a mean without sophisticated software or analytics – another advantage of the average.

Common Uses: The mean is often used to compare data over time. You might hear about the average price of milk or a gallon of gas, the average number of people per household, or the average score on an academic achievement test. Using the average makes it easy to evaluate and describe the trend.

Pitfall: However, the simple mean can be a dangerous tool. It is sometimes confused with the mode (the most common value in the set) or the median (the value where exactly 50% of measurements fall above, and 50% below). And if your dataset has a lot of outliers or a very skewed distribution, the mean alone does not provide sufficient information for analysis.
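The effect of an outlier on the mean is easy to see with Python’s standard library. The spend figures below are hypothetical:

```python
from statistics import mean, median

# Hypothetical monthly customer spend, with one very large outlier
spend = [20, 22, 25, 21, 23, 500]

print(round(mean(spend), 2))  # 101.83 – pulled far upward by the outlier
print(median(spend))          # 22.5 – much closer to the "typical" customer
```

Reporting only the mean here would badly misrepresent what a typical customer spends; the median tells a more honest story.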

  2. Standard Deviation. The standard deviation, often represented by the Greek letter sigma (σ), tells you how the data are distributed around the mean. Low standard deviations indicate that the data are closely aligned with the mean, while high standard deviations indicate the data are spread more widely around the mean. The standard deviation gives you a quick view of the dispersion of your data.

Common Uses: Standard deviation is often utilized in the calculation of statistical tests (see below) but is also an indicator of data dispersion on its own. For example, a researcher might use the standard deviation to understand how well a sample represents the population it was drawn from.

Pitfall: The standard deviation can also be misleading if the data don’t follow a normal distribution or contain many outliers. Again, if that is the case, the standard deviation alone will not give you the insight you need.
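A quick sketch of how two data sets can share the same mean yet differ in spread, using hypothetical numbers:

```python
from statistics import mean, stdev

# Two hypothetical samples with the same mean but very different spread
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

print(mean(tight), round(stdev(tight), 2))  # 50 1.58
print(mean(wide), round(stdev(wide), 2))    # 50 15.81
```

Looking at the mean alone, the two samples are identical; the standard deviation is what reveals the difference.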

  3. Regression Analysis. Regression is used to describe the relationship between a dependent variable and one or more explanatory (independent) variables. In addition to the direction of the relationship, regression can also tell us whether the relationship is strong or weak.

Common Uses: Regression helps us understand how different variables are related to other variables. So, if you wanted to know what factors drive satisfaction, you would use regression to understand which factors had the strongest relationship with your customer satisfaction metric. You could then focus your efforts on those factors to improve the overall satisfaction of your clients.

Pitfall: Regression is commonly used but can also be misleading. For example, an outlier may represent the input from a significant customer or product. Because regression analysis tends to smooth out these outliers, you might be tempted to ignore them – when in fact they deserve a closer look.
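A minimal ordinary-least-squares sketch, using hypothetical data relating support response time to satisfaction (the variables and numbers are illustrative, not from any real study):

```python
from statistics import mean

# Hypothetical data: support response time (hours) vs. satisfaction (1-10)
x = [1, 2, 3, 4, 5]
y = [9, 8, 6, 5, 3]

# Ordinary least-squares slope and intercept
xbar, ybar = mean(x), mean(y)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
intercept = ybar - slope * xbar

print(round(slope, 2), round(intercept, 2))  # -1.5 10.7
```

The negative slope says that, in this made-up data, each extra hour of response time is associated with about a 1.5-point drop in satisfaction – the kind of “which factor matters” insight regression provides.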

  4. Calculating Sample Size. When you are studying your customer base, a population, or any large data set, you can save time and money by collecting data from a representative sample rather than from every single individual. But what’s the right sample size? Once you understand how accurate your results need to be (your desired confidence level and margin of error), you can apply an equation that gives you the correct sample size for your research project.

Common Uses: Any time you need to select a subset of a larger population, whether people or things, you should understand how the number you choose will affect your ability to analyze the results and project them back to the population. So, of course, if you are drawing a sample for a survey, calculate the sample size. But if you are conducting quality control tests in a manufacturing setting, you might also want to know how many units you need to test.

Pitfall: When studying a new, untested variable in a population, you might need to make assumptions about the proportion of the population that has that characteristic. If your assumptions are wrong, the error is passed along to your sample size, potentially distorting your data analysis.
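One common sample-size formula for estimating a proportion can be sketched as follows (this assumes simple random sampling from a large population):

```python
import math

def sample_size(z=1.96, p=0.5, e=0.05):
    """Sample size needed to estimate a proportion.
    z: z-score for the confidence level (1.96 for 95% confidence)
    p: assumed proportion (0.5 is the most conservative assumption)
    e: desired margin of error
    """
    return math.ceil((z ** 2) * p * (1 - p) / (e ** 2))

print(sample_size())  # 385 respondents at 95% confidence, +/-5% margin
```

Note how the pitfall above plays out in this formula: if you assume p = 0.3 the formula calls for only 323 responses, so if the true proportion turns out to be 0.5, you have collected too few for your stated margin of error.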

  5. Hypothesis Testing. In data analysis and statistics, we test hypotheses to see whether the results are statistically significant – that is, unlikely to be the result of random chance alone. These tests allow us to sort out the meaningful differences between subgroups and population segments.

Common Uses: When we conduct quantitative marketing research, we begin with project objectives. Those goals lead to assumptions and hypotheses, and we collect the data to test those assumptions. Once the research is conducted, we support or reject each of those hypotheses, which tells us what our next steps should be in solving the issue or problem we face.

Pitfall: When conducting hypothesis tests, researchers need to watch out for common errors introduced by participants’ (and their own) perceptions and expectations. For example, the placebo effect occurs when participants expect one result and then falsely perceive or actually experience that expected result. Another example is the Hawthorne effect (also called the observer effect), which happens when participants give skewed results because they know they are the subjects of the research.
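As a rough sketch of where a test statistic comes from, here is Welch’s two-sample t-statistic computed by hand on hypothetical satisfaction scores:

```python
from statistics import mean, stdev

# Hypothetical satisfaction scores from two customer segments
group_a = [7, 8, 6, 9, 7, 8]
group_b = [5, 6, 5, 7, 6, 5]

# Welch's t-statistic: the difference in means divided by the
# combined standard error of the two samples
se = (stdev(group_a) ** 2 / len(group_a)
      + stdev(group_b) ** 2 / len(group_b)) ** 0.5
t = (mean(group_a) - mean(group_b)) / se

print(round(t, 2))  # a t-statistic well above ~2 suggests a real difference
```

In practice you would compare this statistic against a t-distribution (or simply use a library routine such as SciPy’s `ttest_ind`) to obtain a p-value; the sketch only shows where the number comes from.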

The methods described here are very commonly applied in basic marketing research. However, as you well know, data analysis can also use very sophisticated and complex advanced techniques. If these terms sound unfamiliar to you, it’s probably best to check with an expert before making leaps of judgment based on your results. After all, better safe than sorry!
