When you plan to conduct an experiment, some factors are under the direct control of the researcher:

- the significance level (α), i.e., the accepted probability of making a type I error, and
- the sample size (n).

Unlike α and n, which are specified by the researcher, the magnitude of β depends on the actual value of the population parameter. In addition, β is influenced by the effect size (e.g., Cohen’s d), which provides a standardized measure of the magnitude of an observed effect. The following parameters are affected more indirectly:

- the probability of a type II error (β) and, hence, the power of the test (1-β).
Although β is unknown, it is related to α. For example, if we want to be absolutely sure that we do not falsely identify an effect which does not exist (i.e., that we do not make a type I error), the probability of identifying an effect that does exist (i.e., the power, 1-β) decreases, and vice versa. Thus, an extremely low value of α (e.g., α = 0.0001) will result in an intolerably high β. A common approach is to set α = 0.05 and 1-β = 0.80.
Unlike the t-value of our test, the effect size (d) is unaffected by the sample size and can be categorized as follows (see Cohen, J. 1988):

- d ≈ 0.2: small effect
- d ≈ 0.5: medium effect
- d ≈ 0.8: large effect
To detect more subtle effects (i.e., smaller effect sizes), you need a larger sample size than for more obvious effects. In this paper, you can find a list of examples of different effect sizes and the number of observations you need to reliably find an effect of that magnitude. Although the exact effect size is unknown before the experiment, you might be able to make an educated guess about it (e.g., based on previous studies).
If you wish to obtain a standardized measure of the effect, you may compute the effect size (Cohen’s d) using the cohensD() function from the lsr package. Using the examples from the independent-means t-test above, we would use:
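A minimal sketch, assuming the independent-means t-test data from above is stored in a data frame music_data with a grouping variable group and the outcome listening_time (these names are assumptions, since the original data is not shown here):

```r
library(lsr)
# standardized mean difference between the two independent groups
cohensD(listening_time ~ group, data = music_data)
```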
According to the thresholds defined above, this effect would be judged to be a small-medium effect.
For the dependent-means t-test, we would use:
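A sketch under the same naming assumptions, with the two paired measurements stored as separate (hypothetical) columns time_before and time_after:

```r
# paired version: standardizes the mean of the difference scores
cohensD(music_data$time_after, music_data$time_before, method = "paired")
```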
According to the thresholds defined above, this effect would also be judged to be a small-medium effect.
When constructing an experimental design, your goal should be to maximize the power of the test while maintaining an acceptable significance level and keeping the sample as small as possible. To achieve this goal, you may use the pwr package, which lets you compute n, d, alpha, and power. You only need to specify three of the four input variables to obtain the fourth.
For example, what sample size do we need (per group) to identify an effect with d = 0.6, α = 0.05, and power = 0.8:
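For example, using the pwr package's two-sample t-test function:

```r
library(pwr)
# required n per group for d = 0.6, alpha = 0.05, power = 0.8
pwr.t.test(d = 0.6, sig.level = 0.05, power = 0.8, type = "two.sample")
```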
Or we could ask, what is the power of our test with 51 observations in each group, d = 0.6, and α = 0.05:
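Leaving out the power argument returns it as the result:

```r
# power achieved with n = 51 observations per group
pwr.t.test(n = 51, d = 0.6, sig.level = 0.05, type = "two.sample")
```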
From my experience, students tend to place a lot of weight on p-values when interpreting their research findings. It is therefore important to note some points that hopefully help to put the meaning of a “significant” vs. “insignificant” test result into perspective.
- Significant result: even a tiny, practically irrelevant effect can become statistically significant if the sample is large enough, so a significant p-value says nothing about the size or practical importance of an effect.
- Insignificant result: a non-significant result does not prove that the null hypothesis is true; the effect may simply be too small to be detected with the given sample.
Thus, you should not base your research conclusion on p-values alone!
It is also crucial to determine the sample size before you run the experiment or before you start your analysis. Why? Consider the following example:
Imagine you ran the test repeatedly as new observations came in and stopped collecting data as soon as the test turned significant. This is called p-hacking and should be avoided at all costs. Assuming that both groups come from the same population (i.e., there is no difference in the means): what is the likelihood that the result will be significant at some point? In other words, what is the likelihood that you will draw the wrong conclusion from your data that there is an effect, while there is none? This is shown in the following graph using simulated data; the color red indicates significant test results that arise although there is no effect (i.e., false positives).
Figure 5.1: p-hacking (red indicates false positives)
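A minimal sketch of such a simulation (assumed setup: both groups are drawn from the same standard normal population, and the t-test is re-run after each additional pair of observations):

```r
set.seed(123)
n_max <- 200
x <- rnorm(n_max)  # group 1: no true effect
y <- rnorm(n_max)  # group 2: same population
p_values <- sapply(10:n_max, function(n) t.test(x[1:n], y[1:n])$p.value)
# share of interim looks at which we would have (falsely) declared significance
mean(p_values < 0.05)
```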
This chapter is primarily based on Field, A., Miles, J., & Field, Z. (2012): Discovering Statistics Using R. Sage Publications, chapters 10 & 12.
In the previous section we learned how to compare means using a t-test. The t-test has some limitations: it only lets you compare 2 means and you can only use it with one independent variable. However, often we would like to compare means from 3 or more groups, and there may be instances in which you manipulate more than one independent variable. For these applications, ANOVA (ANalysis Of VAriance) can be used. Hence, to conduct ANOVA you need:

- a metric dependent variable (i.e., measured on an interval or ratio scale), and
- at least one categorical independent variable (factor) with two or more levels.
A treatment is a particular combination of factor levels, or categories. One-way ANOVA is used when there is only one categorical variable (factor); in this case, a treatment is the same as a factor level. N-way ANOVA is used with two or more factors. Note that we are only going to talk about a single independent variable in the context of ANOVA. If you have multiple independent variables, please refer to the chapter on Regression.
Let’s use an example to see how ANOVA works. Similar to the previous example, imagine that the music streaming service experiments with a recommendation system for user-created playlists. We now have three groups: the control group “A” with the current system, treatment group “B”, who have access to playlists created by other users but are not shown recommendations, and treatment group “C”, who are shown recommendations for user-created playlists. As always, we load and inspect the data first:
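A sketch of this step; the file name music_experiment.dat and the variable names group and listening_time are assumptions, since the actual data source is not shown here:

```r
music_data <- read.table("music_experiment.dat", sep = "\t", header = TRUE)
music_data$group <- as.factor(music_data$group)  # factor with levels "A", "B", "C"
head(music_data)
str(music_data)
```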
The null hypothesis, typically, is that all means are equal (non-directional hypothesis). Hence, in our case:
\[H_0: \mu_1 = \mu_2 = \mu_3\]
The alternative hypothesis is simply that the means are not all equal, i.e.,
\[H_1: \textrm{Means are not all equal}\]
If you wanted to put this in mathematical notation, you could also write:
\[H_1: \exists {i,j}: {\mu_i \ne \mu_j} \]
To get a first impression if there are any differences in listening times across the experimental groups, we use the describeBy(...) function from the psych package:
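Using the assumed music_data object from above:

```r
library(psych)
describeBy(music_data$listening_time, music_data$group)
```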
In addition, you should visualize the data using appropriate plots:
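For example, a plot of the group means (a sketch using ggplot2):

```r
library(ggplot2)
ggplot(music_data, aes(x = group, y = listening_time)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  labs(x = "Group", y = "Mean listening time") +
  theme_bw()
```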
Figure 5.2: Plot of means
Note that ANOVA is an omnibus test, which means that we test for an overall difference between groups. Hence, the test will only tell you if the group means are different, but it won’t tell you exactly which groups are different from one another.
So why don’t we then just conduct a series of t-tests for all combinations of groups (i.e., A vs. B, A vs. C, B vs. C)? The reason is that if we assume each test to be independent, then there is a 5% probability of falsely rejecting the null hypothesis (Type I error) for each test. In our case:
This means that the overall probability of making a Type I error is \(1-0.95^3 = 0.143\), since the probability of not making a Type I error is 0.95 for each of the three tests. Consequently, the Type I error probability would be 14.3%, which is above the conventional standard of 5%. This is also known as the family-wise or experiment-wise error.
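You can verify this in R:

```r
1 - 0.95^3  # family-wise error rate for three independent tests at alpha = 0.05
```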
The basic concept underlying ANOVA is the decomposition of the variance in the data. There are three variance components which we need to consider:

- the total variation in the data (\(SS_T\)),
- the variation explained by our model, i.e., between the experimental groups (\(SS_M\)), and
- the residual variation within the groups that our model cannot explain (\(SS_R\)).
The following figure shows the different variance components using a generalized data matrix:
Decomposing variance
The total variation is determined by the variation between the categories (due to our experimental manipulation) and the within-category variation that is due to extraneous factors (e.g., promotion of artists on a social network):
\[SS_T= SS_M+SS_R\]
To get a better feeling how this relates to our data set, we can look at the data in a slightly different way. Specifically, we can use the dcast(...) function from the reshape2 package to convert the data to wide format:
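A sketch of the reshaping step; the within-group index obs is a hypothetical helper variable added so that each row of the wide table has an identifier:

```r
library(reshape2)
music_data$obs <- ave(music_data$listening_time, music_data$group, FUN = seq_along)
music_wide <- dcast(music_data, obs ~ group, value.var = "listening_time")
head(music_wide)
```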
In this example, \(X_1\) from the generalized data matrix above would refer to the factor level “A”, \(X_2\) to the level “B”, and \(X_3\) to the level “C”. \(Y_{11}\) refers to the first data point in the first row (i.e., “13”), \(Y_{12}\) to the second data point in the first row (i.e., “21”), etc. The grand mean (\(\overline{Y}\)) and the category means (\(\overline{Y}_c\)) can be easily computed:
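Under the naming assumptions from above:

```r
mean(music_data$listening_time)                            # grand mean
tapply(music_data$listening_time, music_data$group, mean)  # category means
```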
To see how each variance component can be derived, let’s look at the data again. The following graph shows the individual observations by experimental group:
Figure 5.3: Sum of Squares
To compute the total variation in the data, we consider the difference between each observation and the grand mean. The grand mean is the mean over all observations in the data set. The vertical lines in the following plot measure how far each observation is away from the grand mean:
Figure 5.4: Total Sum of Squares
The formal representation of the total sum of squares (\(SS_T\)) is:
\[ SS_T= \sum_{i=1}^{N} (Y_i-\bar{Y})^2 \]
This means that we need to subtract the grand mean from each individual data point, square the difference, and sum up over all the squared differences. Thus, in our example, the total sum of squares can be calculated as:
\[ \begin{align} SS_T =&(13−24.67)^2 + (14−24.67)^2 + … + (2−24.67)^2\\ &+(21−24.67)^2 + (18-24.67)^2 + … + (17−24.67)^2\\ &+(30−24.67)^2 + (37−24.67)^2 + … + (28−24.67)^2\\ &=30855.64 \end{align} \]
You could also compute this in R using:
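A sketch under the naming assumptions from above:

```r
SST <- sum((music_data$listening_time - mean(music_data$listening_time))^2)
SST
```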
For the subsequent analyses, it is important to understand the concept behind the degrees of freedom. Remember that in order to estimate a population value from a sample, we need to hold something in the population constant. In ANOVA, the df are generally one less than the number of values used to calculate the SS. For example, when we estimate the population mean from a sample, we assume that the sample mean is equal to the population mean. Then, all but one of the scores are free to vary and the remaining score needs to be the value that keeps the population mean constant. In our example, we used all 300 observations to calculate the sum of squares, so the total degrees of freedom (\(df_T\)) are:
\[\begin{equation} \begin{split} df_T = N-1=300-1=299 \end{split} \tag{5.1} \end{equation}\]
Now we know that there are 30855.64 units of total variation in our data. Next, we compute how much of the total variation can be explained by the differences between groups (i.e., our experimental manipulation). To compute the explained variation in the data, we consider the difference between the values predicted by our model for each observation (i.e., the group mean) and the grand mean. The group mean refers to the mean value within the experimental group. The vertical lines in the following plot measure how far the predicted value for each observation (i.e., the group mean) is away from the grand mean:
Figure 5.5: Model Sum of Squares
The formal representation of the model sum of squares (\(SS_M\)) is:
\[ SS_M= \sum_{j=1}^{c} n_j(\bar{Y}_j-\bar{Y})^2 \]
where c denotes the number of categories (experimental groups). This means that we need to subtract the grand mean from each group mean, square the difference, and sum up over all the squared differences. Thus, in our example, the model sum of squares can be calculated as:
\[ \begin{align} SS_M &= 100*(14.34−24.67)^2 + 100*(24.70−24.67)^2 + 100*(34.99−24.67)^2 \\ &= 21321.21 \end{align} \]
You could also compute this manually in R using:
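Building on the music_data object assumed above:

```r
n_j <- table(music_data$group)  # 100 observations per group
group_means <- tapply(music_data$listening_time, music_data$group, mean)
SSM <- sum(n_j * (group_means - mean(music_data$listening_time))^2)
SSM
```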
In this case, we used the three group means to calculate the sum of squares, so the model degrees of freedom (\(df_M\)) are:
\[ df_M= c-1=3-1=2 \]
Lastly, we calculate the amount of variation that cannot be explained by our model. In ANOVA, this is the sum of squared distances between what the model predicts for each data point (i.e., the group means) and the observed values. In other words, this refers to the amount of variation that is caused by extraneous factors, such as differences in the characteristics of the products in the different experimental groups. The vertical lines in the following plot measure how far each observation is away from the group mean:
Figure 5.6: Residual Sum of Squares
The formal representation of the residual sum of squares (\(SS_R\)) is:
\[ SS_R= \sum_{j=1}^{c} \sum_{i=1}^{n} ({Y}_{ij}-\bar{Y}_{j})^2 \]
This means that we need to subtract the group mean from each individual observation, square the difference, and sum up over all the squared differences. Thus, in our example, the residual sum of squares can be calculated as:
\[ \begin{align} SS_R =& (13−14.34)^2 + (14−14.34)^2 + … + (2−14.34)^2 \\ +&(21−24.7)^2 + (18−24.7)^2 + … + (17−24.7)^2 \\ +& (30−34.99)^2 + (37−34.99)^2 + … + (28−34.99)^2 \\ =& 9534.43 \end{align} \]
In this case, we used the 100 values in each of the three groups to calculate the group-specific sums of squares, so the residual degrees of freedom (\(df_R\)) are:
\[ \begin{align} df_R=& (n_1-1)+(n_2-1)+(n_3-1) \\ =&(100-1)+(100-1)+(100-1)=297 \end{align} \]
Once you have computed the different sum of squares, you can investigate the effect strength. \(\eta^2\) is a measure of the variation in Y that is explained by X:
\[ \eta^2= \frac{SS_M}{SS_T}=\frac{21321.21}{30855.64}=0.69 \]
To compute this in R:
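Using the values computed above:

```r
eta_sq <- SSM / SST
eta_sq
```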
The statistic can only take values between 0 and 1. It is equal to 0 when all the category means are equal, indicating that X has no effect on Y. In contrast, it has a value of 1 when there is no variability within each category of X but there is some variability between categories.
How can we determine whether the effect of X on Y is significant?
The F-statistic uses the ratio of mean square related to X (explained variation) and the mean square related to the error (unexplained variation):
\(\frac{SS_M}{SS_R}\)
However, since these are summed values, their magnitude is influenced by the number of scores that were summed. For example, to calculate \(SS_M\) we only used the sum of 3 values (the group means), while \(SS_T\) and \(SS_R\) are based on all 300 observations. Thus, we calculate the average sum of squares (“mean square”) to compare the average amount of systematic vs. unsystematic variation by dividing the SS values by the degrees of freedom associated with the respective statistic.
Mean square due to X:
\[ MS_M= \frac{SS_M}{df_M}=\frac{SS_M}{c-1}=\frac{21321.21}{(3-1)}=10660.61 \]
Mean square due to error:
\[ MS_R= \frac{SS_R}{df_R}=\frac{SS_R}{N-c}=\frac{9534.43}{(300-3)}=32.10 \]
Now, we compare the amount of variability explained by the model (experiment) to the error in the model (variation due to extraneous variables). If the model explains more variability than it leaves unexplained, then the experimental manipulation has had a significant effect on the outcome (DV). The F-ratio can be derived as follows:
\[ F= \frac{MS_M}{MS_R}=\frac{\frac{SS_M}{c-1}}{\frac{SS_R}{N-c}}=\frac{\frac{21321.21}{(3-1)}}{\frac{9534.43}{(300-3)}}=332.08 \]
You can easily compute this in R:
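Building on the sums of squares computed above:

```r
SSR <- SST - SSM
F_stat <- (SSM / (3 - 1)) / (SSR / (300 - 3))
F_stat
```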
This statistic follows the F distribution with (m = c – 1) and (n = N – c) degrees of freedom. This means that, like the \(\chi^2\) distribution, the shape of the F-distribution depends on the degrees of freedom. In this case, the shape depends on the degrees of freedom associated with the numerator and denominator used to compute the F-ratio. The following figure shows the shape of the F-distribution for different degrees of freedom:
The F distribution
The outcome of the test is one of the following:

- If the calculated test statistic exceeds the critical value of the F-distribution, \(H_0\) is rejected.
- If the calculated test statistic does not exceed the critical value, \(H_0\) cannot be rejected.
For 2 and 297 degrees of freedom, the critical value of F is 3.026 for α=0.05. As usual, you can either look up these values in a table or use the appropriate function in R:
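For example:

```r
qf(0.95, df1 = 2, df2 = 297)  # critical F value for alpha = 0.05
```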
The output tells us that the calculated test statistic exceeds the critical value. We can also show the test result visually:
Visual depiction of the test result
Thus, we conclude that because \(F_{cal} = 332.08 > F_{crit} = 3.03\), \(H_0\) is rejected!
Interpretation: one or more of the differences between means are statistically significant.
Reporting: There was a significant effect of the recommendation feature on listening times, F(2,297) = 332.08, p < 0.05, \(\eta^2\) = 0.69.
Remember: This doesn’t tell us where the differences between groups lie. To find out which group means exactly differ, we need to use post-hoc procedures (see below).
You don’t have to compute these statistics manually! Luckily, there is a function for ANOVA in R, which does the above calculations for you as we will see in the next section.
5.4.3.1 Basic ANOVA
As already indicated, one-way ANOVA is used when there is only one categorical variable (factor). Before conducting ANOVA, you need to check if the assumptions of the test are fulfilled. The assumptions of ANOVA are discussed in the following sections.
The observations in the groups should be independent. Because we randomly assigned the listeners to the experimental conditions, this assumption can be assumed to be met.
ANOVA is relatively immune to violations of the normality assumption when sample sizes are large due to the Central Limit Theorem. However, if your sample is small (i.e., n < 30 per group), you may nevertheless want to check the normality of your data, e.g., by using the Shapiro-Wilk test or a Q-Q plot. In our example, we have 100 observations in each group, which is plenty, but let’s create another example with only 10 observations in each group. In the latter case we cannot rely on the Central Limit Theorem and we should test the normality of our data. This can be done using the Shapiro-Wilk test, which has the null hypothesis that the data are normally distributed. Hence, an insignificant test result means that the data can be assumed to be approximately normally distributed:
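A sketch of this check; since the small data set is not shown here, it is simulated for illustration:

```r
set.seed(123)
music_data_small <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  listening_time = c(rnorm(10, 14, 5), rnorm(10, 25, 5), rnorm(10, 35, 5))  # simulated
)
by(music_data_small$listening_time, music_data_small$group, shapiro.test)
```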
Since the test result is insignificant for all groups, we can conclude that the data approximately follow a normal distribution.
We could also test the distributional assumptions visually using a Q-Q plot (i.e., quantile-quantile plot). This plot can be used to assess if a set of data plausibly came from some theoretical distribution, such as the normal distribution. Since this is just a visual check, it is somewhat subjective, but it may help us to judge if our assumption is plausible and, if not, which data points contribute to the violation. A Q-Q plot is a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, we should see the points forming a line that is roughly straight. In other words, Q-Q plots take your sample data, sort them in ascending order, and then plot them versus quantiles calculated from a theoretical distribution. Quantiles are often referred to as “percentiles” and refer to the points in your data below which a certain proportion of your data fall. Recall, for example, the standard normal distribution with a mean of 0 and a standard deviation of 1. Since the 50th percentile (or 0.5 quantile) is 0, half the data lie below 0. The 95th percentile (or 0.95 quantile) is about 1.64, which means that 95 percent of the data lie below 1.64. The 97.5th percentile (or 0.975 quantile) is about 1.96, which means that 97.5% of the data lie below 1.96. In the Q-Q plot, the number of quantiles is selected to match the size of your sample data.
To create the Q-Q plot for the normal distribution, you may use the qqnorm() function, which takes the data to be tested as an argument. Using the qqline() function subsequently on the data creates the line on which the data points should fall based on the theoretical quantiles. If the individual data points deviate a lot from this line, it means that the data is not likely to follow a normal distribution.
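For example, for one of the groups in the simulated small data set:

```r
qqnorm(music_data_small$listening_time[music_data_small$group == "A"])
qqline(music_data_small$listening_time[music_data_small$group == "A"])
```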
Figure 5.7: Q-Q plot 1
Figure 5.8: Q-Q plot 2
Figure 5.9: Q-Q plot 3
The Q-Q plots suggest an approximately Normal distribution. If the assumption had been violated, you might consider transforming your data or resort to a non-parametric test.
Let’s return to our original dataset with 100 observations in each group for the rest of the analysis.
You can test the homogeneity of variances in R using Levene’s test:
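A sketch using the leveneTest() function from the car package (variable names as assumed above):

```r
library(car)
leveneTest(listening_time ~ group, data = music_data)
```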
The null hypothesis of the test is that the group variances are equal. Thus, if the test result is significant it means that the variances are not equal. If we cannot reject the null hypothesis (i.e., the group variances are not significantly different), we can proceed with the ANOVA as follows:
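Under the same naming assumptions:

```r
aov_model <- aov(listening_time ~ group, data = music_data)
summary(aov_model)
```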
You can see that the p-value is smaller than 0.05. This means that, if there really was no difference between the population means (i.e., the Null hypothesis was true), the probability of the observed differences (or larger differences) is less than 5%.
To compute \(\eta^2\) from the output, we can extract the relevant sums of squares as follows:
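One way to do this (a sketch based on the structure of the summary() output):

```r
ss <- summary(aov_model)[[1]]$"Sum Sq"
ss[1] / sum(ss)  # SS_M / SS_T
```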
You can see that the results match the results from our manual computation above (\(\eta^2 = 0.69\)).
The aov() function also automatically generates some plots that you can use to judge if the model assumptions are met. We will inspect two of the plots here.
We will use the first plot to inspect if the residual variances are equal across the experimental groups:
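The first diagnostic plot:

```r
plot(aov_model, 1)  # residuals vs. fitted values
```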
Generally, the residual variance (i.e., the range of values on the y-axis) should be the same for different levels of our independent variable. The plot shows that there are some slight differences. Notably, the range of residuals is larger in group “B” than in group “C”. However, the differences are not that large, and since Levene’s test could not reject the null hypothesis of equal variances, we conclude that the variances are similar enough in this case.
The second plot can be used to test the assumption that the residuals are approximately normally distributed. We use a Q-Q plot to test this assumption:
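The second diagnostic plot:

```r
plot(aov_model, 2)  # Q-Q plot of the residuals
```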
The plot suggests that the residuals are approximately normally distributed. We can also test this by extracting the residuals from the ANOVA output using the resid() function and applying the Shapiro-Wilk test:
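For example:

```r
shapiro.test(resid(aov_model))
```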
Confirming the impression from the Q-Q plot, we cannot reject the Null that the residuals are approximately normally distributed.
Note that if Levene’s test had been significant (i.e., if the variances were not equal), we would have needed to either resort to non-parametric tests (see below) or compute Welch’s F-ratio instead:
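Welch's test is available via the built-in oneway.test() function:

```r
oneway.test(listening_time ~ group, data = music_data)  # Welch's F (var.equal = FALSE by default)
```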
You can see that the results are fairly similar, since the variances turned out to be fairly equal across groups.
Provided that significant differences were detected by the overall ANOVA you can find out which group means are different using post hoc procedures. Post hoc procedures are designed to conduct pairwise comparisons of all different combinations of the treatment groups by correcting the level of significance for each test such that the overall Type I error rate (α) across all comparisons remains at 0.05.
In other words, we rejected \(H_0: \mu_1 = \mu_2 = \mu_3\), and now we would like to test:
\[H_0: \mu_1 = \mu_2\]
\[H_0: \mu_1 = \mu_3\]
\[H_0: \mu_2 = \mu_3\]
There are several post hoc procedures available to choose from. In this tutorial, we will cover Bonferroni and Tukey’s HSD (“honest significant differences”). Both tests control for the family-wise error. Bonferroni tends to have more power when the number of comparisons is small, whereas Tukey’s HSD is better when testing large numbers of means.
One of the most popular (and easiest) methods to correct for the family-wise error rate is to conduct the individual t-tests and divide α by the number of comparisons (k):
\[ p_{CR}= \frac{\alpha}{k} \]
In our example with three groups:
\[p_{CR}= \frac{0.05}{3}=0.017\]
Thus, the “corrected” critical p-value is now 0.017 instead of 0.05 (i.e., the critical t value is higher). You can implement the Bonferroni procedure in R using:
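A sketch using the built-in pairwise.t.test() function:

```r
pairwise.t.test(music_data$listening_time, music_data$group, p.adjust.method = "bonferroni")
```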
In the output, you will get the corrected p-values for the individual tests. In our example, we can reject \(H_0\) of equal means for all three tests, since p < 0.05 for all combinations of groups.
Note the difference between the results from the post-hoc test compared to individual t-tests. For example, when we test the “B” vs. “C” groups, the result from a t-test would be:
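A sketch of the uncorrected comparison:

```r
music_BC <- droplevels(subset(music_data, group %in% c("B", "C")))
t.test(listening_time ~ group, data = music_BC)
```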
Usually the p-value is lower in the individual t-test, reflecting the fact that the family-wise error is not corrected (i.e., the test is less conservative). In this case, the p-value is extremely small in both tests and the difference is thus indistinguishable.
Tukey’s HSD also compares all possible pairs of means (two-by-two combinations; i.e., like a t-test, except that it corrects for family-wise error rate).
Test statistic:
\[\begin{equation} \begin{split} HSD= q\sqrt{\frac{MS_R}{n_c}} \end{split} \tag{5.2} \end{equation}\]
where \(q\) is the relevant value from the studentized range distribution, \(MS_R\) is the residual mean square, and \(n_c\) is the number of observations per group. The difference between two group means is considered significant if it exceeds the HSD:

\[|\bar{Y}_i-\bar{Y}_j | > HSD\]
The value from the studentized range table can be obtained using the qtukey() function.
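For example:

```r
q <- qtukey(0.95, nmeans = 3, df = 297)
q
```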
\[HSD= 3.33\sqrt{\frac{32.10}{100}}=1.89\]
Since all mean differences between groups are larger than 1.89, we can reject the null hypothesis for all individual tests, confirming the results from the Bonferroni test. To compute Tukey’s HSD, we can use the appropriate function from the multcomp package.
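A sketch using the glht() function; it assumes the aov_model object and the factor name group from above:

```r
library(multcomp)
tukey <- glht(aov_model, linfct = mcp(group = "Tukey"))
summary(tukey)
confint(tukey)
```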
We may also plot the result for the mean differences including their confidence intervals:
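Using the glht object from above:

```r
plot(tukey)
```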
Figure 5.10: Tukey’s HSD
You can see that the CIs do not cross zero, which means that the true difference between group means is unlikely to be zero.
Reporting of post hoc results:
The post hoc tests based on Bonferroni and Tukey’s HSD revealed that people listened to music significantly more when:

- they had access to playlists created by other users (group “B” vs. group “A”),
- they were additionally shown recommendations for user-created playlists (group “C” vs. group “B”), and
- consequently, when comparing the recommendation group to the control group (group “C” vs. group “A”).
The following video summarizes how to conduct a one-way ANOVA in R
Non-Parametric tests do not require the sampling distribution to be normally distributed (a.k.a. “assumption free tests”). These tests may be used when the variable of interest is measured on an ordinal scale or when the parametric assumptions do not hold. They often rely on ranking the data instead of analyzing the actual scores. By ranking the data, information on the magnitude of differences is lost. Thus, parametric tests are more powerful if the sampling distribution is normally distributed.
When should you use non-parametric tests?

- When the dependent variable is measured on an ordinal scale.
- When the assumptions of the parametric tests are violated (e.g., non-normal data in small samples).
The Mann-Whitney U test is a non-parametric test of differences between groups, comparable to the two-sample t-test. In contrast to the two-sample t-test, it only requires ordinally scaled data and relies on weaker assumptions. Thus, it is often useful if the assumptions of the t-test are violated, especially if the data are not on a ratio scale. The following assumptions must be fulfilled for the test to be applicable:

- The dependent variable is at least ordinally scaled.
- The observations are independent random samples from the two groups.
Intuitively, the test compares the frequency of low and high ranks between groups. Under the null hypothesis, the amount of high and low ranks should be roughly equal in the two groups. This is achieved through comparing the expected sum of ranks to the actual sum of ranks.
As an example, we will be using data obtained from a field experiment with random assignment. In a music download store, new releases were randomly assigned to an experimental group, in which they were sold at a reduced price (i.e., 7.95€), or a control group, in which they were sold at the standard price (9.95€). A representative sample of 102 new releases was drawn, and these albums were randomly assigned to the experimental groups (i.e., 51 albums per group). The sales were tracked over one day.
Let’s load and investigate the data first:
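A sketch; the file name music_sales.dat and the variable names group and sales are assumptions:

```r
music_sales <- read.table("music_sales.dat", sep = "\t", header = TRUE)
music_sales$group <- as.factor(music_sales$group)  # e.g., "low_price" vs. "std_price"
head(music_sales)
```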
First, we inspect the descriptive statistics (overall and by group).
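Under the naming assumptions above:

```r
library(psych)
describe(music_sales$sales)                       # overall
describeBy(music_sales$sales, music_sales$group)  # by group
```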
Next, we create a boxplot and a plot of the means.
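For example, the boxplot:

```r
library(ggplot2)
ggplot(music_sales, aes(x = group, y = sales)) +
  geom_boxplot() +
  theme_bw()
```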
Figure 5.11: Boxplot
Let’s assume that one of the parametric assumptions has been violated and we need to conduct a non-parametric test. The Mann-Whitney U test is implemented in R using the function wilcox.test(). Using the experimental group as the independent variable and sales as the dependent variable, the test could be executed as follows:
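Under the naming assumptions above:

```r
wilcox.test(sales ~ group, data = music_sales)
```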
The p-value is smaller than 0.05, which leads us to reject the null hypothesis, i.e., the test yields evidence that the reduced price leads to higher sales.
The Wilcoxon signed-rank test is a non-parametric test used to analyze the difference between paired observations, analogously to the paired t-test. It can be used when measurements come from the same observational units but the distributional assumptions of the paired t-test do not hold, because it does not require any assumptions about the distribution of the measurements. Since we subtract two values, however, the test requires that the dependent variable is at least interval scaled, meaning that intervals have the same meaning for different points on our measurement scale.
Under the null hypothesis \(H_0\), the differences of the measurements should follow a symmetric distribution around 0, meaning that, on average, there is no difference between the two matched samples. \(H_1\) states that the mean of the distribution of differences is non-zero.
As an example, let’s consider a slightly different experimental setup for the music download store. Imagine that new releases were either sold at a reduced price (i.e., 7.95€), or at the standard price (9.95€). Every time a customer came to the store, the prices were randomly determined for every new release. This means that the same 51 albums were either sold at the standard price or at the reduced price and this price was determined randomly. The sales were then recorded over one day. Note the difference to the previous case, where we randomly split the sample and assigned 50% of products to each condition. Now, we randomly vary prices for all albums between high and low prices.
Again, let’s assume that one of the parametric assumptions has been violated and we need to conduct a non-parametric test. The Wilcoxon signed-rank test can be performed with the same command as the Mann-Whitney U test, provided that the argument paired is set to TRUE.
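A sketch; it assumes a hypothetical wide-format data frame music_sales_dep with one column per price condition for the same 51 albums:

```r
wilcox.test(music_sales_dep$sales_low_price, music_sales_dep$sales_std_price, paired = TRUE)
```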
Using the 95% confidence level, the result would suggest a significant effect of price on sales (i.e., p < 0.05).
The Kruskal–Wallis test is the non-parametric counterpart of the one-way independent ANOVA. It is designed to test for significant differences in population medians when you have more than two samples (otherwise you would use the Mann-Whitney U-test). The theory is very similar to that of the Mann–Whitney U-test since it is also based on ranked data. The Kruskal-Wallis test is carried out using the kruskal.test() function. Using the same data as before, we type:
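A sketch, assuming the three-group music_data object from the ANOVA example above:

```r
kruskal.test(listening_time ~ group, data = music_data)
```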
The test-statistic follows a chi-square distribution and since the test is significant (p < 0.05), we can conclude that there are significant differences in population medians. Provided that the overall effect is significant, you may perform a post hoc test to find out which groups are different. To get a first impression, we can plot the data using a boxplot:
Figure 5.12: Boxplot
To test for differences between groups, we can, for example, apply post hoc tests according to Nemenyi for pairwise multiple comparisons of the ranked data using the appropriate function from the PMCMR package.
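A sketch under the same naming assumptions:

```r
library(PMCMR)
posthoc.kruskal.nemenyi.test(x = music_data$listening_time, g = music_data$group, dist = "Tukey")
```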
The results reveal that there is a significant difference between the “low” and “high” promotion groups. Note that the results are different compared to the results from the parametric test above. This difference occurs because non-parametric tests have less power to detect differences between groups, since we lose information by ranking the data. Thus, you should rely on parametric tests if their assumptions are met.
In some instances, you will be confronted with differences between proportions, rather than differences between means. For example, you may conduct an A/B-Test and wish to compare the conversion rates between two advertising campaigns. In this case, your data is binary (0 = no conversion, 1 = conversion) and the sampling distribution for such data is binomial. While binomial probabilities are difficult to calculate, we can use a Normal approximation to the binomial when n is large (>100) and the true likelihood of a 1 is not too close to 0 or 1.
Let’s use an example: assume a call center where service agents call potential customers to sell a product. We consider two call center agents:
As always, we load the data first:
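A sketch; the file name call_center.dat and the variable names agent and conversion (coded 0/1) are assumptions:

```r
call_center <- read.table("call_center.dat", sep = "\t", header = TRUE)
call_center$conversion <- factor(call_center$conversion, levels = c(0, 1), labels = c("no", "yes"))
head(call_center)
```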
Next, we create a table to check the relative frequencies:
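Under the assumed names:

```r
prop.table(table(call_center$agent, call_center$conversion), margin = 1)
```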
We could also plot the data to visualize the frequencies using ggplot:
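For example:

```r
library(ggplot2)
ggplot(call_center, aes(x = agent, fill = conversion)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion") +
  theme_bw()
```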
Figure 5.13: proportion of conversions per agent (stacked bar chart)
… or using the mosaicplot() function:
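Under the same naming assumptions:

```r
mosaicplot(table(call_center$agent, call_center$conversion),
           main = "Proportion of conversions per agent")
```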
Figure 5.14: proportion of conversions per agent (mosaic plot)
Recall that we can use confidence intervals to determine the range of values that the true population parameter will take with a certain level of confidence based on the sample. Similar to the confidence interval for means, we can compute a confidence interval for proportions. The (1- \(\alpha\) )% confidence interval for proportions is approximately
\[ CI = p\pm z_{1-\frac{\alpha}{2}}*\sqrt{\frac{p*(1-p)}{N}} \]
where \(\sqrt{p(1-p)}\) is the equivalent of the standard deviation in the formula for the confidence interval for means. Based on the equation, it is easy to compute the confidence intervals for the conversion rates of the call center agents:
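A sketch; the agent labels are assumptions:

```r
n1 <- sum(call_center$agent == "agent_1")
p1 <- mean(call_center$conversion[call_center$agent == "agent_1"] == "yes")
p1 + c(-1, 1) * qnorm(0.975) * sqrt(p1 * (1 - p1) / n1)  # 95% CI for agent 1
```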
Similar to testing for differences in means, we could also ask: Is agent 1 twice as likely as agent 2 to convert a customer? Or, to state it formally:
\[H_0: \pi_1=\pi_2 \\ H_1: \pi_1\ne \pi_2\]
where \(\pi\) denotes the population parameter associated with the proportion in the respective population. One approach to test this is based on confidence intervals to estimate the difference between two populations. We can compute an approximate confidence interval for the difference between the proportion of successes in group 1 and group 2, as:
\[ CI = p_1-p_2\pm z_{1-\frac{\alpha}{2}}*\sqrt{\frac{p_1*(1-p_1)}{n_1}+\frac{p_2*(1-p_2)}{n_2}} \]
If the confidence interval includes zero, then the data does not suggest a difference between the groups. Let’s compute the confidence interval for differences in the proportions by hand first:
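Building on p1 and n1 from the previous chunk (agent labels again assumed):

```r
n2 <- sum(call_center$agent == "agent_2")
p2 <- mean(call_center$conversion[call_center$agent == "agent_2"] == "yes")
d <- p1 - p2
d + c(-1, 1) * qnorm(0.975) * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
```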
Now we can see that the 95% confidence interval estimate of the difference between the proportion of conversions for agent 1 and the proportion of conversions for agent 2 is between 26% and 41%. This interval tells us the range of plausible values for the difference between the two population proportions. According to this interval, zero is not a plausible value for the difference (i.e., interval does not cross zero), so we reject the null hypothesis that the population proportions are the same.
Instead of computing the intervals by hand, we could also use the prop.test() function:
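A sketch; prop.test() expects a matrix with the successes in the first column:

```r
conv_table <- table(call_center$agent, call_center$conversion)
prop.test(conv_table[, c("yes", "no")], correct = FALSE)
```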
Note that the prop.test() function uses a slightly different (more accurate) way to compute the confidence interval (Wilson’s score method). It provides a better approximation, particularly for smaller N. That’s why the confidence interval in the output deviates slightly from the manual computation above, which uses the Wald interval.
You can also see that the output from prop.test() includes the results from a \(\chi^2\) test for the equality of proportions (which will be discussed below) and the associated p-value. Since the p-value is less than 0.05, we reject the null hypothesis of equal proportions. Thus, the reporting would be:

The test showed that the conversion rate for agent 1 was higher by 33 percentage points. This difference is significant, \(\chi^2\)(1) = 70, p < .05 (95% CI = [0.25, 0.41]).
In the previous section, we saw how we can compute the confidence interval for the difference between proportions to decide on whether or not to reject the null hypothesis. Whenever you would like to investigate the relationship between two categorical variables, the \(\chi^2\) test may be used to test whether the variables are independent of each other. It achieves this by comparing the expected number of observations in a group to the actual values. Let’s continue with the example from the previous section. Under the null hypothesis, the two variables agent and conversion in our contingency table are independent (i.e., there is no relationship). This means that the frequency in each field will be roughly proportional to the probability of an observation being in that category, calculated under the assumption that they are independent. The difference between that expected quantity and the actual quantity can be used to construct the test statistic. The test statistic is computed as follows:
\[ \chi^2=\sum_{i=1}^{J}\frac{(f_o-f_e)^2}{f_e} \]
where \(J\) is the number of cells in the contingency table, \(f_o\) are the observed cell frequencies and \(f_e\) are the expected cell frequencies. The larger the differences, the larger the test statistic and the smaller the p-value.
The observed cell frequencies can easily be seen from the contingency table:
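Under the assumed names:

```r
obs <- table(call_center$agent, call_center$conversion)
obs
```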
The expected cell frequencies can be calculated as follows:
\[ f_e=\frac{(n_r*n_c)}{n} \]
where \(n_r\) are the total observed frequencies per row, \(n_c\) are the total observed frequencies per column, and \(n\) is the total number of observations. Thus, the expected cell frequencies under the assumption of independence can be calculated as:
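In R, this corresponds to:

```r
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
expected
```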
To sum up, these are the expected cell frequencies
… and these are the observed cell frequencies
To obtain the test statistic, we simply plug the values into the formula:
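Using the tables from above:

```r
chi2 <- sum((obs - expected)^2 / expected)
chi2
```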
The test statistic is \(\chi^2\) distributed. The chi-square distribution is a non-symmetric distribution. Actually, there are many different chi-square distributions, one for each degree of freedom, as shown in the following figure.
Figure 5.15: The chi-square distribution
You can see that as the degrees of freedom increase, the chi-square curve approaches a normal distribution. To find the critical value, we need to specify the corresponding degrees of freedom, given by:
\[ df=(r-1)*(c-1) \]
where \(r\) is the number of rows and \(c\) is the number of columns in the contingency table. Recall that degrees of freedom are generally the number of values that can vary freely when calculating a statistic. In a 2 x 2 table, as in our case, we have 2 variables with 2 levels each, and in each one only 1 level is free to vary. Hence, in our example the degrees of freedom can be calculated as \(df = (2-1)*(2-1) = 1\).
Now, we can derive the critical value given the degrees of freedom and the level of confidence using the qchisq() function and test if the calculated test statistic is larger than the critical value:
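Using the test statistic computed above:

```r
crit <- qchisq(0.95, df = 1)
chi2 > crit  # TRUE means we reject H0
```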
Figure 5.16: Visual depiction of the test result
We could also compute the p-value using the pchisq() function, which tells us the probability of the observed cell frequencies if the null hypothesis was true (i.e., there was no association):
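For example:

```r
pchisq(chi2, df = 1, lower.tail = FALSE)
```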
The test statistic can also be calculated in R directly on the contingency table with the function chisq.test() .
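Using the contingency table from above:

```r
chisq.test(obs, correct = FALSE)
```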
Since the p-value is smaller than 0.05 (i.e., the calculated test statistic is larger than the critical value), we reject \(H_0\) that the two variables are independent.
Note that the test statistic is sensitive to the sample size. To see this, let’s assume that we have a sample of 100 observations instead of 1000 observations:
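A sketch: scaling the observed counts down by a factor of 10 keeps the proportions (roughly) unchanged while shrinking the sample:

```r
obs_small <- round(obs / 10)
chisq.test(obs_small, correct = FALSE)
```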
You can see that even though the proportions haven’t changed, the test is insignificant now. The following equation lets you compute a measure of the effect size, which is insensitive to sample size:
\[ \phi=\sqrt{\frac{\chi^2}{n}} \]
The following guidelines are used to determine the magnitude of the effect size (Cohen, 1988):

- φ ≈ 0.1: small effect
- φ ≈ 0.3: medium effect
- φ ≈ 0.5: large effect
In our example, we can compute the effect sizes for the large and small samples as follows:
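Using the tables from above:

```r
sqrt(chisq.test(obs, correct = FALSE)$statistic / sum(obs))              # large sample
sqrt(chisq.test(obs_small, correct = FALSE)$statistic / sum(obs_small))  # small sample
```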
You can see that the statistic is insensitive to the sample size.
Note that the Φ coefficient is appropriate for two dichotomous variables (resulting from a 2 x 2 table as above). If any of your nominal variables has more than two categories, Cramér’s V should be used instead:
\[ V=\sqrt{\frac{\chi^2}{n*df_{min}}} \]
where \(df_{min}\) refers to the degrees of freedom associated with the variable that has fewer categories (e.g., if we have two nominal variables with 3 and 4 categories, \(df_{min}\) would be 3 - 1 = 2). The degrees of freedom need to be taken into account when judging the magnitude of the effect sizes (see, e.g., here).
Note that the correct = FALSE argument above ensures that the test statistic is computed in the same way as we have done by hand above. By default, chisq.test() applies a correction to prevent overestimation of statistical significance for small data (called the Yates’ correction). The correction is implemented by subtracting the value 0.5 from the computed difference between the observed and expected cell counts in the numerator of the test statistic. This means that the calculated test statistic will be smaller (i.e., more conservative). Although the adjustment may go too far in some instances, you should generally rely on the adjusted results, which can be computed as follows:
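For example:

```r
chisq.test(obs)  # Yates' continuity correction is applied by default
```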
As you can see, the results don’t change much in our example, since the differences between the observed and expected cell frequencies are fairly large relative to the correction.
Caution is warranted when the cell counts in the contingency table are small. The usual rule of thumb is that all cell counts should be at least 5 (this may be a little too stringent though). When some cell counts are too small, you can use Fisher’s exact test using the fisher.test() function.
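Using the contingency table from above:

```r
fisher.test(obs)
```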
The Fisher test, while more conservative, also shows a significant difference between the proportions (p < 0.05). This is not surprising since the cell counts in our example are fairly large.
To calculate the required sample size when comparing proportions, the power.prop.test() function can be used. For example, we could ask how large our sample needs to be if we would like to compare two groups with conversion rates of 2% and 2.5%, respectively using the conventional settings for \(\alpha\) and \(\beta\) :
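For example:

```r
power.prop.test(p1 = 0.02, p2 = 0.025, sig.level = 0.05, power = 0.8)
```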
The output tells us that we need 13809 observations per group to detect a difference of the desired size.
Every product owner knows that it takes effort to build something that'll cater to user needs. You'll have to make many tough calls if you wish to grow the company and evolve the product so it delivers more value. But how do you decide what to change in the product, your marketing strategy, or the overall direction to succeed? And how do you make a product that truly resonates with your target audience?
There are many unknowns in business, so many fundamental decisions start from a simple "what if?". But they can't be based on guesses, as you need some proof to fill in the blanks reasonably.
Because there's no universal recipe for successfully building a product, teams collect data, do research, study the dynamics, and generate hypotheses according to the given facts. They then take corresponding actions to find out whether they were right or wrong, make conclusions, and most likely restart the process again.
On this page, we thoroughly inspect product hypotheses. We'll go over what they are, how to create hypothesis statements and validate them, and what goes after this step.
A hypothesis in product development and product management is a statement or assumption about the product, planned feature, market, or customer (e.g., their needs, behavior, or expectations) that you can put to the test, evaluate, and base your further decisions on. This may, for instance, regard the upcoming product changes as well as the impact they can result in.
A hypothesis implies that there is limited knowledge. Hence, the teams need to undergo testing activities to validate their ideas and confirm whether they are true or false.
Hypotheses guide the product development process and may point at important findings to help build a better product that'll serve user needs. In essence, teams create hypothesis statements in an attempt to improve the offering, boost engagement, increase revenue, find product-market fit quicker, or for other business-related reasons.
It's sort of like an experiment with trial and error, yet it is data-driven and should be unbiased. This means that teams don't make assumptions out of the blue. Instead, they turn to the collected data, conducted market research, and factual information, which helps avoid completely missing the mark. The obtained results are then carefully analyzed and may influence decision-making.
Such experiments backed by data and analysis are an integral aspect of successful product development and allow startups or businesses to dodge costly startup mistakes .
When do teams create hypothesis statements and validate them? To some extent, hypothesis testing is an ongoing process to work on constantly. It may occur during various product development life cycle stages, from early phases like initiation to late ones like scaling.
In any event, the key here is learning how to generate hypothesis statements and validate them effectively. We'll go over this in more detail later on.
You might be wondering whether ideas and hypotheses are the same thing. Well, there are a few distinctions.
An idea is simply a suggested proposal. Say, a teammate comes up with something you can bring to life during a brainstorming session or pitches in a suggestion like "How about we shorten the checkout process?". You can jot down such ideas and then consider working on them if they'll truly make a difference and improve the product, strategy, or result in other business benefits. Ideas may thus be used as the hypothesis foundation when you decide to prove a concept.
A hypothesis is the next step, when an idea gets wrapped with specifics to become an assumption that may be tested. As such, you can refine the idea by adding details to it. The previously mentioned idea can be worded into a product hypothesis statement like: "The cart abandonment rate is high, and many users flee at checkout. But if we shorten the checkout process by cutting down the number of steps to only two and get rid of four excessive fields, we'll simplify the user journey, boost satisfaction, and may get up to 15% more completed orders".
A hypothesis is something you can test in an attempt to reach a certain goal. Testing isn't obligatory in this scenario, of course, but the idea may be tested if you weigh the pros and cons and decide that the required effort is worth a try. We'll explain how to create hypothesis statements next.
The last thing those developing a product want is to invest time and effort into something that won't bring any visible results, fall short of customer expectations, or won't live up to their needs. Therefore, to increase the chances of achieving a successful outcome and product-led growth, teams may need to revisit their product development approach by optimizing one of the starting points of the process: learning to make reasonable product hypotheses.
If the entire procedure is structured, this may assist you during such stages as the discovery phase and raise the odds of reaching your product goals and setting your business up for success. Yet, what's the entire process like?
Such processes imply sharing ideas when a problem is spotted by digging deep into facts and studying the possible risks, goals, benefits, and outcomes. You may apply various MVP tools (like FigJam, Notion, or Miro) that were designed to simplify brainstorming sessions, systemize pitched suggestions, and keep everyone organized without losing any ideas.
Predictive product analysis can also be integrated into this process, leveraging data and insights to anticipate market trends and consumer preferences, thus enhancing decision-making and product development strategies. This approach fosters a more proactive and informed approach to innovation, ensuring products are not only relevant but also resonate with the target audience, ultimately increasing their chances of success in the market.
Besides, you can settle on one of the many frameworks that facilitate decision-making processes, ideation phases, or feature prioritization. Such frameworks are best applicable if you need to test your assumptions and structure the validation process. These are a few common ones if you're looking toward a systematic approach:
Once you've indicated the addressable problem or opportunity and broken down the issue in focus, you need to work on formulating the hypotheses and associated tasks. By the way, it works the same way if you want to prove that something will be false (a.k.a. a null hypothesis).
If you're unsure how to write a hypothesis statement, let's explore the essential steps that'll set you on the right track.
Product hypotheses are generally different for each case, so begin by pinpointing the major variables, i.e., the cause and effect. You'll need to outline what you think is supposed to happen if a change or action gets implemented.
Put simply, the "cause" is what you're planning to change, and the "effect" is what will indicate whether the change is bringing in the expected results. Falling back on the example we brought up earlier, the ineffective checkout process can be the cause, while the increased percentage of completed orders is the metric that'll show the effect.
Make sure to also note such vital points as:
Mind that generic connections that lack specifics will get you nowhere. So if you're thinking about how to word a hypothesis statement, make sure that the cause and effect include clear reasons and a logical dependency.

Think about what the precise link showing why A affects B could be. In our checkout example, it could be: fewer steps in the checkout and the removed excessive fields will speed up the process, help avoid confusion, irritate users less, and lead to more completed orders. That's much more explicit than just stating the fact that the checkout needs to be changed to get more completed orders.
Certainly, multiple things can be used to measure the effect. Therefore, you need to choose the optimal metrics and validation criteria that'll best envision if you're moving in the right direction.
If you need a tip on how to create hypothesis statements that won't result in a waste of time, try to avoid vagueness and be as specific as you can when selecting what can best measure and assess the results of your hypothesis test. The criteria must be measurable and tied to the hypotheses. This can be a realistic percentage or number (say, you expect a 15% increase in completed orders or 2x fewer cart abandonment cases during the checkout phase).
Once again, if you're not realistic, then you might end up misinterpreting the results. Remember that sometimes an increase that's even as little as 2% can make a huge difference, so why make 50% the merit if it's not achievable in the first place?
It's quite common that you'll end up with multiple product hypotheses. Some are more important than others, of course, and some will require more effort and input.
Therefore, just as with the features on your product development roadmap, prioritize your hypotheses according to their impact and importance. Then, group and order them, especially if the results of some hypotheses influence others on your list.
To demonstrate how to formulate your assumptions clearly, here are several more apart from the example of a hypothesis statement given above:
There are multiple options when it comes to validating hypothesis statements. To get appropriate results, you have to come up with the right experiment that'll help you test the hypothesis. You'll need a control group or people who represent your target audience segments or groups to participate (otherwise, your results might not be accurate).
What can serve as the experiment you may run? Experiments may take tons of different forms, and you'll need to choose the one that clicks best with your hypothesis goals (and your available resources, of course). The same goes for how long you'll have to carry out the test (say, a time period of two months or as little as two weeks). Here are several to get you started.
Talking to users, potential customers, or members of your own online startup community can be another way to test your hypotheses. You may use surveys, questionnaires, or opt for more extensive interviews to validate hypothesis statements and find out what people think. This assumption validation approach involves your existing or potential users and might require some additional time, but can bring you many insights.
One of the experiments you may develop involves making more than one version of an element or page to see which option resonates with the users more. As such, you can have a call to action block with different wording or play around with the colors, imagery, visuals, and other things.
To run such split experiments, you can apply tools like VWO, which allow you to easily construct alternative designs and split what your users see (e.g., one half of the users will see version one, while the other half will see version two). You can track various metrics and apply heatmaps, click maps, and screen recordings to learn more about user response and behavior. Mind, though, that the key to such tests is to get as many users as you can and to give the tests time. Don't jump to conclusions too soon or if very few people participated in your experiment.
Demos and clickable prototypes can be a great way to save time and money on costly feature or product development. A prototype also allows you to refine the design. However, they can also serve as experiments for validating hypotheses, collecting data, and getting feedback.
For instance, if you have a new feature in mind and want to ensure there is interest, you can utilize such MVP types as fake doors. Make a short demo recording of the feature and place it on your landing page to track interest or test how many people sign up.
Similarly, you can run experiments to observe how users interact with the feature, page, product, etc. Usually, such experiments are held on prototype testing platforms with a focus group representing your target visitors. By showing a prototype or early version of the design to users, you can view how people use the solution, where they face problems, or what they don't understand. This may be very helpful if you have hypotheses regarding redesigns and user experience improvements before you move on from prototype to MVP development.
You can even take it a few steps further and build a barebone feature version that people can really interact with, yet you'll be the one behind the curtain to make it happen. There are many MVP examples of companies applying Wizard of Oz or concierge MVPs to validate their hypotheses.
Or you can actually develop some functionality but release it for only a limited number of people to see. This is referred to as a feature flag, which can show really specific results but is effort-intensive.
Analysis is what you move on to once you've run the experiment. This is the time to review the collected data, metrics, and feedback to validate (or invalidate) the hypothesis.
You have to evaluate the experiment's results to determine whether your product hypotheses were valid or not. For example, if you were testing two versions of an element design, color scheme, or copy, look into which one performed best.
It is crucial to be certain that you have enough data to draw conclusions, though, and that it's accurate and unbiased. Because if you don't, this may be a sign that your experiment needs to be run for some additional time, be altered, or held once again. You won't want to make a solid decision based on uncertain or misleading results, right?

On another note, make sure to record your hypotheses and experiment results. Some companies use CRMs to jot down the key findings, while others use something as simple as Google Docs. Either way, this can be your single source of truth that can help you avoid running the same experiments or allow you to compare results over time.
The hypothesis-driven approach in product development is a great way to avoid uncalled-for risks and pricey mistakes. You can back up your assumptions with facts, observe your target audience's reactions, and be more certain that this move will deliver value.
However, this only makes sense if the validation of hypothesis statements is backed by relevant data that'll allow you to determine whether the hypothesis is valid or not. By doing so, you can be certain that you're developing and testing hypotheses to accelerate your product management and avoid decisions based on guesswork.
Certainly, a failed experiment may bring you just as much knowledge and as many findings as one that succeeds. Teams have to learn from their mistakes, boost their hypothesis generation and testing knowledge, and make improvements according to the results of their experiments. This is an ongoing process, of course, as no product can grow if it isn't iterated on and improved.
If you're only planning to or are currently building a product, Upsilon can lend you a helping hand. Our team has years of experience providing product development services for growth-stage startups and building MVPs for early-stage businesses , so you can use our expertise and knowledge to dodge many mistakes. Don't be shy to contact us to discuss your needs!
Breadcrumbs Section. Click here to navigate to respective pages.
Omsk Oblast
DOI link for Omsk Oblast
Click here to navigate to parent product.