• Open access
  • Published: 06 April 2022

Design of a new Z-test for the uncertainty of Covid-19 events under Neutrosophic statistics

  • Muhammad Aslam   ORCID: orcid.org/0000-0003-0644-1950 1  

BMC Medical Research Methodology volume 22, Article number: 99 (2022)

The existing Z-test for uncertainty events does not give information about the measure of indeterminacy/uncertainty associated with the test.

This paper introduces the Z-test for uncertainty events under neutrosophic statistics. The test statistic of the existing test is modified under the philosophy of the Neutrosophy. The testing process is introduced and applied to the Covid-19 data.

Under the proposed test, the null hypothesis that there is no reduction in the uncertainty of Covid-19 is accepted with a probability of 0.95, the probability of committing a type-I error is 0.05, and the associated measure of indeterminacy is 0.10. Based on the analysis, it is concluded that the proposed test is more informative than the existing test. The proposed test is also better than the Z-test for uncertainty under fuzzy logic, as the fuzzy-logic test gives values of the statistic from 2.20 to 2.42 without any information about the measure of indeterminacy. The test under interval statistics considers only the values within the interval rather than the crisp value.

Conclusions

From the Covid-19 data analysis, it is found that the proposed Z-test for uncertainty events under neutrosophic statistics is more efficient than the existing tests under classical statistics, the fuzzy approach, and interval statistics in terms of information, flexibility, power of the test, and adequacy.

The Z-test plays an important role in data analysis. The main aim of the Z-test is to test the mean of an unknown population in decision-making. The Z-test for uncertainty events is applied to test the reduction in the uncertainty of past events: the null hypothesis that there is no reduction in uncertainty is tested against the alternative hypothesis that there is a significant reduction in the uncertainty of past events. The test uses the information of past events to test the reduction of uncertainty. Doll and Carney [ 1 ] discussed the performance of statistical tests under uncertainty. Kanji [ 2 ] discussed the design of the Z-test for uncertainty events. Lele [ 3 ] worked on testing in the presence of uncertainty. Wang et al. [ 4 ] worked on the modification of a non-parametric test. Applications of such tests can be seen in [ 5 ], [ 6 ], [ 7 ] and [ 8 ].

Viertl [ 9 ] mentioned that “statistical data are frequently not precise numbers but more or less non-precise also called fuzzy. Measurements of continuous variables are always fuzzy to a certain degree”. In such cases, the existing Z-tests cannot be applied for testing the mean of a population or the reduction in uncertainty. Therefore, the existing Z-tests have been modified under fuzzy logic to deal with uncertain, fuzzy, and vague data. References [ 10 ], [ 11 ], [ 12 ], [ 13 ], [ 14 ], [ 15 ], [ 16 ], [ 17 ], [ 18 ] and [ 19 ] report work on various statistical tests using fuzzy logic.

Nowadays, neutrosophic logic attracts researchers due to its many applications in a variety of fields. Neutrosophic logic accounts for the measure of indeterminacy, which is not considered by fuzzy logic; see [ 20 ]. Broumi and Smarandache [ 21 ] showed that neutrosophic logic is more efficient than interval-based analysis. More applications of neutrosophic logic can be seen in [ 22 ], [ 23 ], [ 24 ] and [ 25 ]. Smarandache [ 26 ] applied neutrosophic statistics to deal with uncertain data. Chen et al. [ 27 ] and [ 28 ] presented neutrosophic statistical methods to analyze data. Some applications of neutrosophic tests can be seen in [ 29 ], [ 30 ] and [ 31 ].

The existing Z-test for uncertainty events under classical statistics does not consider the measure of indeterminacy when testing the reduction in the uncertainty of events. To the best of our knowledge, there is no work in the literature on the Z-test for uncertainty events under neutrosophic statistics. In this paper, the modification of the Z-test for uncertainty events under neutrosophic statistics will be introduced. The application of the proposed test will be given using the Covid-19 data. It is expected that the proposed Z-test for uncertainty events under neutrosophic statistics will be more efficient than the existing tests in terms of the power of the test, information, and adequacy.

The existing Z-test for uncertainty events can be applied only when the probability of events is known. The existing test does not evaluate the effect of the measure of indeterminacy/uncertainty on the reduction of uncertainty of past events. We now introduce the modification of the Z-test for uncertainty events under neutrosophic statistics, with the aim that the proposed test will be more effective than the existing Z-test for uncertainty events under classical statistics. Let \({A}_N={A}_L+{A}_U{I}_{A_N};{I}_{A_N}\epsilon \left[{I}_{A_L},{I}_{A_U}\right]\) and \({B}_N={B}_L+{B}_U{I}_{B_N};{I}_{B_N}\epsilon \left[{I}_{B_L},{I}_{B_U}\right]\) be two neutrosophic events, where the lower values \({A}_L\), \({B}_L\) denote the determinate parts of the events, \({A}_U{I}_{A_N}\), \({B}_U{I}_{B_N}\) are the indeterminate parts, and \({I}_{A_N}\epsilon \left[{I}_{A_L},{I}_{A_U}\right]\), \({I}_{B_N}\epsilon \left[{I}_{B_L},{I}_{B_U}\right]\) are the measures of indeterminacy associated with these events. Note that the events \({A}_N\epsilon \left[{A}_L,{A}_U\right]\) and \({B}_N\epsilon \left[{B}_L,{B}_U\right]\) reduce to events under classical statistics (determinate parts) proposed by [ 2 ] if \({I}_{A_L}={I}_{B_L}=0\). Suppose \({n}_N={n}_L+{n}_U{I}_N;{I}_N\epsilon \left[{I}_L,{I}_U\right]\) is the size of a neutrosophic random sample, where \({n}_L\) is the lower (determinate) sample size, \({n}_U{I}_N\) is the indeterminate part, and \({I}_N\epsilon \left[{I}_L,{I}_U\right]\) is the measure of uncertainty in selecting the sample size. The neutrosophic random sample reduces to a classical random sample if no uncertainty is found in the sample size. The methodology of the proposed Z-test for uncertainty events is explained as follows.

Suppose that the probability that an event \({A}_N\epsilon \left[{A}_L,{A}_U\right]\) occurs (probability of truth) is \(P\left({A}_N\right)\epsilon \left[P\left({A}_L\right),P\left({A}_U\right)\right]\), the probability that it does not occur (probability of falsehood) is \(P\left({A}_N^c\right)\epsilon \left[P\left({A}_L^c\right),P\left({A}_U^c\right)\right]\), the probability that an event \({B}_N\epsilon \left[{B}_L,{B}_U\right]\) occurs (probability of truth) is \(P\left({B}_N\right)\epsilon \left[P\left({B}_L\right),P\left({B}_U\right)\right]\), and the probability that it does not occur (probability of falsehood) is \(P\left({B}_N^c\right)\epsilon \left[P\left({B}_L^c\right),P\left({B}_U^c\right)\right]\). It is important to note that sequential analysis is done to reduce the uncertainty by using information on past events. The purpose of the proposed test is to determine whether the reduction of uncertainty is significant. Let \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) be the neutrosophic test statistic, where \({Z}_L\) and \({Z}_U\) are the lower and upper values of the statistic, respectively, defined by.
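The displayed definition is not reproduced in this copy. A form consistent with the numerical Covid-19 calculation given later in the text, and with the classical test in [ 2 ], would be the following. The placement of \(P\left({A}_N\right)\) and \(P\left({B}_N\right)\) in the denominator is an assumption inferred from that calculation, in which both probabilities equal 0.5.

```latex
Z_N = \frac{P\left(B_{+kN}\mid A_N\right) - P\left(B_N\right)}
           {\sqrt{\dfrac{P\left(B_N\right)\left[1-P\left(B_N\right)\right]\left[1-P\left(A_N\right)\right]}
                        {\left(n_N-k_N\right)P\left(A_N\right)}}};
\qquad Z_N \epsilon \left[Z_L, Z_U\right]
```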

Note that \(P\left({B}_{+kN}|{A}_N\right)=P\left({B}_N|{A}_N\right)\) at lag \({k}_N\), where \(P\left({B}_N|{A}_N\right)\epsilon \left[P\left({B}_L|{A}_L\right),P\left({B}_U|{A}_U\right)\right]\) denotes the conditional probability. That is, the probability of the event \({B}_N\epsilon \left[{B}_L,{B}_U\right]\) is calculated given that the event \({A}_N\epsilon \left[{A}_L,{A}_U\right]\) has occurred.

The neutrosophic form of the proposed test statistic, say \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\), is defined by.

The alternative form of Eq. ( 2 ) can be written as.
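The two displayed forms are likewise missing here. Judging from the expression \({Z}_N=2.20+2.20{I}_{ZN}\) and the factor \(\left(1+0.1\right)\) that appear in the worked example, they plausibly read as below. This is a sketch under that assumption (in particular that \({Z}_U={Z}_L\) when the indeterminate part is built from the same data), not a verbatim restoration.

```latex
% Neutrosophic form: determinate part plus indeterminate part
Z_N = Z_L + Z_U I_{ZN}; \qquad I_{ZN} \epsilon \left[I_{ZL}, I_{ZU}\right]

% Alternative form: the classical statistic scaled by (1 + I_{ZN})
Z_N = \left(1 + I_{ZN}\right)
      \frac{P\left(B_{+kN}\mid A_N\right) - P\left(B_N\right)}
           {\sqrt{\dfrac{P\left(B_N\right)\left[1-P\left(B_N\right)\right]\left[1-P\left(A_N\right)\right]}
                        {\left(n_N-k_N\right)P\left(A_N\right)}}};
\qquad I_{ZN} \epsilon \left[I_{ZL}, I_{ZU}\right]
```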

The proposed test \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is the extension of several existing tests. The proposed test reduces to the existing Z-test under classical statistics when \({I}_{ZN}=0\). The proposed test is also an extension of the Z-test under the fuzzy approach and interval statistics.

The proposed test will be implemented as follows.

Step-1: State the null hypothesis \({H}_0\): there is no reduction in uncertainty vs. the alternative hypothesis \({H}_1\): there is a significant reduction in uncertainty.

Step-2: Calculate the statistic \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\).

Step-3: Specify the level of significance \(\alpha\) and select the critical value from [ 2 ].

Step-4: Do not accept the null hypothesis if the value of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is larger than the critical value.
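The decision rule in Step-4 can be sketched in a few lines of Python. The function name `decide` and the "indeterminate" outcome are assumptions made for illustration: the text only says to reject when the statistic exceeds the critical value and does not say what to do when the interval straddles it.

```python
from typing import Tuple


def decide(z_interval: Tuple[float, float], critical_value: float = 1.96) -> str:
    """Compare a neutrosophic statistic [Z_L, Z_U] with a critical value (Step-4)."""
    z_lower, z_upper = z_interval
    if z_lower > critical_value:
        # Every value in the interval exceeds the critical value.
        return "reject H0"
    if z_upper <= critical_value:
        # No value in the interval exceeds the critical value.
        return "do not reject H0"
    # Assumption: the straddling case is not covered by the text, so it is flagged.
    return "indeterminate"


# Interval quoted in the text for the Covid-19 example.
print(decide((2.20, 2.42)))
```

Returning a third outcome keeps the sketch honest about the case the procedure leaves open; a caller could map it to either decision if a convention is agreed on.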

The application of the proposed test is given in the medical field. The decision-makers are interested in testing the reduction in uncertainty of Covid-19 when the measure of indeterminacy/uncertainty is \({I}_{ZN}\epsilon \left[0,0.10\right]\). Specifically, they are interested in testing the reduction in deaths due to Covid-19 (event \({A}_N\)) with the increase in Covid-19 vaccines (event \({B}_N\)). By following [ 2 ], the sequence in which both events occur is given as

where \({n}_N\epsilon \left[12,12\right]\), \({k}_N\epsilon \left[1,1\right]\), \(P\left({A}_N\right)=6/12=0.5\) and \(P\left({B}_N\right)=6/12=0.5\).

Note here that event \({A}_N\) occurs 6 times and that, of these 6 times, \({B}_N\) occurs immediately after \({A}_N\) five times. Given that \({A}_N\) has occurred, we get

\(P\left({B}_{+kN}|{A}_N\right)=P\left({B}_N|{A}_N\right)=5/6=0.83\) at lag 1. The value of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is calculated as

\({Z}_N=\left(1+0.1\right)\frac{0.83-0.50}{\sqrt{\frac{0.50\left[1-0.50\right]\left[1-0.50\right]}{\left(12-1\right)0.50}}}=2.42;{I}_{ZN}\epsilon \left[0,0.1\right]\). From [ 2 ], the critical value is 1.96.
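The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the roles of \(P\left({A}_N\right)\) and \(P\left({B}_N\right)\) in the denominator are assumed from the expression above, where both equal 0.5. The computed values land close to the reported 2.20 and 2.42; small differences can arise from rounding 5/6 to 0.83.

```python
import math


def z_uncertainty(p_b_given_a: float, p_a: float, p_b: float, n: int, k: int) -> float:
    """Classical Z statistic for the uncertainty of events at lag k."""
    # Standard error term as it appears in the worked expression.
    se = math.sqrt(p_b * (1 - p_b) * (1 - p_a) / ((n - k) * p_a))
    return (p_b_given_a - p_b) / se


def neutrosophic_z(z_classical: float, i_lower: float, i_upper: float):
    """Scale the classical statistic by (1 + I) at both ends of [I_L, I_U]."""
    return z_classical * (1 + i_lower), z_classical * (1 + i_upper)


# Values from the example: n = 12, lag k = 1, P(A) = P(B) = 0.5, P(B|A) = 5/6.
z = z_uncertainty(5 / 6, 0.5, 0.5, n=12, k=1)
z_low, z_up = neutrosophic_z(z, 0.0, 0.1)
print(f"Z_L = {z_low:.2f}, Z_U = {z_up:.2f}")
```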

The proposed test for the example will be implemented as follows

Step-1: State the null hypothesis \({H}_0\): there is no reduction in uncertainty of Covid-19 vs. the alternative hypothesis \({H}_1\): there is a significant reduction in uncertainty of Covid-19.

Step-2: The value of the statistic is 2.42.

Step-3: Specify the level of significance \(\alpha =0.05\) and select the critical value from [ 2 ], which is 1.96.

Step-4: Do not accept the null hypothesis, as the value of \({Z}_N\) is larger than the critical value.

From the analysis, it can be seen that the calculated value of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is larger than the critical value of 1.96. Therefore, the null hypothesis \({H}_0\): there is no reduction in uncertainty of Covid-19 will be rejected in favor of \({H}_1\): there is a significant reduction in uncertainty of Covid-19. Based on the study, it is concluded that there is a significant reduction in the uncertainty of Covid-19.

Simulation study

In this section, a simulation study is performed to see the effect of the measure of indeterminacy on the statistic \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\). For this purpose, a neutrosophic form of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) obtained from the real data will be used. The neutrosophic form of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is given as

To analyze the effect on \({H}_0\), various values of \({I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right]\) are considered. The computed values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\), along with the decision on \({H}_0\), are reported in Table 1. For this study, \(\alpha =0.05\) and the critical value is 1.96. The null hypothesis \({H}_0\) will be accepted if the calculated value of \({Z}_N\) is less than 1.96. From Table 1, it can be seen that as the values of \({I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right]\) increase from 0.01 to 2, the values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) increase. Although the decision about \({H}_0\) remains the same at all values of the measure of indeterminacy, the difference between \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) and the critical value of 1.96 increases as \({I}_{ZU}\) increases. From the study, it can be concluded that the measure of indeterminacy \({I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right]\) affects the values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\).
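The kind of sweep described here can be sketched as follows, assuming the neutrosophic form \({Z}_N=2.20+2.20{I}_{ZN}\) quoted in the comparative study and \({I}_{ZL}=0\). The grid of \({I}_{ZU}\) values is hypothetical and is not claimed to match the rows of Table 1.

```python
CRITICAL_VALUE = 1.96
Z_DETERMINATE = 2.20  # determinate part quoted in the text for the Covid-19 example


def sweep(i_upper_values, z_det=Z_DETERMINATE, critical=CRITICAL_VALUE):
    """Return (I_ZU, Z_L, Z_U, Z_U - critical, decision) for each I_ZU."""
    rows = []
    for i_zu in i_upper_values:
        z_upper = z_det * (1 + i_zu)
        # With I_ZL = 0 the lower end stays at z_det, so the decision rests on it.
        decision = "reject H0" if z_det > critical else "do not reject H0"
        rows.append((i_zu, z_det, z_upper, z_upper - critical, decision))
    return rows


# Hypothetical grid spanning the range mentioned in the text (0.01 to 2).
for i_zu, z_l, z_u, gap, decision in sweep([0.01, 0.05, 0.1, 0.5, 1.0, 2.0]):
    print(f"I_ZU={i_zu:<4} Z_N in [{z_l:.2f}, {z_u:.2f}]  gap={gap:.2f}  {decision}")
```

The decision stays the same across the grid while the gap to 1.96 widens, which is the pattern the paragraph describes.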

Comparative studies

As mentioned earlier, the proposed Z-test for uncertainty events is an extension of several tests. In this section, a comparative study is presented in terms of the measure of indeterminacy, flexibility, and information. We compare the efficiency of the proposed Z-test for uncertainty with the existing Z-tests for uncertainty under classical statistics, under fuzzy logic, and under interval statistics. The neutrosophic form of the proposed statistic \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is expressed as \({Z}_N=2.20+2.20{I}_{ZN};{I}_{ZN}\epsilon \left[0,0.1\right]\). Note that the first part, 2.20, represents the existing Z-test for uncertainty under classical statistics, the second part, \(2.20{I}_{ZN}\), is the indeterminate part, and 0.1 is the measure of indeterminacy associated with the test. From the neutrosophic form, it can be seen that the proposed test is flexible, as it gives the values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) in an interval from 2.20 to 2.42 when \({I}_{ZU}=0.1\). On the other hand, the existing test gives the single value of 2.20. In addition, the proposed test uses information about the measure of indeterminacy that the existing test does not consider. Under the proposed test, the null hypothesis \({H}_0\): there is no reduction in uncertainty of Covid-19 is accepted with a probability of 0.95, the probability of committing a type-I error is 0.05, and the associated measure of indeterminacy is 0.10. Based on the analysis, it is concluded that the proposed test is more informative than the existing test. The proposed test is also better than the Z-test for uncertainty under fuzzy logic, as the fuzzy-logic test gives values of the statistic from 2.20 to 2.42 without any information about the measure of indeterminacy. The test under interval statistics considers only the values within the interval rather than the crisp value. On the other hand, the analysis based on neutrosophic statistics considers any type of set. Based on the analysis, it is concluded that the proposed Z-test is more efficient than the existing tests in terms of information, flexibility, and indeterminacy.

Comparison using power of the test

In this section, the efficiency of the proposed test is compared with that of the existing test in terms of the power of the test. The power of the test is defined as the probability of rejecting \({H}_0\) when it is false and is denoted by \(1-\beta\), where \(\beta\) is the probability of a type-II error. As mentioned earlier, the probability of rejecting \({H}_0\) when it is true is known as the type-I error and is denoted by \(\alpha\). The values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) are simulated using the classical standard normal distribution and the neutrosophic standard normal distribution. During the simulation, 100 values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) are generated from a classical standard normal distribution and from a neutrosophic standard normal distribution with mean \({\mu}_N={\mu}_L+{\mu}_U{I}_{\mu_N};{I}_{\mu_N}\epsilon \left[{I}_{\mu_L},{I}_{\mu_U}\right]\), where \({\mu}_L=0\) represents the mean of the classical standard normal distribution, \({\mu}_U{I}_{\mu_N}\) denotes the indeterminate part, and \({I}_{\mu_N}\epsilon \left[{I}_{\mu_L},{I}_{\mu_U}\right]\) is the measure of indeterminacy. Note that when \({I}_{\mu_L}=0\), \({\mu}_N\) reduces to \({\mu}_L\). The values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) are compared with the tabulated value at \(\alpha =0.05\). The values of the power of the test for the existing test and for the proposed test for various values of \({I}_{\mu_U}\) are shown in Table 2. From Table 2, it is clear that the existing test under classical statistics provides smaller values of the power of the test than the proposed test at all values of \({I}_{\mu_U}\). For example, when \({I}_{\mu_U}=0.1\), the power of the test provided by the Z-test for uncertainty events under classical statistics is 0.94 and the power of the test provided by the proposed Z-test for uncertainty events is 0.96. The values of the power of the test for the Z-test for uncertainty events under classical statistics and under neutrosophic statistics are plotted in Fig. 1. From Fig. 1, it is quite clear that the power curve of the proposed test is higher than the power curve of the existing test. Based on the analysis, it can be concluded that the proposed Z-test for uncertainty events under neutrosophic statistics is more efficient than the existing Z-test for uncertainty events.
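The general idea of estimating a rejection rate by simulation can be sketched as below. This is a minimal illustration under stated assumptions, not a reproduction of Table 2 or Fig. 1: the statistic is drawn from a normal distribution with unit variance, the mean is a free parameter `mean_shift`, rejection follows the Step-4 rule (statistic larger than 1.96), and the number of draws and the seed are arbitrary choices.

```python
import random


def estimate_rejection_rate(mean_shift: float, n_draws: int = 20000,
                            critical: float = 1.96, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Z > critical) when Z ~ Normal(mean_shift, 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rejections = sum(1 for _ in range(n_draws)
                     if rng.gauss(mean_shift, 1.0) > critical)
    return rejections / n_draws


def neutrosophic_rejection_rates(mu_lower: float, mu_upper: float, i_mu_upper: float):
    """Evaluate the rate at both ends of the mean interval mu_L + mu_U * [0, I_U]."""
    low = estimate_rejection_rate(mu_lower)
    high = estimate_rejection_rate(mu_lower + mu_upper * i_mu_upper)
    return low, high


# Hypothetical inputs: determinate mean 2.0, indeterminate part 2.0 * I with I_U = 0.1.
print(neutrosophic_rejection_rates(2.0, 2.0, 0.1))
```

A larger mean at the upper end of the interval yields a higher rejection rate, which is the direction of the effect reported in the text.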

Fig. 1 The power curves of the two tests

The Z-test of uncertainty was introduced under neutrosophic statistics in this paper. The proposed test was a generalization of the existing Z-test of uncertain events under classical statistics, fuzzy-based test, and interval statistics. The performance of the proposed test was compared with the listed existing tests. From the real data and simulation study, the proposed test was found to be more efficient in terms of information and power of the test. Based on the information, it is recommended to apply the proposed test to check the reduction in uncertainty under an indeterminate environment. The proposed test for big data can be considered as future research. The proposed test using double sampling can also be studied as future research. The estimation of sample size and other properties of the proposed test can be studied in future research.

Availability of data and materials

All data generated or analysed during this study are included in this published article

1. Doll H, Carney S. Statistical approaches to uncertainty: p values and confidence intervals unpacked. BMJ Evidence-Based Medicine. 2005;10(5):133–4.

2. Kanji GK. 100 statistical tests. Sage; 2006.

3. Lele SR. How should we quantify uncertainty in statistical inference? Front Ecol Evol. 2020;8:35.

4. Wang F, et al. Re-evaluation of the power of the Mann-Kendall test for detecting monotonic trends in hydrometeorological time series. Front Earth Sci. 2020;8:14.

5. Maghsoodloo S, Huang C-Y. Comparing the overlapping of two independent confidence intervals with a single confidence interval for two normal population parameters. J Stat Plan Inference. 2010;140(11):3295–305.

6. Rono BK, et al. Application of paired student t-test on impact of anti-retroviral therapy on CD4 cell count among HIV seroconverters in serodiscordant heterosexual relationships: a case study of Nyanza region, Kenya.

7. Zhou X-H. Inferences about population means of health care costs. Stat Methods Med Res. 2002;11(4):327–39.

8. Niwitpong S, Niwitpong S-a. Confidence interval for the difference of two normal population means with a known ratio of variances. Appl Math Sci. 2010;4(8):347–59.

9. Viertl R. Univariate statistical analysis with fuzzy data. Comput Stat Data Anal. 2006;51(1):133–47.

10. Filzmoser P, Viertl R. Testing hypotheses with fuzzy data: the fuzzy p-value. Metrika. 2004;59(1):21–9.

11. Tsai C-C, Chen C-C. Tests of quality characteristics of two populations using paired fuzzy sample differences. Int J Adv Manuf Technol. 2006;27(5):574–9.

12. Taheri SM, Arefi M. Testing fuzzy hypotheses based on fuzzy test statistic. Soft Comput. 2009;13(6):617–25.

13. Jamkhaneh EB, Ghara AN. Testing statistical hypotheses with fuzzy data. In: 2010 International Conference on Intelligent Computing and Cognitive Informatics: IEEE; 2010.

14. Chachi J, Taheri SM, Viertl R. Testing statistical hypotheses based on fuzzy confidence intervals. Austrian J Stat. 2012;41(4):267–86.

15. Kalpanapriya D, Pandian P. Statistical hypotheses testing with imprecise data. Appl Math Sci. 2012;6(106):5285–92.

16. Parthiban S, Gajivaradhan P. A comparative study of two-sample t-test under fuzzy environments using trapezoidal fuzzy numbers.

17. Montenegro M, et al. Two-sample hypothesis tests of means of a fuzzy random variable. Inf Sci. 2001;133(1-2):89–100.

18. Park S, Lee S-J, Jun S. Patent big data analysis using fuzzy learning. Int J Fuzzy Syst. 2017;19(4):1158–67.

19. Garg H, Arora R. Generalized Maclaurin symmetric mean aggregation operators based on Archimedean t-norm of the intuitionistic fuzzy soft set information. Artif Intell Rev. 2020:1–41.

20. Smarandache F. Neutrosophy. Neutrosophic probability, set, and logic, ProQuest Information & Learning, vol. 105. Michigan: Ann Arbor; 1998. p. 118–23.

21. Broumi S, Smarandache F. Correlation coefficient of interval neutrosophic set. In: Applied mechanics and materials: Trans Tech Publ; 2013.

22. Abdel-Basset M, et al. A novel group decision making model based on neutrosophic sets for heart disease diagnosis. Multimed Tools Appl. 2019:1–26.

23. Alhasan KFH, Smarandache F. Neutrosophic Weibull distribution and neutrosophic family Weibull distribution. Infinite Study; 2019.

24. Das SK, Edalatpanah S. A new ranking function of triangular neutrosophic number and its application in integer programming. Int J Neutrosophic Sci. 2020;4(2).

25. El Barbary OG, Abu Gdairi R. Neutrosophic logic-based document summarization. J Undergrad Math. 2021.

26. Smarandache F. Introduction to neutrosophic statistics. Infinite Study; 2014.

27. Chen J, Ye J, Du S. Scale effect and anisotropy analyzed for neutrosophic numbers of rock joint roughness coefficient based on neutrosophic statistics. Symmetry. 2017;9(10):208.

28. Chen J, et al. Expressions of rock joint roughness coefficient using neutrosophic interval statistical numbers. Symmetry. 2017;9(7):123.

29. Sherwani RAK, et al. A new neutrosophic sign test: an application to COVID-19 data. PLoS One. 2021;16(8):e0255671.

30. Aslam M. Neutrosophic statistical test for counts in climatology. Sci Rep. 2021;11(1):1–5.

31. Albassam M, Khan N, Aslam M. Neutrosophic D’Agostino test of normality: an application to water data. J Undergrad Math. 2021;2021.

Acknowledgements

We are thankful to the editor and reviewers for their valuable suggestions to improve the quality of the paper.

Author information

Authors and affiliations.

Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah, 21551, Saudi Arabia

Muhammad Aslam

Contributions

MA wrote the paper.

Corresponding author

Correspondence to Muhammad Aslam .

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Aslam, M. Design of a new Z-test for the uncertainty of Covid-19 events under Neutrosophic statistics. BMC Med Res Methodol 22, 99 (2022). https://doi.org/10.1186/s12874-022-01593-x

Received : 27 September 2021

Accepted : 31 March 2022

Published : 06 April 2022

DOI : https://doi.org/10.1186/s12874-022-01593-x

  • Uncertainty
  • Classical statistics

BMC Medical Research Methodology

ISSN: 1471-2288

The use of weighted Z-tests in medical research

Affiliation.

  • 1 Sanofi-aventis, Bridgewater, New Jersey, USA.
  • PMID: 16022168
  • DOI: 10.1081/BIP-200062284

Traditionally the un-weighted Z-tests, which follow the one-patient-one-vote principle, are standard for comparisons of treatment effects. We discuss two types of weighted Z-tests in this manuscript to incorporate data collected in two (or more) stages or in two (or more) regions. We use the type A weighted Z-test to exemplify the variance spending approach in the first part of this manuscript. This approach has been applied to sample size re-estimation. In the second part of the manuscript, we introduce the type B weighted Z-tests and apply them to the design of bridging studies. The weights in the type A weighted Z-tests are pre-determined, independent of the prior observed data, and controls alpha at the desired level. To the contrary, the weights in the type B weighted Z-tests may depend on the prior observed data; and the type I error rate for the bridging study is usually inflated to a level higher than that of a full-scale study. The choice of the weights provides a simple statistical framework for communication between the regulatory agency and the sponsor. The negotiation process may involve practical constrains and some characteristics of prior studies.

  • Open access
  • Published: 02 January 2024

Analysis of imprecise measurement data utilizing z-test for correlation

  • Muhammad Aslam 1  

Journal of Big Data volume 11, Article number: 4 (2024)

The conventional Z-test for correlation, grounded in classical statistics, is typically employed in situations devoid of vague information. However, real-world data often comes with inherent uncertainty, necessitating an adaptation of the Z-test using neutrosophic statistics. This paper introduces a modified Z-test for correlation designed to explore correlations in the presence of imprecise data. We will present the simulation to check the effect of the measure of indeterminacy on the evolution of type-I error and the power of the test. The application of this modification is illustrated through an examination of heartbeat and temperature data. Upon analyzing the heartbeat and temperature data, it is determined that, in the face of indeterminacy, the correlation between heartbeat and temperature emerges as significant. This highlights the importance of accounting for imprecise data when investigating relationships between variables.

Introduction

In medical science, correlation analysis is used to investigate the degree of dependence between two medical variables. Correlation analysis indicates the strength of the relationship between two variables. The Z-test of correlation is applied to investigate the significance of the correlation between two variables. For example, in medical science, decision-makers are interested in the significance of the relationship between blood pressure and diet. Based on correlation analysis, medical decision-makers can suggest a suitable medication. Therefore, the statistical test investigates the significance of the correlation between the two variables under study. The null hypothesis that the correlation between two variables is insignificant is tested versus the alternative hypothesis that the two variables are significantly correlated. Statistical tests have been widely used in medical science for decision-making. Gordon Lan et al. [ 13 ] introduced the weighted Z-tests and applied them in medical science. Bellolio et al. [ 7 ] provided a detailed discussion on the suitability of statistical tests for medical studies. Mukaka [ 19 ] discussed the suitability of correlation analysis for medical data. Pandey et al. [ 21 ] discussed the importance of t-tests in medical-related problems. Schober et al. [ 23 ] discussed correlation analysis for anesthesia data. Kc [ 17 ] wrote a review on the applications of statistical methods for medical science. Janse et al. [ 15 ] discussed the limitations of correlation analysis using medical data. More applications of statistical methods in the medical field can be seen in [ 10 , 18 , 29 , 32 ].

Statistical tests have been widely used for the analysis of measurement data. Grzesiek et al. [ 14 ] used a statistical test for the analysis of temperature data. Avuçlu [ 6 ] presented work on the detection of Covid-19 using statistical measurements. More applications of statistical analysis for measurement data can be seen in [ 22 , 27 , 31 ].

Neutrosophic statistics were developed by [ 26 ] using the idea of neutrosophic logic developed by [ 25 ], and their efficiency over fuzzy logic and interval analysis was shown by [ 9 ]. The applications of neutrosophic logic in medical science can be read in [ 8 , 30 ]. Neutrosophic statistics are used for the collection, analysis, and interpretation of imprecise and interval data. The efficiency of neutrosophic statistics over classical statistics was discussed by [ 5 , 11 , 12 ]. Later on, the applications of neutrosophic statistics in the field of medical science were given by [ 1 , 3 , 5 , 24 ].

The existing Z-test for correlation cannot be applied when the data is expressed in intervals or when uncertainty in parameters or level of significance is noted. To overcome this issue, in this paper, the Z-test for a single correlation coefficient using neutrosophic statistics will be presented. The test statistic for the proposed test will be developed and the application will be given using the heartbeat and body temperature data. It is expected that the proposed test will be efficient in investigating the significance of the correlation between variables expressed in intervals.

Let \({X}_{N}={X}_{L}+{X}_{U}{I}_{{X}_{N}};{I}_{{X}_{N}}\epsilon \left[{I}_{{X}_{L}},{I}_{{X}_{U}}\right]\) and \({Y}_{N}={Y}_{L}+{Y}_{U}{I}_{{Y}_{N}};{I}_{{Y}_{N}}\epsilon \left[{I}_{{Y}_{L}},{I}_{{Y}_{U}}\right]\) be two neutrosophic random variables of size \({n}_{N}={n}_{L}+{n}_{U}{I}_{{n}_{N}};{I}_{{n}_{N}}\epsilon \left[{I}_{{n}_{L}},{I}_{{n}_{U}}\right]\) follow the neutrosophic normal distribution with the neutrosophic means \({\mu }_{XN}={\mu }_{XL}+{\mu }_{XU}{I}_{{\mu }_{XN}};{I}_{{\mu }_{XN}}\epsilon \left[{I}_{{\mu }_{XL}},{I}_{{\mu }_{XU}}\right]\) and \({\mu }_{YN}={\mu }_{YL}+{\mu }_{YU}{I}_{{\mu }_{YN}};{I}_{{\mu }_{YN}}\epsilon \left[{I}_{{\mu }_{YL}},{I}_{{\mu }_{YU}}\right]\) , and neutrosophic standard deviation \({\sigma }_{XN}={\sigma }_{XL}+{\sigma }_{XU}{I}_{{\sigma }_{XN}};{I}_{{\sigma }_{XN}}\epsilon \left[{I}_{{\sigma }_{XL}},{I}_{{\sigma }_{XU}}\right]\) and \({\sigma }_{YN}={\sigma }_{YL}+{\sigma }_{YU}{I}_{{\sigma }_{YN}};{I}_{{\sigma }_{YN}}\epsilon \left[{I}_{{\sigma }_{YL}},{I}_{{\sigma }_{YU}}\right]\) , respectively. Note that \({X}_{L},{Y}_{L},{n}_{L},{\mu }_{XL}\) , \({\mu }_{YL}\) are the determinate parts denote the classical statistics, \({X}_{U}{I}_{{X}_{N}},{Y}_{U}{I}_{{Y}_{N}},{I}_{{n}_{N}}\epsilon \left[{I}_{{n}_{L}},{I}_{{n}_{U}}\right], {n}_{U}{I}_{{n}_{N}},{\mu }_{XU}{I}_{{\mu }_{XN}},{\mu }_{YU}{I}_{{\mu }_{YN}}\) are indeterminate parts and \({I}_{{X}_{N}}\epsilon \left[{I}_{{X}_{L}},{I}_{{X}_{U}}\right],{I}_{{Y}_{N}}\epsilon \left[{I}_{{Y}_{L}},{I}_{{Y}_{U}}\right],{I}_{{\mu }_{XN}}\epsilon \left[{I}_{{\mu }_{XL}},{I}_{{\mu }_{XU}}\right],{I}_{{\mu }_{YN}}\epsilon \left[{I}_{{\mu }_{YL}},{I}_{{\mu }_{YU}}\right]\) are measures of indeterminacy. 
To design the proposed Z-test of a correlation coefficient, it is assumed that the variance in \({X}_{N}\) is independent of the variance in \({Y}_{N}\) . Suppose that \({r}_{N}={r}_{L}+{r}_{U}{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[{I}_{{r}_{L}},{I}_{{r}_{U}}\right]\) is the neutrosophic correlation between \({X}_{N}\) and \({Y}_{N}\) , which, following [ 2 ], is given by

where \({\overline{X} }_{L}\) and \({\overline{X} }_{U}\) are the lower and upper values of the neutrosophic sample average. To investigate whether the correlation coefficient \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) differs significantly from the specified correlation \({r}_{0N}\epsilon \left[{r}_{0L},{r}_{0U}\right]\) , the null hypothesis that the correlation coefficient \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) is at least \({r}_{0N}\epsilon \left[{r}_{0L},{r}_{0U}\right]\) is tested against the alternative hypothesis that the correlation coefficient \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) is less than \({r}_{0N}\epsilon \left[{r}_{0L},{r}_{0U}\right]\) . Using Fisher's transformation, the value of the quantity \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is calculated as

\({Z}_{1N}=\frac{1}{2}\mathrm{ln}\left(\frac{1+{r}_{N}}{1-{r}_{N}}\right);{r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\)

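The quantity \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) can be written in neutrosophic form as

\({Z}_{1N}={Z}_{1L}+{Z}_{1U}{I}_{{Z}_{1N}};{I}_{{Z}_{1N}}\epsilon \left[{I}_{{Z}_{1L}},{I}_{{Z}_{1U}}\right]\)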

Note that \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) follows the neutrosophic normal distribution. The mean, say \({\mu }_{{Z}_{1N}}\) , of \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is given by

\({\mu }_{{Z}_{1N}}=\frac{1}{2}\mathrm{ln}\left(\frac{1+{\rho }_{0N}}{1-{\rho }_{0N}}\right)\)

where \({\rho }_{0N}\) represents the specified value of the correlation coefficient.

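The standard deviation, say \({\sigma }_{{Z}_{1N}}\) , of \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is given by

\({\sigma }_{{Z}_{1N}}=\frac{1}{\sqrt{{n}_{N}-3}};{n}_{N}\epsilon \left[{n}_{L},{n}_{U}\right]\)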

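The neutrosophic test statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) is defined as

\({Z}_{N}=\frac{{Z}_{1N}-{\mu }_{{Z}_{1N}}}{{\sigma }_{{Z}_{1N}}};{Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\)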

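The statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) can be written in neutrosophic form as

\({Z}_{N}={Z}_{L}+{Z}_{U}{I}_{{Z}_{N}};{I}_{{Z}_{N}}\epsilon \left[{I}_{{Z}_{L}},{I}_{{Z}_{U}}\right]\)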

Note that the proposed statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) is an extension of the existing \(Z\) -test. The proposed \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) reduces to the existing \(Z\) -test under classical statistics when \({I}_{{Z}_{L}}\) =0.
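
The computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: it assumes the standard Fisher-transformation forms (transformed correlation \(\frac{1}{2}\mathrm{ln}\left(\frac{1+r}{1-r}\right)\), mean obtained by applying the same transformation to the specified correlation, and standard deviation \(1/\sqrt{n-3}\)) applied separately to the lower and upper correlation values. The function names and the sample size n = 70 are hypothetical; n = 70 is chosen only because it gives a standard deviation close to the 0.1222 reported in the Application section.

```python
import math


def fisher_z(r):
    """Fisher's transformation of a correlation coefficient r, with -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def neutrosophic_z(r_low, r_up, rho0, n):
    """Return (Z_L, Z_U) for the correlation interval [r_low, r_up].

    rho0 is the specified correlation under the null hypothesis and n is the
    sample size; both bounds use the same mean and standard deviation.
    """
    mu = fisher_z(rho0)              # mean of the transformed correlation
    sigma = 1 / math.sqrt(n - 3)     # standard deviation of the transformed correlation
    z_low = (fisher_z(r_low) - mu) / sigma
    z_up = (fisher_z(r_up) - mu) / sigma
    return z_low, z_up


# Correlation interval and specified correlation taken from the Application
# section; n = 70 is an assumption (see the note above).
z_low, z_up = neutrosophic_z(0.2024, 0.4333, rho0=0.50, n=70)
print(abs(z_low), abs(z_up))
```

Under these assumptions the absolute values come out close to the 2.8162 and 0.6981 reported in the Application section, which suggests that the assumed forms match the ones used there.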

Simulation studies

In this section, we will simulate the impact of the measure of indeterminacy on the type-I error, denoted by \(\alpha\) , which represents the probability of rejecting the null hypothesis when it is true. Additionally, we will investigate the way in which indeterminacy affects the power of the test \((1-\beta )\) , with \(\beta\) representing the probability of failing to reject the null hypothesis when it is false. Following the approach outlined by [ 28 ], we define the neutrosophic variable \({\rho }_{0N}\) as \({\rho }_{0L}+{\rho }_{0U}{I}_{N}\) , where \({\rho }_{0L}\) represents the determinate value of the correlation coefficient, \({\rho }_{0U}{I}_{N}\) is the indeterminate part, and \({I}_{N}\) belongs to the interval \(\left[{I}_{L},{I}_{U}\right]\) , representing the degree of indeterminacy. Our analysis will initially focus on the impact of \({I}_{N}\) on the type-I error and subsequently examine its effect on the power of the test.

Effect of \({{\varvec{I}}}_{{\varvec{N}}}\) on type-I error

We will examine the impact of the measure of indeterminacy on the type-I error through a simulation with \({10}^{6}\) replicates. Following the approach outlined in [ 20 ], the type-I error is computed as the ratio of the number of rejections of the null hypothesis to the total number of replicates. The type-I error values for both the classical statistics test and the proposed test, across various values of \({I}_{N}\) , are depicted in Fig.  1 . The lower curve in Fig.  1 represents the type-I error for the test using classical statistics, while the upper curve displays the values for the proposed test. Figure  1 shows that, for the classical statistics test, the type-I error remains constant across all levels of indeterminacy. Conversely, the upper curve indicates an increase in the type-I error as the values of \({I}_{N}\) rise. This suggests a significant effect of the measure of indeterminacy on the evaluation of the type-I error, cautioning decision-makers to exercise care when making decisions regarding hypothesis testing in the presence of uncertainty.

Fig. 1 Graphs illustrating the type-I error for both tests
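
The source does not give the simulation design in detail, so the following Python sketch shows only one hypothetical way such an experiment could be set up. It assumes that the Fisher-transformed correlation is drawn from a normal distribution centred on the determinate value \({\rho }_{0L}\) , that the test is centred on \({\rho }_{0N}={\rho }_{0L}+{\rho }_{0U}{I}_{N}\) with \({\rho }_{0U}={\rho }_{0L}\) , and that a left-tailed critical value of 1.645 is used. The function name, the number of replicates, the sample size and the seed are all illustrative choices.

```python
import math
import random


def type1_error_rate(rho0_low, indeterminacy, n, reps=20000, z_crit=1.645, seed=1):
    """Estimate the rejection rate of a left-tailed test by Monte Carlo.

    Assumption: the data follow the determinate value rho0_low, while the test
    uses rho0_low + rho0_low * indeterminacy as the hypothesized correlation.
    """
    rng = random.Random(seed)
    sigma = 1 / math.sqrt(n - 3)
    true_mean = math.atanh(rho0_low)                      # Fisher transform of the true value
    test_mean = math.atanh(rho0_low * (1 + indeterminacy))  # Fisher transform of rho_0N
    rejections = 0
    for _ in range(reps):
        z1 = rng.gauss(true_mean, sigma)                  # simulated transformed correlation
        z = (z1 - test_mean) / sigma
        if z < -z_crit:
            rejections += 1
    return rejections / reps


rates = {i: type1_error_rate(0.50, i, n=70) for i in (0.0, 0.05, 0.10)}
for i, rate in rates.items():
    print(f"I_N = {i:.2f}: estimated rejection rate = {rate:.3f}")
```

With no indeterminacy the estimated rate stays near the nominal 0.05, and under these assumptions it grows as \({I}_{N}\) increases, which is the qualitative pattern described for Fig.  1 .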

Effect of \({{\varvec{I}}}_{{\varvec{N}}}\) on type-II error

We will assess how the degree of indeterminacy influences the test's efficacy through a simulation conducted a million times. Following the methodology outlined by [ 20 ], the type-II error is calculated as the ratio of incorrect decisions to the total number of replicates. Table 1 presents the type-II error values for both the conventional statistical test and the proposed test across various levels of \({I}_{N}\) . Figure  2 illustrates the trends in test power. In Fig.  2 , the lower curve represents the power of the test using classical statistics, while the upper curve depicts the power of the test for the proposed method. Figure  2 reveals that, for the classical statistics test, the power remains consistent regardless of the level of indeterminacy. In contrast, the higher curve indicates a decline in test power as \({I}_{N}\) values increase. This implies a significant impact of the measure of indeterminacy on test performance. This study suggests that, unlike the classical statistics test, the proposed test's power is affected by the degree of indeterminacy. Consequently, it is concluded that relying on the existing test under classical statistics may lead decision-makers astray when making decisions in the presence of uncertainty.

Fig. 2 Power of the test curves

Application

This section presents the application of the proposed \(Z\) -test for correlation using heartbeat (HBT) and temperature (TMP) data. Medical decision-makers are interested in investigating the relationship between HBT and TMP. The Primary and Secondary Healthcare Department is responsible for the delivery of essential and effective health services in the province of Punjab, Pakistan. The Punjab Health Facilities Management Company (PHFMC) provides the required services on behalf of the health department. A Basic Health Unit (BHU) is the first-level health care unit, supervised by qualified doctors, and usually covers a population of around 10,000 to 25,000. The data consist of three months (June 2021 to August 2021) of daily records of patients who visited a BHU reporting gastritis (authenticated and reported by a qualified medical doctor). The minimum and maximum values among the patients who visited in a day are recorded, and the data for the two variables are arranged in intervals. The schematic diagram is shown in Fig.  3 . The interval data of HBT and TMP are reported in Table  2 . Because the data in Table  2 are expressed in intervals, medical decision-makers cannot apply the existing Z-test to investigate the significance of the correlation between HBT and TMP. The proposed Z-test for correlation is therefore suitable for analyzing the HBT and TMP data.

Fig. 3 The schematic diagram

The proposed Z-test for correlation using the HBT and TMP data is carried out as follows. The neutrosophic correlation \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) is calculated as \({r}_{N}\epsilon \left[\mathrm{0.2024,0.4333}\right]\) and expressed in neutrosophic form as \({r}_{N}=0.2024+0.4333{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[\mathrm{0,0.5329}\right]\) . The quantity \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is calculated as \({Z}_{1N}\epsilon \left[\mathrm{0.2052,0.4640}\right]\) . The mean and standard deviation are calculated as \({\mu }_{{Z}_{1N}}\epsilon \left[\mathrm{0.5493,0.5493}\right]\) and \({\sigma }_{{Z}_{1N}}\epsilon \left[\mathrm{0.1222,0.1222}\right]\) , respectively. The proposed test statistic \(\left|{Z}_{N}\epsilon \left[{Z}_{L},{Z}_{U}\right]\right|\) is calculated as \({Z}_{N}=2.8162-0.6981{I}_{{Z}_{N}};{I}_{{Z}_{N}}\epsilon \left[\mathrm{0,3.0341}\right]\) . Suppose that the level of significance is \(\alpha\)  = 0.05. The proposed test for investigating the relationship between HBT and TMP is implemented in the following steps.

Step 1: State null hypothesis \({H}_{0N}:\) correlation between HBT and TMP is \({r}_{0N}=0.50\) vs. the alternative hypothesis \({H}_{1N}:\) correlation between HBT and TMP is less than \({r}_{0N}=0.50\) .

Step 2: For the one-sided test at \(\alpha\) =0.05, the tabulated value from [ 16 ] is 1.64.

Step 3: Compare \({Z}_{N}\epsilon [\mathrm{2.8162,0.6981}]\) with 1.64 and reject \({H}_{0N}\) if \({Z}_{N}\epsilon \left[\mathrm{2.8162,0.6981}\right]>1.64\) .

By comparing the values of \({Z}_{N}\epsilon \left[\mathrm{2.8162,0.6981}\right]\) with 1.64, it is clear that the lower value \({Z}_{L}\) is larger than 1.64, so the null hypothesis \({H}_{0N}\) will be rejected in favor of \({H}_{1N}\) . On the other hand, the upper value \({Z}_{U}\) is smaller than 1.64, which leads to the rejection of \({H}_{1N}\) . From the analysis, it is clear that the determinate part, which represents the statistic under classical statistics, indicates that the correlation between HBT and TMP is less than 0.50. On the other hand, the indeterminate part shows that the correlation between HBT and TMP is 0.50. Under uncertainty, it is expected that there will be a significant correlation between HBT and TMP.
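
The neutrosophic forms and the two decisions reported above can be checked with a short Python sketch. The formula used for the upper bound of the indeterminacy measure, (upper value − lower value) / upper value, is an assumption inferred from the reported numbers rather than a definition stated in this section, and the function names are hypothetical.

```python
def indeterminacy_upper(lower, upper):
    """Assumed upper bound of the indeterminacy measure: (upper - lower) / upper."""
    return (upper - lower) / upper


def decide(z_abs, critical=1.64):
    """Compare an absolute statistic with the tabulated value used in Step 2."""
    return "reject H0" if z_abs > critical else "do not reject H0"


# Interval values reported in the text.
r_low, r_up = 0.2024, 0.4333
z_low, z_up = 2.8162, 0.6981

i_r = indeterminacy_upper(r_low, r_up)
# For the statistic the interval runs from the larger to the smaller value,
# so the absolute value of the measure is taken.
i_z = abs(indeterminacy_upper(z_low, z_up))

print("indeterminacy of r_N:", round(i_r, 4))
print("indeterminacy of Z_N:", round(i_z, 4))
print("determinate part:", decide(z_low))
print("indeterminate part:", decide(z_up))
```

Under this assumed formula the two measures agree with the reported 0.5329 and 3.0341, and the two comparisons reproduce the split decision described in the text.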

Comparative studies based on HBT and TMP data

Based on the analysis of HBT and TMP data, the comparisons of the proposed Z-test for correlation are carried out with the existing Z-test for correlation using the classical statistics, fuzzy-based test and interval-statistics in terms of information and flexibility of the results. The neutrosophic forms of the correlation \({r}_{N}\) and the test statistic \(\left|{Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\right|\) are presented as \({r}_{N}=0.2024+0.4333{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[\mathrm{0,0.5329}\right]\) and \({Z}_{N}=2.8162-0.6981{I}_{{Z}_{N}};{I}_{{Z}_{N}}\epsilon \left[\mathrm{0,3.0341}\right]\) , respectively. From the results, it can be analyzed that under indeterminacy, the correlation between HBT and TMP may vary from 0.2024 to 0.4333. The values of statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) may vary from \(2.8162\) to \(0.6981\) . Note that as mentioned before the first values 0.2024, and \(2.8162\) present the results of the Z-test for correlation using classical statistics. The statistical test using classical statistics states that the probability of rejecting \({H}_{0N}:\) the correlation between HBT and TMP is \({r}_{0N}=0.50\) when it is true is 0.05 and the probability of accepting \({H}_{0N}:\) a correlation between HBT and TMP is \({r}_{0N}=0.50\) is 0.95. On the other hand, the proposed test for correlation states that the probability of rejecting \({H}_{0N}:\) the correlation between HBT and TMP is \({r}_{0N}=0.50\) when it is true is 0.05, the probability of accepting \({H}_{0N}:\) correlation between HBT and TMP is \({r}_{0N}=0.50\) is 0.95 and the measure of indeterminacy/uncertainty associated with the decision is \(3.0341\) . Similarly, the Z-test using fuzzy-logic gives the information about the statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) in intervals only. 
According to the fuzzy-based statistical test, it can be expected that the values of \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) may vary from \(2.8162\) to \(0.6981\) . The fuzzy-based analysis and interval-analysis only give information in intervals and are unable to give any information about the measure of indeterminacy. From the comparative studies, it is concluded that the proposed Z-test for correlation is more informative than the test using classical statistics, fuzzy-based analysis and interval-based analysis.

Discussions based on HBT and TMP data

The main aim of the paper is to investigate the significance of the relationship between HBT and TMP. The neutrosophic form of the correlation analysis of HBT and TMP is \({r}_{N}=0.2024+0.4333{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[\mathrm{0,0.5329}\right]\) . The correlation analysis shows that the correlation between HBT and TMP may vary from 0.2024 to 0.4333. As mentioned earlier, the correlation value 0.2024 denotes the correlation using classical statistics. The value \(0.4333{I}_{{r}_{N}}\) denotes the correlation related to the indeterminate part. From the correlation of the determinate part, 0.2024, it can be seen that there is a weak correlation between HBT and TMP. It means that an increase in TMP does not increase HBT significantly. Under indeterminacy, the correlation of the indeterminate part is 0.4333. This means that there is a moderate correlation between HBT and TMP, so an increase in TMP may increase HBT. From the correlation analysis, it can be seen that although the correlation of the determinate part is insignificant, an increase in the measure of indeterminacy can increase the correlation between HBT and TMP. Therefore, decision-makers should be careful in dealing with patients with diseases related to HBT and TMP.

Concluding remarks

The paper discussed the adaptation of the Z-test of correlation through the application of neutrosophic statistics. It provided an explanation of the rationale behind employing the proposed Z-test and detailed the neutrosophic test statistic along with the corresponding implementation steps. The simulation study conducted in the paper led to the conclusion that there is a notable impact of indeterminacy on both the type-I error and the power of the test. The paper demonstrated the application of the proposed test using data from HBT and TMP intervals. The findings revealed that, in the presence of indeterminacy or when dealing with interval data, the correlation between HBT and TMP increases as the measure of indeterminacy rises. The analysis suggests that decision-makers can effectively use the proposed test to explore correlations between variables in diverse fields such as medical science, business, and industry. Additionally, the paper suggested avenues for future research, including the exploration of the proposed test using a resampling scheme. It also recommended further investigation into additional statistical properties as potential areas for future research.

Availability of data and materials

The data is given in the paper.

Aslam M. A new method to analyze rock joint roughness coefficient based on neutrosophic statistics. Measurement. 2019;146:65–71.

Aslam M, Albassam M. Application of neutrosophic logic to evaluate correlation between prostate cancer mortality and dietary fat assumption. Symmetry. 2019;11(3):330.

Aslam M, Arif OH, Sherwani RAK. New diagnosis test under the neutrosophic statistics: an application to diabetic patients. BioMed Res Int. 2020. https://doi.org/10.1155/2020/2086185 .

Aslam M. Chi-square test under indeterminacy: an application using pulse count data. BMC Med Res Methodol. 2021;21:1–5.

Aslam M. Assessing the significance of relationship between metrology variables under indeterminacy. Mapan. 2021;37:119–24.

Avuçlu E. COVID-19 detection using X-ray images and statistical measurements. Measurement. 2022;201: 111702.

Bellolio MF, Serrano LA, Stead LG. Understanding statistical tests in the medical literature: which test should I use? Int J Emerg Med. 2008;1(3):197–9.

Broumi S, Deli I. Correlation measure for neutrosophic refined sets and its application in medical diagnosis. Infinite Study; 2015.

Broumi S, Smarandache F. Correlation coefficient of interval neutrosophic set. Appl Mech Mater. 2013;436:511–7.

Chen J, Talha M. Audit data analysis and application based on correlation analysis algorithm. Comput Math Methods Med. 2021;2021:1–11.

Chen J, Ye J, Du S. Scale effect and anisotropy analyzed for neutrosophic numbers of rock joint roughness coefficient based on neutrosophic statistics. Symmetry. 2017;9(10):208.

Chen J, Ye J, Du S, Yong R. Expressions of rock joint roughness coefficient using neutrosophic interval statistical numbers. Symmetry. 2017;9(7):123.

Gordon Lan K, Soo Y, Siu C, Wang M. The use of weighted Z-tests in medical research. J Biopharm Stat. 2005;15:625–39.

Grzesiek A, Zimroz R, Śliwiński P, Gomolla N, Wyłomańska A. Long term belt conveyor gearbox temperature data analysis–statistical tests for anomaly detection. Measurement. 2020;165: 108124.

Janse RJ, Hoekstra T, Jager KJ, Zoccali C, Tripepi G, Dekker FW, van Diepen M. Conducting correlation analysis: important limitations and pitfalls. Clin Kidney J. 2021;14:2332–7.

Kanji GK. 100 statistical tests. Sage; 2006.

Kc B. A note on the application of advanced statistical methods in medical research. Biomed J Sci Tech Res. 2018;11(2):8476–9.

Lin L, Wu F, Chen W, Zhu C, Huang T. Research on urban medical and health services efficiency and its spatial correlation in china: based on panel data of 13 cities in Jiangsu Province. Paper presented at the Healthcare; 2021.

Mukaka MM. A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69–71.

Nosakhare UH, Bright AF. Statistical analysis of strength of W/S test of normality against non-normal distribution using Monte Carlo simulation. Am J Theor Appl Stat. 2017;6(5–1):62–5.

Pandey R. Commonly used t-tests in medical research. J Pract Cardiovasc Sci. 2015;1(2):185.

Rivas T, Pozo-Antonio J, Barral D, Martínez J, Cardell C. Statistical analysis of colour changes in tempera paints mock-ups exposed to urban and marine environment. Measurement. 2018;118:298–310.

Schober P, Boer C, Schwarte LA. Correlation coefficients: appropriate use and interpretation. Anesth Analg. 2018;126(5):1763–8.

Sherwani RAK, Shakeel H, Saleem M, Awan WB, Aslam M, Farooq M. A new neutrosophic sign test: an application to COVID-19 data. PLoS ONE. 2021;16(8): e0255671.

Smarandache F. Neutrosophy Neutrosophic probability, set, and logic, proQuest information & learning, vol. 105. Infinty Study; 1998. p. 118–23.

Smarandache F. Introduction to neutrosophic statistics. Infinite Study; 2014.

Sofińska K, Cieśla M, Barbasz J, Wilkosz N, Lipiec E, Szymoński M, Białas P. Double-strand breaks quantification by statistical length analysis of DNA fragments imaged with AFM. Measurement. 2022;198: 111362.

Wu B, Chang S-K. On testing hypothesis of fuzzy sample mean. Jpn J Ind Appl Math. 2007;24:197–209.

Yazici AC, Öğüş E, Ankarali H, Gürbüz F. An application of nonlinear canonical correlation analysis on medical data. Turk J Med Sci. 2010;40(3):503–10.

Zhang D, Zhao M, Wei G, Chen X. Single-valued neutrosophic TODIM method based on cumulative prospect theory for multi-attribute group decision making and its application to medical emergency management evaluation. Econ Res Ekonomska Istraživanja. 2021;35:4520–36.

Zhang Y, Chen Z, Zhu Z, Wang X. A sampling method for blade measurement based on statistical analysis of profile deviations. Measurement. 2020;163: 107949.

Zhuang X, Yang Z, Cordes D. A technical review of canonical correlation analysis for neuroscience applications. Hum Brain Map. 2020;41(13):3807–33.

Acknowledgements

We express our gratitude to the editor and reviewers for their valuable suggestions, which have significantly contributed to enhancing the quality and presentation of the paper.

Author information

Authors and affiliations.

Department of Statistics, Faculty of Science, King Abdulaziz University, 21551, Jeddah, Saudi Arabia

Muhammad Aslam

Contributions

MA wrote the paper.

Corresponding author

Correspondence to Muhammad Aslam .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

No conflict of interest regarding the paper.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Aslam, M. Analysis of imprecise measurement data utilizing z-test for correlation. J Big Data 11 , 4 (2024). https://doi.org/10.1186/s40537-023-00873-7

Received : 14 February 2023

Accepted : 20 December 2023

Published : 02 January 2024

DOI : https://doi.org/10.1186/s40537-023-00873-7

  • Correlation
  • Medical data
  • Imprecise data

Statistics By Jim

Making statistics intuitive

Z Test: Uses, Formula & Examples

By Jim Frost

What is a Z Test?

Use a Z test when you need to compare group means. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ.

A Z test is a form of inferential statistics . It uses samples to draw conclusions about populations.

For example, use Z tests to assess the following:

  • One sample : Do students in an honors program have an average IQ score different than a hypothesized value of 100?
  • Two sample : Do two IQ boosting programs have different mean scores?

In this post, learn about when to use a Z test vs T test. Then we’ll review the Z test’s hypotheses, assumptions, interpretation, and formula. Finally, we’ll use the formula in a worked example.

Related post : Difference between Descriptive and Inferential Statistics

Z test vs T test

Z tests and t tests are similar. They both assess the means of one or two groups, have similar assumptions, and allow you to draw the same conclusions about population means.

However, there is one critical difference.

Z tests require you to know the population standard deviation, while t tests use a sample estimate of the standard deviation. Learn more about Population Parameters vs. Sample Statistics .

In practice, analysts rarely use Z tests because it’s rare that they’ll know the population standard deviation. It’s even rarer that they’ll know it and yet need to assess an unknown population mean!

A Z test is often the first hypothesis test students learn because its results are easier to calculate by hand and it builds on the standard normal distribution that they probably already understand. Additionally, students don’t need to know about the degrees of freedom .

Z and T test results converge as the sample size approaches infinity. Indeed, for sample sizes greater than 30, the differences between the two analyses become small.

William Sealy Gosset developed the t test specifically to account for the additional uncertainty associated with smaller samples. Conversely, Z tests are too sensitive to mean differences in smaller samples and can produce statistically significant results incorrectly (i.e., false positives).

When to use a T Test vs Z Test

Let’s put a button on it.

When you know the population standard deviation, use a Z test.

When you have a sample estimate of the standard deviation, which will be the vast majority of the time, the best statistical practice is to use a t test regardless of the sample size.

However, the difference between the two analyses becomes trivial when the sample size exceeds 30.

Learn more about a T-Test Overview: How to Use & Examples and How T-Tests Work .

Z Test Hypotheses

This analysis uses sample data to evaluate hypotheses that refer to population means (µ). The hypotheses depend on whether you’re assessing one or two samples.

One-Sample Z Test Hypotheses

  • Null hypothesis (H 0 ): The population mean equals a hypothesized value (µ = µ 0 ).
  • Alternative hypothesis (H A ): The population mean DOES NOT equal a hypothesized value (µ ≠ µ 0 ).

When the p-value is less than or equal to your significance level (e.g., 0.05), reject the null hypothesis. The difference between your sample mean and the hypothesized value is statistically significant. Your sample data support the notion that the population mean does not equal the hypothesized value.

Related posts : Null Hypothesis: Definition, Rejecting & Examples and Understanding Significance Levels

Two-Sample Z Test Hypotheses

  • Null hypothesis (H 0 ): Two population means are equal (µ 1 = µ 2 ).
  • Alternative hypothesis (H A ): Two population means are not equal (µ 1 ≠ µ 2 ).

Again, when the p-value is less than or equal to your significance level, reject the null hypothesis. The difference between the two means is statistically significant. Your sample data support the idea that the two population means are different.

These hypotheses are for two-sided analyses. You can use one-sided, directional hypotheses instead. Learn more in my post, One-Tailed and Two-Tailed Hypothesis Tests Explained .

Related posts : How to Interpret P Values and Statistical Significance

Z Test Assumptions

For reliable results, your data should satisfy the following assumptions:

You have a random sample

Drawing a random sample from your target population helps ensure that the sample represents the population. Representative samples are crucial for accurately inferring population properties. The Z test results won’t be valid if your data do not reflect the population.

Related posts : Random Sampling and Representative Samples

Continuous data

Z tests require continuous data . Continuous variables can assume any numeric value, and the scale can be divided meaningfully into smaller increments, such as fractional and decimal values. For example, weight, height, and temperature are continuous.

Other analyses can assess additional data types. For more information, read Comparing Hypothesis Tests for Continuous, Binary, and Count Data .

Your sample data follow a normal distribution, or you have a large sample size

All Z tests assume your data follow a normal distribution . However, due to the central limit theorem, you can ignore this assumption when your sample is large enough.

The following sample size guidelines indicate when normality becomes less of a concern:

  • One-Sample : 20 or more observations.
  • Two-Sample : At least 15 in each group.

Related posts : Central Limit Theorem and Skewed Distributions

Independent samples

For the two-sample analysis, the groups must contain different sets of items. This analysis compares two distinct samples.

Related post : Independent and Dependent Samples

Population standard deviation is known

As I mention in the Z test vs T test section, use a Z test when you know the population standard deviation. However, when n > 30, the difference between the analyses becomes trivial.

Related post : Standard Deviations

Z Test Formula

These Z test formulas allow you to calculate the test statistic. Use the Z statistic to determine statistical significance by comparing it to the appropriate critical values and use it to find p-values.

The correct formula depends on whether you’re performing a one- or two-sample analysis. Both formulas require sample means (x̅) and sample sizes (n) from your sample. Additionally, you specify the population standard deviation (σ) or variance (σ 2 ), which does not come from your sample.

I present a worked example using the Z test formula at the end of this post.

Learn more about Z-Scores and Test Statistics .

One Sample Z Test Formula

One sample Z test formula: Z = (x̅ − µ0) / (σ / √n)

The one sample Z test formula is a ratio.

The numerator is the difference between your sample mean and a hypothesized value for the population mean (µ 0 ). This value is often a strawman argument that you hope to disprove.

The denominator is the standard error of the mean. It represents the uncertainty in how well the sample mean estimates the population mean.

Learn more about the Standard Error of the Mean .

Two Sample Z Test Formula

Two sample Z test formula: Z = (x̅1 − x̅2) / √(σ1²/n1 + σ2²/n2)

The two sample Z test formula is also a ratio.

The numerator is the difference between your two sample means.

The denominator calculates the pooled standard error of the mean by combining both samples. In this Z test formula, enter the population variances (σ 2 ) for each sample.
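
If you'd rather let a computer do the arithmetic, here is a rough Python sketch of the two-sample calculation described above. The function name and the example numbers are made up for illustration; the only thing it assumes is the ratio just described, with the difference between the sample means on top and the square root of the summed variance-over-sample-size terms on the bottom.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Two-sample Z statistic and two-tailed p-value for known population variances."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value


# Hypothetical example: two programs with known population variance 225 (σ = 15).
z, p = two_sample_z(mean1=105, mean2=100, var1=225, var2=225, n1=50, n2=50)
print(f"Z = {z:.3f}, p = {p:.4f}")
```

In this made-up example the standard error is exactly 3, so the Z statistic is 5 / 3, which falls short of the two-tailed critical value of 1.960 at the 0.05 level.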

Z Test Critical Values

As I mentioned in the Z vs T test section, a Z test does not use degrees of freedom. It evaluates Z-scores in the context of the standard normal distribution. Unlike the t-distribution , the standard normal distribution doesn’t change shape as the sample size changes. Consequently, the critical values don’t change with the sample size.

To find the critical value for a Z test, you need to know the significance level and whether it is one- or two-tailed.

Significance Level    Type          Critical Value(s)
0.01                  Two-Tailed    ±2.576
0.01                  Left Tail     –2.326
0.01                  Right Tail    +2.326
0.05                  Two-Tailed    ±1.960
0.05                  Left Tail     –1.645
0.05                  Right Tail    +1.645

Learn more about Critical Values: Definition, Finding & Calculator .
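
You don't have to memorize these critical values. As a small sketch, Python's standard library can look them up from the standard normal distribution. The helper name `z_critical` and its `tail` argument are my own inventions for this example, not part of any library.

```python
from statistics import NormalDist


def z_critical(alpha, tail="two"):
    """Critical value of the standard normal distribution.

    For tail="two" the returned value is positive and is used as ±value.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left', or 'right'")


for alpha in (0.01, 0.05):
    for tail in ("two", "left", "right"):
        print(alpha, tail, round(z_critical(alpha, tail), 3))
```

Notice that the sample size never appears as an input, which is exactly the point made above: the critical values depend only on the significance level and the number of tails.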

Z Test Worked Example

Let’s close this post by calculating the results for a Z test by hand!

Suppose we randomly sampled subjects from an honors program. We want to determine whether their mean IQ score differs from the general population. The general population’s IQ scores are defined as having a mean of 100 and a standard deviation of 15.

We’ll determine whether the difference between our sample mean and the hypothesized population mean of 100 is statistically significant.

Specifically, we’ll use a two-tailed analysis with a significance level of 0.05. Looking at the table above, you’ll see that this Z test has critical values of ± 1.960. Our results are statistically significant if our Z statistic is below –1.960 or above +1.960.

The hypotheses are the following:

  • Null (H 0 ): µ = 100
  • Alternative (H A ): µ ≠ 100

Entering Our Results into the Formula

Here are the values from our study that we need to enter into the Z test formula:

  • IQ score sample mean (x̅): 107
  • Sample size (n): 25
  • Hypothesized population mean (µ 0 ): 100
  • Population standard deviation (σ): 15

Using the formula to calculate the results: Z = (107 − 100) / (15 / √25) = 7 / 3 = 2.333

The Z-score is 2.333. This value is greater than the critical value of 1.960, making the results statistically significant. Below is a graphical representation of our Z test results showing how the Z statistic falls within the critical region.

Graph displaying the Z statistic falling in the critical region.

We can reject the null and conclude that the mean IQ score for the population of honors students does not equal 100. Based on the sample mean of 107, we know their mean IQ score is higher.

Now let’s find the p-value. We could use technology to do that, such as an online calculator. However, let’s go old school and use a Z table.

To find the p-value that corresponds to a Z-score from a two-tailed analysis, we look up the area to the left of the negative version of our Z-score (even when the Z-score itself is positive) and then double that area.

In the truncated Z-table below, I highlight the cell corresponding to a Z-score of -2.33.

Using a Z-table to find the p-value.

The cell value of 0.00990 represents the area or probability to the left of the Z-score -2.33. We need to double it to include the area > +2.33 to obtain the p-value for a two-tailed analysis.

P-value = 0.00990 * 2 = 0.0198

That p-value is an approximation because it uses a Z-score of 2.33 rather than 2.333. Using an online calculator, the p-value for our Z test is a more precise 0.0196. This p-value is less than our significance level of 0.05, which reconfirms the statistically significant results.
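To reproduce the worked example without a Z-table, here is a short sketch using Python's standard library. The variable names are my own; the inputs are the values listed in the example above.

```python
from math import sqrt
from statistics import NormalDist

# Values from the honors-program IQ example.
sample_mean = 107  # x-bar
n = 25             # sample size
mu_0 = 100         # hypothesized population mean
sigma = 15         # known population standard deviation

standard_error = sigma / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Two-tailed p-value: area below -|z|, doubled to cover both tails.
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"Z = {z:.3f}")
print(f"two-tailed p-value = {p_two_tailed:.4f}")
```

This uses the unrounded Z-score, so the p-value it prints should agree with the more precise calculator result rather than the table approximation.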

See my full Z-table , which explains how to use it to solve other types of problems.


The Derivation and Choice of Appropriate Test Statistic (Z, t, F and Chi-Square Test) in Research Methodology

Teshome Hailemeskel Abebe

Department of Economics, Ambo University, Ambo, Ethiopia

Contributor Roles: Teshome Hailemeskel Abebe is the sole author. The author read and approved the final manuscript.


The main objective of this paper is to choose an appropriate test statistic for research methodology. Specifically, this article explores the concept of the statistical hypothesis test, the derivation of the test statistic, and its role in research methodology. It also shows the basic formulation and testing of hypotheses using a test statistic, since choosing an appropriate test statistic is the most important tool of research. To test a hypothesis, various statistical tests such as the Z-test, Student’s t-test, F test (like ANOVA), and Chi-square test were identified. In testing the mean of a population or comparing the means from two continuous populations, the z-test and t-test were used, while the F test is used for comparing more than two means and for equality of variance. The chi-square test was used for testing independence, goodness of fit, and the population variance of a single sample in categorical data. Therefore, choosing an appropriate test statistic gives valid results in hypothesis testing.

Published in: Mathematics Letters (Volume 5, Issue 3)
DOI: 10.11648/j.ml.20190503.11
Page(s): 33-40

Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2020. Published by Science Publishing Group

Keywords: Test Statistic, Z-test, Student’s t-test, F Test (Like ANOVA), Chi Square Test, Research Methodology

[1] Alena Košťálová. (2013). Proceedings of the 10th International Conference “Reliability and Statistics in Transportation and Communication” (RelStat’10), 20–23 October 2010, Riga, Latvia, p. 163-171. ISBN 978-9984-818-34-4 Transport and Telecommunication Institute, Lomonosova 1, LV-1019, Riga, Latvia.
[2] Banda Gerald. (2018). A Brief Review of Independent, Dependent and One Sample t-test. International Journal of Applied Mathematics and Theoretical Physics. 4 (2), pp. 50-54. doi: 10.11648/j.ijamtp.20180402.13.
[3] David, J. Pittenger. (2001). Hypothesis Testing as a Moral Choice. Ethics & Behavior, 11 (2), 151-162, DOI: 10.1207/S15327019EB1102_3.
[4] DOWNWARD, L. B., LUKENS, W. W. & BRIDGES, F. (2006). A Variation of the F-test for Determining Statistical Relevance of Particular Parameters in EXAFS Fits. 13th International Conference on X-ray Absorption Fine Structure, 129-131.
[5] J. P. Verma. (2013). Data Analysis in Management with SPSS Software, DOI 10.1007/978-81-322-0786-3_7, Springer, India.
[6] Joginder Kaur. (2013). Techniques Used in Hypothesis Testing in Research Methodology A Review. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value: 6.14 | Impact Factor: 4.438.
[7] Kousar, J. B. and Azeez Ahmed. (2015). The Importance of Statistical Tools in Research Work. International Journal of Scientific and Innovative Mathematical Research (IJSIMR) Volume 3, Issue 12, PP 50-58 ISSN 2347-307X (Print) & ISSN 2347-3142 (Online).
[8] Liang, J. (2011). Testing the Mean for Business Data: Should One Use the Z-Test, T-Test, F-Test, The Chi-Square Test, Or The P-Value Method? Journal of College Teaching and Learning, 3 (7), doi: 10.19030/tlc.v3i7.1704.
[9] LING, M. (2009b). Compendium of Distributions: Beta, Binomial, Chi-Square, F, Gamma, Geometric, Poisson, Student's t, and Uniform. The Python Papers Source Codes 1:4.
[10] MCDONALD, J. H. (2008). Handbook of Biological Statistics. Baltimore, Sparky House Publishing.
[11] Pallant, J. (2007). SPSS Survival Manual: A Step by Step to Data Analysis Using SPSS for Windows (Version 15). Sydney: Allen and Unwin.
[12] Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables that arisen from random sampling. Philos. Mag. 1900, 50, 157-175. Reprinted in K. Pearson (1956), pp. 339-357.
[13] Philip, E. Crewson.(2014). Applied Statistics, First Edition. United States Department of Justice. https://www.researchgate.net/publication/297394168.
[14] Sorana, D. B., Lorentz, J., Adriana, F. S., Radu, E. S. and Doru, C. P. (2011). Pearson-Fisher Chi-Square Statistic Revisited. Journal of Information; 2, 528-545; doi: 10.3390/info2030528; ISSN 2078-2489, www.mdpi.com/journal/information.
[15] Sureiman Onchiri.(2013). Conceptual model on application of chi-square test in education and social sciences. Academic Journals, Educational Research and Reviews; 8 (15); 1231-1241; DOI: 10.5897/ERR11.0305; ISSN 1990-3839.http://www.academicjournals.org/ERR.
[16] Tae Kyun Kim. (2015). T test as a parametric statistic. Korean Society of Anesthesiologists. pISSN 2005-6419, eISSN 2005-7563.
[17] Vinay Pandit. (2015). A Study on Statistical Z Test to Analyses Behavioral Finance Using Psychological Theories, Journal of Research in Applied Mathematics Volume 2~ Issue 1, pp: 01-07 ISSN (Online), 2394-0743 ISSN (Print): 2394-0735. www.questjournals.org.

Teshome Hailemeskel Abebe. (2020). The Derivation and Choice of Appropriate Test Statistic (Z, t, F and Chi-Square Test) in Research Methodology. Mathematics Letters , 5 (3), 33-40. https://doi.org/10.11648/j.ml.20190503.11


Z-Test for Statistical Hypothesis Testing Explained

Egor Howell

The Z-test is a statistical hypothesis test used to determine whether the distribution of the test statistic we are measuring, like the mean, is part of the normal distribution.

There are multiple types of Z-tests. However, we’ll focus on the easiest and most well-known one, the one-sample mean test. This is used to determine if the difference between the mean of a sample and the mean of a population is statistically significant.

What Is a Z-Test?

A Z-test is a type of statistical hypothesis test where the test statistic follows a normal distribution.

The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations a raw score or sample statistic is from the population mean.

Z-tests are among the most common statistical tests conducted in fields such as healthcare and data science. Therefore, it’s an essential concept to understand.

Requirements for a Z-Test

In order to conduct a Z-test, your statistics need to meet a few requirements, including:

  • A sample size that’s greater than 30. This is because we want to ensure our sample mean comes from a distribution that is approximately normal. By the central limit theorem, the distribution of the sample mean approaches a normal distribution as the sample size grows, and more than 30 data points is a common rule of thumb for when that approximation is good enough.
  • The standard deviation and mean of the population are known.
  • The sample data is collected/acquired randomly.
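The sample-size requirement can be illustrated with a small simulation. The sketch below is only an illustration under assumptions of my own: it draws samples of size 30 from an exponential distribution (clearly non-normal, with mean 1 and standard deviation 1) and checks that the sample means cluster around 1 with a spread close to σ/√n.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # number of repeated samples

# Each entry is the mean of one sample of n exponential(1) draws.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_sd = 1 / sqrt(n)  # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {theoretical_sd:.3f}")
```

The simulation says nothing about how good the normal approximation is for other distributions; heavily skewed data may need larger samples.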


Z-Test Steps

There are four steps to complete a Z-test. Let’s examine each one.

4 Steps to a Z-Test

  • State the null hypothesis.
  • State the alternate hypothesis.
  • Choose your critical value.
  • Calculate your Z-test statistic.

1. State the Null Hypothesis

The first step in a Z-test is to state the null hypothesis, H_0. This is what you believe to be true about the population, for example that the mean of the population equals μ_0:

$$H_0: \mu = \mu_0$$

2. State the Alternate Hypothesis

Next, state the alternate hypothesis, H_1. This is the claim you test against the null hypothesis, usually suggested by what you observe from your sample. If the sample mean is different from the population’s mean, then we say the mean is not equal to μ_0:

$$H_1: \mu \neq \mu_0$$

3. Choose Your Critical Value

Then, choose your significance level, α, which sets the critical value used to decide whether to reject the null hypothesis. Typically for a Z-test we would use a statistical significance of 5 percent, which for a two-tailed test corresponds to critical values of z = +/- 1.96 standard deviations from the mean of the standard normal distribution.

This critical value corresponds to a 95 percent confidence interval.

4. Calculate Your Z-Test Statistic

Compute the Z-test statistic using the sample mean, μ_1, the population mean, μ_0, the number of data points in the sample, n, and the population’s standard deviation, σ:

$$Z = \frac{\mu_1 - \mu_0}{\sigma / \sqrt{n}}$$

If the test statistic is greater (or lower, depending on the test we are conducting) than the critical value, then we reject the null hypothesis in favor of the alternate hypothesis, because the sample’s mean is significantly different from the population mean.

Another way to think about this is that if the sample mean is very far from the population mean, then either the alternate hypothesis is true or the sample is a rare anomaly.


Z-Test Example

Let’s go through an example to fully understand the one-sample mean Z-test.

A school says that its pupils are, on average, smarter than those at other schools. It takes a sample of 50 students whose average IQ is measured to be 110. The population, or the rest of the schools, has an average IQ of 100 and a standard deviation of 20. Is the school’s claim correct?

The null and alternate hypotheses are:

$$H_0: \mu = 100 \qquad H_1: \mu > 100$$

Here the alternate hypothesis says that our sample, the school, has a higher mean IQ than the population mean.

Now, this is what’s called a right-sided, one-tailed test, as the claim is that the school’s mean is greater than the population’s mean. So, choosing a significance level of 5 percent, which for a one-tailed test equals a critical Z-score of 1.645, we can only reject the null hypothesis if our Z-test statistic is greater than 1.645.

If the school claimed its students’ IQs were an average of 90, then we would use a left-tailed test, as shown in the figure above. We would then only reject the null hypothesis if our Z-test statistic is less than -1.645.

Computing our Z-test statistic, we see:

$$Z = \frac{110 - 100}{20 / \sqrt{50}} \approx 3.54$$

Since 3.54 is greater than the critical value, we have sufficient evidence to reject the null hypothesis, and the data support the school’s claim.
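The school example can be checked with a few lines of Python. This is a rough sketch using only the standard library; the function name `right_tailed_z_test` is hypothetical, and it assumes a right-tailed test at a 5 percent significance level.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Return (z, critical value, p-value) for a right-tailed one-sample Z-test."""
    z = (sample_mean - mu_0) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)  # one-tailed cutoff
    p_value = 1 - NormalDist().cdf(z)           # area to the right of z
    return z, critical, p_value


# School example: 50 students, mean IQ 110, population mean 100, sigma 20.
z, critical, p_value = right_tailed_z_test(110, 100, 20, 50)
reject_null = z > critical

print(f"Z = {z:.3f}, critical value = {critical:.3f}, p = {p_value:.5f}")
print("Reject H_0" if reject_null else "Fail to reject H_0")
```

Returning the critical value alongside the statistic keeps the decision rule visible, which mirrors the four steps described earlier.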

Hope you enjoyed this article on Z-tests. In this post, we only addressed the simplest case, the one-sample mean test. There are other types of Z-tests, but they all follow the same process with some small nuances.

Restor Dent Endod. 2019 Aug; 44(3)

Statistical notes for clinical researchers: the independent samples t-test

Hae-Young Kim

Department of Health Policy and Management, College of Health Science, and Department of Public Health Science, Graduate School, Korea University, Seoul, Korea.

The t-test is frequently used to compare 2 group means. The compared groups may be independent of each other, such as men and women. Alternatively, the compared data may be correlated, as in a comparison of blood pressure levels from the same person before and after medication (Figure 1). In this section we will focus on the independent t-test only. There are 2 kinds of independent t-test, depending on whether the 2 group variances can be assumed equal or not. The t-test is based on inference using the t-distribution.

[Figure 1]

T-DISTRIBUTION

The t-distribution was invented in 1908 by William Sealy Gosset, who was working for the Guinness brewery in Dublin, Ireland. As the Guinness brewery did not permit its employees to publish research results related to their work, Gosset published his findings under a pseudonym, “Student.” Therefore, the distribution he proposed came to be called Student's t-distribution. The t-distribution is similar to the standard normal distribution (z-distribution) but has a lower peak and heavier tails (Figure 2).

[Figure 2]

According to sampling theory, when samples are drawn from a normally distributed population, the distribution of sample means is also expected to be normal. When we know the population variance, σ², we can define the distribution of sample means as a normal distribution and adopt the z-distribution in statistical inference. In reality, however, we rarely know σ², so we use the sample variance, s², instead. Although s² is the best estimator of σ², its accuracy depends on the sample size. When the sample size is large enough (e.g., n = 300), we expect the sample variance to be very close to the population variance. However, when the sample size is small, such as n = 10, the sample variance may be considerably less accurate. The t-distribution reflects this difference in uncertainty according to sample size. Therefore, the shape of the t-distribution changes with the degrees of freedom (df), which equal the sample size minus one (n − 1) when one sample mean is tested.

The t-distribution is a family of distributions whose shape varies according to df (Figure 2). When df is smaller, the t-distribution has a lower peak and heavier tails than those with higher df. The shape of the t-distribution approaches the z-distribution as df increases. When the sample is large enough, e.g., n = 300, the t-distribution is almost identical to the z-distribution. For inference about means using small samples, it is necessary to apply the t-distribution, while similar inferences can be obtained with either the t-distribution or the z-distribution for a large sample. For inference about 2 means, we generally use the t-test based on the t-distribution regardless of sample size, because it is safe for tests with small df as well as for those with large df.

INDEPENDENT SAMPLES T-TEST

To adopt the z- or t-distribution for inference using small samples, a basic assumption is that the population distribution is not significantly different from a normal distribution. As seen in Appendix 1, the normality assumption needs to be tested in advance. If the normality assumption cannot be met and we have a small sample (n < 25), then we are not permitted to use the ‘parametric’ t-test. Instead, a non-parametric analysis such as the Mann-Whitney U test should be selected.

For comparison of 2 independent group means, we can use a z-statistic to test the hypothesis of equal population means only if we know the population variances of the 2 groups, σ₁² and σ₂², as follows:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \qquad \text{(Eq. 1)}$$

where X̄₁ and X̄₂, σ₁² and σ₂², and n₁ and n₂ are the sample means, population variances, and sizes of the 2 groups.

Again, as we rarely know the population variances, we need to use sample variances as their estimates. There are 2 methods, depending on whether the 2 population variances can be assumed equal or not. Under the assumption of equal variances, the t-test devised by Gosset in 1908, Student's t-test, can be applied. The other version is Welch's t-test, introduced in 1947, for cases where the assumption of equal variances cannot be accepted because a large difference is observed between the 2 sample variances.

1. Student's t-test

In Student's t-test, the population variances are assumed equal. Therefore, we need only one common variance estimate for the 2 groups. The common variance estimate is calculated as a pooled variance, a weighted average of the 2 sample variances, as follows:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \qquad \text{(Eq. 2)}$$

where s₁² and s₂² are the sample variances.

The resulting t-test statistic has the same form as the z-statistic, with both population variances, σ₁² and σ₂², replaced by the common variance estimate, s_p²:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} \qquad \text{(Eq. 3)}$$

The df is given as n₁ + n₂ − 2 for the t-test statistic.

In Appendix 1, ‘(E-1) Levene's test for equality of variances’ shows that the null hypothesis of equal variances was not rejected, given the high p value, 0.334 (under the heading Sig.). In ‘(E-2) t-test for equality of means t-values’, the upper line shows the result of Student's t-test. The t-value and df are shown as −3.357 and 18. We can get the same figures using the formulas Eq. 2 and Eq. 3 and the descriptive statistics in Table 1, as follows.

Group   No.   Mean    Standard deviation   p value
1       10    10.28   0.5978               0.004
2       10    11.08   0.4590

The result of the calculation is a little different from that given by SPSS (IBM Corp., Armonk, NY, USA) in Appendix 1, possibly because of rounding errors.
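The hand calculation for Student's t-test can be sketched in a few lines of Python. This is an illustration only, not the article's SPSS procedure; it uses the group sizes, means, and standard deviations from Table 1, and the variable names are assumptions of this sketch.

```python
from math import sqrt

# Descriptive statistics from Table 1.
n1, mean1, sd1 = 10, 10.28, 0.5978
n2, mean2, sd2 = 10, 11.08, 0.4590

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)

# Student's t statistic with the pooled variance in place of both
# population variances.
t_stat = (mean1 - mean2) / sqrt(pooled_var / n1 + pooled_var / n2)
df = n1 + n2 - 2

print(f"pooled variance = {pooled_var:.4f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

Because the inputs are already rounded summary statistics, the last digit of the result may differ slightly from software output computed on the raw data.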

2. Welch's t-test

In practice there are many cases where equal variances cannot be assumed. Even when equal variances cannot reasonably be assumed, we can still compare 2 independent group means by performing Welch's t-test. Welch's t-test is more reliable when the 2 samples have unequal variances and/or unequal sample sizes. We still need to maintain the assumption of normality.

Because the population variances are not equal, we have to estimate them separately by the 2 sample variances, s₁² and s₂². As a result, the t-test statistic takes the following form:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

where ν is the Satterthwaite degrees of freedom.

In Appendix 1, ‘(E-1) Levene's test for equality of variances’ shows that equal variances can be assumed (p = 0.334). Therefore, Welch's t-test is not needed for these data. Only for the purpose of exercise, we can try to interpret the results of Welch's t-test shown in the lower line in ‘(E-2) t-test for equality of means t-values’. The t-value and df are shown as −3.357 and 16.875.

We've confirmed nearly the same results by calculation using the formula and by SPSS software.
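As with Student's t-test, the Welch calculation can be reproduced from the summary statistics. The sketch below is illustrative and uses the standard Welch statistic with the Welch-Satterthwaite approximation for the degrees of freedom; the inputs are the Table 1 values and the variable names are my own.

```python
from math import sqrt

# Descriptive statistics from Table 1.
n1, mean1, sd1 = 10, 10.28, 0.5978
n2, mean2, sd2 = 10, 11.08, 0.4590

# Per-group variance of the sample mean, estimated separately.
v1 = sd1**2 / n1
v2 = sd2**2 / n2

# Welch's t statistic: no pooling of variances.
t_welch = (mean1 - mean2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t = {t_welch:.3f}, df = {df_welch:.3f}")
```

With equal group sizes the Welch and Student statistics coincide, so only the degrees of freedom differ between the two tests here.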

The t-test is one of the most frequently used methods for comparing 2 group means. However, we sometimes forget the underlying assumptions, such as the normality assumption, or miss the meaning of the equal variance assumption. Especially when we have a small sample, we need to check the normality assumption first and decide between the parametric t-test and the nonparametric Mann-Whitney U test. We also need to assess the assumption of equal variances and select either Student's t-test or Welch's t-test.

Procedure of t-test analysis using IBM SPSS

The procedure of t-test analysis using IBM SPSS Statistics for Windows Version 23.0 (IBM Corp., Armonk, NY, USA) is as follows.

[Appendix 1]

Testing The Mean For Business Data: Should One Use The Z-Test, T-Test, F-Test, The Chi-Square Test, Or The P-Value Method?

  • January 2011
  • Journal of College Teaching & Learning (TLC) 3(7)

Jiajuan Liang at University of New Haven



COMMENTS

  1. 7484 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on Z-TEST. Find methods information, sources, references or conduct a literature review on Z-TEST

  2. Design of a new Z-test for the uncertainty of Covid-19 events under

    The proposed test $Z_N \in [Z_L, Z_U]$ is the extension of several existing tests. The proposed test reduces to the existing Z test under classical statistics when $I_{Z_N} = 0$. The proposed test is also an extension of the Z test under fuzzy approach and interval statistics. The proposed test will be implemented as follows.

  3. (PDF) The Validity of t-test and Z-test for Small One ...

    In case of Population C and D, at = 20% , the validity of t-test rose to 54.3% from 29.2%, while for Population A and B, the validity rose to 76.1% from 49.6%. This suggests that there is need. to ...

  4. The use of weighted Z-tests in medical research

    Traditionally the un-weighted Z-tests, which follow the one-patient-one-vote principle, are standard for comparisons of treatment effects. We discuss two types of weighted Z-tests in this manuscript to incorporate data collected in two (or more) stages or in two (or more) regions. We use the type A weighted Z-test to exemplify the variance ...

  5. On the robustification of the z-test statistic

    Abstract and Figures. The z-test statistic is one of the most popular statistics. However, this conventional z-test has a serious pitfall when some of observations in a sample are contaminated. We ...

  6. Optimally weighted Z-test is a powerful method for combining

    The pieces n X X ¯ S ^ X and n Y Y ¯ S ^ Y can be recovered from P-values for the two samples by the inverse normal transformation.This statistic is the weighted Z-test for combining P-values.We can see that Z w approximates Z total when the weights w X, w Y are set to n X, n Y.The same argument holds for more than two samples. Regarding Lancaster's method, Chen noted cautiously that ...

  7. PDF A Study on Statistical Z Test to Analyse Behavioural Finance Using

    Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two tailed) which makes it more convenient than the Student's t-test which has separate critical values for each sample size.

  8. PDF A Parametric Approach Using Z-Test for Comparing 2 Means to ...

    discipline, z-test or z-score can be implement once the data attained is large sample size which is greater than 30. Conversely, t-test can be implement if the data obtained was below than 30 [15,16,17]. Indeed, most of the articles stand their point to use of equal and unequal t-test approach for their research but the finding can be argue.

  9. The Use of Weighted Z-Tests in Medical Research: Journal of

    We use the type A weighted Z-test to exemplify the variance spending approach in the first part of this manuscript. This approach has been applied to sample size re-estimation. In the second part of the manuscript, we introduce the type B weighted Z-tests and apply them to the design of bridging studies.

  10. Comparison of 2 means (independent z test or independent t test)

    In previous articles, I discussed how to calculate the probability of obtaining data as extreme as those we have observed if the null hypothesis were true. This probability, called the P value, was obtained by first calculating the test statistic. We used this information to conduct a 1-sample z test and a 1-sample t test to see whether there is evidence of a difference in age between our ...

  11. Analysis of imprecise measurement data utilizing z-test for correlation

    The conventional Z-test for correlation, grounded in classical statistics, is typically employed in situations devoid of vague information. However, real-world data often comes with inherent uncertainty, necessitating an adaptation of the Z-test using neutrosophic statistics. This paper introduces a modified Z-test for correlation designed to explore correlations in the presence of imprecise ...

  12. PDF Hypothesis Testing with z Tests

    The z Test: An Example μ= 156.5, 156.5, σ= 14.6, M = 156.11, N = 97 1. Populations, distributions, and assumptions Populations: 1. All students at UMD who have taken the test (not just our sample) 2. All students nationwide who have taken the test Distribution: Sample → distribution of means Test & Assumptions: z test 1. Data are interval 2.

  13. Z Test: Uses, Formula & Examples

    Use a Z test when you need to compare group means. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ. A Z test is a form of inferential statistics. It uses samples to draw conclusions about populations.

  14. PDF Hypothesis Testing with z Tests

    1. How to use a z table. 2. How to implement the basic steps of hypothesis testing. 3. How to conduct a z test to compare a single sample to a known population. The z Table In Chapter 6, we learned that (1) about 68% of scores fall within one z score of the mean, (2) about 96% of scores fall within two z scores of the mean, and (3) nearly all ...

  15. Z Scores, Standard Scores, and Composite Test Scores Explained

    would have a Z score of (12−18)/4, or −1.5; that is, one and a half SDs below the sample mean. Interpreting and Using the Z Scores The raw scores were in different units in the different cognitive tasks. Z scores are all in the same unit, that is, SD. The Z score distribution has a mean of 0 and an SD of 1. Z scores are useful because they

  16. The Derivation and Choice of Appropriate Test Statistic (Z, t, F and

    The main objective of this paper is to choose an appropriate test statistic for research methodology. Specifically, this article tries to explore the concept of statistical hypothesis test, derivation of the test statistic and its role on research methodology. It also try to show the basic formulating and testing of hypothesis using test statistic since choosing appropriate test statistic is ...

  17. Z-Test for Statistical Hypothesis Testing Explained

    A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution. The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations away a raw score or sample statistics is from the populations' mean. Z-tests are the most common statistical tests ...

  18. PDF The Z-test

    The Z-test January 9, 2021 Contents Example 1: (one tailed z-test) Example 2: (two tailed z-test) Questions Answers The z-test is a hypothesis test to determine if a single observed mean is significantly different (or greater or less than) the mean under the null hypothesis when you know the standard deviation of the population.

  19. (PDF) Use of Z Score for the Standardization of ...

    Standardization of laboratory test results and their reporting in the form of z score enables the. following: It prevents any reference range differences among regions, races and laboratories. It ...

  20. Statistical notes for clinical researchers: the independent samples t -test

    INDEPENDENT SAMPLES T-TEST. To adopt z- or t-distribution for inference using small samples, a basic assumption is that the distribution of population is not significantly different from normal distribution.As seen in Appendix 1, the normality assumption needs to be tested in advance.If normality assumption cannot be met and we have a small sample (n < 25), then we are not permitted to use ...

  21. PDF THE ONE-SAMPLE z TEST distribute

    independent samples. COMPUTING THE z TEST STATISTIC The formula used for computing the value for the one-sample z test is shown in Formula 10.1. Remember that we are testing whether a sample mean belongs to or is a fair estimate of a population. The difference between the sample mean (X ) and the population mean (μ) makes up the numer.

  22. Testing The Mean For Business Data: Should One Use The Z-Test, T-Test

    Both the t-test and the z-test are usually used for continuous populations, and the chi-square test is used for categorical data. The F-test is used for comparing more than two means.

  23. PDF One sample Z and t Tests

    3) When conducting a hypothesis test to check the means of samples, if the population standard deviation is known, we can use a z-test. When the population standard deviation is unknown, we use a t-test. 4) It will be 1-tailed if we are expecting the sample mean to be either significantly higher or significantly lower than the population mean.