- Open access
- Published: 06 April 2022
Design of a new Z -test for the uncertainty of Covid-19 events under Neutrosophic statistics
- Muhammad Aslam ORCID: orcid.org/0000-0003-0644-1950 1
BMC Medical Research Methodology volume 22, Article number: 99 (2022)
The existing Z-test for uncertainty events does not give information about the measure of indeterminacy/uncertainty associated with the test.
This paper introduces the Z-test for uncertainty events under neutrosophic statistics. The test statistic of the existing test is modified under the philosophy of the Neutrosophy. The testing process is introduced and applied to the Covid-19 data.
Based on the information, the proposed test is interpreted as follows: the null hypothesis that there is no reduction in the uncertainty of Covid-19 is accepted with a probability of 0.95, the probability of committing a type-I error is 0.05, and the measure of indeterminacy is 0.10. Based on the analysis, it is concluded that the proposed test is more informative than the existing test. The proposed test is also better than the Z-test for uncertainty under fuzzy logic, as the test using fuzzy logic gives the value of the statistic from 2.20 to 2.42 without any information about the measure of indeterminacy. The test under interval statistics only considers the values within the interval rather than the crisp value.
Conclusions
From the Covid-19 data analysis, it is found that the proposed Z-test for uncertainty events under neutrosophic statistics is more efficient than the existing tests under classical statistics, the fuzzy approach, and interval statistics in terms of information, flexibility, power of the test, and adequacy.
The Z-test plays an important role in data analysis. The main aim of the Z-test is to test the mean of an unknown population in decision-making. The Z-test for uncertainty events is applied to test the reduction in the uncertainty of past events. This type of test is applied to test the null hypothesis that there is no reduction in uncertainty against the alternative hypothesis that there is a significant reduction in the uncertainty of past events. The Z-test for uncertainty events uses the information of the past events for testing the reduction of uncertainty. Doll and Carney [1] discussed the performance of statistical tests under uncertainty. Kanji [2] discussed the design of the Z-test for uncertainty events. Lele [3] worked on testing in the presence of uncertainty. Wang et al. [4] worked on the modification of a non-parametric test. Applications of such tests can be seen in [5], [6], [7] and [8].
Viertl [9] mentioned that “statistical data are frequently not precise numbers but more or less non-precise also called fuzzy. Measurements of continuous variables are always fuzzy to a certain degree”. In such cases, the existing Z-tests cannot be applied for testing the population mean or the reduction in uncertainty. Therefore, the existing Z-tests have been modified under fuzzy logic to deal with uncertain, fuzzy, and vague data. The authors of [10], [11], [12], [13], [14], [15], [16], [17], [18] and [19] worked on various statistical tests using fuzzy logic.
Nowadays, neutrosophic logic attracts researchers due to its many applications in a variety of fields. Neutrosophic logic accounts for the measure of indeterminacy, which is not considered by fuzzy logic; see [20]. Broumi and Smarandache [21] showed that neutrosophic logic is more efficient than interval-based analysis. More applications of neutrosophic logic can be seen in [22], [23], [24] and [25]. Smarandache [26] applied neutrosophic statistics to deal with uncertain data. Chen, Ye and Du [27] and Chen et al. [28] presented neutrosophic statistical methods to analyze data. Some applications of neutrosophic tests can be seen in [29], [30] and [31].
The existing Z-test for uncertainty events under classical statistics does not consider the measure of indeterminacy when testing the reduction in events. To the best of our knowledge, there is no work in the literature on the Z-test for uncertainty events under neutrosophic statistics. In this paper, a modification of the Z-test for uncertainty events under neutrosophic statistics is introduced. The application of the proposed test is illustrated using Covid-19 data. It is expected that the proposed Z-test for uncertainty events under neutrosophic statistics will be more efficient than the existing tests in terms of the power of the test, information, and adequacy.
The existing Z-test for uncertainty events can be applied only when the probability of events is known. The existing test does not evaluate the effect of the measure of indeterminacy/uncertainty in the reduction of uncertainty of past events. We now introduce the modification of the Z-test for uncertainty events under neutrosophic statistics, with the aim that the proposed test will be more effective than the existing Z-test for uncertainty events under classical statistics. Let \({A}_N={A}_L+{A}_U{I}_{A_N};{I}_{A_N}\epsilon \left[{I}_{A_L},{I}_{A_U}\right]\) and \({B}_N={B}_L+{B}_U{I}_{B_N};{I}_{B_N}\epsilon \left[{I}_{B_L},{I}_{B_U}\right]\) be two neutrosophic events, where the lower values \({A}_L\), \({B}_L\) denote the determinate parts of the events, the upper values \({A}_U{I}_{A_N}\), \({B}_U{I}_{B_N}\) are the indeterminate parts, and \({I}_{A_N}\epsilon \left[{I}_{A_L},{I}_{A_U}\right]\), \({I}_{B_N}\epsilon \left[{I}_{B_L},{I}_{B_U}\right]\) are the measures of indeterminacy associated with these events. Note that the events \({A}_N\epsilon \left[{A}_L,{A}_U\right]\) and \({B}_N\epsilon \left[{B}_L,{B}_U\right]\) reduce to events under classical statistics (determinate parts) proposed by [2] if \({I}_{A_L}={I}_{B_L}=0\). Suppose \({n}_N={n}_L+{n}_U{I}_N;{I}_N\epsilon \left[{I}_L,{I}_U\right]\) is a neutrosophic random sample, where \({n}_L\) is the lower (determinate) sample size, \({n}_U{I}_N\) is the indeterminate part, and \({I}_N\epsilon \left[{I}_L,{I}_U\right]\) is the measure of uncertainty in selecting the sample size. The neutrosophic random sample reduces to a classical random sample if no uncertainty is found in the sample size. The methodology of the proposed Z-test for uncertainty events is explained as follows.
Suppose that the probability that an event \({A}_N\epsilon \left[{A}_L,{A}_U\right]\) occurs (probability of truth) is \(P\left({A}_N\right)\epsilon \left[P\left({A}_L\right),P\left({A}_U\right)\right]\), the probability that the event \({A}_N\epsilon \left[{A}_L,{A}_U\right]\) does not occur (probability of falsehood) is \(P\left({A}_N^c\right)\epsilon \left[P\left({A}_L^c\right),P\left({A}_U^c\right)\right]\), the probability that an event \({B}_N\epsilon \left[{B}_L,{B}_U\right]\) occurs (probability of truth) is \(P\left({B}_N\right)\epsilon \left[P\left({B}_L\right),P\left({B}_U\right)\right]\), and the probability that the event \({B}_N\epsilon \left[{B}_L,{B}_U\right]\) does not occur (probability of falsehood) is \(P\left({B}_N^c\right)\epsilon \left[P\left({B}_L^c\right),P\left({B}_U^c\right)\right]\). It is important to note that sequential analysis is done to reduce the uncertainty by using information from past events. The purpose of the proposed test is to determine whether the reduction of uncertainty is significant. Let \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) be the neutrosophic test statistic, where \({Z}_L\) and \({Z}_U\) are the lower and upper values of the statistic, respectively, defined by.
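A form of the statistic that agrees term by term with the worked Covid-19 computation given later in the paper, and with the classical test of [2], can be sketched as follows. The exact typesetting and numbering of the display is an assumption, since only the numerical instance appears in the text.

\[
{Z}_N=\frac{P\left({B}_{+{k}_N}|{A}_N\right)-P\left({B}_N\right)}{\sqrt{\frac{P\left({B}_N\right)\left[1-P\left({B}_N\right)\right]\left[1-P\left({A}_N\right)\right]}{\left({n}_N-{k}_N\right)P\left({A}_N\right)}}};\quad {Z}_N\epsilon \left[{Z}_L,{Z}_U\right] \qquad (1)
\]

Here the lower value \({Z}_L\) is obtained from the determinate quantities \(P\left({A}_L\right)\), \(P\left({B}_L\right)\), \({n}_L\), \({k}_L\) and the upper value \({Z}_U\) from their upper counterparts.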
Note that \(P\left({B}_{+{k}_N}|{A}_N\right)=P\left({B}_N|{A}_N\right)\) at lag \({k}_N\), where \(P\left({B}_N|{A}_N\right)\epsilon \left[P\left({B}_L|{A}_L\right),P\left({B}_U|{A}_U\right)\right]\) denotes the conditional probability. That is, the probability of the event \({B}_N\epsilon \left[{B}_L,{B}_U\right]\) is calculated given that the event \({A}_N\epsilon \left[{A}_L,{A}_U\right]\) has occurred.
The neutrosophic form of the proposed test statistic, say \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\), is defined by.
The alternative form of Eq. ( 2 ) can be written as.
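The two displays referred to here can be sketched from the notation used elsewhere in the paper. The neutrosophic form follows the pattern \({A}_N={A}_L+{A}_U{I}_{A_N}\) used for all other quantities, and the alternative form matches the factor \(\left(1+0.1\right)\) in the worked example; both are reconstructions under that assumption, not a verbatim copy of the original displays.

\[
{Z}_N={Z}_L+{Z}_U{I}_{ZN};\quad {I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right] \qquad (2)
\]

\[
{Z}_N=\left(1+{I}_{ZN}\right)\frac{P\left({B}_{+{k}_N}|{A}_N\right)-P\left({B}_N\right)}{\sqrt{\frac{P\left({B}_N\right)\left[1-P\left({B}_N\right)\right]\left[1-P\left({A}_N\right)\right]}{\left({n}_N-{k}_N\right)P\left({A}_N\right)}}};\quad {I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right] \qquad (3)
\]

The two forms coincide when the determinate and indeterminate parts share the same classical value, as in the example where \({Z}_N=2.20+2.20{I}_{ZN}\).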
The proposed test \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is an extension of several existing tests. The proposed test reduces to the existing Z-test under classical statistics when \({I}_{ZN}=0\). The proposed test is also an extension of the Z-test under the fuzzy approach and interval statistics.
The proposed test will be implemented as follows.
Step-1: State the null hypothesis \({H}_0\): there is no reduction in uncertainty vs. the alternative hypothesis \({H}_1\): there is a significant reduction in uncertainty.
Step-2: Calculate the statistic \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\).
Step-3: Specify the level of significance α and select the critical value from [2].
Step-4: Do not accept the null hypothesis if the value of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is larger than the critical value.
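The four steps above can be sketched in Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, the form of the statistic is inferred from the worked example in the text, and the "indeterminate" outcome for an interval that straddles the critical value is an assumption, since the text only states the rule for a statistic larger than the critical value.

```python
import math


def z_uncertainty(p_a, p_b, p_b_given_a, n, k=1):
    """Classical (determinate) Z statistic for uncertainty of events.

    Form inferred from the worked example: the standard error is
    sqrt(P(B)[1-P(B)][1-P(A)] / ((n-k) P(A))).
    """
    se = math.sqrt(p_b * (1 - p_b) * (1 - p_a) / ((n - k) * p_a))
    return (p_b_given_a - p_b) / se


def neutrosophic_z(z, i_lower, i_upper):
    """Interval [Z_L, Z_U] from scaling z by (1 + I) at both ends of I."""
    return ((1 + i_lower) * z, (1 + i_upper) * z)


def decide(z_interval, critical=1.96):
    """Step 4: compare the interval with the critical value.

    Assumption: H0 is not accepted only if the whole interval exceeds the
    critical value; an interval straddling it is flagged as indeterminate.
    """
    z_low, z_up = z_interval
    if z_low > critical:
        return "do not accept H0"
    if z_up <= critical:
        return "accept H0"
    return "indeterminate"
```

Separating the determinate statistic from the interval scaling keeps the classical test recoverable by setting both ends of the indeterminacy interval to zero.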
The application of the proposed test is given in the medical field. The decision-makers are interested in testing the reduction in uncertainty of Covid-19 when the measure of indeterminacy/uncertainty is \({I}_{ZN}\epsilon \left[0,0.10\right]\). Specifically, they are interested in testing the reduction in deaths due to Covid-19 (event \({A}_N\)) with the increase in Covid-19 vaccines (event \({B}_N\)). Following [2], the sequence in which both events occur is given as
where \({n}_N\epsilon \left[12,12\right]\), \({k}_N\epsilon \left[1,1\right]\), \(P\left({A}_N\right)=6/12=0.5\) and \(P\left({B}_N\right)=6/12=0.5\).
Note here that event \({A}_N\) occurs 6 times and that, of these 6 times, \({B}_N\) occurs immediately after \({A}_N\) five times. Given that \({A}_N\) has occurred, we get \(P\left({B}_{+{k}_N}|{A}_N\right)=P\left({B}_N|{A}_N\right)=5/6=0.83\) at lag 1. The value of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is calculated as
\({Z}_N=\left(1+0.1\right)\frac{0.83-0.50}{\sqrt{\frac{0.50\left[1-0.50\right]\left[1-0.50\right]}{\left(12-1\right)0.50}}}=2.42;{I}_{ZN}\epsilon \left[0,0.1\right]\). From [2], the critical value is 1.96.
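The arithmetic behind this value can be unpacked as follows. The rounding of intermediate values to two decimals is inferred from the reported figures rather than stated in the text.

\[
\sqrt{\frac{0.50\left[1-0.50\right]\left[1-0.50\right]}{\left(12-1\right)0.50}}=\sqrt{\frac{0.125}{5.5}}\approx 0.15,\qquad {Z}_L=\frac{0.83-0.50}{0.15}=2.20,\qquad {Z}_U=\left(1+0.1\right)\times 2.20=2.42.
\]

Carrying the unrounded values \(5/6\) and \(\sqrt{1/44}\) through instead gives approximately 2.21 and 2.43, so the comparison with the critical value 1.96 is unaffected by the rounding.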
The proposed test for the example will be implemented as follows
Step-1: State the null hypothesis \({H}_0\): there is no reduction in uncertainty of Covid-19 vs. the alternative hypothesis \({H}_1\): there is a significant reduction in uncertainty of Covid-19.
Step-2: The value of the statistic is 2.42.
Step-3: Specify the level of significance α = 0.05 and select the critical value from [2], which is 1.96.
Step-4: Do not accept the null hypothesis, as the value of \({Z}_N\) is larger than the critical value.
From the analysis, it can be seen that the calculated value of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is larger than the critical value of 1.96. Therefore, the null hypothesis \({H}_0\): there is no reduction in uncertainty of Covid-19 is rejected in favor of \({H}_1\): there is a significant reduction in uncertainty of Covid-19. Based on the study, it is concluded that there is a significant reduction in the uncertainty of Covid-19.
Simulation study
In this section, a simulation study is performed to see the effect of the measure of indeterminacy on the statistic \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\). For this purpose, the neutrosophic form of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) obtained from the real data is used. The neutrosophic form of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is given as
To analyze the effect on \({H}_0\), various values of \({I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right]\) are considered. The computed values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) along with the decision on \({H}_0\) are reported in Table 1. For this study, α = 0.05 and the critical value is 1.96. The null hypothesis \({H}_0\) is accepted if the calculated value of \({Z}_N\) is less than 1.96. From Table 1, it can be seen that as the value of \({I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right]\) increases from 0.01 to 2, the values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) increase. Although the decision about \({H}_0\) remains the same at all values of the measure of indeterminacy \({I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right]\), the difference between \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) and the critical value of 1.96 increases as \({I}_{ZU}\) increases. From the study, it can be concluded that the measure of indeterminacy \({I}_{ZN}\epsilon \left[{I}_{ZL},{I}_{ZU}\right]\) affects the values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\).
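The sensitivity analysis described above can be sketched in a few lines of Python. The sketch assumes the neutrosophic form \({Z}_N=2.20+2.20{I}_{ZN}\) quoted in the comparative study; the grid of \({I}_{ZU}\) values and the function name `z_upper` are illustrative assumptions and do not reproduce Table 1.

```python
CRITICAL = 1.96   # critical value at alpha = 0.05, as stated in the text
Z_DET = 2.20      # determinate (classical) value of the statistic


def z_upper(z_determinate, i_upper):
    """Upper end of Z_N = z + z * I when I is at its upper limit."""
    return z_determinate * (1 + i_upper)


# Assumed grid spanning the range 0.01 to 2 mentioned in the text.
for i_u in (0.01, 0.05, 0.10, 0.50, 1.0, 2.0):
    z_u = z_upper(Z_DET, i_u)
    decision = "do not accept H0" if z_u > CRITICAL else "accept H0"
    print(f"I_ZU={i_u:<5} Z_N in [{Z_DET:.2f}, {z_u:.2f}]  {decision}")
```

Because the determinate value 2.20 already exceeds 1.96 and the upper end only grows with \({I}_{ZU}\), the decision cannot change over this grid, which is the behaviour the paragraph describes.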
Comparative studies
As mentioned earlier, the proposed Z-test for uncertainty events is an extension of several tests. In this section, a comparative study is presented in terms of the measure of indeterminacy, flexibility and information. We compare the efficiency of the proposed Z-test for uncertainty with the Z-test for uncertainty under classical statistics, the Z-test for uncertainty under fuzzy logic and the Z-test for uncertainty under interval statistics. The neutrosophic form of the proposed statistic \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) is expressed as \({Z}_N=2.20+2.20{I}_{ZN};{I}_{ZN}\epsilon \left[0,0.1\right]\). Note that the first value 2.20 represents the existing Z-test for uncertainty under classical statistics, the second part \(2.20{I}_{ZN}\) is the indeterminate part, and 0.1 is the measure of indeterminacy associated with the test. From the neutrosophic form, it can be seen that the proposed test is flexible, as it gives the values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) in an interval from 2.20 to 2.42 when \({I}_{ZU}=0.1\). On the other hand, the existing test gives the single value 2.20. In addition, the proposed test uses information about the measure of indeterminacy that the existing test does not consider. Based on the information, the proposed test is interpreted as follows: \({H}_0\): there is no reduction in uncertainty of Covid-19 is accepted with a probability of 0.95, the probability of committing a type-I error is 0.05, and the measure of indeterminacy is 0.10. Based on the analysis, it is concluded that the proposed test is more informative than the existing test. The proposed test is also better than the Z-test for uncertainty under fuzzy logic, as the test using fuzzy logic gives the value of the statistic from 2.20 to 2.42 without any information about the measure of indeterminacy. The test under interval statistics only considers the values within the interval rather than the crisp value. On the other hand, the analysis based on neutrosophic statistics considers any type of set.
Based on the analysis, it is concluded that the proposed Z-test is more efficient than the existing tests in terms of information, flexibility, and indeterminacy.
Comparison using power of the test
In this section, the efficiency of the proposed test is compared with the existing test in terms of the power of the test. The power of the test is defined as the probability of rejecting \({H}_0\) when it is false and is denoted by β. As mentioned earlier, the probability of rejecting \({H}_0\) when it is true is known as the type-I error and is denoted by α. The values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) are simulated using the classical standard normal distribution and the neutrosophic standard normal distribution. During the simulation, 100 values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) are generated from a classical standard normal distribution and from a neutrosophic standard normal distribution with mean \({\mu}_N={\mu}_L+{\mu}_U{I}_{\mu_N};{I}_{\mu_N}\epsilon \left[{I}_{\mu_L},{I}_{\mu_U}\right]\), where \({\mu}_L=0\) represents the mean of the classical standard normal distribution, \({\mu}_U{I}_{\mu_N}\) denotes the indeterminate value and \({I}_{\mu_N}\epsilon \left[{I}_{\mu_L},{I}_{\mu_U}\right]\) is the measure of indeterminacy. Note that when \({I}_{\mu_L}=0\), \({\mu}_N\) reduces to \({\mu}_L\). The values of \({Z}_N\epsilon \left[{Z}_L,{Z}_U\right]\) are compared with the tabulated value at α = 0.05. The values of the power of the test for the existing test and for the proposed test for various values of \({I}_{\mu_U}\) are shown in Table 2. From Table 2, it is clear that the existing test under classical statistics provides smaller values of the power of the test than the proposed test at all values of \({I}_{\mu_U}\). For example, when \({I}_{\mu_U}=0.1\), the power of the test provided by the Z-test for uncertainty events under classical statistics is 0.94 and the power of the test provided by the proposed Z-test for uncertainty events is 0.96. The values of the power of the test for the Z-test for uncertainty events under classical statistics and the Z-test for uncertainty events under neutrosophic statistics are plotted in Fig. 1. From Fig. 1, it is quite clear that the power curve of the proposed test is higher than the power curve of the existing test. Based on the analysis, it can be concluded that the proposed Z-test for uncertainty events under neutrosophic statistics is more efficient than the existing Z-test for uncertainty events.
Fig. 1 The power curves of the two tests
The Z-test of uncertainty was introduced under neutrosophic statistics in this paper. The proposed test was a generalization of the existing Z-test of uncertain events under classical statistics, fuzzy-based test, and interval statistics. The performance of the proposed test was compared with the listed existing tests. From the real data and simulation study, the proposed test was found to be more efficient in terms of information and power of the test. Based on the information, it is recommended to apply the proposed test to check the reduction in uncertainty under an indeterminate environment. The proposed test for big data can be considered as future research. The proposed test using double sampling can also be studied as future research. The estimation of sample size and other properties of the proposed test can be studied in future research.
Availability of data and materials
All data generated or analysed during this study are included in this published article
References
Doll H, Carney S. Statistical approaches to uncertainty: p values and confidence intervals unpacked. BMJ Evidence-Based Medicine. 2005;10(5):133–4.
Kanji GK. 100 statistical tests. Sage; 2006.
Lele SR. How should we quantify uncertainty in statistical inference? Front Ecol Evol. 2020;8:35.
Wang F, et al. Re-evaluation of the power of the mann-kendall test for detecting monotonic trends in hydrometeorological time series. Front Earth Sci. 2020;8:14.
Maghsoodloo S, Huang C-Y. Comparing the overlapping of two independent confidence intervals with a single confidence interval for two normal population parameters. J Stat Plan Inference. 2010;140(11):3295–305.
Rono BK, et al. Application of paired student t-test on impact of anti-retroviral therapy on CD4 cell count among HIV Seroconverters in serodiscordant heterosexual relationships: a case study of Nyanza region, Kenya.
Zhou X-H. Inferences about population means of health care costs. Stat Methods Med Res. 2002;11(4):327–39.
Niwitpong S, Niwitpong S-a. Confidence interval for the difference of two normal population means with a known ratio of variances. Appl Math Sci. 2010;4(8):347–59.
Viertl R. Univariate statistical analysis with fuzzy data. Comput Stat Data Anal. 2006;51(1):133–47.
Filzmoser P, Viertl R. Testing hypotheses with fuzzy data: the fuzzy p-value. Metrika. 2004;59(1):21–9.
Tsai C-C, Chen C-C. Tests of quality characteristics of two populations using paired fuzzy sample differences. Int J Adv Manuf Technol. 2006;27(5):574–9.
Taheri SM, Arefi M. Testing fuzzy hypotheses based on fuzzy test statistic. Soft Comput. 2009;13(6):617–25.
Jamkhaneh EB, Ghara AN. Testing statistical hypotheses with fuzzy data. In: 2010 International Conference on Intelligent Computing and Cognitive Informatics: IEEE; 2010.
Chachi J, Taheri SM, Viertl R. Testing statistical hypotheses based on fuzzy confidence intervals. Austrian J Stat. 2012;41(4):267–86.
Kalpanapriya D, Pandian P. Statistical hypotheses testing with imprecise data. Appl Math Sci. 2012;6(106):5285–92.
Parthiban S, Gajivaradhan P. A comparative study of two-sample t-test under fuzzy environments using trapezoidal fuzzy numbers.
Montenegro M, et al. Two-sample hypothesis tests of means of a fuzzy random variable. Inf Sci. 2001;133(1-2):89–100.
Park S, Lee S-J, Jun S. Patent big data analysis using fuzzy learning. Int J Fuzzy Syst. 2017;19(4):1158–67.
Garg H, Arora R. Generalized Maclaurin symmetric mean aggregation operators based on Archimedean t-norm of the intuitionistic fuzzy soft set information. Artif Intell Rev. 2020:1–41.
Smarandache F. Neutrosophy. Neutrosophic probability, set, and logic, ProQuest Information & Learning, vol. 105. Michigan: Ann Arbor; 1998. p. 118–23.
Broumi S, Smarandache F. Correlation coefficient of interval neutrosophic set. In: Applied mechanics and materials: Trans Tech Publ; 2013.
Abdel-Basset M, et al. A novel group decision making model based on neutrosophic sets for heart disease diagnosis. Multimed Tools Appl. 2019:1–26.
Alhasan KFH, Smarandache F. Neutrosophic Weibull distribution and neutrosophic family Weibull distribution. Infinite Study; 2019.
Das SK, Edalatpanah S. A new ranking function of triangular neutrosophic number and its application in integer programming. Int J Neutrosophic Sci. 2020;4(2).
El Barbary OG, Abu Gdairi R. Neutrosophic logic-based document summarization. J Undergrad Math. 2021.
Smarandache F. Introduction to neutrosophic statistics. Infinite Study; 2014.
Chen J, Ye J, Du S. Scale effect and anisotropy analyzed for neutrosophic numbers of rock joint roughness coefficient based on neutrosophic statistics. Symmetry. 2017;9(10):208.
Chen J, et al. Expressions of rock joint roughness coefficient using neutrosophic interval statistical numbers. Symmetry. 2017;9(7):123.
Sherwani RAK, et al. A new neutrosophic sign test: an application to COVID-19 data. PLoS One. 2021;16(8):e0255671.
Aslam M. Neutrosophic statistical test for counts in climatology. Sci Rep. 2021;11(1):1–5.
Albassam M, Khan N, Aslam M. Neutrosophic D’Agostino test of normality: an application to water data. J Undergrad Math. 2021;2021.
Acknowledgements
We are thankful to the editor and reviewers for their valuable suggestions to improve the quality of the paper.
Author information
Authors and affiliations.
Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah, 21551, Saudi Arabia
Muhammad Aslam
Contributions
MA wrote the paper.
Corresponding author
Correspondence to Muhammad Aslam .
Ethics declarations
Ethics approval and consent to participate, consent for publication, competing interests, additional information, publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article.
Aslam, M. Design of a new Z -test for the uncertainty of Covid-19 events under Neutrosophic statistics. BMC Med Res Methodol 22 , 99 (2022). https://doi.org/10.1186/s12874-022-01593-x
Received : 27 September 2021
Accepted : 31 March 2022
Published : 06 April 2022
DOI : https://doi.org/10.1186/s12874-022-01593-x
- Uncertainty
- Classical statistics
BMC Medical Research Methodology
ISSN: 1471-2288
The use of weighted Z-tests in medical research
Affiliation.
- 1 Sanofi-aventis, Bridgewater, New Jersey, USA.
- PMID: 16022168
- DOI: 10.1081/BIP-200062284
Traditionally, the un-weighted Z-tests, which follow the one-patient-one-vote principle, are standard for comparisons of treatment effects. We discuss two types of weighted Z-tests in this manuscript to incorporate data collected in two (or more) stages or in two (or more) regions. We use the type A weighted Z-test to exemplify the variance spending approach in the first part of this manuscript. This approach has been applied to sample size re-estimation. In the second part of the manuscript, we introduce the type B weighted Z-tests and apply them to the design of bridging studies. The weights in the type A weighted Z-tests are pre-determined, independent of the prior observed data, and control alpha at the desired level. In contrast, the weights in the type B weighted Z-tests may depend on the prior observed data, and the type I error rate for the bridging study is usually inflated to a level higher than that of a full-scale study. The choice of the weights provides a simple statistical framework for communication between the regulatory agency and the sponsor. The negotiation process may involve practical constraints and some characteristics of prior studies.
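The abstract does not state a formula, so the following is only a sketch of one common way to combine two stage-wise Z statistics with pre-determined weights. The function name `weighted_z`, the two-stage restriction and the square-root weighting are assumptions made for illustration and are not taken from the manuscript.

```python
import math


def weighted_z(z1, z2, w1):
    """Combine two stage-wise Z statistics with a fixed weight.

    w1 is the pre-specified weight on stage 1 (0 <= w1 <= 1); stage 2
    receives 1 - w1. With square-root weights the combined statistic has
    unit variance under H0 when z1 and z2 are independent standard normal.
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    return math.sqrt(w1) * z1 + math.sqrt(1.0 - w1) * z2
```

The unit-variance property holds only when the weight is fixed before the stage-1 data are seen, which corresponds to the pre-determined weights the abstract attributes to the type A test.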
- Open access
- Published: 02 January 2024
Analysis of imprecise measurement data utilizing z-test for correlation
- Muhammad Aslam 1
Journal of Big Data volume 11, Article number: 4 (2024)
The conventional Z-test for correlation, grounded in classical statistics, is typically employed in situations devoid of vague information. However, real-world data often comes with inherent uncertainty, necessitating an adaptation of the Z-test using neutrosophic statistics. This paper introduces a modified Z-test for correlation designed to explore correlations in the presence of imprecise data. We will present the simulation to check the effect of the measure of indeterminacy on the evolution of type-I error and the power of the test. The application of this modification is illustrated through an examination of heartbeat and temperature data. Upon analyzing the heartbeat and temperature data, it is determined that, in the face of indeterminacy, the correlation between heartbeat and temperature emerges as significant. This highlights the importance of accounting for imprecise data when investigating relationships between variables.
Introduction
In medical science, correlation analysis has been used to investigate the degree of dependence between two medical variables. Correlation analysis indicates the strength of the relationship between two variables. The Z-test of correlation has been applied to investigate the significance of the correlation between two variables. For example, in medical science, the decision-makers are interested in assessing the significance of the relationship between blood pressure and diet. Based on correlation analysis, the medical decision-makers can suggest a suitable medication. The statistical test therefore investigates the significance of the correlation between the two variables under study. The null hypothesis that the correlation between two variables is insignificant is tested against the alternative hypothesis that the two variables are significantly correlated. Statistical tests have been widely used in medical science for decision-making. Gordon Lan et al. [13] introduced the weighted Z-tests and applied them in medical science. Bellolio et al. [7] provided a detailed discussion on the suitability of statistical tests for medical studies. Mukaka [19] discussed the suitability of correlation analysis for medical data. Pandey et al. [21] discussed the importance of t-tests in medical-related problems. Schober et al. [23] discussed correlation analysis for anesthesia data. Kc [17] wrote a review on the applications of statistical methods in medical science. Janse et al. [15] discussed the limitations of correlation analysis using medical data. More applications of statistical methods in the medical field can be seen in [10, 18, 29, 32].
Statistical tests have been widely used for the analysis of measurement data. Grzesiek et al. [14] used a statistical test for the analysis of temperature data. Avuçlu [6] presented work on the detection of Covid-19 using statistical measurements. More applications of statistical analysis for measurement data can be seen in [22, 27, 31].
Neutrosophic statistics was developed by [26] using the idea of neutrosophic logic developed by [25], and its efficiency over fuzzy logic and interval analysis was shown by [9]. The applications of neutrosophic logic in medical science can be read in [8, 30]. Neutrosophic statistics is used for the collection, analysis and interpretation of imprecise and interval data. The efficiency of neutrosophic statistics over classical statistics was discussed by [5, 11, 12]. Later on, applications of neutrosophic statistics in the field of medical science were given by [1, 3, 5, 24].
The existing Z-test for correlation cannot be applied when the data is expressed in intervals or when uncertainty in parameters or level of significance is noted. To overcome this issue, in this paper, the Z-test for a single correlation coefficient using neutrosophic statistics will be presented. The test statistic for the proposed test will be developed and the application will be given using the heartbeat and body temperature data. It is expected that the proposed test will be efficient in investigating the significance of the correlation between variables expressed in intervals.
Let \({X}_{N}={X}_{L}+{X}_{U}{I}_{{X}_{N}};{I}_{{X}_{N}}\epsilon \left[{I}_{{X}_{L}},{I}_{{X}_{U}}\right]\) and \({Y}_{N}={Y}_{L}+{Y}_{U}{I}_{{Y}_{N}};{I}_{{Y}_{N}}\epsilon \left[{I}_{{Y}_{L}},{I}_{{Y}_{U}}\right]\) be two neutrosophic random variables of size \({n}_{N}={n}_{L}+{n}_{U}{I}_{{n}_{N}};{I}_{{n}_{N}}\epsilon \left[{I}_{{n}_{L}},{I}_{{n}_{U}}\right]\) that follow the neutrosophic normal distribution with the neutrosophic means \({\mu }_{XN}={\mu }_{XL}+{\mu }_{XU}{I}_{{\mu }_{XN}};{I}_{{\mu }_{XN}}\epsilon \left[{I}_{{\mu }_{XL}},{I}_{{\mu }_{XU}}\right]\) and \({\mu }_{YN}={\mu }_{YL}+{\mu }_{YU}{I}_{{\mu }_{YN}};{I}_{{\mu }_{YN}}\epsilon \left[{I}_{{\mu }_{YL}},{I}_{{\mu }_{YU}}\right]\), and the neutrosophic standard deviations \({\sigma }_{XN}={\sigma }_{XL}+{\sigma }_{XU}{I}_{{\sigma }_{XN}};{I}_{{\sigma }_{XN}}\epsilon \left[{I}_{{\sigma }_{XL}},{I}_{{\sigma }_{XU}}\right]\) and \({\sigma }_{YN}={\sigma }_{YL}+{\sigma }_{YU}{I}_{{\sigma }_{YN}};{I}_{{\sigma }_{YN}}\epsilon \left[{I}_{{\sigma }_{YL}},{I}_{{\sigma }_{YU}}\right]\), respectively. Note that \({X}_{L},{Y}_{L},{n}_{L},{\mu }_{XL}\), \({\mu }_{YL}\) are the determinate parts, which correspond to classical statistics; \({X}_{U}{I}_{{X}_{N}},{Y}_{U}{I}_{{Y}_{N}},{n}_{U}{I}_{{n}_{N}},{\mu }_{XU}{I}_{{\mu }_{XN}},{\mu }_{YU}{I}_{{\mu }_{YN}}\) are the indeterminate parts; and \({I}_{{X}_{N}}\epsilon \left[{I}_{{X}_{L}},{I}_{{X}_{U}}\right],{I}_{{Y}_{N}}\epsilon \left[{I}_{{Y}_{L}},{I}_{{Y}_{U}}\right],{I}_{{n}_{N}}\epsilon \left[{I}_{{n}_{L}},{I}_{{n}_{U}}\right],{I}_{{\mu }_{XN}}\epsilon \left[{I}_{{\mu }_{XL}},{I}_{{\mu }_{XU}}\right],{I}_{{\mu }_{YN}}\epsilon \left[{I}_{{\mu }_{YL}},{I}_{{\mu }_{YU}}\right]\) are the measures of indeterminacy.
For designing of the proposed Z-test of a correlation coefficient, it is assumed that variance in \({X}_{N}={X}_{L}+{X}_{U}{I}_{{X}_{N}};{I}_{{X}_{N}}\epsilon \left[{I}_{{X}_{L}},{I}_{{X}_{U}}\right]\) should be independent from the variance in \({Y}_{N}={Y}_{L}+{Y}_{U}{I}_{{Y}_{N}};{I}_{{Y}_{N}}\epsilon \left[{I}_{{Y}_{L}},{I}_{{Y}_{U}}\right]\) . Suppose that \({r}_{N}={r}_{L}+{r}_{U}{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[{I}_{{r}_{L}},{I}_{{r}_{u}}\right]\) is neutrosophic correlation between \({X}_{N}={X}_{L}+{X}_{U}{I}_{{X}_{N}};{I}_{{X}_{N}}\epsilon \left[{I}_{{X}_{L}},{I}_{{X}_{U}}\right]\) and \({Y}_{N}={Y}_{L}+{Y}_{U}{I}_{{Y}_{N}};{I}_{{Y}_{N}}\epsilon \left[{I}_{{Y}_{L}},{I}_{{Y}_{U}}\right]\) that is given by the following [ 2 ] as
where \({\overline{X} }_{L}\) and \({\overline{X} }_{U}\) are the lower and upper values of the neutrosophic sample average. To investigate either correlation coefficient \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) differs significantly from the specified correlation \({r}_{0N}\epsilon \left[{r}_{0L},{r}_{0U}\right]\) , the null hypothesis that the correlation coefficient \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) is at least \({r}_{0N}\epsilon \left[{r}_{0L},{r}_{0U}\right]\) vs. the alternative hypothesis correlation coefficient \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) is at most \({r}_{0N}\epsilon \left[{r}_{0L},{r}_{0U}\right]\) . Using the Fisher’s transformation, the value of quantity \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is calculated as
The quantity \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) can be written as
Note that \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) follows the neutrosophic normal distribution. The mean, say \({\mu }_{{Z}_{1N}}\) of \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is given by
where \({\rho }_{0N}\) represents the specified value of the correlation coefficient.
The standard deviation, say \({\sigma }_{{Z}_{1N}}\) , of \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is given by
The neutrosophic test statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) is defined as
The statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) can be written as
Note that the proposed statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) is an extension of the existing \(Z\) -test. The proposed \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) reduces to the existing \(Z\) -test under classical statistics when \({I}_{{Z}_{L}}\) =0.
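The displayed equations of this section are not present in this copy. As a hedged sketch only, and assuming the proposed test applies the classical Fisher-transformation Z-test for a single correlation coefficient (the textbook form, as found for example in Kanji's 100 Statistical Tests) to each endpoint of the interval, the quantities named above would take the following standard forms. The endpoint-wise reading and the index \(k\) are assumptions made for illustration; this is not a verbatim restoration of the paper's equations.

```latex
% Fisher transformation of each endpoint of the sample correlation
\[
Z_{1k} = \frac{1}{2}\ln\!\left(\frac{1+r_{k}}{1-r_{k}}\right), \qquad k \in \{L, U\}
\]
% Mean and standard deviation of Z_1 under the null hypothesis
\[
\mu_{Z_{1N}} = \frac{1}{2}\ln\!\left(\frac{1+\rho_{0N}}{1-\rho_{0N}}\right), \qquad
\sigma_{Z_{1N}} = \frac{1}{\sqrt{n_{N}-3}}
\]
% Standardized test statistic at each endpoint
\[
Z_{k} = \frac{Z_{1k}-\mu_{Z_{1N}}}{\sigma_{Z_{1N}}}, \qquad k \in \{L, U\}
\]
```

With \({\rho }_{0N}=0.50\) the mean evaluates to about 0.5493, which agrees with the value reported in the Application section.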
Simulation studies
In this section, we will simulate the impact of the measure of indeterminacy on the type-I error, denoted by \(\alpha\) , which represents the probability of rejecting the null hypothesis when it is true. Additionally, we will investigate the way in which indeterminacy affects the power of the test \((1-\beta )\) , with \(\beta\) representing the probability of failing to reject the null hypothesis when it is false. Following the approach outlined by [ 28 ], we define the neutrosophic variable \({\rho }_{0N}\) as \({\rho }_{0L}+{\rho }_{0U}{I}_{N}\) , where \({\rho }_{0L}\) represents the determinate value of the correlation coefficient, \({\rho }_{0U}{I}_{N}\) is the indeterminate part, and \({I}_{N}\) belongs to the interval \(\left[{I}_{L},{I}_{U}\right]\) , representing the degree of indeterminacy. Our analysis will initially focus on the impact of \({I}_{N}\) on the type-I error and subsequently examine its effect on the power of the test.
Effect of \({{\varvec{I}}}_{{\varvec{N}}}\) on type-I error
We will examine the impact of the measure of indeterminacy on the type-I error through a simulation with \({10}^{6}\) replicates. Following the approach outlined in [ 20 ], the type-I error is computed as the ratio of the number of replicates in which the true null hypothesis is rejected to the total number of replicates. The type-I error values for both the classical statistics test and the proposed test, across various values of \({I}_{N}\) , are depicted in Fig. 1 . The lower curve in Fig. 1 represents the type-I error for the test using classical statistics, while the upper curve displays the values for the proposed test. Fig. 1 shows that, for the classical statistics test, the type-I error remains constant across all levels of indeterminacy. Conversely, the upper curve indicates an increase in the type-I error as the values of \({I}_{N}\) rise. This suggests a significant effect of the measure of indeterminacy on the evaluation of the type-I error, cautioning decision-makers to exercise care when making decisions regarding hypothesis testing in the presence of uncertainty.
Graphs illustrating the type-I error for both tests
Effect of \({{\varvec{I}}}_{{\varvec{N}}}\) on type-II error
We will assess how the degree of indeterminacy influences the test's efficacy through a simulation conducted a million times. Following the methodology outlined by [ 20 ], the type-II error is calculated as the ratio of incorrect decisions to the total number of replicates. Table 1 presents the type-II error values for both the conventional statistical test and the proposed test across various levels of \({I}_{N}\) . Figure 2 illustrates the trends in test power. In Fig. 2 , the lower curve represents the power of the test using classical statistics, while the upper curve depicts the power of the test for the proposed method. Figure 2 reveals that, for the classical statistics test, the power remains consistent regardless of the level of indeterminacy. In contrast, the higher curve indicates a decline in test power as \({I}_{N}\) values increase. This implies a significant impact of the measure of indeterminacy on test performance. This study suggests that, unlike the classical statistics test, the proposed test's power is affected by the degree of indeterminacy. Consequently, it is concluded that relying on the existing test under classical statistics may lead decision-makers astray when making decisions in the presence of uncertainty.
Power of the test curves
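The two simulation subsections above estimate error rates as the proportion of replicates in which the null hypothesis is rejected. A minimal Monte Carlo sketch of that idea for the classical Fisher-transformation test only is given below. The sample size (70), the true correlation used for the power run (0.2), the number of replicates (2000) and the function names are all assumptions for illustration; the paper does not state its simulation design in this text, and the effect of \({I}_{N}\) is not modelled here.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejection_rate(true_rho, rho0, n, reps, z_crit=1.645, seed=1):
    """Share of replicates rejecting H0: rho >= rho0 in favour of rho < rho0."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    mu = math.atanh(rho0)               # Fisher-transformed null value
    sigma = 1.0 / math.sqrt(n - 3)      # approximate standard deviation
    scale = math.sqrt(1.0 - true_rho ** 2)
    rejections = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            e = rng.gauss(0.0, 1.0)
            xs.append(x)
            ys.append(true_rho * x + scale * e)  # corr(x, y) = true_rho
        z = (math.atanh(pearson_r(xs, ys)) - mu) / sigma
        if z < -z_crit:                 # lower-tailed rejection region
            rejections += 1
    return rejections / reps


# Assumed settings: n = 70, rho0 = 0.5, 2000 replicates per scenario.
type1 = rejection_rate(true_rho=0.5, rho0=0.5, n=70, reps=2000)
power = rejection_rate(true_rho=0.2, rho0=0.5, n=70, reps=2000)
print(f"estimated type-I error: {type1:.3f}")
print(f"estimated power at rho = 0.2: {power:.3f}")
```

When the true correlation equals the null value, the rejection rate estimates the type-I error. When it is lower, the rejection rate estimates the power, and one minus that is the type-II error.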
Application
This section presents the application of the proposed \(Z\) -test for correlation using the heartbeat (HBT) and temperature (TMP) data. The medical decision-makers are interested in investigating the relationship between HBT and TMP. The primary and secondary healthcare department is responsible for the delivery of essential and effective health services in the province of Punjab, Pakistan. The Punjab Health Facilities Management Company (PHFMC) provides the required services on behalf of the health department. A Basic Health Unit (BHU) is the first-level health care unit, supervised by qualified doctors, and usually covers a population of around 10,000 to 25,000. The data comprise three months (June 2021 to August 2021) of daily records of patients who visited a BHU reporting gastritis (authenticated and reported by a qualified medical doctor). For each day, the minimum and maximum values among the patients who visited are recorded, so the data for the two variables are arranged in intervals. The schematic diagram is shown in Fig. 3 . The interval data of HBT and TMP are reported in Table 2 . Because the data in Table 2 are intervals, the medical decision-makers cannot apply the existing Z-test to investigate the significance of the correlation between HBT and TMP. The use of the proposed Z-test for correlation seems suitable for analyzing the HBT and TMP data.
The schematic diagram
The proposed Z-test for correlation using the HBT and TMP data is carried out as follows. The neutrosophic correlation \({r}_{N}\epsilon \left[{r}_{L},{r}_{U}\right]\) is calculated as \({r}_{N}\epsilon \left[\mathrm{0.2024,0.4333}\right]\) and expressed in neutrosophic form as \({r}_{N}=0.2024+0.4333{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[\mathrm{0,0.5329}\right]\) . The quantity \({Z}_{1N}\epsilon \left[{Z}_{1L},{Z}_{1U}\right]\) is calculated as \({Z}_{1N}\epsilon \left[\mathrm{0.2052,0.4640}\right]\) . The mean and standard deviation are calculated as \({\mu }_{{Z}_{1N}}\epsilon \left[\mathrm{0.5493,0.5493}\right]\) and \({\sigma }_{{Z}_{1N}}\epsilon \left[\mathrm{0.1222,0.1222}\right]\) , respectively. The proposed test statistic \(\left|{Z}_{N}\epsilon \left[{Z}_{L},{Z}_{U}\right]\right|\) is calculated as \({Z}_{N}=2.8162-0.6981{I}_{{Z}_{N}};{I}_{{Z}_{N}}\epsilon \left[\mathrm{0,3.0341}\right]\) . Suppose that the level of significance is \(\alpha\) = 0.05. The proposed test for investigating the relationship between HBT and TMP is implemented in the following steps.
Step 1: State null hypothesis \({H}_{0N}:\) correlation between HBT and TMP is \({r}_{0N}=0.50\) vs. the alternative hypothesis \({H}_{1N}:\) correlation between HBT and TMP is less than \({r}_{0N}=0.50\) .
Step 2: For \(\alpha\) =0.05, the tabulated value from [ 16 ] is 1.64.
Step 3: Compare \({Z}_{N}\epsilon [\mathrm{2.8162,0.6981}]\) with 1.64 and reject \({H}_{0N}\) if \({Z}_{N}\epsilon \left[\mathrm{2.8162,0.6981}\right]>1.64\) .
Comparing the values of \({Z}_{N}\epsilon \left[\mathrm{2.8162,0.6981}\right]\) with 1.64, the lower value \({Z}_{L}\) is larger than 1.64, so the null hypothesis \({H}_{0N}\) is rejected in favor of \({H}_{1N}\) . On the other hand, the upper value \({Z}_{U}\) is smaller than 1.64, so \({H}_{0N}\) is not rejected. From the analysis, it is clear that the determinate part, which represents the statistic under classical statistics, indicates that the correlation between HBT and TMP is less than 0.50. On the other hand, the indeterminate part indicates that the correlation between HBT and TMP is 0.50. Under uncertainty, it is expected that there will be significant correlation between HBT and TMP.
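The arithmetic behind the steps above can be checked with a short sketch. It assumes the classical Fisher-transformation formulas applied to each endpoint of the reported correlation interval, and a sample size of 70. That sample size is not stated in this text; it is inferred here from the reported standard deviation of 0.1222, since \(1/\sqrt{70-3}\approx 0.1222\). The function and constant names are hypothetical.

```python
import math


def fisher_z_test(r, rho0, n):
    """Return (Z1, mean, standard deviation, Z) for one correlation value."""
    z1 = math.atanh(r)                 # 0.5 * ln((1 + r) / (1 - r))
    mu = math.atanh(rho0)              # transformed hypothesized correlation
    sigma = 1.0 / math.sqrt(n - 3)     # approximate standard deviation of Z1
    return z1, mu, sigma, (z1 - mu) / sigma


N_ASSUMED = 70   # assumption inferred from sigma = 0.1222, not stated in the text
RHO0 = 0.50      # hypothesized correlation from Step 1
Z_CRIT = 1.64    # tabulated value from Step 2

results = {}
for label, r in (("lower", 0.2024), ("upper", 0.4333)):
    z1, mu, sigma, z = fisher_z_test(r, RHO0, N_ASSUMED)
    results[label] = (z1, mu, sigma, z)
    decision = "reject H0" if abs(z) > Z_CRIT else "do not reject H0"
    print(f"{label}: Z1={z1:.4f} mu={mu:.4f} sigma={sigma:.4f} "
          f"|Z|={abs(z):.4f} -> {decision}")
```

Under these assumptions the sketch reproduces the reported values to about three decimal places. The small differences in the last digit come from rounding of intermediate values.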
Comparative studies based on HBT and TMP data
Based on the analysis of HBT and TMP data, the comparisons of the proposed Z-test for correlation are carried out with the existing Z-test for correlation using the classical statistics, fuzzy-based test and interval-statistics in terms of information and flexibility of the results. The neutrosophic forms of the correlation \({r}_{N}\) and the test statistic \(\left|{Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\right|\) are presented as \({r}_{N}=0.2024+0.4333{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[\mathrm{0,0.5329}\right]\) and \({Z}_{N}=2.8162-0.6981{I}_{{Z}_{N}};{I}_{{Z}_{N}}\epsilon \left[\mathrm{0,3.0341}\right]\) , respectively. From the results, it can be analyzed that under indeterminacy, the correlation between HBT and TMP may vary from 0.2024 to 0.4333. The values of statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) may vary from \(2.8162\) to \(0.6981\) . Note that as mentioned before the first values 0.2024, and \(2.8162\) present the results of the Z-test for correlation using classical statistics. The statistical test using classical statistics states that the probability of rejecting \({H}_{0N}:\) the correlation between HBT and TMP is \({r}_{0N}=0.50\) when it is true is 0.05 and the probability of accepting \({H}_{0N}:\) a correlation between HBT and TMP is \({r}_{0N}=0.50\) is 0.95. On the other hand, the proposed test for correlation states that the probability of rejecting \({H}_{0N}:\) the correlation between HBT and TMP is \({r}_{0N}=0.50\) when it is true is 0.05, the probability of accepting \({H}_{0N}:\) correlation between HBT and TMP is \({r}_{0N}=0.50\) is 0.95 and the measure of indeterminacy/uncertainty associated with the decision is \(3.0341\) . Similarly, the Z-test using fuzzy-logic gives the information about the statistic \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) in intervals only. 
According to the fuzzy-based statistical test, it can be expected that the values of \({Z}_{N}\in \left[{Z}_{L},{Z}_{U}\right]\) may vary from \(2.8162\) to \(0.6981\) . The fuzzy-based analysis and interval-analysis only give information in intervals and are unable to give any information about the measure of indeterminacy. From the comparative studies, it is concluded that the proposed Z-test for correlation is more informative than the test using classical statistics, fuzzy-based analysis and interval-based analysis.
Discussions based on HBT and TMP data
The main aim of the paper is to investigate the significance of the relationship between HBT and TMP. The neutrosophic form of the correlation between HBT and TMP is \({r}_{N}=0.2024+0.4333{I}_{{r}_{N}};{I}_{{r}_{N}}\epsilon \left[\mathrm{0,0.5329}\right]\) . The correlation analysis shows that the correlation between HBT and TMP may vary from 0.2024 to 0.4333. As mentioned earlier, the value 0.2024 denotes the correlation using classical statistics. The value \(0.4333{I}_{{r}_{N}}\) denotes the correlation related to the indeterminate part. The correlation of the determinate part, 0.2024, indicates a weak correlation between HBT and TMP. It means that the increase in TMP does not increase the HBT significantly. Under indeterminacy, the correlation of the indeterminate part is 0.4333. This means that there is a moderate correlation between HBT and TMP. It means that the increase in TMP may increase HBT. From the correlation analysis, it can be seen that although the correlation of the determinate part is insignificant, an increase in the measure of indeterminacy can increase the correlation between HBT and TMP. Therefore, the decision makers should be careful in dealing with patients with HBT- and TMP-related conditions.
Concluding remarks
The paper discussed the adaptation of the Z-test of correlation through the application of neutrosophic statistics. It provided an explanation of the rationale behind employing the proposed Z-test and detailed the neutrosophic test statistic along with the corresponding implementation steps. The simulation study conducted in the paper led to the conclusion that there is a notable impact of indeterminacy on both the type-I error and the power of the test. The paper demonstrated the application of the proposed test using data from HBT and TMP intervals. The findings revealed that, in the presence of indeterminacy or when dealing with interval data, the correlation between HBT and TMP increases as the measure of indeterminacy rises. The analysis suggests that decision-makers can effectively use the proposed test to explore correlations between variables in diverse fields such as medical science, business, and industry. Additionally, the paper suggested avenues for future research, including the exploration of the proposed test using a resampling scheme. It also recommended further investigation into additional statistical properties as potential areas for future research.
Availability of data and materials
The data is given in the paper.
References
Aslam M. A new method to analyze rock joint roughness coefficient based on neutrosophic statistics. Measurement. 2019;146:65–71.
Aslam M, Albassam M. Application of neutrosophic logic to evaluate correlation between prostate cancer mortality and dietary fat assumption. Symmetry. 2019;11(3):330.
Aslam M, Arif OH, Sherwani RAK. New diagnosis test under the neutrosophic statistics: an application to diabetic patients. BioMed Res Int. 2020. https://doi.org/10.1155/2020/2086185 .
Aslam M. Chi-square test under indeterminacy: an application using pulse count data. BMC Med Res Methodol. 2021;21:1–5.
Aslam M. Assessing the significance of relationship between metrology variables under indeterminacy. Mapan. 2021;37:119–24.
Avuçlu E. COVID-19 detection using X-ray images and statistical measurements. Measurement. 2022;201: 111702.
Bellolio MF, Serrano LA, Stead LG. Understanding statistical tests in the medical literature: which test should I use? Int J Emerg Med. 2008;1(3):197–9.
Broumi S, Deli I. Correlation measure for neutrosophic refined sets and its application in medical diagnosis. Infinite Study; 2015.
Broumi S, Smarandache F. Correlation coefficient of interval neutrosophic set. Appl Mech Mater. 2013;436:511–7.
Chen J, Talha M. Audit data analysis and application based on correlation analysis algorithm. Comput Math Methods Med. 2021;2021:1–11.
Chen J, Ye J, Du S. Scale effect and anisotropy analyzed for neutrosophic numbers of rock joint roughness coefficient based on neutrosophic statistics. Symmetry. 2017;9(10):208.
Chen J, Ye J, Du S, Yong R. Expressions of rock joint roughness coefficient using neutrosophic interval statistical numbers. Symmetry. 2017;9(7):123.
Gordon Lan K, Soo Y, Siu C, Wang M. The use of weighted Z-tests in medical research. J Biopharm Stat. 2005;15:625–39.
Grzesiek A, Zimroz R, Śliwiński P, Gomolla N, Wyłomańska A. Long term belt conveyor gearbox temperature data analysis–statistical tests for anomaly detection. Measurement. 2020;165: 108124.
Janse RJ, Hoekstra T, Jager KJ, Zoccali C, Tripepi G, Dekker FW, van Diepen M. Conducting correlation analysis: important limitations and pitfalls. Clin Kidney J. 2021;14:2332–7.
Kanji GK. 100 statistical tests. Sage; 2006.
Kc B. A note on the application of advanced statistical methods in medical research. Biomed J Sci Tech Res. 2018;11(2):8476–9.
Lin L, Wu F, Chen W, Zhu C, Huang T. Research on urban medical and health services efficiency and its spatial correlation in china: based on panel data of 13 cities in Jiangsu Province. Paper presented at the Healthcare; 2021.
Mukaka MM. A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69–71.
Nosakhare UH, Bright AF. Statistical analysis of strength of W/S test of normality against non-normal distribution using Monte Carlo simulation. Am J Theor Appl Stat. 2017;6(5–1):62–5.
Pandey R. Commonly used t-tests in medical research. J Pract Cardiovasc Sci. 2015;1(2):185.
Rivas T, Pozo-Antonio J, Barral D, Martínez J, Cardell C. Statistical analysis of colour changes in tempera paints mock-ups exposed to urban and marine environment. Measurement. 2018;118:298–310.
Schober P, Boer C, Schwarte LA. Correlation coefficients: appropriate use and interpretation. Anesth Analg. 2018;126(5):1763–8.
Sherwani RAK, Shakeel H, Saleem M, Awan WB, Aslam M, Farooq M. A new neutrosophic sign test: an application to COVID-19 data. PLoS ONE. 2021;16(8): e0255671.
Smarandache F. Neutrosophy Neutrosophic probability, set, and logic, proQuest information & learning, vol. 105. Infinty Study; 1998. p. 118–23.
Smarandache F. Introduction to neutrosophic statistics. Infinite Study; 2014.
Sofińska K, Cieśla M, Barbasz J, Wilkosz N, Lipiec E, Szymoński M, Białas P. Double-strand breaks quantification by statistical length analysis of DNA fragments imaged with AFM. Measurement. 2022;198: 111362.
Wu B, Chang S-K. On testing hypothesis of fuzzy sample mean. Jpn J Ind Appl Math. 2007;24:197–209.
Yazici AC, Öğüş E, Ankarali H, Gürbüz F. An application of nonlinear canonical correlation analysis on medical data. Turk J Med Sci. 2010;40(3):503–10.
Zhang D, Zhao M, Wei G, Chen X. Single-valued neutrosophic TODIM method based on cumulative prospect theory for multi-attribute group decision making and its application to medical emergency management evaluation. Econ Res Ekonomska Istraživanja. 2021;35:4520–36.
Zhang Y, Chen Z, Zhu Z, Wang X. A sampling method for blade measurement based on statistical analysis of profile deviations. Measurement. 2020;163: 107949.
Zhuang X, Yang Z, Cordes D. A technical review of canonical correlation analysis for neuroscience applications. Hum Brain Map. 2020;41(13):3807–33.
Acknowledgements
We express our gratitude to the editor and reviewers for their valuable suggestions, which have significantly contributed to enhancing the quality and presentation of the paper.
Author information
Authors and affiliations.
Department of Statistics, Faculty of Science, King Abdulaziz University, 21551, Jeddah, Saudi Arabia
Muhammad Aslam
Contributions
MA wrote the paper.
Corresponding author
Correspondence to Muhammad Aslam .
Ethics declarations
Ethics approval and consent to participate.
Not applicable.
Consent for publication
Competing interests.
No conflict of interest regarding the paper.
Additional information
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Cite this article.
Aslam, M. Analysis of imprecise measurement data utilizing z-test for correlation. J Big Data 11 , 4 (2024). https://doi.org/10.1186/s40537-023-00873-7
Received : 14 February 2023
Accepted : 20 December 2023
Published : 02 January 2024
DOI : https://doi.org/10.1186/s40537-023-00873-7
- Correlation
- Medical data
- Imprecise data
Statistics By Jim
Making statistics intuitive
Z Test: Uses, Formula & Examples
By Jim Frost
What is a Z Test?
Use a Z test when you need to compare group means. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ.
A Z test is a form of inferential statistics . It uses samples to draw conclusions about populations.
For example, use Z tests to assess the following:
- One sample : Do students in an honors program have an average IQ score different than a hypothesized value of 100?
- Two sample : Do two IQ boosting programs have different mean scores?
In this post, learn about when to use a Z test vs T test. Then we’ll review the Z test’s hypotheses, assumptions, interpretation, and formula. Finally, we’ll use the formula in a worked example.
Related post : Difference between Descriptive and Inferential Statistics
Z test vs T test
Z tests and t tests are similar. They both assess the means of one or two groups, have similar assumptions, and allow you to draw the same conclusions about population means.
However, there is one critical difference.
Z tests require you to know the population standard deviation, while t tests use a sample estimate of the standard deviation. Learn more about Population Parameters vs. Sample Statistics .
In practice, analysts rarely use Z tests because it’s rare that they’ll know the population standard deviation. It’s even rarer that they’ll know it and yet need to assess an unknown population mean!
A Z test is often the first hypothesis test students learn because its results are easier to calculate by hand and it builds on the standard normal distribution that they probably already understand. Additionally, students don’t need to know about the degrees of freedom .
Z and T test results converge as the sample size approaches infinity. Indeed, for sample sizes greater than 30, the differences between the two analyses become small.
William Sealy Gosset developed the t test specifically to account for the additional uncertainty associated with smaller samples. Conversely, Z tests are too sensitive to mean differences in smaller samples and can produce statistically significant results incorrectly (i.e., false positives).
When to use a T Test vs Z Test
Let’s put a button on it.
When you know the population standard deviation, use a Z test.
When you have a sample estimate of the standard deviation, which will be the vast majority of the time, the best statistical practice is to use a t test regardless of the sample size.
However, the difference between the two analyses becomes trivial when the sample size exceeds 30.
Learn more about a T-Test Overview: How to Use & Examples and How T-Tests Work .
Z Test Hypotheses
This analysis uses sample data to evaluate hypotheses that refer to population means (µ). The hypotheses depend on whether you’re assessing one or two samples.
One-Sample Z Test Hypotheses
- Null hypothesis (H 0 ): The population mean equals a hypothesized value (µ = µ 0 ).
- Alternative hypothesis (H A ): The population mean DOES NOT equal a hypothesized value (µ ≠ µ 0 ).
When the p-value is less than or equal to your significance level (e.g., 0.05), reject the null hypothesis. The difference between your sample mean and the hypothesized value is statistically significant. Your sample data support the notion that the population mean does not equal the hypothesized value.
Related posts : Null Hypothesis: Definition, Rejecting & Examples and Understanding Significance Levels
Two-Sample Z Test Hypotheses
- Null hypothesis (H 0 ): Two population means are equal (µ 1 = µ 2 ).
- Alternative hypothesis (H A ): Two population means are not equal (µ 1 ≠ µ 2 ).
Again, when the p-value is less than or equal to your significance level, reject the null hypothesis. The difference between the two means is statistically significant. Your sample data support the idea that the two population means are different.
These hypotheses are for two-sided analyses. You can use one-sided, directional hypotheses instead. Learn more in my post, One-Tailed and Two-Tailed Hypothesis Tests Explained .
Related posts : How to Interpret P Values and Statistical Significance
Z Test Assumptions
For reliable results, your data should satisfy the following assumptions:
You have a random sample
Drawing a random sample from your target population helps ensure that the sample represents the population. Representative samples are crucial for accurately inferring population properties. The Z test results won’t be valid if your data do not reflect the population.
Related posts : Random Sampling and Representative Samples
Continuous data
Z tests require continuous data . Continuous variables can assume any numeric value, and the scale can be divided meaningfully into smaller increments, such as fractional and decimal values. For example, weight, height, and temperature are continuous.
Other analyses can assess additional data types. For more information, read Comparing Hypothesis Tests for Continuous, Binary, and Count Data .
Your sample data follow a normal distribution, or you have a large sample size
All Z tests assume your data follow a normal distribution . However, due to the central limit theorem, you can ignore this assumption when your sample is large enough.
The following sample size guidelines indicate when normality becomes less of a concern:
- One-Sample : 20 or more observations.
- Two-Sample : At least 15 in each group.
Related posts : Central Limit Theorem and Skewed Distributions
Independent samples
For the two-sample analysis, the groups must contain different sets of items. This analysis compares two distinct samples.
Related post : Independent and Dependent Samples
Population standard deviation is known
As I mention in the Z test vs T test section, use a Z test when you know the population standard deviation. However, when n > 30, the difference between the analyses becomes trivial.
Related post : Standard Deviations
Z Test Formula
These Z test formulas allow you to calculate the test statistic. Use the Z statistic to determine statistical significance by comparing it to the appropriate critical values and use it to find p-values.
The correct formula depends on whether you’re performing a one- or two-sample analysis. Both formulas require sample means (x̅) and sample sizes (n) from your sample. Additionally, you specify the population standard deviation (σ) or variance (σ 2 ), which does not come from your sample.
I present a worked example using the Z test formula at the end of this post.
Learn more about Z-Scores and Test Statistics .
One Sample Z Test Formula
The one sample Z test formula is a ratio.
The numerator is the difference between your sample mean and a hypothesized value for the population mean (µ 0 ). This value is often a strawman argument that you hope to disprove.
The denominator is the standard error of the mean. It represents the uncertainty in how well the sample mean estimates the population mean.
Learn more about the Standard Error of the Mean .
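The formula itself is not shown in this copy. Based on the description above (the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean), the standard one-sample Z statistic can be written as follows. The symbols match those defined earlier in the post.

```latex
\[
Z = \frac{\bar{x} - \mu_{0}}{\sigma / \sqrt{n}}
\]
```

Here \(\sigma / \sqrt{n}\) is the standard error of the mean, so a larger sample shrinks the denominator and makes the same mean difference produce a larger Z.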
Two Sample Z Test Formula
The two sample Z test formula is also a ratio.
The numerator is the difference between your two sample means.
The denominator calculates the pooled standard error of the mean by combining both samples. In this Z test formula, enter the population variances (σ 2 ) for each sample.
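The two-sample formula is also missing from this copy. A minimal sketch of the standard calculation follows, matching the description above: the difference in sample means divided by the square root of the summed variance-over-sample-size terms. The function name and the input numbers are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Z = (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2).

    var1 and var2 are the known population variances (sigma squared).
    """
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical inputs: two groups of 50, population standard deviation 15 each.
z_two = two_sample_z(mean1=105, mean2=100, var1=15**2, var2=15**2, n1=50, n2=50)
print(f"Z = {z_two:.4f}")
```

With these illustrative inputs the standard error is 3, so the five-point difference in means gives a Z statistic of about 1.667.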
Z Test Critical Values
As I mentioned in the Z vs T test section, a Z test does not use degrees of freedom. It evaluates Z-scores in the context of the standard normal distribution. Unlike the t-distribution , the standard normal distribution doesn’t change shape as the sample size changes. Consequently, the critical values don’t change with the sample size.
To find the critical value for a Z test, you need to know the significance level and whether it is one- or two-tailed.
Significance Level | Type of Test | Critical Value |
0.01 | Two-Tailed | ±2.576 |
0.01 | Left Tail | –2.326 |
0.01 | Right Tail | +2.326 |
0.05 | Two-Tailed | ±1.960 |
0.05 | Left Tail | –1.645 |
0.05 | Right Tail | +1.645 |
Learn more about Critical Values: Definition, Finding & Calculator .
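Critical values like these can also be computed instead of looked up. The sketch below uses the inverse CDF of the standard normal distribution from Python's standard library. The helper name `z_critical` and its `tail` labels are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_critical(alpha, tail):
    """Critical Z value for a significance level and tail type.

    For tail="two" the positive value is returned; the region is +/- that value.
    """
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")


for alpha in (0.01, 0.05):
    for tail in ("two", "left", "right"):
        print(f"alpha={alpha} {tail:>5}: {z_critical(alpha, tail):+.3f}")
```

A left-tailed critical value is negative and a right-tailed one is positive, because the rejection region sits in the tail named by the alternative hypothesis.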
Z Test Worked Example
Let’s close this post by calculating the results for a Z test by hand!
Suppose we randomly sampled subjects from an honors program. We want to determine whether their mean IQ score differs from the general population. The general population’s IQ scores are defined as having a mean of 100 and a standard deviation of 15.
We’ll determine whether the difference between our sample mean and the hypothesized population mean of 100 is statistically significant.
Specifically, we’ll use a two-tailed analysis with a significance level of 0.05. Looking at the table above, you’ll see that this Z test has critical values of ± 1.960. Our results are statistically significant if our Z statistic is below –1.960 or above +1.960.
The hypotheses are the following:
- Null (H 0 ): µ = 100
- Alternative (H A ): µ ≠ 100
Entering Our Results into the Formula
Here are the values from our study that we need to enter into the Z test formula:
- IQ score sample mean (x̅): 107
- Sample size (n): 25
- Hypothesized population mean (µ 0 ): 100
- Population standard deviation (σ): 15
The Z-score is 2.333. This value is greater than the critical value of 1.960, making the results statistically significant. Below is a graphical representation of our Z test results showing how the Z statistic falls within the critical region.
We can reject the null and conclude that the mean IQ score for the population of honors students does not equal 100. Based on the sample mean of 107, we know their mean IQ score is higher.
Now let’s find the p-value. We could use technology to do that, such as an online calculator. However, let’s go old school and use a Z table.
To find the p-value that corresponds to a Z-score from a two-tailed analysis, we need to look up the area for the negative value of our Z-score (even when the Z-score is positive) and double that area.
In the truncated Z-table below, I highlight the cell corresponding to a Z-score of -2.33.
The cell value of 0.00990 represents the area or probability to the left of the Z-score -2.33. We need to double it to include the area > +2.33 to obtain the p-value for a two-tailed analysis.
P-value = 0.00990 * 2 = 0.0198
That p-value is an approximation because it uses a Z-score of 2.33 rather than 2.333. Using an online calculator, the p-value for our Z test is a more precise 0.0196. This p-value is less than our significance level of 0.05, which reconfirms the statistically significant results.
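For readers who prefer to let software do the lookup, here is a minimal Python sketch of the same calculation. It uses only the standard library; the function names `z_statistic` and `two_tailed_p` are illustrative choices, not part of any particular package.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample Z statistic: (sample mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def two_tailed_p(z):
    """Two-tailed p-value: twice the standard normal area beyond |z|."""
    # P(Z > |z|) = 0.5 * erfc(|z| / sqrt(2)); doubling it leaves erfc(...)
    return math.erfc(abs(z) / math.sqrt(2))

# Values from the IQ example: mean 107, mu0 100, sigma 15, n 25
z = z_statistic(107, 100, 15, 25)
p = two_tailed_p(z)
print(f"Z = {z:.3f}, p = {p:.4f}")
```

Because the code works with the unrounded Z-score, its p-value should agree with the calculator figure rather than with the slightly larger table-based approximation.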
See my full Z-table , which explains how to use it to solve other types of problems.
The Derivation and Choice of Appropriate Test Statistic (Z, t, F and Chi-Square Test) in Research Methodology
Teshome Hailemeskel Abebe
Department of Economics, Ambo University, Ambo, Ethiopia
Contributor Roles: Teshome Hailemeskel Abebe is the sole author. The author read and approved the final manuscript.
The main objective of this paper is to choose an appropriate test statistic for research methodology. Specifically, this article explores the concept of the statistical hypothesis test, the derivation of the test statistic, and its role in research methodology. It also shows the basics of formulating and testing a hypothesis using a test statistic, since choosing an appropriate test statistic is the most important tool of research. To test a hypothesis, various statistical tests were identified: the Z-test, Student’s t-test, the F test (as in ANOVA), and the chi-square test. In testing the mean of a population or comparing the means of two continuous populations, the Z-test and t-test were used, while the F test is used for comparing more than two means and for testing equality of variances. The chi-square test was used for testing independence, goodness of fit, and the population variance of a single sample in categorical data. Therefore, choosing an appropriate test statistic gives valid results in hypothesis testing.
Published in | Mathematics Letters (Volume 5, Issue 3) |
DOI | 10.11648/j.ml.20190503.11 |
Page(s) | 33-40 |
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright | Copyright © The Author(s), 2020. Published by Science Publishing Group |
Test Statistic, Z-test, Student’s t-test, F Test (Like ANOVA), Chi Square Test, Research Methodology
[1] | Alena Košťálová. (2013). Proceedings of the 10th International Conference “Reliability and Statistics in Transportation and Communication” (RelStat’10), 20–23 October 2010, Riga, Latvia, p. 163-171. ISBN 978-9984-818-34-4 Transport and Telecommunication Institute, Lomonosova 1, LV-1019, Riga, Latvia. |
[2] | Banda Gerald. (2018). A Brief Review of Independent, Dependent and One Sample t-test. International Journal of Applied Mathematics and Theoretical Physics. 4 (2), pp. 50-54. doi: 10.11648/j.ijamtp.20180402.13. |
[3] | David, J. Pittenger. (2001). Hypothesis Testing as a Moral Choice. Ethics & Behavior, 11 (2), 151-162, DOI: 10.1207/S15327019EB1102_3. |
[4] | DOWNWARD, L. B., LUKENS, W. W. & BRIDGES, F. (2006). A Variation of the F-test for Determining Statistical Relevance of Particular Parameters in EXAFS Fits. 13th International Conference on X-ray Absorption Fine Structure, 129-131. |
[5] | J. P. Verma. (2013). Data Analysis in Management with SPSS Software, DOI 10.1007/978-81-322-0786-3_7, Springer, India. |
[6] | Joginder Kaur. (2013). Techniques Used in Hypothesis Testing in Research Methodology A Review. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value: 6.14 | Impact Factor: 4.438. |
[7] | Kousar, J. B. and Azeez Ahmed. (2015). The Importance of Statistical Tools in Research Work. International Journal of Scientific and Innovative Mathematical Research (IJSIMR) Volume 3, Issue 12, PP 50-58 ISSN 2347-307X (Print) & ISSN 2347-3142 (Online). |
[8] | Liang, J. (2011). Testing the Mean for Business Data: Should One Use the Z-Test, T-Test, F-Test, The Chi-Square Test, Or The P-Value Method? Journal of College Teaching and Learning, 3 (7), doi: 10.19030/tlc.v3i7.1704. |
[9] | LING, M. (2009b). Compendium of Distributions: Beta, Binomial, Chi-Square, F, Gamma, Geometric, Poisson, Student's t, and Uniform. The Python Papers Source Codes 1:4. |
[10] | MCDONALD, J. H. (2008). Handbook of Biological Statistics. Baltimore, Sparky House Publishing. |
[11] | Pallant, J. (2007). SPSS Survival Manual: A Step by Step to Data Analysis Using SPSS for Windows (Version 15). Sydney: Allen and Unwin. |
[12] | Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables that arisen from random sampling. Philos. Mag. 1900, 50, 157-175. Reprinted in K. Pearson (1956), pp. 339-357. |
[13] | Philip, E. Crewson.(2014). Applied Statistics, First Edition. United States Department of Justice. https://www.researchgate.net/publication/297394168. |
[14] | Sorana, D. B., Lorentz, J., Adriana, F. S., Radu, E. S. and Doru, C. P. (2011). Pearson-Fisher Chi-Square Statistic Revisited. Journal of Information; 2, 528-545; doi: 10.3390/info2030528; ISSN 2078-2489, www.mdpi.com/journal/information. |
[15] | Sureiman Onchiri.(2013). Conceptual model on application of chi-square test in education and social sciences. Academic Journals, Educational Research and Reviews; 8 (15); 1231-1241; DOI: 10.5897/ERR11.0305; ISSN 1990-3839.http://www.academicjournals.org/ERR. |
[16] | Tae Kyun Kim. (2015). T test as a parametric statistic. Korean Society of Anesthesiologists. pISSN 2005-6419, eISSN 2005-7563. |
[17] | Vinay Pandit. (2015). A Study on Statistical Z Test to Analyses Behavioral Finance Using Psychological Theories, Journal of Research in Applied Mathematics Volume 2~ Issue 1, pp: 01-07 ISSN (Online), 2394-0743 ISSN (Print): 2394-0735. www.questjournals.org. |
Teshome Hailemeskel Abebe. (2020). The Derivation and Choice of Appropriate Test Statistic (Z, t, F and Chi-Square Test) in Research Methodology. Mathematics Letters , 5 (3), 33-40. https://doi.org/10.11648/j.ml.20190503.11
Z-Test for Statistical Hypothesis Testing Explained
The Z-test is a statistical hypothesis test used when the test statistic we are measuring, like the mean, follows a normal distribution. It determines whether the observed value of that statistic is plausible under the null hypothesis.
There are multiple types of Z-tests. However, we’ll focus on the simplest and most well-known one, the one-sample mean test. This is used to determine if the difference between the mean of a sample and the mean of a population is statistically significant.
What Is a Z-Test?
A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution.
The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations a raw score or sample statistic is from the population’s mean.
Z-tests are among the most common statistical tests conducted in fields such as healthcare and data science. Therefore, the Z-test is an essential concept to understand.
Requirements for a Z-Test
In order to conduct a Z-test, your statistics need to meet a few requirements, including:
- A sample size that’s greater than 30. This is because we want the sample mean to come from a distribution that is approximately normal. By the central limit theorem, the sampling distribution of the mean approaches a normal distribution as the sample size grows, and 30 data points is a common rule of thumb for when that approximation is adequate.
- The standard deviation and mean of the population are known.
- The sample data is collected/acquired randomly .
Z-Test Steps
There are four steps to complete a Z-test. Let’s examine each one.
4 Steps to a Z-Test
- State the null hypothesis.
- State the alternate hypothesis.
- Choose your critical value.
- Calculate your Z-test statistic.
1. State the Null Hypothesis
The first step in a Z-test is to state the null hypothesis, H_0. This is what you believe to be true of the population, for example that the population mean equals a specific value, μ_0:

$$H_0: \mu = \mu_0$$
2. State the Alternate Hypothesis
Next, state the alternate hypothesis, H_1. This is the claim that your sample may provide evidence for. If we suspect that the mean differs from the hypothesized value, then we say the mean is not equal to μ_0:

$$H_1: \mu \neq \mu_0$$
3. Choose Your Critical Value
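Then, choose your significance level, α, which sets the critical value used to decide whether you reject or fail to reject the null hypothesis. Typically for a Z-test we would use a significance level of 5 percent, which for a two-tailed test is z = +/- 1.96 standard deviations from the population’s mean in the normal distribution:

$$z_{\alpha/2} = \pm 1.96$$

This critical value matches the bounds of a 95 percent confidence interval: 95 percent of the standard normal distribution lies between -1.96 and +1.96.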
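To see where a figure like 1.96 comes from, the sketch below derives critical values from the significance level using `statistics.NormalDist` from Python's standard library. The helper name `critical_value` and its `tails` parameter are assumptions made for this illustration.

```python
from statistics import NormalDist

def critical_value(alpha, tails=2):
    """Positive critical Z value for a significance level alpha.

    tails=2 splits alpha across both tails; tails=1 puts it all in one tail.
    (Hypothetical helper, for illustration only.)
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # inv_cdf returns the z below which the given probability mass lies
    return NormalDist().inv_cdf(1 - alpha / tails)

print(round(critical_value(0.05, tails=2), 3))  # two-tailed test
print(round(critical_value(0.05, tails=1), 3))  # one-tailed test
```

Note that the one-tailed critical value is smaller than the two-tailed one at the same significance level, because all of α sits in a single tail.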
4. Calculate Your Z-Test Statistic
Compute the Z-test statistic using the sample mean, μ_1, the population mean, μ_0, the number of data points in the sample, n, and the population’s standard deviation, σ:

$$Z = \frac{\mu_1 - \mu_0}{\sigma / \sqrt{n}}$$

If the test statistic is greater (or lower, depending on the test we are conducting) than the critical value, then we reject the null hypothesis in favor of the alternate hypothesis, because the sample’s mean differs from the population mean by a statistically significant amount.
Another way to think about this is that if the sample mean is very far from the population mean, then either the alternate hypothesis is true or the sample is a rare anomaly.
Z-Test Example
Let’s go through an example to fully understand the one-sample mean Z-test.
A school says that its pupils are, on average, smarter than those at other schools. It takes a sample of 50 students whose average IQ is measured to be 110. The population, or the rest of the schools, has an average IQ of 100 and a standard deviation of 20. Is the school’s claim correct?
The null and alternate hypotheses are:

$$H_0: \mu = 100 \qquad H_1: \mu > 100$$

Here the alternate hypothesis says that our sample, the school, has a higher mean IQ than the population mean.
Now, this is what’s called a right-sided, one-tailed test, as the claim is that the school’s mean is greater than the population’s mean. So, choosing a significance level of 5 percent, which for a one-tailed test equals a critical Z-score of 1.645 (the 1.96 figure applies to two-tailed tests), we can only reject the null hypothesis if our Z-test statistic is greater than 1.645.
If the school claimed its students’ IQs were an average of 90, then we would use a left-tailed test. We would then only reject the null hypothesis if our Z-test statistic is less than -1.645.
Computing our Z-test statistic, we see:

$$Z = \frac{110 - 100}{20 / \sqrt{50}} \approx 3.54$$

Therefore, we have sufficient evidence to reject the null hypothesis, and the data support the school’s claim.
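The school example can also be checked numerically. The sketch below is a minimal standard-library version; `one_sample_z` and `right_tail_p` are hypothetical helper names, and the right-tail p-value is an addition for illustration that the article itself does not report.

```python
import math

def one_sample_z(sample_mean, mu0, sigma, n):
    """Z statistic for a one-sample mean test with known sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def right_tail_p(z):
    """P(Z > z) under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# School example: sample mean 110, population mean 100, sigma 20, n 50
z = one_sample_z(110, 100, 20, 50)
p = right_tail_p(z)
print(f"Z = {z:.2f}, one-tailed p = {p:.5f}")
```

A statistic of roughly 3.5 lies well beyond the usual 5 percent cutoffs for both one-tailed and two-tailed tests, so the decision to reject the null hypothesis does not depend on which of the two is used here.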
Hope you enjoyed this article on Z-tests. In this post, we only addressed the simplest case, the one-sample mean test. There are other types of Z-tests, but they all follow the same process with some small nuances.
Statistical notes for clinical researchers: the independent samples t-test
Hae-Young Kim
Department of Health Policy and Management, College of Health Science, and Department of Public Health Science, Graduate School, Korea University, Seoul, Korea.
The t-test is frequently used in comparing 2 group means. The compared groups may be independent of each other, such as men and women. Otherwise, the compared data are correlated, as in a comparison of blood pressure levels from the same person before and after medication (Figure 1). In this section we will focus on the independent t-test only. There are 2 kinds of independent t-test, depending on whether the 2 group variances can be assumed equal or not. The t-test is based on inference using the t-distribution.
T-DISTRIBUTION
The t-distribution was invented in 1908 by William Sealy Gosset, who was working for the Guinness brewery in Dublin, Ireland. As the Guinness brewery did not permit its employees to publish research results related to their work, Gosset published his findings under a pseudonym, “Student.” Therefore, the distribution he suggested was called Student's t-distribution. The t-distribution is similar to the standard normal distribution, the z-distribution, but has a lower peak and higher tails (Figure 2).
According to sampling theory, when samples are drawn from a normally distributed population, the distribution of sample means is expected to be a normal distribution. When we know the population variance, $\sigma^2$, we can define the distribution of sample means as a normal distribution and adopt the z-distribution in statistical inference. However, in reality we generally do not know $\sigma^2$, so we use the sample variance, $s^2$, instead. Although $s^2$ is the best estimator for $\sigma^2$, its accuracy depends on the sample size. When the sample size is large enough (e.g., n = 300), we expect the sample variance to be very similar to the population variance. However, when the sample size is small, such as n = 10, the accuracy of the sample variance may not be that high. The t-distribution reflects this difference in uncertainty according to sample size. Therefore, the shape of the t-distribution changes with the degrees of freedom (df), which is the sample size minus one (n − 1) when one sample mean is tested.
The t-distribution is a family of distributions whose shape varies according to df (Figure 2). When df is smaller, the t-distribution has a lower peak and higher tails compared to those with higher df. The shape of the t-distribution approaches the z-distribution as df increases. When the sample gets large enough, e.g., n = 300, the t-distribution is almost identical to the z-distribution. For inferences about means using small samples, it is necessary to apply the t-distribution, while similar inferences can be obtained with either the t-distribution or the z-distribution for a large sample. For inference about 2 means, we generally use the t-test based on the t-distribution regardless of sample size because it is always safe, not only for a test with small df but also for one with large df.
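The lower peak at small df, and its convergence toward the z-distribution, can be illustrated numerically. The sketch below evaluates the standard closed-form t density at zero with Python's standard library; the function name `t_pdf_at_zero` and the particular df values are choices made for this illustration.

```python
import math

def t_pdf_at_zero(df):
    """Height of Student's t density at t = 0 for the given degrees of freedom."""
    # Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), via lgamma for stability
    log_ratio = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_ratio) / math.sqrt(df * math.pi)

normal_peak = 1 / math.sqrt(2 * math.pi)  # standard normal density at 0

for df in (1, 9, 29, 299):
    print(df, round(t_pdf_at_zero(df), 4), round(normal_peak, 4))
```

The printed peak heights rise with df and approach the normal peak from below, which is the numerical counterpart of the curves described for Figure 2.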
INDEPENDENT SAMPLES T-TEST
To adopt the z- or t-distribution for inference using small samples, a basic assumption is that the distribution of the population is not significantly different from a normal distribution. As seen in Appendix 1, the normality assumption needs to be tested in advance. If the normality assumption cannot be met and we have a small sample (n < 25), then we are not permitted to use the ‘parametric’ t-test. Instead, a non-parametric analysis such as the Mann-Whitney U test should be selected.
For comparison of 2 independent group means, we can use a z-statistic to test the hypothesis of equal population means only if we know the population variances of the 2 groups, $\sigma_1^2$ and $\sigma_2^2$, as follows:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{Eq. 1}$$

where $\bar{X}_1$ and $\bar{X}_2$, $\sigma_1^2$ and $\sigma_2^2$, and $n_1$ and $n_2$ are the sample means, population variances, and sizes of the 2 groups.
Again, as we never know the population variances, we need to use sample variances as their estimates. There are 2 methods, depending on whether the 2 population variances can be assumed equal or not. Under the assumption of equal variances, the t-test devised by Gosset in 1908, Student's t-test, can be applied. The other version is Welch's t-test, introduced in 1947, for cases where the assumption of equal variances cannot be accepted because quite a big difference is observed between the 2 sample variances.
1. Student's t-test
In Student's t-test, the population variances are assumed equal. Therefore, we need only one common variance estimate for the 2 groups. The common variance estimate is calculated as a pooled variance, a weighted average of the 2 sample variances, as follows:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \tag{Eq. 2}$$

where $s_1^2$ and $s_2^2$ are the sample variances.
The resulting t-test statistic has the same form as the z-statistic, with both population variances, $\sigma_1^2$ and $\sigma_2^2$, replaced by the common variance estimate, $s_p^2$. The df is given as $n_1 + n_2 - 2$ for the t-test statistic.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} \tag{Eq. 3}$$

In Appendix 1, ‘(E-1) Levene's test for equality of variances’ shows that the null hypothesis of equal variances was not rejected, given the high p value of 0.334 (under the heading Sig.). In ‘(E-2) t-test for equality of means t-values’, the upper line shows the result of Student's t-test. The t-value and df are shown as −3.357 and 18. We can get the same figures using the formulas Eq. 2 and Eq. 3 and the descriptive statistics in Table 1, as follows.
Group | No. | Mean | Standard deviation | p value |
---|---|---|---|---|
1 | 10 | 10.28 | 0.5978 | 0.004 |
2 | 10 | 11.08 | 0.4590 | |
The result of the calculation is a little different from that by SPSS (IBM Corp., Armonk, NY, USA) in Appendix 1, perhaps because of rounding errors.
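The pooled-variance calculation can be reproduced from the summary statistics in Table 1. The following is a minimal sketch using only Python's standard library; the function name `student_t` and its argument order are assumptions for this illustration and do not correspond to SPSS or any specific package.

```python
import math

def student_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and df for 2 independent groups, equal variances assumed."""
    # Pooled variance: weighted average of the 2 sample variances
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Standard error of the difference between the 2 means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2

# Summary statistics from Table 1
t, df = student_t(10.28, 0.5978, 10, 11.08, 0.4590, 10)
print(f"t = {t:.3f}, df = {df}")
```

Since the inputs are the rounded means and standard deviations from the table, the result should land close to the SPSS output, with any gap attributable to rounding.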
2. Welch's t-test
Actually, there are many cases where equal variances cannot be assumed. Even when equal variances cannot be assumed, we can still compare 2 independent group means by performing Welch's t-test. Welch's t-test is more reliable when the 2 samples have unequal variances and/or unequal sample sizes. We still need to maintain the assumption of normality.
Because the population variances are not equal, we have to estimate them separately by the 2 sample variances, $s_1^2$ and $s_2^2$. As a result, the form of the t-test statistic is given as follows:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

where ν is the Satterthwaite degrees of freedom.
In Appendix 1, ‘(E-1) Levene's test for equality of variances’ shows that equal variances can be assumed (p = 0.334). Therefore, Welch's t-test is not the appropriate choice for these data. Only for the purpose of exercise, we can try to interpret the results of Welch's t-test shown in the lower line of ‘(E-2) t-test for equality of means t-values’. The t-value and df are shown as −3.357 and 16.875.
We've confirmed nearly the same results by calculation using the formula and by SPSS software.
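The same check can be scripted for Welch's test. The sketch below applies the standard Welch statistic and the Welch-Satterthwaite degrees of freedom to the Table 1 summary statistics; `welch_t` is a hypothetical helper name, and only Python's standard library is used.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Satterthwaite df for 2 independent groups."""
    # Per-group variance of the mean, estimated separately for each group
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics from Table 1
t, df = welch_t(10.28, 0.5978, 10, 11.08, 0.4590, 10)
print(f"t = {t:.3f}, df = {df:.3f}")
```

With equal group sizes the Welch and Student statistics coincide, so only the degrees of freedom differ between the two versions for these data.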
The t-test is one of the most frequently used methods for comparing 2 group means. However, we sometimes forget the underlying assumptions, such as the normality assumption, or miss the meaning of the equal variance assumption. Especially when we have a small sample, we need to check the normality assumption first and decide between the parametric t-test and the nonparametric Mann-Whitney U test. Also, we need to assess the assumption of equal variances and select either Student's t-test or Welch's t-test.
Procedure of t-test analysis using IBM SPSS
The procedure of t-test analysis using IBM SPSS Statistics for Windows Version 23.0 (IBM Corp., Armonk, NY, USA) is as follows.
Testing The Mean For Business Data: Should One Use The Z-Test, T-Test, F-Test, The Chi-Square Test, Or The P-Value Method?
- January 2011
- Journal of College Teaching & Learning (TLC) 3(7)
- University of New Haven
The proposed test $Z_N \in [Z_L, Z_U]$ is the extension of several existing tests. The proposed test reduces to the existing Z test under classical statistics when $I_{ZN} = 0$. The proposed test is also an extension of the Z test under the fuzzy approach and interval statistics. The proposed test will be implemented as follows.
In case of Population C and D, at = 20% , the validity of t-test rose to 54.3% from 29.2%, while for Population A and B, the validity rose to 76.1% from 49.6%. This suggests that there is need to ...
Traditionally the un-weighted Z-tests, which follow the one-patient-one-vote principle, are standard for comparisons of treatment effects. We discuss two types of weighted Z-tests in this manuscript to incorporate data collected in two (or more) stages or in two (or more) regions. We use the type A weighted Z-test to exemplify the variance ...
The z-test statistic is one of the most popular statistics. However, this conventional z-test has a serious pitfall when some of the observations in a sample are contaminated. We ...
The pieces $\sqrt{n_X}\,\bar{X}/\hat{S}_X$ and $\sqrt{n_Y}\,\bar{Y}/\hat{S}_Y$ can be recovered from P-values for the two samples by the inverse normal transformation. This statistic is the weighted Z-test for combining P-values. We can see that $Z_w$ approximates $Z_{total}$ when the weights $w_X$, $w_Y$ are set to $\sqrt{n_X}$, $\sqrt{n_Y}$. The same argument holds for more than two samples. Regarding Lancaster's method, Chen noted cautiously that ...
Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two tailed) which makes it more convenient than the Student's t-test which has separate critical values for each sample size.
discipline, the z-test or z-score can be implemented once the data attain a large sample size, that is, greater than 30. Conversely, the t-test can be implemented if the sample size is below 30 [15,16,17]. Indeed, most of the articles justify the use of the equal- and unequal-variance t-test approaches in their research, but the findings can be argued.
We use the type A weighted Z-test to exemplify the variance spending approach in the first part of this manuscript. This approach has been applied to sample size re-estimation. In the second part of the manuscript, we introduce the type B weighted Z-tests and apply them to the design of bridging studies.
In previous articles, I discussed how to calculate the probability of obtaining data as extreme as those we have observed if the null hypothesis were true. This probability, called the P value, was obtained by first calculating the test statistic. We used this information to conduct a 1-sample z test and a 1-sample t test to see whether there is evidence of a difference in age between our ...
The conventional Z-test for correlation, grounded in classical statistics, is typically employed in situations devoid of vague information. However, real-world data often comes with inherent uncertainty, necessitating an adaptation of the Z-test using neutrosophic statistics. This paper introduces a modified Z-test for correlation designed to explore correlations in the presence of imprecise ...
The z Test: An Example
μ = 156.5, σ = 14.6, M = 156.11, N = 97
1. Populations, distributions, and assumptions
Populations: 1. All students at UMD who have taken the test (not just our sample) 2. All students nationwide who have taken the test
Distribution: Sample → distribution of means
Test & Assumptions: z test 1. Data are interval 2.
Use a Z test when you need to compare group means. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ. A Z test is a form of inferential statistics. It uses samples to draw conclusions about populations.
1. How to use a z table. 2. How to implement the basic steps of hypothesis testing. 3. How to conduct a z test to compare a single sample to a known population. The z Table: In Chapter 6, we learned that (1) about 68% of scores fall within one z score of the mean, (2) about 96% of scores fall within two z scores of the mean, and (3) nearly all ...
would have a Z score of (12−18)/4, or −1.5; that is, one and a half SDs below the sample mean. Interpreting and Using the Z Scores The raw scores were in different units in the different cognitive tasks. Z scores are all in the same unit, that is, SD. The Z score distribution has a mean of 0 and an SD of 1. Z scores are useful because they
The Z-test, January 9, 2021. Contents: Example 1 (one tailed z-test), Example 2 (two tailed z-test), Questions, Answers. The z-test is a hypothesis test to determine if a single observed mean is significantly different from (or greater or less than) the mean under the null hypothesis, μ_hyp, when you know the standard deviation of the population.
Standardization of laboratory test results and their reporting in the form of z score enables the following: It prevents any reference range differences among regions, races and laboratories. It ...
independent samples.
COMPUTING THE z TEST STATISTIC
The formula used for computing the value for the one-sample z test is shown in Formula 10.1. Remember that we are testing whether a sample mean belongs to or is a fair estimate of a population. The difference between the sample mean (X̄) and the population mean (μ) makes up the numer.
Both the t-test and the z-test are usually used for continuous populations, and the chi-square test is used for categorical data. The F-test is used for comparing more than two means.
3) When conducting a hypothesis test to check the means of samples, if the population standard deviation is known, we can use a z-test. When the population standard deviation is unknown, we use a t-test. 4) It will be 1-tailed if we are expecting the sample mean to be either significantly higher or significantly lower than the population mean.