
Critical Thinking Testing and Assessment

The purpose of assessment in instruction is improvement. The purpose of assessing instruction for critical thinking is to improve the teaching of discipline-based thinking (historical, biological, sociological, mathematical, etc.). It is to improve students’ abilities to think their way through content using disciplined skill in reasoning. The more particular we can be about what we want students to learn about critical thinking, the better we can devise instruction with that particular end in view.


The Foundation for Critical Thinking offers assessment instruments that share the same general goal: to enable educators to gather evidence relevant to determining the extent to which instruction is teaching students to think critically (in the process of learning content). To this end, the Fellows of the Foundation recommend:

that academic institutions and units establish an oversight committee for critical thinking, and

that this oversight committee utilize a combination of assessment instruments (the more the better) to generate incentives for faculty by providing them with as much evidence as feasible of the actual state of instruction for critical thinking.

The following instruments are available to generate evidence relevant to critical thinking teaching and learning:

Course Evaluation Form: Provides evidence of whether, and to what extent, students perceive faculty as fostering critical thinking in instruction (course by course). Machine-scoreable.

Online Critical Thinking Basic Concepts Test: Provides evidence of whether, and to what extent, students understand the fundamental concepts embedded in critical thinking (and hence tests student readiness to think critically). Machine-scoreable.

Critical Thinking Reading and Writing Test: Provides evidence of whether, and to what extent, students can read closely and write substantively (and hence tests students' abilities to read and write critically). Short-answer.

International Critical Thinking Essay Test: Provides evidence of whether, and to what extent, students are able to analyze and assess excerpts from textbooks or professional writing. Short-answer.

Commission Study Protocol for Interviewing Faculty Regarding Critical Thinking: Provides evidence of whether, and to what extent, critical thinking is being taught at a college or university. Can be adapted for high school. Based on the California Commission Study. Short-answer.

Protocol for Interviewing Faculty Regarding Critical Thinking: Provides evidence of whether, and to what extent, critical thinking is being taught at a college or university. Can be adapted for high school. Short-answer.

Protocol for Interviewing Students Regarding Critical Thinking: Provides evidence of whether, and to what extent, students are learning to think critically at a college or university. Can be adapted for high school. Short-answer.

Criteria for Critical Thinking Assignments: Can be used by faculty in designing classroom assignments, or by administrators in assessing the extent to which faculty are fostering critical thinking.

Rubrics for Assessing Student Reasoning Abilities: A useful tool in assessing the extent to which students are reasoning well through course content.

All of the above assessment instruments can be used as part of pre- and post-assessment strategies to gauge development over various time periods.

Consequential Validity

All of the above assessment instruments, when used appropriately and graded accurately, should lead to a high degree of consequential validity. In other words, the use of the instruments should cause teachers to teach in such a way as to foster critical thinking in their various subjects. In this light, if students are to perform well on the various instruments, teachers will need to design instruction that prepares them to do so. Students cannot become skilled in critical thinking without learning (first) the concepts and principles that underlie critical thinking and (second) applying them in a variety of forms of thinking: historical thinking, sociological thinking, biological thinking, etc. Students cannot become skilled in analyzing and assessing reasoning without practicing it. However, when they have routine practice in paraphrasing, summarizing, analyzing, and assessing, they will develop skills of mind requisite to the art of thinking well within any subject or discipline, not to mention thinking well within the various domains of human life.



Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation

Henry I. Braun*

  • 1 Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, United States
  • 2 Graduate School of Education, Stanford University, Stanford, CA, United States
  • 3 Department of Business and Economics Education, Johannes Gutenberg University, Mainz, Germany

Enhancing students’ critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves dealing with dilemmas involving ambiguity or conflicts among principles and contradictory information. We argue that performance assessment provides the most realistic—and most credible—approach to measuring CT. From this conceptualization and construct definition, we describe one possible framework for building performance assessments of CT with attention to extended performance tasks within the assessment system. The framework is a product of an ongoing, collaborative effort, the International Performance Assessment of Learning (iPAL). The framework comprises four main aspects: (1) The storyline describes a carefully curated version of a complex, real-world situation. (2) The challenge frames the task to be accomplished. (3) A portfolio of documents in a range of formats is drawn from multiple sources chosen to have specific characteristics. (4) The scoring rubric comprises a set of scales, each linked to a facet of the construct. We discuss a number of use cases, as well as the challenges that arise with the use and valid interpretation of performance assessments. The final section presents elements of the iPAL research program that involve various refinements and extensions of the assessment framework, a number of empirical studies, and linkages to current work in online reading and information processing.

Introduction

In their mission statements, most colleges declare that a principal goal is to develop students’ higher-order cognitive skills such as critical thinking (CT) and reasoning (e.g., Shavelson, 2010 ; Hyytinen et al., 2019 ). The importance of CT is echoed by business leaders ( Association of American Colleges and Universities [AACU], 2018 ), as well as by college faculty (for curricular analyses in Germany, see e.g., Zlatkin-Troitschanskaia et al., 2018 ). Indeed, in the 2019 administration of the Faculty Survey of Student Engagement (FSSE), 93% of faculty reported that they “very much” or “quite a bit” structure their courses to support student development with respect to thinking critically and analytically. In a listing of 21st century skills, CT was the most highly ranked among FSSE respondents ( Indiana University, 2019 ). Nevertheless, there is considerable evidence that many college students do not develop these skills to a satisfactory standard ( Arum and Roksa, 2011 ; Shavelson et al., 2019 ; Zlatkin-Troitschanskaia et al., 2019 ). This state of affairs represents a serious challenge to higher education – and to society at large.

In view of the importance of CT, as well as evidence of substantial variation in its development during college, its proper measurement is essential to tracking progress in skill development and to providing useful feedback to both teachers and learners. Feedback can help focus students’ attention on key skill areas in need of improvement, and provide insight to teachers on choices of pedagogical strategies and time allocation. Moreover, comparative studies at the program and institutional level can inform higher education leaders and policy makers.

The conceptualization and definition of CT presented here is closely related to models of information processing and online reasoning, the skills that are the focus of this special issue. These two skills are especially germane to the learning environments that college students experience today when much of their academic work is done online. Ideally, students should be capable of more than naïve Internet search, followed by copy-and-paste (e.g., McGrew et al., 2017 ); rather, for example, they should be able to critically evaluate both sources of evidence and the quality of the evidence itself in light of a given purpose ( Leu et al., 2020 ).

In this paper, we present a systematic approach to conceptualizing CT. From that conceptualization and construct definition, we present one possible framework for building performance assessments of CT with particular attention to extended performance tasks within the test environment. The penultimate section discusses some of the challenges that arise with the use and valid interpretation of performance assessment scores. We conclude the paper with a section on future perspectives in an emerging field of research – the iPAL program.

Conceptual Foundations, Definition and Measurement of Critical Thinking

In this section, we briefly review the concept of CT and its definition. In accordance with the principles of evidence-centered design (ECD; Mislevy et al., 2003 ), the conceptualization drives the measurement of the construct; that is, implementation of ECD directly links aspects of the assessment framework to specific facets of the construct. We then argue that performance assessments designed in accordance with such an assessment framework provide the most realistic—and most credible—approach to measuring CT. The section concludes with a sketch of an approach to CT measurement grounded in performance assessment.

Concept and Definition of Critical Thinking

Taxonomies of 21st century skills ( Pellegrino and Hilton, 2012 ) abound, and it is neither surprising that CT appears in most taxonomies of learning, nor that there are many different approaches to defining and operationalizing the construct of CT. There is, however, general agreement that CT is a multifaceted construct ( Liu et al., 2014 ). Liu et al. (2014) identified five key facets of CT: (i) evaluating evidence and the use of evidence; (ii) analyzing arguments; (iii) understanding implications and consequences; (iv) developing sound arguments; and (v) understanding causation and explanation.

There is empirical support for these facets from college faculty. A 2016–2017 survey conducted by the Higher Education Research Institute (HERI) at the University of California, Los Angeles found that a substantial majority of faculty respondents “frequently” encouraged students to: (i) evaluate the quality or reliability of the information they receive; (ii) recognize biases that affect their thinking; (iii) analyze multiple sources of information before coming to a conclusion; and (iv) support their opinions with a logical argument ( Stolzenberg et al., 2019 ).

There is general agreement that CT involves the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion (e.g., Erwin and Sebrell, 2003 ; Kosslyn and Nelson, 2017 ; Shavelson et al., 2018 ). We further suggest that CT includes dealing with dilemmas of ambiguity or conflict among principles and contradictory information ( Oser and Biedermann, 2020 ).

Importantly, Oser and Biedermann (2020) posit that CT can be manifested at three levels. The first level, Critical Analysis, is the most complex of the three levels. Critical Analysis requires both knowledge in a specific discipline (conceptual) and procedural analytical (deduction, inclusion, etc.) knowledge. The second level is Critical Reflection, which involves more generic skills “… necessary for every responsible member of a society” (p. 90). It is “a basic attitude that must be taken into consideration if (new) information is questioned to be true or false, reliable or not reliable, moral or immoral etc.” (p. 90). To engage in Critical Reflection, one needs not only to apply analytic reasoning, but also to adopt a reflective stance toward the political, social, and other consequences of choosing a course of action. It also involves analyzing the potential motives of various actors involved in the dilemma of interest. The third level, Critical Alertness, involves questioning one’s own or others’ thinking from a skeptical point of view.

Wheeler and Haertel (1993) categorized higher-order skills, such as CT, into two types: (i) when solving problems and making decisions in professional and everyday life, for instance, related to civic affairs and the environment; and (ii) in situations where various mental processes (e.g., comparing, evaluating, and justifying) are developed through formal instruction, usually in a discipline. Hence, in both settings, individuals must confront situations that typically involve a problematic event, contradictory information, and possibly conflicting principles. Indeed, there is an ongoing debate concerning whether CT should be evaluated using generic or discipline-based assessments ( Nagel et al., 2020 ). Whether CT skills are conceptualized as generic or discipline-specific has implications for how they are assessed and how they are incorporated into the classroom.

In the iPAL project, CT is characterized as a multifaceted construct that comprises conceptualizing, analyzing, drawing inferences or synthesizing information, evaluating claims, and applying the results of these reasoning processes to various purposes (e.g., solve a problem, decide on a course of action, find an answer to a given question or reach a conclusion) ( Shavelson et al., 2019 ). In the course of carrying out a CT task, an individual typically engages in activities such as specifying or clarifying a problem; deciding what information is relevant to the problem; evaluating the trustworthiness of information; avoiding judgmental errors based on “fast thinking”; avoiding biases and stereotypes; recognizing different perspectives and how they can reframe a situation; considering the consequences of alternative courses of actions; and communicating clearly and concisely decisions and actions. The order in which activities are carried out can vary among individuals and the processes can be non-linear and reciprocal.

In this article, we focus on generic CT skills. The importance of these skills derives not only from their utility in academic and professional settings, but also the many situations involving challenging moral and ethical issues – often framed in terms of conflicting principles and/or interests – to which individuals have to apply these skills ( Kegan, 1994 ; Tessier-Lavigne, 2020 ). Conflicts and dilemmas are ubiquitous in the contexts in which adults find themselves: work, family, civil society. Moreover, to remain viable in the global economic environment – one characterized by increased competition and advances in second generation artificial intelligence (AI) – today’s college students will need to continually develop and leverage their CT skills. Ideally, colleges offer a supportive environment in which students can develop and practice effective approaches to reasoning about and acting in learning, professional and everyday situations.

Measurement of Critical Thinking

Critical thinking is a multifaceted construct that poses many challenges to those who would develop relevant and valid assessments. For those interested in current approaches to the measurement of CT that are not the focus of this paper, consult Zlatkin-Troitschanskaia et al. (2018).

In this paper, we have singled out performance assessment as it offers important advantages for measuring CT. Extant tests of CT typically employ response formats such as forced-choice or short-answer, and scenario-based tasks (for an overview, see Liu et al., 2014 ). They all suffer from moderate to severe construct underrepresentation; that is, they fail to capture important facets of the CT construct such as perspective taking and communication. High-fidelity performance tasks are viewed as more authentic in that they provide a problem context and require responses that are more similar to what individuals confront in the real world than what is offered by traditional multiple-choice items ( Messick, 1994 ; Braun, 2019 ). This greater verisimilitude promises higher levels of construct representation and lower levels of construct-irrelevant variance. Such performance tasks have the capacity to measure facets of CT that are imperfectly assessed, if at all, using traditional assessments ( Lane and Stone, 2006 ; Braun, 2019 ; Shavelson et al., 2019 ). However, these assertions must be empirically validated, and the measures should be subjected to psychometric analyses. Evidence of the reliability, validity, and interpretative challenges of performance assessment (PA) is extensively detailed in Davey et al. (2015).

We adopt the following definition of performance assessment:

A performance assessment (sometimes called a work sample when assessing job performance) … is an activity or set of activities that requires test takers, either individually or in groups, to generate products or performances in response to a complex, most often real-world task. These products and performances provide observable evidence bearing on test takers’ knowledge, skills, and abilities—their competencies—in completing the assessment ( Davey et al., 2015 , p. 10).

A performance assessment typically includes an extended performance task and short constructed-response and selected-response (i.e., multiple-choice) tasks (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). In this paper, we refer to both individual performance- and constructed-response tasks as performance tasks (PT) (for an example, see Table 1 in section “iPAL Assessment Framework”).


Table 1. The iPAL assessment framework.

An Approach to Performance Assessment of Critical Thinking: The iPAL Program

The approach to CT presented here is the result of ongoing work undertaken by the International Performance Assessment of Learning collaborative (iPAL 1 ). iPAL is an international consortium of volunteers, primarily from academia, who have come together to address the dearth in higher education of research and practice in measuring CT with performance tasks ( Shavelson et al., 2018 ). In this section, we present iPAL’s assessment framework as the basis of measuring CT, with examples along the way.

iPAL Background

The iPAL assessment framework builds on the Council of Aid to Education’s Collegiate Learning Assessment (CLA). The CLA was designed to measure cross-disciplinary, generic competencies, such as CT, analytic reasoning, problem solving, and written communication ( Klein et al., 2007 ; Shavelson, 2010 ). Ideally, each PA contained an extended PT (e.g., examining a range of evidential materials related to the crash of an aircraft) and two short PT’s in which students either critique an argument or provide a solution in response to a real-world societal issue.

Motivated by considerations of adequate reliability, the CLA was modified in 2012 to create the CLA+. The CLA+ includes two subtests: a PT and a 25-item Selected Response Question (SRQ) section. The PT presents a document or problem statement and an assignment based on that document which elicits an open-ended response. The CLA+ added the SRQ section (which is not linked substantively to the PT scenario) to increase the number of student responses and thereby obtain more reliable estimates of performance at the student level than could be achieved with a single PT ( Zahner, 2013 ; Davey et al., 2015 ).

iPAL Assessment Framework

Methodological foundations.

The iPAL framework evolved from the Collegiate Learning Assessment developed by Klein et al. (2007). It was also informed by the results from the AHELO pilot study ( Organisation for Economic Co-operation and Development [OECD], 2012 , 2013 ), as well as the KoKoHs research program in Germany (for an overview, see Zlatkin-Troitschanskaia et al., 2017 , 2020 ). The ongoing refinement of the iPAL framework has been guided in part by the principles of Evidence Centered Design (ECD) ( Mislevy et al., 2003 ; Mislevy and Haertel, 2006 ; Haertel and Fujii, 2017 ).

In educational measurement, an assessment framework plays a critical intermediary role between the theoretical formulation of the construct and the development of the assessment instrument containing tasks (or items) intended to elicit evidence with respect to that construct ( Mislevy et al., 2003 ). Builders of the assessment framework draw on the construct theory and operationalize it in a way that provides explicit guidance to developers of PT’s. Thus, the framework should reflect the relevant facets of the construct, where relevance is determined by substantive theory or an appropriate alternative such as behavioral samples from real-world situations of interest (criterion-sampling; McClelland, 1973 ), as well as the intended use(s) (for an example, see Shavelson et al., 2019 ). By following the requirements and guidelines embodied in the framework, instrument developers strengthen the claim of construct validity for the instrument ( Messick, 1994 ).

An assessment framework can be specified at different levels of granularity: an assessment battery (“omnibus” assessment, for an example see below), a single performance task, or a specific component of an assessment ( Shavelson, 2010 ; Davey et al., 2015 ). In the iPAL program, a performance assessment comprises one or more extended performance tasks and additional selected-response and short constructed-response items. The focus of the framework specified below is on a single PT intended to elicit evidence with respect to some facets of CT, such as the evaluation of the trustworthiness of the documents provided and the capacity to address conflicts of principles.

From the ECD perspective, an assessment is an instrument for generating information to support an evidentiary argument and, therefore, the intended inferences (claims) must guide each stage of the design process. The construct of interest is operationalized through the Student Model, which represents the target knowledge, skills, and abilities, as well as the relationships among them. The student model should also make explicit the assumptions regarding student competencies in foundational skills or content knowledge. The Task Model specifies the features of the problems or items posed to the respondent, with the goal of eliciting the evidence desired. The assessment framework also describes the collection of task models comprising the instrument, with considerations of construct validity, various psychometric characteristics (e.g., reliability) and practical constraints (e.g., testing time and cost). The student model provides grounds for evidence of validity, especially cognitive validity; namely, that the students are thinking critically in responding to the task(s).

In the present context, the target construct (CT) is the competence of individuals to think critically, which entails solving complex, real-world problems, and clearly communicating their conclusions or recommendations for action based on trustworthy, relevant and unbiased information. The situations, drawn from actual events, are challenging and may arise in many possible settings. In contrast to more reductionist approaches to assessment development, the iPAL approach and framework rests on the assumption that properly addressing these situational demands requires the application of a constellation of CT skills appropriate to the particular task presented (e.g., Shavelson, 2010 , 2013 ). For a PT, the assessment framework must also specify the rubric by which the responses will be evaluated. The rubric must be properly linked to the target construct so that the resulting score profile constitutes evidence that is both relevant and interpretable in terms of the student model (for an example, see Zlatkin-Troitschanskaia et al., 2019 ).

iPAL Task Framework

The iPAL ‘omnibus’ framework comprises four main aspects: a storyline, a challenge, a document library, and a scoring rubric. Table 1 displays these aspects, brief descriptions of each, and the corresponding examples drawn from an iPAL performance assessment (version adapted from the original in Hyytinen and Toom, 2019). Storylines are drawn from various domains; for example, the worlds of business, public policy, civics, medicine, and family. They often involve moral and/or ethical considerations. Deriving an appropriate storyline from a real-world situation requires careful consideration of which features are to be kept in toto, which adapted for purposes of the assessment, and which discarded. Framing the challenge demands care in wording so that there is minimal ambiguity in what is required of the respondent. The difficulty of the challenge depends, in large part, on the nature and extent of the information provided in the document library, the amount of scaffolding included, as well as the scope of the required response. The amount of information and the scope of the challenge should be commensurate with the amount of time available. As is evident from the table, the characteristics of the documents in the library are intended to elicit responses related to facets of CT. For example, with regard to bias, the information provided is intended to play to judgmental errors due to fast thinking and/or motivational reasoning. Ideally, the situation should accommodate multiple solutions of varying degrees of merit.
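
To make the four aspects concrete, the sketch below shows one way a performance task built to this framework might be represented as a data structure. It is purely illustrative: the class and field names are hypothetical, not part of the iPAL specification, and the example content is invented.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical structures mirroring the four aspects of the iPAL 'omnibus'
# framework: storyline, challenge, document library, and scoring rubric.
# Names and fields are illustrative only, not the iPAL specification.

@dataclass
class Document:
    title: str
    source_type: str       # e.g., "news article", "official statistics", "blog post"
    trustworthiness: str   # the source characteristic the designer wants probed
    bias_cue: str          # e.g., an appeal to "fast thinking"

@dataclass
class RubricDimension:
    facet: str             # facet of CT this scale is linked to
    anchors: List[str]     # behaviorally anchored scale points (low to high)

@dataclass
class PerformanceTask:
    storyline: str         # curated version of a complex, real-world situation
    challenge: str         # what the respondent must produce
    document_library: List[Document] = field(default_factory=list)
    rubric: List[RubricDimension] = field(default_factory=list)

# Invented example for illustration only
task = PerformanceTask(
    storyline="A town council must decide whether to fund a new flood-protection plan.",
    challenge="Write a recommendation to the council, justified by the documents provided.",
    document_library=[
        Document("Editorial from a local paper", "news article",
                 "openly partisan outlet", "emotive framing"),
        Document("National statistics brief", "official statistics",
                 "authoritative source", "none"),
    ],
    rubric=[
        RubricDimension("evaluating evidence",
                        ["cites no evidence", "cites evidence uncritically",
                         "weighs the quality of sources"]),
    ],
)
```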

The dimensions of the scoring rubric are derived from the Task Model and Student Model ( Mislevy et al., 2003 ) and signal which features are to be extracted from the response and indicate how they are to be evaluated. There should be a direct link between the evaluation of the evidence and the claims that are made with respect to the key features of the task model and student model. More specifically, the task model specifies the various manipulations embodied in the PA and so informs scoring, while the student model specifies the capacities students employ in more or less effectively responding to the tasks. The score scales for each of the five facets of CT (see section “Concept and Definition of Critical Thinking”) can be specified using appropriate behavioral anchors (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). Of particular importance is the evaluation of the response with respect to the last dimension of the scoring rubric; namely, the overall coherence and persuasiveness of the argument, building on the explicit or implicit characteristics related to the first five dimensions. The scoring process must be monitored carefully to ensure that (trained) raters are judging each response based on the same types of features and evaluation criteria ( Braun, 2019 ), as indicated by interrater agreement coefficients.
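
Interrater agreement on ordinal rubric scales of this kind is commonly summarized with a chance-corrected index. The snippet below is a minimal illustrative sketch (not iPAL's scoring code) that computes quadratically weighted kappa for two raters scoring one rubric dimension on a six-point scale; the ratings are invented.

```python
import numpy as np

def quadratic_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Chance-corrected agreement for two raters' ordinal scores (1..n_categories)."""
    a = np.asarray(ratings_a) - 1
    b = np.asarray(ratings_b) - 1
    # Observed joint distribution and the distribution expected by chance
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic disagreement weights: penalty grows with squared score distance
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical scores from two trained raters on eight responses (six-point scale)
rater_1 = [4, 5, 2, 6, 3, 4, 1, 5]
rater_2 = [4, 4, 2, 5, 3, 4, 2, 5]
print(round(quadratic_weighted_kappa(rater_1, rater_2, 6), 2))
```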

The scoring rubric of the iPAL omnibus framework can be modified for specific tasks ( Lane and Stone, 2006 ). This generic rubric helps ensure consistency across rubrics for different storylines. For example, Zlatkin-Troitschanskaia et al. (2019 , p. 473) used the following scoring scheme:

Based on our construct definition of CT and its four dimensions: (D1-Info) recognizing and evaluating information, (D2-Decision) recognizing and evaluating arguments and making decisions, (D3-Conseq) recognizing and evaluating the consequences of decisions, and (D4-Writing), we developed a corresponding analytic dimensional scoring … The students’ performance is evaluated along the four dimensions, which in turn are subdivided into a total of 23 indicators as (sub)categories of CT … For each dimension, we sought detailed evidence in students’ responses for the indicators and scored them on a six-point Likert-type scale. In order to reduce judgment distortions, an elaborate procedure of ‘behaviorally anchored rating scales’ (Smith and Kendall, 1963) was applied by assigning concrete behavioral expectations to certain scale points (Bernardin et al., 1976). To this end, we defined the scale levels by short descriptions of typical behavior and anchored them with concrete examples. … We trained four raters in 1 day using a specially developed training course to evaluate students’ performance along the 23 indicators clustered into four dimensions (for a description of the rater training, see Klotzer, 2018).

Shavelson et al. (2019) examined the interrater agreement of the scoring scheme developed by Zlatkin-Troitschanskaia et al. (2019) and “found that with 23 items and 2 raters the generalizability (“reliability”) coefficient for total scores to be 0.74 (with 4 raters, 0.84)” ( Shavelson et al., 2019 , p. 15). In the study by Zlatkin-Troitschanskaia et al. (2019 , p. 478) three score profiles were identified (low-, middle-, and high-performer) for students. Proper interpretation of such profiles requires care. For example, there may be multiple possible explanations for low scores such as poor CT skills, a lack of a disposition to engage with the challenge, or the two attributes jointly. These alternative explanations for student performance can potentially pose a threat to the evidentiary argument. In this case, auxiliary information may be available to aid in resolving the ambiguity. For example, student responses to selected- and short-constructed-response items in the PA can provide relevant information about the levels of the different skills possessed by the student. When sufficient data are available, the scores can be modeled statistically and/or qualitatively in such a way as to bring them to bear on the technical quality or interpretability of the claims of the assessment: reliability, validity, and utility evidence ( Davey et al., 2015 ; Zlatkin-Troitschanskaia et al., 2019 ). These kinds of concerns are less critical when PT’s are used in classroom settings. The instructor can draw on other sources of evidence, including direct discussion with the student.
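
The reported coefficients (0.74 with two raters, 0.84 with four) are roughly consistent under the usual Spearman-Brown assumption that raters are interchangeable: the two-rater value implies a single-rater coefficient of

\rho_1 = \frac{\rho_2}{2 - \rho_2} = \frac{0.74}{1.26} \approx 0.59,

and projecting to four raters gives

\rho_4 = \frac{4\rho_1}{1 + 3\rho_1} \approx 0.85,

close to the reported 0.84. This is only a rough check on the published figures, not a substitute for the generalizability analysis itself.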

Use of iPAL Performance Assessments in Educational Practice: Evidence From Preliminary Validation Studies

The assessment framework described here supports the development of a PT in a general setting. Many modifications are possible and, indeed, desirable. If the PT is to be more deeply embedded in a certain discipline (e.g., economics, law, or medicine), for example, then the framework must specify characteristics of the narrative and the complementary documents as to the breadth and depth of disciplinary knowledge that is represented.

To date, preliminary field trials employing the omnibus framework (i.e., a full set of documents) have indicated that 60 min is generally an inadequate amount of time for students to engage with the full set of complementary documents and to craft a complete response to the challenge (for an example, see Shavelson et al., 2019 ). Accordingly, it would be helpful to develop modified frameworks for PT’s that require substantially less time. For an example, see a short performance assessment of civic online reasoning, requiring response times from 10 to 50 min ( Wineburg et al., 2016 ). Such assessment frameworks could be derived from the omnibus framework by focusing on a reduced number of facets of CT, and specifying the characteristics of the complementary documents to be included – or, perhaps, choices among sets of documents. In principle, one could build a ‘family’ of PT’s, each using the same (or nearly the same) storyline and a subset of the full collection of complementary documents.

Paul and Elder (2007) argue that the goal of CT assessments should be to provide faculty with important information about how well their instruction supports the development of students’ CT. In that spirit, the full family of PT’s could represent all facets of the construct while affording instructors and students more specific insights on strengths and weaknesses with respect to particular facets of CT. Moreover, the framework should be expanded to include the design of a set of short answer and/or multiple choice items to accompany the PT. Ideally, these additional items would be based on the same narrative as the PT to collect more nuanced information on students’ precursor skills such as reading comprehension, while enhancing the overall reliability of the assessment. Areas where students are under-prepared could be addressed before, or even in parallel with the development of the focal CT skills. The parallel approach follows the co-requisite model of developmental education. In other settings (e.g., for summative assessment), these complementary items would be administered after the PT to augment the evidence in relation to the various claims. The full PT taking 90 min or more could serve as a capstone assessment.

As we transition from simply delivering paper-based assessments by computer to taking full advantage of the affordances of a digital platform, we should learn from the hard-won lessons of the past so that we can make swifter progress with fewer missteps. In that regard, we must take validity as the touchstone – assessment design, development and deployment must all be tightly linked to the operational definition of the CT construct. Considerations of reliability and practicality come into play with various use cases that highlight different purposes for the assessment (for future perspectives, see next section).

The iPAL assessment framework represents a feasible compromise between commercial, standardized assessments of CT (e.g., Liu et al., 2014 ), on the one hand, and, on the other, freedom for individual faculty to develop assessment tasks according to idiosyncratic models. It imposes a degree of standardization on both task development and scoring, while still allowing some flexibility for faculty to tailor the assessment to meet their unique needs. In so doing, it addresses a key weakness of the AAC&U’s VALUE initiative 2 , which has achieved wide acceptance among United States colleges.

The VALUE initiative has produced generic scoring rubrics for 15 domains including CT, problem-solving and written communication. A rubric for a particular skill domain (e.g., critical thinking) has five to six dimensions with four ordered performance levels for each dimension (1 = lowest, 4 = highest). The performance levels are accompanied by language that is intended to clearly differentiate among levels. 3 Faculty are asked to submit student work products from a senior level course that is intended to yield evidence with respect to student learning outcomes in a particular domain and that, they believe, can elicit performances at the highest level. The collection of work products is then graded by faculty from other institutions who have been trained to apply the rubrics.

A principal difficulty is that there is neither a common framework to guide the design of the challenge, nor any control on task complexity and difficulty. Consequently, there is substantial heterogeneity in the quality and evidential value of the submitted responses. This also causes difficulties with task scoring and inter-rater reliability. Shavelson et al. (2009) discuss some of the problems arising with non-standardized collections of student work.

In this context, one advantage of the iPAL framework is that it can provide valuable guidance and an explicit structure for faculty in developing performance tasks for both instruction and formative assessment. When faculty design assessments, their focus is typically on content coverage rather than other potentially important characteristics, such as the degree of construct representation and the adequacy of their scoring procedures ( Braun, 2019 ).

Concluding Reflections

Challenges to interpretation and implementation.

Performance tasks such as those generated by iPAL are attractive instruments for assessing CT skills (e.g., Shavelson, 2010 ; Shavelson et al., 2019 ). The attraction mainly rests on the assumption that elaborated PT’s are more authentic (direct) and more completely capture facets of the target construct (i.e., possess greater construct representation) than the widely used selected-response tests. However, as Messick (1994) noted, authenticity is a “promissory note” that must be redeemed with empirical research. In practice, there are trade-offs among authenticity, construct validity, and psychometric quality such as reliability ( Davey et al., 2015 ).

One reason for Messick’s (1994) caution is that authenticity does not guarantee construct validity. The latter must be established by drawing on multiple sources of evidence ( American Educational Research Association et al., 2014 ). Following the ECD principles in designing and developing the PT, as well as the associated scoring rubrics, constitutes an important type of evidence. Further, as Leighton (2019) argues, response process data (“cognitive validity”) are needed to validate claims regarding the cognitive complexity of PT’s. Relevant data can be obtained through cognitive laboratory studies involving methods such as think-aloud protocols or eye-tracking. Although time-consuming and expensive, such studies can yield not only evidence of validity, but also valuable information to guide refinements of the PT.

Going forward, iPAL PT’s must be subjected to validation studies as recommended in the Standards for Psychological and Educational Testing by American Educational Research Association et al. (2014) . With a particular focus on the criterion “relationships to other variables,” a framework should include assumptions about the theoretically expected relationships among the indicators assessed by the PT, as well as the indicators’ relationships to external variables such as intelligence or prior (task-relevant) knowledge.

Complementing the necessity of evaluating construct validity, there is the need to consider potential sources of construct-irrelevant variance (CIV). One pertains to student motivation, which is typically greater when the stakes are higher. If students are not motivated, then their performance is likely to be impacted by factors unrelated to their (construct-relevant) ability ( Lane and Stone, 2006 ; Braun et al., 2011 ; Shavelson, 2013 ). Differential motivation across groups can also bias comparisons. Student motivation might be enhanced if the PT is administered in the context of a course with the promise of generating useful feedback on students’ skill profiles.

Construct-irrelevant variance can also occur when students are not equally prepared for the format of the PT or fully appreciate the response requirements. This source of CIV could be alleviated by providing students with practice PT’s. Finally, the use of novel forms of documentation, such as those from the Internet, can potentially introduce CIV due to differential familiarity with forms of representation or contents. Interestingly, this suggests that there may be a conflict between enhancing construct representation and reducing CIV.

Another potential source of CIV is related to response evaluation. Even with training, human raters can vary in accuracy and usage of the full score range. In addition, raters may attend to features of responses that are unrelated to the target construct, such as the length of the students’ responses or the frequency of grammatical errors ( Lane and Stone, 2006 ). Some of these sources of variance could be addressed in an online environment, where word processing software could alert students to potential grammatical and spelling errors before they submit their final work product.

Performance tasks generally take longer to administer and are more costly than traditional assessments, making it more difficult to reliably measure student performance ( Messick, 1994 ; Davey et al., 2015 ). Indeed, it is well known that more than one performance task is needed to obtain high reliability ( Shavelson, 2013 ). This is due to both student-task interactions and variability in scoring. Sources of student-task interactions are differential familiarity with the topic ( Hyytinen and Toom, 2019 ) and differential motivation to engage with the task. The level of reliability required, however, depends on the context of use. For use in formative assessment as part of an instructional program, reliability can be lower than use for summative purposes. In the former case, other types of evidence are generally available to support interpretation and guide pedagogical decisions. Further studies are needed to obtain estimates of reliability in typical instructional settings.
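
The role of these two sources of error can be made explicit in generalizability-theory terms. For the standard design in which persons are crossed with tasks and raters (a textbook formulation, not a result from the studies cited here), the generalizability coefficient for relative decisions is approximately

E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \frac{\sigma^2_{pt}}{n_t} + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{ptr,e}}{n_t n_r}},

where \sigma^2_p is the person (true-score) variance, \sigma^2_{pt} and \sigma^2_{pr} are the person-by-task and person-by-rater interaction variances, and n_t and n_r are the numbers of tasks and raters. Because \sigma^2_{pt} is typically large for performance tasks, adding tasks generally raises the coefficient more than adding raters alone.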

With sufficient data, more sophisticated psychometric analyses become possible. One challenge is that the assumption of unidimensionality required for many psychometric models might be untenable for performance tasks ( Davey et al., 2015 ). Davey et al. (2015) provide the example of a mathematics assessment that requires students to demonstrate not only their mathematics skills but also their written communication skills. Although the iPAL framework does not explicitly address students’ reading comprehension and organization skills, students will likely need to call on these abilities to accomplish the task. Moreover, as the operational definition of CT makes evident, the student must not only deploy several skills in responding to the challenge of the PT, but also carry out component tasks in sequence. The former requirement strongly indicates the need for a multi-dimensional IRT model, while the latter suggests that the usual assumption of local item independence may well be problematic ( Lane and Stone, 2006 ). At the same time, the analytic scoring rubric should facilitate the use of latent class analysis to partition data from large groups into meaningful categories ( Zlatkin-Troitschanskaia et al., 2019 ).
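
For illustration only (the text does not commit to a particular model), a common compensatory multidimensional IRT formulation for a dichotomously scored item j and examinee i is

P(X_{ij} = 1 \mid \boldsymbol{\theta}_i) = \frac{1}{1 + \exp\left(-(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j)\right)},

where \boldsymbol{\theta}_i collects the latent dimensions (e.g., facets of CT plus written communication), \mathbf{a}_j is the item's vector of discrimination parameters, and d_j is its intercept. Polytomous rubric scores would call for graded-response analogues, and local dependence among items sharing a storyline would call for further extensions such as testlet models.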

Future Perspectives

Although the iPAL consortium has made substantial progress in the assessment of CT, much remains to be done. Further refinement of existing PT’s and their adaptation to different languages and cultures must continue. To this point, there are a number of examples: The refugee crisis PT (cited in Table 1 ) was translated and adapted from Finnish to US English and then to Colombian Spanish. A PT concerning kidney transplants was translated and adapted from German to US English. Finally, two PT’s based on ‘legacy admissions’ to US colleges were translated and adapted to Colombian Spanish.

With respect to data collection, there is a need for sufficient data to support psychometric analysis of student responses, especially the relationships among the different components of the scoring rubric, as this would inform both task development and response evaluation ( Zlatkin-Troitschanskaia et al., 2019 ). In addition, more intensive study of response processes through cognitive laboratories and the like are needed to strengthen the evidential argument for construct validity ( Leighton, 2019 ). We are currently conducting empirical studies, collecting data on both iPAL PT’s and other measures of CT. These studies will provide evidence of convergent and discriminant validity.

At the same time, efforts should be directed at further development to support different ways CT PT’s might be used—i.e., use cases—especially those that call for formative use of PT’s. Incorporating formative assessment into courses can plausibly be expected to improve students’ competency acquisition ( Zlatkin-Troitschanskaia et al., 2017 ). With suitable choices of storylines, appropriate combinations of (modified) PT’s, supplemented by short-answer and multiple-choice items, could be interwoven into ordinary classroom activities. The supplementary items may be completely separate from the PT’s (as is the case with the CLA+), loosely coupled with the PT’s (as in drawing on the same storyline), or tightly linked to the PT’s (as in requiring elaboration of certain components of the response to the PT).

As an alternative to such integration, stand-alone modules could be embedded in courses to yield evidence of students’ generic CT skills. Core curriculum courses or general education courses offer ideal settings for embedding performance assessments. If these assessments were administered to a representative sample of students in each cohort over their years in college, the results would yield important information on the development of CT skills at a population level. For another example, these PA’s could be used to assess the competence profiles of students entering Bachelor’s or graduate-level programs as a basis for more targeted instructional support.

Thus, in considering different use cases for the assessment of CT, it is evident that several modifications of the iPAL omnibus assessment framework are needed. As noted earlier, assessments built according to this framework are demanding with respect to the extensive preliminary work required by a task and the time required to properly complete it. Thus, it would be helpful to have modified versions of the framework, focusing on one or two facets of the CT construct and calling for a smaller number of supplementary documents. The challenge to the student should be suitably reduced.

Some members of the iPAL collaborative have developed PT’s that are embedded in disciplines such as engineering, law and education ( Crump et al., 2019 ; for teacher education examples, see Jeschke et al., 2019 ). These are proving to be of great interest to various stakeholders and further development is likely. Consequently, it is essential that an appropriate assessment framework be established and implemented. It is both a conceptual and an empirical question as to whether a single framework can guide development in different domains.

Performance Assessment in Online Learning Environment

Over the last 15 years, increasing amounts of time in both college and work have been spent using computers and other electronic devices. This has led to the formulation of models of the new literacies that attempt to capture some key characteristics of these activities. A prominent example is a model proposed by Leu et al. (2020) . The model frames online reading as a process of problem-based inquiry involving five practices that occur during online research and comprehension:

1. Reading to identify important questions,

2. Reading to locate information,

3. Reading to critically evaluate information,

4. Reading to synthesize online information, and

5. Reading and writing to communicate online information.

The parallels with the iPAL definition of CT are evident and suggest there may be benefits to closer links between these two lines of research. For example, a report by Leu et al. (2014) describes empirical studies comparing assessments of online reading using either open-ended or multiple-choice response formats.

The iPAL consortium has begun to take advantage of the affordances of the online environment (for examples, see Schmidt et al. and Nagel et al. in this special issue). Most obviously, Supplementary Materials can now include archival photographs, audio recordings, or videos. Additional tasks might include the online search for relevant documents, though this would add considerably to the time demands. This online search could occur within a simulated Internet environment, as is the case for the IEA’s ePIRLS assessment ( Mullis et al., 2017 ).

The prospect of having access to a wealth of materials that can add to task authenticity is exciting. Yet it can also add ambiguity and information overload. Increased authenticity, then, should be weighed against validity concerns and the time required to absorb the content in these materials. Modifications of the design framework and extensive empirical testing will be required to decide on appropriate trade-offs. A related possibility is to employ some of these materials in short-answer (or even selected-response) items that supplement the main PT. Response formats could include highlighting text or using a drag-and-drop menu to construct a response. Students’ responses could be automatically scored, thereby containing costs. With automated scoring, feedback to students and faculty, including suggestions for next steps in strengthening CT skills, could also be provided without adding to faculty workload. Therefore, taking advantage of the online environment to incorporate new types of supplementary documents should be a high priority, as should, perhaps, the introduction of new response formats. Finally, further investigation of the overlap between this formulation of CT and the characterization of online reading promulgated by Leu et al. (2020) is a promising direction to pursue.

Data Availability Statement

All datasets generated for this study are included in the article/supplementary material.

Author Contributions

HB wrote the article. RS, OZ-T, and KB were involved in the preparation and revision of the article and co-wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was funded in part by the Spencer Foundation (Grant No. 201700123).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank all the researchers who have participated in the iPAL program.

  • ^ https://www.ipal-rd.com/
  • ^ https://www.aacu.org/value
  • ^ When test results are reported by means of substantively defined categories, the scoring is termed “criterion-referenced”. This is in contrast to results reported as percentiles; such scoring is termed “norm-referenced”.

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.


Arum, R., and Roksa, J. (2011). Academically Adrift: Limited Learning on College Campuses. Chicago, IL: University of Chicago Press.

Association of American Colleges and Universities (n.d.). VALUE: What is Value? Available online at: https://www.aacu.org/value (accessed May 7, 2020).

Association of American Colleges and Universities [AACU] (2018). Fulfilling the American Dream: Liberal Education and the Future of Work. Available online at: https://www.aacu.org/research/2018-future-of-work (accessed May 1, 2020).

Braun, H. (2019). Performance assessment and standardization in higher education: a problematic conjunction? Br. J. Educ. Psychol. 89, 429–440. doi: 10.1111/bjep.12274


Braun, H. I., Kirsch, I., and Yamamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th grade NAEP reading assessment. Teach. Coll. Rec. 113, 2309–2344.

Crump, N., Sepulveda, C., Fajardo, A., and Aguilera, A. (2019). Systematization of performance tests in critical thinking: an interdisciplinary construction experience. Rev. Estud. Educ. 2, 17–47.

Davey, T., Ferrara, S., Shavelson, R., Holland, P., Webb, N., and Wise, L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Washington, DC: Center for K-12 Assessment & Performance Management, Educational Testing Service.

Erwin, T. D., and Sebrell, K. W. (2003). Assessment of critical thinking: ETS’s tasks in critical thinking. J. Gen. Educ. 52, 50–70. doi: 10.1353/jge.2003.0019


Haertel, G. D., and Fujii, R. (2017). “Evidence-centered design and postsecondary assessment,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 313–339. doi: 10.4324/9781315709307-26

Hyytinen, H., and Toom, A. (2019). Developing a performance assessment task in the Finnish higher education context: conceptual and empirical insights. Br. J. Educ. Psychol. 89, 551–563. doi: 10.1111/bjep.12283

Hyytinen, H., Toom, A., and Shavelson, R. J. (2019). “Enhancing scientific thinking through the development of critical thinking in higher education,” in Redefining Scientific Thinking for Higher Education: Higher-Order Thinking, Evidence-Based Reasoning and Research Skills , eds M. Murtonen and K. Balloo (London: Palgrave MacMillan).

Indiana University (2019). FSSE 2019 Frequencies: FSSE 2019 Aggregate. Available online at: http://fsse.indiana.edu/pdf/FSSE_IR_2019/summary_tables/FSSE19_Frequencies_(FSSE_2019).pdf (accessed May 1, 2020).

Jeschke, C., Kuhn, C., Lindmeier, A., Zlatkin-Troitschanskaia, O., Saas, H., and Heinze, A. (2019). Performance assessment to investigate the domain specificity of instructional skills among pre-service and in-service teachers of mathematics and economics. Br. J. Educ. Psychol. 89, 538–550. doi: 10.1111/bjep.12277

Kegan, R. (1994). In Over Our Heads: The Mental Demands of Modern Life. Cambridge, MA: Harvard University Press.

Klein, S., Benjamin, R., Shavelson, R., and Bolus, R. (2007). The collegiate learning assessment: facts and fantasies. Eval. Rev. 31, 415–439. doi: 10.1177/0193841x07303318

Kosslyn, S. M., and Nelson, B. (2017). Building the Intentional University: Minerva and the Future of Higher Education. Cambridge, MA: The MIT Press.

Lane, S., and Stone, C. A. (2006). “Performance assessment,” in Educational Measurement , 4th Edn, ed. R. L. Brennan (Lanham, MD: Rowman & Littlefield Publishers), 387–432.

Leighton, J. P. (2019). The risk–return trade-off: performance assessments and cognitive validation of inferences. Br. J. Educ. Psychol. 89, 441–455. doi: 10.1111/bjep.12271

Leu, D. J., Kiili, C., Forzani, E., Zawilinski, L., McVerry, J. G., and O’Byrne, W. I. (2020). “The new literacies of online research and comprehension,” in The Concise Encyclopedia of Applied Linguistics , ed. C. A. Chapelle (Oxford: Wiley-Blackwell), 844–852.

Leu, D. J., Kulikowich, J. M., Kennedy, C., and Maykel, C. (2014). “The ORCA Project: designing technology-based assessments for online research,” in Paper Presented at the American Educational Research Association Annual Meeting , Philadelphia, PA.

Liu, O. L., Frankel, L., and Roohr, K. C. (2014). Assessing critical thinking in higher education: current state and directions for next-generation assessments. ETS Res. Rep. Ser. 1, 1–23. doi: 10.1002/ets2.12009

McClelland, D. C. (1973). Testing for competence rather than for “intelligence.” Am. Psychol. 28, 1–14. doi: 10.1037/h0034092

McGrew, S., Ortega, T., Breakstone, J., and Wineburg, S. (2017). The challenge that’s bigger than fake news: civic reasoning in a social media environment. Am. Educ. 4, 4-9, 39.

Mejía, A., Mariño, J. P., and Molina, A. (2019). Incorporating perspective analysis into critical thinking performance assessments. Br. J. Educ. Psychol. 89, 456–467. doi: 10.1111/bjep.12297

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educ. Res. 23, 13–23. doi: 10.3102/0013189x023002013

Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003, i–29. doi: 10.1002/j.2333-8504.2003.tb01908.x

Mislevy, R. J., and Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educ. Meas. Issues Pract. 25, 6–20. doi: 10.1111/j.1745-3992.2006.00075.x

Mullis, I. V. S., Martin, M. O., Foy, P., and Hooper, M. (2017). ePIRLS 2016 International Results in Online Informational Reading. Available online at:: http://timssandpirls.bc.edu/pirls2016/international-results/ (accessed May 1, 2020).

Nagel, M.-T., Zlatkin-Troitschanskaia, O., Schmidt, S., and Beck, K. (2020). “Performance assessment of generic and domain-specific skills in higher education economics,” in Student Learning in German Higher Education , eds O. Zlatkin-Troitschanskaia, H. A. Pant, M. Toepper, and C. Lautenbach (Berlin: Springer), 281–299. doi: 10.1007/978-3-658-27886-1_14

Organisation for Economic Co-operation and Development [OECD] (2012). AHELO: Feasibility Study Report , Vol. 1. Paris: OECD. Design and implementation.

Organisation for Economic Co-operation and Development [OECD] (2013). AHELO: Feasibility Study Report , Vol. 2. Paris: OECD. Data analysis and national experiences.

Oser, F. K., and Biedermann, H. (2020). “A three-level model for critical thinking: critical alertness, critical reflection, and critical analysis,” in Frontiers and Advances in Positive Learning in the Age of Information (PLATO) , ed. O. Zlatkin-Troitschanskaia (Cham: Springer), 89–106. doi: 10.1007/978-3-030-26578-6_7

Paul, R., and Elder, L. (2007). Consequential validity: using assessment to drive instruction. Found. Crit. Think. 29, 31–40.

Pellegrino, J. W., and Hilton, M. L. (eds) (2012). Education for life and work: Developing Transferable Knowledge and Skills in the 21st Century. Washington DC: National Academies Press.

Shavelson, R. (2010). Measuring College Learning Responsibly: Accountability in a New Era. Redwood City, CA: Stanford University Press.

Shavelson, R. J. (2013). On an approach to testing and modeling competence. Educ. Psychol. 48, 73–86. doi: 10.1080/00461520.2013.779483

Shavelson, R. J., Zlatkin-Troitschanskaia, O., Beck, K., Schmidt, S., and Marino, J. P. (2019). Assessment of university students’ critical thinking: next generation performance assessment. Int. J. Test. 19, 337–362. doi: 10.1080/15305058.2018.1543309

Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Marino, J. P. (2018). “International performance assessment of learning in higher education (iPAL): research and development,” in Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives , eds O. Zlatkin-Troitschanskaia, M. Toepper, H. A. Pant, C. Lautenbach, and C. Kuhn (Berlin: Springer), 193–214. doi: 10.1007/978-3-319-74338-7_10

Shavelson, R. J., Klein, S., and Benjamin, R. (2009). The limitations of portfolios. Inside Higher Educ. Available online at: https://www.insidehighered.com/views/2009/10/16/limitations-portfolios

Stolzenberg, E. B., Eagan, M. K., Zimmerman, H. B., Berdan Lozano, J., Cesar-Davis, N. M., Aragon, M. C., et al. (2019). Undergraduate Teaching Faculty: The HERI Faculty Survey 2016–2017. Los Angeles, CA: UCLA.

Tessier-Lavigne, M. (2020). Putting Ethics at the Heart of Innovation. Stanford, CA: Stanford Magazine.

Wheeler, P., and Haertel, G. D. (1993). Resource Handbook on Performance Assessment and Measurement: A Tool for Students, Practitioners, and Policymakers. Palm Coast, FL: Owl Press.

Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016). Evaluating Information: The Cornerstone of Civic Online Reasoning. Executive Summary. Stanford, CA: Stanford History Education Group.

Zahner, D. (2013). Reliability and Validity–CLA+. Council for Aid to Education. Available online at:: https://pdfs.semanticscholar.org/91ae/8edfac44bce3bed37d8c9091da01d6db3776.pdf .

Zlatkin-Troitschanskaia, O., and Shavelson, R. J. (2019). Performance assessment of student learning in higher education [Special issue]. Br. J. Educ. Psychol. 89, i–iv, 413–563.

Zlatkin-Troitschanskaia, O., Pant, H. A., Lautenbach, C., Molerov, D., Toepper, M., and Brückner, S. (2017). Modeling and Measuring Competencies in Higher Education: Approaches to Challenges in Higher Education Policy and Practice. Berlin: Springer VS.

Zlatkin-Troitschanskaia, O., Pant, H. A., Toepper, M., and Lautenbach, C. (eds) (2020). Student Learning in German Higher Education: Innovative Measurement Approaches and Research Results. Wiesbaden: Springer.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., and Pant, H. A. (2018). “Assessment of learning outcomes in higher education: international comparisons and perspectives,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 686–697.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., Schmidt, S., and Beck, K. (2019). On the complementarity of holistic and analytic approaches to performance assessment scoring. Br. J. Educ. Psychol. 89, 468–484. doi: 10.1111/bjep.12286

Keywords : critical thinking, performance assessment, assessment framework, scoring rubric, evidence-centered design, 21st century skills, higher education

Citation: Braun HI, Shavelson RJ, Zlatkin-Troitschanskaia O and Borowiec K (2020) Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation. Front. Educ. 5:156. doi: 10.3389/feduc.2020.00156



Supplement to Critical Thinking

How can one assess, for purposes of instruction or research, the degree to which a person possesses the dispositions, skills and knowledge of a critical thinker?

In psychometrics, assessment instruments are judged according to their validity and reliability.

Roughly speaking, an instrument is valid if it measures accurately what it purports to measure, given standard conditions. More precisely, the degree of validity is “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (American Educational Research Association 2014: 11). In other words, a test is not valid or invalid in itself. Rather, validity is a property of an interpretation of a given score on a given test for a specified use. Determining the degree of validity of such an interpretation requires collection and integration of the relevant evidence, which may be based on test content, test takers’ response processes, a test’s internal structure, relationship of test scores to other variables, and consequences of the interpretation (American Educational Research Association 2014: 13–21). Criterion-related evidence consists of correlations between scores on the test and performance on another test of the same construct; its weight depends on how well supported is the assumption that the other test can be used as a criterion. Content-related evidence is evidence that the test covers the full range of abilities that it claims to test. Construct-related evidence is evidence that a correct answer reflects good performance of the kind being measured and an incorrect answer reflects poor performance.
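As a concrete illustration of criterion-related evidence, the minimal sketch below (in Python, with invented scores; neither the data nor the variable names come from any of the tests discussed here) correlates scores on a new critical thinking test with scores on an established test used as the criterion.

```python
import numpy as np

# Invented scores for ten test takers on a new test and on a criterion test.
new_test = np.array([12, 15, 9, 20, 17, 11, 14, 18, 10, 16], dtype=float)
criterion = np.array([30, 34, 25, 41, 38, 27, 31, 39, 24, 35], dtype=float)

# Criterion-related evidence: the correlation between the two sets of scores.
r = np.corrcoef(new_test, criterion)[0, 1]
print(f"criterion correlation r = {r:.2f}")  # its weight still depends on how good the criterion is
```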

An instrument is reliable if it consistently produces the same result, whether across different forms of the same test (parallel-forms reliability), across different items (internal consistency), across different administrations to the same person (test-retest reliability), or across ratings of the same answer by different people (inter-rater reliability). Internal consistency should be expected only if the instrument purports to measure a single undifferentiated construct, and thus should not be expected of a test that measures a suite of critical thinking dispositions or critical thinking abilities, assuming that some people are better in some of the respects measured than in others (for example, very willing to inquire but rather closed-minded). Otherwise, reliability is a necessary but not a sufficient condition of validity; a standard example of a reliable instrument that is not valid is a bathroom scale that consistently under-reports a person’s weight.
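The different senses of reliability can be made concrete with a short sketch. The following Python fragment (invented, simulated data; not drawn from any test described here) computes Cronbach's alpha as an index of internal consistency and a test-retest correlation for the same respondents.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(0, 1, size=(100, 1))                    # one underlying ability
form_a = trait + rng.normal(0, 1, size=(100, 10))          # 10 items loading on it
form_a_retest = trait + rng.normal(0, 1, size=(100, 10))   # second administration

print(f"alpha (internal consistency) = {cronbach_alpha(form_a):.2f}")
r_tt = np.corrcoef(form_a.sum(axis=1), form_a_retest.sum(axis=1))[0, 1]
print(f"test-retest r = {r_tt:.2f}")
```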

Assessing dispositions is difficult if one uses a multiple-choice format with known adverse consequences of a low score. It is pretty easy to tell what answer to the question “How open-minded are you?” will get the highest score and to give that answer, even if one knows that the answer is incorrect. If an item probes less directly for a critical thinking disposition, for example by asking how often the test taker pays close attention to views with which the test taker disagrees, the answer may differ from reality because of self-deception or simple lack of awareness of one’s personal thinking style, and its interpretation is problematic, even if factor analysis enables one to identify a distinct factor measured by a group of questions that includes this one (Ennis 1996). Nevertheless, Facione, Sánchez, and Facione (1994) used this approach to develop the California Critical Thinking Dispositions Inventory (CCTDI). They began with 225 statements expressive of a disposition towards or away from critical thinking (using the long list of dispositions in Facione 1990a), validated the statements with talk-aloud and conversational strategies in focus groups to determine whether people in the target population understood the items in the way intended, administered a pilot version of the test with 150 items, and eliminated items that failed to discriminate among test takers or were inversely correlated with overall results or added little refinement to overall scores (Facione 2000). They used item analysis and factor analysis to group the measured dispositions into seven broad constructs: open-mindedness, analyticity, cognitive maturity, truth-seeking, systematicity, inquisitiveness, and self-confidence (Facione, Sánchez, and Facione 1994). The resulting test consists of 75 agree-disagree statements and takes 20 minutes to administer. A repeated disturbing finding is that North American students taking the test tend to score low on the truth-seeking sub-scale (on which a low score results from agreeing to such statements as the following: “To get people to agree with me I would give any reason that worked”. “Everyone always argues from their own self-interest, including me”. “If there are four reasons in favor and one against, I’ll go with the four”.) Development of the CCTDI made it possible to test whether good critical thinking abilities and good critical thinking dispositions go together, in which case it might be enough to teach one without the other. Facione (2000) reports that administration of the CCTDI and the California Critical Thinking Skills Test (CCTST) to almost 8,000 post-secondary students in the United States revealed a statistically significant but weak correlation between total scores on the two tests, and also between paired sub-scores from the two tests. The implication is that both abilities and dispositions need to be taught, that one cannot expect improvement in one to bring with it improvement in the other.
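Item analysis of the kind described (dropping items that fail to discriminate or are inversely correlated with overall results) can be sketched as follows. This is only an illustration with simulated Likert-type responses and an arbitrary cut-off, not the procedure Facione and colleagues actually used.

```python
import numpy as np

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlate each item with the total of the remaining items."""
    totals = items.sum(axis=1)
    r = np.empty(items.shape[1])
    for j in range(items.shape[1]):
        rest = totals - items[:, j]              # total score excluding item j
        r[j] = np.corrcoef(items[:, j], rest)[0, 1]
    return r

rng = np.random.default_rng(0)
pilot = rng.integers(1, 7, size=(150, 20)).astype(float)   # 150 pilot respondents, 20 items
r_it = corrected_item_total(pilot)
# Flag items that add little or run against the overall score (illustrative cut-off).
flagged = np.where(r_it < 0.20)[0]
print("items to review:", flagged.tolist())
```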

A more direct way of assessing critical thinking dispositions would be to see what people do when put in a situation where the dispositions would reveal themselves. Ennis (1996) reports promising initial work with guided open-ended opportunities to give evidence of dispositions, but no standardized test seems to have emerged from this work. There are however standardized aspect-specific tests of critical thinking dispositions. The Critical Problem Solving Scale (Berman et al. 2001: 518) takes as a measure of the disposition to suspend judgment the number of distinct good aspects attributed to an option judged to be the worst among those generated by the test taker. Stanovich, West and Toplak (2011: 800–810) list tests developed by cognitive psychologists of the following dispositions: resistance to miserly information processing, resistance to myside thinking, absence of irrelevant context effects in decision-making, actively open-minded thinking, valuing reason and truth, tendency to seek information, objective reasoning style, tendency to seek consistency, sense of self-efficacy, prudent discounting of the future, self-control skills, and emotional regulation.

It is easier to measure critical thinking skills or abilities than to measure dispositions. The following currently available standardized tests purport to measure them: the Watson-Glaser Critical Thinking Appraisal (Watson & Glaser 1980a, 1980b, 1994), the Cornell Critical Thinking Tests Level X and Level Z (Ennis & Millman 1971; Ennis, Millman, & Tomko 1985, 2005), the Ennis-Weir Critical Thinking Essay Test (Ennis & Weir 1985), the California Critical Thinking Skills Test (Facione 1990b, 1992), the Halpern Critical Thinking Assessment (Halpern 2016), the Critical Thinking Assessment Test (Center for Assessment & Improvement of Learning 2017), the Collegiate Learning Assessment (Council for Aid to Education 2017), the HEIghten Critical Thinking Assessment (https://territorium.com/heighten/), and a suite of critical thinking assessments for different groups and purposes offered by Insight Assessment (https://www.insightassessment.com/products). The Critical Thinking Assessment Test (CAT) is unique among them in being designed for use by college faculty to help them improve their development of students' critical thinking skills (Haynes et al. 2015; Haynes & Stein 2021). Also, for some years the United Kingdom body OCR (Oxford Cambridge and RSA Examinations) awarded AS and A Level certificates in critical thinking on the basis of an examination (OCR 2011). Many of these standardized tests have received scholarly evaluations at the hands of, among others, Ennis (1958), McPeck (1981), Norris and Ennis (1989), Fisher and Scriven (1997), Possin (2008, 2013a, 2013b, 2013c, 2014, 2020) and Hatcher and Possin (2021). Their evaluations provide a useful set of criteria that such tests ideally should meet, as does the description by Ennis (1984) of problems in testing for competence in critical thinking: the soundness of multiple-choice items, the clarity and soundness of instructions to test takers, the information and mental processing used in selecting an answer to a multiple-choice item, the role of background beliefs and ideological commitments in selecting an answer to a multiple-choice item, the tenability of a test's underlying conception of critical thinking and its component abilities, the set of abilities that the test manual claims are covered by the test, the extent to which the test actually covers these abilities, the appropriateness of the weighting given to various abilities in the scoring system, the accuracy and intellectual honesty of the test manual, the interest of the test to the target population of test takers, the scope for guessing, the scope for choosing a keyed answer by being test-wise, precautions against cheating in the administration of the test, clarity and soundness of materials for training essay graders, inter-rater reliability in grading essays, and clarity and soundness of advance guidance to test takers on what is required in an essay. Rear (2019) has challenged the use of standardized tests of critical thinking as a way to measure educational outcomes, on the grounds that they (1) fail to take into account disputes about conceptions of critical thinking, (2) are not completely valid or reliable, and (3) fail to evaluate skills used in real academic tasks. He proposes instead assessments based on discipline-specific content.

There are also aspect-specific standardized tests of critical thinking abilities. Stanovich, West and Toplak (2011: 800–810) list tests of probabilistic reasoning, insights into qualitative decision theory, knowledge of scientific reasoning, knowledge of rules of logical consistency and validity, and economic thinking. They also list instruments that probe for irrational thinking, such as superstitious thinking, belief in the superiority of intuition, over-reliance on folk wisdom and folk psychology, belief in “special” expertise, financial misconceptions, overestimation of one’s introspective powers, dysfunctional beliefs, and a notion of self that encourages egocentric processing. They regard these tests along with the previously mentioned tests of critical thinking dispositions as the building blocks for a comprehensive test of rationality, whose development (they write) may be logistically difficult and would require millions of dollars.

A superb example of assessment of an aspect of critical thinking ability is the Test on Appraising Observations (Norris & King 1983, 1985, 1990a, 1990b), which was designed for classroom administration to senior high school students. The test focuses entirely on the ability to appraise observation statements and in particular on the ability to determine in a specified context which of two statements there is more reason to believe. According to the test manual (Norris & King 1985, 1990b), a person’s score on the multiple-choice version of the test, which is the number of items that are answered correctly, can justifiably be given either a criterion-referenced or a norm-referenced interpretation.

On a criterion-referenced interpretation, those who do well on the test have a firm grasp of the principles for appraising observation statements, and those who do poorly have a weak grasp of them. This interpretation can be justified by the content of the test and the way it was developed, which incorporated a method of controlling for background beliefs articulated and defended by Norris (1985). Norris and King synthesized from judicial practice, psychological research and common-sense psychology 31 principles for appraising observation statements, in the form of empirical generalizations about tendencies, such as the principle that observation statements tend to be more believable than inferences based on them (Norris & King 1984). They constructed items in which exactly one of the 31 principles determined which of two statements was more believable. Using a carefully constructed protocol, they interviewed about 100 students who responded to these items in order to determine the thinking that led them to choose the answers they did (Norris & King 1984). In several iterations of the test, they adjusted items so that selection of the correct answer generally reflected good thinking and selection of an incorrect answer reflected poor thinking. Thus they have good evidence that good performance on the test is due to good thinking about observation statements and that poor performance is due to poor thinking about observation statements. Collectively, the 50 items on the final version of the test require application of 29 of the 31 principles for appraising observation statements, with 13 principles tested by one item, 12 by two items, three by three items, and one by four items. Thus there is comprehensive coverage of the principles for appraising observation statements. Fisher and Scriven (1997: 135–136) judge the items to be well worked and sound, with one exception. The test is clearly written at a grade 6 reading level, meaning that poor performance cannot be attributed to difficulties in reading comprehension by the intended adolescent test takers. The stories that frame the items are realistic, and are engaging enough to stimulate test takers’ interest. Thus the most plausible explanation of a given score on the test is that it reflects roughly the degree to which the test taker can apply principles for appraising observations in real situations. In other words, there is good justification of the proposed interpretation that those who do well on the test have a firm grasp of the principles for appraising observation statements and those who do poorly have a weak grasp of them.
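The reported coverage figures can be checked with a line or two of arithmetic (the numbers are taken directly from the paragraph above):

```python
# items_per_principle -> how many of the 31 principles are tested that many times
coverage = {1: 13, 2: 12, 3: 3, 4: 1}
principles_covered = sum(coverage.values())              # 29 of the 31 principles
items_used = sum(k * v for k, v in coverage.items())     # 13 + 24 + 9 + 4 = 50 items
assert (principles_covered, items_used) == (29, 50)
```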

To get norms for performance on the test, Norris and King arranged for seven groups of high school students in different types of communities and with different levels of academic ability to take the test. The test manual includes percentiles, means, and standard deviations for each of these seven groups. These norms allow teachers to compare the performance of their class on the test to that of a similar group of students.
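A norm-referenced interpretation of this kind amounts to locating a score within a comparison group's distribution. The sketch below uses an invented norm group (the real norms are in the test manual) to compute a percentile rank along with the group's mean and standard deviation.

```python
import numpy as np

def percentile_rank(score: float, norm_scores: np.ndarray) -> float:
    """Percentage of the norm group scoring at or below the given score."""
    return 100.0 * float(np.mean(norm_scores <= score))

norm_group = np.array([18, 22, 25, 27, 29, 31, 33, 35, 38, 41, 44, 46], dtype=float)
print(f"mean = {norm_group.mean():.1f}, sd = {norm_group.std(ddof=1):.1f}")
print(f"percentile rank of a score of 33 = {percentile_rank(33, norm_group):.0f}")  # about 58
```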

Copyright © 2022 by David Hitchcock <hitchckd@mcmaster.ca>


Critical Thinking Assessment Test (CAT)

The CAT instrument is a unique tool designed to assess and promote the improvement of critical thinking and real-world problem solving skills. Most of the questions require short answer essay responses, and a detailed scoring guide helps ensure good scoring reliability. The CAT instrument is scored by the institution's own faculty using the detailed scoring guide. During the scoring process faculty are able to see their students' weaknesses and understand areas that need improvement. Faculty are encouraged to use the CAT instrument as a model for developing authentic assessments and learning activities in their own discipline that improve students' critical thinking and real-world problem solving skills. These features help close the loop in assessment and quality improvement.


Administration Information

Users must be university faculty or researchers. Grader training is also required: each institution must send 2-3 representatives to a two-day grading workshop: https://www.tntech.edu/cat/training.php

Access and Use

$9.95/test (50 test minimum) + $300 Annual Fee

Elizabeth Lisic, Assistant Director of Testing: [email protected], 931-372-3611

Grant, M., & Smith, M. (2018). Quantifying assessment of undergraduate critical thinking.  Journal of College Teaching & Learning ,  15 (1), 27-38.  https://doi.org/10.19030/tlc.v15i1.10199

Haynes, A., Lisic, E., Goltz, M., Stein, B., & Harris, K. (2016). Moving beyond assessment to improving students’ critical thinking skills: a model for implementing change. Journal of the Scholarship of Teaching and Learning, 16 (4), 44–61. https://doi.org/10.14434/josotl.v16i4.19407 .

Styers, M. L., Van Zandt, P. A., & Hayden, K. L. (2018). Active learning in flipped life science courses promotes development of critical thinking skills.  CBE—Life Sciences Education ,  17 (3), ar39.  https://doi.org/10.1187/cbe.16-11-0332

Psychometrics

Technical Manual - Tennessee Technological University (2016) CAT© Instrument Technical Information: https://www.tntech.edu/cat/pdf/reports/CAT_Technical_Information_V8.pdf

National Science Foundation Final Report -  https://www.tntech.edu/cat/pdf/reports/Project_CAT_Final_Report.pdf

Psychometric Considerations

Psychometrics is the science of psychological assessment. A primary goal of EdInstruments is to provide information on crucial psychometric topics, including validity and reliability (essential evaluation concepts that indicate how well an instrument measures a construct), as well as additional properties worth considering when selecting an instrument of measurement.


University of Louisville Critical Thinking Inventories

Two instruments for assessing critical thinking learning environments were developed and validated by faculty and staff at the University of Louisville as part of the Ideas to Action (i2a) initiative at the Delphi Center.


Quick Links to Resources:

  • Teaching Critical Thinking Inventory [PDF]
  • Learning Critical Thinking Inventory [PDF]
  • Sample CTI Feedback Report [PDF]
  • LCTI Survey Deployment Instructions [PDF]
  • Detailed Instructor FAQ [PDF]
“There has been a lot of value gained from using the critical thinking inventories. It helps faculty members compare their own perspectives on what is happening in the classroom with the perspectives of their students. It gives you a way to address issues in a course and decide what you want to tweak or change in your teaching.” -Alan Attaway, Professor, Department of Accountancy, College of Business

What are the Critical Thinking Inventories?

The Critical Thinking Inventories (CTIs) are short, Likert-item instruments that assess a course learning environment as it relates to critical thinking skill-building. There are two separate instruments:

  • Learning Critical Thinking Inventory (LCTI): asks students to report their perception of critical thinking skill building as facilitated by their instructor in a specific course learning environment.
  • Teaching Critical Thinking Inventory (TCTI): asks instructors to report on their facilitation of critical thinking skills within a specific course learning environment.

The LCTI and TCTI are validated instruments that provide you with a quick, anonymous way to self-assess the critical thinking characteristics of your course from your own perspective and the perspective of your students. The results from these inventories may be used by instructors or by academic programs to help inform how instructors can facilitate critical thinking skill building within a specific course and/or by the university to assess and improve the integration of critical thinking within the undergraduate educational environment.

Why were the CTIs developed?

Despite three decades of nationwide emphasis on critical thinking by both higher education institutions and potential employers of college graduates, there are no standardized instruments available to assess the actual or perceived abilities of instructors to develop students’ critical thinking skills (van Zyl, Bays, & Gilchrist, 2013). The CTIs were developed here at UofL to address this gap in the field and to support our institution’s self-identified goal of fostering our students’ critical thinking skills. Appropriate statistical analyses conducted at UofL showed the instruments to be both reliable and valid. You can read more about the development and validation of the CTIs in the following peer-reviewed article:

  • Van Zyl, M.A., Bays, C.L., & Gilchrist, C. (2013). Assessing teaching critical thinking with validated critical thinking inventories: The Learning Critical Thinking Inventory (LCTI) and the Teaching Critical Thinking Inventory (TCTI). Inquiry: Critical Thinking Across the Disciplines, 28(3), 40-50.
  • Download a copy of the journal article here [PDF] .

How can I easily administer the LCTI to my students?

Both the LCTI and TCTI contain 11 Likert items and should each take no more than 5 minutes to complete. The LCTI student instrument can be deployed and is viewable within the “Assignments” section under “Assessments” -> “Survey” in your Blackboard course shell. The instructor can control visibility and access of the instrument via standard Blackboard control functions. All student responses from the LCTI remain anonymous. Please refer to the document titled “LCTI Survey Deployment” [PDF] for detailed instructions on making the assessment visible to students.

  • Download a copy of the Learning Critical Thinking Inventory here [PDF] .

How do I complete the TCTI that is designed specifically for instructors?

The TCTI instructor instrument is for your use only and is not located in your Blackboard course shell. Instructors can access and download an Adobe copy of the TCTI below. You can fill out the inventory at the beginning or end of the semester. Ideally, you will compare your self-assessment scores with the aggregated student scores at the end of the semester. You can then affirm the alignment of or identify possible gaps between your own perceptions and your students’ perceptions in order to make adjustments to the learning environment.

  • Download a copy of the Teaching Critical Thinking Inventory here [PDF] .

How do I review my results from students?

You will have complete access to student responses on the LCTI within your Blackboard Learn course shell. The course grade center will record which students completed the LCTI, but will only report individual responses in aggregated form. Detailed instructions for accessing the student data are located here [PDF] . You will be given the opportunity to submit your data to the Quality Enhancement Plan team to have those data converted to a CTI feedback report. IL Barrow, QEP Specialist for Assessment at the Delphi Center for Teaching and Learning, is available upon request to assist you in organizing and using your data for continuous improvement.

  • Download an example feedback report here [PDF] .

Where can I find additional information on the use of the CTIs?

For additional questions, please download the detailed Frequently Asked Questions (FAQ) document for instructors here [PDF] .

Who can I contact for additional information on the CTIs?

IL Barrow, QEP Specialist for Assessment [email protected]


Introduction to Critical Thinking Skills


K. Venkat Reddy and G. Suvarna Lakshmi

This chapter contains machine-generated summaries of six articles. The summaries discuss the multitude of ways in which the field of critical thinking has been understood and defined. Most of the summaries included in the chapter project the view that critical thinking consists of certain cognitive abilities belonging to the higher order of thinking. The first summary explains the definition of critical thinking using a meta-level approach; it uses this approach because the problem of defining critical thinking is a meta-problem. The authors argue that the definitions proposed earlier were either subject-specific or skill-specific, resulting in definitions that are neither universally applicable nor acceptable. The authors therefore propose an approach built on three criteria that a definition should satisfy: it should (1) rely on criteria, (2) be self-correcting, and (3) be sensitive to context. The summary of the second article, on the skills required for twenty-first-century education, is based on the lists of skills proposed by various bodies, broadly categorized as productive, critical, and creative thinking along with digital skills. The author proposes that the curriculum should incorporate the skills demanded by the current pace of change and the need of the hour.

References

McPeck, J. (1981). Critical thinking and education . St. Martin’s Press.


Ennis, R. (1987). A conception of critical thinking—With some curriculum suggestions. APA Newsletter on Teaching Philosophy Summer , 1–5.

Ennis, R. (1989). Critical thinking and subject-specificity: Clarification and needed research. Educational Researcher, 18 , 4–10.


Paul, R. (1995). Critical thinking: How to prepare students for a rapidly changing world . Foundation for Critical Thinking.

Lipman, M. (1988). Critical thinking: What can it be? Educational Leadership, 46 (September), 38–43.

Burkhardt, G., Monsour, M., Valdez, G., Gunn, C., Dawson, M., Lemke, C., & Martin, C. (2003). EnGauge 21st century skills: Literacy in the digital age . NCREL. http://www.pict.sdsu.edu/engauge21st.pdf

ISTE [International Society for Technology in Education]. (2007). National educational technology standards for students (2nd rev. ed.). ISTE. www.iste.org

Pithers, R. T., & Soden, R. (2000). Critical thinking in education: A review. Educational Research, 42 (3), 237–249.

Higgins, S., & Baumfield, V. (1998). A defence of teaching general thinking skills. Journal of Philosophy of Education, 32 (3), 391–398. https://doi.org/10.1111/1467-9752.00103

Colwill, I., & Gallagher, C. (2007). Developing a curriculum for the twenty-first century: The experiences of England and Northern Ireland. Prospects, 37 (4), 411–425.

Benjamin, H. R. W. (1939). Saber-tooth curriculum, including other lectures in the history of Paleolithic education . McGraw-Hill.

Bahar, M., & Tongac, E. (2009). The effect of teaching approaches on the pattern of pupils’ cognitive structure: Some evidence from the field. The Asia-Pacific Education Researcher, 18 (1), 21–45.

McPeck, J. E. (1990). Teaching critical thinking . Chapman and Hall.

Norris, S. P. (1985). Synthesis of research on critical thinking. Educational Leadership, 42 (8), 40–45.

Cottrell, S. (2005). Critical thinking skills: Developing effective analysis and argument . Palgrave Macmillan.

Novak, J., & Gowin, D. (1984). Learning how to learn . Cambridge University Press.


Heritage, M. (2008). Learning progressions: Supporting instruction and formative assessment . Council of Chief State School Officers.

UNESCO IBE. (2013b). Statement on learning in the post-2015 education and development agenda . UNESCO IBE.

UNESCO IBE [International Bureau of Education]. (2013a). Key curricular and learning issues in the post-2015 education and development agenda. Document prepared for the UNESCO IBE international experts’ meeting, 23–25 September, Geneva. UNESCO IBE.

U. S. Office of Education. (1991). America 2000: An education strategy . U. S. Government Printing Office.

Ennis, R. (1996). Critical thinking . Prentice-Hall.

Bailin, S., & Battersby, M. (2010). Reason in the balance: An inquiry approach to critical thinking . McGraw-Hill Ryerson.

Capon, N., & Kuhn, D. (2004). What’s so good about problem-based learning? Cognition and Instruction, 22 (1), 61–79.

Pease, M., & Kuhn, D. (2011). Experimental analysis of the effective components of problem-based learning. Science Education, 95 , 57–86.

Wirkala, C., & Kuhn, D. (2011). Problem-based learning in K-12 education: Is it effective and how does it achieve its effects? American Educational Research Journal, 48 , 1157–1186.

Ennis, R. (1984). Problems in testing informal logic, critical thinking, reasoning ability. Informal Logic, 6, 3–9.

Ennis, R. (2003). Critical thinking assessment. In D. Fasko (Ed.), Critical thinking and reasoning: Current theories, research, and practice . Hampton.

Ennis, R. (2008). Nationwide testing of critical thinking for higher education: Vigilance required. Teaching Philosophy, 31 (1), 1–26.

Ennis, R. (2009). Investigating and assessing multiple-choice critical thinking tests. In J. Sobocan & L. Groarke (Eds.), Critical thinking education and assessment: Can higher order thinking be tested? Althouse.

Ennis, R., & Norris, S. (1990). Critical thinking assessment: Status, issues, needs. In S. Legg & J. Algina (Eds.), Cognitive assessment of language and math outcomes . Ablex.

Fisher, A., & Scriven, M. (1997). Critical thinking: Its definition and assessment . Edgepress.

Norris, S., & Ennis, R. (1989). Evaluating critical thinking . Midwest Publications.

Groarke, L. (2009). What’s wrong with the California critical thinking skills test? CT testing and accountability. In J. Sobocan & L. Groarke (Eds.), Critical thinking education and assessment: Can higher order thinking be tested? Althouse Press.

Possin, K. (2008). A field guide to critical thinking assessment. Teaching Philosophy, 31 (3), 201–228.

Possin, K. (2013a). A serious flaw in the Collegiate Learning Assessment [CLA] test. Informal Logic, 33(3), 390–405. Also posted in Italian at http://unibec.wordpress.com/2013/05/13/un-grave-difetto-del-test-colligiate-learning-assessment-cla/

Possin, K. (2013b). Some problems with the Halpern critical thinking assessment [HCTA] test. Inquiry, 28 (3), 4–12.

Possin, K. (2013c). A fatal flaw in the collegiate learning assessment test. Assessment Update, 25 (1), 8–11.

Possin, K. (2014). Critique of the Watson-Glaser critical thinking appraisal test: The more you know, the lower your score. Informal Logic, 34(4), 393–416.

Sobocan, J., & Groarke, L. (Eds.). (2009). Critical thinking education and assessment: Can higher order thinking be tested? Althouse.

Pascarella, E., & Terenzini, P. (2005). How college affects students: Findings and insights from twenty years of research, vol 2: A third decade of research . Jossey Bass.

Solon, T. (2007). Critical thinking infusion and course content learning in introductory psychology. Journal of Instructional Psychology, 34 (2), 95–109.

Johnson, R. H., & Hamby, B. (2015). A meta-level approach to the problem of defining ‘critical thinking’. Argumentation, 29 , 417–430. https://doi.org/10.1007/s10503-015-9356-4

Higgins, S. (2014). Critical thinking for 21 st -century education: A cyber-tooth curriculum? Prospects, 44 , 559–574. https://doi.org/10.1007/s11125-014-9323-0

Battersby, M., & Bailin, S. (2011). Critical inquiry: Considering the context. Argumentation, 25 , 243–253. https://doi.org/10.1007/s10503-011-9205-z

Yu, K.-C., Lin, K.-Y., & Fan, S.-C. (2014). An exploratory study on the application of conceptual knowledge and critical thinking to technological issues. International Journal of Technology and Design Education, 25 , 339–361. https://doi.org/10.1007/s10798-014-9289-5

Acedo, C., & Hughes, C. (2014). Principles for learning and competences in the 21st-century curriculum. Prospects, 44 , 503–525. https://doi.org/10.1007/s11125-014-9330-1

Ennis, R. H. (2016). Critical thinking across the curriculum: A vision. Topoi, 37 , 165–184. https://doi.org/10.1007/s11245-016-9401-4


Author information

Authors and affiliations.

Department of Training and Development, The English and Foreign Languages University, Hyderabad, Telangana, India

K. Venkat Reddy

Department of English Language Teaching, The English and Foreign Languages University, Hyderabad, Telangana, India

G. Suvarna Lakshmi



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Reddy, K.V., Lakshmi, G.S. (2024). Introduction to Critical Thinking Skills. In: Reddy, K.V., Lakshmi, G.S. (eds) Critical Thinking for Professional and Language Education. Springer, Cham. https://doi.org/10.1007/978-3-031-37951-2_1


DOI : https://doi.org/10.1007/978-3-031-37951-2_1

Published : 04 September 2024

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-37950-5

Online ISBN : 978-3-031-37951-2


Center for Assessment & Improvement of Learning

Attention: CAT and COVID-19. Given the current circumstances with COVID-19 and the closure of many campuses, we will be allowing the online administration of the CAT outside of a proctored setting. We will work with institutions to set up proctor accounts and specific blocks of time during which a student can log in and complete the CAT. We are also offering online, virtual trainings to engage faculty in the evaluation of student performance on the CAT and the development of CAT Apps. If you have questions about an online CAT administration or virtual trainings, please contact [email protected].

The Critical-thinking Assessment Test (CAT) was developed with input from faculty across a wide range of institutions and disciplines, with guidance from colleagues in the cognitive/learning sciences and assessment, and with support from the National Science Foundation (NSF).


The CAT was designed:

  • to assess a broad range of skills that faculty across the country feel are important components of critical thinking and real-world problem solving;

  • to emulate real-world problems: all questions are derived from real-world situations, with most requiring short-answer essay responses; and

  • to involve faculty in the assessment and improvement of student critical thinking skills and to connect them to a teaching community.


We encourage faculty involvement in the scoring process to help them understand students' strengths and weaknesses. Faculty can also use the CAT instrument as a model for constructing better course assessments using their own discipline content.


Over 350 institutions across the country have used the CAT for course, program, and general education assessment. NSF support also helped establish the Center for Assessment and Improvement of Learning to distribute the CAT and provide training, consultation, and statistical support to users.


The Critical Thinking Assessment Test was developed with support from the National Science Foundation TUES (CCLI) Division (under grants 0404911, 0717654, and 1022789 to Barry Stein, PI; Ada Haynes, Co-PI; & Michael Redding, Co-PI). Any opinions, findings, and conclusions or recommendations expressed here do not necessarily reflect the views of the National Science Foundation.


Critical Thinking Skills Toolbox


CTS Tools for Faculty and Student Assessment


A number of critical thinking skills inventories and measures have been developed:

  • Watson-Glaser Critical Thinking Appraisal (WGCTA)
  • Cornell Critical Thinking Test
  • California Critical Thinking Disposition Inventory (CCTDI)
  • California Critical Thinking Skills Test (CCTST)
  • Health Science Reasoning Test (HSRT)
  • Professional Judgment Rating Form (PJRF)
  • Teaching for Thinking Student Course Evaluation Form
  • Holistic Critical Thinking Scoring Rubric
  • Peer Evaluation of Group Presentation Form

Excluding the Watson-Glaser Critical Thinking Appraisal and the Cornell Critical Thinking Test, Facione and Facione developed the critical thinking skills instruments listed above. However, it is important to point out that all of these measures are of questionable utility for dental educators because their content is general rather than dental education specific. (See Critical Thinking and Assessment .)

Table 7. Purposes of Critical Thinking Skills Instruments

  • Watson-Glaser Critical Thinking Appraisal-FS (WGCTA-FS): Assesses participants' skills in five subscales: inference, recognition of assumptions, deduction, interpretation, and evaluation of arguments.
  • Cornell Critical Thinking Test (CCTT): Measures test takers' skills in induction, credibility, prediction and experimental planning, fallacies, and deduction.
  • California Critical Thinking Disposition Inventory (CCTDI): Assesses test takers' consistent internal motivations to engage in critical thinking.
  • California Critical Thinking Skills Test (CCTST): Provides objective measures of participants' skills in six subscales (analysis, inference, explanation, interpretation, self-regulation, and evaluation) and an overall score for critical thinking.
  • Health Science Reasoning Test (HSRT): Assesses the critical thinking skills of health science professionals and students; measures analysis, evaluation, inference, and inductive and deductive reasoning.
  • Professional Judgment Rating Form (PJRF): Measures the extent to which novices approach problems with CTS; can be used to assess the effectiveness of training programs for individual or group evaluation.
  • Teaching for Thinking Student Course Evaluation Form: Used by students to rate the perceived critical thinking skills content in secondary and postsecondary classroom experiences.
  • Holistic Critical Thinking Scoring Rubric: Used by professors and students to rate learning outcomes or presentations on critical thinking skills and dispositions; the rubric can capture the type of target behaviors, qualities, or products that professors are interested in evaluating.
  • Peer Evaluation of Group Presentation Form: A common set of criteria used by peers and the instructor to evaluate student-led group presentations.

  Reliability and Validity

Reliability means that individual scores from an instrument should be the same, or nearly the same, from one administration of the instrument to another; a reliable instrument can be assumed to be relatively free of bias and measurement error (68). Alpha coefficients are often used to report an estimate of internal consistency. Coefficients of .70 or higher indicate adequate reliability when the stakes are moderate; coefficients of .80 or higher are appropriate when the stakes are high.
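For a test scored right/wrong, the internal-consistency coefficient usually reported is KR-20, the dichotomous special case of alpha. Below is a minimal sketch with invented 0/1 item scores, applying the .70/.80 benchmarks mentioned above; the data, names, and decision rule are illustrative only.

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """Kuder-Richardson 20 for an (n_examinees, n_items) matrix of 0/1 scores."""
    k = items.shape[1]
    p = items.mean(axis=0)                       # proportion correct per item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def reliability_adequate(coefficient: float, high_stakes: bool) -> bool:
    """Rule of thumb from the text: .70 for moderate stakes, .80 for high stakes."""
    return coefficient >= (0.80 if high_stakes else 0.70)

rng = np.random.default_rng(2)
ability = rng.normal(0, 1, size=(200, 1))
answers = (ability + rng.normal(0, 1, size=(200, 25)) > 0).astype(float)  # 25 simulated items
coef = kr20(answers)
print(round(coef, 2), reliability_adequate(coef, high_stakes=True))
```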

Validity means that individual scores from a particular instrument are meaningful, make sense, and allow researchers to draw conclusions from the sample to the population being studied (69). Researchers often refer to "content" or "face" validity: the extent to which questions on an instrument are representative of the possible questions that a researcher could ask about that particular content or those skills.

Watson-Glaser Critical Thinking Appraisal-FS (WGCTA-FS)

The WGCTA-FS is a 40-item inventory created to replace Forms A and B of the original test, which participants reported was too long (70). This inventory assesses test takers' skills in:

     (a) Inference: the extent to which an individual can judge the degree of truth or falsity of inferences drawn from given data
     (b) Recognition of assumptions: whether an individual recognizes whether assumptions are clearly stated
     (c) Deduction: whether an individual decides if certain conclusions follow from the information provided
     (d) Interpretation: whether an individual considers the evidence provided and determines whether generalizations from the data are warranted
     (e) Evaluation of arguments: whether an individual distinguishes strong and relevant arguments from weak and irrelevant arguments

Researchers investigated the reliability and validity of the WGCTA-FS for subjects in academic fields. Participants included 586 university students. Internal consistencies for the total WGCTA-FS among students majoring in psychology, educational psychology, and special education, including undergraduates and graduates, ranged from .74 to .92. The correlations between course grades and total WGCTA-FS scores for all groups ranged from .24 to .62 and were significant at the p < .05 or p < .01 level. In addition, internal consistency and test-retest reliability for the WGCTA-FS have been measured at .81. The WGCTA-FS was found to be a reliable and valid instrument for measuring critical thinking (71).

Cornell Critical Thinking Test (CCTT)

There are two forms of the CCTT, X and Z. Form X is for students in grades 4-14. Form Z is for advanced and gifted high school students, undergraduate and graduate students, and adults. Reliability estimates for Form Z range from .49 to .87 across the 42 groups that have been tested. Measures of validity were computed under standard conditions, roughly defined as conditions that do not adversely affect test performance. Correlations between Level Z and other measures of critical thinking are about .50 (72). The CCTT is reportedly as predictive of graduate school grades as the Graduate Record Exam (GRE), a measure of aptitude, and the Miller Analogies Test, and tends to correlate with them between .2 and .4 (73).

California Critical Thinking Disposition Inventory (CCTDI)

Facione and Facione have reported significant relationships between the CCTDI and the CCTST. When faculty focus on critical thinking in planning curriculum development, modest cross-sectional and longitudinal gains have been demonstrated in students' CTS (74). The CCTDI consists of seven subscales and an overall score. The recommended cut-off score for each scale is 40, the suggested target score is 50, and the maximum score is 60. Scores below 40 on a specific scale indicate weakness in that CT disposition, and scores above 50 on a scale indicate strength in that dispositional aspect. An overall score below 280 shows serious deficiency in disposition toward CT, while an overall score above 350 (while rare) shows across-the-board strength. The seven subscales are analyticity, self-confidence, inquisitiveness, maturity, open-mindedness, systematicity, and truth seeking (75).
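Read as thresholds, the benchmarks above can be turned into a small scoring helper. This is only a sketch of the interpretation described in this paragraph, with invented scores; it is not an official CCTDI scoring routine.

```python
def interpret_subscale(score: int) -> str:
    """CCTDI subscale benchmarks from the text: cut-off 40, target 50, maximum 60."""
    if score < 40:
        return "weak in this disposition"
    if score > 50:
        return "strong in this disposition"
    return "between the cut-off (40) and the target (50)"

def interpret_overall(total: int) -> str:
    """Overall benchmarks from the text: below 280 deficient, above 350 strong."""
    if total < 280:
        return "serious deficiency in disposition toward CT"
    if total > 350:
        return "across-the-board strength"
    return "no overall flag"

scores = {"truth seeking": 37, "analyticity": 46, "inquisitiveness": 52}   # invented
print({name: interpret_subscale(s) for name, s in scores.items()})
print(interpret_overall(sum(scores.values()) + 4 * 45))                    # invented total for 7 scales
```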

In a study of instructional strategies and their influence on the development of critical thinking among undergraduate nursing students, Tiwari, Lai, and Yuen found that, compared with lecture students, PBL students showed significantly greater improvement in the overall CCTDI (p = .0048), Truth-seeking (p = .0008), Analyticity (p = .0368), and Critical Thinking Self-confidence (p = .0342) subscales from the first to the second time points; in the overall CCTDI (p = .0083), Truth-seeking (p = .0090), and Analyticity (p = .0354) subscales from the second to the third time points; and in the Truth-seeking (p = .0173) and Systematicity (p = .0440) subscale scores from the first to the fourth time points (76).

California Critical Thinking Skills Test (CCTST)

Studies have shown that the California Critical Thinking Skills Test captures gain scores in students' critical thinking over one quarter or one semester. Multiple health science programs have demonstrated significant gains in students' critical thinking using site-specific curricula. Studies conducted to control for re-test bias showed no testing effect from pre- to post-test means using two independent groups of CT students. Since behavioral science measures can be affected by social-desirability bias (the participant's desire to answer in ways that would please the researcher), researchers are urged to have participants take the Marlowe-Crowne Social Desirability Scale concurrently when measuring pre- and post-test changes in critical thinking skills. The CCTST is a 34-item instrument. The test has been correlated with the CCTDI in a sample of 1,557 nursing education students; the results show r = .201, and the relationship between the CCTST and the CCTDI is significant at p < .001. Significant relationships between the CCTST and other measures, including the GRE total, GRE-Analytic, GRE-Verbal, GRE-Quantitative, the WGCTA, and the SAT Math and Verbal, have also been reported. The two forms of the CCTST, A and B, are considered statistically equivalent. Depending on the testing context, KR-20 alphas range from .70 to .75. The newest version is CCTST Form 2000; depending on the testing context, its KR-20 alphas range from .78 to .84 (77).
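The recommendation to administer the Marlowe-Crowne scale alongside the CCTST can be operationalized by partialling social-desirability scores out of the pre-/post-test relationship. The sketch below computes a first-order partial correlation from three hypothetical score vectors; it illustrates the general technique rather than any procedure prescribed by the test publishers.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, control):
    """Correlation between x and y after removing the linear effect of a control variable."""
    x, y, control = map(np.asarray, (x, y, control))
    rx = x - np.polyval(np.polyfit(control, x, 1), control)  # residualize x on the control
    ry = y - np.polyval(np.polyfit(control, y, 1), control)  # residualize y on the control
    return stats.pearsonr(rx, ry)

# Hypothetical data: pre-test CCTST, post-test CCTST, and Marlowe-Crowne scores
rng = np.random.default_rng(0)
pre = rng.normal(18, 4, 100)
post = pre + rng.normal(2, 3, 100)
social_desirability = rng.normal(15, 5, 100)

r, p = partial_corr(pre, post, social_desirability)
print(f"pre/post correlation controlling for social desirability: r = {r:.3f}, p = {p:.3f}")
```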

The Health Science Reasoning Test (HSRT)

Items within this inventory cover the domain of CT cognitive skills identified by a Delphi group of experts whose work resulted in the development of the CCTDI and CCTST. This test measures health science undergraduate and graduate students' CTS. Although test items are set in health sciences and clinical practice contexts, test takers are not required to have discipline-specific health sciences knowledge. For this reason, the test may have limited utility in dental education (78).

Preliminary estimates of internal consistency show that overall KR-20 coefficients range from .77 to .83 (79). The instrument has moderate reliability on the analysis and inference subscales, although the factor loadings appear adequate. The low KR-20 coefficients may be a result of small sample size, variance in item response, or both (see the following table; an illustrative KR-20 computation follows it).

Table 8. Estimates of Internal Consistency and Factor Loading by Subscale for HSRT

     Subscale      KR-20     Factor loading range
     Inductive     .76       .332-.769
     Deductive     .71       .366-.579
     Analysis      .54       .369-.599
     Inference     .52       .300-.664
     Evaluation    .77       .359-.758
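For dichotomously scored instruments such as the HSRT, KR-20 coefficients like those in the table can be computed from raw responses with the standard formula. The sketch below uses a simulated response matrix purely for illustration; it does not reproduce the HSRT data.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson 20 for a respondents-by-items matrix of 0/1 scores."""
    k = responses.shape[1]                         # number of items
    p = responses.mean(axis=0)                     # proportion answering each item correctly
    q = 1 - p
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Simulated example: 200 examinees answering a 10-item dichotomous subscale
rng = np.random.default_rng(1)
ability = rng.normal(size=(200, 1))
items = (ability + rng.normal(size=(200, 10)) > 0).astype(int)
print(f"KR-20 = {kr20(items):.2f}")
```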

Professional Judgment Rating Form (PJRF)

The scale consists of two sets of descriptors. The first set relates primarily to the attitudinal (habits of mind) dimension of CT. The second set relates primarily to CTS.

A single rater should know the student well enough to respond to at least 17 of the 20 descriptors with confidence. If not, the validity of the ratings may be questionable. If a single rater is used and ratings over time show some consistency, comparisons between ratings may be used to assess changes. If more than one rater is used, then inter-rater reliability must be established among the raters to yield meaningful results. While the PJRF can be used to assess the effectiveness of training programs for individuals or groups, the evaluation of participants' actual skills is best accomplished with an objective tool such as the California Critical Thinking Skills Test.
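When more than one rater completes the PJRF, inter-rater agreement should be checked before ratings are pooled. A minimal sketch using weighted Cohen's kappa for two raters scoring the same students on one descriptor is shown below; the ratings are hypothetical, and other indices (e.g., an intraclass correlation) may be preferable when there are more than two raters.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical 1-5 ratings from two raters for the same 12 students on one PJRF descriptor
rater_a = [4, 3, 5, 2, 4, 4, 3, 5, 2, 3, 4, 5]
rater_b = [4, 3, 4, 2, 5, 4, 3, 5, 3, 3, 4, 4]

# Quadratic weighting gives partial credit for near-agreement on the ordinal scale
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"Weighted Cohen's kappa = {kappa:.2f}")
```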

Teaching for Thinking Student Course Evaluation Form

Course evaluations typically ask for responses of "agree" or "disagree" to items focusing on teacher behavior. Typically, the questions do not solicit information about student learning. Because contemporary thinking about curriculum is interested in student learning, this form was developed to address the differences in pedagogy, subject matter, learning outcomes, student demographics, and course level characteristic of education today. The form also grew out of a recognition of the limitations of the "one size fits all" approach to teaching evaluations. It offers information about how a particular course enhances student knowledge, sensitivities, and dispositions, and it gives students an opportunity to provide feedback that can be used to improve instruction.

Holistic Critical Thinking Scoring Rubric

This assessment tool uses a four-point classification schema that lists particular opposing reasoning skills for select criteria. One advantage of a rubric is that it offers clearly delineated components and scales for evaluating outcomes. This rubric explains how students' CTS will be evaluated, and it provides a consistent framework for the professor as evaluator. Users can add or delete any of the statements to reflect their institution's effort to measure CT. Like most rubrics, this form is likely to have high face validity, since the items tend to be relevant or descriptive of the target concept. This rubric can be used to rate student work or to assess learning outcomes. Experienced evaluators should engage in a process leading to consensus regarding what kinds of things should be classified and in what ways (80). If the rubric is used improperly or by inexperienced evaluators, unreliable results may occur.

Peer Evaluation of Group Presentation Form

This form offers a common set of criteria to be used by peers and the instructor to evaluate student-led group presentations regarding concepts, analysis of arguments or positions, and conclusions (81). Users have an opportunity to rate the degree to which each component was demonstrated. Open-ended questions give users an opportunity to cite examples of how concepts, the analysis of arguments or positions, and conclusions were demonstrated.

Table 9. Proposed Universal Criteria for Evaluating Students' Critical Thinking Skills

     Accuracy
     Adequacy
     Clarity
     Completeness
     Consistency
     Depth
     Fairness
     Logic
     Precision
     Realism
     Relevance
     Significance
     Specificity

Aside from the use of the above-mentioned assessment tools, Dexter et al. recommended that all schools develop universal criteria for evaluating students' development of critical thinking skills (82).

Their rationale for the proposed criteria is that if faculty give feedback using these criteria, graduates will internalize these skills and use them to monitor their own thinking and practice (see Table 9).


  • Open access
  • Published: 31 August 2024

Development and validation of a higher-order thinking skills (HOTS) scale for major students in the interior design discipline for blended learning

  • Dandan Li 1 ,
  • Xiaolei Fan 2 &
  • Lingchao Meng 3  

Scientific Reports volume 14, Article number: 20287 (2024)


Assessing and cultivating students’ HOTS are crucial for interior design education in a blended learning environment. However, current research has focused primarily on the impact of blended learning instructional strategies, learning tasks, and activities on the development of HOTS, whereas few studies have specifically addressed the assessment of these skills through dedicated scales in the context of blended learning. This study aimed to develop a comprehensive scale for assessing HOTS in interior design major students within the context of blended learning. Employing a mixed methods design, the research involved in-depth interviews with 10 education stakeholders to gather qualitative data, which informed the development of a 66-item soft skills assessment scale. The scale was administered to a purposive sample of 359 undergraduate students enrolled in an interior design program at a university in China. Exploratory and confirmatory factor analyses were also conducted to evaluate the underlying factor structure of the scale. The findings revealed a robust four-factor model encompassing critical thinking skills, problem-solving skills, teamwork skills, and practical innovation skills. The scale demonstrated high internal consistency (Cronbach's alpha = 0.948–0.966) and satisfactory convergent and discriminant validity. This scale provides a valuable instrument for assessing and cultivating HOTS among interior design major students in blended learning environments. Future research can utilize a scale to examine the factors influencing the development of these skills and inform instructional practices in the field.


Introduction

In the contemporary landscape of the twenty-first century, students face numerous challenges that necessitate the development of competitive skills, with a particular emphasis on the cultivation of HOTS 1,2,3; this has become a crucial objective in educational reform. Notably, the National Education Association (NEA, 2012) has clearly identified critical thinking and problem-solving, communication, collaboration, creativity, and innovation as key competencies that students must possess in the current era; these are considered important components of twenty-first century skills 4,5,6,7. As learners in the fields of creativity and design, students in the interior design profession also need to possess HOTS to address complex design problems and the evolving demands of the industry 8,9.

Currently, blended learning has become an important instructional model in interior design education 10 , 11 . It serves as a teaching approach that combines traditional face-to-face instruction with online learning, providing students with a more flexible and personalized learning experience 12 , 13 . Indeed, several scholars have recognized the benefits of blended learning in providing students with diverse learning resources, activities, and opportunities for interaction, thereby fostering HOTS 14 , 15 , 16 , 17 . For example, blended learning, as evidenced by studies conducted by Anthony et al. 10 and Castro 11 , has demonstrated its efficacy in enhancing students' HOTS. The integration of online resources, virtual practices, and online discussions in blended learning fosters active student engagement and improves critical thinking, problem solving, and creative thinking skills. Therefore, teachers need to determine appropriate assessment methods and construct corresponding assessment tasks to assess students' expected learning outcomes. This decision requires teachers to have a clear understanding of students' learning progress and the development of various skills, whereas students have knowledge of only their scores and lack awareness of their individual skill development 18 , 19 .

Nevertheless, the precise assessment of students' HOTS in the blended learning milieu poses a formidable challenge. The dearth of empirically validated assessment tools impedes researchers from effectively discerning students' levels of cognitive aptitude and developmental growth within the blended learning realm 20 , 21 , 22 . In addition, from the perspective of actual research topics, current studies on blended learning focus mainly on the "concept, characteristics, mechanisms, models, and supporting technologies of blended learning 23 . " Research on "measuring students' HOTS in blended learning" is relatively limited, with most of the focus being on elementary, middle, and high school students 24 , 25 . Few studies have specifically examined HOTS measurement in the context of university students 26 , 27 , particularly in practical disciplines such as interior design. For example, Bervell et al. 28 suggested that the lack of high-quality assessment scales inevitably impacts the quality of research. Additionally, Schmitt 29 proposed the “Three Cs” principle for measurement, which includes clarity, coherence, and consistency. He highlighted that high-quality assessment scales should possess clear and specific measurement objectives, logically coherent items, and consistent measurement results to ensure the reliability and validity of the data. This reflects the importance of ensuring the alignment of the measurement goals of assessment scales with the research questions and the content of the discipline in the design of assessments.

The development of an assessment scale within the blended learning environment is expected to address the existing gap in measuring and assessing HOTS scores in interior design education. This scale not only facilitates the assessment of students' HOTS but also serves as a guide for curriculum design, instructional interventions, and student support initiatives. Ultimately, the integration of this assessment scale within the blended learning environment has the potential to optimize the development of HOTS among interior design students, empowering them to become adept critical thinkers, creative problem solvers, and competent professionals in the field.

Therefore, this study follows a scientific scale development procedure to develop an assessment scale specifically designed to measure the HOTS of interior design students in blended learning environments. This endeavor aims to provide educators with a reliable instrument for assessing students' progress in cultivating and applying HOTS, thus enabling the implementation of more effective teaching strategies and enhancing the overall quality of interior design education. The research questions are as follows:

What key dimensions should be considered when developing a HOTS assessment scale to accurately capture students' HOTS in an interior design major blended learning environment?

How can an advanced thinking skills assessment scale for blended learning in interior design be developed?

How can the reliability and validity of the HOTS assessment scale be verified and ensured, and is it reliable and effective in the interior design of major blended learning environments?

Key dimensions of HOTS assessment scale in an interior design major blended learning environment

The research results indicate that in the blended learning environment of interior design, this study identified 16 initial codes representing key dimensions for enhancing students' HOTS. These codes were further categorized into 8 main categories and 4 overarching themes: critical thinking, problem-solving, teamwork skills and practical innovation skills. They provide valuable insights for data comprehension and analysis, serving as a comprehensive framework for the HOTS scale. Analyzing category frequency and assessing its significance and universality in a qualitative dataset hold significant analytical value 30 , 31 . High-frequency terms indicate the central position of specific categories in participants' narratives, texts, and other data forms 32 . Through interviews with interior design experts and teachers, all core categories were mentioned more than 20 times, providing compelling evidence of their universality and importance within the field of interior design's HOTS dimensions. As shown in Table 1 .

Themes 1: critical thinking skills

Critical thinking skills constitute a key core category in blended learning environments for interior design and are crucial for cultivating students' HOTS. This discovery emphasizes the importance of critical thinking in interior design learning. This mainly includes the categories of logical reasoning and judgment, doubt and reflection, with a frequency of more than 8, highlighting the importance of critical thinking skills. Therefore, a detailed discussion of each feature is warranted. As shown in Table 2 .

Category 1: logical reasoning and judgment

The research results indicate that in a blended learning environment for interior design, logical reasoning and judgment play a key role in cultivating critical thinking skills. Logical reasoning refers to inferring reasonable conclusions from information through analysis and evaluation 33 . Judgment is based on logic and evidence for decision-making and evaluation. The importance of these concepts lies in their impact on the development and enhancement of students' HOTS. According to the research results, interior design experts and teachers unanimously believe that logical reasoning and judgment are very important. For example, as noted by Interviewee 1, “For students, logical reasoning skills are still very important. Especially in indoor space planning, students use logical reasoning to determine whether the layout of different functional areas is reasonable”. Similarly, Interviewee 2 also stated that “logical reasoning can help students conduct rational analysis of various design element combinations during the conceptual design stage, such as color matching, material selection, and lighting application”.

As emphasized by interviewees 1 and 2, logical reasoning and judgment are among the core competencies of interior designers in practical applications. These abilities enable designers to analyze and evaluate design problems and derive reasonable solutions from them. In the interior design industry, being able to conduct accurate logical reasoning and judgment is one of the key factors for success. Therefore, through targeted training and practice, students can enhance their logical thinking and judgment, thereby better addressing design challenges and providing innovative solutions.

Category 2: skepticism and reflection

Skepticism and reflection play crucial roles in cultivating students' critical thinking skills in a blended learning environment for interior design. Doubt can prompt students to question and explore information and viewpoints, whereas reflection helps students think deeply and evaluate their own thinking process 34 . These abilities are crucial for cultivating students' higher-order thinking skills. According to the research findings, most interior design experts and teachers agree that skepticism and reflection are crucial. For example, as noted by interviewees 3, “Sometimes, when facing learning tasks, students will think about how to better meet the needs of users”. Meanwhile, Interviewee 4 also agreed with this viewpoint. As emphasized by interviewees 3 and 4, skepticism and reflection are among the core competencies of interior designers in practical applications. These abilities enable designers to question existing perspectives and practices and propose innovative design solutions through in-depth thinking and evaluation. Therefore, in the interior design industry, designers with the ability to doubt and reflect are better able to respond to complex design needs and provide clients with unique and valuable design solutions.

Themes 2: problem-solving skills

The research findings indicate that problem-solving skills constitute a key core category in blended learning environments for interior design and are crucial for cultivating students' HOTS. This discovery emphasizes the importance of problem-solving skills in interior design learning. Specifically, categories such as identifying and defining problems, as well as developing and implementing plans, have been studied more than 8 times, highlighting the importance of problem-solving skills. Therefore, it is necessary to discuss each function in detail to better understand and cultivate students' problem-solving skills. As shown in Table 3 .

Category 1: identifying and defining issues

The research findings indicate that in a blended learning environment for interior design, identifying and defining problems play a crucial role in fostering students' problem-solving skills. Identifying and defining problems require students to possess the ability to analyze and evaluate problems, enabling them to accurately determine the essence of the problems and develop effective strategies and approaches to solve them 35 . Interior design experts and teachers widely recognize the importance of identifying and defining problems as core competencies in interior design practice. For example, Interviewee 5 emphasized the importance of identifying and defining problems, stating, "In interior design, identifying and defining problems is the first step in addressing design challenges. Students need to be able to clearly identify the scope, constraints, and objectives of the problems to engage in targeted thinking and decision-making in the subsequent design process." Interviewee 6 also supported this viewpoint. As stressed by Interviewees 5 and 6, identifying and defining problems not only require students to possess critical thinking abilities but also necessitate broad professional knowledge and understanding. Students need to comprehend principles of interior design, spatial planning, human behavior, and other relevant aspects to accurately identify and define problems associated with design tasks.

Category 2: developing and implementing a plan

The research results indicate that in a blended learning environment for interior design, developing and implementing plans plays a crucial role in cultivating students' problem-solving abilities. The development and implementation of a plan refers to students identifying and defining problems, devising specific solutions, and translating them into concrete implementation plans. Specifically, after determining the design strategy, students refine it into specific implementation steps and timelines, including drawing design drawings, organizing PPT reports, and presenting design proposals. For example, Interviewee 6 noted, “Students usually break down design strategies into specific tasks and steps by refining them.” Other interviewees also unanimously support this viewpoint. As emphasized by respondent 6, developing and implementing plans can help students maintain organizational, systematic, and goal-oriented problem-solving skills, thereby enhancing their problem-solving skills.

Themes 3: teamwork skills

The research results indicate that teamwork skills constitute a key core category in blended learning environments for interior design and are crucial for cultivating students' HOTS. This discovery emphasizes the importance of teamwork skills in interior design learning. This mainly includes communication and coordination and division of labor and collaboration, which are mentioned frequently in the interview documents. Therefore, it is necessary to discuss each function in detail to better understand and cultivate students' teamwork skills. As shown in Table 4 .

Category 1: communication and coordination

The research results indicate that communication and collaboration play crucial roles in cultivating students' teamwork abilities in a blended learning environment for interior design. Communication and collaboration refer to the ability of students to effectively share information, understand each other's perspectives, and work together to solve problems 36 . Specifically, team members need to understand each other's resource advantages integrate and share these resources to improve work efficiency and project quality. For example, Interviewee 7 noted, “In interior design, one member may be skilled in spatial planning, while another member may be skilled in color matching. Through communication and collaboration, team members can collectively utilize this expertise to improve work efficiency and project quality.” Other interviewees also unanimously believe that this viewpoint can promote students' teamwork skills, thereby promoting the development of their HOTS. As emphasized by the viewpoints of these interviewees, communication and collaboration enable team members to collectively solve problems and overcome challenges. Through effective communication, team members can exchange opinions and suggestions with each other, provide different solutions, and make joint decisions. Collaboration and cooperation among team members contribute to brainstorming and finding the best solution.

Category 2: division of labor and collaboration

The research results indicate that in the blended learning environment of interior design, the division of labor and collaboration play crucial roles in cultivating students' teamwork ability. The division of labor and collaboration refer to the ability of team members to assign different tasks and roles in a project based on their respective expertise and responsibilities and work together to complete the project 37 . For example, Interviewee 8 noted, “In an internal design project, some students are responsible for space planning, some students are responsible for color matching, and some students are responsible for rendering production.” Other interviewees also support this viewpoint. As emphasized by interviewee 8, the division of labor and collaboration help team members fully utilize their respective expertise and abilities, promote resource integration and complementarity, cultivate a spirit of teamwork, and enable team members to collaborate, support, and trust each other to achieve project goals together.

Themes 4: practical innovation skills

The research results indicate that practical innovation skills constitute a key core category in blended learning environments for interior design and are crucial for cultivating students' HOTS. This discovery emphasizes the importance of practical innovation skills in interior design learning. This mainly includes creative conception and design expression, as well as innovative application of materials and technology, which are often mentioned in interview documents. Therefore, it is necessary to discuss each function in detail to better understand and cultivate students' practical innovation skills. As shown in Table 5 .

Category 1: creative conception and design expression

The research results indicate that in the blended learning environment of interior design, creative ideation and design expression play crucial roles in cultivating students' practical and innovative skills. Creative ideation and design expression refer to the ability of students to break free from traditional thinking frameworks and try different design ideas and methods through creative ideation, which helps stimulate their creativity and cultivate their ability to think independently and solve problems. For example, interviewee 10 noted that "blended learning environments combine online and offline teaching modes, allowing students to acquire knowledge and skills more flexibly. Through learning and practice, students can master various expression tools and techniques, such as hand-drawn sketches, computer-aided design software, model making, etc., thereby more accurately conveying their design concepts." Other interviewees also expressed the importance of this viewpoint, emphasizing the importance of creative ideas and design expression in blended learning environments that cannot be ignored. As emphasized by interviewee 10, creative ideation and design expression in the blended learning environment of interior design can not only enhance students' creative thinking skills and problem-solving abilities but also strengthen their application skills in practical projects through diverse expression tools and techniques. The cultivation of these skills is crucial for students' success in their future careers.

Category 2: innovative application of materials and technology

Research findings indicate that the innovative application of materials and technology plays a crucial role in developing students' practical and creative skills within a blended learning environment for interior design. The innovative application of materials and technology refers to students' exploration and utilization of new materials and advanced technologies, enabling them to overcome the limitations of traditional design thinking and experiments with diverse design methods and approaches. This process not only stimulates their creativity but also significantly enhances their problem-solving skills. Specifically, the innovative application of materials and technology involves students gaining a deep understanding of the properties of new materials and their application methods in design, as well as becoming proficient in various advanced technological tools and equipment, such as 3D printing, virtual reality (VR), and augmented reality (AR). These skills enable students to more accurately realize their design concepts and effectively apply them in real-world projects.

For example, Interviewee 1 stated, "The blended learning environment combines online and offline teaching modes, allowing students to flexibly acquire the latest knowledge on materials and technology and apply these innovations in real projects." Other interviewees also emphasized the importance of this view. Therefore, the importance of the innovative application of materials and technology in a blended learning environment cannot be underestimated. As emphasized by interviewee 1, the innovative application of materials and technologies is crucial in the blended learning environment of interior design. This process not only enables students to flexibly acquire the latest materials and technical knowledge but also enables them to apply these innovations to practice in practical projects, thereby improving their practical abilities and professional ethics.

In summary, through research question 1 research, the dimensions of the HOTS assessment scale in blended learning for interior design include four main aspects: critical thinking skills, problem-solving skills, teamwork skills, and practical innovation skills. Based on the assessment scales developed by previous scholars in various dimensions, the researcher developed a HOTS assessment scale suitable for blended learning environments in interior design and collected feedback from interior design experts through interviews.

Development of the HOTS assessment scale

The above research results indicate that the dimensions of the HOTS scale mainly include critical thinking, problem-solving, teamwork skills and practical innovation skills. The dimensions of a scale represent the abstract characteristics and structure of the concept being measured. Since these dimensions are often abstract and difficult to measure directly, they need to be converted into several concrete indicators that can be directly observed or self-reported 38. These concrete indicators, known as dimension items, operationalize the abstract dimensions, allowing for the measurement and evaluation of various aspects of the concept. This process transforms the abstract dimensions into specific, measurable components. Based on the results for research question 1, the following content develops a HOTS assessment scale for blended learning in interior design.

Dimension 1: critical thinking skills

The research results indicate that critical thinking skills constitute a key core category in blended learning environments for interior design and are crucial for cultivating students' HOTS. Critical thinking skills refer to the ability to analyze information objectively and make a reasoned judgment 39 . Scholars tend to emphasize this concept as a method of general skepticism, rational thinking, and self-reflection 7 , 40 . For example, Goodsett 26 suggested that it should be based on rational skepticism and careful thought about external matters as well as open self-reflection about internal thoughts and actions. Moreover, the California Critical Thinking Disposition Inventory (CCTDI) is widely used to measure critical thinking skills, including dimensions such as seeking truth, confidence, questioning and courage to seek truth, curiosity and openness, as well as analytical and systematic methods 41 . In addition, maturity means continuous adjustment and improvement of a person's cognitive system and learning activities through continuous awareness, reflection, and self-awareness 42 . Moreover, Nguyen 43 confirmed that critical thinking and cognitive maturity can be achieved through these activities, emphasizing that critical thinking includes cognitive skills such as analysis, synthesis, and evaluation, as well as emotional tendencies such as curiosity and openness.

In addition, in a blended learning environment for interior design, critical thinking skills help students better understand, evaluate, and apply design knowledge and skills, cultivating independent thinking and innovation abilities 44 . If students lack these skills, they may accept superficial information and solutions without sufficient thinking and evaluation, resulting in the overlooking of important details or the selection of inappropriate solutions in the design process. Therefore, for the measurement of critical thinking skills, the focus should be on cognitive skills such as analysis, synthesis, and evaluation, as well as curiosity and open mindedness. The specific items for critical thinking skills are shown in Table 6 .

Dimension 2: problem-solving skills

Problem-solving skills constitute a key core category in blended learning environments for interior design and are crucial for cultivating students' HOTS. Problem-solving skills involve the ability to analyze and solve problems by understanding them, identifying their root causes, and developing appropriate solutions 45. According to the 5E-based STEM education approach, problem-solving skills encompass the following abilities: problem identification and definition, formulation of problem-solving strategies, problem representation, resource allocation, and monitoring and evaluation of solution effectiveness 7,46. Moreover, D'Zurilla and Nezu 47 and Tan 48 indicated that problem solving is demonstrated in the attitudes, beliefs, and knowledge and skills applied during the process, as well as in the quality of the proposed solutions and their observable outcomes. In addition, D'Zurilla and Nezu devised the Social Problem-Solving Inventory (SPSI), which comprises seven subscales: cognitive response, emotional response, behavioral response, problem identification, generation of alternative solutions, decision-making, and solution implementation. Based on these research results, the problem-solving skills dimension questions designed in this study are shown in Table 7.

Dimension 3: teamwork skills

The research results indicate that teamwork skills constitute a key core category in blended learning environments for interior design and are crucial for cultivating students' HOTS. Teamwork skills refer to the ability to effectively collaborate, coordinate, and communicate with others in a team environment 49. For example, the Teamwork Skills Assessment Tool (TWKSAT) developed by Stevens and Campion 50 identifies five core dimensions of teamwork: conflict management; collaborative problem-solving; communication; goal setting and performance management; and decision-making and task coordination. The design of this tool highlights the essential skills in teamwork and provides a structured approach for evaluating these skills. In addition, they indicated that successful teams need to have a range of skills for problem solving, including situational control, conflict management, decision-making and coordination, monitoring and feedback, and an open mindset. These skills help team members effectively address complex challenges and demonstrate the team's collaboration and flexibility. Therefore, the assessment of learners' teamwork skills needs to cover the above aspects. As shown in Table 8.

Dimension 4: practice innovative skills

The research results indicate that practical innovation skills constitute a key core category in blended learning environments for interior design, which is crucial for cultivating students' HOTS. The practice of innovative skills encompasses the utilization of creative cognitive processes and problem-solving strategies to facilitate the generation of original ideas, solutions, and approaches 51 . This practice places significant emphasis on two critical aspects: creative conception and design expression, as well as the innovative application of materials and technology. Tang et al. 52 indicated that creative conception and design expression involve the generation and articulation of imaginative and inventive ideas within a given context. With the introduction of concepts such as 21st-century learning skills, the "5C" competency framework, and core student competencies, blended learning has emerged as the goal and direction of educational reform. It aims to promote the development of students' HOTS, equipping them with the essential qualities and key abilities needed for lifelong development and societal advancement. Blended learning not only emphasizes the mastery of core learning content but also requires students to develop critical thinking, complex problem-solving, creative thinking, and practical innovation skills. To adapt to the changes and developments in the blended learning environment, this study designed 13 preliminary test items based on 21st-century learning skills, the "5C" competency framework, core student competencies, and the TTCT assessment scale developed by Torrance 53 . These items aim to assess students' practice of innovative skills within a blended learning environment, as shown in Table 9 .

The researchers' results indicate that the consensus among the interviewed expert participants is that the structural integrity of the scale is satisfactory and does not require modification. However, certain measurement items were identified as problematic and require revision. The primary recommendations are as follows. Within the domain of problem-solving skills, the item "I usually conduct classroom and online learning with questions and clear goals" was deemed biased because of its emphasis on the "online" environment. Consequently, the evaluation panel advised splitting this item into two separate components: (1) "I am adept at frequently adjusting and reversing a negative team atmosphere" and (2) "I consistently engage in praising and encouraging others, fostering harmonious relationships." The assessment process therefore required revisions and adjustments to specific items, yielding a pilot test scale of 66 observable items from the original 65. In addition, there were other suggestions about linguistic formulation and phraseology, which are not expounded upon herein.

Verify the effectiveness of the HOTS assessment scale

The research results indicate that there are significant differences in the average scores of the four dimensions of the HOTS, including critical thinking skills (A1–A24 items), problem-solving skills (B1–B13 items), teamwork skills (C1–C16 items), and practical innovation skills (D1–D13 items). Moreover, this also suggests that each item has discriminative power. Specifically, this will be explained through the following aspects.

Project analysis based on the CR value

The critical ratio (CR) method, which uses the CR value (decision value) to remove measurement items with poor discrimination, is the most commonly used method in item analysis. The specific process involves using the CR value to identify and remove such items. First, the modified pilot test scale data are aggregated and sorted by total score. Individuals representing the top and bottom 27% of the distribution were subsequently selected, constituting 66 respondents in each group. The high-score group comprises individuals with a total score of 127 or above (including 127), whereas the low-score group comprises individuals with a total score of 99 or below (including 99). Finally, an independent-samples t test was conducted to determine whether the mean scores for each item differed significantly between the high-score and low-score groups. The statistical results are presented in Table 10.

The above table shows that independent sample t tests were conducted for all the items; their t values were greater than 3, and their p values were less than 0.001, indicating that the difference between the highest and lowest 27% of the samples was significant and that each item had discriminative power.
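The critical-ratio procedure described above can be reproduced with standard statistical tools: sort respondents by total score, take the top and bottom 27%, and run an independent-samples t test item by item. The sketch below is a minimal illustration assuming a hypothetical data file with item columns named A1-D66; it is not the authors' SPSS procedure.

```python
import pandas as pd
from scipy import stats

def item_discrimination(df, item_cols, prop=0.27):
    """Independent-samples t test per item between the top and bottom `prop` of total scorers."""
    totals = df[item_cols].sum(axis=1)
    n = max(int(round(len(df) * prop)), 2)
    high = df.loc[totals.nlargest(n).index]   # high-score group
    low = df.loc[totals.nsmallest(n).index]   # low-score group
    rows = []
    for col in item_cols:
        t, p = stats.ttest_ind(high[col], low[col], equal_var=False)
        rows.append({"item": col, "CR (t)": round(t, 2), "p": p, "retain": t > 3 and p < 0.001})
    return pd.DataFrame(rows)

# Hypothetical usage with pilot data on a 5-point scale
# pilot = pd.read_csv("hots_pilot_items.csv")
# print(item_discrimination(pilot, [c for c in pilot.columns if c[0] in "ABCD"]))
```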

In summary, based on previous research and relevant theories, the HOTS scale for interior design was revised. This revision process involved interviews with interior design experts, teachers, and students, followed by item examination and homogeneity testing via the critical ratio (CR) method. The results revealed significant correlations ( p  < 0.01) between all the items and the total score, with correlation coefficients (R) above 0.4. Therefore, the scale exhibits good accuracy and internal consistency in capturing measured HOTS. These findings provide a reliable foundation for further research and practical applications.

Pilot study exploratory factor analysis

This study used SPSS (version 28) to conduct the KMO and Bartlett tests on the scale. The KMO and Bartlett sphericity statistics were first calculated for the total HOTS scale as well as for the four subscales to ensure that the sample data were suitable for factor analysis 7. The overall KMO value is 0.946, indicating that the data are highly suitable for factor analysis. Additionally, Bartlett's test of sphericity was significant (p < 0.05), further supporting the appropriateness of conducting factor analysis. The KMO values for the subscales are all above 0.7, indicating that the data for these subscales are also suitable for factor analysis. According to Javadi et al. 54, these results suggest the presence of shared factors among the items within the subscales, as shown in Table 11.
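The same factorability checks can be run outside SPSS. As an illustration, the Python package factor_analyzer provides both the KMO measure and Bartlett's test of sphericity; the data file referenced below is hypothetical.

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical file: one row per student, one column per scale item (A1 ... D66)
responses = pd.read_csv("hots_pilot_items.csv")

chi_square, p_value = calculate_bartlett_sphericity(responses)
kmo_per_item, kmo_total = calculate_kmo(responses)

print(f"Bartlett's test: chi2 = {chi_square:.1f}, p = {p_value:.4f}")
print(f"Overall KMO = {kmo_total:.3f}")  # values near or above .9 indicate excellent factorability
```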

For each subscale, exploratory factor analysis was conducted to extract factors with eigenvalues greater than 1 while eliminating items with communalities less than 0.30, loadings less than 0.50, and items that load on more than one common factor 55,56. Additionally, items that were inconsistent with the assumed structure of the measure were identified and eliminated to ensure the best structural validity. These principles were applied to the factor analysis of each subscale, ensuring that the extracted factor structure and observed items are consistent with the hypothesized measurement structure and analysis results, as shown in the table 55,58. In the exploratory factor analysis (EFA), the latent variables were readily interpretable, with cumulative explained variances of the common factors exceeding 60%. This finding confirms the alignment between the scale structure, comprising the remaining items, and the initial theoretical framework proposed in this study. Additionally, the items were systematically reorganized to construct the final questionnaire. Consequently, items A1 to A24 were associated with the critical thinking skills dimension, items B25 to B37 were linked to problem-solving skills, items C38 to C53 were indicative of teamwork skills, and items D54 to D66 were reflective of practical innovation skills. As shown in Table 12 below.

In addition, the criterion for extracting principal components in factor analysis is typically based on eigenvalues, with values greater than 1 indicating greater explanatory power than individual variables. The variance contribution ratio reflects the proportion of variance explained by each principal component relative to the total variance and signifies the ability of the principal component to capture comprehensive information. The cumulative variance contribution ratio measures the accumulated proportion of variance explained by the selected principal components, aiding in determining the optimal number of components to retain while minimizing information loss. The above table shows that four principal components can be extracted from the data, and their cumulative variance contribution rate reaches 59.748%.

However, from the scree plot (as shown in Fig.  1 ), the slope flattens starting from the fifth factor, indicating that no distinct factors can be extracted beyond that point. Therefore, retaining four factors seems more appropriate. The factor loading matrix is the core of factor analysis, and the values in the matrix represent the factor loading of each item on the common factors. Larger values indicate a stronger correlation between the item variable and the common factor. For ease of analysis, this study used the maximum variance method to rotate the initial factor loading matrix, redistributing the relationships between the factors and original variables and making the correlation coefficients range from 0 to 1, which facilitates interpretation. In this study, factor loadings with absolute values less than 0.4 were filtered out. According to the analysis results, the items of the HOTS assessment scale can be divided into four dimensions, which is consistent with theoretical expectations.

Figure 1. Scree plot of factors.
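The extraction and rotation steps described above follow a standard exploratory factor analysis workflow. A rough Python equivalent is sketched below, assuming the same hypothetical item-response file: extract four factors, apply varimax rotation, and blank out loadings below |0.4| for interpretation. It is an illustration, not the authors' analysis.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

responses = pd.read_csv("hots_pilot_items.csv")  # hypothetical item-response matrix

fa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
fa.fit(responses)

eigenvalues, _ = fa.get_eigenvalues()
print("Eigenvalues of the first six components:", [round(e, 2) for e in eigenvalues[:6]])

# Rotated loadings, with values below |0.4| blanked to ease interpretation
loadings = pd.DataFrame(fa.loadings_, index=responses.columns,
                        columns=["F1", "F2", "F3", "F4"])
print(loadings.where(loadings.abs() >= 0.4).round(3))

variance, proportion, cumulative = fa.get_factor_variance()
print("Cumulative variance explained:", [round(c, 3) for c in cumulative])
```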

Through the pretest of the scale and selection of measurement items, 66 measurement items were ultimately determined. On this basis, a formal scale for assessing HOTS in a blended learning environment was developed, and the reliability and validity of the scale were tested to ultimately confirm its usability.

Confirmatory factor analysis of final testing

For the final test, AMOS (version 26.0) was used to conduct a confirmatory factor analysis (CFA) on the retested sample data to validate the stability of the HOTS structural model obtained through exploratory factor analysis. This analysis aimed to assess the fit between the measurement results and the actual data, confirming the robustness of the derived HOTS structure and its alignment with the empirical data. The relevant model was constructed based on the factor structure of each component obtained through EFA and the observed variables, as shown in the diagram. The model fit indices are presented in Fig. 2 (among them, A represents critical thinking skills, B represents problem-solving skills, C represents teamwork skills, and D represents practical innovation skills). The models strongly support the "4-dimensional" structure of the HOTS, which includes four first-order factors: critical thinking skills, problem-solving skills, teamwork skills, and practical innovation skills. Critical thinking skills play a pivotal role in the blended learning environment of interior design, connecting problem-solving skills, teamwork skills, and innovative practices. These four dimensions form the assessment structure of HOTS, with critical thinking skills serving as the core element, inspiring individuals to assess problems and propose innovative solutions. By providing appropriate learning resources, diverse learning activities, and learning tasks, as well as designing items for assessment scales, it is possible to delve into the measurement and development of HOTS in the field of interior design, providing guidance for educational and organizational practices. This comprehensive approach to learning and assessment helps cultivate students' HOTS and lays a solid foundation for their comprehensive abilities in the field of interior design. Thus, the CFA structural models provide strong support for the initial hypothesis of the proposed HOTS assessment structure in this study. As shown in Fig. 2.

Figure 2. Confirmatory factor analysis based on 4 dimensions. A represents the dimension of critical thinking skills; B, problem-solving skills; C, teamwork skills; D, practical innovation skills.

Additionally, the fit values for χ2/df, RMSEA, and SRMR are all below their thresholds, whereas the fit values of the other indices are all above their thresholds, indicating that the model fits well (as shown in Table 13).
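The confirmatory model was estimated in AMOS. Purely as an illustration, a comparable four-factor CFA can be specified in an open-source tool such as the Python package semopy; the item groupings follow the scale (A1-A24, B25-B37, C38-C53, D54-D66), the data file is hypothetical, the item lists are abbreviated for readability, and the fit indices reported may differ slightly from AMOS output.

```python
import pandas as pd
import semopy

data = pd.read_csv("hots_final_sample.csv")  # hypothetical retest sample, one column per item

# Four correlated first-order factors; a full model would list every item of each dimension
model_desc = """
CriticalThinking    =~ A1 + A2 + A3 + A24
ProblemSolving      =~ B25 + B26 + B27 + B37
Teamwork            =~ C38 + C39 + C40 + C53
PracticalInnovation =~ D54 + D55 + D56 + D66
"""

model = semopy.Model(model_desc)
model.fit(data)

print(semopy.calc_stats(model).T)    # chi-square, CFI, TLI, RMSEA, and related fit statistics
print(model.inspect(std_est=True))   # standardized loadings and factor covariances
```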

Reliability and validity analysis

The reliability and validity of the scale need to be assessed after the model fit has been determined through validation factor analysis 57 . Based on the findings of Marsh et al. 57 , the following conclusions can be drawn. In terms of hierarchical and correlational model fit, the standardized factor loadings of each item range from 0.700 to 0.802, all of which are greater than or equal to 0.7. This indicates a strong correspondence between the observed items and each latent variable. Furthermore, the Cronbach's α coefficients, which are used to assess the internal consistency or reliability of the scale, ranged from 0.948 to 0.966 for each dimension, indicating a high level of data reliability and internal consistency. The composite reliabilities ranged from 0.948 to 0.967, exceeding the threshold of 0.6 and demonstrating a substantial level of consistency (as shown in Table 14 ).

Additionally, the diagonal bold font represents the square root of the AVE for each dimension. All the dimensions have average variance extracted (AVE) values ranging from 0.551 to 0.589, all of which are greater than 0.5, indicating that the latent variables have strong explanatory power for their corresponding items. These results suggest that the scale structure constructed in this study is reliable and effective. Furthermore, according to the results presented in Table 15 , the square roots of the AVE values for each dimension are greater than the absolute values of the correlations with other dimensions, indicating discriminant validity of the data. Therefore, these four subscales demonstrate good convergent and discriminant validity, indicating that they are both interrelated and independent. This implies that they can effectively capture the content required to complete the HOTS test scale.
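Composite reliability (CR) and average variance extracted (AVE) are simple functions of the standardized loadings, and the Fornell-Larcker check compares the square root of each AVE with the inter-factor correlations. The sketch below shows the arithmetic for one factor using illustrative loading values, not the values underlying Table 14.

```python
import numpy as np

def composite_reliability(loadings: np.ndarray) -> float:
    """CR = (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]."""
    s = loadings.sum()
    errors = (1 - loadings ** 2).sum()
    return s ** 2 / (s ** 2 + errors)

def average_variance_extracted(loadings: np.ndarray) -> float:
    """AVE = mean of the squared standardized loadings."""
    return (loadings ** 2).mean()

# Illustrative standardized loadings for one dimension
lam = np.array([0.72, 0.75, 0.78, 0.74, 0.76, 0.73])
ave = average_variance_extracted(lam)
print(f"CR  = {composite_reliability(lam):.3f}")
print(f"AVE = {ave:.3f}, sqrt(AVE) = {np.sqrt(ave):.3f}  # compare against inter-factor correlations")
```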

Discussion and conclusion

The assessment scale for HOTS in interior design blended learning encompasses four dimensions: critical thinking skills, problem-solving skills, teamwork skills, and practical innovation skills. The selection of these dimensions is based on the characteristics and requirements of the interior design discipline, which aims to comprehensively evaluate students' HOTS demonstrated in blended learning environments to better cultivate their ability to successfully address complex design projects in practice. Notably, multiple studies have shown that HOTSs include critical thinking, problem-solving skills, creative thinking, and decision-making skills, which are considered crucial in various fields, such as education, business, and engineering 20 , 59 , 60 , 61 . Compared with prior studies, these dimensions largely mirror previous research outcomes, with notable distinctions in the emphasis on teamwork skills and practical innovation skills 62 , 63 . Teamwork skills underscore the critical importance of collaboration in contemporary design endeavors, particularly within the realm of interior design 64 . Effective communication and coordination among team members are imperative for achieving collective design objectives.

Moreover, practical innovation skills aim to increase students' capacity for creatively applying theoretical knowledge in practical design settings. Innovation serves as a key driver of advancement in interior design, necessitating students to possess innovative acumen and adaptability to evolving design trends for industry success. Evaluating practical innovation skills aims to motivate students toward innovative thinking, exploration of novel concepts, and development of unique design solutions, which is consistent with the dynamic and evolving nature of the interior design sector. Prior research suggests a close interplay between critical thinking, problem-solving abilities, teamwork competencies, and creative thinking, with teamwork skills acting as a regulatory factor for critical and creative thought processes 7 , 65 . This interconnected nature of HOTS provides theoretical support for the construction and validation of a holistic assessment framework for HOTS.

After the examination by interior design expert members, one item needed to be split into two items. The results of the CR (critical ratio) analysis of the scale items indicate that independent-samples t tests were subsequently conducted on all the items. The t values were greater than 3, with p values less than 0.001, indicating significant differences between the top and bottom 27% of the samples and demonstrating the discriminative power of each item. This discovery highlights the diversity and effectiveness of the scale's internal items, revealing the discriminatory power of the scale in assessing the study subjects. The high t values and significant p values reflect the substantiality of the internal items in distinguishing between different sample groups, further confirming the efficacy of these items in evaluating the target characteristics. These results provide a robust basis for further refinement and optimization of the scale and offer guidance for future research, emphasizing the importance of scale design in research and providing strong support for data interpretation and analysis.

This process involved evaluating the measurement scale through EFA, in which the cumulative explained variance reached 59.748%, and the CR, AVE, Cronbach's alpha, and Pearson correlation coefficient values of the total scale and subscales were all satisfactory, which supports the structural, discriminant, and convergent validity of the scale 57.

The scale structure and items of this study are reliable and effective, which means that students in the field of interior design can use them to test their HOTS level and assess their qualities and abilities. In addition, scholars can use this scale to explore the relationships between students' HOTS and external factors, personal personalities, etc., to determine different methods and strategies for developing and improving HOTS.

Limitations and future research

The developed blended learning HOTS assessment scale for interior design also has certain limitations that need to be addressed in future research. The first issue is that, owing to the requirement of practical innovation skills, students need to have certain practical experience and innovative abilities. First-year students usually have not yet had sufficient opportunities for learning and practical experience, so it may not be possible to evaluate their abilities effectively in this dimension. Therefore, when this scale is used for assessment, it is necessary to consider students' year level and learning experience to ensure the applicability and accuracy of the assessment tool. For first-year students, it may be necessary to use other assessment tools that are suitable for their developmental stage and learning experience to evaluate other aspects of their HOTS 7. Future research should focus on expanding the scope of this dimension to ensure greater applicability.

The second issue is that the sample comes from ordinary private undergraduate universities in central China and does not come from national public universities or key universities. Therefore, there may be regional characteristics in the obtained data. These findings suggest that the improved model should be validated with a wider range of regional origins, a more comprehensive school hierarchy, and a larger sample size. The third issue is that the findings of this study are derived from self-reported data collected from participants through surveys. However, it is important to note that the literature suggests caution in heavily relying on such self-reported data, as perception does not always equate to actions 66. In addition, future research can draw on this scale to evaluate the HOTS of interior design students, explore the factors that affect their development, determine their training and improvement paths, and cultivate skilled talent for the twenty-first century.

This study adopts a mixed methods research approach, combining qualitative and quantitative methods to achieve a comprehensive understanding of the phenomenon 67. By integrating qualitative and quantitative research methods, mixed methods research provides a comprehensive and detailed exploration of research questions, using multiple data sources and analytical methods to obtain accurate and meaningful answers 68. To increase the quality of the research, once the data were obtained, the entire study followed the guidelines for scale development procedures outlined by Professor Li, as shown in Fig. 3.

Figure 3. Scale development procedure.

Basis of theory

This study is guided by educational objectives such as twenty-first-century learning skills, the "5C" competency framework, and students' core abilities 4 . The construction of the scale rests on theoretical foundations including Bloom's taxonomy. Drawing on existing instruments such as the CCTDI 41 , SPSI 69 , and TWKSAT, the dimensions and preliminary items of the scale were developed; the preliminary items were primarily adapted from, or directly referenced, existing research findings. To enhance the validity and reliability of the scale, dimensions related to HOTS in interior design were additionally derived from semi-structured interviews. On this basis, and with reference to the CCTDI, SPSI, TWKSAT, and twenty-first-century skills frameworks, the study takes critical thinking skills, problem-solving skills, teamwork skills, and practical innovation skills as the four basic dimensions of the scale.

Participants and procedures

This study builds on previous research to develop a HOTS assessment scale for measuring the thinking levels of interior design students in blended learning. By investigating the challenges and opportunities students encounter in blended learning environments and exploring the complexity and diversity of their HOTS, it aims to obtain comprehensive insights. For research question 1, 10 interior design experts were selected through purposive sampling to identify the dimensions and evaluation indicators of HOTS in blended learning for interior design. The researcher conducted semi-structured interviews with these 10 senior experts and teachers in the field of interior design, all holding the rank of associate professor or above; the group comprised 5 males and 5 females, as shown in Table 16 .

For research questions 2 and 3, the research was conducted at an undergraduate university in China, in the field of interior design and within a blended learning environment. All experimental protocols were approved by the authorized committee of Zhengzhou University of Finance and Economics, all methods were carried out in accordance with relevant guidelines and regulations, and informed consent was obtained from all participants. The interior design blended learning HOTS assessment scale was developed from sample data on 350 students who completed a pre-test and a retest. The participants were second-, third-, and fourth-year students who had taken at least one blended learning course, with 115, 118, and 117 students in the respective years, totaling 350. There were 218 male and 132 female students, all aged 19–22 years. Through purposive sampling, the study ensured the involvement of relevant participants and focused on a specific university environment with diverse demographic characteristics and rich educational resources.

This approach enhances the reliability and generalizability of the research and contributes to a deeper understanding of the research question (as shown in Table 17 ).

Instruments

The instruments used in this study were the semi-structured interview guidelines and the HOTS assessment scale developed by the researchers. For research question 1, the semi-structured interview guidelines were reviewed by interior design experts to ensure the accuracy and appropriateness of their content and questions. For research questions 2 and 3, the items of the HOTS assessment scale were examined with the critical ratio (CR) method to assess their consistency and reliability and to validate their effectiveness.

Data analysis

For research question 1, the researcher used NVivo (version 14) to conduct thematic analysis of the data obtained through the semi-structured interviews. Thematic analysis is a commonly used qualitative research method that aims to identify and categorize the themes, concepts, and perspectives that emerge within a dataset 70 . NVivo allows researchers to organize and manage large amounts of textual data efficiently and to extract themes and patterns from them.

For research question 2, the critical ratio (CR) method was employed for item analysis and homogeneity testing of the pilot questionnaire items. The CR method assesses each item's contribution to the total score and the interrelationships among the items, which facilitates the evaluation and validation of the scale's reliability and validity.
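A check often reported alongside the CR method when testing homogeneity is the corrected item-total correlation, in which each item is correlated with the total of the remaining items. The sketch below is a minimal illustration on hypothetical data; the 0.4 retention rule of thumb is a common convention, not a threshold reported in this study.

```python
# Minimal sketch of corrected item-total correlations on hypothetical data:
# each item is correlated with the summed score of the remaining items.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
items = [f"item_{i}" for i in range(1, 6)]                  # assumed item names
df = pd.DataFrame(rng.integers(1, 6, size=(350, len(items))), columns=items)

for item in items:
    rest_total = df.drop(columns=item).sum(axis=1)          # total score without this item
    r = df[item].corr(rest_total)                           # Pearson correlation
    flag = "keep" if r >= 0.4 else "review"                 # 0.4 is a common rule of thumb
    print(f"{item}: corrected item-total r = {r:.2f} -> {flag}")
```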

For research question 3, confirmatory factor analysis (CFA) was conducted on the confirmatory sample data via maximum likelihood estimation in SPSS (version 26). The purpose of this analysis was to verify whether the hypothesized factor structure of the questionnaire aligned with the actual survey data. Finally, several indices, including composite reliability (CR), average variance extracted (AVE), Cronbach's alpha, and the Pearson correlation coefficient, were computed to assess the reliability and validity of the developed scale.
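To show how the CR and AVE indices in this step are typically derived from standardized loadings, here is a minimal sketch that applies the standard formulas; the loadings are illustrative placeholders rather than estimates from this scale.

```python
# Minimal sketch: composite reliability (CR) and average variance extracted (AVE)
# from standardized CFA loadings of one factor, using the standard formulas
#   CR  = (sum(l))^2 / ((sum(l))^2 + sum(1 - l^2))
#   AVE = mean(l^2)
import numpy as np

loadings = np.array([0.72, 0.68, 0.75, 0.70, 0.66])   # hypothetical one-factor loadings
errors = 1 - loadings ** 2                            # indicator error variances

cr = loadings.sum() ** 2 / (loadings.sum() ** 2 + errors.sum())
ave = np.mean(loadings ** 2)

print(f"CR = {cr:.3f}")    # values above roughly 0.7 are commonly judged acceptable
print(f"AVE = {ave:.3f}")  # values above roughly 0.5 are commonly judged acceptable
```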

In addition, exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are commonly used techniques in questionnaire development and adaptation research 31 , 70 . The statistical software packages SPSS and AMOS are frequently employed to implement them 71 , 72 , 73 . EFA is a data-driven approach to factor extraction that requires neither a predetermined number of factors nor prespecified relationships with the observed variables; its focus lies in the numerical characteristics of the data. Therefore, prior to conducting CFA, survey questionnaires are typically first analyzed through EFA to reveal the underlying structure and the relationships between the observed variables and the latent constructs.

In contrast, CFA tests a hypothesized model structure under specific theoretical assumptions, including a known number of factors and specified interrelationships among them; its purpose is to validate that hypothesized structure. Thus, the initial structure of the questionnaire established through EFA requires further confirmation through CFA 57 , 70 . Additionally, a sample size of at least 200 is recommended for confirmatory factor analysis; in this study, CFA was performed on a sample of 317.
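For readers who want to reproduce the EFA-before-CFA workflow outside SPSS/AMOS, the sketch below uses the open-source Python package factor_analyzer on hypothetical data to run the usual adequacy checks (Bartlett's test and KMO) and extract a four-factor solution; this is an assumed substitute for the commercial tools named above, not the authors' procedure.

```python
# Minimal EFA sketch with the factor_analyzer package on hypothetical data:
# sampling-adequacy checks followed by a rotated four-factor extraction.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(350, 20)),
                  columns=[f"q{i}" for i in range(1, 21)])   # placeholder responses

chi2, p = calculate_bartlett_sphericity(df)   # should be significant before running EFA
kmo_per_item, kmo_total = calculate_kmo(df)   # KMO > 0.6 is usually considered adequate
print(f"Bartlett chi2 = {chi2:.1f}, p = {p:.4f}, overall KMO = {kmo_total:.2f}")

fa = FactorAnalyzer(n_factors=4, rotation="promax")
fa.fit(df)
variance, prop_var, cum_var = fa.get_factor_variance()
print("Cumulative explained variance:", np.round(cum_var, 3))
print("Loadings matrix shape (items x factors):", fa.loadings_.shape)
```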

Data availability

All data generated or analyzed during this study are included in this published article. All the experimental protocols were approved by the Zhengzhou College of Finance and Economics licensing committee.

Hariadi, B. et al. Higher order thinking skills based learning outcomes improvement with blended web mobile learning Model. Int. J. Instr. 15 (2), 565–578 (2022).

Sagala, P. N. & Andriani, A. Development of higher-order thinking skills (HOTS) questions of probability theory subject based on bloom’s taxonomy. J. Phys. Conf. Ser. https://doi.org/10.1088/1742-6596/1188/1/012025 (2019).

Yudha, R. P. Higher order thinking skills (HOTS) test instrument: Validity and reliability analysis with the rasch model. Eduma Math. Educ. Learn. Teach. https://doi.org/10.24235/eduma.v12i1.9468 (2023).

Leach, S. M., Immekus, J. C., French, B. F. & Hand, B. The factorial validity of the Cornell critical thinking tests: A multi-analytic approach. Think. Skills Creat. https://doi.org/10.1016/j.tsc.2020.100676 (2020).

Noroozi, O., Dehghanzadeh, H. & Talaee, E. A systematic review on the impacts of game-based learning on argumentation skills. Entertain. Comput. https://doi.org/10.1016/j.entcom.2020.100369 (2020).

Supena, I., Darmuki, A. & Hariyadi, A. The influence of 4C (constructive, critical, creativity, collaborative) learning model on students’ learning outcomes. Int. J. Instr. 14 (3), 873–892. https://doi.org/10.29333/iji.2021.14351a (2021).

Zhou, Y., Gan, L., Chen, J., Wijaya, T. T. & Li, Y. Development and validation of a higher-order thinking skills assessment scale for pre-service teachers. Think. Skills Creat. https://doi.org/10.1016/j.tsc.2023.101272 (2023).

Musfy, K., Sosa, M. & Ahmad, L. Interior design teaching methodology during the global COVID-19 pandemic. Interiority 3 (2), 163–184. https://doi.org/10.7454/in.v3i2.100 (2020).

Yong, S. D., Kusumarini, Y. & Tedjokoesoemo, P. E. D. Interior design students’ perception for AutoCAD SketchUp and Rhinoceros software usability. IOP Conf. Ser. Earth Environ. Sci. https://doi.org/10.1088/1755-1315/490/1/012015 (2020).

Anthony, B. et al. Blended learning adoption and implementation in higher education: A theoretical and systematic review. Technol. Knowl. Learn. 27 (2), 531–578. https://doi.org/10.1007/s10758-020-09477-z (2020).

Castro, R. Blended learning in higher education: Trends and capabilities. Edu. Inf. Technol. 24 (4), 2523–2546. https://doi.org/10.1007/s10639-019-09886-3 (2019).

Alismaiel, O. Develop a new model to measure the blended learning environments through students’ cognitive presence and critical thinking skills. Int. J. Emerg. Technol. Learn. 17 (12), 150–169. https://doi.org/10.3991/ijet.v17i12.30141 (2022).

Gao, Y. Blended teaching strategies for art design major courses in colleges. Int. J. Emerg. Technol. Learn. https://doi.org/10.3991/ijet.v15i24.19033 (2020).

Banihashem, S. K., Kerman, N. T., Noroozi, O., Moon, J. & Drachsler, H. Feedback sources in essay writing: peer-generated or AI-generated feedback?. Int. J. Edu. Technol. Higher Edu. 21 (1), 23 (2024).

Ji, J. A Design on Blended Learning to Improve College English Students’ Higher-Order Thinking Skills. https://doi.org/10.18282/l-e.v10i4.2553 (2021).

Noroozi, O. The role of students’ epistemic beliefs for their argumentation performance in higher education. Innov. Edu. Teach. Int. 60 (4), 501–512 (2023).

Valero Haro, A., Noroozi, O., Biemans, H. & Mulder, M. First- and second-order scaffolding of argumentation competence and domain-specific knowledge acquisition: A systematic review. Technol. Pedag. Edu. 28 (3), 329–345. https://doi.org/10.1080/1475939x.2019.1612772 (2019).

Narasuman, S. & Wilson, D. M. Investigating teachers’ implementation and strategies on higher order thinking skills in school based assessment instruments. Asian J. Univ. Edu. https://doi.org/10.24191/ajue.v16i1.8991 (2020).

Valero Haro, A., Noroozi, O., Biemans, H. & Mulder, M. Argumentation competence: Students’ argumentation knowledge, behavior and attitude and their relationships with domain-specific knowledge acquisition. J. Constr. Psychol. 35 (1), 123–145 (2022).

Johansson, E. The Assessment of Higher-order Thinking Skills in Online EFL Courses: A Quantitative Content Analysis (2020).

Noroozi, O., Kirschner, P. A., Biemans, H. J. A. & Mulder, M. Promoting argumentation competence: Extending from first- to second-order scaffolding through adaptive fading. Educ. Psychol. Rev. 30 (1), 153–176. https://doi.org/10.1007/s10648-017-9400-z (2017).

Noroozi, O., Weinberger, A., Biemans, H. J. A., Mulder, M. & Chizari, M. Facilitating argumentative knowledge construction through a transactive discussion script in CSCL. Comput. Educ. 61 , 59–76. https://doi.org/10.1016/j.compedu.2012.08.013 (2013).

Noroozi, O., Weinberger, A., Biemans, H. J. A., Mulder, M. & Chizari, M. Argumentation-based computer supported collaborative learning (ABCSCL): A synthesis of 15 years of research. Educ. Res. Rev. 7 (2), 79–106. https://doi.org/10.1016/j.edurev.2011.11.006 (2012).

Setiawan, Baiq Niswatul Khair, Ratnadi Ratnadi, Mansur Hakim, & Istiningsih, S. Developing HOTS-Based Assessment Instrument for Primary Schools (2019).

Suparman, S., Juandi, D., & Tamur, M. Does Problem-Based Learning Enhance Students’ Higher Order Thinking Skills in Mathematics Learning? A Systematic Review and Meta-Analysis 2021 4th International Conference on Big Data and Education (2021).

Goodsett, M. Best practices for teaching and assessing critical thinking in information literacy online learning objects. J. Acad. Lib. https://doi.org/10.1016/j.acalib.2020.102163 (2020).

Putra, I. N. A. J., Budiarta, L. G. R., & Adnyayanti, N. L. P. E. Developing Authentic Assessment Rubric Based on HOTS Learning Activities for EFL Teachers. In Proceedings of the 2nd International Conference on Languages and Arts across Cultures (ICLAAC 2022) (pp. 155–164). https://doi.org/10.2991/978-2-494069-29-9_17 .

Bervell, B., Umar, I. N., Kumar, J. A., Asante Somuah, B. & Arkorful, V. Blended learning acceptance scale (BLAS) in distance higher education: Toward an initial development and validation. SAGE Open https://doi.org/10.1177/21582440211040073 (2021).

Byrne, D. A worked example of Braun and Clarke’s approach to reflexive thematic analysis. Qual. Quant. 56 (3), 1391–1412 (2022).

Xu, W. & Zammit, K. Applying thematic analysis to education: A hybrid approach to interpreting data in practitioner research. Int. J. Qual. Methods 19 , 1609406920918810 (2020).

Braun, V. & Clarke, V. Conceptual and design thinking for thematic analysis. Qual. Psychol. 9 (1), 3 (2022).

Creswell, A., Shanahan, M., & Higgins, I. Selection-inference: Exploiting large language models for interpretable logical reasoning. arXiv:2205.09712 (2022).

Baron, J. Thinking and Deciding 155–156 (Cambridge University Press, 2023).

Silver, N., Kaplan, M., LaVaque-Manty, D. & Meizlish, D. Using Reflection and Metacognition to Improve Student Learning: Across the Disciplines, Across the Academy (Taylor & Francis, 2023).

Oksuz, K., Cam, B. C., Kalkan, S. & Akbas, E. Imbalance problems in object detection: A review. IEEE Trans. Pattern Anal. Mach. Intell. 43 (10), 3388–3415 (2020).

Saputra, M. D., Joyoatmojo, S., Wardani, D. K. & Sangka, K. B. Developing critical-thinking skills through the collaboration of jigsaw model with problem-based learning model. Int. J. Instr. 12 (1), 1077–1094 (2019).

Imam, H. & Zaheer, M. K. Shared leadership and project success: The roles of knowledge sharing, cohesion and trust in the team. Int. J. Project Manag. 39 (5), 463–473 (2021).

DeCastellarnau, A. A classification of response scale characteristics that affect data quality: A literature review. Qual. Quant. 52 (4), 1523–1559 (2018).

Haber, J. Critical Thinking 145–146 (MIT Press, 2020).

Hanscomb, S. Critical Thinking: The Basics 180–181 (Routledge, 2023).

Sulaiman, W. S. W., Rahman, W. R. A. & Dzulkifli, M. A. Examining the construct validity of the adapted California critical thinking dispositions (CCTDI) among university students in Malaysia. Proc. Social Behav. Sci. 7 , 282–288 (2010).

Jaakkola, N. et al. Becoming self-aware—How do self-awareness and transformative learning fit in the sustainability competency discourse?. Front. Educ. https://doi.org/10.3389/feduc.2022.855583 (2022).

Nguyen, T. T. B. Critical thinking: What it means in a Vietnamese tertiary EFL context. English For. Language Int. J. 2 (3), 4–23 (2022).

Henriksen, D., Gretter, S. & Richardson, C. Design thinking and the practicing teacher: Addressing problems of practice in teacher education. Teach. Educ. 31 (2), 209–229 (2020).

Okes, D. Root cause analysis: The core of problem solving and corrective action 179–180 (Quality Press, 2019).

Eroğlu, S. & Bektaş, O. The effect of 5E-based STEM education on academic achievement, scientific creativity, and views on the nature of science. Learn. Individual Differ. 98 , 102181 (2022).

Dzurilla, T. J. & Nezu, A. M. Development and preliminary evaluation of the social problem-solving inventory. Psychol. Assess. J. Consult. Clin. Psychol. 2 (2), 156 (1990).

Tan, O.-S. Problem-based learning innovation: Using problems to power learning in the 21st century. Gale Cengage Learning (2021).

Driskell, J. E., Salas, E. & Driskell, T. Foundations of teamwork and collaboration. Am. Psychol. 73 (4), 334 (2018).

Lower, L. M., Newman, T. J. & Anderson-Butcher, D. Validity and reliability of the teamwork scale for youth. Res. Social Work Pract. 27 (6), 716–725 (2017).

Landa, R. Advertising by design: generating and designing creative ideas across media (Wiley, 2021).

Tang, T., Vezzani, V. & Eriksson, V. Developing critical thinking, collective creativity skills and problem solving through playful design jams. Think. Skills Creat. 37 , 100696 (2020).

Torrance, E. P. Torrance tests of creative thinking. Educational and psychological measurement (1966).

Javadi, M. H., Khoshnami, M. S., Noruzi, S. & Rahmani, R. Health anxiety and social health among health care workers and health volunteers exposed to coronavirus disease in Iran: A structural equation modeling. J. Affect. Disord. Rep. https://doi.org/10.1016/j.jadr.2022.100321 (2022).

Hu, L. & Bentler, P. M. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct. Equ. Model. Multidiscip. J. 6 (1), 1–55. https://doi.org/10.1080/10705519909540118 (1999).

Matsunaga, M. Item parceling in structural equation modeling: A primer. Commun. Methods Measures 2 (4), 260–293. https://doi.org/10.1080/19312450802458935 (2008).

Marsh, H. W., Morin, A. J., Parker, P. D. & Kaur, G. Exploratory structural equation modeling: An integration of the best features of exploratory and confirmatory factor analysis. Ann. Rev. Clin. Psychol. 10 (1), 85–110 (2014).

Song, Y., Lee, Y. & Lee, J. Mediating effects of self-directed learning on the relationship between critical thinking and problem-solving in student nurses attending online classes: A cross-sectional descriptive study. Nurse Educ. Today https://doi.org/10.1016/j.nedt.2021.105227 (2022).

Chu, S. K. W., Reynolds, R. B., Tavares, N. J., Notari, M., & Lee, C. W. Y. 21st century skills development through inquiry-based learning from theory to practice . Springer (2021).

Eliyasni, R., Kenedi, A. K. & Sayer, I. M. Blended learning and project based learning: the method to improve students’ higher order thinking skill (HOTS). Jurnal Iqra’: Kajian Ilmu Pendidikan 4 (2), 231–248 (2019).

Yusuf, P. & Istiyono. Blended learning: Its effect towards higher order thinking skills (HOTS). J. Phys. Conf. Ser. https://doi.org/10.1088/1742-6596/1832/1/012039 (2021).

Byron, K., Keem, S., Darden, T., Shalley, C. E. & Zhou, J. Building blocks of idea generation and implementation in teams: A meta-analysis of team design and team creativity and innovation. Personn. Psychol. 76 (1), 249–278 (2023).

Walid, A., Sajidan, S., Ramli, M. & Kusumah, R. G. T. Construction of the assessment concept to measure students’ high order thinking skills. J. Edu. Gift. Young Sci. 7 (2), 237–251 (2019).

Alawad, A. Evaluating online learning practice in the interior design studio. Int. J. Art Des. Edu. 40 (3), 526–542. https://doi.org/10.1111/jade.12365 (2021).

Awuor, N. O., Weng, C. & Militar, R. Teamwork competency and satisfaction in online group project-based engineering course: The cross-level moderating effect of collective efficacy and flipped instruction. Comput. Educ. 176 , 104357 (2022).

Noroozi, O., Alqassab, M., Taghizadeh Kerman, N., Banihashem, S. K. & Panadero, E. Does perception mean learning? Insights from an online peer feedback setting. Assess. Eval. Higher Edu. https://doi.org/10.1080/02602938.2024.2345669 (2024).

Creswell, J. W. A Concise Introduction to Mixed Methods Research 124–125 (SAGE Publications, 2021).

Tashakkori, A., Johnson, R. B. & Teddlie, C. Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Approaches in the Social and Behavioral Sciences 180–181 (SAGE Publications, 2020).

Jiang, X., Lyons, M. D. & Huebner, E. S. An examination of the reciprocal relations between life satisfaction and social problem solving in early adolescents. J. Adolescence 53 (1), 141–151. https://doi.org/10.1016/j.adolescence.2016.09.004 (2016).

Orcan, F. Exploratory and confirmatory factor analysis: Which one to use first. Egitimde ve Psikolojide Olçme ve Degerlendirme Dergisi https://doi.org/10.21031/epod.394323 (2018).

Asparouhov, T. & Muthén, B. Exploratory structural equation modeling. Struct. Eq. Model. Multidiscip. J. 16 (3), 397–438 (2009).

Finch, H., French, B. F. & Immekus, J. C. Applied Psychometrics Using SPSS and AMOS (IAP, 2016).

Marsh, H. W., Guo, J., Dicke, T., Parker, P. D. & Craven, R. G. Confirmatory factor analysis (CFA), exploratory structural equation modeling (ESEM), and Set-ESEM: Optimal balance between goodness of fit and parsimony. Multivar. Behav. Res. 55 (1), 102–119. https://doi.org/10.1080/00273171.2019.1602503 (2020).

Acknowledgements

Thanks to the editorial team and reviewers of Scientific Reports for their valuable comments.

Author information

Authors and affiliations.

Faculty of Education, SEGI University, 47810 Petaling Jaya, Selangor, Malaysia

Department of Art and Design, Zhengzhou College of Finance and Economics, Zhengzhou, 450000, Henan, China

Xiaolei Fan

Faculty of Humanities and Arts, Macau University of Science and Technology, Avenida Wai Long, 999078, Taipa, Macao, Special Administrative Region of China

Lingchao Meng

Contributions

D.L. conceptualized the experiment and wrote the main manuscript text. D.L. and X.F. conducted the experiments; D.L., X.F., and L.M. analyzed the results. L.M. contributed to the conceptualization, methodology, and editing, and critically reviewed the manuscript. All authors have reviewed the manuscript.

Corresponding author

Correspondence to Lingchao Meng .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Li, D., Fan, X. & Meng, L. Development and validation of a higher-order thinking skills (HOTS) scale for major students in the interior design discipline for blended learning. Sci Rep 14 , 20287 (2024). https://doi.org/10.1038/s41598-024-70908-3

Received : 28 February 2024

Accepted : 22 August 2024

Published : 31 August 2024

DOI : https://doi.org/10.1038/s41598-024-70908-3

Keywords

  • Assessment scale
  • Higher-order thinking skills
  • Interior design
  • Blended learning


An instrument to support thinking critically about critical thinking in online asynchronous discussions

Elizabeth Murphy. Australasian Journal of Educational Technology (Education, Computer Science). Published 1 November 2004. DOI: 10.14742/AJET.1349. Corpus ID: 54686831.


A Systematic Review of Critical Thinking Instruments for Use in Dental Education

Affiliations

  • Patrick L. Anders, DDS, MPH, Clinical Associate Professor, Department of Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine ([email protected]); Elizabeth M. Stellrecht, MLS, Senior Assistant Librarian and Liaison, School of Dental Medicine, Health Science Library, University at Buffalo; Elaine L. Davis, PhD, Professor, Department of Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine; W.D. McCall Jr., PhD, Professor, Department of Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine.
  • PMID: 30745345
  • DOI: 10.21815/JDE.019.043

Critical thinking is widely recognized as an essential competency in dental education, but there is little agreement on how it should be assessed. The aim of this systematic review was to determine the availability of instruments that could be used to measure critical thinking in dental students and to evaluate psychometric evidence to support their use. In January 2017, an electronic search of both the medical and education literature was performed on nine databases. The search included both keyword and Medical Subject Heading terms for critical thinking, higher education/health sciences education, measurement/assessment, and reproducibility of results. The grey literature was included in the search. The search produced 2,977 unique records. From the title and abstract review, 183 articles were selected for further review, which resulted in 36 articles for data extraction. Ten of these studies sought to evaluate psychometric properties of the instruments used and were subjected to quality assessment. Seven assessment instruments were identified. Of these, three instruments that have not been widely used nor tested in health professions students showed evidence of psychometric strength and appeared to have potential for use in dental education. Further research should focus on the three critical thinking instruments with strong psychometric evidence, with the aim of establishing validity and reliability in the context of dental education.

Keywords: assessment; critical thinking; dental education; systematic review.



Critical thinking: definition and how to improve its skills


Critical thinking is based on the observation and analysis of facts and evidence in order to form rational, skeptical, and unbiased judgments.

This type of thinking involves a series of skills that can be built and also improved, as we will see throughout this article, which begins by defining the concept and ends with tips for building and improving the skills related to critical thinking.

What is critical thinking?

Critical thinking is a discipline based on people's ability to observe, elucidate, and analyze information, facts, and evidence in order to judge or decide whether it is right or wrong.

It goes beyond mere curiosity, simple knowledge or analysis of any kind of fact or information.

People who develop this type of outlook are able to connect ideas logically and defend them with well-reasoned arguments that ultimately help them make better decisions.

How to build and improve critical thinking skills?

Building and improving critical thinking skills involves focusing on a number of abilities and capacities .

To begin the critical thinking process, all ideas must remain open and all options must be understood as fully as possible.

Even the dumbest or craziest idea can end up being the gateway to the most intelligent and successful conclusion.

The problem with having an open mind is that it is the more difficult path and often involves greater challenge and effort. It is easier to go with the obvious and the commonly accepted, but that has no place in critical thinking.

It also helps not to make hasty decisions and to weigh the problem in its entirety after the first moment of awareness.

Finally, practicing active listening will help you receive feedback from others and understand other points of view that may serve as a reference.

Impartiality

An important point in the critical thinking process is the development of the ability to identify biases and maintain an impartial view in evaluations.

To improve this aspect, it is advisable to have tools for identifying and recognizing your own prejudices and biases, and to set them aside completely when thinking about the solution.


Observation

Observation allows you to see every detail, no matter how small, subtle, or inconsequential it may be or seem to be.

Behind the surface-level information lies a universe of data, sources, and experiences that help you make the best decision.

One of the pillars of critical thinking is objectivity. It requires you to base your value judgments on established facts gathered through a sound research process.

At this point in the process you should also be clear about the influencing factors to be taken into account and those that can be left out.

Remember that your research is not only about gathering a good amount of information that puts the maximum number of options, variables or situations on the table. 

For the information to be of quality, it must be based on reliable and trustworthy sources.

If the information you collect is based on the comments and opinions of third parties, try to exercise quality control without interfering.

To do this, ask open-ended questions that bring all the nuances to the table and at the same time serve to sift out possible biases.

With the research process completed, it is time to analyze the sources and information gathered.

At this point, your analytical skills will help you discard information that does not hold up to scrutiny, prioritize the information that is of value, identify possible trends, and draw your own conclusions.

One of the skills that characterizes a person with critical thinking is the ability to recognize patterns and connections among all the pieces of information handled during research.

This allows them to draw highly relevant conclusions on which to base well-founded predictions.

Analytical thinking is sometimes confused with critical thinking. The former only uses facts and data, while the latter incorporates other nuances such as emotions, experiences or opinions.

One of the problems with critical thinking is that it can be extended indefinitely: you can always keep looking for new avenues of investigation and new lines of argument, stretching inference to limits that may not be necessary.

At this point it is important to clarify that inference is the process of drawing conclusions from initial premises or hypotheses.

It is necessary to know when to stop the research and thinking process and move on to the next stage, in which you put the appropriate actions into practice.

Communication

The information you collect in your research is not top-secret material. On the contrary, sharing your knowledge with the other people involved in the next steps of the process is important.

Your analytical ability to extract information, together with your conclusions, can serve to guide others.

Problem solving

It is important to note at this point that critical thinking can be aimed at solving a problem but can also be used to simply answer questions or even to identify areas for improvement in certain situations. 



COMMENTS

  2. Instruments to assess students' critical thinking—A qualitative

    Critical thinking (CT) skills are essential to academic and professional success. Instruments to assess CT often rely on multiple-choice formats with inherent problems. This research presents two instruments for assessing CT, an essay and open-ended group-discussion format, which were implemented in an undergraduate business course at a large ...

  3. Assessing Critical Thinking in Higher Education: Current State and

Critical thinking is one of the most frequently discussed higher order skills, believed to play a central role in logical thinking, decision making, and problem solving (Butler, 2012; Halpern, 2003). It is also a highly contentious skill in that researchers debate about its definition; its amenability to assessment; its degree of generality or specificity; and the evidence of its practical ...

  4. Frontiers

    An Approach to Performance Assessment of Critical Thinking: The iPAL Program. The approach to CT presented here is the result of ongoing work undertaken by the International Performance Assessment of Learning collaborative (iPAL 1). iPAL is an international consortium of volunteers, primarily from academia, who have come together to address the dearth in higher education of research and ...

  5. Assessment of Critical Thinking

    2.1 Observing Learners in the Process of Critical Thinking. The desire for empirical assessment of competence in CT has spawned a variety of different lines of argument and assessment procedures based on them, depending on intent, tradition, and associated conceptual understanding (Jahn, 2012a). Depending on what is understood by CT and what function the assessment is supposed to have, there ...

  6. Critical Thinking > Assessment (Stanford Encyclopedia of Philosophy)

    The Critical Thinking Assessment Test (CAT) is unique among them in being designed for use by college faculty to help them improve their development of students' critical thinking skills (Haynes et al. 2015; Haynes & Stein 2021). Also, for some years the United Kingdom body OCR (Oxford Cambridge and RSA Examinations) awarded AS and A Level ...

  7. A Systematic Review on Instruments to Assess Critical Thinking

    Critical Thinking and Problem Solving (CTPS) are soft skills essential to be equipped among students according to 21st-century learning. Several instruments have been developed to measure CTPS ...

  8. (PDF) Development of an assessment instrument for critical thinking

    Critical thinking is a thinking skills that involves a process of reasoning and reflective thinking that cannot be directly observed, so it requires a separate assessment instrument.

  9. Development of assessment instruments to measure critical thinking skills

    Evaluation of critical thinking skills need measuring instrument that can draw perfectly the true condition. From the result of literacy study, it was gained some instruments to measure critical thinking skills on students, among others Academic Profile, California Critical Thinking Dispositions Inventory

  10. Critical Thinking Assessment Test (CAT)

    The CAT instrument is a unique tool designed to assess and promote the improvement of critical thinking and real-world problem solving skills. Most of the questions require short answer essay responses, and a detailed scoring guide helps ensure good scoring reliability. The CAT instrument is scored by the institution's own faculty using the detailed scoring guide.

  11. Development of an assessment instrument for critical thinking skills in

    The assessment instrument for critical thinking ability is a measurement tool used to test critical thinking skills based on predefined indicators. This research aims to examine the development of research on critical thinking ability instruments, including methods of development, instrument forms, indicators, analysis, and instrument validity ...

  12. Critical Thinking Inventories

    The Critical Thinking Inventories (CTIs) are short, Likert-item instruments that assess a course learning environment as it relates to critical thinking skill-building. There are two separate instruments: Learning Critical Thinking Inventory (LCTI) to be completed by students. This inventory asks students to report their perception of critical ...

  13. Development of assessment instruments to measure critical thinking skills

    Assessment instruments that is commonly used in the school generally have not been orientated on critical thinking skills. The purpose of this research is to develop assessment instruments to ...

  14. Introduction to Critical Thinking Skills

    Evaluating critical thinking. Midwest Publications. Google Scholar Groarke, L. (2009). What's wrong with the California critical thinking skills test? CT testing and accountability. In J. Sobocan & L. Groarke (Eds.), Critical thinking education and assessment: Can higher order thinking be tested? Althouse Press.

  15. Center for Assessment & Improvement of Learning

    The Critical-thinking Assessment Test (CAT) was developed with input from faculty across a wide range of institutions and disciplines, with guidance from colleagues in the cognitive/ learning sciences and assessment and with support from the National Science Foundation (NSF). ... Faculty can also use the CAT instrument as a model for ...

  16. CTS Tools for Faculty and Student Assessment

    The WGCTA-FS was found to be a reliable and valid instrument for measuring critical thinking (71). Cornell Critical Thinking Test (CCTT) There are two forms of the CCTT, X and Z. Form X is for students in grades 4-14. Form Z is for advanced and gifted high school students, undergraduate and graduate students, and adults.

  17. The development and psychometric validation of a Critical Thinking

    Highlights A Critical Thinking Disposition Scale (CTDS) was developed. Psychometric properties of the scale were tested. Two dimensions - Critical Openness and Reflective Scepticism - were identified. There was confirmatory support for the two-factor model in two different samples. Psychometric properties show the CTDS is a valid and reliable instrument.

  18. Development of assessment instruments to measure critical thinking

    The purpose of this research is to develop assessment instruments to measure critical thinking skills, to test validity, reliability, and practicality. This type of research is Research and Development. There are two stages on the preface step, which are field study and literacy study. On the development steps, there some parts, which are 1 ...

  19. Quantifying critical thinking: Development and validation of the

    The PLIC is a 10-question, closed-response assessment that probes student critical thinking skills in the context of physics experimentation. Using interviews and data from 5584 students at 29 institutions, we demonstrate, through qualitative and quantitative means, the validity and reliability of the instrument at measuring student critical ...

  20. PDF Development of Critical Thinking Skill Instruments on Mathematical ...

    MNSQ's INFIT value ranges from 0.86 - 1.14 which means that the items match the Rasch model. Outfit value for each item ranged from -1.2 - 1.7 (t ≤ 2.00) which means that all items were received. Keywords: development, instruments, critical thinking skills, critical thinking skills instruments, mathematical learning.

  21. Development and validation of a higher-order thinking skills (HOTS

    Moreover, the California Critical Thinking Disposition Inventory (CCTDI) is widely used to measure critical thinking skills, including dimensions such as seeking truth, confidence, questioning and ...

  22. The example of instrument for assessing critical thinking skills

    Download scientific diagram | The example of instrument for assessing critical thinking skills from publication: Guided Inquiry Lab: Its Effect to Improve Student's Critical Thinking on ...

  23. [PDF] An instrument to support thinking critically about critical

    The creation of an instrument for use by instructors, students, or researchers to identify, measure or promote critical thinking in online asynchronous discussions (OADs) revealed that while the instrument was valuable in identifying and measuring CT in the OAD, issues of practicality need to be addressed. This paper reports on the creation of an instrument for use by instructors, students, or ...

  25. Critical thinking skills assessment instrument in physics ...

    Critical thinking is a thinking skills that involves a process of reasoning and reflective thinking that cannot be directly observed, so it requires a separate assessment instrument. ... This research is a development research that aims to produce an instrument for assessing critical thinking skills that can measure students' critical thinking ...

  26. The Imperative of Critical Thinking in Higher Education

    On the other hand, critical thinking uses a dynamic and iterative cognitive effort that actively questions hypotheses, seeks evidence, and evaluates arguments. To differentiate between ordinary and critical thinking, for illustration, consider a packaged fruit juice advertisement with a persuasive message about its benefits. The persuasion in ...
