Digital Badges

Digital badges (aka ebadges) have emerged in today's digitalized world as a powerful tool for recognizing and showcasing individuals' accomplishments in an online format that is more comprehensive, immediate, brandable, and easily verifiable than traditional paper certificates or diplomas. In this blog post, we will delve into the world of digital badges, explore best badging practices, discuss their advantages in education, and highlight examples of badge-awarding platforms.

Digital Badges and Their Utilization

Digital badges are visual representations or icons that recognize and verify an individual's achievement or mastery of a particular skill or knowledge area in the online space. They serve as a form of digital credentialing or micro-credentialing, providing individuals with a way to display their skills, knowledge, or accomplishments in a specific domain, helping them stand out and gain recognition in the digital landscape.

Digital badges often contain metadata, such as the name of the issuing organization, a brief description of the achievement, criteria for earning the badge, and evidence of the individual’s accomplishment. This metadata is embedded within the badge image using a technology called Open Badges, allowing anyone to verify the authenticity and details of the badge.
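To make this concrete, here is a minimal sketch of what such embedded metadata can look like, expressed as a Python dictionary. The field names follow the general pattern of the Open Badges specification but are illustrative assumptions, not a complete or authoritative implementation:

```python
# A minimal sketch of badge assertion metadata in the spirit of Open Badges.
# All values are hypothetical; real assertions follow the full Open Badges
# specification and are typically serialized as JSON-LD.
badge_assertion = {
    "recipient": "learner@example.com",                     # who earned the badge
    "badge": {
        "name": "Data Analysis Fundamentals",               # the achievement
        "description": "Awarded for demonstrating core data analysis skills.",
        "criteria": "https://example.com/badges/criteria",  # how it was earned
        "issuer": "Example Assessment Institute",           # issuing organization
    },
    "issuedOn": "2023-05-29",
    "evidence": "https://example.com/portfolio/project-1",  # proof of accomplishment
    "verification": {"type": "hosted"},                     # enables authenticity checks
}
```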

Digital badges are typically issued and exhibited electronically, which makes them easily shareable and accessible across various digital platforms, such as social media profiles, resumes, portfolios, or online learning platforms. Digital badges are widely used in various contexts, including education, professional development, training programs, online courses, and gamification.

Best Badging Practices

To ensure the credibility of digital badges, it is important to adhere to the most effective badging practices. Here are some key practices to consider:

  • Clearly Defined Criteria: Badges should have well-defined criteria that outline the specific skills or achievements required to earn the badge. This ensures that the badge holds value and meaning.
  • Authenticity and Verification: Badges should incorporate metadata via technologies like Open Badges, enabling easy verification of the badge’s authenticity and details. This helps maintain trust and credibility in the digital badge ecosystem.
  • Issuer Credibility: Reputable organizations, institutions, or experts in the field should issue badges. The credibility of the issuer adds value to the badge and increases its recognition.
  • Visual Design: Badges should be visually appealing and distinct, making them easily recognizable and shareable. Thoughtful design elements can enhance the badge’s appeal and encourage individuals to display them proudly.

Advantages of Using Digital Badges in Education

Digital badges offer numerous advantages in educational settings, transforming the way achievements and skills are recognized and valued. Some key advantages are listed below:

  • Granular Skill Recognition: Badges provide a way to recognize and demonstrate specific skills or achievements that might not be captured by traditional grades or degrees. This allows individuals to highlight their unique strengths and expertise.
  • Motivation and Engagement: Badges can act as a motivational tool, driving learners to actively pursue goals, and master new skills. The visual nature of badges and the ability to display them publicly create a sense of achievement and pride.
  • Portable and Shareable: Digital badges can be easily shared across various platforms, such as social media profiles, resumes, or online portfolios. This increases the visibility of accomplishments and facilitates networking and professional opportunities.
  • Lifelong Learning: Badges promote a culture of lifelong learning by recognizing and acknowledging continuous skill development. Individuals can earn badges for completing online courses, attending workshops, or acquiring new competencies, fostering a commitment to ongoing personal and professional growth.
  • Brand Enhancement: Badges provide exposure to the brand of the issuing institution.
  • Turnaround Time: Badges can often be available immediately on a profile page after the credential is awarded.
  • Verifiability: Digital badges are easily verifiable when using an appropriate platform/technology. This helps the consumer, hiring manager, or other stakeholder to determine if a person has the skills and knowledge that are relevant to the credential.

Examples of Badge-Awarding Platforms


Several platforms have emerged to facilitate the creation, issuance, and display of digital badges. Here are a few notable examples:

  • Credly is a widely used platform that allows organizations to design, issue, and manage digital badges. It also provides features for verifying badges and displaying them on various online platforms.
  • Open Badge Factory is an open-source platform that enables badge creation, management, and verification. It offers customizable badge templates and integration with various learning management systems.
  • Badgr is a platform that supports the creation and awarding of digital badges. It provides features for badge design, verification, and integration with various learning management systems (LMS) and online platforms.
  • BadgeCert is another platform that supports digital badges and can be integrated into other platforms.

 

The Graded Response Model – Samejima (1969)

Samejima's (1969) Graded Response Model (GRM, sometimes SGRM) is an extension of the two-parameter logistic model (2PL) within the item response theory (IRT) paradigm.  IRT provides a number of benefits over classical test theory, especially regarding the treatment of polytomous items; learn more about IRT vs. CTT here.

 

What is the Graded Response Model?

The GRM is a family of mathematical models for grading responses that measure a latent trait (a variable that is not directly measurable, e.g., a person's level of neurosis, conscientiousness, or openness). It was developed by Fumiko Samejima (1969) and has been utilized widely since then. The GRM is also known as the Ordered Categorical Responses Model, as it deals with ordered polytomous categories. These can relate to both constructed-response and selected-response items where examinees can obtain various levels of scores, such as 0-4 points. In this case, the categories are 0, 1, 2, 3, and 4, and they are ordered. 'Ordered' means what it says: there is a specific order or ranking of responses. 'Polytomous' means that the responses are divided into more than two categories, i.e., not just correct/incorrect or true/false.

 

When should I use the GRM?

This family of models is applicable when polytomous responses to an item can be classified into more than two ordered categories (something more than correct/incorrect), such as different degrees of achievement in a solution to a problem, levels of agreement with a statement on a Likert scale, or frequency of a certain behavior. The GRM covers both homogeneous and heterogeneous cases; the former implies that the discriminating power underlying the thinking process is constant throughout the range of attitude or reasoning.

Samejima (1997) highlights the reasonableness of employing the GRM in testing occasions where examinees are scored on degrees of correctness (e.g., incorrect, partially correct, correct) or when measuring people's attitudes and preferences, as in Likert-scale attitude surveys (e.g., strongly agree, agree, neutral, disagree, strongly disagree). For instance, the GRM can be used in an extroversion scoring model that treats "I like to go to parties" as a high-difficulty statement and "I like to go out for coffee with a close friend" as an easy one.

Here are some examples of assessments where GRM is utilized:

  • Survey attitude questions using responses like ‘strongly disagree, disagree, neutral, agree, strongly agree’
  • Multiple response items, such as a list of 8 animals and student selects which 3 are reptiles
  • Drag and drop or other tech enhanced items with multiple points available
  • Letter grades assigned to an essay: A, B, C, D, and E
  • Essay responses graded on a 0-to-4 rubric

 

Why use the GRM?

There are three general goals of applying the GRM:

  • estimating an ability level/latent trait
  • estimating the adequacy with which test questions measure that ability level/latent trait
  • evaluating the probability that a specific score/grade will be received on each question

Using item response theory in general (not just the GRM) provides a host of advantages.  It can help you validate the assessment.  Using the GRM can also enable adaptive testing.

 

How to calculate a response probability with the GRM?

Calculating the probability that an examinee selects a certain category on a given question is a two-step process. The first step is to find the probability that an examinee with a given ability level selects category m or higher on a given question:

$$P^*_m(\theta) = \frac{e^{1.7a(\theta - b_m)}}{1 + e^{1.7a(\theta - b_m)}}$$

where

1.7  is the scale factor,

a  is the discrimination of the question,

bm  is the boundary (difficulty) parameter for category m,

e  is the constant approximately equal to 2.718,

θ  is the ability level,

P*m(θ) = 1  if  m = 1,  since responding in the lowest category or above is a certain event,

P*m(θ) = 0  if  m = M + 1,  since responding in a category above the highest is impossible.

 

The second step is to find the probability that an examinee responds in a given category, which is the difference between two adjacent cumulative probabilities:

$$P_m(\theta) = P^*_m(\theta) - P^*_{m+1}(\theta)$$

This formula describes the probability of choosing a specific response to the question for each level of the ability it measures.
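As a minimal illustration of this two-step calculation, here is a short Python sketch; the function name and example parameter values are hypothetical, not taken from any particular program:

```python
import numpy as np

def grm_category_probs(theta, a, b, D=1.7):
    """Category response probabilities under Samejima's Graded Response Model.

    theta : examinee ability level
    a     : item discrimination
    b     : ordered boundary parameters (M-1 values for M categories)
    D     : scale factor (1.7 by convention)
    """
    b = np.asarray(b, dtype=float)
    # Step 1: cumulative probabilities P*_m of responding in category m or higher;
    # by definition P*_1 = 1 and P*_(M+1) = 0.
    p_star = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
    p_star = np.concatenate(([1.0], p_star, [0.0]))
    # Step 2: probability of responding in exactly category m is P*_m - P*_(m+1).
    return p_star[:-1] - p_star[1:]

# Example: a five-category Likert item with four ordered boundaries
print(grm_category_probs(theta=0.0, a=1.2, b=[-1.5, -0.5, 0.5, 1.5]))
```

The returned probabilities sum to 1 across categories, which makes the function easy to sanity-check.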

 

How do I implement the GRM on my assessment?

You need item response theory software.  Start by downloading Xcalibre for free.  Below are outputs for two example items.

How to interpret this?  The GRM uses category response functions, which show the probability of selecting a given response as a function of theta (trait or ability).  For item 6, we see that someone with theta from -3.0 to -0.5 is very likely to select "2" on the Likert scale (or whatever our response is).  Examinees above -0.5 are likely to select "3" on the scale.  But on Item 10, the green curve is low and not likely to be chosen at all; examinees from -2.0 to +2.0 are likely to select "3" on the Likert scale, and those above +2.0 are likely to select "4".  Item 6 is relatively difficult, in a sense, because no one chose "4."

[Figure: Xcalibre category response function plots for two example items under the graded response model, one easy and one difficult]

References

Keller, L. A. (2014). Item response theory models for polytomous response data. Wiley StatsRef: Statistics Reference Online.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 17(4), 2. doi:10.1002/j.2333-8504.1968.tb00153.x.

Samejima, F. (1997). Graded response model. In W. J. van der Linden and R. K. Hambleton (Eds), Handbook of Modern Item Response Theory, (pp. 85–100). Springer-Verlag.

What ChatGPT brings us for item writing

Even those who do not consider themselves technology buffs have heard about ChatGPT. Today, everybody is talking about ChatGPT—a modern megastar of the artificial intelligence (AI) and machine learning (ML) industries. This is how ChatGPT introduces itself:

“I am an AI language model called ChatGPT. My purpose is to assist and communicate with users through text-based conversations, and to provide helpful and informative responses to their questions and inquiries.”

I grasped its purpose straight away but still decided to interact with ChatGPT more closely while working on item writing for a Math test. I opened the link and initiated a new conversation—stupid simple. Below I share my observations of ChatGPT's features, classifying them as strengths and imperfections.  I will then discuss my experience of item writing with ChatGPT.

 

Strengths of ChatGPT


ChatGPT is an intelligent chatbot that

  • is polite,
  • declines inadequate requests,
  • generates responses very quickly,
  • allows users to provide follow-up adjustments,
  • remembers what the user said earlier in the conversation,
  • can significantly reduce item-writing time for subject matter experts.

 

Imperfections of ChatGPT

General ones

  • The free version of ChatGPT restricts the length of its responses (to around 2,000 characters). Therefore, if you need to construct a whole test, as I did, you have to task the chatbot in portions or keep asking it to finish the assignment you have given. Your interaction with ChatGPT will thus be more efficient if you plan your assignment well.
  • ChatGPT has limited knowledge of the world after 2021. You need to consider this fact when designing a prompt.
  • ChatGPT can generate inaccurate responses to questions. You may find several examples below.

 

Math Item Writing

ChatGPT's Math skills are not excellent. Below are some telling examples:

  • ChatGPT "thinks" that 11/2 equals 2 and that 6/5 equals 1;
  • ChatGPT lists both roots of a quadratic equation with two different roots as separate answer options, not realizing that one root alone cannot be the solution;
  • ChatGPT makes mistakes when naming the correct response out of the options it generates;
  • ChatGPT makes up distractors (incorrect answers in multiple-choice questions) without considering the mistakes that test takers are actually likely to make;
  • ChatGPT sometimes provides a correct solution to a question but does not include it among the answer options;
  • ChatGPT cannot differentiate between levels of difficulty even when given a sample in the prompt.

 

Let’s have a look at a couple of Math questions written by ChatGPT according to my prompts.

Example 1

Prompt: ChatGPT, please, write 1 multiple-choice Math question of low difficulty for 16-17 year old high school American students for the following topic in Algebra: Linear equations in two variables. Multiple-choice questions must have only one correct answer (key) and three incorrect answers (distractors) based on test takers' incorrect reasoning. Please, provide an explanation to each distractor and a solution to the question.

Response:

Which point satisfies the equation y = -2x + 7?

A) (5, 1)

B) (-1, 5)

C) (2, 5)

D) (1, -5)

Explanation: To find a point that satisfies the equation, we can plug in values for x and y that make the equation true. For example, if we let x = 1, then y = -2(1) + 7 = 5. So, the point (1, 5) satisfies the equation.

Comment: ChatGPT did not include the correct response among the answer options even though it solved the question correctly.

Example 2

Prompt: ChatGPT, please, write 1 student-produced response Math question of medium difficulty for 16-17 year old high school American students for the following topic in Algebra: Systems of two linear equations in two variables. Student-produced response questions must have only one correct answer as a real number. Please, provide a solution to this question.

Response:

Solve the system of equations:

2x + 3y = 8

x – y = 1

Explanation: To solve the system of equations, we can use either the substitution or elimination method. Using substitution, we can solve one equation for one variable and substitute it into the other equation. For example, we can solve the second equation for x and get x = y + 1. Then, we substitute x = y + 1 into the first equation and get 2(y + 1) + 3y = 8. Simplifying this equation gives us 5y + 2 = 8, which gives us y = 1. Substituting y = 1 into x = y + 1 gives us x = 2. Therefore, the solution to the system of equations is (2, 1).

Comment: Solving 5y + 2 = 8 gives y = 6/5 = 1.2, not y = 1. Therefore, the point (2, 1) is not a correct solution of this system of equations.
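Incidentally, arithmetic slips like this are easy to catch with a symbolic solver. Here is a minimal sketch using Python's sympy library (assuming it is installed) to verify the system from Example 2:

```python
from sympy import symbols, solve

x, y = symbols("x y")
# The system ChatGPT was asked to solve: 2x + 3y = 8 and x - y = 1
solution = solve([2*x + 3*y - 8, x - y - 1], [x, y])
print(solution)  # {x: 11/5, y: 6/5} -- not (2, 1) as ChatGPT claimed
```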

 

English Item Writing

ChatGPT's English skills are not as perfect as I thought they would be, given that it is (I suppose) a native English speaker. I have had some experience asking ChatGPT to design several tasks to test students' English skills. As expected, ChatGPT is more productive with chunks of text. I asked it to write some Reading and Writing questions for the SAT and IELTS, and the results were much better than those for Math. However, there were some shortcomings in terms of grammar errors and poor lexical choices. For instance, ChatGPT suggested option A as the correct response, which is arguable:

Which of the following options best replaces the underlined phrase in the sentence: “The company’s profits have increased by 20% over the last quarter.”

A) The company’s profits have risen by 20% over the last quarter.

B) The company’s profits have grown by 20% over the last quarter.

C) The company’s profits have expanded by 20% over the last quarter.

D) The company’s profits have developed by 20% over the last quarter.

 

Russian Item Writing

I was curious how proficient ChatGPT is in Russian. Well, it knows how many letters the Russian alphabet contains and some other facts, but when I asked ChatGPT to write a question checking grammar, my expectations were betrayed: ChatGPT gave a wrong answer when asked to identify a gerund (деепричастие).

Какое из перечисленных слов является деепричастием? [Which of the listed words is a gerund?]

A) Зеленеющий ["turning green"]

B) Шумно ["noisily"]

C) Летящий ["flying"]

D) Бежавший ["having run"]

Правильный ответ: C) Летящий – это деепричастие настоящего времени. Ошибочные ответы: A) – прилагательное, B) – наречие, D) – причастие прошедшего времени. [Correct answer: C) Летящий is a present-tense gerund. Incorrect answers: A) is an adjective, B) is an adverb, D) is a past participle.]

 

Summary

 

Since AI-enabled tools do not have real brains and only regurgitate information based on what they "learned" by interacting with billions of data points, it is fair to expect some inaccuracies in their responses, even though the responses look human-like.

In its own introduction, ChatGPT announced straight away that it was intended to ASSIST us, humans, not to DO things instead of us. ChatGPT can provide helpful responses but is not capable of independent thought or emotions. Like any AI, ChatGPT is vulnerable to multiple issues, including bias and potential inaccuracies.

I would like to re-emphasize the point I outlined in the previous post: any AI-powered tool, no matter how "good" it is, needs a smart user to operate it. Otherwise, it is going to be a complete disaster! However, if users provide ChatGPT with a very well-structured question and the right guidance, there is a high chance of obtaining an accurate response.

One more thing to consider is that Math is a way of thinking. Therefore, I would not expect AI-powered chatbots to be super-efficient in this field, at least in the near future.

Another important consideration is the legal situation. Current opinion in the field is this: copyright laws are designed to protect the creative development of new things by humans, but if ChatGPT is doing the heavy lifting, then the items are not human-created and therefore not fully protected by copyright law. If the items are stolen, you might not have legal recourse. However, this topic is, of course, quite new and continues to evolve.

In the end, is it worth using ChatGPT for item writing? Absolutely. Even though there are errors, such as two correct answers to a question or no key specified, these are quickly found and easily fixed. The average time to develop new items can be significantly reduced. Moreover, ChatGPT will continue to get even more powerful! So, stay tuned!

 

AI in Education

Artificial intelligence (AI) is poised to address some of the challenges that education faces today through innovation of teaching and learning processes. By applying AI in education technologies, educators can determine student needs more precisely, keep students more engaged, improve learning, and adapt teaching accordingly to boost learning outcomes. The use of AI in education started with the search for a substitute for one-on-one tutoring in the 1970s and has seen multiple improvements since then. This article will look at some of the latest AI developments used in education, their potential impact, and their drawbacks.

 

Application of AI


AI technologies have recently permeated all aspects of the educational process. Research conducted since 2009 shows that AI has been extensively employed in the management, instruction, and learning sectors. In management, AI tools are used to review and grade student assignments, sometimes even more accurately than educators do. There are AI-based interactive tools that teachers apply to build and share student knowledge. Learning can be enhanced through customization and personalization of content enabled by new technological systems that leverage machine learning (ML) and adaptability.

Below is a list of major educational areas where AI technologies are actively involved and worth developing further.

Personalized learning: This educational approach tailors the learning trajectory to individual student needs and interests. AI algorithms analyze student information (e.g., learning style and performance) to create customized learning paths. Based on student weaknesses and strengths, AI recommends exercises and learning materials.

Adaptive learning: This approach does the same as personalized learning, but in real time, stimulating learners to stay engaged and motivated. ALEKS is a good example of an adaptive learning program.

Learning courses: These are AI-powered online platforms designed for eLearning and course management that enable learners to browse for specific courses and study at their own pace. These platforms offer learning activities in increasing order of difficulty, aiming at ultimate educational goals. Examples include advanced Learning Management Systems (LMS) and Massive Open Online Courses (MOOCs).

Learning assistants/teaching robots: AI-based assistants can supply support and resources to learners upon request. They can respond to questions, provide personalized feedback, and guide students through learning content. Such virtual assistants might be especially helpful for learners who cannot access offline support.

Adaptive testing: In this mode of test delivery, each examinee responds to specific questions that correspond to their level of expertise, based on their previous responses. This is possible due to AI algorithms enabled by ML and psychometric methods, i.e., item response theory (IRT). You can get more information about adaptive testing from Nathan Thompson's blog post.

Remote proctoring: This is software that allows examiners to coordinate an assessment process remotely while keeping confidentiality and preventing examinees from cheating. In addition, a virtual proctor can assist examinees in resolving any issues arising during the process. The functionality of proctoring software can differ substantially depending on the stakes of the exam and the preferences of stakeholders. You can read more on this topic on ASC's blog.

Test assembly: Automated test assembly (ATA) is a widely used, valid, and efficient method of test construction based on either classical test theory (CTT) or item response theory (IRT). ATA lets you assemble test forms that are equivalent in terms of content distribution and psychometric statistics in seconds. ASC has designed TestAssembler to minimize the laborious and time-consuming process of form building.

Automated grading: Grading student assignments is one of the biggest challenges that educators face. AI-powered grading systems automate this routine work, reducing bias and inconsistencies in assessment results and increasing validity. ASC has developed an AI essay scoring system, SmartMarq. If you are interested in automated essay scoring, you should definitely read this post.

Item generation: Teachers are often asked to write batches of items for assessment purposes, on top of lesson planning and other drudgery. Automated item generation is very helpful for saving time and producing quality items.

Search engine: The time of libraries has sunk into oblivion, so now we mostly deal with huge search engines that have been built to carry out web searches. AI-powered search engines help us find an abundance of information; search results heavily depend on how we formulate our queries, choose keywords, and navigate between different sites. One of the biggest search engines so far is Google.

Chatbot: Last but not least, chatbots are software applications that employ AI and natural language processing (NLP) to hold humanized conversations with people. AI-powered chatbots can provide learners with additional personalized support and resources. ChatGPT can truly be considered the brightest example of a chatbot today.

 

Highlights of AI and challenges to address


Today, AI-powered functions such as speech recognition, NLP, and emotion detection are revolutionizing education. AI technologies enable identifying patterns, building algorithms, presenting knowledge, sensing, making and following plans, maintaining true-to-life interactions with people, managing complex learning activities, magnifying human abilities in learning contexts, and supporting learners in accordance with their individual interests and needs. AI also allows students to use handwriting, gestures, or speech as input while studying or taking a test.

Along with numerous opportunities, the evolution of AI brings some risks and challenges that should be thoroughly investigated and addressed. When approaching the utilization of AI in education, it is important to exercise caution and consideration to make sure that it is done in a responsible and ethical way, and not to get caught up in the hype, since some AI tools consult billions of data points available to everyone on the web. Another challenge associated with AI is the variability in its performance: some functions are performed at a superior level (such as identifying patterns in data), while others are quite primitive (such as the inability to support an in-depth conversation). Even though AI is very powerful, human beings still play a crucial role in verifying AI's output to avoid plagiarism and the falsification of information.

 

Conclusion

AI is already massively applied in education around the world. With the right guidance and frameworks in place, AI-powered technologies can help build more efficient and equitable learning experiences. Today we have an opportunity to witness how AI- and ML-based approaches contribute to the development of individualized, personalized, and adaptive learning.

ASC’s CEO, Dr Thompson, presented several topics on AI at the 2023 ATP Conference in Dallas, TX. If you are interested in utilizing AI-powered services provided by ASC, please do not hesitate to contact us!

 


 

Gamification in Learning & Assessment

Gamification in assessment and psychometrics presents new opportunities to improve the quality of exams. While many adults view games with caution because of their potential to cause addiction and harm young minds, games can be extremely beneficial for learning and assessment if employed thoughtfully. Gamification not only provides learners with multiple opportunities to learn in context, but is also instrumental in developing the digital literacy skills that are highly necessary in modern times.

What is Gamification?

Gamification means that elements of games (such as point-scoring, team collaboration, competition, and prizes) are incorporated into processes that would not otherwise have them. For example, software for managing a sales team might award points for the number of phone calls and emails, split the team into two "teams" to compete against each other on those points, and award a prize at the end of the month. Such ideas can also be incorporated into learning and assessment. A student might get points for each module they complete correctly, and a badge for each test they pass to show mastery of a skill, which are then displayed on their profile in the learning system.

Gamification equals motivation?

It is a fact that learning is much more effective when learners are motivated. What can motivate learners, you might ask? Engagement comes first—that is the core of learning. Engaged learners grasp knowledge because they are interested in the learning process and the material itself, and they are curious about discovering more. In contrast, unengaged learners simply wait for the lesson to end.

A traditional educational process usually involves several lessons in which students learn one unit, and at the end of this unit, they take a cumulative test that gauges their level of acquisition. This model usually provides a minimum of context for learning throughout the unit, so learners are expected simply to learn and memorize material until they are given a chance to succeed or fail on the test.

Gamification can change this approach. When lessons and tests are gamified, learners obtain an opportunity to learn in context and use their associations and imagination—they become participants in the process, not just executors of instructions.

Gamification: challenges and ways to overcome them

While gamified learning and assessment are very effective, they can be challenging for educators to develop and implement. Below are some of the challenges and how they can be tackled.

  • More work: Interactive lessons containing gamified elements demand more time and effort from educators, which is why many of them, overwhelmed with other obligations, give up and stick with a traditional style of teaching. However, if the whole team sets up the planning and preparation prior to starting a new unit, there will be less work and less stress.
  • Preparation: Gamified learning and assessment can be difficult for educators who lack creativity or experience. Senior managers, like heads of departments, should take a leading position here: organize courses and support their staff.
  • Distraction: When developing gamified learning or assessment, it is important not to get distracted by fancy features and to keep focused on the targeted learning objectives.
  • Individual needs: Gamified learning and assessment cannot be one-size-fits-all, so educators will have to customize their materials to meet learner needs.

Gamified assessment

Psychometric tests have been evolving over time to provide more benefits to educators and learners, employers and candidates, and other stakeholders. Gamification is the next stage in this evolutionary process, having gained positive feedback from scientists and practitioners.

Gamified assessment is applied by human resources departments in the hiring process, like psychometric tests evaluating a candidate's knowledge and skills. However, game-based assessment is quicker and more engaging than aptitude tests due to its user-friendly and interactive format. The latter features are also true of computerized adaptive testing (CAT), and I believe the two can complement each other to double the benefits provided.

There are several ways to incorporate gamification into assessment. Here are some ideas, but this is by no means exhaustive.

  • High-fidelity items and/or assignments: Instead of multiple-choice items asking about a task (e.g., operating a construction crane), create a simulation that is similar to a game.
  • Badging: Candidates win badges for passing exams, which can be displayed in places like their LinkedIn profile or email signature.
  • Points: Obviously, most tests have "points" as part of the exam score, but points can be used in other ways, such as counting how many modules/quizzes you pass per month.
  • Teams: Subdivide a class or other group into teams, and have them compete on the other aspects.

Reflecting on my personal experience, I remember how I used the kahoot.it tool in my Math classes to interact with students and make them more engaged in formative assessment activities. Students were highly motivated to take such tests because they were rewarding—it felt like a competition, and sometimes they got sweets. It was fun!

Summary

Obviously, gamified learning and assessment require more time and effort from creators than traditional non-gamified ones, but they are worth it. Both educators and learners are likely to benefit from this experience in different ways. If you are ready to apply gamified assessment by employing CAT technologies, our experts are ready to help. Contact us!

 

Meta-analysis and Test Validation in Psychological Measurement

Meta-analysis is a research process of collating data from multiple independent but similar scientific studies in order to identify common trends and findings by means of statistical methods. To put it simply, it is a method where you accumulate all of your research findings and analyze them statistically. It is often used in psychometrics and industrial-organizational psychology to help validate assessments. Meta-analysis not only serves as a summary of a research question but also provides a quantitative evaluation of the relationship between two variables or the effectiveness of an experiment. It can also be used to examine competing theoretical assumptions.

Background of Meta-Analysis

The American statistician and researcher Gene Glass devised the term 'meta-analysis' in 1976, using it to refer to the statistical analysis of a large amount of data from individual studies in order to integrate the findings. Medical researchers began employing meta-analysis a few years later. One of the first influential applications of this method was when Elwood and Cochrane used meta-analysis to examine the effect of aspirin on reducing recurrences of heart attacks.


Purpose of Meta-Analysis

In general, meta-analysis is aimed at two things:

  • to establish whether an intervention has an effect, and to determine whether that effect is positive or negative,
  • to analyze the results of previously conducted studies to find common trends.

Performing Meta-Analysis

Even though there are various ways of conducting a meta-analysis depending on the research purpose and field, there are eight major steps (a minimal computational sketch follows the list):

  1. Set a research question and propose a hypothesis
  2. Conduct a systematic review of the relevant studies
  3. Extract data from the studies to include in the meta-analysis, considering sample sizes and data variability measures for intervention and control groups (the control group is under observation while the intervention group is under experiment)
  4. Calculate summary measures, called effect sizes (the difference in average values between intervention and control groups), and standardize estimates if necessary for making comparisons between the groups
  5. Choose a meta-analytical method: quantitative (traditional univariate meta-analysis, meta-regression, meta-analytic structural equation modeling) or qualitative
  6. Pick the software depending on the complexity of the methods used and the dataset (e.g., templates for Microsoft Excel, Stata, SPSS, SAS, R, Comprehensive Meta-Analysis, RevMan), and code the effect sizes
  7. Do the analyses by employing an appropriate model for comparing effect sizes, using fixed effects (assumes that all observations share a common mean effect size) or random effects (assumes heterogeneity and allows for variation of the true effect sizes across observations)
  8. Synthesize results and report them
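As a minimal computational sketch of steps 4 and 7, here is inverse-variance weighted fixed-effect pooling in Python; the function name and the study values are hypothetical:

```python
import numpy as np

def fixed_effect_meta(effect_sizes, variances):
    """Pool study effect sizes with inverse-variance weights (fixed-effect model)."""
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)  # more precise studies weigh more
    pooled = np.sum(weights * effect_sizes) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))                 # standard error of the pooled effect
    return pooled, se

# Hypothetical effect sizes and sampling variances from five studies
pooled, se = fixed_effect_meta([0.42, 0.55, 0.30, 0.48, 0.51],
                               [0.020, 0.015, 0.030, 0.025, 0.018])
print(f"Pooled effect: {pooled:.3f} (SE = {se:.3f})")
```

A random-effects model would additionally estimate between-study variance before weighting, which is why step 7 asks you to choose between the two assumptions.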

Prior to making any conclusions and reporting results, it would be helpful to use the checklist suggested by DeSimone et al. (2021) to ensure that all crucial aspects of the meta-analysis have been addressed in your study.

Meta-Analysis in Assessment and Psychometrics: Test Validation and Validity Generalization

Due to its versatility, meta-analysis is used in various fields of research, in particular as a test validation strategy in psychology and psychometrics. The most common situation for applying meta-analysis is validating the use of tests in the workplace, in the fields of personnel psychology and pre-employment testing. The classic example of such an application is the work by Schmidt and Hunter (1998), who analyzed 85 years of research on what best predicts job performance. It is one of the most important articles on that topic, and it has recently been updated by Sackett et al. (2021) with slightly different results.

How is meta-analysis applied to such a situation?  Well, start by reconceptualizing a "sample" as a set of studies, not a set of people. So let's say we find 100 studies that use pre-employment tests to select examinees by predicting job performance (obviously, there are far more). Because most studies use more than one test, there might be 77 that use a general cognitive ability test, 63 that use a conscientiousness assessment, 24 that use a situational judgment test, etc. We look at the correlation coefficients reported for those first 77 studies and find that the average is 0.51, while the average correlation for conscientiousness is 0.44 and for SJTs is 0.39. You can see how this is extremely useful in a practical sense for a practitioner tasked with selecting an assessment battery!

Meta-analysis studies will often go further and clean up the results, by tossing studies with poor methodology or skewed samples, and applying corrections for things like range restriction and unreliability. This enhances the validity of the overall results. To see such an example, visit the Sackett et al. (2021) article.

Such research has led to the concept of validity generalization. This suggests that if a test has been validated for many uses, or similar uses, you can consider it validated for your particular use without having to do a validation study. For example, if you are selecting clerical workers and you can see that there are literally hundreds of studies which show that numeracy or quantitative tests will predict job performance, there is no need for you to do ANOTHER study. If challenged, you can just point to the hundreds of studies already done. Obviously, this is a reasonable argument, but you should not take it too far, i.e., generalize too much.

Conclusion

As you may have gathered by now, conducting a meta-analysis is not a piece of cake. However, it is very efficient when the researcher intends to evaluate effects in diverse participants, set a new hypothesis that creates a precedent for future research studies, demonstrate statistical significance, or surmount the issue of a small sample size in research.

References

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2021). Introduction to meta-analysis. John Wiley & Sons.

DeSimone, J. A., Brannick, M. T., O'Boyle, E. H., & Ryu, J. W. (2021). Recommendations for reviewing meta-analyses in organizational research. Organizational Research Methods, 24(4), 694-717.

Field, A. P., & Gillett, R. (2010). How to do a meta‐analysis. British Journal of Mathematical and Statistical Psychology, 63(3), 665-694.

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3-8.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Sage Publications.

Gurevitch, J., Koricheva, J., Nakagawa, S., & Stewart, G. (2018). Meta-analysis and the science of research synthesis. Nature, 555(7695), 175-182.

Hansen, C., Steinmetz, H., & Block, J. (2022). How to conduct a meta-analysis in eight steps: a practical guide. Management Review Quarterly, 72(1), 1-19.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Academic Press.

Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Sage Publications.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Sage.

Peto, R., & Parish, S. (1980). Aspirin after myocardial infarction. Lancet, 1(8179), 1172-1173.

Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2021). Revisiting meta-analytic estimates of validity in personnel selection: addressing systematic overcorrection for restriction of range. Journal of Applied Psychology.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262.

 

Test validation

Test validation is the process of verifying, based on solid evidence, whether the specific requirements of the test development stages are fulfilled. In particular, test validation is an ongoing process of developing an argument that a specific test, its score interpretation, or its use is valid. The interpretation and use of testing data should be validated in terms of the content, substantive, structural, external, generalizability, and consequential aspects of construct validity (Messick, 1994). Validity is the status of an argument: positive evidence supports the validity argument and negative evidence weakens it. Validity cannot be absolute and can be judged only in degrees. The American Educational Research Association [AERA], American Psychological Association [APA], and National Council on Measurement in Education [NCME] (1999) claim that validity is crucial for educational and psychological test development and evaluation.

Validation as part of test development

To be effective, test development has to be structured, systematic, and detail-oriented. These features help guarantee sufficient validity evidence to support the inferences drawn from test scores. Downing (2006) suggested a twelve-step framework for effective test development:

  1. Overall plan
  2. Content definition
  3. Test blueprint
  4. Item development
  5. Test design and assembly
  6. Test production
  7. Test administration
  8. Scoring test responses
  9. Standard setting
  10. Reporting test results
  11. Item bank management
  12. Technical report

Even though this framework is outlined as a sequential timeline, in practice some of these steps may occur simultaneously or be ordered differently. The starting point of test development – the purpose – defines the planned test and regulates almost all validity-related activities. Each step of the test development process focuses on its crucial aspect: validation.

Hypothetically, excellent performance of all steps can ensure a test's validity, i.e., the produced test would estimate examinee ability fairly within the content area measured by the test. However, the human factor involved in test production might play a negative role, so there is an essential need for test validation.

Reasons for test validation

There are myriad possible reasons that can lead to the invalidation of a test score interpretation or use. Let us consider some obvious issues, organized by test development step, that potentially jeopardize validity and are subject to validation:

  • overall plan: wrong choice of a psychometric model;
  • content definition: content domain is ill defined;
  • test blueprint: test blueprint does not specify an exact sampling plan for the content domain;
  • item development: items measure content at an inappropriate cognitive level;
  • test design and assembly: unequal booklets;
  • test administration: cheating;
  • scoring test responses: inconsistent scoring among examiners;
  • standard setting: unsuitable method of establishing passing scores;
  • item bank management: inaccurate updating of item parameters.

Context for test validation

All tests share common types of validity evidence, e.g., reliability, comparability, equating, and item quality. However, tests vary in the number of constructs measured (single or multiple) and can have different purposes, which call for unique types of test validation evidence. In general, there are several major types of tests:

  • Admissions tests (e.g., SAT, ACT, and GRE)
  • Credentialing tests (e.g., a live-patient examination for a dentist before licensing)
  • Large-scale achievement tests (e.g., Stanford Achievement Test, Iowa Test of Basic Skills, and TerraNova)
  • Pre-employment tests
  • Medical or psychological
  • Language

The main idea is that the type of test usually defines a unique validation agenda that focuses on the appropriate types of validity evidence and the issues most often challenged for that type of test.

Categorization of test validation studies

Since there are multiple precedents for test score invalidation, there are many categories of test validation studies that can be applied to validate test results. In this post, we will look at the categorization suggested by Haladyna (2011):

Category 1: Test Validation Studies Specific to a Testing Program

  1. Studies that provide validity evidence in support of the claim for a test score interpretation or use:
  • Content analysis
  • Item analysis
  • Standard setting
  • Equating
  • Reliability

  2. Studies that threaten a test score interpretation or use:
  • Cheating
  • Scoring errors
  • Student motivation
  • Unethical test preparation
  • Inappropriate test administration

  3. Studies that address other problems that threaten test score interpretation or use:
  • Drop in reliability
  • Drift in item parameters over time
  • Redesign of a published test
  • Possible security problems

Category 2: Test Validation Studies That Apply to More Than One Testing Program

  Studies that lead to the establishment of concepts, principles, or procedures that guide, inform, or improve test development or scoring:
  • Introducing a concept
  • Introducing a principle
  • Introducing a procedure
  • Studying a pervasive problem

Summary

Even though test development is a long and laborious process, test creators have to be extremely accurate while executing their obligations within each activity. The crowning achievement of this process is obtaining valid and reliable test scores, and their adequate interpretation and use. The higher the stakes or consequences of the test scores, the greater the attention that should be paid to test validity, and, therefore, to test validation. The latter is strengthened by integrating all reliable sources of evidence to support the argument for test score interpretation and use.

References

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (1999). Standards for educational and psychological testing. American Educational Research Association.

Downing, S. M. (2006). Twelve steps for effective test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 3-25). Lawrence Erlbaum Associates.

Haladyna, T. M. (2011). Roles and importance of validity studies in test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 739-755). Lawrence Erlbaum Associates.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.

 

Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is an approach to estimating parameters for a model.  It is one of the core aspects of Item Response Theory (IRT), especially to estimate item parameters (analyze questions) and estimate person parameters (scoring).  This article will provide an introduction to the concepts of MLE.

Content

  1. History behind Maximum Likelihood Estimation
  2. Defining Maximum Likelihood Estimation
  3. Comparison of likelihood and probability
  4. Calculating Maximum Likelihood Estimation
  5. Key characteristics of Maximum Likelihood Estimation
  6. Weaknesses of Maximum Likelihood Estimation
  7. Application of Maximum Likelihood Estimation
  8. Summarizing remarks about Maximum Likelihood Estimation
  9. References

History behind Maximum Likelihood Estimation

Even though early ideas about MLE appeared in the mid-1700s, Sir Ronald Aylmer Fisher developed them into a more formalized concept much later. Fisher worked seminally on maximum likelihood from 1912 to 1922, critiquing his own arguments and producing several justifications. In 1925, he finally published "Statistical Methods for Research Workers", one of the 20th century's most influential books on statistical methods. In general, the development of the maximum likelihood concept was a breakthrough in statistics.

Defining Maximum Likelihood Estimation

Wikipedia defines MLE as follows:

In statistics, Maximum Likelihood Estimation is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate.

Merriam Webster has a slightly different definition for MLE:

A statistical method for estimating population parameters (as the mean and variance) from sample data that selects as estimates those parameter values maximizing the probability of obtaining the observed data.

To sum up, MLE is a method that finds the parameter values of a model. These parameter values are identified such that they maximize the likelihood that the process described by the model produced the data that were actually observed. To put it simply, MLE answers the question:

For which parameter value does the observed data have the highest probability?

Comparison of likelihood and probability

The definitions above mention "probability", but it is important not to mix up these two different concepts. Let us look at some differences between likelihood and probability so that you can differentiate between them.

Likelihood

  • Refers to events that have already occurred, with known outcomes
  • Likelihoods do not need to add up to 1
  • Example 1: I flipped a coin 20 times and obtained 20 heads. What is the likelihood that the coin is fair?
  • Example 2: Given the fixed outcomes (data), what is the likelihood of different parameter values?

Probability

  • Refers to events that will occur in the future
  • Probabilities add up to 1
  • Example 1: I flipped a coin 20 times. What is the probability of the coin landing heads or tails every time?
  • Example 2: The fixed parameter P = 0.5 is given. What is the probability of different outcomes?

 

Calculating Maximum Likelihood Estimation

MLE works by taking the derivative of the log-likelihood with respect to each parameter, e.g., the mean μ and the variance σ², and setting it equal to 0. There are four general steps in estimating the parameters (a minimal worked sketch follows the list):

  • Choose a distribution for the observed data
  • Estimate the distribution's parameters using the log-likelihood
  • Plug the estimated parameters into the distribution's probability function
  • Evaluate the distribution of the observed data
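Here is a minimal Python sketch of these steps for normally distributed data, minimizing the negative log-likelihood numerically with scipy and comparing against the closed-form estimates; the simulated data and starting values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # step 1: assume a normal distribution

def neg_log_likelihood(params, x):
    """Negative log-likelihood of a normal distribution (minimizing it gives the MLE)."""
    mu, sigma = params
    n = x.size
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum((x - mu) ** 2) / (2 * sigma**2)

# Step 2: estimate the parameters by maximizing the log-likelihood
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], args=(data,),
                  bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = result.x

# Steps 3-4: compare against the closed-form normal MLEs (sample mean and SD)
print(mu_hat, sigma_hat)
print(data.mean(), data.std())  # should agree closely
```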

 

Key characteristics of Maximum Likelihood Estimation

  • MLE is most straightforward with one-dimensional data
  • MLE assumes “clean” data (e.g., no outliers)
  • MLE is usually computationally manageable
  • MLE often runs in real time on modern computers
  • MLE works well for simple cases (e.g., the binomial distribution)

 

Weaknesses of Maximum Likelihood Estimation

  • MLE is sensitive to outliers
  • MLE often demands careful optimization for speed and memory to produce useful results
  • MLE is sometimes poor at differentiating between models with similar distributions
  • MLE can be technically challenging, especially for multidimensional data and complex models

 

Application of Maximum Likelihood Estimation

In order to apply MLE, two important assumptions (typically referred to as the i.i.d. assumption) need to be made:

  • Data must be independently distributed, i.e. the observation of any given data point does not depend on the observation of any other data point (each data point is an independent experiment)
  • Data must be identically distributed, i.e. each data point is generated from the same distribution family with the same parameters
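
Together, these assumptions do the mathematical heavy lifting: for i.i.d. data, the joint likelihood factorizes into a product over the individual observations, so the log-likelihood becomes a simple sum that is easy to differentiate and maximize:

L(θ) = ∏ᵢ f(xᵢ; θ),  and therefore  log L(θ) = Σᵢ log f(xᵢ; θ)

This factorization is why the numerical sketch above could minimize a plain sum of log-densities over the data points.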

Let us consider several well-known applications of MLE:

  • Global Positioning System (GPS)
  • Smart keyboard programs for iOS and Android operating systems (e.g. Swype)
  • Speech recognition programs (e.g. Carnegie Mellon open source SPHINX speech recognizer, Dragon Naturally Speaking)
  • Detection and measurement of the properties of the Higgs Boson at the European Organization for Nuclear Research (CERN) by means of the Large Hadron Collider (Francois Englert and Peter Higgs were awarded the Nobel Prize in Physics in 2013 for the theory of Higgs Boson)

Generally speaking, MLE is employed in agriculture, economics, finance, physics, medicine and many other fields.

 

Summarizing remarks about Maximum Likelihood Estimation

Despite some practical issues with MLE, such as technical challenges with multidimensional data and complex multiparameter models, which complicate many real-world problems, MLE remains a powerful and widely used statistical approach for classification and parameter estimation. It has brought many successes in both the scientific and commercial worlds.

 

References

Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statistical Science, 12(3), 162–176.

Stigler, S. M. (2007). The epic story of maximum likelihood. Statistical Science, 598–620.

Multidimensional Item Response Theory

Multidimensional item response theory (MIRT) has developed from its factor-analytic and unidimensional item response theory (IRT) roots. This development has brought an increased emphasis on precise modeling of the item-examinee interaction and a decreased emphasis on data reduction and simplification. MIRT represents a broad family of probabilistic models designed to portray an examinee’s likelihood of a correct response based on item parameters and multiple latent traits/dimensions. MIRT models define a multidimensional space that describes individual differences on the targeted dimensions.

Within the MIRT framework, items are treated as the fundamental units of test construction. Furthermore, items are considered multidimensional trials for obtaining valid and reliable information about an examinee’s location in a complex space. This philosophy extends the work from unidimensional IRT to provide a more comprehensive description of item parameters and of how the information from items combines to depict examinees’ characteristics. Items therefore need to be crafted mindfully so that they are sufficiently sensitive to the targeted combinations of knowledge and skills, and then carefully selected to improve estimates of an examinee’s location in the multidimensional space.

Trigger for development of Multidimensional Item Response Theory

In modern psychometrics, IRT is employed to calibrate items belonging to individual scales, so that each dimension is regarded as unidimensional. According to IRT models, an examinee’s response to an item depends solely on the item parameters and on the examinee’s single person parameter, the latent trait θ. Unidimensional IRT models are advantageous in that they use quite simple mathematical forms, have a wide range of applications, and are somewhat robust to violations of their assumptions.

However, real interactions between examinees and items are likely far more complex than these IRT models imply. Responding to a specific item may require examinees to apply multiple abilities and skills, especially in composite areas such as the natural sciences. Thus, although unidimensional IRT models are highly useful under specific conditions, the field of psychometrics needed more sophisticated models that could reflect multifaceted examinee-item interactions. For that reason, unidimensional IRT models were extended to multidimensional models capable of expressing situations in which examinees need multiple abilities and skills to respond to test items.

Categories of Multidimensional Item Response Theory models

There are two broad categories of MIRT models: compensatory and non-compensatory (partially compensatory).

  • Under a compensatory model, examinees’ abilities work together to increase the probability of a correct response to an item, i.e. higher ability on one trait/dimension compensates for lower ability on another. For instance, suppose an examinee must read a passage about a current event and answer a question about it. The item assesses two abilities: reading comprehension and knowledge of current events. If the examinee is aware of the current event, that knowledge compensates for lower reading ability; conversely, an excellent reader’s skill compensates for a lack of knowledge about the event.
  • Under a non-compensatory (partially compensatory) model, abilities do not compensate for each other, i.e. an examinee needs a high level of ability on all traits/dimensions to have a high chance of responding correctly (see the sketch after this list). For example, suppose an examinee must solve a traditional mathematical word problem. The item assesses two abilities: reading comprehension and mathematical computation. An examinee with excellent reading ability but low mathematical computation ability will be able to read the text but not solve the problem; with the reverse abilities, the examinee will not understand what is being asked.

Within the literature, compensatory MIRT models are more commonly used.
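
To make the contrast concrete, below is a minimal sketch of the two forms for an item measuring two dimensions: a compensatory M2PL and a simple product-of-logistics non-compensatory form. All parameter values are illustrative assumptions, not taken from this post.

```python
import numpy as np

def p_compensatory(theta1, theta2, a1, a2, d):
    """Compensatory M2PL: abilities combine additively, so high ability on one
    dimension can offset low ability on the other."""
    return 1.0 / (1.0 + np.exp(-(a1 * theta1 + a2 * theta2 + d)))

def p_noncompensatory(theta1, theta2, a1, a2, b1, b2):
    """Non-compensatory form: per-dimension probabilities multiply, so low
    ability on either dimension keeps the overall probability low."""
    p1 = 1.0 / (1.0 + np.exp(-a1 * (theta1 - b1)))
    p2 = 1.0 / (1.0 + np.exp(-a2 * (theta2 - b2)))
    return p1 * p2

# An examinee strong on dimension 1 but weak on dimension 2:
print(p_compensatory(2.0, -2.0, a1=1.2, a2=1.2, d=0.0))    # ~0.50: strength offsets weakness
print(p_noncompensatory(2.0, -2.0, 1.2, 1.2, 0.0, 0.0))    # ~0.08: the weak dimension dominates
```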

Applications of Multidimensional Item Response Theory

  • Because MIRT analyses concentrate on the interaction between item parameters and examinee characteristics, they have prompted numerous studies of the skills and abilities needed to answer an item correctly, and of the dimensions to which test items are sensitive. This research area demonstrates the importance of a thorough understanding of how tests function. MIRT analyses can help verify group differences and item sensitivities that contribute to test and item bias, and can help identify the reasons behind differential item functioning (DIF) statistics.
  • MIRT allows the linking of calibrations, i.e. placing item parameter estimates from multiple calibrations into the same multidimensional coordinate system. This enables examinee performance on different sets of items to be reported as profiles on multiple dimensions located on the same scales. MIRT thus makes it possible to create large pools of calibrated items that can be used to construct multidimensionally parallel test forms and to run computerized adaptive testing (CAT).

Conclusion

Given the complexity of the constructs in education and psychology and the level of detail provided in test specifications, MIRT is particularly relevant for investigating how individuals approach their learning and how that learning is influenced by various factors. MIRT analysis is still at an early stage of its development and hence is a very active area of current research, particularly in CAT technologies. Interested readers are referred to Reckase (2009) for more detailed information about MIRT.

References

Reckase, M. D. (2009). Multidimensional Item Response Theory. Springer.

The IRT Item Pseudo-guessing Parameter

The item pseudo-guessing parameter is one of the three item parameters estimated under item response theory (IRT): discrimination a, difficulty b, and pseudo-guessing c. Unlike a and b, the pseudo-guessing parameter c is utilized only in the 3PL model. It represents a lower asymptote for the probability of an examinee responding correctly to an item.

Background of IRT item pseudo-guessing parameter 

If you look at the post on the IRT 2PL model, you will see that the probability of a correct response depends on the examinee’s ability level θ, the item discrimination parameter a, and the item difficulty parameter b. However, one of the realities of testing is that examinees will get some multiple-choice items correct by guessing. Therefore, the probability of a correct response may include a small guessing component.

Neither the 1PL nor the 2PL model accounts for the guessing phenomenon, but Birnbaum (1968) altered the 2PL model to include it. Unfortunately, due to this inclusion, the logistic function from the 2PL model lost its nice mathematical properties. Nevertheless, even though it is no longer a logistic model in the technical sense, it has become known as the three-parameter logistic model (3PL or IRT 3PL). Baker (2001) gives the following equation for the IRT 3PL model:

P(θ) = c + (1 − c) / (1 + e^(−a(θ − b)))

where:

a is the item discrimination parameter

b is the item difficulty parameter

c is the item pseudo-guessing parameter

θ is the examinee ability parameter
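
As a minimal sketch, here is the same function in code (Python with NumPy assumed; the item parameters below are illustrative):

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL item response function: c is the floor set by guessing, and the
    logistic term lifts the probability from c toward 1 as theta grows."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# A four-option multiple-choice item: c near 1/4, moderate discrimination.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(p_3pl(theta, a=1.0, b=0.0, c=0.25), 3))
# Low-ability examinees stay near the floor c = 0.25; at theta = b the
# probability is (1 + c) / 2 = 0.625, halfway between c and 1 (see below).
```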

Interpretation of pseudo-guessing parameter

In general, the pseudo-guessing parameter c is the probability of getting the item correct by guessing alone. For instance, c = 0.20 means that, at all ability levels, the probability of getting the item correct by guessing alone is 0.20. This often reflects the structure of multiple-choice items: five-option items tend to have values around 0.20 and four-option items around 0.25.

It is worth noting that the value of c does not vary as a function of the trait/ability level θ, i.e. examinees with high and low ability levels have the same probability of responding correctly by guessing. Theoretically, the guessing parameter ranges between 0 and 1, but in practice values above 0.35 are considered unacceptable, hence the range 0 < c < 0.35 is applied. A value higher than 1/k, where k is the number of options, often indicates that a distractor is not performing.

How pseudo-guessing parameter affects other parameters

Due to the presence of the guessing parameter, the definition of the item difficulty parameter b is changed. Within the 1PL and 2PL models, b is the point on the ability scale at which the probability of the correct response is 0.5. Under the 3PL model, the lower limit of the item characteristic curve (ICC) or item response function (IRF) is the value of c rather than zero. According to Baker (2001), the item difficulty parameter is the point on the ability scale where:

P(θ = b) = (1 + c) / 2

Therefore, the probability is halfway between the value of c and 1. Thus, the parameter c has defined a boundary to the lowest value of the probability of the correct response, and the item difficulty parameter b determines the point on the ability scale where the probability of the correct response is halfway between this boundary and 1.

The item discrimination parameter a can still be interpreted as being proportional to the slope of the ICC/IRF at the point θ = b. However, under the 3PL model, the slope of the ICC/IRF at θ = b actually equals a(1 − c)/4. These changes in the definitions of the item parameters a and b are quite important when interpreting test analyses.
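
A quick numerical check of this slope claim, using the same kind of illustrative parameter values as the sketch above:

```python
import math

def p_3pl(theta, a, b, c):
    # 3PL item response function, as defined earlier in the post.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

a, b, c, h = 1.0, 0.0, 0.25, 1e-5
# Central-difference approximation to the slope of the ICC/IRF at theta = b:
slope = (p_3pl(b + h, a, b, c) - p_3pl(b - h, a, b, c)) / (2 * h)
print(round(slope, 4))   # 0.1875
print(a * (1 - c) / 4)   # 0.1875 -- matches a*(1-c)/4, as stated
```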

References

Baker, F. B. (2001). The basics of item response theory (2nd ed.). ERIC Clearinghouse on Assessment and Evaluation.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Addison-Wesley.
