Posts on psychometrics: The Science of Assessment

The Covid-19 pandemic caused a radical shift in the education sector, accelerating the adoption of education technology solutions across K-12, e-learning, and higher education. Trends and technologies that dominated the marketplace in 2019, including video-based learning, AI, machine learning, digital assessments, and learning analytics, are expected to take a new turn in 2021.

To give you a complete glimpse into the future of education post-pandemic, we have prepared 10 EdTech Trends and Predictions To Watch Out For In 2021. What will happen to the education technology marketplace? Which trends and technologies will disrupt education as we know it? 

Important EdTech Stats To Note:

  • 86% of educators say technology is the core of learning
  • EdTech expenditure will hit $404 billion by 2025, at a CAGR of 16% between 2019 and 2025
  • Utilization of technology for engagement is on the rise (84% of educators support the idea)
  • Global investments in education technology are on the rise, totaling $10 billion in 2020 and expected to reach $87 billion in the next decade.

Image Credit: Bryce Durbin / TechCrunch

  1. E-Learning

E-learning is the biggest EdTech trend in 2021. Its scalability and cost-effectiveness have made this model the go-to solution for students all over the world, especially during the pandemic.  Also, recent research shows that e-learning improves retention rates by 25-60%, which makes it a better deal for students. 

The industry is booming and is expected to hit the $375 billion mark by 2026. 

To capitalize on this trend, many institutions have started offering degrees online. Platforms such as Coursera have partnered with the world’s greatest universities to provide short courses, certifications, and bachelor’s degrees online at lower costs.

Other platforms offering e-learning services include Lessonly, Udemy, MasterClass, and Skillshare. This trend is not likely to slow down post-Covid, but barriers remain that must be eliminated to accelerate its adoption.

  2. Digital Assessments Become A Necessity

Assessments are a vital part of the learning process because they gauge the effectiveness of learning methods. Over the past few years, online tests have been replacing traditional pen-and-paper testing. Some of the main advantages of digital assessments include:

  • Reliability
  • Cost-effectiveness
  • Flexibility
  • Increased security and efficiency
  • Equal opportunity for students with disabilities, since online testing can support technologies such as speech-to-text
  • Immediate results, which reduce stress and anxiety among students
  • Learning analytics that help improve assessment quality

Online exams have increased the effectiveness of the testing process by adding technologies and strategies such as adaptive testing, remote proctoring, and learning analytics into the equation. This has helped students improve retention rates and helped exam developers enhance their testing strategies. Cheating, long a major concern in assessments, has been reduced by online techniques such as lockdown browsers and time windows. Digital assessments are a must-have for all institutions looking to capitalize on the advantages of online testing in 2021.

  3. Extended Reality (XR)

Virtual Reality Learning Retention (Source: FrontCore)

Virtual Reality and Augmented Reality are revolutionizing the education sector by changing the way students receive and process information. 2021 will witness increased adoption of XR technologies and solutions across education and corporate learning environments.

Extended Reality technologies and solutions are improving retention rates and learning experiences in students by creating immersive worlds where students can learn new skills and visualize concepts in an interactive way.  According to research, Virtual Reality has higher retention rates compared to other learning methods such as lecturing, demonstration, and audio-visuals. Some other ways in which XR is improving the education sector are offering cost-effective and interesting field trips, enhancing lifelong learning, and transforming hands-on learning. 

  4. Chatbots In Higher Education

Chatbots took the sales and marketing industry by storm over the past few years and are finding their way into the education sector. During the Covid-19 pandemic, many higher education institutions integrated chatbots into their systems to help in the automation of tasks. 

Some tasks that can be automated include lead generation for courses and resolution of student queries on pressing issues such as fees. This improves cost-effectiveness in institutional administration. This EdTech trend will be integrated into more institutions in 2021.

  5. Video-Based Learning At The Forefront

Video has become king in learning and entertainment. 2021 will see an exponential increase in video consumption. In relation to learning, the rapid increase in devices on the internet, and new technological innovations such as smart TVs will accelerate the adoption of video-based learning. 

There are many advantages to video-based learning, including cost-friendliness, higher retention rates, and support for self-paced learning. Its reliability will drive higher education institutions to invest in it.

Using video-based learning is not a complex procedure, especially because there are many tools and platforms online that can help with the process. The cost of producing high-quality videos and lectures has also become cheaper. 

  6. Artificial Intelligence (AI)

Artificial Intelligence (AI) has been at the forefront of pioneering innovations that drive humanity to the new ages of civilization. In relation to education, AI has played a big role in enhancing learning experiences, improving assessment practices and so much more! 

In 2021, AI will make headlines as it is integrated into important aspects of higher education such as campus security and student performance evaluation. Facial recognition, an application of AI, can be used to track student attendance, secure campus premises, and deter cheating during online testing. AI also plays a vital part in student evaluation and in improving the assessment process by powering technologies such as adaptive testing, a method that provides unique assessments to different students based on their responses. The power of AI is vast, and there is no predicting what 2021 will bring to the education industry.

  7. Big Data and Learning Analytics

Learning analytics is yet another EdTech trend poised to disrupt the education industry. It is a set of technologies and strategies used to empower educators, track learning processes, and help instructors make data-driven decisions.

Some components of learning analytics include Artificial Intelligence, visualizations, machine learning, statistics, personalized adaptive learning, and educational data mining. 

As we all know, data runs everything in the information era, and whoever has the best data analytics model wins. In education, analytics has played an important role in shaping instructional design and assessment processes for the better.

2021 will see increased adoption of analytics tools into Learning and Development.

Here are some of the ways learning analytics will shape education for the better:

  • Behavior and performance prediction: using predictive analytics, instructors can predict the performance of learners based on their present and past performances.
  • Increased retention rates: with a clear picture of what students are good at and what they struggle with (identifiable by analyzing their digital assessment results), one can shape the learning process to increase retention rates.
  • Support for personalized learning: this is the hot cake of learning analytics. Using data analytics, instructors can create personalized learning experiences for different students based on several metrics.
  • Increased cost-efficiency: education and training are expensive, and learning analytics can help improve the quality of learning outcomes while cutting unnecessary costs.
  • Help in defining learning strategy: by understanding what you need to achieve and the purpose of your goals, you can use data analytics to create a roadmap for the process.

  8. Gamification

Gamification is not a new concept in education. It has been around for ages, but it is expected to take a different turn through the integration of game mechanics in learning. Minecraft is a good example: the game has come in handy for teaching important concepts such as storytelling.

Programming and game design, skills that are important in our digital world, have received a huge boost from games such as Roblox, which attracts more than 100 million players each month.

In 2021, we are likely to see the rise of more gamification strategies in the ed-tech space.

Part 2: Predictions

  9. Accelerated EdTech Investments

Over the past decade, global EdTech venture capital has seen exponential growth, from $500 million in 2010 to more than $10 billion in 2020.

As the adoption of digital EdTech solutions rises, this number is expected to grow rapidly in the next decade, with some reports estimating that investments will reach $87 billion by 2030.

With Asia leading the adoption of EdTech, startups from other continents will be looking to up their game to capitalize on opportunities arising in the industry.

  10. EdTech Market Value Will Reach New Heights

The Covid-19 pandemic increased the adoption of education technology solutions. Expenditure in the sector is expected to reach new heights ($404 billion) by 2025.

What acted as a source of short-term resort for the education sector will become the new norm. 

Many schools and colleges have not yet started their transition to the new models of learning, but most are laying out strategies to begin the process. Integrating some of the trends discussed above may be a good way to start your journey.

Final Thoughts on 2021 EdTech Trends

The uncertainty in the aftermath of the Covid-19 pandemic makes it hard to predict the EdTech trends that will lead the way in 2021. But we believe that 2021 will be a year of many firsts and unknowns in the education sector. That said, there is immense power in understanding how the trends and predictions discussed above may shape learning and development (L&D), and how you can capitalize on the opportunities that arise from them.

If you are interested in leveraging the power of digital assessments and psychometrics practices for your education and corporate campaigns, feel free to sign up for access to our free assessment tools or consult with our experienced team for services such as test development, job analysis, implementation services and so much more!

Online corporate training assessment is an integral part of improving the performance and productivity of employees. To gauge the effectiveness of the training, online assessments are the go-to solution for many businesses.  While traditional on-premise assessments do the trick, online corporate training assessments have proved to be more cost-effective, reliable, and secure. (Image Source: Unsplash)

Also known as digital assessments or e-assessments, online assessments are the new black in recruitment and learning & development (L&D). In fact, according to a Talent Assessment Study conducted in 2016-2017, the adoption of online assessment grew by 116% in L&D and 114% in recruitment. Made up of several approaches, online assessments aim to assess technical and soft skills and individuals’ personalities. They can be adopted in industries such as education, technology, and even corporate environments!

However, despite the many benefits of online assessments, developing assessments that are effective has proven to be a problem for many businesses. 

In this blog, we explore 5 tips to create effective online corporate training assessments to make sure that your training achieves its main objectives.

But before getting into the brass tacks of online assessment, here are some benefits of using online testing in corporate training assessment:

 


 

Benefits Of Online Corporate Training Assessments

Insight On Company Strengths and Weaknesses

Online testing gives businesses and organizations insight into the positives and negatives of their training programs. For example, if an organization realizes that certain employees can’t grasp certain concepts, they may decide to modify how they are delivered or eliminate them completely. The employees can also work on their areas of weaknesses after the assessments, hence improving productivity. 

Helps in Measuring Performance

Unlike traditional testing, which makes it nearly impossible to run analytics on assessments or measure performance with high fidelity, online assessments can quantify progress toward initial goals such as call center skills. By measuring performance, businesses can create data-driven roadmaps for helping employees reach their best performance.

Advocate For Ideas and Concepts That Can Be Integrated Into The Real World

Workers learn every day, and sometimes what they learn is not used to drive the business toward its objectives. This can lead to burnout and information overload, which in turn lowers performance and work quality. By using online assessments, you can customize tests to help workers attain only the skills that align with your business goals. This can be done using methods such as adaptive testing.

Other Benefits Include: 

  • Assessments can be taken from anywhere in the world
  • Saves the company significant time and resources
  • Improved security compared to traditional assessments
  • Improved accuracy and reliability
  • Scalability and flexibility

Components Of Effective Corporate Training Assessments

Effective online testing is all about having a well-defined strategy. What is it you are trying to achieve with the corporate assessment? 

It is therefore important to have data-driven insight into key questions: what you are trying to assess, why you are assessing it, when to assess learners, which online assessment tools to use, and how to go about the assessments. These help you formulate assessments with learning goals and objectives in mind, which improves their effectiveness.

Tips To Improve Your Online Corporate Training Assessments

1. Personalized Testing

Most businesses have an array of training needs. Employees have different backgrounds and responsibilities, which makes it difficult to create effective generalized tests. To achieve the main objectives of your training, it is important to differentiate the assessments: sales assessments, technical assessments, and management assessments cannot all be the same. Even within a department, there can be diversity in skills and responsibilities.

One way to achieve personalized testing is with methods such as Computerized Adaptive Testing. Through the power of AI and machine learning, this method lets you create tests that are unique to each employee. Not only does personalized testing improve effectiveness in your workforce, but it is also cost-effective, secure, and in alignment with the best psychometric practices in the corporate world. Keep the components of effective online assessments in mind when creating personalized tests.

2. Analyzing Assessment Results

Many businesses don’t see the importance of analyzing corporate training assessment results. How do you expect to improve your training programs and assessments if you don’t derive value from the assessment results? 


Example of Assessment analysis on Iteman

 

Analyze assessment results using psychometric analytics software such as Iteman to get important insights such as successful participants, item performance issues, and many others. This provides you with a blueprint to improve your assessments and employee training programs. 

3. Integrating Online Assessments Into Company Culture

Getting the best out of online assessments is not about getting it right once, but getting it right over a long period of time. Integrating assessments into company culture is a great way to achieve this: assessments become part of your systems, and employees will always look forward to improving their skills. You can also use strategies such as gamification to make sure employees enjoy the process. It is also critical to give employees the freedom to provide feedback on the assessments and training programs.

4. Diversify Your Assessment Types

One great myth about online assessments is that they are limited in terms of questions and problems you can present to your employees. However, this is not true!

By using methods such as item banking, assessment systems can give users the ability to develop assessments using different question types. Some modern question types include:

  • Drag & drop 
  • Multiple correct 
  • Embedded audio or video
  • Cloze or fill in the blank
  • Number lines
  • Situational judgment test items
  • Counter or timer for performance tests

Diversification of question types improves comprehension in employees and helps them develop skills to approach problems from multiple angles. 

5. Choose Your Online Assessment Tools Carefully

This is among the most important considerations you should make when creating a corporate assessment strategy. This is because online assessment software and tools are the core of how your campaigns turn out. 

There are many online assessment tools available, but choosing one that meets your requirements can be a daunting task. Apart from the key considerations of budget, functionality, etc, there are many other factors to keep in mind before choosing online assessment tools. 

To help you choose an online assessment tool that will help you in your assessment journey, here are a few things to consider:

Ease-of-use

Most people are new to online assessments, and as much as some functionalities can be powerful, they may be overwhelming to candidates and the test development staff. This may make candidates underperform. It is, therefore, important to vet the platform and its functionalities to make sure that they are easy to use. 

Functionality 

Online assessments are growing in popularity, and new innovations appear every day. Does the online assessment software have the latest innovations in the industry? Do you get value for your money? Does it support modern psychometrics like item response theory? These are just a few of the questions to ask when vetting a platform for functionality.

Assessment Reporting and Visualizations

One major advantage of online assessments over traditional ones is that they offer access to instant assessment reporting. You should therefore look for a platform that offers advanced reporting and visualizations in metrics such as performance, question strengths, and many others. 

Cheating precautions and Security

When it comes to online corporate assessments, there are two security concerns: how secure are the assessments, and how secure is the platform? For the tests themselves, the platform should provide anti-cheating precautions and technologies such as a lockdown browser. It should also have measures in place to keep user data secure.

Reliable Support System

This is one consideration that many businesses don’t keep in mind, and end up regretting in the long run. Which channels does the corporate training assessment platform use to provide its users with support? Do they have resources such as whitepapers and documentation in case you need them? How fast is their support? 

These are questions you should ask before selecting a platform to take care of your assessment needs. 

Scalability

A good online testing platform should be able to provide you with resources should your needs exceed expectations. This includes delivery volume (server scalability), but also the ability to manage more item authors, more assessments, more examinees, and greater psychometric rigor.

Final Thoughts

Adopting effective online corporate training assessments can be a daunting task with many forces at play, and we hope these tips will help you get the best out of your assessments. Online assessments are revolutionizing many industries, and it’s about time you integrated them into your workflow.

Do you want to integrate online assessments into your corporate environment or any industry but feel overwhelmed by the process? Feel free to contact an experienced team of professionals to help you create an assessment strategy that helps you achieve your long-term goals and objectives.  

You can also sign up to get free access to our online assessment suite including 60 item types, IRT, adaptive testing, and so much more functionality!

We are proud to announce that ASC’s next-generation platform, Assess.ai, was recently announced as a Finalist for the 2021 EdTech Awards. This follows the successful launch of the landmark platform in 2020, where it quickly garnered attention as a finalist for the Innovation award at the 2020 conference of the Association of Test Publishers.

What is Assess.ai?

Assess.ai brings together best practices in assessment with modern approaches to automation, AI, and software design. It is built to make it easier to develop high-quality assessments with automated item review and automated item generation, and deliver with advanced psychometrics like adaptive and multistage testing.

Computerized Adaptive Testing Software

An example of this approach is shown below; a modern UI/UX presents a visual platform for publishing MultiStage Tests based on information functions from item response theory (IRT). Users can easily publish tests with this modern AI-based approach, without having to write a single line of code.

Multistage testing

 

What are the EdTech Awards?

See the full list of finalists, and learn more about the awards, via the link below.

https://www.edtechdigest.com/2021-finalists-winners/

Item analysis refers to the process of statistically analyzing assessment data to evaluate the quality and performance of your test items. This is an important step in the test development cycle, not only because it helps improve the quality of your test, but because it provides documentation for validity: evidence that your test performs well and that score interpretations mean what you intend. It is one of the most common applications of psychometrics, using item statistics to flag, diagnose, and fix poorly performing items on a test.

This post will describe the basics of this process. You can also check out our tutorial videos on our YouTube channel and download our free psychometric software.


Download a free copy of Iteman: Software for Item Analysis

The Goals of Item Analysis

Item analysis boils down to two goals:

  1. Find the items that are not performing well (difficulty and discrimination, usually)
  2. Figure out WHY those items are not performing well

There are different ways to evaluate performance, such as whether the item is too difficult/easy, too confusing (not discriminating), miskeyed, or perhaps even biased to a minority group.

Moreover, there are two completely different paradigms for this analysis: classical test theory (CTT) and item response theory (IRT). On top of that, the analyses can differ based on whether the item is dichotomous (right/wrong) or polytomous (2 or more points).

Because of these possible variations, item analysis is a complex topic. And that doesn’t even get into the evaluation of test performance. In this post, we’ll cover some of the basics for each theory at the item level.

Implementing Item Analysis

To implement item analysis, you should use dedicated software designed for this purpose. If you use an online assessment platform, it will provide you with output for item analysis, such as distractor P values and point-biserials (if not, it isn’t a real assessment platform).

In some cases, you might utilize standalone software. CITAS provides a simple spreadsheet-based approach to help you learn the basics, completely for free.  A screenshot of the CITAS output is below.  However, professionals will need a level above this.  Iteman and Xcalibre are two specially-designed software programs from ASC for this purpose, one for CTT and one for IRT.  Click here to see terms of use for free versions.

CITAS output with histogram

Item Analysis with Classical Test Theory

Classical Test Theory provides a simple and intuitive approach to item analysis. It utilizes nothing more complicated than proportions, averages, counts, and correlations. For this reason, it is useful for small-scale exams or use with groups that do not have psychometric expertise.

Item Difficulty: Dichotomous

CTT quantifies item difficulty for dichotomous items as the proportion (P value) of examinees that correctly answer it.

It ranges from 0.0 to 1.0. A high value means the item is easy; a low value means it is difficult. There are no hard and fast rules, because interpretation can vary widely across situations. For example, a test given at the beginning of the school year would be expected to have low statistics, since the students have not yet been taught the material. On the other hand, a professional certification exam, where someone cannot even sit without 3 years of experience and a relevant degree, might have all items appear easy even though they cover quite advanced topics! Here are some general guidelines:

    0.95-1.0 = Too easy (not doing much good to differentiate examinees, which is really the purpose of assessment)

    0.60-0.95 = Typical

    0.40-0.60 = Hard

    <0.40 = Too hard (consider that a 4 option multiple choice has a 25% chance of pure guessing)
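To make the computation concrete, here is a minimal Python sketch of P values and difficulty flagging. The response matrix and the 0.40/0.95 bounds are invented for illustration, not defaults from any particular program.

```python
# Classical item difficulty (P value): proportion of examinees answering
# each item correctly, computed from a scored 0/1 response matrix.

def p_values(scored):
    """scored: list of examinee rows, each a list of 0/1 item scores."""
    n = len(scored)
    n_items = len(scored[0])
    return [sum(row[j] for row in scored) / n for j in range(n_items)]

def flag_difficulty(p, lo=0.40, hi=0.95):
    """Flag items whose P value falls outside the chosen bounds."""
    if p < lo:
        return "too hard"
    if p > hi:
        return "too easy"
    return "ok"

# Five examinees, three items: item 0 is trivially easy, item 2 is hard
scored = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]
ps = p_values(scored)
print(ps)                                  # [1.0, 0.6, 0.2]
print([flag_difficulty(p) for p in ps])    # ['too easy', 'ok', 'too hard']
```

In practice the bounds should be tuned to the purpose of the test, as discussed below.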

With Iteman, you can set bounds to automatically flag items.  The minimum P value bound represents what you consider the cut point for an item being too difficult. For a relatively easy test, you might specify 0.50 as a minimum, which means that 50% of the examinees have answered the item correctly.

For a test where we expect examinees to perform poorly, the minimum might be lowered to 0.4 or even 0.3. The minimum should take into account the possibility of guessing; if the item is multiple-choice with four options, there is a 25% chance of randomly guessing the answer, so the minimum should probably not be 0.20.  The maximum P value represents the cut point for what you consider to be an item that is too easy. The primary consideration here is that if an item is so easy that nearly everyone gets it correct, it is not providing much information about the examinees.  In fact, items with a P of 0.95 or higher typically have very poor point-biserial correlations.

Note that because the scale is inverted (lower value means higher difficulty), this is sometimes referred to as item facility.

The Item Mean (Polytomous)

This refers to an item that is scored with 2 or more point levels, like an essay scored on a 0-4 point rubric or a Likert-type item that is “Rate on a scale of 1 to 5.”

  • 1=Strongly Disagree
  • 2=Disagree
  • 3=Neutral
  • 4=Agree
  • 5=Strongly Agree

The item mean is the average of the item responses converted to numeric values across all examinees. The range of the item mean is dependent on the number of categories and whether the item responses begin at 0. The interpretation of the item mean depends on the type of item (rating scale or partial credit). A good rating scale item will have an item mean close to ½ of the maximum, as this means that on average, examinees are not endorsing categories near the extremes of the continuum.

You will have to adjust for your own situation, but here is an example for the 5-point Likert-style item.

1-2 is very low; people disagree fairly strongly on average

2-3 is low to neutral; people tend to disagree on average

3-4 is neutral to high; people tend to agree on average

4-5 is very high; people agree fairly strongly on average

Iteman also provides flagging bounds for this statistic.  The minimum item mean bound represents what you consider the cut point for the item mean being too low.  The maximum item mean bound represents what you consider the cut point for the item mean being too high.

The number of categories for the items must be considered when setting the bounds of the minimum/maximum values. This is important as all items of a certain type (e.g., 3-category) might be flagged.
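The item mean itself is trivial to compute; a short Python sketch with illustrative (made-up) flagging bounds for a 1-5 Likert item:

```python
# Item mean for a polytomous item: the average of the numeric responses
# across examinees. The 2.0/4.0 bounds are illustrative placeholders.

def item_mean(responses):
    return sum(responses) / len(responses)

def flag_item_mean(mean, lo=2.0, hi=4.0):
    """Flag an item whose mean falls outside the chosen bounds."""
    if mean < lo:
        return "too low"
    if mean > hi:
        return "too high"
    return "ok"

likert = [4, 5, 3, 4, 4]   # five examinees, 1=Strongly Disagree ... 5=Strongly Agree
m = item_mean(likert)
print(m, flag_item_mean(m))  # 4.0 ok
```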

Item Discrimination: Dichotomous

In psychometrics, discrimination is a GOOD THING, even though the word often has a negative connotation in general. The entire point of an exam is to discriminate amongst examinees; smart students should get a high score and not-so-smart students should get a low score. If everyone gets the same score, there is no discrimination and no point in the exam! Item discrimination evaluates this concept.

CTT uses the point-biserial item-total correlation (Rpbis) as its primary statistic for this.

The Pearson point-biserial correlation (r-pbis) is a measure of the discrimination or differentiating strength, of the item. It ranges from −1.0 to 1.0 and is a correlation of item scores and total raw scores.  If you consider a scored data matrix (multiple-choice items converted to 0/1 data), this would be the correlation between the item column and a column that is the sum of all item columns for each row (a person’s score).

A good item is able to differentiate between examinees of high and low ability, and will have a higher point-biserial, though rarely above 0.50. A negative point-biserial indicates a very poor item: it means that high-ability examinees are answering incorrectly while low-ability examinees are answering correctly, which would be bizarre, and therefore typically indicates that the specified correct answer is actually wrong. A point-biserial of 0.0 provides no differentiation between low-scoring and high-scoring examinees, essentially random “noise.”  Here are some general guidelines on interpretation. Note that these assume a decent sample size; if you only have a small number of examinees, many item statistics will be flagged!

0.20+ = Good item; smarter examinees tend to get the item correct

0.10-0.20 = OK item; but probably review it

0.0-0.10 = Marginal item quality; should probably be revised or replaced

<0.0 = Terrible item; replace it

A major red flag is when the correct answer has a negative Rpbis and a distractor has a positive Rpbis.
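The Rpbis can be computed directly from a scored matrix. Here is a pure-Python sketch on toy data (the scores are invented for illustration):

```python
# Point-biserial (Rpbis): Pearson correlation between a 0/1 item column
# and the column of total raw scores.

from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def point_biserials(scored):
    """Rpbis for every item in a scored 0/1 matrix (rows = examinees)."""
    totals = [sum(row) for row in scored]
    n_items = len(scored[0])
    return [pearson([row[j] for row in scored], totals)
            for j in range(n_items)]

scored = [
    [1, 1],
    [1, 0],
    [0, 0],
    [0, 0],
]
print([round(r, 2) for r in point_biserials(scored)])  # [0.9, 0.87]
```

Note that, as described above, the total score here includes the item itself; some programs also report a "corrected" variant that excludes the item from the total, which runs slightly lower.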

The minimum item-total correlation bound represents the lowest discrimination you are willing to accept. This is typically a small positive number, like 0.10 or 0.20. If your sample size is small, it could possibly be reduced.  The maximum item-total correlation bound is almost always 1.0, because it is typically desired that the Rpbis be as high as possible.

The biserial correlation is also a measure of the discriminating strength of the item. It ranges from −1.0 to 1.0. The biserial correlation is computed between the item and total score as if the item were a continuous measure of the trait. Since the biserial is an estimate of Pearson’s r, it will be larger in absolute magnitude than the corresponding point-biserial.

The biserial makes the stricter assumption that the score distribution is normal. The biserial correlation is not recommended for traits where the score distribution is known to be non-normal (e.g., pathology).
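Under that normality assumption, the biserial can be obtained from the point-biserial via the classic conversion formula. This sketch uses only Python's standard library (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def biserial_from_point_biserial(r_pbis: float, p: float) -> float:
    """Convert a point-biserial to a biserial correlation.

    p is the item's proportion-correct. Uses the relationship
    r_bis = r_pbis * sqrt(p * q) / y, where y is the standard normal
    ordinate at the threshold that splits the distribution into p and q.
    """
    q = 1.0 - p
    norm = NormalDist()
    y = norm.pdf(norm.inv_cdf(p))  # density at the latent threshold
    return r_pbis * sqrt(p * q) / y
```

For an item with p = 0.70 and r-pbis = 0.30, this yields roughly 0.40, illustrating that the biserial is larger in magnitude.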

Item Discrimination: Polytomous

The Pearson’s r correlation is the product-moment correlation between the item responses (as numeric values) and total score. It ranges from −1.0 to 1.0. The r correlation indexes the linear relationship between item score and total score and assumes that the item responses for an item form a continuous variable. The r correlation and the Rpbis are equivalent for a 2-category item, so guidelines for interpretation remain unchanged.

The minimum item-total correlation bound represents the lowest discrimination you are willing to accept. Since the typical r correlation (0.5) will be larger than the typical Rpbis (0.3), you may wish to set the lower bound higher for a test with polytomous items (0.2 to 0.3). If your sample size is small, it could possibly be reduced.  The maximum item-total correlation bound is almost always 1.0, because it is typically desired that the correlation be as high as possible.

The eta coefficient is an additional index of discrimination, computed via an analysis of variance with the item response as the independent variable and total score as the dependent variable. The eta coefficient is the square root of the ratio of the between-groups sum of squares to the total sum of squares, and it ranges from 0 to 1. It does not assume that the item responses are continuous, nor does it assume a linear relationship between item response and total score.

As a result, the eta coefficient will always be equal to or greater than the absolute value of Pearson’s r. Note that the biserial correlation is reported instead if the item has only 2 categories.
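A sketch of the eta computation, grouping total scores by response category (illustrative code; note it returns the square root of the sum-of-squares ratio, putting eta on the correlation metric so it is comparable to r):

```python
import numpy as np

def eta_coefficient(item_responses, total_scores):
    """Eta for item discrimination: ANOVA of total score by response category.

    Returns sqrt(SS_between / SS_total); no linearity assumption is made.
    """
    item_responses = np.asarray(item_responses)
    total_scores = np.asarray(total_scores, dtype=float)
    grand_mean = total_scores.mean()
    ss_total = ((total_scores - grand_mean) ** 2).sum()
    ss_between = sum(
        (item_responses == cat).sum() *
        (total_scores[item_responses == cat].mean() - grand_mean) ** 2
        for cat in np.unique(item_responses)
    )
    return np.sqrt(ss_between / ss_total)
```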

Key and Distractor Analysis

In the case of many item types, it pays to evaluate the answers. A distractor is an incorrect option. We want to make sure that more examinees are not selecting a distractor than the key (P value) and also that no distractor has higher discrimination. The latter would mean that smart students are selecting the wrong answer, and not-so-smart students are selecting what is supposedly correct. In some cases, the item is just bad. In others, the answer is just incorrectly recorded, perhaps by a typo. We call this a miskey of the item. In both cases, we want to flag the item and then dig into the distractor statistics to figure out what is wrong.
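A sketch of a distractor analysis for a single multiple-choice item (illustrative code): it reports each option's selection proportion (P) and option-total correlation, so a miskey shows up as a negative value on the key alongside a positive value on a distractor:

```python
import numpy as np

def distractor_analysis(responses, key, totals):
    """Per-option selection proportions and option-total correlations.

    responses: the option each examinee selected (e.g., 'A'..'D');
    key: the scored correct option; totals: examinee total scores.
    Likely miskey: the key has negative Rpbis while a distractor's is positive.
    """
    responses = np.asarray(responses)
    totals = np.asarray(totals, dtype=float)
    stats = {}
    for opt in np.unique(responses):
        chose = (responses == opt).astype(float)        # 1 if option selected
        stats[opt] = {
            "P": chose.mean(),                          # proportion choosing it
            "Rpbis": np.corrcoef(chose, totals)[0, 1],  # option-total correlation
            "is_key": opt == key,
        }
    return stats
```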

Example

Below is an example output for one item from our Iteman software, which you can download for free. You might also be interested in this video.  This is a very well-performing item.  Here are some key takeaways.

  • This is a 4-option multiple choice item
  • It was on a subscore named “Example subscore”
  • This item was seen by 736 examinees
  • 70% of students answered it correctly, so it was fairly easy, but not too easy
  • The Rpbis was 0.53 which is extremely high; the item is good quality
  • The line for the correct answer in the quantile plot has a clear positive slope, which reflects the high discrimination quality
  • The proportion of examinees selecting the wrong answers was nicely distributed, not too high, and with negative Rpbis values. This means the distractors are sufficiently incorrect and not confusing.

Iteman Psychometric Item Analysis

Item Analysis with Item Response Theory

Item Response Theory (IRT) is a very sophisticated paradigm of item analysis and tackles numerous psychometric tasks, from item analysis to equating to adaptive testing. It requires much larger sample sizes than CTT (100-1000 responses per item) and extensive expertise (typically a PhD psychometrician). It isn’t suitable for small-scale exams like classroom quizzes.

However, it is used by virtually every “real” exam you will take in your life, from K-12 benchmark exams to university admissions to professional certifications.

If you haven’t used IRT, I recommend you check out this blog post first.

Item Difficulty

IRT evaluates item difficulty for dichotomous items as a b-parameter, which is sort of like a z-score for the item on the bell curve: 0.0 is average, 2.0 is hard, and -2.0 is easy. (This can differ somewhat with the Rasch approach, which rescales everything.) In the case of polytomous items, there is a b-parameter for each threshold, or step between points.

Item Discrimination

IRT evaluates item discrimination by the slope of its item response function, called the a-parameter. As a rough guideline, values above 0.80 are considered good, and items below that are less effective.
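The b- and a-parameters combine in the item response function. A minimal sketch of the 3PL version (D = 1.7 is the conventional scaling constant; the function name is my own):

```python
from math import exp

def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function.

    a: discrimination (slope), b: difficulty, c: pseudo-guessing
    lower asymptote. D = 1.7 puts the logistic on the normal-ogive metric.
    """
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))

# An average examinee (theta = 0) on an average item (b = 0) with c = 0.20
print(round(irf_3pl(0.0, 1.0, 0.0, 0.20), 2))  # -> 0.6
```

At theta = b, the logistic term equals 0.5, so the probability of a correct response is halfway between the guessing floor c and 1.0.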

Key and Distractor Analysis

In the case of polytomous items, the multiple b-parameters provide an evaluation of the different answers. For dichotomous items, IRT modeling does not distinguish among the incorrect answers, so we utilize the CTT approach for distractor analysis. This remains extremely important for diagnosing issues in multiple-choice items.

Example

Here is an example of what output from an IRT analysis program (Xcalibre) looks like. You might also be interested in this video.

  • Here, we have a polytomous item, such as an essay scored from 0 to 3 points.
  • It is calibrated with the generalized partial credit model.
  • It has strong classical discrimination (0.62)
  • It has poor IRT discrimination (0.466)
  • The average raw score was 2.314 out of 3.0, so fairly easy
  • There was a sufficient distribution of responses over the four point levels
  • The boundary parameters are not in sequence; this item should be reviewed

Xcalibre-poly-output

Do you conduct adaptive testing research? Perhaps a thesis or dissertation? Or maybe you have developed adaptive tests and have a technical report or validity study? I encourage you to check out the Journal of Computerized Adaptive Testing as a publication outlet for your adaptive testing research. JCAT is the official journal of the International Association for Computerized Adaptive Testing (IACAT), a nonprofit organization dedicated to improving the science of assessments.

JCAT has an absolutely stellar board of editors and was founded to focus on improving the dissemination of research in adaptive testing. The IACAT website also contains a comprehensive bibliography of research in adaptive testing, across all journals and tech reports, for the past 50 years.  IACAT was founded at the 2009 conference on computerized adaptive testing and has since held conferences every other year as well as hosting the JCAT journal.

Potential research topics at the JCAT journal

Here are some of the potential research topics:

  • Item selection algorithms
  • Item exposure algorithms
  • Termination criteria
  • Cognitive diagnostic models
  • Simulation studies
  • Validation studies
  • Item response theory models
  • Multistage testing
  • Use of adaptive testing in new(er) situations, like patient reported outcomes
  • Design of actual adaptive assessments and their release into the wild

If you are not involved in CAT research but are interested, please visit the IACAT and journal websites to read the articles.  Access is free.  JCAT would also appreciate it if you would share this information with colleagues so that they might consider publication.

The COVID-19 pandemic is drastically changing every aspect of our world, and one of the hardest-hit areas is educational assessment, along with other types of testing. Many organizations were still delivering tests with 50-year-old methodology, such as putting 200 examinees in a large room with desks, paper exams, and a pencil. COVID-19 is forcing many organizations to pivot, which presents an opportunity to modernize assessment. But how can we maintain test security, and therefore validity, through these changes? Below are some suggestions, all of which are easy to implement in ASC's industry-leading assessment platforms. Start by signing up for a free account at https://assess.com/assess-ai/.

True item banking with content access controls

Good assessment starts with good items. While learning management systems (LMS) and other platforms that are not truly assessment platforms include some item-authoring features, they generally fail to meet the basic requirements of true item banking. There are item-banking best practices that are standard at large-scale assessment organizations (e.g., state Departments of Education in the US) but are surprisingly rare at professional certification/licensure programs, universities, and other organizations. Here are some examples.

• Items are reusable (no need to upload them for every test that uses them)

• Item version tracking

• Tracking and auditing of user edits

• Author content controls (math teachers can only see math items)

• Storage of metadata such as item response theory (IRT) parameters and classical statistics

• Tracking of item usage across tests

• Item review workflow

Role-based access

All users should be limited by roles, such as Item Author, Item Reviewer, Test Editor, and Examinee Manager. So, for example, the person in charge of managing the examinee/student list might never see a single exam question.

Data forensics

There are many ways to analyze your test results to look for potential security/validity threats. Our SIFT software provides a free platform to help you implement this modern methodology. You can evaluate collusion indices, which quantify how similar the responses are for any given pair of examinees. You can also evaluate response times, group performance, and cumulative statistics.
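As a toy illustration of the idea behind collusion indices (this is not SIFT's algorithm; real indices model the number of matches expected given each pair's ability), one can simply count identical responses for every pair of examinees:

```python
import numpy as np
from itertools import combinations

def exact_response_matches(responses):
    """Count identical selected answers for every pair of examinees.

    A crude collusion screen: pairs with far more matches than their
    ability levels would predict deserve closer review.
    """
    responses = np.asarray(responses)
    matches = {}
    for i, j in combinations(range(responses.shape[0]), 2):
        matches[(i, j)] = int((responses[i] == responses[j]).sum())
    return matches
```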

Randomization

When tests are delivered online, you should have the option to randomize the order of the items as well as the order of the answer options. When printing to paper, there should also be an option to randomize item order, although you are of course far more limited in this respect with paper.

Linear on-the-fly testing (LOFT)

LOFT builds a unique, randomized test for every examinee. For example, you might have a pool of 300 items spread across 4 domains, with every examinee receiving 100 items, 25 from each domain. This greatly increases security.
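A sketch of LOFT assembly under those assumptions (300 items across 4 domains, 100 items per form; the function and pool names are illustrative):

```python
import random

def loft_form(pool, per_domain=25, seed=None):
    """Assemble one linear-on-the-fly form: a random draw per content domain.

    pool: dict mapping domain name -> list of item IDs. Each examinee gets
    an independent draw, so no two forms are likely to be identical.
    """
    rng = random.Random(seed)
    form = []
    for domain, items in pool.items():
        form.extend(rng.sample(items, per_domain))
    rng.shuffle(form)
    return form

# 4 domains x 75 items each = 300-item pool; each form gets 25 per domain
pool = {f"D{d}": [f"D{d}-{i:03d}" for i in range(75)] for d in range(1, 5)}
print(len(loft_form(pool)))  # -> 100
```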

Computerized adaptive testing (CAT)

CAT takes personalization even further, adapting both the difficulty of the exam and the number of items each examinee sees, based on psychometric algorithms and targets. This makes the test extremely secure.

Lockdown browser

Want to make sure an examinee cannot browse for answers or take screenshots of items? You need a lockdown browser. ASC's assessment platforms, Assess.ai and FastTest, come with this out of the box at no additional cost.

Examinee test codes

Want to make sure the right person takes the right exam? Generate unique, single-use passwords to be handed out by a proctor after identity verification. This is especially useful in remote proctoring: the examinee never receives any login information before the exam, other than how to start the proctoring session. Once the proctor verifies the examinee's identity, they provide the unique single-use password.

Proctor codes

Want an extra step in the test launch procedure? Once an examinee's identity is verified and they enter their code, the proctor must also enter a separate password that is unique to them for that day.

Date/time windows

Want to prevent examinees from starting early or late? Set a specific time window, such as Friday from 9 a.m. to 12 p.m.

AI-based proctoring

This level of proctoring is relatively inexpensive and does a good job of validating an individual examinee's results. However, it does not protect the intellectual property in your exam questions: if an examinee steals all the questions, you will not know right away. It is therefore very useful for low- and medium-stakes exams, but less so for high-stakes exams such as certification or licensure. Learn more about our remote proctoring options. I also recommend this blog post for an overview of the remote proctoring industry.

Live online proctoring

If examinees cannot attend test centers in person because of COVID, this is the next best option. Live proctors can check in the candidate, verify identity, and implement everything described above. They can also check the examinee's environment and stop the exam if they see the examinee stealing questions or other major problems. MonitorEDU is a great example of this.

How can I get started?

Need help implementing some of these measures? Or just want to discuss the possibilities? Email ASC at solutions@assess.com.

 

Translated from the blog post written by Dr. Nathan Thompson.

Nathan Thompson earned his PhD in psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College, with a triple major in Mathematics, Psychology, and Latin. He is primarily interested in the use of AI and software automation to augment and replace the work done by psychometricians, which has given him extensive experience in software design and programming. Dr. Thompson has published more than 100 journal articles and conference presentations, but his favorite remains https://pareonline.net/getvn.asp?v=16&n=1.

Online proctoring has been around for more than a decade, but with the recent COVID-19 outbreak, educational and workforce/certification institutions are scrambling to shift their operations, and a big part of that is an incredible increase in online proctoring. This blog post is intended as an overview of the online proctoring industry for someone who is new to the topic, or who is starting to shop around and is overwhelmed by all the options out there.

Online Proctoring: Two Distinct Markets

First of all, I would describe the online proctoring industry as comprising two distinct markets, so the first step is to determine which of them fits your organization.

1. Larger-scale, lower-cost (at scale), lower-security systems designed to be used only as an add-on to major LMS platforms such as Blackboard or Canvas. These online proctoring systems are designed for medium-stakes exams, such as an Introduction to Psychology midterm at a university.

2. Smaller-scale, higher-cost, higher-security systems designed to be used with standalone assessment platforms. These are generally for higher-stakes exams such as certification or workforce testing, or perhaps for special uses at universities such as admissions and placement exams.

How do you tell the difference? The first type will advertise easy integration with systems like Blackboard or Canvas as a key feature. They will also often focus on AI review of video recordings rather than live humans. Another key consideration is the existing client base, which is usually advertised.

Other ways online proctoring systems can differ

AI vs. humans: Some systems rely exclusively on artificial intelligence algorithms to flag examinee video recordings. Other systems use real humans.

Record-and-review vs. live humans: If humans are used, there are two approaches. The first is live, real-time proctoring, meaning there is a human at the other end of the video who can confirm identity before allowing the test to begin and stop the test if there is illicit activity. Record-and-review instead records the session, and a human checks it within 24 to 48 hours. This is more flexible, but you cannot stop the test if someone is stealing content; you probably will not know until the next day.

Screen capture: Some online proctoring vendors offer the option to record/stream the screen as well as the webcam. Some also offer screen capture alone (no webcam) for lower-stakes exams.

Mobile phone as a third camera: Some newer platforms make it easy to integrate the examinee's mobile phone as a third camera, which effectively functions as a human proctor. Examinees are directed to use the video to show under the desk, behind the monitor, and so on before starting the exam. They can then be directed to place the phone two meters away with a clear view of the entire room while the test is taken.

Using your own proctors: Some online proctoring systems allow you to use your own staff as proctors, which is especially useful if the test is delivered in a narrow time window. If it is delivered continuously, 24/7 year-round, you probably want to use the vendor's highly trained staff.

API integrations: Some systems require software developers to set up an API integration with your LMS or assessment platform. Others are more flexible: you can log in yourself, upload a list of examinees, and you are ready to test.

On-demand vs. scheduled: Some platforms require a time slot to be scheduled for each examinee. Others are purely on demand, and the examinee can show up whenever they are ready. MonitorEDU is a great example: examinees show up at any time, present their ID to a live human, and then start the test immediately. No downloads/installs, no system checks, no API integrations, nothing.

More Security: A Better Test Delivery System

A good test delivery platform will also come with its own functionality to improve test security: randomization, automated item generation, computerized adaptive testing, linear on-the-fly testing, professional item banking, item response theory scoring, scaled scoring, psychometric analysis, equating, lockdown delivery, and more. In the context of online proctoring, perhaps the most notable is lockdown delivery, in which the test completely takes over the examinee's computer, and the examinee cannot use it for anything else until the test is finished.

LMS platforms rarely include this functionality, because it is not needed for an Introduction to Psychology midterm. However, much is at stake in most of the world's assessments (university admissions, certifications, employee selection, etc.), and those tests rely heavily on such functionality. Nor is this merely custom or tradition: such methods are considered essential under international standards, including AERA/APA/NCME, ITC, and NCCA.

ASC's Online Proctoring Partners

ASC gives its clients a turnkey solution through partnerships with some of the leaders in this space, including MonitorEDU, ProctorExam, Examity, and Proctor360. Learn more on our website about that functionality, as well as the concept of configurable test security.

Translated from the blog post written by Dr. Nathan Thompson.


The IRT Test Information Function is a concept from item response theory (IRT) that is designed to evaluate how well an assessment differentiates examinees, and at what ranges of ability. For example, we might expect an exam composed of difficult items to do a great job in differentiating top examinees, but it is worthless for the lower half of examinees because they will be so confused and lost.

The reverse is true of an easy test; it doesn’t do any good for top examinees. The test information function quantifies this and has a lot of other important applications and interpretations.

IRT Test Information Function: how to calculate it

The test information function is not something you can calculate by hand. First, you need to estimate item-level IRT parameters, which define the item response function. The only way to do this is with specialized software; there are a few options in the market, but we recommend Xcalibre.

Next, the item response function is converted to an item information function for each item. The item information functions can then be summed into a test information function. Lastly, the test information function is often inverted into the conditional standard error of measurement function, which is extremely useful in test design and evaluation.

IRT Item Parameters

Software like Xcalibre will estimate a set of item parameters. The parameter you use depends on the item types and other aspects of your assessment.

For example, let’s just use the 3-parameter model, which estimates a, b, and c. And we’ll use a small test of 5 items. These are ordered by difficulty: item 1 is very easy and Item 5 is very hard.

Item a b c
1 1.00 -2.00 0.20
2 0.70 -1.00 0.40
3 0.40 0.00 0.30
4 0.80 1.00 0.00
5 1.20 2.00 0.25

 

Item Response Function

The item response function uses the IRT equation to convert the parameters into a curve. The purpose of the item parameters is to fit this curve for each item, like a regression model to describe how it performs.

Here are the response functions for those 5 items. Note the scale on the x-axis, similar to the bell curve, with the easy items to the left and hard ones to the right.

 

Item Information Function

The item information function evaluates the calculus derivative of the item response function. An item provides more information about examinees in the region where its response function is steepest.

For example, consider Item 5: it is difficult, so it is not very useful for examinees in the bottom half of the ability range. The slope of the Item 5 IRF is nearly 0 across that entire range, which means its information function there is also nearly 0.

5 items IIFs

 

Test Information Function

The test information function then sums up the item information functions to summarize where the test is providing information. If you imagine adding the graphs above, you can picture humps near the bottom and top of the range, where the most prominent IIFs sit.
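As a sketch, assuming the 3PL model with D = 1.7 and the five illustrative items tabled above, the IIFs and their sum can be computed directly from the standard information formula:

```python
import numpy as np

D = 1.7  # conventional scaling constant

# a, b, c parameters for the five example items above
params = np.array([
    [1.00, -2.00, 0.20],
    [0.70, -1.00, 0.40],
    [0.40,  0.00, 0.30],
    [0.80,  1.00, 0.00],
    [1.20,  2.00, 0.25],
])

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3PL item information: (D*a)^2 * (q/p) * ((p - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

theta = np.linspace(-4, 4, 161)  # ability grid
iifs = np.vstack([info_3pl(theta, a, b, c) for a, b, c in params])
tif = iifs.sum(axis=0)           # test information function
```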

5 items TIF

 

Conditional Standard Error of Measurement Function

The test information function can be inverted into an estimate of the conditional standard error of measurement. What do we mean by conditional? If you are familiar with classical test theory, you know that it estimates the same standard error of measurement for everyone that takes a test.

But given the concepts above, this is clearly an unreasonable expectation. If a test has only difficult items, then it measures top students well and lower students poorly, so why should we claim their scores are equally accurate? The conditional standard error of measurement turns the standard error into a function of ability.

Also, note that it refers to the theta scale and not to the number-correct scale.
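The inversion itself is simple: wherever the test information function is evaluated, the conditional SEM is its inverse square root.

```python
import numpy as np

def csem(test_information):
    """Conditional standard error of measurement: 1 / sqrt(TIF(theta))."""
    return 1.0 / np.sqrt(np.asarray(test_information, dtype=float))

# Where the test provides information of 4.0, theta estimates carry an SEM of 0.5
sem = csem([1.0, 4.0, 16.0])
```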

5 items CSEM

 

How can I implement all this?

For starters, I recommend delving deeper into an item response theory book. My favorite is Item Response Theory for Psychologists by Embretson and Reise. Next, you need some item response theory software.

Xcalibre can be downloaded as a free version for learning and is the easiest program to learn how to use (no 1980s-style command code… how is that still a thing?). But if you are an R fan, there are plenty of resources in that community as well.

Tell me again: why are we doing this?

The purpose of all this is to effectively model how items and tests work, namely, how they interact with examinees. This then allows us to evaluate their performance so that we can improve them, thereby enhancing reliability and validity.

Classical test theory had a lot of shortcomings in this endeavor, which led to IRT being invented. IRT also facilitates some modern approaches to assessment, such as linear on-the-fly testing, adaptive testing, and multistage testing.

Professional certification programs that allow participants to validate their knowledge and skills abound in the U.S. and around the world. Examples range from long-standing, well-recognized teacher certification and CPR programs to more niche and cutting-edge offerings, such as the Project Management Professional and Amazon Web Services credentials. Candidates are often pursuing some combination of new career, promotion, higher salary, and self-fulfillment. They take varying risks with their time and finances for what could return great reward. Given the high stakes involved, an extensive effort goes into certification program management – that is, ensuring that the certifications are developed and run according to best practices. This includes psychometrics, but is most definitely not limited to that topic.

 

What goes into certification program management?

There are many aspects that go into certification program management, including:

  • Legal status
  • Board governance
  • Accounting
  • Test development
  • Staffing, org charts, and org structure (firewall between certification and education)
  • Continuing education
  • Recertification
  • Prerequisites and eligibility pathways
  • Operations
  • Policies for candidates

One important consideration in certification program management is the requirement of a firewall between staff involved in certification and those involved in education. Basically, you don’t want the people who teach the courses to have seen the items on the test – especially if there is an incentive for them to help the students pass! For example, if instructors are evaluated by the pass rates of their students or institution, they have a reason to want more people to pass and could divulge more information about the exam than they should. If they never see such information, they can instead concentrate on teaching.

Also at stake in certification program management is the protection of the public. This includes patients, students, customers, employees, employers, and all others affected by the performance of certified individuals. The program itself also wagers its reputation each time it confers a certification.

In this high-risk environment, savvy certification program managers are concerned with granting certification only to those likely to practice competently. They optimize their tests using tools from the science of psychometrics to ensure that candidates must demonstrate appropriate knowledge and skills in order to pass. Learn more about the process of test development in this blog post.

What is a question bank? A question bank refers to a pool of test questions to be used on various assessments across time.  For example, a Certified Widgetmaker Exam might have a pool of 500 questions developed over the past 10 years. Suppose the exam is delivered in June and December of every year, and each time 150 questions are used. This strong pool of items allows the organization to easily select questions and publish a new form of the exam each time, maintaining security and validity.

A question bank is more commonly called an item bank; the term question is avoided because many assessment items are not actually questions. They might be statements, vignettes, simulations, or many things other than the traditional question with four answer options.

What goes into a question bank?  Metadata.

A question bank is actually much more than the questions themselves. If you ran the Certified Widgetmaker Exam, you would want to keep track of some additional important information. This is all based on the concept of treating the question as a reusable object; if you use the item 4 times, you should never need to type/upload it 4 times. It should be in the system only once, with all its associated metadata!

What to track                       | Examples
Which exam forms used each question | Dec 2017, May 2018, May 2019, Dec 2020
Unique item ID                      | Math.Algebra.078
Source/Reference                    | Wilson (2016) p. 123
Status                              | New, Under Review, Active, Retired
Statistics                          | Classical difficulty and discrimination; item response theory parameters
Reviewer comments                   | Jake Smith 2020/11/22: “I think that D is arguably correct, and we need to provide greater detail in the stem.”
Content area, domain, blueprint     | Math / Algebra / Quadratic

 

The Solution: Question Banking Software

As you can see, there’s actually quite a bit of functionality and data that goes into a true question bank system. And this is only regarding the questions themselves – it doesn’t get into additional topics such as media file management, Workflow Management, Automated Item Generation, or Test Assembly & Publishing. A professional question banking software system will have much, much more than just a way to store the questions.  FastTest provides a powerful alternative solution to some older platforms on the market.

Looking for a deeper treatment of the topic? Check out the chapter Computerized Item Banking by ASC’s cofounder, C. David Vale, in the 2006 Handbook of Test Development.

Want to learn more about how question banking software can help your organization? Click here, check out this other post, or fill out our contact form for a demonstration.