Content Validity in Assessment

Content validity is an aspect of validity, a term that psychometricians use to refer to evidence that interpretations of test scores are supported.  For example, predictive validity provides evidence that a pre-employment test will predict job performance, tenure, and other important criteria.  Content validity, on the other hand, focuses on evidence that the content of the test covers what it should cover.

What is Content Validity?

Content validity refers to the extent to which a measurement instrument (e.g., a test, questionnaire, or survey) accurately and adequately measures the specific content or construct it is designed to assess. In simpler terms, it assesses whether the questions or items included in an assessment are relevant and representative of the subject matter or concept under investigation.

Example 1: You are working on a benchmark test for 5th grade mathematics in the USA.  You would likely want to ensure that all items align to the Common Core State Standards for the 5th grade mathematics curriculum.

Example 2: You are working on a certification exam for widgetmakers.  You should make sure that all items align to the publicly posted blueprint for this certification.  That blueprint, in turn, should not have been defined willy-nilly – it should be built on the results of a formal job task analysis study.

The Importance of Content Validity

Drives Accurate Measurement: Content validity helps in ensuring that the assessment tool is measuring what it’s intended to measure. This is critical for drawing meaningful conclusions and making informed decisions based on the results.

Enhances Credibility: When your assessment has high content validity, it enhances the credibility and trustworthiness of your findings. It demonstrates that you’ve taken the time to design a valid instrument. This is often referred to as face validity – which is not a “real” type of validity that psychometricians consider, but refers to whether someone off the street can look at the test and say, “yeah, it looks like all the items are on widgetmaking.”

Reduces Bias: Using assessment items that are not content-valid can introduce bias and inaccuracies into your results. By maintaining content validity, you reduce the risk of skewed or unreliable data.

Improves Decision-Making: Organizations often rely on assessments to make important decisions, such as hiring employees, designing educational curricula, or evaluating the effectiveness of marketing campaigns. Content-valid assessments provide a solid foundation for making these decisions.

Legal Defensibility: In general, if you deliver a test to select employees, you need to show either content validity (e.g., a test on Microsoft Excel for bookkeepers) or predictive validity (e.g., conscientiousness is a personality trait that is not obviously job-related, but likely predicts success as a bookkeeper).  A similar notion applies to other types of tests.

How to Assess Content Validity

There are various methods to assess content validity, such as expert reviews, pilot testing, and statistical techniques. One common method is to gather a panel of experts in the subject matter and have them review the assessment items to ensure that they align with the content domain.  Of course, if all the items are written directly to the blueprints in the first place, and reviewed before they even become part of the pool of active items, a post-hoc review like that is not necessary.

There has been more recent research on applying machine learning to evaluate content, including the option to look for enemy items by evaluating the semantic distance between the content of any given pair of items.
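That research does not prescribe a single algorithm; as a rough sketch of the general idea, one could vectorize item text and flag pairs whose similarity exceeds some threshold (the TF-IDF representation and the 0.5 threshold below are illustrative assumptions, not a standard):

```python
# A rough sketch of flagging potential enemy items by text similarity.
# The TF-IDF representation and the 0.5 threshold are illustrative choices,
# not a prescribed standard from the research literature.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {
    "ITM001": "Which of the following is the capital of France?",
    "ITM002": "The capital city of France is which of the following?",
    "ITM003": "What is the boiling point of water at sea level?",
}

ids = list(items)
tfidf = TfidfVectorizer().fit_transform(items.values())
sim = cosine_similarity(tfidf)

for i, j in combinations(range(len(ids)), 2):
    if sim[i, j] > 0.5:  # suspiciously similar content
        print(f"Possible enemy pair: {ids[i]} / {ids[j]} (similarity = {sim[i, j]:.2f})")
```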

If the test is multidimensional, a statistical approach known as factor analysis can help, to see if the items actually load on the dimensions they should.
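As a minimal sketch of that idea (the data are simulated and the two-factor structure is an assumption made for the example, not a claim about any real test):

```python
# Minimal sketch: check whether items load on the dimensions they should.
# The data are simulated and the two-factor structure is an assumption for
# the example; real studies would use purpose-built software and larger designs.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
n_examinees, n_items = 500, 8

# Items 0-3 are intended to measure trait 1; items 4-7 are intended to measure trait 2
traits = rng.normal(size=(n_examinees, 2))
loadings = np.zeros((n_items, 2))
loadings[:4, 0] = 0.8
loadings[4:, 1] = 0.8
responses = traits @ loadings.T + rng.normal(scale=0.5, size=(n_examinees, n_items))

fa = FactorAnalysis(n_components=2, random_state=0).fit(responses)
print(np.round(fa.components_.T, 2))  # rows = items, columns = factors
```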

Conclusion

In summary, content validity is an essential aspect of assessment design that ensures the questions or items used in an assessment are appropriate, relevant, and representative of the construct being measured. It plays a significant role in enhancing the accuracy, credibility, and overall quality of your assessments. Whether you’re a student preparing for an exam, a researcher developing a survey, or a business professional creating a customer feedback form, understanding and prioritizing content validity will help you achieve more reliable and meaningful results. So, next time you’re tasked with creating or using an assessment tool, remember the importance of content validity and its impact on the quality of your data and decision-making processes.

However, it is not the only aspect of validity.  The documentation of validity is a complex process that is often ongoing.  You will also need data on statistical performance of the test (e.g., alpha reliability), evaluation bias (e.g., differential item functioning), possibly predictive validity, and more.  Therefore, it’s important to work with a psychometrician that can help you understand what is involved and ensure that the test meets both international standards and the reason that you are building the test in the first place!

Predictive Validity of Test Scores

Predictive Validity is a type of test score validity which evaluates how well a test predicts something in the future, usually with a goal of making more effective decisions about people.  For instance, it is often used in the world of pre-employment testing, where we want a test to predict things like job performance or tenure, so that a company can hire people that do a good job and stay a long time – a very good result for the company, and worth the investment.

Validity, in a general sense, is evidence that we have to support intended interpretations of test scores.  There are different types of evidence that we can gather to do so.  Predictive validity refers to evidence that the test predicts things that it should predict.  If we have quantitative data to support such conclusions, it makes the test more defensible and can improve the efficiency of its use.  For example, if a university admissions test does a great job of predicting success at university, then universities will want to use it to select students that are more likely to succeed.

Examples of Predictive Validity

Predictive validity evidence can be gathered for a variety of assessment types.

  1. Pre-employment: Since the entire purpose of a pre-employment test is to positively predict good things like job performance or negatively predict bad things like employee theft or short tenure, a ton of effort goes into developing tests to function in this way, and then documenting that they do.
  2. University Admissions: Like pre-employment testing, the entire purpose of university admissions exams is predictive.  They should positively correlate with good things (first year GPA, four year graduation rate) and negatively predict the negative outcomes like academic probation or dropping out.
  3. Prep Exams: Preparatory or practice tests are designed to predict performance on their target test.  For example, if a prep test is designed to mimic the Scholastic Aptitude Test (SAT), then one way to validate it is to gather the SAT scores later, after the examinees take it, and correlate with the prep test.
  4. Certification & Licensure: The primary purpose of credentialing exams is not to predict job performance, but to ensure that the candidate has mastered the material necessary to practice their profession.  Therefore, predictive validity is not important, compared to content-related validity such as blueprints based on a job analysis. However, some credentialing organizations do research on the “value of certification,” linking it to improved job performance, reduced clinical errors, and external outcomes such as higher salary.
  5. Medical/Psychological: There are some assessments that are used in a clinical situation, and the predictive validity is necessary in that sense.  For instance, there might be an assessment of knee pain used during initial treatment (physical therapy, injections) that can be predictively correlated with later surgery.  The same assessment might then be used after the surgery to track rehabilitation.

Predictive Validity in Pre-employment Testing

The case of pre-employment testing is perhaps the most common use of this type of validity evidence.  A recent study (Sackett, Zhang, Berry, & Lievens, 2022) presented a meta-analysis of the various types of pre-employment tests and other selection procedures (e.g., structured interviews), comparing their predictive power.  This was a modern update to the classic article by Schmidt & Hunter (1998).  While in the past the consensus was that cognitive ability tests provide the best predictive power in the widest range of situations, the new article suggests otherwise.  It recommends the use of structured interviews and job knowledge tests, which are more targeted toward the role in question, so it is not surprising that they perform well.  This in turn suggests that you should not buy pre-fab ability tests and use them in a shotgun approach with the assumption of validity generalization, but instead leverage an online testing platform like FastTest that allows you to build high-quality exams that are more specific to your organization.

Why do we need predictive validity?

There are a number of reasons that you might need predictive validity for an exam.  They are almost always regarding the case where the test is used to make important decisions about people.

  1. Smarter decision-making: Predictive validity provides valuable insights for decision-makers. It helps recruiters identify the most suitable candidates, educators tailor their teaching methods to enhance student learning, and universities to admit the best students.
  2. Legal defensibility: If a test is being used for pre-employment purposes, it is legally required in the USA to either show that the test is obviously job-related (e.g., knowledge of Excel for a bookkeeping job) or that you have hard data demonstrating predictive validity.  Otherwise, you are open for a lawsuit.
  3. Financial benefits: Often, the reason for needing improved decisions is very financial.  It is often costly for large companies to recruit and train personnel.  It’s entirely possible that spending $100,000 per year on pre-employment tests could save millions of dollars in the long run.
  4. Benefits to the examinee: Sometimes, there is directly a benefit to the examinee.  This is often the case with medical assessments.

How to implement predictive validity

The simplest case is that of regression and correlation.  How well does the test score correlate with the criterion variable?  Below is an oversimplified example of predicting university GPA from scores on an admissions test.  Here, the correlation is 0.858 and the regression equation is GPA = 0.34*SCORE + 0.533.  Of course, in real life you would not see such strong predictive power, as there are many other factors which influence GPA.

[Figure: scatterplot of admissions test scores vs. university GPA, r = 0.858]
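A minimal sketch of how this simple case can be computed (the scores and GPAs are made up for illustration, not the data behind the figure):

```python
# Minimal sketch: correlation and simple regression of GPA on admissions test scores.
# The scores and GPAs below are fabricated, not the data behind the figure above.
import numpy as np
from scipy import stats

score = np.array([3, 4, 5, 5, 6, 7, 8, 8, 9, 10], dtype=float)
gpa = np.array([1.6, 1.9, 2.3, 2.1, 2.6, 2.9, 3.2, 3.1, 3.5, 3.9])

r = np.corrcoef(score, gpa)[0, 1]
reg = stats.linregress(score, gpa)
print(f"correlation r = {r:.3f}")
print(f"GPA = {reg.slope:.2f} * SCORE + {reg.intercept:.2f}")
```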

Advanced Issues

It is usually not a simple situation of two straightforward variables, such as one test and one criterion variable.  Often, there are multiple predictor variables (quantitative reasoning test, MS Excel knowledge test, interview, rating of the candidate’s resume), and moreover there are often multiple criterion variables (job performance ratings, job tenure, counterproductive work behavior).  When you use multiple predictors and a second or third predictor adds some bit of predictive power over that of the first variable, this is known as incremental validity.
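A rough sketch of checking incremental validity – does a second predictor add anything beyond the first? – might look like this, with simulated data standing in for real predictor and criterion scores:

```python
# Minimal sketch of incremental validity: compare R-squared with one predictor
# versus two. All data below are simulated for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 300
reasoning = rng.normal(size=n)                      # predictor 1: reasoning test
interview = 0.4 * reasoning + rng.normal(size=n)    # predictor 2: structured interview
performance = 0.5 * reasoning + 0.3 * interview + rng.normal(size=n)  # criterion

X1 = reasoning.reshape(-1, 1)
X2 = np.column_stack([reasoning, interview])
r2_one = LinearRegression().fit(X1, performance).score(X1, performance)
r2_both = LinearRegression().fit(X2, performance).score(X2, performance)

print(f"R^2, reasoning test only:  {r2_one:.3f}")
print(f"R^2, adding the interview: {r2_both:.3f} (increment = {r2_both - r2_one:.3f})")
```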

You can also implement more complex machine learning models, such as neural networks or support vector machines, if they fit and you have sufficient sample size.

When performing such validation, you need to also be aware of bias.  There can be test bias where the test being used as a predictor is biased against a subgroup.  There can also be predictive bias where two subgroups have the same performance on the test, but one is overpredicted for the criterion and the other is underpredicted.  A rule of thumb for investigating this in the USA is the four-fifths rule.
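As a quick sketch of the four-fifths rule check on selection rates (the applicant and hire counts below are hypothetical):

```python
# Minimal sketch of the four-fifths rule: compare selection rates across groups.
# The applicant and hire counts are hypothetical.
selected = {"group_a": 60, "group_b": 20}
applied = {"group_a": 100, "group_b": 50}

rates = {group: selected[group] / applied[group] for group in applied}
highest = max(rates.values())

for group, rate in rates.items():
    ratio = rate / highest
    flag = "potential adverse impact" if ratio < 0.8 else "within the four-fifths rule"
    print(f"{group}: selection rate = {rate:.2f}, ratio to highest = {ratio:.2f} -> {flag}")
```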

Summary

Predictive validity is one type of test score validity, referring to evidence that scores from a certain test can predict their intended target variables.  The most common application of it is to pre-employment testing, but it is useful in other situations as well.  But validity is an extremely important and wide-ranging topic, so it is not the only type of validity evidence that you should gather.

What is Psychometrics? Definition: Improve exams with science.

Psychometrics is the science of educational and psychological assessment.  It scientifically studies how tests are developed, delivered, and scored, regardless of the test topic.    The goal is to provide validity: evidence to support that interpretations of scores from the test are trustworthy.

Psychometricians tackle fundamental questions around assessment, such as how to determine if a test is reliable or if a question is of good quality, as well as much more complex questions like how to ensure that a score today is on the same scale as a score 10 years ago.  The goal of psychometrics is to make test scores fairer, more precise, and more valid – because test scores are used to make decisions about people (pass a course, hire for a job…), and better tests mean better decisions.  Why?  The scientific evidence is overwhelming that tests provide better information for decision makers than many other types of information, such as interviews, resumes, or educational attainment.

What is psychometrics? An introduction / definition.

Psychometrics is the study of assessment itself, regardless of what type of test is under consideration. In fact, many psychometricians don’t even work on a particular test; they work on psychometrics itself, such as new methods of data analysis.  Most psychometricians are not tied to what the test is measuring, and will often switch to new jobs in completely unrelated areas, such as moving from a K-12 testing company to psychological measurement to an accountant certification exam.  We often refer to whatever we are measuring simply as “theta” – a term from item response theory.

Psychometrics is a branch of data science.  In fact, it was around long before that term was even a buzzword.  Don’t believe me?  Check out this Coursera course on Data Science; the first example it gives of a foundational historical project in data science is… psychometrics!  (early research on factor analysis of intelligence)

Even though assessment is everywhere and Psychometrics is an essential aspect of assessment, to most people it remains a black box, and professionals are referred to as “psychomagicians” in jest. However, a basic understanding is important for anyone working in the testing industry, especially those developing or selling tests.  It’s also important for many areas that use assessments, like human resources and education.

What is not psychometrics?

Psychometrics is NOT limited to very narrow types of assessment.  Some people use the term interchangeably with concepts like IQ testing, personality assessment, or pre-employment testing.  These are each but tiny parts of the field!  Also, it is not the administration of a test.

 

What questions does the field of Psychometrics address?

Building and maintaining a high-quality test is not easy.  A lot of big issues can arise.  Much of the field revolves around solving major questions about tests: what should they cover, what is a good question, how do we set a good cutscore, how do we make sure that the test predicts job performance or student success, etc.

 

How do we define what should be covered by the test? (Test Design)

Before writing any items, you need to define very specifically what will be on the test.  If the test is in credentialing or pre-employment, psychometricians typically run a job analysis study to form a quantitative, scientific basis for the test blueprints.  A job analysis is necessary for a certification program to get accredited.  In Education, the test coverage is often defined by the curriculum.

 

How do we ensure the questions are good quality? (Item Writing)

There is a corpus of scientific literature on how to develop test items that accurately measure whatever you are trying to measure.  A great overview is the book by Haladyna.  This is not just limited to multiple-choice items, although that approach remains popular.  Psychometricians leverage their knowledge of best practices to guide the item authoring and review process in a way that the result is highly defensible test content.  Professional item banking software provides the most efficient way to develop high-quality content and publish multiple test forms, as well as store important historical information like item statistics.

 

How do we set a defensible cutscore? (Standard Setting)

Test scores are often used to classify candidates into groups, such as pass/fail (Certification/Licensure), hire/non-hire (Pre-Employment), and below-basic/basic/proficient/advanced (Education).  Psychometricians lead studies to determine the cutscores, using methodologies such as Angoff, Beuk, Contrasting-Groups, and Borderline.
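As an oversimplified sketch of the modified-Angoff idea (the ratings below are hypothetical): each subject-matter expert estimates the proportion of minimally competent candidates who would answer each item correctly, and the summed item averages suggest a raw cutscore.

```python
# Minimal sketch of a modified-Angoff cutscore: average the experts' ratings
# for each item, then sum across items. The ratings below are hypothetical.
import numpy as np

# rows = subject-matter experts, columns = items;
# each value = estimated proportion of minimally competent candidates answering correctly
ratings = np.array([
    [0.70, 0.60, 0.85, 0.50, 0.90],
    [0.75, 0.55, 0.80, 0.60, 0.85],
    [0.65, 0.65, 0.90, 0.55, 0.95],
])

item_means = ratings.mean(axis=0)
cutscore = item_means.sum()
print(f"Item-level Angoff means: {np.round(item_means, 2)}")
print(f"Recommended raw cutscore: {cutscore:.1f} out of {ratings.shape[1]} points")
```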

 

How do we analyze results to improve the exam? (Psychometric Analysis)

Psychometricians are essential for this step, as the statistical analyses can be quite complex.  Smaller testing organizations typically utilize classical test theory, which is based on simple mathematics like proportions and correlations.  Large, high-profile organizations typically use item response theory (IRT), which is based on a type of nonlinear regression analysis.  Psychometricians evaluate overall reliability of the test, difficulty and discrimination of each item, distractor analysis, possible bias, multidimensionality, linking multiple test forms/years, and much more.  Software such as  Iteman  and  Xcalibre  is also available for organizations with enough expertise to run statistical analyses internally.  Scroll down below for examples.
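For a flavor of the classical statistics involved, here is a minimal sketch computing item difficulty (P value) and point-biserial discrimination from a small scored matrix; the data are made up, and tools like Iteman do this at scale with corrections and far more detail.

```python
# Minimal sketch of classical item analysis: difficulty (proportion correct)
# and point-biserial correlation with the total score. The data are fabricated;
# software like Iteman applies corrections and reports much more.
import numpy as np

# rows = examinees, columns = items; 1 = correct, 0 = incorrect
scores = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
])

total = scores.sum(axis=1)
for i in range(scores.shape[1]):
    difficulty = scores[:, i].mean()
    pbis = np.corrcoef(scores[:, i], total)[0, 1]  # uncorrected point-biserial
    print(f"Item {i + 1}: P = {difficulty:.2f}, point-biserial = {pbis:.2f}")
```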

 

How do we compare scores across groups or years? (Equating)

This is referred to as linking and equating.  Some psychometricians devote their entire careers to this topic.  If you are working on a certification exam, for example, you want to make sure that the passing standard is the same this year as last year.  If 76% of candidates passed last year and only 25% pass this year, not only will the candidates be angry, but there will be much less confidence in the meaning of the credential.
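Operational equating designs use anchor items and more sophisticated methods, but as an oversimplified sketch under an equivalent-groups assumption, linear equating places new-form scores on the old form’s scale by matching means and standard deviations (the scores below are fabricated):

```python
# Minimal sketch of linear equating under an equivalent-groups assumption:
# place new-form raw scores onto the old form's scale by matching means and SDs.
# The scores are fabricated; operational equating designs are usually more involved.
import numpy as np

old_form = np.array([62, 70, 75, 78, 80, 84, 88, 91], dtype=float)
new_form = np.array([55, 60, 66, 68, 72, 75, 79, 83], dtype=float)

slope = old_form.std(ddof=1) / new_form.std(ddof=1)
intercept = old_form.mean() - slope * new_form.mean()

raw_new = 70
equated = slope * raw_new + intercept
print(f"A raw score of {raw_new} on the new form equates to {equated:.1f} on the old form's scale")
```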

 

How do we know the test is measuring what it should? (Validity)

Validity is the evidence provided to support score interpretations.  For example, we might interpret scores on a test to reflect knowledge of English, and we need to provide documentation and research supporting this.  There are several ways to provide this evidence.  A straightforward approach is to establish content-related evidence, which includes the test definition, blueprints, and item authoring/review.  In some situations, criterion-related evidence is important, which directly correlates test scores to another variable of interest.  Delivering tests in a secure manner is also essential for validity.

 

Where is Psychometrics Used?

Certification/Licensure/Credentialing

In certification testing, psychometricians develop the test via a documented chain of evidence following a sequence of research outlined by accreditation bodies, typically: job analysis, test blueprints, item writing and review, cutscore study, and statistical analysis.  Web-based item banking software like  FastTest  is typically useful because the exam committee often consists of experts located across the country or even throughout the world; they can then easily log in from anywhere and collaborate.

 

Pre-Employment

In pre-employment testing, validity evidence relies primarily on establishing appropriate content (a test on PHP programming for a PHP programming job) and the correlation of test scores with an important criterion like job performance ratings (shows that the test predicts good job performance).  Adaptive tests are becoming much more common in pre-employment testing because they provide several benefits, the most important of which is cutting test time by 50% – a big deal for large corporations that test a million applicants each year. Adaptive testing is based on item response theory, and requires a specialized psychometrician as well as specially designed software like  FastTest.

 

K-12 Education

Most assessments in education fall into one of two categories: lower-stakes formative assessment in classrooms, and higher-stakes summative assessments like year-end exams.  Psychometrics is essential for establishing the reliability and validity of higher-stakes exams, and for equating the scores across different years.  It is also important for formative assessments, which are moving towards adaptive formats because of the 50% reduction in test time, meaning that students spend less time testing and more time learning.

 

Universities

Universities typically do not give much thought to psychometrics, even though a significant amount of testing occurs in higher education, especially with the move to online learning and MOOCs.  Given that many of the exams are high stakes (consider a certificate exam after completing a year-long graduate program!), psychometricians should be involved in establishing legally defensible cutscores and in statistical analysis to ensure reliable tests, and professionally designed assessment systems should be used for developing and delivering tests, especially with enhanced security.

 


Medicine/Psychology

Have you ever taken a survey at your doctor’s office, or before/after a surgery?  Perhaps a depression or anxiety inventory at a psychotherapist?  Psychometricians have worked on these.

 

The Test Development Cycle

Psychometrics is the core of the test development cycle, which is the process of developing a strong exam.  It is sometimes called similar names like assessment lifecycle.

You will recognize some of the terms from the introduction earlier.  What we are trying to demonstrate here is that those questions are not standalone topics, or something you do once and simply file a report.  An exam is usually a living thing.  Organizations will often be republishing a new version every year or 6 months, which means that much of the cycle is repeated on that timeline.  Not all of it is; for example, many orgs only do a job analysis and standard setting every 5 years.

Consider a certification exam in healthcare.  The profession does not change quickly because things like anatomy never change and medical procedures rarely change (e.g., how to measure blood pressure).  So, every 5 years it does a job analysis of its certificants to see what they are doing and what is important.  This is then converted to test blueprints.  Items are re-mapped if needed, but most likely do not need it because there are probably only minor changes to the blueprints.  Then a new cutscore is set with the modified-Angoff method, and the test is delivered this year.  It is delivered again next year, but equated to this year rather than starting again.  However, the item statistics are still analyzed, which leads to a new cycle of revising items and publishing a new form for next year.

 

Example of Psychometrics in Action

Here is some output from our Iteman software.  This is deeply analyzing a single question on English vocabulary, to see if the student knows the word alleviate.  About 70% of the students answered correctly, with a very strong point-biserial.  The distractor P values were all in the minority and the distractor point-biserials were negative, which adds evidence to the validity.  The graph shows that the line for the correct answer is going up while the others are going down, which is good.  If you are familiar with item response theory, you’ll notice how the blue line is similar to an item response function.  That is not a coincidence.

[Iteman output: item analysis for the “alleviate” vocabulary item]

 

Now, let’s look at another one, which is more interesting.  Here’s a vocab question about the word confectioner.  Note that only 37% of the students get it right… even though there is a 25% chance just of guessing!!!  However, the point-biserial discrimination remains very strong at 0.49.  That means it is a really good item.  It’s just hard, which means it does a great job to differentiate amongst the top students.

[Iteman output: item analysis for the “confectioner” vocabulary item]

 

Psychometrics looks fun!  How can I join the band?

You will need a graduate degree.  I recommend you look at the NCME website with resources for students.  Good luck!

Already have a degree and looking for a job?  Here’s the two sites that I recommend:

NCME – Also has a job listings page that is really good (ncme.org)

Horizon Search – Headhunter for Psychometricians and I/O Psychologists

Digital Badges

Digital badges (aka ebadges) have emerged in today’s digital world as a powerful tool for recognizing and showcasing individuals’ accomplishments in an online format which is more comprehensive, immediate, brandable, and easily verifiable compared to traditional paper certificates or diplomas. In this blog post, we will delve into the world of digital badges, explore best badging practices, discuss their advantages in education, and highlight examples of badge-awarding platforms.

Digital Badges and Their Utilization

Digital badges are visual representations or icons used to recognize and verify an individual’s achievement or mastery of a particular skill or knowledge area in the online space. They serve as a form of digital credentialing or micro-credentialing, providing a way for individuals to display their skills, knowledge, or accomplishments in a specific domain, helping them stand out and gain recognition in the digital landscape.

Digital badges often contain metadata, such as the name of the issuing organization, a brief description of the achievement, criteria for earning the badge, and evidence of the individual’s accomplishment. This metadata is embedded within the badge image using a technology called Open Badges, allowing anyone to verify the authenticity and details of the badge.
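To make the embedded metadata concrete, here is an illustrative sketch of the kind of assertion data a badge might carry; the field names are loosely modeled on the Open Badges format, and every name, URL, and value below is an invented example rather than a specification.

```python
# Illustrative sketch of badge assertion metadata, loosely modeled on the
# Open Badges format. All names, URLs, and values below are invented examples.
import json

badge_assertion = {
    "type": "Assertion",
    "id": "https://badges.example.org/assertions/12345",
    "badge": "https://badges.example.org/badges/data-analytics-level-1",
    "recipient": {"type": "email", "identity": "learner@example.com", "hashed": False},
    "issuedOn": "2023-05-29T00:00:00Z",
    "issuer": "Example Assessment Institute",
    "criteria": "Passed the Data Analytics Level 1 exam",
    "evidence": "https://badges.example.org/evidence/12345",
}

print(json.dumps(badge_assertion, indent=2))
```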

Digital badges are typically issued and displayed electronically, which makes them easily shareable and accessible across various digital platforms, such as social media profiles, resumes, portfolios, or online learning platforms. They are widely used in various contexts, including education, professional development, training programs, online courses, and gamification.

Best Badging Practices

To ensure the credibility of digital badges, it is important to adhere to the most effective badging practices. Here are some key practices to consider:

  • Clearly Defined Criteria: Badges should have well-defined criteria that outline the specific skills or achievements required to earn the badge. This ensures that the badge holds value and meaning.
  • Authenticity and Verification: Badges should incorporate metadata via technologies like Open Badges, enabling easy verification of the badge’s authenticity and details. This helps maintain trust and credibility in the digital badge ecosystem.
  • Issuer Credibility: Reputable organizations, institutions, or experts in the field should issue badges. The credibility of the issuer adds value to the badge and increases its recognition.
  • Visual Design: Badges should be visually appealing and distinct, making them easily recognizable and shareable. Thoughtful design elements can enhance the badge’s appeal and encourage individuals to display them proudly.

Advantages of Using Digital Badges in Education

Digital badges offer numerous advantages in educational settings, transforming the way achievements and skills are recognized and valued. Below you may find some key advantages listed:

  • Granular Skill Recognition: Badges provide a way to recognize and demonstrate specific skills or achievements that might not be captured by traditional grades or degrees. This allows individuals to highlight their unique strengths and expertise.
  • Motivation and Engagement: Badges can act as a motivational tool, driving learners to actively pursue goals, and master new skills. The visual nature of badges and the ability to display them publicly create a sense of achievement and pride.
  • Portable and Shareable: Digital badges can be easily shared across various platforms, such as social media profiles, resumes, or online portfolios. This increases the visibility of accomplishments and facilitates networking and professional opportunities.
  • Lifelong Learning: Badges promote a culture of lifelong learning by recognizing and acknowledging continuous skill development. Individuals can earn badges for completing online courses, attending workshops, or acquiring new competencies, fostering a commitment to ongoing personal and professional growth.
  • Brand enhancement: Badges provide exposure to the brand of the issuing institution.
  • Turnaround time: Badges can often be available immediately on a profile page after the credential is awarded.
  • Verifiability: Digital badges are easily verifiable when using an appropriate platform/technology. This helps the consumer, hiring manager, or other stakeholder to determine if a person has the skills and knowledge that are relevant to the credential.

Examples of Badge-Awarding Platforms


Several platforms have emerged to facilitate the creation, issuance, and display of digital badges. Here are a few notable examples:

  • Credly is a widely used platform that allows organizations to design, issue, and manage digital badges. It also provides features for verifying badges and displaying them on various online platforms.
  • Open Badge Factory is an open-source platform that enables badge creation, management, and verification. It offers customizable badge templates and integration with various learning management systems.
  • Badgr is a platform that supports the creation and awarding of digital badges. It provides features for badge design, verification, and integration with various learning management systems (LMS) and online platforms.
  • BadgeCert is another platform which supports digital badges and can be integrated into other platforms.

 

Certification Management System: Streamline Credential Management

A Certification Management System (CMS) or Credential Management System plays a pivotal role in streamlining the key processes surrounding the certification or credentialing of people – that is, attesting that they have certain knowledge or skills in a profession.  It helps with ensuring compliance, reducing business operation costs, and maximizing the value of certifications.  In today’s fast-paced and competitive business landscape, managing certifications and credentials efficiently is crucial for organizations.  In this article, we explore the significance of adopting a CMS and its benefits for both nonprofits and businesses.

What is a certification management system?

A certification management system is an enterprise software platform that is designed specifically for organizations whose primary goal is to award credentials to people for professional skills and knowledge.  Such an organization is often a nonprofit like an Association or Board, and is sometimes called an “awarding body” or similar term.   However, nowadays there are many for-profit corporations which offer certifications.  For example, many IT/Software companies will certify people on their products.  Here’s a page that does nothing but list the certifications offered by SalesForce!

These organizations often offer various credentials within a field.

  • Initial certifications (high stakes and broad) – Example: Certified Widgetmaker
  • Certificates or microcredentials – Example: Advanced Widget Design Specialist
  • Recertification exams – Example: taking a test every 5 years to maintain your Certified Widgetmaker status
  • Benchmark/progress exams – Example: Widgetmaker training programs are 2 years long and you take a benchmark exam at the end of Year 1
  • Practice tests: Example: old items from the Certified Widgetmaker test provided in a low-stakes fashion for practice

A credentialing body will need to manage the many business and operation aspects around these.  Some examples:

  • Applications, tracking who is applying for which
  • Payment processing
  • Eligibility pathways and documentation (e.g., diplomas)
  • Pass/Fail results
  • Retake status
  • Expiration dates

There will often be functionality that makes these things easier, like automated emails to remind the professionals when their certification is expiring so they can register for their Recertification exam.

 

Reasons to use a certification management system

  1. Enhancing Compliance and Regulatory Adherence: In industries with stringent compliance requirements, such as healthcare, finance, and IT, adhering to regulations and maintaining accurate records of certifications is paramount. A comprehensive CMS provides a centralized repository where organizations can securely store, track, and manage certifications and credentials. This ensures compliance with industry standards, regulatory bodies, and audits. With automated alerts and renewal notifications, organizations can stay on top of certification expirations, reducing the risk of non-compliance and associated penalties.  A certification management system will also help your organization achieve accreditation like NCCA or ANSI/ISO 17024.
  2. Streamlining Certification Tracking and Renewals: Managing certifications manually can be a time-consuming and error-prone process. A CMS simplifies this task by automating certification tracking, renewal reminders, and verification processes. By digitizing the management of certifications, organizations can save valuable time and resources, eliminating the need for tedious paperwork and manual record-keeping. Additionally, employees can easily access their certification status, track progress, and initiate renewal processes through a user-friendly interface, enhancing transparency and self-service capabilities.
  3. Improving Workforce Efficiency and Development: An efficient CMS empowers organizations to optimize their workforce’s knowledge and skill development. By capturing comprehensive data on certifications, skills, and training, organizations gain valuable insights into their employees’ capabilities. This information can guide targeted training initiatives, succession planning, and talent management efforts. Moreover, employees can leverage the CMS to identify skill gaps, explore potential career paths, and pursue professional development opportunities. This aligns individual aspirations with organizational goals, fostering a culture of continuous learning and growth.
  4. Enhancing Credential Verification and Fraud Prevention: Verifying the authenticity of certifications is critical, especially in industries where credentials hold significant weight. A CMS with built-in verification features enables employers, clients, and other stakeholders to authenticate certifications quickly and accurately. By incorporating advanced security measures, such as blockchain technology or encrypted digital badges, CMSs provide an added layer of protection against fraud and credential forgery. This not only safeguards the reputation of organizations but also fosters trust and confidence among customers, partners, and regulatory bodies.

 

Of course, the bottom line is that a certification management system will save money, because this is a lot of information for the awarding body to track, and it is mission-critical.

 

Conclusion

Implementing a Certification Management System or Credential Management System is a strategic investment for organizations seeking to streamline their certification processes and maximize their value. By centralizing certification management, enhancing compliance, streamlining renewals, improving workforce development, and bolstering credential verification, a robust CMS empowers organizations to stay ahead in a competitive landscape while ensuring credibility and regulatory adherence.

ANSI/ISO 17024 Accreditation

ANSI/ISO 17024 accreditation is an internationally recognized standard for the accreditation of personnel certification bodies. ANSI stands for the American National Standards Institute, while ISO refers to the International Organization for Standardization. The portion of ANSI which carries out the accreditation process is the ANSI National Accreditation Board (ANAB).

What does ANSI/ISO 17024 cover?

ANSI/ISO 17024 specifies the requirements for bodies operating certification programs for individuals, ensuring that the certification processes are fair, valid, and reliable. The standard outlines the general principles and requirements for the certification of personnel across various fields, including but not limited to healthcare, information technology, engineering, and safety.

The standard covers a wide range of aspects related to certification bodies, including:

  1. Impartiality and independence: Certification bodies must demonstrate impartiality and avoid any conflicts of interest.
  2. Certification program development: The standard sets criteria for developing certification programs, including defining competencies, establishing eligibility requirements, and developing examination processes.
  3. Examination processes: It outlines guidelines for the design, development, and administration of examinations to assess individuals’ knowledge, skills, and competencies.
  4. Certification process: The standard addresses the application process, evaluation of candidates, decision-making on certification, and ongoing certification maintenance.
  5. Management system requirements: ANSI/ISO 17024 includes requirements for the management system of the certification body, including document control, record keeping, and continual improvement processes.

What does ANSI/ISO 17024 mean?

Accreditation to ANSI/ISO 17024 provides assurance to stakeholders that the certification programs and processes are conducted in a consistent, competent, and reliable manner. It enhances the credibility and acceptance of certifications issued by accredited certification bodies, helping individuals demonstrate their professional competence and expertise in their respective fields.

Should my organization pursue accreditation?

That is a business question for you.  In some cases, accreditation is required; in some professions, there may be laws under which candidates do not receive federal funding, or their certifications are not recognized, unless the certification is accredited.  However, for many professions, accreditation is optional.  In those cases, if there are two certification bodies, it is a competitive advantage for one to become accredited.  But for small certification bodies with no competitors, accreditation is often not worth the great expense.

Note that ANSI/ISO 17024 is not the only show in town.  The National Commission for Certifying Agencies also accredits certifications, though it accredits individual certification programs rather than certification bodies.

Do I need to do all this work myself?

No!  Much of it, yes, you need to do, because no one else has the specific knowledge of your profession and content area.  But we can certainly help with some portions, especially the exam development and psychometrics.  We can also provide the item banking and exam delivery platform to securely administer your exams and report the results.

Microcredentials: What are they?

Microcredentials are short, focused, and targeted educational or assessment-based certificate programs that offer learners a way to acquire specific skills or knowledge in a particular field.  In today’s fast-paced and rapidly evolving job market, traditional degrees may not always be enough to stand out among the competition, and they often take too long to achieve. Microcredentials have emerged as a promising solution to this problem.

In addition, there are other terms like “nano-degrees” or “digital badges” which sometimes overlap, but there are no agreed-upon definitions that differentiate them.  However, Badges are usually considered to be even smaller than a Microcredential or Nano-Degree.

In most cases, they are tied to educational programs, and are therefore quite distinct from certification or licensure, which are assessment-focused.

Why Microcredentials?

Microcredentials have become increasingly popular in recent years because they offer several benefits to both learners and employers. For learners, they provide a more flexible and cost-effective way to gain new skills or upgrade existing ones without committing to a full-time degree program. Additionally, microcredentials allow learners to demonstrate their competency in a specific skill or knowledge area, which can help them stand out in a crowded job market.  Given that they are less expensive and of shorter duration, learners can often receive “more bang for their buck” in terms of adding to their skillset and improving job prospects.

For employers, microcredentials offer a way to identify job candidates who possess the specific skills they need. By looking for candidates who have earned them in relevant areas, employers can quickly narrow down the pool of applicants and find the most qualified candidates. Additionally, they can be used by employers to upskill their existing workforce, helping employees stay current with the latest developments in their field.

How to Earn Microcredentials

Microcredentials can be earned in a variety of ways, including online courses, workshops, boot camps, and other short-term training programs. These programs typically focus on a specific topic, such as data analytics, project management, or digital marketing. To earn a microcredential, learners must complete a series of assessments or projects that demonstrate their mastery of the subject matter. Once earned, microcredentials are typically displayed as digital badges that can be shared on social media profiles, online resumes, or other platforms.

Types of Microcredentials

There are several types of microcredentials available, including skill-based, competency-based, and stackable credentials. Skill-based microcredentials focus on developing specific skills or knowledge areas, such as coding, graphic design, or language proficiency. Competency-based microcredentials, on the other hand, assess a learner’s ability to perform a specific task or set of tasks, such as managing a team or conducting market research. Finally, stackable microcredentials allow learners to build on existing credentials by earning additional microcredentials in related areas, creating a pathway to a full degree program.

Examples

Consider the field of marketing.  Traditionally, you might go to a university for a 4-year degree in Marketing, Business, or Communications.  This is a very broad approach, and of course takes 4 years (or more for some people).  Alternatively, there are now many options where you can get a microcredential focused specifically on Digital Marketing, a very in-demand skill set.  A great example of this is Oregon State University, but a quick googling will show you many more.  Some providers will even get more specific, such as Social Media Management, Search Engine Optimization, or even specifically WordPress.  But then these are typically branded as a Badge rather than a Nano-Degree or Microcredential.

Conclusion

In conclusion, microcredentials offer a flexible and cost-effective way to gain new skills or upgrade existing ones, making them an attractive option for both learners and employers. By focusing on specific skills or knowledge areas, they allow learners to demonstrate their competence in a particular field, while also providing a way for employers to identify the most qualified job candidates. As the job market continues to evolve, microcredentials are likely to become an increasingly important part of the educational landscape, providing learners with the tools they need to succeed in their careers.

The Graded Response Model – Samejima (1969)

Samejima’s (1969) Graded Response Model (GRM, sometimes SGRM) is an extension of the two parameter logistic model (2PL) within the item response theory (IRT) paradigm.  IRT provides a number of benefits over classical test theory, especially regarding the treatment of polytomous items; learn more about IRT vs. CTT here.

 

What is the Graded Response Model?

The GRM is a family of latent trait mathematical models for scoring graded responses (a latent trait is a variable that is not directly measurable, e.g., a person’s level of neuroticism, conscientiousness, or openness) that was developed by Fumiko Samejima (1969) and has been utilized widely since then. The GRM is also known as the Ordered Categorical Responses Model, as it deals with ordered polytomous categories that can relate to both constructed-response and selected-response items where examinees can obtain various levels of scores, such as 0-4 points. In this case, the categories are 0, 1, 2, 3, and 4, and they are ordered. ‘Ordered’ means what it says: there is a specific order or ranking of responses. ‘Polytomous’ means that the responses are divided into more than two categories, i.e., not just correct/incorrect or true/false.

 

When should I use the GRM?

This family of models is applicable when polytomous responses to an item can be classified into more than two ordered categories (something more than correct/incorrect), such as different degrees of achievement in a solution to a problem, levels of agreement on a Likert scale, or the frequency of a certain behavior. The GRM covers both homogeneous and heterogeneous cases; the homogeneous case implies that the discriminating power underlying the thinking process is constant throughout the range of the attitude or reasoning.

Samejima (1997) highlights the reasonableness of employing the GRM when examinees are scored based on degrees of correctness (e.g., incorrect, partially correct, correct) or when measuring people’s attitudes and preferences, as in Likert-scale attitude surveys (e.g., strongly agree, agree, neutral, disagree, strongly disagree). For instance, the GRM can be used in an extroversion scoring model that treats “I like to go to parties” as a high-difficulty statement and “I like to go out for coffee with a close friend” as an easy one.

Here are some examples of assessments where GRM is utilized:

  • Survey attitude questions using responses like ‘strongly disagree, disagree, neutral, agree, strongly agree’
  • Multiple response items, such as a list of 8 animals and student selects which 3 are reptiles
  • Drag and drop or other tech enhanced items with multiple points available
  • Letter grades assigned to an essay: A, B, C, D, and E
  • Essay responses graded on a 0-to-4 rubric

 

Why to use GRM?

There are three general goals of applying GRM:

  • estimating an ability level/latent trait
  • estimating an adequacy with which test questions measure an ability level/latent trait
  • evaluating a probability that a particular test domain will receive a specific score/grade for each question

Using item response theory in general (not just the GRM) provides a host of advantages.  It can help you validate the assessment.  Using the GRM can also enable adaptive testing.

 

How to calculate a response probability with the GRM?

There is a two-step process for calculating the probability that an examinee selects a certain category on a given question. The first step is to find the probability that an examinee with a given ability level responds in category m or higher on the question:

$$P^{*}_{m}(\Theta) = \frac{1}{1 + e^{-1.7a(\Theta - b_m)}}$$

where

1.7  is the scale factor

a  is the discrimination of the question

bm  is the boundary (location) parameter for category m, i.e., the point on the ability scale where the probability of responding in category m or higher equals 0.5

e  is the constant that is approximately equal to 2.718

Θ  is the ability level

P*m(Θ) = 1  if  m = 1,  since the probability of responding in the lowest category or higher is a certain event

P*m(Θ) = 0  if  m = M + 1,  since the probability of responding in a category above the highest is zero.

 

The second step is to find the probability that an examinee responds in a given category:

$$P_{m}(\Theta) = P^{*}_{m}(\Theta) - P^{*}_{m+1}(\Theta)$$

This formula describes the probability of choosing a specific response to the question for each level of the ability it measures.
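Putting the two steps together, here is a minimal Python sketch of computing category probabilities for a single item; the discrimination and boundary parameters are hypothetical, and real estimates would come from IRT estimation software.

```python
# Minimal sketch of the two GRM steps for a single item. The discrimination (a)
# and boundary parameters (b) are hypothetical; real estimates come from IRT software.
import numpy as np

def grm_category_probs(theta, a, b, D=1.7):
    theta = np.atleast_1d(theta).astype(float)
    b = np.asarray(b, dtype=float)
    # Step 1: P*(theta) = probability of responding in category m or higher
    p_star = 1.0 / (1.0 + np.exp(-D * a * (theta[:, None] - b[None, :])))
    # Boundary conditions: P* = 1 below the lowest boundary, P* = 0 above the highest
    p_star = np.hstack([np.ones((theta.size, 1)), p_star, np.zeros((theta.size, 1))])
    # Step 2: category probability = difference of adjacent cumulative probabilities
    return p_star[:, :-1] - p_star[:, 1:]

# Example: a 5-category Likert item evaluated at three ability levels
probs = grm_category_probs(theta=[-2.0, 0.0, 2.0], a=1.2, b=[-1.5, -0.5, 0.4, 1.3])
print(probs.round(3))  # each row sums to 1.0
```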

 

How do I implement the GRM on my assessment?

You need item response theory software.  Start by downloading Xcalibre for free.  Below are outputs for two example items.

How to interpret this?  The GRM uses category response functions which show the probability of selecting a given response as a function of theta (trait or ability).  For item 6, we see that someone of theta -3.0 to -0.5 is very likely to select “2” on the Likert scale (or whatever our response is).  Examinees above -0.5 are likely to select “3” on the scale.  But on Item 10, the green curve is low and not likely to be chosen at all; examinees from -2.0 to +2.0 are likely to select “3” on the Likert scale, and those above +2.0 are likely to select “4”.  Item 6 is relatively difficult, in a sense, because no one chose “4.”

[Xcalibre output: category response functions for Item 6 and Item 10]

References

Keller, L. A. (2014). Item Response Theory Models for Polytomous Response Data. Wiley StatsRef: Statistics Reference Online.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 17(4), 2. doi:10.1002/j.2333-8504.1968.tb00153.x.

Samejima, F. (1997). Graded response model. In W. J. van der Linden and R. K. Hambleton (Eds), Handbook of Modern Item Response Theory, (pp. 85–100). Springer-Verlag.

What is automated essay scoring?

Automated essay scoring (AES) is an important application of machine learning and artificial intelligence to the field of psychometrics and assessment.  In fact, it’s been around far longer than “machine learning” and “artificial intelligence” have been buzzwords in the general public!  The field of psychometrics has been doing such groundbreaking work for decades.

So how does AES work, and how can you apply it?

 

 

 

What is automated essay scoring?

The first and most critical thing to know is that there is no algorithm that "reads" the student essays.  Instead, you need to train an algorithm.  That is, if you are a teacher and don't want to grade your essays, you can't just throw them into an essay scoring system.  You have to actually grade the essays (or at least a large sample of them) and then use that data to fit a machine learning algorithm.  Data scientists use the term train the model, which sounds complicated, but if you have ever done simple linear regression, you have experience with training models.

 

There are three steps for automated essay scoring:

  1. Establish your data set (collate student essays and grade them).
  2. Determine the features (predictor variables that you want to pick up on).
  3. Train the machine learning model.

 

Here’s an extremely oversimplified example:

  1. You have a set of 100 student essays, which you have scored on a scale of 0 to 5 points.
  2. The essay is on Napoleon Bonaparte, and you want students to know certain facts, so you want to give them “credit” in the model if they use words like: Corsica, Consul, Josephine, Emperor, Waterloo, Austerlitz, St. Helena.  You might also add other Features such as Word Count, number of grammar errors, number of spelling errors, etc.
  3. You create a map of which students used each of these words, as 0/1 indicator variables.  You then fit a multiple regression with 7 predictor variables (whether they used each of the 7 words) and the 5-point scale as your criterion variable.  Finally, you use this model to predict each student's score from the essay text alone.

 

Obviously, this example is too simple to be of real use, but the same general idea is applied in massive, complex studies.  Establishing the core features (predictor variables) can be far more sophisticated, and the models are typically much more complex than multiple regression (neural networks, random forests, support vector machines).
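
To make this toy example concrete, here is a minimal Python sketch of that regression approach, assuming scikit-learn; the keyword list, essays, and scores are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical keywords we want to give "credit" for (the Napoleon example)
KEYWORDS = ["corsica", "consul", "josephine", "emperor",
            "waterloo", "austerlitz", "st. helena"]

def essay_features(text):
    """0/1 indicator for each keyword, plus a word count."""
    lower = text.lower()
    return [1 if kw in lower else 0 for kw in KEYWORDS] + [len(text.split())]

# Invented training data: essays already graded by a human on a 0-5 scale
essays = [
    "Napoleon was born in Corsica and later crowned himself Emperor.",
    "He lost at Waterloo and was exiled far from France.",
    "An essay that mentions none of the key facts.",
]
human_scores = [5, 3, 1]

X = np.array([essay_features(e) for e in essays])
y = np.array(human_scores)

model = LinearRegression().fit(X, y)   # "training the model"

# Predict a score for a new, ungraded essay from its text alone
new_essay = "After Austerlitz, the Emperor seemed unstoppable."
print(round(model.predict([essay_features(new_essay)])[0], 2))
```

A real system would of course use many more graded essays, richer features, and a stronger model, but the workflow is the same: features in, human scores as the criterion, and a fitted model out.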

Here’s an example of the very start of a data matrix for features, from an actual student essay.  Imagine that you also have data on the final scores, 0 to 5 points.  You can see how this is then a regression situation.

Examinee | Word Count | i_have | best_jump | move | and_that | the_kids | well
1        | 307        | 0      | 1         | 2    | 0        | 0        | 1
2        | 164        | 0      | 0         | 1    | 0        | 0        | 0
3        | 348        | 1      | 0         | 1    | 0        | 0        | 0
4        | 371        | 0      | 1         | 1    | 0        | 0        | 0
5        | 446        | 0      | 0         | 0    | 0        | 0        | 2
6        | 364        | 1      | 0         | 0    | 0        | 1        | 1

 

How do you score the essay?

If the essays are on paper, then automated essay scoring won't work unless you have extremely good character-recognition software that converts them into a digital database of text.  Most likely, you delivered the exam as an online assessment and already have the database.  If so, your platform should include functionality to manage the scoring process, including multiple custom rubrics.  An example from our FastTest platform is provided below.

[Screenshot: essay marking with custom rubrics in the FastTest platform]

Some rubrics you might use:

  • Grammar
  • Spelling
  • Content
  • Style
  • Supporting arguments
  • Organization
  • Vocabulary / word choice

 

How do you pick the Features?

This is one of the key research problems.  In some cases, it might be something similar to the Napoleon example.  Suppose you had a complex item on Accounting, where examinees review reports and spreadsheets and need to summarize a few key points.  You might pull out a few key terms (mortgage amortization) or numbers (2.375%) and treat them as Features.  I saw a presentation at Innovations In Testing 2022 that did exactly this.  Think of it as giving the students "points" for using those keywords, although, because you are using complex machine learning models, it is not simply awarding a single unit point; each keyword contributes to a regression-like model with a positive slope.

In other cases, you might not know.  Maybe it is an item on an English test delivered to English language learners, and you ask them to write about what country they want to visit someday.  You have no idea what they will write about.  But what you can do is tell the algorithm to find the words or terms that are used most often, and try to predict the scores with those.  Maybe words like "jetlag" or "edification" show up in essays from students who tend to get high scores, while words like "clubbing" or "someday" tend to be used by students with lower scores.  The AI might also pick up on spelling errors.  I worked as an essay scorer in grad school, and I can't tell you how many times I saw kids use "ludacris" (the name of an American rap artist) instead of "ludicrous" when trying to describe an argument; they had literally never seen the word used or spelled correctly.  Maybe the AI model learns to give that a negative weight.  That's the next section!
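
If you do not know the features in advance, you can let the data suggest them.  Here is a rough Python sketch, assuming a recent version of scikit-learn, that pulls the most frequent terms out of a tiny, invented set of essays and checks how strongly each one relates to the human scores:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Invented essays and human scores, purely for illustration
essays = [
    "I would visit Japan someday because travel brings edification",
    "I want to go clubbing in Spain someday with my friends",
    "Visiting Iceland would be amazing despite the jetlag",
    "Someday I will just go to the beach and relax",
]
scores = np.array([5, 2, 4, 3])

# Let the algorithm find the most common words/terms as candidate features
vectorizer = CountVectorizer(max_features=20, stop_words="english")
X = vectorizer.fit_transform(essays).toarray()

# See which candidate terms appear to be predictive of the human scores
for term, counts in zip(vectorizer.get_feature_names_out(), X.T):
    if counts.std() > 0:
        r = np.corrcoef(counts, scores)[0, 1]
        print(f"{term:12s} correlation with score: {r:+.2f}")
```

In practice you would do this on thousands of essays and feed the surviving features into the machine learning model rather than eyeballing correlations.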

How do you train a model?


Well, if you are familiar with data science, you know there are TONS of models, and many of them have a bunch of parameterization options.  This is where more research is required: which model works best on your particular essay prompt and doesn't take 5 days to run on your data set?  That's for you to figure out.  There is a trade-off between simplicity and accuracy.  Complex models might be more accurate but take days to run; a simpler model might take 2 hours with a 5% drop in accuracy.  It's up to you to evaluate.

If you have experience with Python and R, you know that there are many packages which provide this analysis out of the box – it is a matter of selecting a model that works.
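
As a rough illustration of that trade-off, the sketch below (again assuming scikit-learn, with a simulated feature matrix and scores standing in for real essay data) trains a simple linear model and a random forest on the same features and compares run time and agreement with the human scores:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

# Simulated feature matrix (think keyword indicators, word counts, etc.) and 0-5 scores
rng = np.random.default_rng(42)
X = rng.integers(0, 5, size=(200, 10))
y = np.clip(X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(0, 1, 200), 0, 5).round()

for name, model in [("ridge regression", Ridge()),
                    ("random forest", RandomForestRegressor(n_estimators=200))]:
    start = time.time()
    predicted = cross_val_predict(model, X, y, cv=5)  # out-of-sample predictions
    elapsed = time.time() - start
    r = np.corrcoef(predicted, y)[0, 1]               # agreement with "human" scores
    print(f"{name}: correlation {r:.2f}, run time {elapsed:.2f} seconds")
```

The numbers themselves are meaningless here; the point is the evaluation pattern: hold data out, compare each candidate model's agreement with human scores, and weigh that against how long it takes to run.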

How well does automated essay scoring work?

Well, as psychometricians love to say, "it depends."  You need to do the model-fitting research for each prompt and rubric; it will work better for some than for others.  The general consensus in the research is that AES algorithms work about as well as a second human rater, and therefore serve very well in that role.  But you shouldn't use them as the only score; of course, that's impossible in many cases.

Here’s a graph from some research we did on our algorithm, showing the correlation of human to AES.  The three lines are for the proportion of sample used in the training set; we saw decent results from only 10% in this case!  Some of the models correlated above 0.80 with humans, even though this is a small data set.   We found that the Cubist model took a fraction of the time needed by complex models like Neural Net or Random Forest; in this case it might be sufficiently powerful.

Automated essay scoring results

 

How can I implement automated essay scoring without writing code from scratch?

There are several products on the market.  Some are standalone, while others are integrated with a human-based essay scoring platform.  ASC's platform for automated essay scoring is SmartMarq.  It is currently available as a standalone tool, as you can see below, making it extremely easy to use.  It is also in the process of being integrated into our online assessment platform, alongside human scoring, to provide an efficient and easy way of obtaining a second or third rater for QA purposes.

Want to learn more?  Contact us to request a demonstration.

 

SmartMarq automated essay scoring

What ChatGPT brings us for item writing https://assess.com/chatgpt-item-writing/

Even those who do not consider themselves technology buffs have heard about ChatGPT. Today, everybody is talking about ChatGPT, the modern megastar of the artificial intelligence (AI) and machine learning (ML) industries. This is how ChatGPT introduces itself:

“I am an AI language model called ChatGPT. My purpose is to assist and communicate with users through text-based conversations, and to provide helpful and informative responses to their questions and inquiries.”

I grasped its purpose straight away but still decided to interact with ChatGPT more closely while working on item writing for a Math test. I opened the ChatGPT site and started a new conversation; it was stupidly simple. Below I share my observations of ChatGPT's features, classifying them as strengths and imperfections.  I will then discuss my experience in item writing with ChatGPT.

 

Strengths of ChatGPT


ChatGPT is an intelligent chatbot that

  • is polite,
  • declines inappropriate requests,
  • generates responses very quickly,
  • allows the user to provide follow-up adjustments,
  • remembers what the user said earlier in the conversation,
  • can significantly reduce item-writing time for subject matter experts.

 

Imperfections of ChatGPT

General ones

  • The free version of ChatGPT limits the amount of text it produces per response (around 2,000 characters). Therefore, if, like me, you need to construct an entire test, you have to task the chatbot in portions or keep asking it to finish the assignment you have given. Your interaction with ChatGPT will be more efficient if you plan your assignment carefully.
  • ChatGPT has limited knowledge of the world after 2021. You need to consider this fact when designing a prompt.
  • ChatGPT can generate inaccurate responses to questions. You may find several examples below.

 

Math Item Writing

ChatGPT's Math skills are not excellent. Below are some telling examples:

  • ChatGPT "thinks" that 11/2 equals 2 and that 6/5 equals 1;
  • ChatGPT does not realize that a single root cannot be the solution of a quadratic equation that has two distinct roots, even while it lists both roots among the answer options;
  • ChatGPT makes mistakes when identifying the correct response among the options it has generated;
  • ChatGPT makes up distractors (incorrect answers in multiple-choice questions) without considering the mistakes that test takers are actually likely to make;
  • ChatGPT sometimes provides a correct solution to a question but does not include it in the answer options;
  • ChatGPT cannot differentiate between levels of difficulty, even when the prompt provides a sample.

 

Let’s have a look at a couple of Math questions written by ChatGPT according to my prompts.

Prompt 1: ChatGPT, please, write 1 multiple-choice Math question of low difficulty for 16-17 year old high school American students for the following topic in Algebra: Linear equations in two variables. Multiple-choice questions must have only one correct answer (key) and three incorrect answers (distractors) based on test takers' incorrect reasoning. Please, provide an explanation to each distractor and a solution to the question.

Response 1:

Which point satisfies the equation y = -2x + 7?

A) (5, 1)

B) (-1, 5)

C) (2, 5)

D) (1, -5)

Explanation: To find a point that satisfies the equation, we can plug in values for x and y that make the equation true. For example, if we let x = 1, then y = -2(1) + 7 = 5. So, the point (1, 5) satisfies the equation.

Comment: ChatGPT did not include the correct response in the answer options, even though it solved the question correctly.

 

Prompt 2: ChatGPT, please, write 1 student-produced response Math question of medium difficulty for 16-17 year old high school American students for the following topic in Algebra: Systems of two linear equations in two variables. Student-produced response questions must have only one correct answer as a real number. Please, provide a solution to this question.

Response 2:

Solve the system of equations:

2x + 3y = 8

x – y = 1

Explanation: To solve the system of equations, we can use either substitution or elimination method. Using substitution, we can solve one equation for one variable and substitute it into the other equation. For example, we can solve the second equation for x and get x = y + 1. Then, we substitute x = y + 1 into the first equation and get 2(y + 1) + 3y = 8. Simplifying this equation gives us 5y + 2 = 8, which gives us y = 1. Substituting y = 1 into x = y + 1 gives us x = 2. Therefore, the solution to the system of equations is (2, 1).

Comment: Solving 5y + 2 = 8 does not give y = 1 but y = 6/5, or 1.2 (which would make x = 2.2). Therefore, the point (2, 1) is not the correct solution of this system of equations.

 

English Item Writing

ChatGPT's English skills are not as perfect as I thought they would be, given that it is (I suppose) a native English speaker. I have had some experience having ChatGPT design several tasks to test students' English skills. As expected, ChatGPT is more productive with chunks of text: I asked it to write some Reading and Writing questions for the SAT and IELTS, and the results were much better than those for Math. However, there were some shortcomings, such as grammar errors and poor lexical choices. For instance, ChatGPT suggested option A as the correct response, which is arguable:

Which of the following options best replaces the underlined phrase in the sentence: “The company’s profits have increased by 20% over the last quarter.”

A) The company’s profits have risen by 20% over the last quarter.

B) The company’s profits have grown by 20% over the last quarter.

C) The company’s profits have expanded by 20% over the last quarter.

D) The company’s profits have developed by 20% over the last quarter.

 

Russian Item Writing

I was curious how proficient ChatGPT is in Russian. Well, it knows how many letters the Russian alphabet contains and some other facts, but when I asked ChatGPT to write a question checking grammar, it fell short of my expectations: ChatGPT gave a wrong answer when asked to identify a gerund (деепричастие).

Какое из перечисленных слов является деепричастием? (Which of the following words is a gerund?)

A) Зеленеющий (turning green)

B) Шумно (noisily)

C) Летящий (flying)

D) Бежавший (having run)

Правильный ответ: C) Летящий – это деепричастие настоящего времени. Ошибочные ответы: A) – прилагательное, B) – наречие, D) – причастие прошедшего времени. (ChatGPT's answer, translated: Correct answer: C) Летящий is a present-tense gerund. Incorrect answers: A) is an adjective, B) is an adverb, D) is a past participle.)

 

Summary

 

Since AI-enabled tools do not have real brains and only regurgitate information based on what they "learned" from billions of data points, it is only to be expected that there will be some inaccuracies in their responses, even though those responses look human-like.

In its own introduction, ChatGPT announced straight away that it is intended to ASSIST us humans, not to DO the work for us. ChatGPT can provide helpful responses but is not capable of independent thought or emotion. Like any AI, ChatGPT is vulnerable to multiple issues, including bias and potential inaccuracies.

I would like to re-emphasize the point I outlined in the previous post: any AI-powered tool, no matter how "good" it is, needs a smart user to drive it. Otherwise, it is going to be a complete disaster! However, if users provide ChatGPT with a very well structured question and the right guidance, there is a high chance of obtaining an accurate response.
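
As a small illustration of that point, here is a rough Python sketch of sending a well-structured item-writing prompt programmatically. It assumes the openai Python package and its chat completions interface as they existed in early 2023, a placeholder API key, and an example model name; treat it as a sketch rather than a definitive recipe.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; requires your own OpenAI key

# A well-structured prompt: topic, audience, item type, and constraints are all explicit
prompt = (
    "Write 1 multiple-choice Algebra question of low difficulty on linear equations "
    "in two variables for 16-17 year old high school students. It must have exactly "
    "one correct answer (key) and three distractors based on plausible student errors. "
    "Provide the full solution, explain each distractor, and verify that the key "
    "actually appears among the answer options."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # example model name
    messages=[{"role": "user", "content": prompt}],
)

# The draft item still needs review by a subject matter expert before it is used
print(response["choices"][0]["message"]["content"])
```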

One more thing to consider is that Math is a way of thinking. Therefore, I would not expect AI-powered chatbots to be super-efficient in this field, at least in the near future.

Another important consideration is the legal situation. Current opinion in the field is this: Copyright laws are designed to protect creative development of new things by humans, but if ChatGPT is doing the heavy lifting, then it is not human-created and therefore not fully protected by copyright law. If the items are stolen, you might not have legal recourse. However, this topic is, of course, quite new and continues to evolve.

In the end, is it worth it to use ChatGPT for item writing? Absolutely. Even though there are errors which are quickly found, such as having two correct answers to a question or not having a key specified, these sorts of things are easily fixed. The average time to develop new items can be significantly reduced. Moreover, ChatGPT will continue to get even more powerful! So, stay tuned!

 
