Test Validity

Test validity for a pre-employment assessment is simply an objective measure that provides evidence that the test or personality assessment actually measures what it purports to measure.  Validation is not a stamp of approval from any governmental agency, but rather a study undertaken and directed by the test publisher in accordance with certain professional standards.

The Achiever employment assessment has been established and validated in accordance with the procedures described in "Standards for Educational and Psychological Testing," which is referred to in paragraph (2) 1607.6, "Minimum Standards for Evaluation," Federal Register Volume 35, dated Saturday, August 1, 1970.  It is therefore not discriminatory and is in compliance with E.E.O.C. and other Federal Regulations.

The Reliability and Validity Manual published by Candidate Resources, Inc., provides written confirmation that this employment test was professionally developed and validated in accordance with both the Construct and Criterion methods of validation.  Candidate Resources, Inc., will defend the validation and content of the Achiever for any company using this pre-employment assessment, but cannot assist any company that has misused or abused the Achiever.

There are five forms of validity:

  • Construct validity refers to the extent to which dimensions with similar names on different tests relate to one another.  Two dimensions that correlate highly on a personality test are not necessarily identical, but the correlation does provide reassurance that they are related and form a "construct," or part of the makeup of an individual (honesty, dependability, sociability, etc.), as it relates to actual job performance.

  • Concurrent validity is the approach whereby people who are successful within a given job, within the same company or industry, are evaluated and generally grouped into Top Third, Middle Third, and Bottom Third.  The assessment scores of the people in each of these groups are then compiled, and Job Benchmark Standards derived from the Top Third are used to hire, train or manage.

  • Predictive validity, sometimes called criterion validity, occurs when the employer hires people for a job based on normal hiring procedures (interviewing, reference checks, education/experience, etc.) and has them complete the pre-employment test at the same time, but does not use any data from it in the hiring decision.  After six months, or any other appropriate period of time, the pre-employment assessment is scored and benchmarks are established from the people hired into the new jobs who are still with the employer and whom the employer considers successful.  Job Benchmark Standards are thus established through the predictive approach.

  • Content validity represents job function testing, i.e., typing, mathematics, design, CPA exams, physical work endurance, etc.  Content validity is an appropriate strategy when the job domain is defined through job analysis by identifying the important behaviors, tasks, or knowledge and the assessment or test is a representative sample of behaviors, tasks or knowledge drawn from that domain.  The Uniform Guidelines on Employee Selection Procedures state that in order to demonstrate the content validity of a selection procedure, a user should show that the behaviors demonstrated in the selection procedure are a representative sample of the behaviors of the job in question or that the selection procedure provides a representative sample of the work product of the job.

  • Face validity is the simplest form of validity; it tells us only that the personality test or other assessment instrument appears (on the face of it) to measure what it is supposed to measure.  Simply put, a test composed of accounting problems would have face validity as a measure of the ability to succeed as an accountant.  Face validity is not very sophisticated because it is based only on the appearance of the measure.  Be careful: the market is flooded with personality tests that have only face validity.
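
The concurrent-validation grouping described above can be sketched in a few lines.  This is a minimal illustration only: the performance ratings, assessment scores, and the simple score-range benchmark are all invented for the example, not taken from any real instrument.

```python
# Hypothetical sketch of concurrent validation: rank current employees by a
# job-performance rating, split them into thirds, and use the top third's
# assessment scores as the Job Benchmark Standard.  All data is invented.

employees = [  # (performance_rating, assessment_score)
    (92, 78), (88, 81), (85, 74), (80, 69), (76, 72),
    (71, 60), (66, 58), (61, 63), (55, 49),
]

# Sort best performers first, then take the top third.
ranked = sorted(employees, key=lambda e: e[0], reverse=True)
third = len(ranked) // 3
top_third = ranked[:third]

# A simple benchmark: the assessment-score range of the top third.
scores = [score for _, score in top_third]
benchmark = (min(scores), max(scores))
print("Top-third benchmark score range:", benchmark)  # (74, 81)
```

A real study would use more employees and proper statistics (means, standard deviations, correlations with the criterion), but the grouping logic is the same.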

Saterfiel & Associates recommends that an organization establish and utilize a consistent, standard hiring process when making hiring decisions.  Information should be gathered in each step of that process so that specific, measurable data is available for the final hiring decision.  The pre-employment assessment used should count for no more than one-third of the hiring decision.  The preliminary interview, job history check, in-depth interview results and evaluation of education, experience and other pertinent factors should be considered as well.

Under the Uniform Guidelines on Employee Selection Procedures, adopted in the 1970s, validation of any part of the hiring process (assessments included) was no longer deemed necessary unless a company was not meeting the 4/5ths Rule in either its hiring or promotional practices.  Consequently, there are three optional approaches to using assessments:

  1. Establish your own successful employee Job Benchmark Standards by conducting a concurrent validation by job classification.  By tying job-related criteria to the aptitudes and personality dimensions of the assessment, the ultimate in validation and job relativity is assured.  Also, the Job Benchmark Standards simplify the interpretation and use of the pre-employment assessment in the hiring process, since it establishes a model for hiring, promotion and training purposes.

  2. Establish Job Benchmark Standards by job classification by answering job-related questions on the requirements of the job.

  3. Use Job Benchmark Standards comprised of successful people in jobs across the United States.  Then, after a reasonable period of time, compare the successful people selected to the Benchmark Standards used for that job for confirmation of correctness and/or modification of the benchmark standards.

The in-depth validation identified above is not necessary if you are in compliance with the 4/5ths Rule described below.  This rule was designated by the E.E.O.C. as a computation tool to establish a basis to show whether or not a company is having an adverse impact in their hiring practices.

EXAMPLE: Out of 120 job applicants (comprised of 80 white and 40 minority), 48 whites were hired and 12 minorities were hired.

48 out of 80 white applicants = 60%
12 out of 40 minority applicants = 30%

This hiring pattern results in adverse impact on minorities: the minority selection rate (30%) is only half the white selection rate (60%), whereas under the 4/5ths Rule the minority rate must be at least four-fifths (80%) of the white rate, i.e., at least 48% in this example.
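
The arithmetic in the example above can be written out explicitly.  The function name below is my own; the applicant and hire counts are the ones from the example.

```python
# 4/5ths (80%) Rule check: compare each group's selection rate and the
# ratio between them.  Counts are from the example above.

def adverse_impact_ratio(hired_minority, applicants_minority,
                         hired_majority, applicants_majority):
    """Return (minority rate, majority rate, impact ratio)."""
    minority_rate = hired_minority / applicants_minority
    majority_rate = hired_majority / applicants_majority
    return minority_rate, majority_rate, minority_rate / majority_rate

m_rate, w_rate, ratio = adverse_impact_ratio(12, 40, 48, 80)
print(f"minority rate: {m_rate:.0%}")  # 30%
print(f"majority rate: {w_rate:.0%}")  # 60%
print(f"impact ratio:  {ratio:.2f}")   # 0.50, below the 0.80 threshold
```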


Do validity studies guarantee accuracy?

No, they do not.  Validity and reliability go hand in hand.  I have taken a number of assessments with varied results.  Many were very far off target but all of them were supposedly validated instruments.  Let's take a look at how this frequently happens.

Let's say that a company has designed a test that measures communication styles, and that the personality assessment is very effective.  The validation studies for any assessment instrument are simply an objective measure providing evidence that the test actually measures what it purports to measure; in this particular case, that is communication styles.

Let's say that this particular personality test is later given certain external modifications so that it can also be sold as a pre-employment assessment.  The personality test is still backed by validity studies, but unless new studies are done, there are no validity studies supporting its use as a pre-employment assessment.  The use the original studies actually support is quite clear: measuring communication styles.

The Uniform Guidelines on Employee Selection Procedures specifically state that the evidence of validity and utility of the selection procedure must support its operational use.

Now let's take another look at where validity studies can be very misleading.  All pre-employment assessments that measure behaviors are based on certain theoretical models.  Some of those models are very simplistic because they are used more for training purposes than anything else.  Common sense tells us that human behavior is actually very complex but for training type applications we need to keep things simple.

If we look at the interpretative manual for one of these personality assessments, we will find out a little information about the behavioral model that was used.  "People that score high in dominance are often very ingenious, highly competitive and are generally very rigid in their thinking, extremely planful and have strong ethical standards.  Such people are often tenacious, tough-minded types that lack empathy and are often uncooperative."  Could such a test be validated in relation to the theoretical model?  Yes it could.

This personality assessment may prove very effective in training situations but its limitations are obvious when applied to a pre-employment assessment context.  If people that are high in dominance are very ingenious, then we would also have to assume that submissive individuals would be found to be lacking in mental ability.  From a practical standpoint, we know that there are no strong correlations between cognitive ability and dominance.  We also know that there are highly dominant individuals that have low ethical standards and submissive individuals that have high ethical standards.  From a practical standpoint we can also say that highly dominant people are not necessarily tough-minded, competitive or planful.

I have seen a similar personality test used in a number of pre-employment situations and I can tell you that the results are often very misleading.  In one situation, the personality assessment results were indicating that all of a company's employees were highly dominant.  By watching the behaviors and listening to those employees, I could tell that at least 50% of them were actually very low in dominance.  The amusing part was that those employees were all being put through training designed to try to lessen the negative effects of their supposedly high assertiveness.  From what I could observe, it would have been more effective to put them through assertiveness training!

I was able to test those same employees a short time later with a validated pre-employment assessment (The Scoreboard), and the assessment results confirmed my observations.  Very few of the employees were high in dominance; they mostly scored low to mid-range.  This illustrates the obvious disadvantage of simplistic behavioral models: a group of separate behaviors gets clumped together.  In this case the personality test was not measuring true dominance; it relied heavily on ethics and competitiveness to measure it.  It turned out that all of the employees in that particular job had very high character strength, or ethics.  What was actually being measured was ethics (flexibility), not dominance.


How valid is the validity study?

The fact that a personality test is backed by validity studies means very little in itself.  Some validation techniques are quite weak.  Some personality assessments are very simple: they contain a list of descriptors (such as friendly, outgoing, agreeable, competitive) and ask the respondent to circle each descriptor that describes him.  Such a technique is obviously of limited value, yet the validity studies may well indicate that the test is 90% accurate and reliable.  How is that possible?

Actually, the whole process can be very simplistic, so let's take a look at the total process (totally hypothetical, of course).  The candidate is given a test and instructed to circle the descriptors that he feels accurately describe him.  The testing company takes those descriptors, expands on their definitions, and then gives the report back to the candidate.  On the last page of the report is a questionnaire sheet that asks the candidate to rate the accuracy of the personality test report and mail it back to the test publisher.  The candidate circles the percentile scoring range that is applicable (90 to 100%, 80 to 89%, and so on).

Since this is basically a self-evaluation where the candidate has described himself, how likely is he to say that the resulting test report is less than 90% accurate, especially if the personality report only makes positive statements about him?  All of the responses that are received are then entered into a database that is used as an ongoing validity study.  The main advantage of this hypothetical personality test is that on the surface it is fast and cheap.  While I would question the overall effectiveness of such a program, it could offer a few advantages.  It would probably be a little more accurate and objective in most cases than when an interviewer directs the applicant to "tell me a little bit about yourself."  It's certainly quicker.  What I would have to question in relation to such a test would be whether or not the validity studies would meet the requirements of the "Uniform Guidelines On Employee Selection Procedures" as they pertain to the professional standards for validity studies.
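
To see how such self-ratings inflate the claimed accuracy, here is a minimal sketch of the hypothetical process described above.  The rating bands, their midpoints, and the response counts are all invented for illustration.

```python
# Hypothetical self-rating "validity study": candidates rate their own
# report's accuracy by percentile band; the publisher averages the band
# midpoints and quotes the result as the test's accuracy.  Because the
# report merely echoes each candidate's self-description, responses
# cluster at the top band.  All numbers are invented.

bands = {"90-100%": 95, "80-89%": 85, "70-79%": 75}  # band midpoints

responses = ["90-100%"] * 17 + ["80-89%"] * 2 + ["70-79%"] * 1

claimed_accuracy = sum(bands[r] for r in responses) / len(responses)
print(f"claimed accuracy: {claimed_accuracy:.1f}%")  # 93.0%
```

Nothing in this computation measures job performance; it measures only how much candidates agree with flattering restatements of their own answers.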


Was the Test Professionally Developed?

Validity studies are not really that comprehensible unless you have a good solid background in statistics.  If you are anything like most people, you are probably suspicious of statistics to begin with.  Start with one simple question.  Were the procedures used in validation consistent with generally accepted professional standards such as those described in the "Standards for Educational and Psychological Tests?"  A reputable test publisher will generally make such a statement somewhere in its brochures or validity manuals.

Secondly, you should be aware of one very important fact.  Just because a testing instrument was written by someone with a PhD does not necessarily mean that the instrument was professionally developed or that it will meet the generally accepted professional standards referenced previously.  Be cautious of any personality test that claims to have been written by a professional and then immediately tries to lead you to the conclusion that it was professionally developed, without referencing any validation or reliability studies.  The two concepts do not necessarily go hand in hand.

I once visited a web site that used this tactic very effectively.  It then followed up with a very long article under the heading of Validity.  After endless scrolling through the long article, it concluded without ever mentioning validity, except in the title.  Some people are slicker than cow guts on a door knob!

