How can you measure validity?




















Thus, underfit should receive more attention in the evaluation of the items. Differential item functioning means that the pattern of response probabilities for some items differs between groups of participants. For example, gender-related DIF would mean that men are more likely to agree to some items of a scale than women. If that were the case, the scale as a whole would be measuring a somewhat different construct for men than for women.

To assess DIF, the fit of two models was compared by means of the BIC: a main-effect model, which allows only for a main effect of the DIF variable across all items, and an interaction model, which additionally includes an interaction between the DIF variable and the items.

If the interaction model fits significantly better than the main-effect model, there is a significant amount of DIF, that is, the patterns of item difficulties vary between the levels of the DIF variable. Descriptive statistics (M, SD) for each item were calculated for each dimension separately. The item parameters of all items based on final scale assignment and item intercorrelations are reported in the Appendix in the Supplementary Material to this article.
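The model comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the log-likelihoods, parameter counts, and sample size below are hypothetical placeholders, and the helper names `bic` and `prefers_interaction` are made up for this example. It uses the standard definition BIC = k ln(n) - 2 ln(L), where the model with the lower BIC is preferred.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def prefers_interaction(main, interaction, n_obs):
    """Return True if the interaction model has the lower BIC (DIF indicated)."""
    bic_main = bic(main["loglik"], main["n_params"], n_obs)
    bic_inter = bic(interaction["loglik"], interaction["n_params"], n_obs)
    return bic_inter < bic_main


# Hypothetical values for illustration only (not taken from the study).
main_model = {"loglik": -5230.0, "n_params": 21}
interaction_model = {"loglik": -5215.0, "n_params": 40}

dif_indicated = prefers_interaction(main_model, interaction_model, n_obs=1000)
print("DIF indicated:", dif_indicated)
```

In this made-up case the interaction model has the higher log-likelihood, but its extra parameters are penalized enough that the main-effect model keeps the lower BIC, so no DIF would be indicated.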

The person-item-map in Figure 2 shows that the item parameters mostly covered the left-hand side of the middle range of the ability parameter distribution. For the ASTI to also differentiate well among high-scoring individuals, more items should be constructed that participants are less likely to agree with.

Performance measures of wisdom, such as the Berlin Wisdom Paradigm (Baltes and Staudinger), tend to produce far lower average levels of wisdom than self-report measures do. Next, four different models were estimated and compared by means of the BIC.

Furthermore, the comparison between five-dimensional and one-dimensional models suggested that the five-dimensional models generally fit the data better than the unidimensional ones. This may be due to the relatively low variance in the item responses.

The latent correlations supported the assumption of a five-dimensional structure of the ASTI. Self-knowledge and integration, peace of mind, and presence in the here-and-now and growth were quite highly correlated, which may suggest that they all represent an accepting and appreciative stance toward oneself and the experiences of one's life. Non-attachment and self-transcendence seem to be less closely related to the others (except for the correlation between non-attachment and peace of mind), possibly because they both, although in different ways, represent the individual's relationship with the external world: non-attachment describes an independence from other people and material things, and self-transcendence represents a connectedness with others and the world at large.

Both may not be part of everyone's experience of inner peace. Table 5. Next, we assessed the items of each dimension separately. In general, the infit and outfit statistics showed no misfit of items (see Table 6). Because of the complexity of analyses, the following results are reported for each dimension separately.
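To make the infit and outfit statistics concrete, here is a minimal sketch of the mean-square versions for a single item. It assumes the dichotomous Rasch model as a simplification (the study itself uses polytomous models such as the PCM), and it assumes person and item parameters are already estimated. The function names and the data are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 answers of each person to this item
    thetas:    person parameters, in the same order
    b:         item difficulty
    """
    sq_resid, variances = [], []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        variances.append(p * (1.0 - p))   # model variance of the response
        sq_resid.append((x - p) ** 2)     # squared residual
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted mean square.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# Hypothetical data for illustration only.
infit, outfit = item_fit([1, 0, 1, 1, 0], [0.5, -1.0, 1.2, 0.1, -0.3], b=0.0)
print(round(infit, 2), round(outfit, 2))
```

Both statistics have an expected value of about 1 when the data fit the model. Values clearly above 1 point to underfit (responses noisier than the model predicts), and values clearly below 1 point to overfit.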

Log likelihoods for both models are also reported, although likelihood ratio tests are likely to be somewhat oversensitive due to the large sample size. Table 6. The score curves suggest that, generally, the observed slopes were steeper than expected; the observed slope of item 10 also showed small deviations from the expected slope (see Figure 3). Therefore, the PCM was considered to fit the scale sufficiently well when item 10 was excluded. As explained earlier, DIF was assessed with respect to gender, age, and professional group.

However, the model comparisons in Table 7 indicated DIF for age and group. Note that item 10 had not received an unequivocal assignment by the experts either (see Table 3A). However, the magnitude of DIF was small and could therefore be ignored. When the analyses were repeated excluding item 10, the PCM fit the data well and there was no considerable DIF for any item.

Thus, the PCM was preferred. It is somewhat unclear, however, what causes the difference in fit, as the two examples of score curves in Figure 3 represent the general result for all items of this scale, indicating no substantial underfit or overfit.

It seems important to reanalyze the self-transcendence scale with new data. As Tables 6 and 7 show, no substantial DIF was found for this subscale.

The score curves (for an example, see Figure 3, left below) showed that the observed slopes were slightly higher than the expected slopes. It was also the only negative item (see Table 3C) in the subdimension. This subscale should also be reanalyzed once new data are available.

A re-analysis without item 14 showed that the PCM fit the data well. In the following, we first discuss the methodological implications of our research, and then, its substantive implications concerning the use of the ASTI to measure wisdom. This paper introduced the CSS procedure for evaluating content validity and discussed its advantages for the theory-based evaluation of scale items. In our experience, the method provides highly interesting practical and theoretical insights into target constructs.

It not only allows for evaluating and validating existing instruments and for improving the operationalization of a target construct, but also offers advantages for constructing new items for existing instruments or even for developing whole new instruments. The procedure can be applied in all subdisciplines of psychology and other fields, wherever the goal is to measure specific constructs. In addition, it does not matter which kinds of items e. The in-depth examination of the target construct is likely to increase the validity of any assessment.

We propose to follow certain quality criteria in studies using our approach. First, to optimize replicability, all steps should be carefully documented. A detailed documentation of procedures increases the validity of the study, irrespective of whether the data collection is more quantitative as in the present study or more qualitative e.

Second, the selection of experts is obviously crucial. Objectivity may be compromised if the group of experts is too homogeneous e. The instructions that the experts receive also need to be carefully written so as to avoid inducing any biases. Third, it is important that the expert judgments are complemented by actual data collected from a sample representative of the actual target population.

Our experience is that the data are often astonishingly consistent with the expert ratings; however, experts may also be wrong occasionally, for example, if they assume more complex interpretations of item content than the actual participants use. As we have demonstrated here, item response models may be particularly suited for testing hypotheses about individual items, but factor-analytic approaches are also very useful for testing hypotheses about the structural relationships between subscales.

For example, it would be worthwhile to test the current data for a bi-factor structure, i. Next steps in our work will include the comparison of these different methods of data analysis.

Another important future goal is the definition of a quantitative content-validity index based on the current method. In addition to utilizing the ASTI to demonstrate our approach, we believe that we have gained important insights about the ASTI, as well as about self-transcendence in general, from this study.

Through the exercise of assigning and reassigning the items to the dimensions of the construct and discussing the contradictions and difficulties we encountered, we gained a far deeper understanding of the measured construct itself.

Some of the ASTI items nicely evade this problem by being difficult to understand for individuals who have not achieved the respective levels of self-transcendence. The positive German version of this item had the lowest mean, i. It may be worthwhile to try to construct more items of this kind. For now, we have identified five subdimensions that include the 24 positive items (in German, 25) of the ASTI.

The 10 negative items measuring alienation were not included in this analysis, as negative items tend to be difficult to assign to the same dimension as positive items. We recommend leaving them in the questionnaire in order to increase the range of item content, but excluding them from score computations.

In further applications of the ASTI, should the five subdimensions be scored separately or should the total score be used?

Strong advocates of the Rasch model would certainly argue that using the total score across the subdimensions amounts to mixing apples and oranges. However, other self-report scales of wisdom such as the 3D-WS (Ardelt) or the SAWS (Webster) measure several dimensions of wisdom that are conceptually and empirically related to about the same degree as the subdimensions of the ASTI we have identified here.

Both these authors suggest using the mean across the subdimensions as an indicator of wisdom and considering as wise only individuals who have a high mean, i.

The same may be a good idea here: for an individual to be considered as highly wise (in the sense of self-transcendence), he or she would need to have high scores in all five subdimensions, as all of them are considered as relevant components of wisdom. For individuals with lower means, we recommend considering their profile across the subdimensions rather than computing a single score.
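The scoring recommendation above can be sketched as follows. This is only an illustration under stated assumptions: the response scale, the item values, and in particular the `high_cutoff` threshold are hypothetical placeholders, not values given by the authors, and `score_profile` is a made-up helper name.

```python
from statistics import mean


def score_profile(item_scores, high_cutoff=4.0):
    """Compute subdimension means and check whether all of them are high.

    item_scores: dict mapping subdimension name -> list of item responses
    high_cutoff: hypothetical threshold for a "high" subdimension mean
    """
    profile = {name: mean(values) for name, values in item_scores.items()}
    all_high = all(m >= high_cutoff for m in profile.values())
    return profile, all_high


# Hypothetical responses of one participant (illustration only).
participant = {
    "self-knowledge and integration": [4, 5, 4, 4],
    "peace of mind": [5, 4, 4],
    "non-attachment": [3, 2, 3, 3],
    "self-transcendence": [4, 4, 5],
    "presence in the here-and-now and growth": [5, 5, 4],
}

profile, all_high = score_profile(participant)
for name, value in profile.items():
    print(f"{name}: {value:.2f}")
print("High in all five subdimensions:", all_high)
```

Here the participant scores high on four subdimensions but not on non-attachment, so under this sketch the profile would be reported rather than a single total score.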

The subdimensions are ordered so as to represent a possible developmental order as suggested by Levenson et al. It is important to note that in addition to producing valid and reliable subdimensions, the CSS procedure has also led us to conceptually redefine some of the subdimensions so as to better differentiate them (for example, independence of external sources of well-being was originally included in the definitions of both non-attachment and self-transcendence).

We first give definitions for all subdimensions and then discuss their relationships to each other and to age and gender. The first subdimension includes items that were originally intended to measure Curnow's separate dimensions of self-knowledge and integration. It includes items that refer to broad and deep knowledge about as well as acceptance of all aspects of one's own self, including ambivalent or undesirable ones.

Thus, the distinction between being aware of certain aspects of the self and accepting them was not supported empirically. The idea that self-knowledge and the acceptance of all aspects of the self is key to wisdom can be found in Erikson's idea of integrity, i.

Individuals high in this dimension of the ASTI are aware of the different, sometimes contradictory, facets of their self and their life, and they are able to accept all sides of their personality and integrate the different facets of their life. Therefore, it seems advisable to add new items that refer to self-knowledge as well as items that differentiate between different kinds of integration e.

With a higher number of items, the distinction between knowing and accepting aspects of one's self might also receive more empirical support. Non-attachment describes an individual's awareness of the fundamental independence of his or her internal self from external possessions or evaluations: non-attached individuals' self-esteem is not dependent on how others think about them or how many friends they have.

The scale comprises four items concerning the individual's independence of external things, such as other people's opinions, a busy social life, or material possessions. It is important to note that non-attachment does not mean that people are not committed to others or to important issues in their current world; the main point is that they do not depend on external sources for self-enhancement.

The fact that they are not affected by other people's judgments enables them to lead the life that is right for them and accept others non-judgmentally.

Like other ideas originating from Buddhism, non-attachment as a path to mental health is currently receiving some attention in clinical psychology Shonin et al. Individuals high in this dimension, which was not part of Curnow's original conception, are able to live in the moment and enjoy the good times in their life without clinging to them, because they know that everything changes and that change may also foster growth.

The items of this subdimension describe individuals who are able to live life mindfully in any given moment: they find joy in their life and in what they are doing. They are aware that things are always changing, oriented toward learning from others, and aware that they have grown through losses, and they have accepted the finitude of life. In a different study, we have found that many wisdom nominees report gratitude for the difficult experiences of their lives, i.

This subdimension of the ASTI describes individuals who are able to maintain their tranquility in situations where others would get angry or upset, and are at peace with the fundamental impermanence of things in life.

Highly self-transcendent individuals feel that the boundaries between them and others, even humanity at large, are permeable.

They feel related to past and future generations, all human beings, and nature. As they do not need to utilize social relationships to enhance their sense of self, they are able to love and accept other individuals as they are. As Levenson et al. Tornstam, There were relatively high latent correlations around 0.

For some purposes, it may make sense to average across these three subdimensions, as their discriminant validity may be limited. Therefore, we recommend treating them separately for most research purposes. Non-attachment and self-transcendence were less closely related to the others and to each other, perhaps because they represent two important and somewhat contrary aspects of wise individuals' relationship with the external world: independence of one's self from external sources and a deep connectedness to others and the world.

Our findings suggest that each of these states can exist without the other, and both can be present in an individual without the peace of mind that comes with self-integration and living in the present. A truly wise individual, however, would show high levels of all of these aspects.

Comparing the two age groups, we found meaningful differences for two of the five dimensions. People older than 31 achieved higher scores in non-attachment and self-transcendence than adolescents and young adults. As has been shown for cognitive aspects of wisdom Pasupathi et al.

The other three subdimensions, which represent an appreciative and accepting stance toward life, do not seem to be dependent on age. Gender differences were found, interestingly, for four of the five subdimensions.

Men had higher scores than women in self-knowledge and integration. This finding may suggest that men indeed know and accept themselves more than women do or that women actually tend to be more self-reflective and self-critical. In any case, the effect was small and needs further investigation.

In the subdimensions peace of mind, non-attachment, and self-transcendence, women scored higher than men. These findings may, however, be partly determined by societal expectations for women to be less self-centered and more caring than men, which does not necessarily imply true self-transcendence.

Thus, the limitations of self-report measures remain somewhat present even in carefully constructed scales like the ASTI.

In sum, we suggest that researchers using the ASTI may gain significant information if they use separate scores for the subdimensions we have identified in addition to, or instead of, the total score. The self-transcendence subdimension may be the purest indicator of actual self-transcendence. Whether the other subdimensions represent important preconditions, correlates, or even outcomes of self-transcendence is largely an empirical issue to be addressed in the future, which may tell us more about the development of wisdom.

No formal approval was applied for as the guidelines of the local Ethics Committee specify that the type of survey study we performed does not require such approval. All participants filled out an informed-consent form and agreed that their data are used for scientific purposes. No vulnerable populations were involved in this study. All three authors meet the four criteria for authorship required in the author guidelines.

Each author's main tasks were as follows. IK: Development and application of the CSS procedure, expert in the first part of study, data analyses, writing the paper. JG: Discussion partner for the CSS procedure, expert in the first part of study, writing the parts concerning the topic of wisdom (background and results), editing of the manuscript.

ML: Construction and provision of the revised and as yet unpublished ASTI, discussion of the translation of the items, expert in the first part of the study. P, PI: JG. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

I just made this one up today! See how easy it is to be a methodologist? I needed a term that described what both face and content validity are getting at. In essence, both of those validity types are attempting to assess the degree to which you accurately translated your construct into the operationalization, and hence the choice of name.

This is probably the weakest way to try to demonstrate construct validity. For instance, you might look at a measure of math ability, read through the questions, and decide that yep, it seems like this is a good measure of math ability i.

We need to rely on our subjective judgment throughout the research process. We can improve the quality of face validity assessment considerably by making it more systematic. For instance, if you are trying to assess the face validity of a math ability measure, it would be more convincing if you sent the test to a carefully selected sample of experts on math ability testing and they all reported back with the judgment that your measure appears to be a good measure of math ability.

In content validity, you essentially check the operationalization against the relevant content domain for the construct. Then, armed with these criteria, we could use them as a type of checklist when examining our program.

But for other constructs e. In criteria-related validity, you check the performance of your operationalization against some criterion. How is this different from content validity? In content validity, the criteria are the construct definition itself; it is a direct comparison.


Instead, they collect data to demonstrate that they work. If their research does not demonstrate that a measure works, they stop using it. As an informal example, imagine that you have been dieting for a month. Your clothes seem to be fitting more loosely, and several friends have asked if you have lost weight. If at this point your bathroom scale indicated that you had lost 10 pounds, this would make sense and you would continue to use the scale.

But if it indicated that you had gained 10 pounds, you would rightly conclude that it was broken and either fix it or get rid of it. In evaluating a measurement method, psychologists consider two general dimensions: reliability and validity. Reliability refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. Test-retest reliability is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week.

This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at the test-retest correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing the correlation coefficient. Figure 4. Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions.
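As a rough illustration of the procedure just described, the sketch below computes the Pearson correlation between two administrations of the same measure. The scores are invented for this example, and `pearson_r` is a small helper written here rather than a library function.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of the same five people at two points in time.
time1 = [22, 25, 18, 30, 27]
time2 = [23, 24, 19, 29, 28]

r = pearson_r(time1, time2)
print(f"test-retest correlation: {r:.2f}")
```

A correlation close to 1 means that people kept roughly the same rank order across the two administrations, which is what one would expect for a construct assumed to be stable over time.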

But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. This is as true for behavioral and physiological measures as for self-report measures.
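Internal consistency is often summarized with a single coefficient. The text above does not name one, so the following is an assumption: a minimal sketch of Cronbach's alpha, one common index, computed on invented data. The function name `cronbach_alpha` and the response values are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five people to four items (illustration only).
responses = [
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

When people who agree with one item also tend to agree with the others, the item variances are small relative to the variance of the total score and alpha approaches 1.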


