Conners 3-Parent (Short): Measurement Invariance Across Gender, Concurrent and Discriminant Validities

The study examined measurement invariance (configural, factor loadings, thresholds, and error variances), and equivalencies of latent mean scores of the Conners 3-Parent (Short) [C 3-P (S)] across maternal ratings of clinic-referred boys (N = 354) and girls (N = 151), aged 7 to 17 years. It also examined the concurrent and discriminant validities of the scores for the C 3-P (S). Confirmatory factor analysis (CFA) indicated support for the theorized six-factor model. For this model, there was support for full measurement invariance and equivalencies for the latent mean scores. There was also support for the concurrent and discriminant validities of the scores for the C 3-P (S) scales. The findings are discussed in relation to the use of the C 3-P (S).


Introduction
The Conners 3-Parent Short (C 3-P (S)) [1] is used as a quick measure for facilitating the diagnosis of Attention Deficit/Hyperactivity Disorder (ADHD) and the more common disorders [in particular Learning Disorder (LD), Conduct Disorder (CD), and Oppositional Defiant Disorder (ODD)] that are comorbid with ADHD in children between 6 and 18 years of age. The C 3-P (S) has a mixture of content and validity scales. The content scales are inattention (IN, 5 items), hyperactivity/impulsivity (HY, 6 items), learning problems (LP, 5 items), executive functioning (EF, 5 items), aggression (AG, 5 items) and peer relations (PR, 5 items). The present study extended existing psychometric data for this measure. It used confirmatory factor analysis (CFA) to examine measurement invariance and the equivalency of the latent factor mean scores across gender, and to examine how the six scales (factors) in the C 3-P (S) were associated with common DSM-IV externalizing and internalizing childhood disorders.
As reported in the Conners 3 (C 3) manual [1], initial validation of the C 3-P (S) using CFA of the items for only the content scales found support for the theorized six-factor oblique model. According to the C 3 manual, the ratings of the C 3-P (S) items are associated with age, gender and race/ethnicity. For gender, the scores for IN, HY and EF are higher for boys than girls.
Consequently, separate normative scores have been provided for boys and girls. However, when providing these scores, it was not established if there is measurement invariance across ratings for boys and girls. This is a serious omission that could compromise the use of these normative scores, as explained next.
Measurement invariance refers to groups reporting the same observed scores when they have the same level of the underlying trait [2]. Invariance would mean that for the groups being compared, the measure in question is using the same measurement and scaling properties. If there is weak or no support for invariance, then it follows that the groups in question cannot be justifiably compared in terms of observed scores as the same observed scores for the groups do not reflect the same levels of the underlying trait. When applied to the C 3-P (S), the absence of gender measurement invariance would mean that we cannot be confident in the use of normative scores provided in the C 3 manual.
Multiple-group CFA is a powerful method for testing measurement invariance [3]. This procedure can test for configural invariance, metric invariance (equal item factor loadings), scalar invariance (equal item intercepts and thresholds for continuous and categorical responses, respectively), and error variances invariance. Support for configural invariance indicates that the same number of factors and the same patterns of free and fixed parameters hold across groups. Support for metric invariance indicates that the strength of the relationships between the items and their respective factors is equivalent across groups, and that across the groups, the items are measuring their relevant latent factors using the same metric scales. Support for scalar invariance indicates that for the same levels of the latent trait, individuals across the groups will endorse the same observed scores or response categories. Support for error variances invariance indicates equal uniqueness for like items across the groups compared. Metric, scalar and error variances invariance are alternatively referred to as weak, strong and strict invariance [4]. When there is some support for measurement invariance, equivalency for latent factor mean scores can be examined, taking into account the non-invariance in the measurement model.
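The nested levels described above can be written compactly for ordered-categorical items. The following is a sketch in generic multiple-group notation, not a reproduction of the Millsap and Yun-Tein specification or of any Mplus input; the symbols (g for group, y* for the latent response underlying an item, τ for thresholds, θ for error variances) are introduced here for illustration only.

```latex
% Latent response for item j, loading on factor k(j), in group g (boys or girls)
y^{*(g)}_{j} = \lambda^{(g)}_{j}\,\eta^{(g)}_{k(j)} + \varepsilon^{(g)}_{j},
\qquad \operatorname{Var}\!\bigl(\varepsilon^{(g)}_{j}\bigr) = \theta^{(g)}_{j}

% A four-category rating c \in \{0,1,2,3\} arises by cutting y^{*} at three thresholds
y_{j} = c \iff \tau^{(g)}_{j,c} < y^{*(g)}_{j} \le \tau^{(g)}_{j,c+1},
\qquad \tau^{(g)}_{j,0} = -\infty,\quad \tau^{(g)}_{j,4} = +\infty

% Nested invariance constraints (each level keeps the constraints above it)
\begin{aligned}
\text{configural:}\quad & \text{same pattern of free and fixed } \lambda_{j} \text{ in both groups} \\
\text{metric (weak):}\quad & \lambda^{(\mathrm{boys})}_{j} = \lambda^{(\mathrm{girls})}_{j} \\
\text{scalar (strong):}\quad & \tau^{(\mathrm{boys})}_{j,c} = \tau^{(\mathrm{girls})}_{j,c}, \quad c = 1,2,3 \\
\text{strict:}\quad & \theta^{(\mathrm{boys})}_{j} = \theta^{(\mathrm{girls})}_{j}
\end{aligned}
```

Under this notation, a statement such as "threshold number 1 of an item is non-invariant" corresponds to the scalar constraint failing for c = 1 on that item while holding for the others.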
There are reasons to suspect that there could be a lack of measurement invariance across ratings of boys and girls for some of the C 3-P (S) items. There is evidence that developmentally boys show more externalizing and disruptive behaviors than girls [5]. Thus, it can be speculated that parents would generally conceive and expect externalizing and disruptive behaviors to be associated more with boys than with girls [6]. Such expectations could in turn lead parents to be more observant of, and less tolerant of, such behaviors among girls than boys. If so, parents are likely to rate the same levels of externalizing behavior as more severe in girls. Viewed from a measurement invariance viewpoint, this could mean a lack of scalar invariance. The same processes could also bias the reporting of other behaviors that could be perceived by parents to be less characteristic of girls than boys, such as academic (in particular, arithmetic) and cognitive abilities. Thus, considering the content items in the C 3-P (S), it can be speculated that there could be a lack of scalar invariance across gender for some of its items, especially those in the HY, LP, EF and AG scales.
Another important psychometric property of a clinical measure is discriminative validity, or the ability of the scales to identify the clinical disorders that the scales were developed to capture. In terms of the C 3-P (S), this would mean especially the ability of the IN/HY, AG and LP scale scores to distinguish those with ADHD, ODD/CD, and LD, respectively, from other clinically disordered and general population groups (for instance, the ability of the IN and HY scales to distinguish those with and without ADHD). Consistent with this, the C 3 manual [1] has reported that for the IN and HY scales, an ADHD group had higher scores than ODD/CD, LD, and general population groups; an LD group scored higher than ADHD, ODD/CD and general population groups for the LP scale; and those with ODD/CD scored higher on the AG scale than ADHD, LD, and general population groups.
Although the IN/HY, LP and AG scales of the C 3-P (S) have been shown to be suitable for specifically identifying individuals with ADHD, LD and ODD/CD, respectively, we wish to argue that the discriminative validity of the scores for the C 3-P (S) scales has yet to be comprehensively evaluated. This is because there are data showing that the C 3-P (S) scales are related to psychological syndromes (a group of signs and symptoms that occur together and characterize a particular abnormality) that are closely associated with internalizing anxiety and mood disorders. For instance, the C 3 manual reports that the internalizing syndromes (anxious/depressed, withdrawn, and somatic complaints) of the Child Behavior Checklist (CBCL) [7] are associated with the C 3-P (S) scales. More specifically, the CBCL scales for anxious/depressed, withdrawn and somatic complaints are associated with the C 3-P (S) LP and PR scales; the CBCL scales for anxious/depressed and withdrawn are associated with the C 3-P (S) IN and EF scales; the CBCL anxious/depressed scale is associated with the C 3-P (S) HY scale; and the CBCL withdrawn scale is associated with the C 3-P (S) AG scale. Given this, and the fact that the comorbidity rates for ADHD with mood and anxiety disorders are relatively high (around 22% to 28% for mood disorders, and around 15% to 18% for anxiety disorders) [8], it can be speculated that the C 3-P (S) scales would also be associated with the internalizing disorders. Examination of such associations could also provide insights into the discriminative validity of the C 3-P (S) scales. For example, if we find associations for the C 3-P (S) IN scale with ADHD, but not with other disorders, then this can be interpreted as supportive of the discriminative validity of the IN scale. On the other hand, if we find that the C 3-P (S) IN scale is also associated with anxiety and/or depressive disorders, then support for the discriminative validity of the IN scale is diminished. To date this has not been explored.
Given existing limitations and omissions, based on ratings of the C 3-P (S) provided by mothers for large groups of clinic-referred boys and girls, the first aim of the current study was to apply the multiple-group CFA approach to the six-factor oblique model to examine measurement invariance across the gender groups. Related to this aim, we also examined the equivalencies of the latent factor mean scores, taking into consideration non-invariance in the measurement model. The second aim of the study was to examine the concurrent and discriminant validities of the C 3-P (S) scales in terms of their relationships with a range of both DSM-IV externalizing (ADHD, ODD and CD) and internalizing disorders [separation anxiety disorder (SAD), social phobia (SOP), specific phobia (SPP), panic disorder (PD), agoraphobia (AG), generalized anxiety disorder (GAD), obsessive compulsive disorder (OCD), post-traumatic stress disorder (PTSD), dysthymia (DYTH), and major depressive disorder (MDD)]. Based on existing findings and the arguments presented earlier, we expected a lack of scalar measurement invariance for some of the items in the HY, LP, EF and AG scales. We also expected higher mean scores for boys for the IN, HY and EF latent factors, and stronger associations for the C 3-P (S) scales with the externalizing disorders than with the internalizing disorders.

Participants
The data for all participants were collected archivally from the Academic Child Psychiatry Unit (ACPU) of the Royal Children's Hospital, Melbourne, Australia. The ACPU is an outpatient psychiatric unit that provides services for children and adolescents with behavioral, emotional, and learning problems. Referrals are generally from other medical services, schools, and social and welfare organizations. All parents and children were informed that the clinic would provide diagnosis and appropriate treatment, and that assessment will be over two days, covering a range of tests involving the parents, children/adolescents and their teachers. They were informed that all data collected would be kept in an unidentifiable form in a secure database and (if consent was given) used to support future research.
For the current study we used the records of children and adolescents, aged between 6 and 17 years, referred between 2004 and 2017, who had been interviewed for clinical diagnosis. An individual was selected for inclusion in the study if that individual had ratings for the C 3-P (S), completed by his/her mother. Apart from this and the age criteria, no other inclusion/exclusion criterion was applied when selecting participants for the study. In all, there were 505 children and adolescents, comprising 354 (70.1%) boys and 151 (29.9%) girls. The overall mean age of participants was 11.52 years (SD=3.35 years).
Given that measurement invariance was examined for ratings across boys and girls, we initially tested whether these groups were equivalent for age, a range of background and demographic variables, and the percentages of different disorders. As chi-square values are highly sensitive to sample size, the α value was set at .01 to allow for more stringent Type I error control. The mean ages of boys and girls were 11.24 years (SD=3.20 years) and 12.16 years (SD=3.60 years), respectively. Although girls were significantly older than boys, t (503)=2.83, p<0.01, the Cohen's d effect size value for the age difference was small at 0.28 [based on Cohen's [9] guidelines for d effect size: small ≤ 0.20, medium ≥ 0.50, and large ≥ 0.80]. Table 1 presents the results of the comparisons between boys and girls for other background measures. As shown in the table, the frequencies for mother and father employment and educational levels, family income and parental relationship status for different categories showed no group difference. On the whole, most fathers and mothers of participants were employed, and more than two-thirds of participants had fathers and mothers who had attended at least secondary school. In terms of parental relationships, close to 50% were living together and the other 50% were separated or divorced. Slightly more than half of the participants were from families with an income of less than $50,000 per year. Apart from GAD (higher frequency among girls) and ODD (higher frequency among boys), there was no difference across gender for the other disorders. Although the groups were not matched for frequencies of GAD and ODD, the phi (equivalent to correlation) values for the differences for GAD and ODD were of small effect size at 0.14 [using guidelines for equivalent d effect size values proposed by Cohen [9]: small r ≥ 0.10, medium r ≥ 0.24, and large r ≥ 0.37].
Thus, although the gender groups differed for age and frequencies of GAD and ODD, these differences were of little importance, and thus the gender groups in the study can be considered sufficiently matched for age, background demographic variables, and clinical disorders.
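The age comparison above can be approximately re-derived from the reported summary statistics alone. The following is a minimal sketch, assuming a pooled-standard-deviation Cohen's d and an equal-variance independent-samples t test; the paper does not state which variant was used, and the function names are hypothetical.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)

def t_from_d(d, n1, n2):
    """Equal-variance t statistic implied by d and the group sizes."""
    return d / math.sqrt(1 / n1 + 1 / n2)

# Reported values: girls 12.16 (SD 3.60, n = 151), boys 11.24 (SD 3.20, n = 354)
d = cohens_d(12.16, 3.60, 151, 11.24, 3.20, 354)
t = t_from_d(d, 151, 354)
print(f"d = {d:.2f}, t({151 + 354 - 2}) = {t:.2f}")
```

With these inputs d comes out at roughly 0.28, matching the reported value; the implied t is close to, but not exactly, the reported 2.83, which is expected when working from means and SDs rounded to two decimals.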

Measures
The measures included in this study were the parent version of the Anxiety Disorders Interview Schedule for Children (ADISC-IV) [10] that was used for clinical diagnosis, and the C 3-P (S) [1]. The C 3-P (S) was not used for facilitating diagnosis.

Anxiety Disorders Interview Schedule for Children, Parent Version (ADISC-IV-P):
The ADISC-IV-P was used for diagnosis [10]. The ADISC-IV-P is a semi-structured interview, based on the DSM-IV-TR diagnostic system (American Psychiatric Association, 2000). It has been designed to facilitate the diagnosis of major childhood disorders. The ADISC-IV-P guidelines for diagnosis are that the child/adolescent be given a diagnosis of all disorders meeting the diagnostic criteria, and not in terms of primary and secondary disorders. Thus, all disorders that an individual qualified for were seen as equally applicable to that individual. The scores of ADISC-IV-P have sound psychometric properties [11]. Test-retest reliabilities for the ADISC-IV-P scores over a 7 to 14-day interval have shown good to excellent reliabilities. Kappa values for interview with children between 7 and 16 years ranged from 0.61-0.80 [11].

Conners 3rd Edition-Parent Short (C 3-P (S)):
As the C 3-P (S) was described comprehensively in the introduction, this section will only provide additional information not provided in the introduction [1]. For this measure, respondents indicate the degree or frequency of each behavior described in the item on a scale of 0 (not true at all), 1 (just a little true), 2 (pretty much true), or 3 (very much true). The rating period is 1 month. For the sample in the current study, the Cronbach's alpha values for the IN, HY, LP, EF, AG, and PR scales were 0.91, 0.90, 0.80, 0.82, 0.89, and 0.88, respectively. All these values are well above 0.70, which is generally considered the minimum level for acceptable internal consistency reliability [12].
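For readers who want to see how a Cronbach's alpha of this kind is obtained from item ratings, the following is a minimal sketch using the standard variance-based formula. The ratings are invented toy data on the 0 to 3 response scale, not data from the study, and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item ratings per respondent (all the same length).
    """
    k = len(rows[0])                       # number of items in the scale
    items = list(zip(*rows))               # transpose to one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of scale totals
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Toy ratings for a hypothetical three-item scale (five respondents)
toy_ratings = [
    [3, 2, 3],
    [1, 1, 0],
    [2, 2, 2],
    [0, 1, 0],
    [3, 3, 2],
]
alpha = cronbach_alpha(toy_ratings)
print(f"alpha = {alpha:.3f}")
```

The sketch assumes complete responses; respondents with missing item ratings would need to be handled before calling it.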

Procedure
Children and parents participated in separate interviews and testing sessions with breaks over two days. Information was also obtained from teachers using various checklists and questionnaires. In all cases, parental and child consent forms were completed prior to the assessment. The consent forms from both parents and children gave permission for all relevant data collected by the ACPU of the Royal Children's Hospital (RCH) or provided by others to be used in future research, and this use was approved by the RCH ethics committee as part of the ACPU's comprehensive examination of psychopathology in children and adolescents. The data collected covered a comprehensive demographic, medical (primarily neurological and endocrinological), educational, psychological, familial, and social assessment of the child and his or her family. All psychological data were collected by research assistants, who were advanced doctoral students in clinical psychology working under the supervision of two registered clinical psychologists.
The research assistants were provided with extensive supervised training and practice by the two psychologists prior to them collecting data. Training of the ADISC-IV-P included observations of it being administered by the psychologists. The research assistants commenced administering the ADISC-IV-P only after they attained competence in its administration, as assessed by the two registered psychologists. There was adequate inter-rater reliability for the diagnoses made between the research assistants and the psychologists, and between research assistants (average kappa value across all diagnoses =0.88).
Standard procedures were used for the administration of all measures. Approximately 85% of the parent ADISC-IV-P interviews involved mothers only, and the rest involved fathers only or both fathers and mothers together. Using the categorical data from the parent ADISC-IV-P, clinical diagnosis was determined by two consultant child and adolescent psychiatrists who independently reviewed the data. The inter-rater reliability for diagnoses of the two psychiatrists was high (kappa =0.90).
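The kappa values reported above correct raw agreement for the agreement expected by chance. The following is a minimal sketch of Cohen's kappa for two raters; the diagnostic decisions in the example are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Invented example: 1 = diagnosis present, 0 = diagnosis absent
psychiatrist_1 = [1, 1, 0, 0]
psychiatrist_2 = [1, 0, 0, 0]
kappa = cohens_kappa(psychiatrist_1, psychiatrist_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on three of four cases (0.75), but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.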

Statistical procedures
All the CFA models in the study were estimated using Mplus (Version 7) software [13]. As there are four ordered response categories for all the C 3-P (S) items, we used the mean and variance-adjusted weighted least squares (WLSMV) estimator for all the CFA analyses [14]. Multiple-group CFA measurement invariance was tested using the procedure proposed by Millsap and Yun-Tein [15] for the WLSMV estimator with theta parameterization. Details of this procedure are not provided here because of word limitation. For details, the reader is referred to Millsap and Yun-Tein [15].
The goodness-of-fit of the CFA models was examined using WLSMVχ². Like all other χ² values, WLSMVχ² values are inflated by large sample sizes. In addition to the WLSMVχ², the fit of the models was examined using the approximate fit values of the root-mean-square error of approximation (RMSEA), the comparative fit index (CFI), and the Tucker-Lewis Index (TLI). The guidelines suggested by Hu and Bentler [16] [18,19]. A recent study by Sass, Schmitt, and Marsh [20] concluded that although these values could be used when WLSMV estimation is applied, there is need for caution, especially with misspecified models. Given these concerns, we examined measurement invariance using both differences in approximate fit indices (a decrease of ≥ 0.01 in CFI and an increase of ≥ 0.015 in RMSEA were taken as indicating non-invariance) and ΔWLSMVχ² values. For the latter, the α value was set at 0.01 to allow for more stringent Type I error control.
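The approximate-fit decision rule can be expressed as a small check applied to each pair of nested models. The following is a sketch only: it uses the cut-offs stated later in the results (a CFI decrease of at least 0.01, an RMSEA increase of at least 0.015), the fit values in the example are hypothetical, and whether one or both criteria must be met is an assumption exposed here as the `require_both` parameter.

```python
def flags_non_invariance(cfi_base, cfi_constrained,
                         rmsea_base, rmsea_constrained,
                         cfi_drop=0.01, rmsea_rise=0.015,
                         require_both=False):
    """Compare a constrained model with its less constrained baseline.

    Returns True when the change in approximate fit suggests non-invariance.
    require_both=False flags on either criterion (an assumption of this
    sketch); require_both=True needs both criteria to be met.
    """
    cfi_worse = (cfi_base - cfi_constrained) >= cfi_drop
    rmsea_worse = (rmsea_constrained - rmsea_base) >= rmsea_rise
    if require_both:
        return cfi_worse and rmsea_worse
    return cfi_worse or rmsea_worse

# Hypothetical fit values for a configural model and a metric model
print(flags_non_invariance(0.970, 0.968, 0.054, 0.055))  # small change in fit
print(flags_non_invariance(0.970, 0.950, 0.054, 0.056))  # large CFI drop
```

Because these cut-offs are much less sensitive to sample size than the χ² difference test, the two approaches can disagree on the same pair of models.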

Journal of Childhood & Developmental Disorders ISSN 2472-1786
To examine the concurrent and discriminant validities of the C 3-P (S) scales, these scales were correlated with clinical diagnoses as established by the ADISC-IV-P. The sizes of the correlations were interpreted using guidelines for equivalent d effect size values proposed by Cohen [9] (small r ≥ 0.10, medium r ≥ 0.24, and large r ≥ 0.37).
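The r cut-offs of 0.10, 0.24 and 0.37 are described as equivalents of the d benchmarks. One common conversion that reproduces them, assuming two groups of equal size, is r = d / sqrt(d² + 4); the paper does not state which conversion was used, so the sketch below is illustrative and its function names are hypothetical.

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a correlation, assuming equal group sizes."""
    return d / math.sqrt(d**2 + 4)

def classify_r(r, small=0.10, medium=0.24, large=0.37):
    """Label the magnitude of a correlation using the cut-offs in the text."""
    size = abs(r)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"

# The d benchmarks of 0.20, 0.50 and 0.80 map onto the r cut-offs used here
for d in (0.20, 0.50, 0.80):
    print(d, round(d_to_r(d), 2))

print(classify_r(0.14))  # the phi value reported for GAD and ODD
```

The "negligible" label for values below 0.10 is an addition of this sketch; the text itself only names small, medium and large.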

Missing data
Out of a total of 15,655 scores for the C 3-P (S) (31 items x 505 participants), there were 308 scores missing (i.e., around 2%). For the WLSMV estimator, Mplus uses pairwise deletion (i.e., it includes everybody who answers both items in an item pair to estimate the covariance for that pair) to deal with missing values.
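The missing-data figures and the idea of pairwise deletion can be illustrated as follows. This is a sketch only: it computes an ordinary sample covariance for one item pair to show which cases enter the calculation, whereas the statistic actually estimated for ordinal items under WLSMV differs; the ratings are invented and the function name is hypothetical.

```python
from statistics import mean

# Proportion of missing scores reported in the text
total_scores = 31 * 505
missing_rate = 308 / total_scores
print(total_scores, f"{missing_rate:.1%}")

def pairwise_covariance(x, y):
    """Sample covariance using only cases with both values present.

    Missing ratings are represented by None. Returns (covariance, n_used).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    cov = sum((a - mx) * (b - my) for a, b in pairs) / (n - 1)
    return cov, n

# Invented ratings for two items; respondents 2 and 3 each miss one item
item_1 = [0, 1, None, 3, 2]
item_2 = [1, None, 2, 3, 2]
cov, n_used = pairwise_covariance(item_1, item_2)
print(cov, n_used)
```

A respondent who skips one item is dropped only from the pairs involving that item, so different item pairs can be based on slightly different subsets of the 505 participants.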

Goodness-of-fit for the single groups and reliabilities of the factors for the C 3-P (S)
The fit values for the six-factor oblique model for boys were [...]
Although the Cronbach's alphas indicated support for internal consistency reliabilities for the C 3-P (S) factors, within a CFA measurement model, more desirable measures of internal consistency reliability, or more specifically convergent validity, are composite reliability (CR) and average variance extracted (AVE) [21]. The CR estimates the extent to which a set of latent construct indicators share in their measurement of a construct, while the AVE is the amount of common variance among latent construct indicators [22]. Fornell and Larcker [21] have also proposed that for a CFA measurement model, the discriminant validity of the constructs can be examined by comparing the square root of the AVE of a construct with its correlations with other constructs in the model. According to Hair et al., a CR of 0.70 or more and/or an AVE of 0.50 or more are supportive of convergent validity, and if the square root of the AVE of a construct is higher than its correlations with other constructs, then discriminant validity for that construct can be assumed. Given this, these values were also computed, based on the standardized factor loadings of the proposed six-factor oblique model for the C 3-P (S). The fit indices for this model that involved both boys and girls together were WLSMVχ² (df=419) = 1043.12, p<0.001, CFI=0.969, TLI=0.966, and RMSEA=0.054 (90% CI=0.052 to 0.062). The CFI, TLI and RMSEA values indicate good fit. Table 2 shows the range of factor loadings for each of the six factors, the CR, the AVE, and the square root of the AVE for the different factors, and the correlations between the six latent factors in the model. As shown, the CR and AVE for each of the constructs were all above 0.70 and 0.50, respectively, thereby indicating support for their convergent validities.
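The CR, AVE and square-root-of-AVE comparison described above can be computed directly from standardized loadings. The following is a minimal sketch using the usual Fornell and Larcker formulas, with each item's error variance taken as one minus its squared standardized loading; the loadings and factor correlations in the example are hypothetical, not values from Table 2.

```python
import math

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - l**2 for l in loadings)  # standardized error variances
    return s**2 / (s**2 + error)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l**2 for l in loadings) / len(loadings)

def fornell_larcker_ok(loadings, correlations_with_others):
    """True if sqrt(AVE) exceeds every correlation with the other factors."""
    root_ave = math.sqrt(average_variance_extracted(loadings))
    return root_ave > max(abs(r) for r in correlations_with_others)

# Hypothetical standardized loadings for one factor
loadings = [0.70, 0.80, 0.90]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}, sqrt(AVE) = {math.sqrt(ave):.3f}")
print(fornell_larcker_ok(loadings, [0.55, 0.62]))  # hypothetical correlations
```

For a construct to meet the criteria in the text, CR should be at least 0.70, AVE at least 0.50, and the final check should return True.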
Also, for each of the constructs, the square root of its AVE was higher than the correlations of the construct with the other constructs, thereby supporting the discriminant validities of all constructs. For indicators/items, loadings of 0.70 or more can be interpreted as acceptable reliability [22]. Only one item ("Forgets to turn in completed work", an item belonging to the executive functioning scale) had a loading of less than 0.70. The value was, however, close to 0.70 at 0.68.
Table 3 shows the results of the analyses for invariance testing, based on the χ² difference test. As shown, the RMSEA, CFI and TLI values for the configural invariance model (M1 in Table 3) indicated good fit for configural invariance. Table 3 shows that there was a difference between the configural invariance model and the full metric invariance model (M2 in Table 3), ΔWLSMVχ² (df=25) = 52.82, p<0.001. Further analyses showed a lack of invariance for the factor loadings of item numbers 15, 35 and 4 (M2.3 in Table 3). There was also a difference between the final partial metric invariance model (M2.3 in Table 3) and the full scalar invariance model (M3 in Table 3). There was no difference between the final partial scalar invariance model (M3.4 in Table 3) and the full error variances invariance model (M4 in

model, further analysis was conducted for equivalency in latent mean scores. As shown in Table 3, this analysis showed no support for equivalency for the factor mean scores model (M5 in Table 3), as this model differed from the final partial scalar invariance model (M3.4 in Table 3), ΔWLSMVχ² (df=6) = 52.74, p<0.001. Additional analyses showed that for the criterion used here (p<0.01) the groups differed for all latent factors (inattention, hyperactivity/impulsivity, learning problems, executive functioning, and aggression), except peer relations (Table 3). Table 4 shows the unstandardized estimates for boys and girls for the non-invariant parameters. As shown in Table 4
Note: χ² = weighted least square with mean and variance adjusted chi-square (WLSMVχ²); RMSEA = root mean square error of approximation; CFI = comparative fit index; TLI = Tucker-Lewis Index. All WLSMVχ² values were significant (p<0.001). **p < .01, ***p < .001.
Table 4 Comparisons of unstandardized estimates for the non-invariant parameters across gender.
Table 5 shows the results of the analyses for invariance testing, based on differences in CFI and RMSEA values. As shown, the RMSEA, CFI and TLI values for the configural invariance model (M1 in Table 5) indicated good fit for configural invariance. As will be recalled, a decrease of ≥ 0.01 in CFI values and an increase of ≥ 0.015 in RMSEA values were interpreted as indicative of non-invariance. As shown in the table, there was no difference between the configural model (M1 in Table 5) and the full metric invariance model (M2 in Table 5); between the full metric invariance model and the full scalar invariance model (M3 in Table 5); or between the full scalar invariance model and the full error variances invariance model (M4 in Table 5).
Also, there was no difference between the full scalar invariance model and the equivalency for latent mean scores model (M5 in
Note: χ² = weighted least square with mean and variance adjusted chi-square (WLSMVχ²); RMSEA = root mean square error of approximation; CFI = comparative fit index; TLI = Tucker-Lewis Index. All WLSMVχ² values were significant (p<0.001).
Table 6 Correlations of the C 3-P (S) factors with DSM-IV childhood disorders derived via ADISC-IV-P.
invariance) for all C 3-P (S) items, and equivalencies for latent factor mean scores for the C 3-P (S) across the gender groups. Table 6 shows the correlations of the C 3-P (S) factors with the externalizing and internalizing disorders, derived via ADISC-IV-P. As shown in the table, SAD,

and correlations of EF and PR with ODD were of medium effect sizes. All other statistically significant correlations were of small effect sizes.

Discussion
Consistent with the findings reported in the C 3 manual [1], our findings indicated good fit for the proposed oblique six-factor model for the C 3-P (S) for boys and girls. For this model, our findings showed support for partial measurement invariance (metric and scalar), based on the χ² difference test. There was support for full invariance of the error variances. Also, all factors except PR showed differences across gender for latent mean scores. More specifically, girls had higher factor loadings for EF item number 15 ("trouble getting started on tasks or projects") and PR item number 4 ("last to be picked for teams/games"), while boys had a higher loading for EF item number 35 ("messy or disorganized"). For all four non-invariant thresholds [threshold number 1 of LP item number 8 ("cannot grasp arithmetic"), threshold number 1 of EF item number 1 ("forgets to turn in completed work"), threshold number 1 of IN item number 34 ("inattentive, easily distracted"), and threshold number 3 of EF item number 1 ("forgets to turn in completed work")], girls had higher scores. Also, for all the non-invariant latent mean scores, girls had lower scores. In contrast to the findings based on the χ² difference test, we found support for strict full measurement invariance (configural, metric, scalar, and error variances), and equivalencies across the gender groups for mean scores for all six latent factors, based on the differences in CFI and RMSEA values. In terms of the concurrent and discriminant validities of the C 3-P (S) scales, our findings showed no significant associations for SAD, SOP, SPP, PD, AG, and PTSD with any of the C 3-P (S) scales. Although GAD, OCD, DYTH and MDD correlated significantly with one or more of the C 3-P (S) scales, the magnitudes of these correlations were of small effect sizes.
In contrast, apart from the correlation involving ODD and LP (which was not significant), the correlations involving all the other scales with ADHD, ODD and CD were significant. In terms of effect sizes, based on Cohen's [9] guidelines for interpreting r effect sizes (small ≥ 0.10, medium ≥ 0.24, and large ≥ 0.37), the correlations for IN, HY, and AG with ADHD were of large effect sizes, and the correlations of LP and AG with ADHD were of medium effect sizes. The correlations for CD and ODD with AG were both of large effect sizes. The correlations of HY with ODD and CD were of medium effect sizes, and the correlations of EF and PR with ODD were of medium effect sizes. Taken together, these findings can be interpreted as supporting the concurrent and discriminant validities of the C 3-P (S) scales.
Our findings have implications for the use of the C 3-P (S). First, the support for the six-factor model is consistent with the model recommended for the C 3-P (S) in the C 3 manual [1]. Additionally, with the exception of one item ("Forgets to turn in completed work", an item belonging to the executive functioning scale), there was support for the item reliabilities of all the other 30 items. Even the exceptional item had a factor loading of 0.68, which is close to 0.70, the cut-off used for acceptable reliability. Furthermore, there was support for the convergent validities of the items within the six constructs in terms of their CR and AVE values, and also for the discriminant validity of the six constructs, as the square roots of their AVE values were all higher than their correlations with other constructs. Thus, the six-factor model can be seen as a robust and valuable model for research and clinical applications. Second, the support for full measurement invariance based on the differences in CFI and RMSEA values indicates that at the practical level, the C 3-P (S) has the same measurement and scaling properties when applied to parent ratings of boys and girls, and that these groups can be directly and justifiably compared in terms of observed scores. The support for only partial measurement invariance (metric and scalar), based on the χ² difference test, suggests that girls had higher factor loadings for EF item number 15 ("trouble getting started on tasks or projects") and PR item number 4 ("last to be picked for teams/games"), while boys had a higher loading for EF item number 35 ("messy or disorganized"). Also, girls had higher scores for four thresholds [threshold number 1 of LP item number 8 ("cannot grasp arithmetic"), threshold number 1 of EF item number 1 ("forgets to turn in completed work"), threshold number 1 of IN item number 34 ("inattentive, easily distracted"), and threshold number 3 of EF item number 1 ("forgets to turn in completed work")].
Also, boys had higher latent scores for all but the PR factor. These findings suggest that the normative scores for boys and girls provided in the C 3 manual [1] are confounded by differences in measurement and scaling properties for these groups, and that they cannot be used confidently for interpretation. However, as our analyses did not control for age and ethnicity, our findings may be confounded by these factors, which have been shown to be associated with parent ratings of the C 3-P (S). Thus, we recommend that clinicians exercise considerable caution when using the different normative scores for the gender groups that are provided in the C 3 manual [1]. Our discriminant validity findings suggest that all the C 3-P (S) scales were either unrelated or only weakly related to DSM-IV internalizing disorders (SAD, SOP, SPP, PD, AG, PTSD, GAD, OCD, DYTH and MDD). In contrast, for DSM-IV externalizing disorders, the correlations for IN, HY, and AG with ADHD were of large effect sizes, and the correlations of LP and AG with ADHD were of medium effect sizes. The correlations for CD and ODD with AG were both of large effect sizes. The correlations of HY with ODD and CD were of medium effect sizes, and the correlations of EF and PR with ODD were of medium effect sizes. Taken together, these findings can be interpreted as supporting the concurrent and discriminant validities of the C 3-P (S) scales.
In concluding, the findings and interpretations made in the study need to be viewed with some limitations in mind. First, since age, ethnicity and socioeconomic status were not controlled in the current study, it is possible that the findings here may be confounded by these variables. Second, the findings reported here are based on one sample, on archival data, using diagnostic determinations based on the ADISC-IV-P. Thus, our findings may not generalize and warrant further investigation and cross-validation on other well-diagnosed samples before they can be used with confidence. Third, as all the participants in this study were from the same clinic, it is possible that this may constitute an additional bias for the sample examined. Fourth, as this study