Data extraction and assessment of study quality
For each included study, the basic within-group data (pre- to post-treatment and post-treatment to follow-up) necessary for effect size calculation were extracted. If necessary, we contacted the authors to obtain additional information. No study had to be excluded because of unavailable data. Effect sizes were calculated separately for three outcome domains. The primary outcome was ‘physical symptoms’; secondary outcomes were ‘psychological symptoms’ (depression, anxiety, anger and general symptoms) and ‘functional impairment’ (health, life satisfaction, interpersonal problems and maladaptive cognitions and behaviour), as these outcome domains have been commonly used in evaluations of treatments for somatoform disorders. When a study included multiple variables for the same domain, the mean of the z-transformed variables was used to calculate an average effect size. Self-report measures were included, as most studies used self-report instruments only.
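As an illustration only, the pooling of several measures within one outcome domain can be sketched as follows. The sketch assumes a standardised mean change (pre-treatment mean minus post-treatment mean, divided by the pre-treatment standard deviation) as the within-group effect size; this formula, the function names and the numbers are assumptions made for the example and are not taken from the included studies.

```python
from statistics import mean


def within_group_d(mean_pre, mean_post, sd_pre):
    """Standardised mean change for one measure.

    ASSUMPTION: the pre-treatment SD is used as the standardiser.
    Positive values indicate a reduction in the measured score.
    """
    return (mean_pre - mean_post) / sd_pre


def domain_effect_size(measures):
    """Average the standardised changes of all measures in one domain.

    measures: list of (mean_pre, mean_post, sd_pre) tuples.
    """
    return mean(within_group_d(*m) for m in measures)


# Hypothetical study reporting two measures in the same outcome domain.
hypothetical_measures = [
    (20.0, 14.0, 6.0),   # measure A: d = 1.0
    (15.0, 10.0, 10.0),  # measure B: d = 0.5
]
domain_d = domain_effect_size(hypothetical_measures)
```

Standardising each measure before averaging puts instruments with different score ranges on a common scale, so that no single instrument dominates the domain estimate.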
Participant information and formal characteristics of the treatment and therapeutic interventions were extracted using a standardised coding scheme. Formal characteristics included setting, duration, frequency, treatment modality, adherence and structure. To classify therapeutic interventions, categorisation was based on that of De Groot et al and the Comprehensive Psychotherapeutic Interventions Rating Scale (CPIRS), which comprises interventions representative of the following five orientations: experiential, psychodynamic, directive-behavioural, cognitive and systemic. We adapted this scale for the treatment of somatoform disorders by adding ‘body-directed interventions’. Each intervention was rated as ‘important’ (explicitly mentioned as a core component), ‘possibly important’ (implicit as a core component), ‘possibly not important’ (uncertain whether included or not) or ‘not important’ (not mentioned and definitely not a core component).
Included studies were rated by two independent judges on methodological quality with the revised Psychotherapy Quality Rating Scale (PQRS). This scale comprised 25 items covering six domains: description of patients (four items); definition and delivery of treatment (five items); outcome measures (five items); data analysis (five items); treatment assignment (three items); and overall quality (three items). The last item was scored on a seven-point scale and the other items on a three-point scale. Some minor adaptations to the scale were made, based on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement recommendations for cohort studies, to render it suitable for quasi-experimental and uncontrolled studies. Items were added to yield information about repeated measurements, to evaluate sources of bias in uncontrolled trials as a consequence of recruitment procedures and non-random allocation to treatment groups and their impact on study generalisability, and concerning the matching of groups on clinical and demographic variables.
Two independent judges coded all studies to establish inter-rater reliability (across 14 studies). The one-way random intraclass correlation coefficient (ICC) was 0.91 for single measures and 0.95 for average measures, indicating excellent agreement.
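For readers unfamiliar with the two ICC variants, a minimal sketch of the one-way random-effects computation is given below. It uses the standard analysis-of-variance formulas for the single-measures and average-measures coefficients; the function name and the ratings are hypothetical and serve only to show the calculation, not to reproduce the study data.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC for a fully crossed ratings table.

    ratings: list of rows, one row per study, one score per judge.
    Returns (single_measures_icc, average_measures_icc).
    """
    n = len(ratings)       # number of rated studies
    k = len(ratings[0])    # number of judges per study
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-studies and within-studies mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


# Hypothetical quality scores from two judges for four studies.
hypothetical_ratings = [[30, 32], [25, 24], [40, 41], [18, 20]]
icc_single, icc_average = icc_one_way(hypothetical_ratings)
```

The two coefficients are linked by the Spearman-Brown relation, average = k × single / (1 + (k − 1) × single). With two judges, a single-measures value of 0.91 gives 2 × 0.91 / 1.91 ≈ 0.95, which agrees with the values reported above.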