University of the Witwatersrand, Johannesburg

CONTEXTUAL INFLUENCE, FRAILTY AND THE SPATIAL PATTERNS OF CHILD MORTALITY IN NIGERIA

By Sulaiman Salau

Supervisor: Professor Jacky Galpin
School of Statistics and Actuarial Science

A research report submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfilment of the requirements for the degree of Master of Science.

February 2010

ABSTRACT

The main aim of the project is to investigate statistical models that account for the influence of contextual factors and frailty on Child Mortality (CM), and to investigate the spatial patterns of CM in Nigeria. Using data from the Nigeria Demographic and Health Survey, a descriptive investigation showed that child mortality clusters at the household, community and state levels, and this clustering needs to be taken into account in the multivariate analysis by including frailty effects at the relevant levels. A total of 8 models were evaluated using geo-additive survival models, and the results in Chapter 4 reveal that including frailty terms and contextual variables at the community level improves model fit, which suggests that contextual and frailty effects are important. The structured spatial effect accounted for the larger share of the state-level variability in the data. Although the spatial patterns were found to be statistically insignificant, they point to interesting patterns in child mortality variation.

ACKNOWLEDGEMENT

I would like to express my gratitude to my supervisor, Professor Jacky Galpin, whose suggestions, assistance, encouragement and comments have made this study possible. My sincere thanks go to my family and friends for their support. Finally, I would like to thank the Demographic and Health Surveys program (www.measuredhs.com) for providing the survey data used in this study.
TABLE OF CONTENTS
DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
Chapter One: Introduction
1.2 Background of the study area
1.3 Problem statement
1.4 Aims and Objectives of the Study
1.5 Organization of the Report
Chapter Two: Literature Review
2.1 Conceptual Framework
2.2 Pathway of Influence
2.2.1 Proximate Determinants
2.2.2 Household level factors
2.2.3 Community level factors
2.3 Child Mortality and its differentials in Nigeria
Chapter Three: Data and Methods
3.1 Data Sources
3.2 Statistical Methods
3.2.1 The Geo-additive Discrete-Time Survival Model
Chapter Four: Results
4.1 Unit of analysis and outcome
4.2 Variable selection
4.2.1 The selection and construction of community-level variables
4.2.2 Final Data set
4.3 Descriptive summaries
4.3.1 Results of the survival analysis
4.3.2 Investigation of clustering of deaths
4.4 Multivariate analysis
4.4.1 Modelling Strategy and Model Comparison Approach
4.4.2 Sensitivity analysis
4.4.3 Interpretation of categorical covariates (fixed effect)
4.4.4 Interpretation of non-linear effects
4.4.5 Interpretation of the spatial effect
4.5 Determinants of Infant mortality
Chapter Five: Summary and Conclusions
5.1 Summary
5.2 Recommendations
5.3 Limitations of the Study/Suggestions for future research
REFERENCES
APPENDICES

LIST OF FIGURES
Figure 1: Map of Nigeria showing the 37 spatial units considered
Figure 2: Schematic presentation of the conceptual framework for the study of CM
Figure 3: Hierarchical structure of the dataset
Figure 4: Maps depicting the nature of spatially explicit variables considered
Figure 5: Kaplan-Meier Survival Curves for community level covariates
Figure 6: Results from Spatial autocorrelation for U5M
Figure 7: Non-linear effects of metrical covariates
Figure 8: Maps of the posterior mean of spatial effects
Figure 9: Non-linear and spatial effects for IM

LIST OF TABLES
Table 1: Community level contextual factors to be considered in the study
Table 2: Community level variables from factor analysis
Table 3: Descriptive statistics of child level variables
Table 4: Descriptive statistics of mother level variables
Table 5: Descriptive statistics of household variables
Table 6: Descriptive statistics of community variables
Table 7: Distribution of births and deaths in households
Table 8: Distribution of births and deaths in communities
Table 9: Creation of child-period dataset from the original child-level data set
Table 10: Models considered
Table 11: Results from Models 1-7b: Model fit and variance components of random and non-linear effects
Table 12: Sensitivity to choice of hyperparameter values for Model 6
Table 13: Posterior summaries for child level effects models 1-7b
Table 14: Posterior summaries for mother level effects models 1-7b
Table 15: Posterior summaries for household effects models 1-7b
Table 16: Posterior summaries for community effects models 1-7b
Table 17: Posterior summaries for community effects model 6 - IM

LIST OF ABBREVIATIONS
AIC Akaike's Information Criterion
CI Credible Interval
CM Child Mortality
DCW Digital Chart of the World
DFID Department for International Development
DHS Demographic and Health Survey
DIC Deviance Information Criterion
EA Enumeration Area
ESDA Exploratory Spatial Data Analysis
FCT Federal Capital Territory
GIS Geographic Information System
GNP Gross National Product
GPS Global Positioning System
GPW Gridded Population of the World
GRF Gaussian Random Field
IM Infant Mortality
K-M Kaplan-Meier
LGA Local Government Area
LISA Local Indicators of Spatial Association
M-H Metropolis-Hastings
MARA Mapping Malaria Risk in Africa
MAUP Modifiable Areal Unit Problem
MCMC Markov Chain Monte Carlo
MICS Multiple Indicator Cluster Survey
MRF Markov Random Field
NDHS Nigeria Demographic and Health Survey
NIMA National Imagery and Mapping Agency
NPC National Population Commission (Nigeria)
PPS Probability Proportional to Size
TFR Total Fertility Rate
U5M Under-five Mortality
UN United Nations
WFS World Fertility Survey
WHO World Health Organization

Chapter One: Introduction

Childhood mortality (CM) remains a major public health issue in developing countries, where it is estimated that over 10 million preventable child deaths occur each year (World Health Organization [WHO], 2005). A high level of childhood mortality leads to high fertility through physiological, replacement and insurance effects (Preston, 1978; Montgomery and Cohen, 1998), resulting in rapid population growth, a situation that can hamper development.
Indeed, mortality in the childhood years has been identified as an important indicator of a population's public health and socio-economic conditions (Masuy-Stroobant and Gourbin, 1995), and reducing childhood mortality not only offers opportunities for improving living conditions but also raises life expectancy. A high priority in the developing world is to reduce under-five mortality rates by two thirds of their 1990 levels by the year 2015 (United Nations [UN], 2000), with national governments and the international community supporting various research and intervention initiatives geared towards attaining this goal. CM is usually monitored using two indicators: infant mortality (IM), death before the first birthday, and under-five mortality (U5M), death before the fifth birthday. CM as used in this study refers to U5M except where otherwise stated. The focus on U5M is based on the fact that IM is a rarer and noisier event, and a large sample is often required for its modelling (Mosley and Chen, 1984). Moreover, available statistics reveal that the national U5M in Nigeria is substantially higher than IM (NPC [National Population Commission] and ORC Macro, 2004). In spite of the substantial reduction in CM rates experienced in most developing countries (Hill and Pebley, 1989), the sub-Saharan Africa region accounts for the greatest proportion (about 45%) of the global annual incidence of child deaths (UN, 2005). The pace of CM reduction within the African continent has also not been uniform, and gains in mortality reduction experienced earlier in some African countries have either stagnated or begun to reverse, raising fears that the Millennium Development Goal target on CM reduction may not be met by the target date (Rutstein, 2000; UN, 2005; WHO, 2005).
This is the case in Nigeria, where CM rates are high and there is evidence of substantial geographical variation across the country (NPC, 1998; NPC and ORC Macro, 2004). This calls for an in-depth examination of the trends and patterns of CM rates and their association with other factors, and for the identification of high-risk sub-groups for more effective targeting and CM reduction. Most CM studies in Africa use data from surveys such as the Demographic and Health Survey (DHS). Such surveys use a stratified multistage sampling design, which results in a hierarchically structured data set: the country is stratified into regions, communities (clusters) are selected within each region, and households are selected within each community, with limited information collected on the children and their parents. This sample design results in data that are correlated within households, within communities and within regions, and this correlation needs to be taken into account when analyzing the data (Chromy and Abeyasekera, 2003). The existing CM studies in Nigeria have primarily focused on the influence of a few individual and household factors in explaining CM differentials in the country (see, for example, Iyun, 1992; Ahonsi, 1995; Adebayo, Fahrmeir and Klasen, 2004). The causes of CM are, however, multifaceted, often involving a number of factors operating at various levels and in complex ways. Despite evidence that factors at other contextual levels (the household, or arbitrarily or administratively defined geographical units such as the community or region) may affect child survival (Mosley and Chen, 1984; Sastry, 1996; Root, 1997; Curtis and Hossein, 1998), there is a general scarcity of studies examining the influence of context-level factors on CM in developing countries.
This scarcity can mainly be attributed to the non-availability of adequately measured contextual factors and the difficulty of incorporating them with routinely collected data when they are available (Sastry, 1996; Curtis and Hossein, 1998). Statistical techniques commonly employed in the analysis of CM data include logistic regression (used when the dependent variable is binary) and Poisson regression (used when interest lies in modelling death rates) (Fahrmeir and Tutz, 2001). Standard specifications of these methods usually assume that the observations are independent (ignoring the hierarchical structure of the data), and that heterogeneity (differentials in mortality) in the population under study can be fully explained by the measured covariates included in the model (thereby neglecting the influence of unobserved heterogeneity). Failure to account for the clustering of CM risk and the influence of omitted covariates may yield inconsistent and inefficient estimates, which in turn may lead to invalid conclusions (Hobcraft, McDonald and Rutstein, 1985; Guo and Rodriguez, 1992; Sastry, 1997a). Conventional specifications of these models also do not take into account time-varying covariates (such as breastfeeding), non-linear effects of certain covariates (such as mother's age) or the censoring of observations. In CM studies, censoring is of the right-censoring type: it occurs when a child is still alive at the time of the survey but, being younger than the age of interest, has not yet been exposed to the full period of risk. When logistic and Poisson regression models are used, data on recent births (children who are not yet five in the case of U5M) are usually excluded to avoid the bias caused by censoring. This approach leads to a loss of data that may carry valuable information. The other frequently used set of techniques is survival analysis, which is appropriate when survival times and survival status are available.
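How survival analysis retains censored children can be seen in its simplest tool, the Kaplan-Meier estimator, $\hat{S}(t) = \prod_{t_k \le t} (1 - d_k/n_k)$, where $d_k$ is the number of deaths at age $t_k$ and $n_k$ is the number of children still at risk just before $t_k$. The following is a minimal Python sketch, assuming ages are recorded in completed months; the six observations are hypothetical and serve only to show the mechanics. Following the usual convention, a child censored at age $t$ is still counted as at risk for deaths occurring at $t$.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates for right-censored data.

    times:  age (e.g. in months) at death or at censoring for each child
    events: 1 if the death was observed, 0 if the child was right-censored
    Returns a list of (age, S(age)) pairs at each distinct age at death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Children still under observation at age t, including those censored at t
        at_risk = sum(1 for x in times if x >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Hypothetical ages in months; event = 0 marks a child alive at interview
    ages = [2, 3, 3, 5, 8, 10]
    died = [1, 1, 0, 1, 0, 0]
    for age, s in kaplan_meier(ages, died):
        print(f"S({age}) = {s:.3f}")
```

In this toy example the three censored children still contribute to the risk sets up to their ages at interview; dropping them, as happens when recent births are excluded from a logistic regression, would discard that exposure.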
The Cox proportional hazards model (Cox, 1972) is given by

$h(t_i \mid X_i) = h_0(t_i)\exp(X_i'\beta)$

where $t_i$ is the time to death or censoring of child $i$, $h_0$ is the baseline hazard, $X_i$ is a vector of covariates and $\beta$ is a vector of parameters. Survival analysis can accommodate censored observations in addition to modelling time-varying and non-linear effects (Fahrmeir and Tutz, 2001). In time-to-event studies with hierarchically structured datasets, there are two main approaches to modelling the data. The first is the fixed-effects method, which applies to situations where one is looking at specific treatments, and these are the only treatments of interest. In this case, variations in CM are explained entirely by the covariates included in the model; that is, unobserved heterogeneity is treated as a fixed parameter and is modelled as one of the covariates. In modelling community-specific heterogeneity, for example, one community is picked as the baseline community, and a set of indicator covariates is included for all other communities to estimate the community-specific differences. The second approach is to use random-effects methods (Vaupel, Manton and Stallard, 1979), which are appropriate when the sampled units are assumed to be a random sample from a wider population. These models are often referred to as frailty models, a term used in the CM literature to describe the situation where children in certain groups are more susceptible to death than others, perhaps because of group-specific factors that are unmeasured, immeasurable or unknown, or because the survey design induces a correlation of mortality risks among children belonging to the same group. The random-effects method assumes that the context-level frailty is distributed over the population according to some distribution function.
In other words, random-effects models incorporate frailty into the model as an uncorrelated error component, and the frailty effects are regarded as random draws from a certain distribution function (whose mean and variance can be estimated). To illustrate, let $t_{ij}$ be the time to death or censoring for child $j$ in cluster $i$, and let $Z_{ij}$ be a vector of child- and cluster-specific covariates. In the fixed-effects approach, the hazard rate for the $j$th child from community $i$ is modelled as

$h(t_{ij} \mid Z_{ij}) = h_0(t_{ij})\exp(Z_{ij}'\beta + X'\gamma)$

where $\beta$ is a vector of regression coefficients for the covariates $Z_{ij}$, $\gamma$ is a vector of unknown parameters, $X = (X_1, \ldots, X_k)'$, and $X_l = 1$ if the child is in community $l$ and 0 otherwise. This fixed-effects approach therefore implies that there are $k + 1$ communities, the last of which is the baseline category. In the frailty setting, the hazard rate for the $j$th child from community $i$ is

$h(t_{ij} \mid Z_{ij}) = h_0(t_{ij})\exp(Z_{ij}'\beta + u_i) = h_0(t_{ij})\exp(u_i)\exp(Z_{ij}'\beta) = \omega_i\,h_0(t_{ij})\exp(Z_{ij}'\beta)$

where $\omega_i = \exp(u_i)$, and the $u_i$'s are assumed to be iid $N(0, \sigma_u^2)$ and represent the cluster-specific frailty effect designed to capture differences among the clusters. This approach can be extended to more than one nested level. In general, CM studies that have considered frailty effects can be broadly classified into two groups. The first group consists of those that treat frailty at the family or community level (or both) as unstructured random components of the model and often ignore frailty at higher contextual levels such as regions (see, for example, Curtis, Diamond and McDonald, 1993; Madise and Diamond, 1995; Guo and Rodriguez, 1992; Guo, 1993; Sastry, 1997a-c; Curtis and Steele, 1996).
The second group consists of more recent studies that consider frailty mostly at the contextual levels of community or region (Banerjee, Wall and Carlin, 2003; Gemperli, Vounatsou, Kleinschmidt, Bagayoko, Lengeler and Smith, 2004; Kandala, Fahrmeir and Klasen, 2002; Kandala, Magadi and Madise, 2004; Adebayo et al., 2004; Adebayo and Fahrmeir, 2005). These studies use both unstructured random effects (which assume that the frailty components at the contextual level of enquiry are independent) and spatially structured frailty effects (which take into account the fact that geographical locations in close proximity are more likely to be similar to each other than those far apart). In these studies, Bayesian methods allow all effects (including the frailty terms) to be specified and estimated in a single framework, and also allow prior information (for example, previous knowledge about the modelling of the baseline effect) to be incorporated in the model. Bayesian modelling also helps alleviate the problem of sparse data in contextual studies through smoothing techniques, which borrow strength from neighbouring areas in order to obtain more reliable estimates. Determining the magnitude of the unobserved regional/state effects is important, since this is the level at which most health policies implemented at the lower geographical levels are made, and because state/region-level covariates are often not included in studies. It is possible that the frailty effects observed at the contextual levels may be attributable, in part, to frailty at lower levels. The literature does not show that any child survival study has explored frailty effects at the family, community and state levels simultaneously while also exploring possible spatially structured frailty at the point-location (household or community) level or at the areal/lattice (state or region) level.
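The idea of borrowing strength from neighbouring areas can be made concrete with the Markov random field (MRF) prior commonly used for structured spatial effects in geo-additive models. Conditional on all other areas, the effect $f_s$ of area $s$ is assumed normal with mean equal to the average of its neighbours' effects and variance $\tau^2/N_s$, where $N_s$ is the number of neighbours: $f_s \mid f_r, r \neq s \sim N\big(\tfrac{1}{N_s}\sum_{r \in \partial_s} f_r,\ \tau^2/N_s\big)$. The sketch below computes these conditional moments for a hypothetical four-area map; the adjacency structure, effect values and $\tau^2$ are invented for illustration and are not taken from the Nigerian states.

```python
def mrf_conditional_moments(effects, neighbours, tau2):
    """Conditional mean and variance of each area's spatial effect under an
    intrinsic Gaussian Markov random field (Besag-type) prior.

    effects:    dict area -> current value of the structured spatial effect
    neighbours: dict area -> list of adjacent areas (symmetric adjacency)
    tau2:       smoothing variance of the MRF prior
    """
    moments = {}
    for area, nbrs in neighbours.items():
        if not nbrs:
            raise ValueError(f"area {area!r} has no neighbours")
        # Mean: average of neighbouring effects; variance shrinks with more neighbours
        mean = sum(effects[r] for r in nbrs) / len(nbrs)
        moments[area] = (mean, tau2 / len(nbrs))
    return moments


if __name__ == "__main__":
    # Hypothetical map: A-B, A-C, B-C and C-D share borders
    neighbours = {"A": ["B", "C"], "B": ["A", "C"],
                  "C": ["A", "B", "D"], "D": ["C"]}
    effects = {"A": 0.2, "B": -0.1, "C": 0.3, "D": 0.0}
    for area, (mean, var) in mrf_conditional_moments(effects, neighbours, 0.5).items():
        print(f"{area}: mean = {mean:+.3f}, variance = {var:.3f}")
```

Area D, with a single neighbour, is pulled towards C's value and keeps the largest conditional variance, whereas C, with three neighbours, is pulled towards their average with a smaller variance; this is the sense in which sparse areas borrow strength from adjacent ones.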
Increasing availability of geographically referenced data and remotely sensed geographic information, coupled with recent advances in Geographic Information Systems (GIS), now makes it possible to measure certain contextual variables and integrate them with routinely collected survey data. Advances in spatial statistics also facilitate the modelling of CM data using appropriate statistical techniques that take frailty at multiple levels into account, enabling researchers to understand which level of heterogeneity (observed or unobserved) plays a greater role in a child's risk of death. The results of such analyses, when combined with mapping, provide a good way of visualizing mortality disparities, thereby facilitating the identification of areas where the situation warrants immediate action and the subsequent allocation of resources and interventions for a meaningful and uniform reduction in CM. Employing appropriate statistical techniques with GIS capabilities, this research thus seeks to understand the determinants of CM and its differentials in the 37 states of Nigeria, using a combination of data from the 2003 Nigeria Demographic and Health Survey (NDHS) and other contextual sources, such as information on malaria prevalence from the Mapping Malaria Risk in Africa (MARA) database.

1.2 Background of the study area

The West African country of Nigeria, with an estimated population of 145 million and an annual population growth rate of 2.4%, is the most populous country in Africa and the tenth most populous country worldwide. Administratively, Nigeria is made up of 6 geopolitical zones consisting of 36 states and a Federal Capital Territory (FCT) at Abuja (see Figure 1). These make up the 37 spatial units considered in this report, all of which will be referred to as states for convenience. The states are further divided into 774 Local Government Areas (LGAs).
The country has diverse climatic and topographic conditions, and is also ethnically, culturally and religiously diverse (NPC, 1998; NPC and ORC Macro, 2004).

Figure 1: Map of Nigeria showing the 37 spatial units considered. Source: NPC and ORC Macro (2004).

Statistics indicate that about 45% of Nigeria's total population is under age 15, with about 20% (24 million) under age five. In 2003, the total fertility rate (TFR)¹ for the country was 5.7 (NPC and ORC Macro, 2004). Nigeria is endowed with abundant natural resources, but the country's Gross National Product (GNP) per capita of US$320 and the estimated 70% of the population living below the poverty line of US$1 per day make it one of the poorest countries in the world (World Bank, 2004; UNICEF, 2002). Poverty in the country, although widespread, is more concentrated in the rural areas and in the northern states, owing to differences in the accessibility and availability of government services (Department for International Development [DFID], 2000). In 2002, only 38% of Nigeria's population had access to adequate sanitation, while about one third of the population lacked access to safe water (World Bank, 2004). In terms of literacy, the percentage of adult females aged 15 years and above who were literate increased from 54.3% in 1999 to 59.4% in 2002, but these figures were still short of the male literacy rate, which improved from 71.0% to 74.4% over the same period². The primary school net attendance ratio between 1996 and 2003 was also higher for males (64%) than for females (57%)³.

¹ TFR is the average number of children a woman would have if she experienced the current age-specific fertility rates throughout her reproductive life.
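The TFR definition above translates into a simple calculation when age-specific fertility rates (ASFRs) are reported for the seven five-year age groups from 15-19 to 45-49: each ASFR is expressed as births per woman per year, and the TFR is five times their sum, $\mathrm{TFR} = 5\sum_a \mathrm{ASFR}_a$. The sketch below assumes that layout; the rates are hypothetical and are not the NDHS 2003 values.

```python
def total_fertility_rate(asfr, width=5):
    """Total fertility rate from age-specific fertility rates.

    asfr:  births per woman per year in each age group (15-19, ..., 45-49)
    width: width of each age group in years
    """
    if any(rate < 0 for rate in asfr):
        raise ValueError("fertility rates cannot be negative")
    # Each rate applies for `width` years of a woman's reproductive life
    return width * sum(asfr)


if __name__ == "__main__":
    # Hypothetical ASFRs for ages 15-19, 20-24, ..., 45-49
    asfr = [0.10, 0.22, 0.24, 0.20, 0.14, 0.07, 0.03]
    print(f"TFR = {total_fertility_rate(asfr):.1f} children per woman")
```

A woman experiencing these illustrative rates throughout ages 15-49 would have, on average, 5.0 children.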
1.3 Problem statement

In explaining child mortality differentials in Nigeria, three related issues are yet to be addressed: (1) assessing the influence of measurable community-level variables on child survival, (2) accounting for frailty (unobserved heterogeneity) in child survival at multiple levels, and (3) describing the spatial patterns of child mortality risk across the country. Addressing these issues in the context of appropriate statistical modelling constitutes the main focus of this study.

1.4 Aims and Objectives of the Study

The main aim of the project is to account for the influence of contextual factors and frailty on CM in Nigeria and to investigate the spatial patterns of CM in the country. The objectives are to:
• Evaluate the contribution of community-level contextual factors to CM.
• Determine the sources of frailty (household, cluster/community and state) that are important in explaining CM differentials.
• Evaluate the effect of frailty terms on model estimates and model fit.
• Examine the potential bias incurred when the spatial dependence in the data is ignored, and study the spatial pattern of CM risk across the 37 states of Nigeria with the aid of maps depicting the geographical differentials.
• Examine the differences between the models in order to understand the implications of using the wrong model.

² World Development Indicators database, April 2005 (accessed 17 July 2005).
³ http://www.childinfo.org (accessed 17 July 2005).

1.5 Organization of the Report

The remainder of the research report is organized as follows. In Chapter Two, the literature on CM-related issues is reviewed, including CM and its differentials in Nigeria. Chapter Three discusses the data sources and introduces the statistical methods employed in the study, namely the Bayesian model and the computational approach. The results of the study are presented in Chapter Four.
Finally, conclusions are drawn in Chapter Five, and recommendations and suggestions for future research are given.

Chapter Two: Literature Review

There are two main aspects of the literature reviewed in this chapter. The first deals with CM conceptual frameworks, modelling frameworks and what has been done in Nigeria, while the second is a summary section pulling together what various authors have contributed to the topic.

2.1 Conceptual Framework

A number of conceptual frameworks have been developed for the study of child health and survival. Models developed by demographers and economists (such as Schultz, 1984) place great emphasis on the role of demographic and socioeconomic variables in determining mortality, while epidemiologists (such as Venkatacharya, 1985) emphasize the role of biomedical factors in morbidity studies. The two most referenced frameworks are those of Mosley and Chen (1984) and Schultz (1984). Mosley and Chen's (1984) framework, developed for the study of IM and CM in developing countries, is considered to be the most comprehensive and systematic model developed and is the most referenced in the literature on child survival (Ruzicka, 1989; Masuy-Stroobant, 2002). Mosley and Chen (1984) employed a multidisciplinary approach, incorporating social and medical science research methodologies in the development of their model, which has both mortality and morbidity as outcomes. According to the framework, three sets of socio-economic factors operate through a set of five intermediate (proximate) determinants, namely maternal fertility factors, environmental contamination, nutrient deficiency, injury and personal illness control, to influence the level of IM and CM in a society. The socio-economic variables include variables at the individual, household and contextual levels.
Figure 2: Schematic presentation of the conceptual framework for the study of child mortality, adapted from Mosley and Chen (1984) and Schultz (1984).

Schultz (1984) also makes a clear distinction between endogenous and exogenous causes of CM, and provides an additional mechanism for studying unobserved influences on child survival. The framework proposed for this study is thus based on an adaptation of Mosley and Chen's (1984) and Schultz's (1984) frameworks (see Figure 2). The dependent variable in the framework is U5M. As in the Mosley and Chen (1984) and Schultz (1984) frameworks, the framework assumes that context-level factors operate through the proximate determinants (which are mainly individual-level attributes) to influence mortality. Variables such as nutrition, immunization or other health care factors, which appear under the classical proximate determinant category in Mosley and Chen's (1984) framework, are captured under the community-level factors in Figure 2. This has been done so that the possible influence of these variables can be assessed rather than speculated upon, given that information on them often exists for only a limited number of children. The unobserved factors at the household, community and state levels represent other variables that are seldom captured and whose influence can be inferred from the magnitude of the random terms included in the model at the various levels.

2.2 Pathway of Influence

The following section reviews issues associated with some of the variables in the framework above.

2.2.1 Proximate Determinants

The proximate determinants as enumerated in Figure 2 consist mainly of the demographic and biological characteristics of the mother and her child.
Starting with the mother's characteristics, maternal age at the time of the child's birth is known to exhibit a U-shaped relationship with child mortality, with mortality risk higher for children of younger and older women[4] (Hobcraft, McDonald and Rutstein, 1984). The higher mortality among children of younger women can be attributed to their biologically immature reproductive systems, which result in their offspring having low birth weight, while the depletion of maternal resources, which progresses with age, makes the children of older women more susceptible to death. Studies have shown increased mortality risk among children born after short birth intervals, citing maternal resource depletion, competition among siblings and increased transmission of disease due to crowding as the major factors (Hobcraft, McDonald and Rutstein, 1985; Palloni and Millman, 1986). Turning to the child's own attributes, male children generally experience higher mortality than female children, primarily for biological reasons. Higher female mortality is associated with cultural values, especially in societies with a strong male-child preference, where biased allocation of health care, nutrition and other resources in favour of male children explains the sex differential (D'Souza and Chen, 1980; Das Gupta, 1987).

[4] Usually less than 15 and greater than 35 years old, respectively.

2.2.2 Household level factors

Mother's education has been described as the single most important determinant of child mortality (Caldwell, 1979). Maternal education exhibits an inverse relationship with child mortality: children of educated mothers experience lower mortality than children of uneducated women, and the relationship persists even after controlling for other variables (Caldwell, 1979).
Although there is unanimity regarding the importance of maternal education for child survival, there is no agreement on the pathway through which it influences mortality. Studies have shown that education equips a woman with the knowledge and power that enable her, among other things, to break away from harmful traditional practices, provide better domestic child care, participate more fully in decisions about her children and make effective and timely use of modern medical facilities (Caldwell, 1979; Hobcraft, McDonald and Rutstein, 1985; Cleland and Van Ginneken, 1988). Others argue that the observed relationship between maternal education and child mortality may be the result of external factors correlated with education, such as access to toilet facilities and water, husband's education, fertility behaviour, breastfeeding and the education of others in the community (Behrman and Wolfe, 1987; Tulasidhar, 1993; Desai and Alva, 1998; Adetunji, 1995). Mother's occupation has a mixed impact on child survival. A mother's work may reduce the time she spends breastfeeding and caring for her child, which may increase mortality (Peterson, Yusof, DaVanzo and Habicht, 1986), but it may also improve survival, since working mothers who are educated may be better informed about immunization and child care practices. Father's education is often ignored in child mortality studies, but fathers in the developing world tend to make decisions regarding fertility, contraception and the use of health care services; thus, child health and survival may also depend on the father and his level of education (Kuate-Defo and Diallo, 2002).
With regard to toilet facilities and water, studies have shown that childhood mortality is lower in households with piped water and flush toilets, and that the impact of these factors is more pronounced as the child gets older and has more frequent contact with the environment[5] (Balk, Pullum, Storeygard, Greenwell and Neuman, 2003).

[5] This is the stage of physical development in which the child does a lot of crawling and is more vulnerable to the effects of a dirty environment.

Cultural disparities in child mortality rates also exist and have been captured in the literature using variables such as religion and ethnicity. Child mortality is often higher for children from Muslim and Traditionalist backgrounds than for Christian children, and cultural beliefs and attitudes about diseases and child care, as well as the low status of women in certain religions, explain the differentials (Caldwell and Caldwell, 1993; Gregson, Zhuwau, Anderson and Chandiwana, 1999; Ogunjuyigbe, 2004).

2.2.3 Community level factors

Mortality differentials by type of place of residence are the focus of many studies. Child mortality in urban areas has been found to be lower than in rural areas, a pattern that may be attributed to the greater availability and accessibility of medical care facilities and of public infrastructure such as safe water supply, as well as the better income and education opportunities in urban areas. Sastry (1997c), however, found that the observed mortality differentials by place of residence can be attributed to the role played by community-level variables. Previous research has shown a strong relationship between community-level factors and child mortality.
Typically, mortality risks are greater for children living in areas with high HIV prevalence, low immunization coverage, and a high incidence of drought and food shortages (Adetunji, 2000; Hill, Bicego and Mahy, 2001; Curtis and Hossein, 1998; Balk et al., 2003). Population density has a U-shaped relationship with child mortality, with children resident in low- and high-density areas at an elevated risk of dying (Balk et al., 2003). High population density means an increased possibility of disease transmission and greater competition for food, conditions which may lead to death. Low population density, on the other hand, means reduced access to health care and poorer overall socioeconomic conditions, implying a greater risk of mortality. With regard to proximity to the coast (a proxy for ease of access to markets), the risk of childhood death has been shown to increase with distance from the coast (Balk et al., 2003).

2.3 Child Mortality and its Differentials in Nigeria

Nigeria, like other developing countries, lacks accurate and comprehensive data[6] on the status and causes of childhood mortality. Available information, however, suggests that childhood mortality has declined over the years. For example, the U5M rate declined from 290 deaths per 1,000 live births in 1960 to 198 in 2003, while the IM rate dropped from 165 to 98 per 1,000 over the same period (UNICEF, 2005). The mortality decline observed especially in the late 1970s and early 1980s has been largely credited to public health programmes initiated by the international community, particularly immunization against the major childhood killer diseases. Most childhood deaths have been attributed to pneumonia, malaria, measles, acute respiratory illness and diarrhoea, conditions that are preventable or treatable with low-cost interventions (NPC and ORC Macro, 2004; POLICY Project, 2002).
Despite the earlier gains in CM reduction, Nigeria currently occupies 13th position among the countries of the world with the highest U5M rates (UNICEF, 2005), a position which suggests that more needs to be done in the area of child survival. The pace of mortality decline within Nigeria has also not been uniform, and consequently CM rates exhibit wide geographic variation. The geographical pattern is, however, hard to discern for the whole country, since available studies are highly localized[7]. The reports from the 1991 census and the three rounds of DHS conducted in the country paint broad regional variations in CM rates. For example, the 1991 Nigeria census recorded the lowest IM rate (57 per 1,000) for the South-West region and the highest (99 per 1,000) for the North-West region (NPC, 1998). A similar pattern was reported in the 2003 Nigeria Demographic and Health Survey (NPC and ORC Macro, 2004), where the lowest under-five mortality rate, 103 per 1,000, was reported for the South-East and the highest, 269 per 1,000, for the North-West region of the country. These studies used simple descriptive statistics and cross-tabulations to show differential mortality patterns stratified by covariates such as the child's sex, mother's education and place of residence.

[6] Detailed information on child health and survival in Nigeria has come from nationally representative surveys such as the DHS, MICS and WFS, mostly conducted by international organizations.

[7] Most studies deal with selected geographical units such as a few regions, states or communities (Adedoyin and Watts, 1989; Iyun, 1992; Adetunji, 1995; Ahonsi, 1995; Ogunjuyigbe, 2004), and only occasionally consider the country as a whole (NPC, 1998; NPC and ORC Macro, 2000 and 2004; Adebayo et al., 2004; Adebayo and Fahrmeir, 2005).
State-wise variations in CM rates were reported in the 1991 census report, but the analysis did not include socioeconomic factors to account for the observed variations, and variations at levels lower than the state were not considered. Studies of the determinants of child mortality have likewise been highly localized, dealing with specific areas such as regions, states or localities, and only occasionally covering the country as a whole. Among the local studies are those by Adetunji (1995), Iyun (1992), Ogunjuyigbe (2004), Adedoyin and Watts (1989), Owa and Osinaike (1998), Feyisetan, Asa and Ebigbola (1997) and Lawoyin (2001). Important factors affecting child mortality documented in these studies include place of residence, education, tradition, toilet facility, water supply, and access to medical and antenatal care. The few recent country-wide studies of child survival in Nigeria include the descriptive reports of the 1991 census (NPC, 1998) and of the 1990, 1999 and 2003 NDHS (NPC, 1991, 2000; NPC and ORC Macro, 2004), as well as more recent systematic assessments (Adebayo et al., 2004; Adebayo and Fahrmeir, 2005; Kneib, 2005). Turning to the studies that investigated the determinants of child mortality in more detail, Adebayo et al.
(2004), using data from the 1999 NDHS, investigated the spatial distribution of IM (neonatal and post-neonatal mortality), asking whether the determinants of a child's death differed between the age groups considered. Their results from geo-additive modelling (details of which are discussed in Chapter 3) show that the spatial variation and the determinants of mortality differed considerably between the two age groups studied. Improved maternal education, being Christian, not being first born, being a singleton birth and having assistance at birth significantly reduced the risk of neonatal mortality, but the effects of these variables were weaker for post-neonatal mortality. Location effects influencing neonatal mortality also appeared to be negatively correlated with those influencing post-neonatal mortality, and the authors speculated that the spatial differentials were tied to crowding, poverty, poor health services and geography, factors which were not explicitly included in the models (Adebayo et al., 2004). Using the same dataset, Adebayo and Fahrmeir (2005) analysed child mortality in Nigeria with flexible geo-additive discrete-time survival models, which allow small-area spatial effects to be measured simultaneously with possibly non-linear or time-varying effects of other covariates (details of the model are discussed in Chapter 3). Their results revealed mother's age (22-35 years), delivery assistance, hospital delivery and long preceding birth intervals to be associated with lower child mortality risk. Distinct spatial patterns were also observed, with significantly higher mortality associated with 4 of the 37 states and lower mortality with 6 of the 37 states (mostly northern states). The spatial variations were interpreted in terms of variables not captured in their analysis, including disease environment, ethnicity/religion, topography, drought and malaria (Adebayo and Fahrmeir, 2005).
Apart from the works of Adebayo et al. (2004) and Adebayo and Fahrmeir (2005), which used DHS data and spatial statistical techniques that allow small-area spatial effects and the effects of other covariates to be measured simultaneously, most existing studies in Nigeria have ignored the spatial component of their datasets and have not included frailty terms in their statistical models to account for clustering or unobserved heterogeneity at any spatial level (Adebayo et al., 2004). In both of these studies, variables that were not captured in the analysis, including disease environment, ethnicity/religion, topography, drought, crowding, poverty, poor health services and malaria, were given as possible explanations for the geographical variations observed. This suggests that, where possible, such omitted factors should be included in the statistical modelling of CM data.

Chapter Three: Data and Methods

3.1 Data Sources

The study uses secondary data from multiple sources, with most of the data coming from the 2003 NDHS[8]. The NDHS was conducted jointly by the NPC and ORC Macro International, USA. The 2003 NDHS is a nationally representative sample of urban and rural areas drawn using a two-stage sampling design. In the first stage, 365 Enumeration Areas (EAs), or clusters, were randomly selected across the country with probability proportional to size (PPS) from a list of EAs developed from the 1991 population census, where the measure of size is the number of households in the EA.
In the second stage, a systematic random sample[9] of 7,864 households was selected from the chosen EAs. All women aged 15-49 and men aged 15-59 who were permanent residents of, or visitors to, the selected households on the night before the survey were eligible for interview (NPC and ORC Macro, 2004). Using structured questionnaires administered to the eligible women, detailed information on all live births to these women in the five years before the survey was collected, in addition to a complete birth history. This included the child's date of birth, birth weight, sex, survival status and, for deceased children, age at death. Data on parental education and occupation, type of place of residence and household wealth, together with a range of other health and socioeconomic factors, were also obtained.

[8] DHS data are considered the most detailed source of demographic and health information available in most developing countries, where vital registration systems are virtually non-existent. Despite the heaping of reported ages at death and the under-enumeration and under-reporting inherent in some of them, the surveys are also considered high-quality sources of mortality data (Bicego and Ahmad, 1996; Curtis, 1995).

[9] The procedure involves first selecting a starting household at random from the household listing and then selecting every kth household, where k is the sampling interval calculated as k = N/n (N is the total number of households and n is the number of households to be selected).

A child-based dataset containing information on 6,029 children born in the five years preceding the survey was constructed from the different survey questionnaires.
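The systematic selection described in footnote 9 can be sketched as follows. This is a minimal illustration rather than the NDHS field procedure: the household listing, the hypothetical function name `systematic_sample`, and the use of a fractional interval when N/n is not a whole number are all assumptions.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(households: Sequence[T], n: int,
                      seed: Optional[int] = None) -> List[T]:
    """Select n units by systematic random sampling with interval k = N/n.

    Hypothetical helper: a random start is drawn in [0, k) and every k-th
    position after it is taken. k may be fractional when N is not a
    multiple of n; positions are rounded down to list indices.
    """
    N = len(households)
    if not 0 < n <= N:
        raise ValueError("n must satisfy 0 < n <= N")
    k = N / n                                   # sampling interval k = N/n
    start = random.Random(seed).uniform(0, k)   # random start in the first interval
    # min() guards against floating-point rounding at the end of the listing
    return [households[min(int(start + i * k), N - 1)] for i in range(n)]


# Usage: 20 households drawn from a hypothetical listing of 150 (k = 7.5)
listing = [f"HH{i:03d}" for i in range(150)]
print(systematic_sample(listing, 20, seed=2003))
```

Allowing a fractional interval keeps the sample size exactly n, at the cost of gaps that alternate between the two integers on either side of k.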
The 2003 NDHS also collected location information (longitude and latitude coordinates) for each survey cluster using handheld Global Positioning System (GPS) devices, to ease linkage of the dataset to other geographically referenced data sources and to facilitate small-area mortality studies. Community (cluster) level measures of health are constructed from the DHS dataset using information such as the incidence of illness (fever/cough and diarrhoea) in the previous two weeks, immunization and health facility use. Because the 2003 NDHS did not collect information on some variables intended for use in the current analysis, supplementary information has been obtained from other sources (see Table 1). Values of all geographic variables have been assigned to each of the 2003 NDHS cluster locations using GIS software (ArcView GIS; ESRI, 2003) and the cluster GPS database provided by Macro International, to obtain a cluster-level dataset. The cluster-level file has then been linked to the child-based data file using the cluster identification number common to both datasets, to obtain an integrated child-level dataset containing all variables relevant to the analysis.

Table 1: Community level contextual factors to be considered in the study

Variable | Source | Description
Population density | CIESIN: Gridded Population of the World (GPW) v.3, www.beta.sedac.ciesin.columbia.edu/gpw | Population density per EA
Coastal proximity | National Imagery and Mapping Agency (NIMA) Digital Chart of the World (DCW), derived continent boundary | Euclidean distance to the nearest point on the coastline
Malaria endemicity | Mapping Malaria Risk in Africa (MARA), http://www.mara.org.za/lite/information.htm | Malaria endemicity per EA
Distance to roads | National Imagery and Mapping Agency (NIMA) Digital Chart of the World (DCW) | Euclidean distance to the nearest point on a road

3.2 Statistical Methods

There are three main stages of analysis in this project. The first stage is a preliminary analysis of the variables to be used and their relation to survival. The second is an investigation of the correlation and spatial correlation structure of mortality. The final stage involves fitting more complex models to the data; these can take into account the discrete-time nature of the data and other special features of the dataset, and can incorporate covariates at different levels in the form of fixed or random effects.

Summary statistics and Kaplan-Meier curves

Summary statistics (means, standard deviations and percentages) are used to examine how the surveyed children vary with respect to the covariates. The study investigates the survival of a child over a 60-month exposure period (from birth to age five). The time until the death of the child is thus the main outcome of interest. There are two key problems with this kind of data. The first is the skewness of the survival times, which arises because some children have very long survival times and others comparatively short ones. This often implies that normality assumptions are violated and that the data cannot be analysed using conventional statistical techniques. The second is censoring. Children who were still alive at the time of the interview, and who therefore had not been observed to die before age five, are considered censored.
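To make the censoring concrete, the sketch below turns one child's record into the pair (t_i, δ_i) used in survival analysis, coded here with δ_i = 1 for a death observed before age five and δ_i = 0 for a right-censored child. The function and argument names are hypothetical. Dates are assumed to be stored as running month counts, and ages are assumed to be in completed months.

```python
from typing import Optional, Tuple

HORIZON = 60  # exposure window in months: birth to fifth birthday


def to_survival_record(birth_month: int, interview_month: int,
                       age_at_death: Optional[int] = None) -> Tuple[int, int]:
    """Return (t_i, delta_i) for one child.

    birth_month, interview_month: dates as running month counts (assumed format).
    age_at_death: completed months at death, or None if alive at interview.
    """
    if age_at_death is not None and age_at_death < HORIZON:
        return age_at_death, 1                  # death observed before age five
    exposure = interview_month - birth_month    # months observed since birth
    return min(exposure, HORIZON), 0            # right-censored at interview or at 60 months


# Usage: a 30-month-old survivor, a death at 5 months, and a child past age five
print(to_survival_record(1200, 1230))       # (30, 0)
print(to_survival_record(1200, 1230, 5))    # (5, 1)
print(to_survival_record(1100, 1230))       # (60, 0)
```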
The type of censoring in this case is known as right censoring, which means that some children stop being observed before their deaths can be observed. Each child is nevertheless observed for at least part of the period, so some information is collected about each child's survival. The analysis of right-censored time-to-event data of this kind falls under the umbrella of survival analysis, whose main goals include estimating the survivor and hazard functions, comparing survival curves and investigating the effect of explanatory variables on survival times. Survival functions can be estimated either parametrically or non-parametrically. Parametric analysis is employed when the survival times fit a theoretical density function such as the Weibull, Gompertz, exponential, lognormal or gamma distribution, in which case parametric maximum likelihood estimation is used to model the survival function (Lee, 1980; Klein and Moeschberger, 1997). Nonparametric methods make no assumption about the functional form of the survival function; instead, they use the information contained in the duration variable, letting the data speak for themselves and thereby reducing the chance of misspecifying the true functional form of the survival function.

The Kaplan-Meier (K-M) method (Kaplan and Meier, 1958) is the most widely used nonparametric method of estimating the survival function S(t), the probability that a child survives longer than time t. This method uses information from both the fully observed and the right-censored children. The K-M estimate at time t is given by:

$$\hat{S}(t) = \prod_{j:\, t_j \le t} \frac{n_j - d_j}{n_j}$$

where n_j is the number of children at risk of death at time t_j and d_j is the number of deaths at time t_j. The K-M estimator is based on several assumptions, namely:
• the sample is chosen randomly and independently from a larger population;
• the deaths occurred at the times specified;
• the survival probabilities are the same for children interviewed early and late in the study;
• censored children have the same survival prospects as uncensored children;
• time to censoring and survival times are independent.

The survivor function is usually presented as a K-M curve, which is a plot of the probability of survival S(t) on the vertical axis against survival time t on the horizontal axis. Vertical drops indicate times at which an event (in this case, death) was observed, while censored times are indicated by short vertical tick marks. The survival probability at a given time, the median survival time, the mean survival time and other quantiles are summary statistics that can be extracted from the survival curve.

It is also often of interest to ascertain whether the survival curve of one group of children differs from that of another, for example whether male children live longer than females. Such a comparison can be made visually by comparing the survival curves, or through statistical tests. The log-rank test (Mantel, 1966; Peto and Peto, 1972) is the most common method for statistically comparing the overall difference between the survival curves of two or more groups. The log-rank statistic tests the null hypothesis that the survival functions of all groups are equal at every time point, against the alternative hypothesis that at least one survival function differs from the others for some time period. In other words, for G groups, the log-rank statistic tests

$$H_0: S_1(t) = S_2(t) = \dots = S_G(t) \quad \text{for all } t \le \tau$$

against H_1: at least one of the S_g(t) differs for some t ≤ τ, where τ is the largest time at which each group has at least one child at risk. The log-rank statistic is given by:

$$\chi^2 = \sum_{g=1}^{G} \frac{(O_g - E_g)^2}{E_g}$$

where O_g is the observed number of deaths in group g and E_g is the expected number of deaths in group g under the null hypothesis of no difference in survival between the groups.
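The K-M estimator and the log-rank statistic given above can be computed directly from the (t_i, δ_i) pairs. The sketch below is a plain-Python illustration under these assumptions: deaths are coded 1 and censored observations 0, and a child censored at t_j is still counted as at risk at t_j. It implements the simple Σ(O_g − E_g)²/E_g form shown in the text rather than the variance-based version reported by most statistical software, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def kaplan_meier(times: Sequence[int], events: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (t_j, S(t_j)) at each distinct death time t_j."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)            # deaths + censorings at each time
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= (at_risk - d) / at_risk     # factor (n_j - d_j) / n_j
            curve.append((t, surv))
        at_risk -= leaving[t]           # censored children leave after time t
    return curve


def logrank_statistic(times: Sequence[int], events: Sequence[int],
                      groups: Sequence[Hashable]) -> float:
    """Approximate log-rank statistic sum_g (O_g - E_g)^2 / E_g."""
    labels = sorted(set(groups), key=str)
    observed: Dict[Hashable, float] = {g: 0.0 for g in labels}
    expected: Dict[Hashable, float] = {g: 0.0 for g in labels}
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n_g = Counter(g for ti, g in zip(times, groups) if ti >= t)
        d_g = Counter(g for ti, e, g in zip(times, events, groups) if ti == t and e == 1)
        n, d = sum(n_g.values()), sum(d_g.values())
        for g in labels:
            observed[g] += d_g[g]
            expected[g] += d * n_g[g] / n       # deaths expected in g under H0
    return sum((observed[g] - expected[g]) ** 2 / expected[g]
               for g in labels if expected[g] > 0)


# Usage: five children, one censored at month 2 and one censored at month 5
print(kaplan_meier([1, 2, 2, 3, 5], [1, 0, 1, 1, 0]))
print(logrank_statistic([1, 1, 3, 3], [1, 1, 1, 1], ["a", "a", "b", "b"]))
```

The resulting statistic is then compared with a χ² distribution with G − 1 degrees of freedom.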
O_g and E_g are accumulated over each time at which an event occurs. The log-rank test rests on the same assumptions as the K-M estimator given above, and under H_0 the log-rank statistic approximately follows a χ² distribution with G − 1 degrees of freedom, where G is the number of groups being compared. The decision to reject the null hypothesis is made by comparing the statistic with chi-square tables at the appropriate degrees of freedom.

Methods used for investigating the correlation and spatial correlation structure

Tests for correlation at the various contextual levels are used to detect the presence of association in the data, so that the appropriate frailty terms can be incorporated in the multivariate modelling to eliminate the bias that would otherwise arise if such terms were omitted. To this end, cross-tabulations are used to study the distribution of births and deaths and to reveal possible clustering of mortality at the household and community levels. At the state level, the Exploratory Spatial Data Analysis (ESDA) techniques of global and Local Indicators of Spatial Association (LISA) (Cliff and Ord, 1981; Anselin, 1995; Ord and Getis, 1995) are employed to determine the extent of spatial association and the presence of spatial clusters of U5M rates. Moran's I statistic (Moran, 1950) is a single global measure that tests for spatial association of a phenomenon. Moran's I is defined as:

$$I = \frac{n}{S_0} \, \frac{\sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}, \qquad S_0 = \sum_i \sum_j w_{ij}$$

where w_ij are the elements of the spatial weight matrix, x_i is the U5M rate in state i, x_j is the U5M rate in neighbouring state j, x̄ is the average U5M rate for the country and n is the number of states. A spatial weight matrix can be defined either by contiguity (states sharing common boundaries) or by distance (state centroids lying within a certain distance of each other).
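Below is a minimal sketch of the global Moran's I defined above, together with a simple permutation test in which the state values are randomly reassigned to locations to build a reference distribution. The weight matrix is passed as a list of lists. The number of permutations, the one-sided alternative of positive autocorrelation and the function names are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def morans_i(x: Sequence[float], w: List[List[float]]) -> float:
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                 # deviations from the mean
    s0 = sum(sum(row) for row in w)             # S0: sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def moran_permutation_test(x: Sequence[float], w: List[List[float]],
                           n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (I, pseudo p-value) for H1: positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                   # reassign values to states at random
        if morans_i(shuffled, w) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Usage: four states in a chain (1-2-3-4) with a smooth gradient in rates
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(moran_permutation_test([1.0, 2.0, 3.0, 4.0], chain, n_perm=99))
```

With only four states the permutation distribution is very coarse; the example is meant to show the mechanics rather than to produce a meaningful p-value.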
Contiguity-based weight matrices include rook contiguity (which treats only states sharing a common boundary as neighbours) and queen contiguity (which treats states sharing either a boundary or a single common point as neighbours). Distance-based weight matrices include distance bands and k-nearest neighbours. For contiguity-based matrices, the elements are defined as w_ij = 1 if states i and j are adjacent and w_ij = 0 otherwise. For distance-based matrices, they are defined as w_ij(d) = 1 if state j is within distance d of state i and w_ij(d) = 0 otherwise. Moran's I, like Pearson's correlation coefficient, usually takes values between −1 and +1. A value near +1 indicates strong positive autocorrelation and a value near −1 strong negative autocorrelation, while a value close to its expectation under no autocorrelation, E(I) = −1/(n − 1) (which tends to 0 as n grows), indicates a random spatial distribution of U5M rates. The significance of Moran's I is obtained using a permutation testing approach.

Anselin (1995) describes the LISA for each state i, which provides a measure of spatial association for each state under consideration. The LISA for state i is defined as:

$$I_i = z_i \sum_j w_{ij} z_j, \qquad z_i = \frac{x_i - \bar{x}}{SD(x)}$$

LISA allows the identification of four types of spatial clusters:
• High-High clusters: states with high U5M rates surrounded by states with high U5M rates (positive association, hot spots);
• Low-Low clusters: states with low U5M rates surrounded by states with low U5M rates (positive association, cold spots);
• Low-High clusters: states with low U5M rates surrounded by states with high U5M rates (negative association, spatial outliers);
• High-Low clusters: states with high U5M rates surrounded by states with low U5M rates (negative association, spatial outliers).
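The LISA values and the four cluster types listed above can be sketched as follows. This is an illustration only: it labels every state by the signs of z_i and of its spatial lag Σ_j w_ij z_j, whereas in practice only states whose local statistic is significant (usually judged by conditional permutation) would be mapped as clusters. The population standard deviation is assumed for SD(x). Binary contiguity weights are built from a hypothetical neighbour list, and the function names are illustrative.

```python
from statistics import pstdev
from typing import Dict, List, Sequence, Tuple


def contiguity_weights(neighbours: Dict[int, List[int]], n: int) -> List[List[int]]:
    """Binary weights: w_ij = 1 if states i and j are adjacent, 0 otherwise."""
    w = [[0] * n for _ in range(n)]
    for i, nbrs in neighbours.items():
        for j in nbrs:
            w[i][j] = 1
    return w


def lisa(x: Sequence[float], w: List[List[int]]) -> Tuple[List[float], List[str]]:
    """Return local Moran's I_i = z_i * sum_j w_ij z_j and a cluster label per state."""
    n = len(x)
    mean, sd = sum(x) / n, pstdev(x)
    z = [(xi - mean) / sd for xi in x]
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    local_i = [z[i] * lag[i] for i in range(n)]
    labels = []
    for zi, li in zip(z, lag):
        if zi > 0 and li > 0:
            labels.append("High-High")   # hot spot
        elif zi < 0 and li < 0:
            labels.append("Low-Low")     # cold spot
        elif zi < 0 and li > 0:
            labels.append("Low-High")    # spatial outlier
        elif zi > 0 and li < 0:
            labels.append("High-Low")    # spatial outlier
        else:
            labels.append("None")        # exactly average value or balanced neighbours
    return local_i, labels


# Usage: four states in a chain with hypothetical U5M rates per 1,000
w = contiguity_weights({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 4)
print(lisa([100.0, 120.0, 30.0, 20.0], w)[1])
# ['High-High', 'High-Low', 'Low-High', 'Low-Low']
```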
Multivariate statistical model

The K-M curves and the log-rank test described above provide univariate analyses that are useful in assessing whether a covariate affects survival, and are most suitable for descriptive purposes. They are particularly handy when the predictor variables are categorical, and do not work easily with continuous[10] predictors such as the mother's age at birth. However, they do not show how a covariate affects survival once other covariates are taken into account. The Cox model (Cox, 1972) is commonly employed to analyse survival data in a multivariate way, allowing the effect of a set of covariates on survival time to be assessed. The Cox model also handles censored data, categorical and continuous variables, and variables that change over time, all of which may influence survival. It also allows frailty to be included at various levels, but its assumption that time is measured on a continuous scale makes it inappropriate for the current data. In the DHS surveys, the survival times of children are measured discretely in months, which results in many tied events and causes problems when continuous-time models are used. A discrete formulation of time is therefore more appropriate than the Cox approach, since tied events are not a problem with the discrete-time approach. A standard discrete-time multilevel hazard model (Goldstein, 1995) would be the natural first choice for this type of data. Such techniques have, however, been found to be inappropriate where frailty at some level is assumed to follow strong spatial patterns (Chaix, Merlo, Subramanian, Lynch and Chauvin, 2005). The modelling framework for the study needs to take into account the special features of the dataset, while ensuring that the aims of the study are met.
The Bayesian geo-additive discrete-time survival model described below can accommodate all the features of the dataset, namely the presence of censored observations, non-linear and time-varying covariate effects, frailty and spatial dependence. One major advantage of the Bayesian framework is that it allows prior knowledge about the parameters to be combined with the information contained in the data to produce more robust results. Because the modelling process in the Bayesian framework does not rely on asymptotic theory, it also makes it possible to work with small sample sizes (Congdon, 2003).

[10] Continuous covariates have to be arbitrarily divided into quartiles or other biologically meaningful groups and then treated as categorical covariates. This often leads to the loss of information contained in such variables.

3.2.1 The Geo-additive Discrete-Time Survival Model

The modelling details given below are derived from the works of Berger, Fahrmeir and Klasen (2002), Adebayo and Fahrmeir (2005), Hennerfeind, Brezger and Fahrmeir (2006), Kneib (2005) and Fahrmeir and Tutz (2001). Consider the survival times T ∈ {1, ..., k}, with k = 60, measured in months, where T = t denotes death of a child in month t and k is the last month of the observation interval. Let x_it be a vector of covariates observed up to month t. The discrete-time hazard, that is, the conditional probability of death in month t given that the child survived until the start of month t, is given by:

$$\lambda(t \mid x_{it}) = P(T = t \mid T \ge t, x_{it}) \qquad (1)$$

In a right-censored survival dataset such as ours, each child's survival information is captured as (t_i, δ_i), where t_i is the observed lifetime (time until death or censoring) of child i and δ_i is a censoring indicator taking the value 1 if the death of child i is observed and 0 if the child is alive (censored). For ease of analysis, the discrete-time survival model is often represented as a logistic regression model by defining binary event indicators y_it, t = 1, ..., t_i:

$$y_{it} = \begin{cases} 1 & \text{if } t = t_i \text{ and } \delta_i = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
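The binary indicators in (2) are usually obtained by expanding each child's record into one row per month at risk, after which a binary regression (for instance the logistic regression representation noted above) can be fitted to the rows. The sketch below shows that expansion, and also how per-month hazards λ(t) translate back into survival probabilities via S(t) = Π_{s≤t} (1 − λ(s)). It codes δ_i = 1 for an observed death, and the function names and toy records are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def person_months(records: Sequence[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Expand (child_id, t_i, delta_i) into rows (child_id, t, y_it), t = 1..t_i.

    y_it = 1 only in the month of an observed death (delta_i = 1), as in (2).
    """
    rows = []
    for child_id, t_i, delta_i in records:
        for t in range(1, t_i + 1):
            rows.append((child_id, t, int(t == t_i and delta_i == 1)))
    return rows


def survival_from_logits(eta: Sequence[float]) -> List[float]:
    """Turn predictors eta_t (t = 1, 2, ...) into S(t) = prod_{s<=t} (1 - lambda(s)),
    with lambda(s) = exp(eta_s) / (1 + exp(eta_s))."""
    surv, out = 1.0, []
    for e in eta:
        hazard = 1.0 / (1.0 + math.exp(-e))     # logistic response function
        surv *= 1.0 - hazard                    # survive month s given alive at its start
        out.append(surv)
    return out


# Usage: child "A" dies in month 3, child "B" is censored after month 2
print(person_months([("A", 3, 1), ("B", 2, 0)]))
# [('A', 1, 0), ('A', 2, 0), ('A', 3, 1), ('B', 1, 0), ('B', 2, 0)]
print(survival_from_logits([0.0, 0.0]))        # [0.5, 0.25]
```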
Equation (1) can thus be written as a binary response model

$$P(y_{it} = 1 \mid x_{it}) = h(\eta_{it}) \qquad (3)$$

where h is the response (inverse link) function and η_it is a predictor built from the covariates x_it. Equation (3) can be specified as a probit, logit or multinomial model, with logit models being easier to estimate and interpret (Crook, Knorr-Held and Hemingway, 2003; Adebayo and Fahrmeir, 2005). The logit model is given by

$$P(y_{it} = 1 \mid \eta_{it}) = \frac{\exp(\eta_{it})}{1 + \exp(\eta_{it})} \qquad (4)$$

with a partially linear predictor

$$\eta_{it} = g_0(t) + x_{it}'\gamma \qquad (5)$$

where g_0(t), t = 1, 2, ..., is the baseline hazard effect and γ are fixed-effect parameters. Equations (4) and (5) may be combined as:

$$\frac{P(y_{it} = 1 \mid x_{it})}{P(y_{it} = 0 \mid x_{it})} = \exp(\eta_{it}) = \exp\big(g_0(t) + x_{it}'\gamma\big) \qquad (6)$$

Equation (6) can be regarded as the basic form of a semi-parametric survival model, where the baseline hazard g_0(t), t = 1, 2, ..., is an unknown, usually non-linear function of t to be estimated from the data. Incorporating the time-varying and non-linear covariate effects, the spatial effect and the frailty effects yields a geo-additive version of the predictor in equation (6):

$$\eta_{it} = g_0(t) + \sum_j g_j(t)\, u_{ij} + \sum_j f_j(x_{ij}) + f_{spat}(s_i) + b_i + c_i + x_{it}'\gamma \qquad (7)$$

where g_0(t) is the baseline function of time, g_j(t) are the time-varying effects of covariates u_j, f_j(x_ij) are non-linear effects of continuous covariates x_j, f_spat(s_i) is the effect of the state s_i ∈ {1, ..., S} in which child i lives, and b_i and c_i represent the cluster- and household-specific frailty effects respectively, while x_it contains the covariates with fixed effects γ. The spatial term f_spat(s_i) may be further split into a spatially correlated (structured) and an uncorrelated (unstructured) effect, that is, f_spat(s_i) = f_str(s_i) + f_unstr(s_i).

3.2.1.1 Prior distributions for covariate effects

The unknown model parameters γ
and functions $g_0$, $g_j$, $f_j$ and $f_{spat}$ in equation (7) are considered random variables in a Bayesian framework and must be supplemented with suitable prior distributions for inference. The choice of prior generally depends on the type of covariate, and a vast literature exists on the treatment of covariates and prior specification (including Gelman, Carlin, Stern and Rubin, 1995; Leonard and Hsu, 1999; Carlin and Louis, 2000; and Bernardo and Smith, 2000). The specification of priors and hyperparameters for each group of covariates follows the works of Berger et al. (2002), Adebayo and Fahrmeir (2005) and Hennerfeind et al. (2006), and is as follows.

3.2.1.2 Priors for fixed effects

In the absence of any prior knowledge about the covariates, independent diffuse (uninformative) priors, $p(\gamma_j) \propto \text{constant}$, are the most popular choice for modelling fixed effects.

3.2.1.3 Priors for continuous and time-varying effects

The continuous and time-varying effects in equation (7) are often assumed to vary smoothly and are modelled using Bayesian penalised splines (P-splines) (Eilers and Marx, 1996; Lang and Brezger, 2004). In this approach, the function $f_j(x_j)$ is approximated by a polynomial spline of degree $q$, i.e. $f_j(x_j) = \sum_{m=1}^{M} \beta_{jm} B_m(x_j)$, where $B_m$ is the $m$-th (B-spline) basis function and $\beta_j = (\beta_{j1}, \beta_{j2}, \dots, \beta_{jM})'$ is a vector of regression coefficients. Smoothness is then imposed through a first- or second-order random walk prior on these coefficients (Lang and Brezger, 2004), whose variance $\tau_j^2$ controls the amount of smoothing.

3.2.1.4 Priors for unstructured frailty

All uncorrelated random effects, which include the group random effects (family and community random effects) as well as the unstructured spatial state effect, are assumed to be independent and identically distributed (i.i.d.) Gaussian. The family random effect is modelled as $c_i \sim N(0, \tau_c^2)$, the community random effect as $b_i \sim N(0, \tau_b^2)$ and the unstructured spatial effect as $f_{unstr} \sim N(0, \tau_{unstr}^2)$.
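To show what a random walk prior on the P-spline coefficients does, the sketch below (pure Python, illustrative only; it is not how BayesX implements the prior) builds the second-order difference matrix $D$ and the penalty matrix $K = D'D$. Under a second-order random walk prior, $p(\beta_j \mid \tau_j^2) \propto \exp(-\beta_j' K \beta_j / (2\tau_j^2))$, so coefficient vectors lying on a straight line receive no penalty while wiggly ones are penalised, and a smaller $\tau_j^2$ enforces a smoother function. The coefficient vectors are hypothetical.

```python
def difference_matrix(m, order=2):
    """Rows of the order-th difference operator D for m coefficients."""
    rows = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    for _ in range(order):
        # each pass replaces the rows by differences of consecutive rows
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows


def penalty_matrix(m, order=2):
    """K = D'D, the precision structure (up to 1/tau^2) of the random walk prior."""
    d = difference_matrix(m, order)
    return [[sum(r[i] * r[j] for r in d) for j in range(m)] for i in range(m)]


def penalty(beta, order=2):
    """beta' K beta, i.e. the sum of squared order-th differences of beta."""
    k = penalty_matrix(len(beta), order)
    n = len(beta)
    return sum(beta[i] * k[i][j] * beta[j] for i in range(n) for j in range(n))


for row in penalty_matrix(5):
    print(row)
print(penalty([1, 2, 3, 4, 5]))   # 0: coefficients on a straight line
print(penalty([0, 1, 0, 1, 0]))   # 12: wiggly coefficients are penalised
```

A first-order random walk (`order=1`) instead penalises departures from a constant, which is why second-order walks are the usual choice when linear trends should be left unpenalised.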
3.2.1.5 Priors for the spatially structured frailty

Spatial data are generally of two types: point-location data, based on measurements taken at exact locations in space (e.g. the exact longitude and latitude coordinates of households or communities), and areal (lattice) data, based on data aggregated over artificially defined units (usually administrative units such as counties, states or regions). Structured spatial effects $f_{str}(s)$ are estimated using either Markov random field (MRF) priors for lattice data or Gaussian random field (GRF) priors for point-location data. Since we are interested in how U5M varies over states (which are by nature lattice structures), the MRF prior is the preferred approach and is discussed below.

The MRF prior was proposed by Besag, York and Mollie (1991) for correlated spatial effects. It introduces a structure based on neighbourhoods (areas are neighbours if they share a common boundary), under which the conditional mean of the effect in an area, given the effects in all other areas, is the mean of the effects of its neighbouring areas. Let $f_{str}(s_i) = \beta_{js}$ be the structured spatial effect in equation (7) for a child living in state $s_i = s$; the MRF prior is then given by

$$\beta_{js} \mid \beta_{js'},\ s' \neq s,\ \tau_j^2 \sim N\left(\frac{1}{N_s}\sum_{s' \in \partial_s} \beta_{js'},\ \frac{\tau_j^2}{N_s}\right)$$

where $\partial_s$ denotes the set of neighbours of state $s$, $N_s$ is the number of neighbours, and $\tau_j^2$ is the variance parameter that controls spatial smoothness.

3.2.1.6 Hyperparameters

In a fully Bayesian analysis, the variance parameters are also treated as unknowns and are estimated by assigning priors to them (hyperpriors), which allows the variance parameters and the corresponding unknown functions to be estimated simultaneously. The variance parameters are commonly assumed to follow an inverse gamma distribution, $\tau_j^2 \sim IG(a, b)$, with shape parameter $a > 0$ and scale parameter $b > 0$, where $a$ and $b$ are chosen such that the prior is weakly informative.
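The MRF prior and its inverse gamma hyperprior can be illustrated on a hypothetical map of four regions labelled A to D (not Nigerian states). The sketch computes the conditional mean and variance of one region's effect given its neighbours, builds the implied precision matrix $(\text{diag}(N_s) - W)/\tau^2$, where $W$ is the 0/1 adjacency matrix, and evaluates the mode $b/(a+1)$ of an $IG(a, b)$ prior with shape $a$ and scale $b$. The effect values and variances are made up for illustration.

```python
def mrf_conditional(effects, neighbours, s, tau2):
    """Conditional mean and variance of the structured effect of region s.

    Under the MRF prior the mean is the average of the neighbours' effects
    and the variance is tau2 / N_s, with N_s the number of neighbours.
    """
    nb = neighbours[s]
    mean = sum(effects[r] for r in nb) / len(nb)
    return mean, tau2 / len(nb)


def mrf_precision(neighbours, order, tau2):
    """Joint precision matrix (diag(N_s) - W) / tau2 for regions in `order`."""
    idx = {s: i for i, s in enumerate(order)}
    q = [[0.0] * len(order) for _ in order]
    for s, nb in neighbours.items():
        q[idx[s]][idx[s]] = len(nb) / tau2
        for r in nb:
            q[idx[s]][idx[r]] = -1.0 / tau2
    return q


def inverse_gamma_mode(a, b):
    """Mode of IG(a, b) with shape a and scale b."""
    return b / (a + 1.0)


# Hypothetical map: A-B, A-C, B-C and C-D share a boundary
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
effects = {"A": 0.2, "B": -0.1, "C": 0.3, "D": 0.5}   # hypothetical values

mean_c, var_c = mrf_conditional(effects, neighbours, "C", tau2=0.04)
print(round(mean_c, 4), round(var_c, 4))

Q = mrf_precision(neighbours, ["A", "B", "C", "D"], tau2=1.0)
print([sum(row) for row in Q])          # every row sums to zero

for a, b in [(1.0, 0.005), (0.001, 0.001)]:
    print(a, b, inverse_gamma_mode(a, b))
```

Because every row of the precision matrix sums to zero, the joint MRF prior is improper; the structured effect is typically identified by centring the effects across regions.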
The values of $a$ and $b$ reflect different degrees of uncertainty about the variance parameter. A common choice is $a = 1$ and a small value for $b$, for example $a = 1$, $b = 0.005$. This yields a flat distribution, similar to a situation of no prior knowledge about the parameter. Another common choice involves equal shape and scale parameters ($a = b$), for example $a = b = 0.001$, which yields a weakly informative but proper prior that closely approximates Jeffreys' non-informative prior and works better in sparse-data situations. Crook et al. (2003) note that decreasing the value of the scale parameter $b$ corresponds to a lower prior guess of the size of the variance, since the inverse gamma distribution has its mode at $b/(a+1)$. Finally, in the Bayesian framework, the priors for all parameters are assumed to be mutually independent (Bolstad, 2004).

3.2.1.7 Inference / Estimation

Inference for the posterior distribution of the model parameters is fully Bayesian and is based on Markov Chain Monte Carlo (MCMC) simulation, which generates samples from the posterior distribution of the unknown parameters. Two major algorithms for producing fully Bayesian estimates are the Gibbs sampler and the Metropolis-Hastings (M-H) algorithm. The Gibbs sampler draws a new value for each parameter from its full conditional distribution, that is, its distribution given the data and the current values of all other parameters. After each iteration, the new values replace the old ones, and the process is repeated until the chain has converged to the posterior distribution. The M-H approach, on the other hand, draws a candidate value from a proposal distribution and compares it with the value from the previous iteration using the ratio of their posterior densities. The candidate is then either accepted or rejected according to the resulting acceptance probability.
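The Metropolis-Hastings step described above can be sketched generically as follows. This is a minimal illustration, not the BayesX sampler: it samples the logit of a single monthly death probability from hypothetical data (30 deaths among 400 child-months at risk) under a flat prior on the logit scale, using a symmetric random-walk proposal, so the acceptance probability reduces to the ratio of posterior densities. The run length, burn-in and thinning (12,000 iterations, 2,000 discarded, every 10th kept) mirror the settings reported later in this report; the step size and seed are arbitrary choices.

```python
import math
import random


def log_posterior(theta, deaths, at_risk):
    """Log posterior of theta = logit(lambda) under a flat prior on theta.

    Binomial log-likelihood: deaths * theta - at_risk * log(1 + exp(theta)).
    """
    return deaths * theta - at_risk * math.log1p(math.exp(theta))


def metropolis_hastings(deaths, at_risk, n_iter=12000, burn_in=2000, thin=10,
                        step=0.3, seed=1):
    rng = random.Random(seed)
    theta = 0.0
    lp = log_posterior(theta, deaths, at_risk)
    draws, accepted = [], 0
    for it in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)          # symmetric proposal
        lp_new = log_posterior(proposal, deaths, at_risk)
        # accept with probability min(1, posterior ratio); 1 - random() is in (0, 1]
        if math.log(1.0 - rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
            accepted += 1
        # keep every `thin`-th draw after the burn-in period
        if it >= burn_in and (it - burn_in) % thin == 0:
            draws.append(theta)
    return draws, accepted / n_iter


draws, acceptance_rate = metropolis_hastings(deaths=30, at_risk=400)
hazard = [1.0 / (1.0 + math.exp(-t)) for t in draws]
print(len(draws), round(acceptance_rate, 2), round(sum(hazard) / len(hazard), 3))
```

With these settings 1,000 draws are retained. A Gibbs sampler would instead draw each parameter directly from its full conditional, which is only convenient when that conditional has a standard form.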
3.2.1.8 Model comparison

In Bayesian data analysis, model comparison and selection are used to find the "best" model, or subset of models, that describes the data, and to study the sensitivity of results to the prior specification (Vaida, Ghosh and Liu, 2008). The Deviance Information Criterion (DIC) (Spiegelhalter, Best, Carlin and van der Linde, 2002) was developed for comparing the fit and complexity of hierarchical models in the Bayesian setting. The DIC is an extension of the Akaike Information Criterion (AIC) and is based on the posterior distribution of the deviance statistic. The DIC is calculated as:

$$DIC = \bar{D} + p_D \qquad (8)$$

In the above expression, $\bar{D} = E_{\theta \mid y}[D(\theta)]$ is the posterior mean of the deviance statistic $D(\theta)$ and is a measure of how well the model fits the data. The deviance statistic is given by $D(\theta) = -2 \log f(y \mid \theta) + c$, where $f(y \mid \theta)$ is the likelihood function for the observed data vector $y$ given the parameter vector $\theta$, and $c$ is a constant. In equation (8), $p_D$ is the effective number of parameters in the model (a measure of model complexity), calculated as $p_D = \bar{D} - D(\bar{\theta})$, where $D(\bar{\theta})$ is the deviance evaluated at $\bar{\theta}$, the posterior means of the parameters of interest. Equivalently, $DIC = D(\bar{\theta}) + 2p_D$.

When the DIC is used for model comparison, models with smaller DIC values are preferred, as they offer a better balance between fit and complexity. While there is no formal standard for comparing DICs, the differences in DIC between competing models are what matter (Spiegelhalter et al., 2002). For the AIC, Burnham and Anderson (2002) proposed as a rule of thumb that models within 1-2 units of the best model have similar support (the models cannot be differentiated), models with differences of 3-7 units from the best model can be weakly differentiated, and differences of more than 7 units are regarded as strong evidence in favour of the model with the smaller AIC. Spiegelhalter et al.
(2002) suggest that this rule of thumb works reasonably well for the DIC. A major advantage of the DIC is that it can be easily calculated from the output of the MCMC simulation (Spiegelhalter et al., 2002). Another goodness-of-fit measure based on the DIC is the $R^2_{DIC}$, which Miaou, Song and Mallick (2003) defined as:

$$R^2_{DIC} = 1 - \frac{DIC_{model} - DIC_{ref}}{DIC_{max} - DIC_{ref}}$$

The $R^2_{DIC}$ attempts to standardize the DIC in the same way as the traditional $R^2$ (Miaou et al., 2003). In the above expression, $DIC_{model}$ is the DIC value for the model under evaluation, $DIC_{max}$ is the DIC value under a fixed one-parameter model and $DIC_{ref}$ is the DIC value from a reference model (the best model), which can also be approximated as $DIC_{ref} \approx n$ (Miaou et al., 2003).

Chapter Four: Results

This chapter presents the results from the various analyses. It begins by introducing the unit of analysis, the choice of variables and the level at which they are introduced, and discusses whether the variables should be modelled as fixed or random effects. Descriptive results are then presented, followed by the results of a multivariate analysis. Preliminary analyses, including univariate and bivariate analyses, were performed using the statistical package SAS® Version 9.1.3 (SAS Institute Inc., 2002-2004). The multivariate analysis was then conducted, and the fitted models were evaluated and compared in terms of goodness of fit or potential misfit, before conclusions were drawn as to which model fits the data best. The multivariate analysis, including the production of risk maps, was implemented using BayesX Version 2.0 (Brezger, Kneib and Lang, 2005), while additional mapping was carried out in GeoDa version 0.9 (Anselin, 2003) and ArcView GIS version 3.3 (ESRI, 2002).

4.1 Unit of analysis and outcome

In this report, the individual (child) is the unit of analysis and the outcome variable is the risk of U5M (0-59 months).
The overall aim is to assess the extent to which both measured and unmeasured factors at various levels of aggregation (household, community and state) affect child survival. The hierarchical structure of the data is depicted in Figure 3, and the levels are defined as follows:

• Individual (child) level: the children under the age of five years who reside in the households. In this report, the individual child level is the lowest level and the unit of analysis.
• Household level: the household in which the children live.
• Community level: a group of households in the same geographical area that share a common primary sampling unit within the DHS dataset.
• State level: in Nigeria, the state is the second tier of government after the federal government. In the current analysis, each community belongs to one of the 37 distinct geographical locations that represent the states.

Figure 3: Hierarchical structure of the dataset

4.2 Variable selection

The selection of explanatory variables was guided by the Mosley and Chen (1984) conceptual framework and previous research on child mortality (including Sastry, 1997b; Desai and Alva, 1998; Kravdal, 2004; and NPC and ORC Macro, 2004). The full list of variables consists of socioeconomic and demographic factors at the individual, household, community and state levels; a full description of these variables is given in Appendix A.

4.2.1 The selection and construction of community-level variables

The community-level characteristics considered in the present study fall into two groups. The first set of community measures consists of variables derived for each of the survey clusters from the spatially explicit databases described in Chapter 3. These include population density, distance to road, distance to coast and malaria prevalence.
These were obtained by overlaying the DHS cluster locations on the other data sources and extracting the mean pixel value of each covariate at the community level (see Figure 4).

Figure 4: Maps depicting the nature of spatially explicit variables considered

The second set of community variables was based on the aggregation of individual measures from the 2003 NDHS dataset. These variables relate to the health, nutrition and socio-economic conditions in the communities. To minimize the number of variables used in the analysis, this set of community-level variables was grouped into domains such as socio-economic status and community environment. Within each group, a principal component analysis was used to obtain summary scores that could be used as an index. These scores were then dichotomized into high and low (Table 2).

Table 2: Community level variables from factor analysis
Index and item | Factor 1 loading
Community Environment Index (eigenvalue 3.03; 61% of variance explained)
  Percent with access to clean water in community | 0.55
  Percent with access to hygienic toilet in community | 0.76
  Percent with access to finished floor in community | 0.79
  Percent with access to clean cooking fuel in community | 0.88
  Percent with access to electricity in community | 0.86
Community Health Service Index (eigenvalue 4.48; 75% of variance explained)
  Percent of births delivered in medical facility | 0.90
  Percent of births with postnatal care | 0.93
  Percent of births with antenatal care | 0.93
  Percent of births delivered by a skilled attendant | 0.90
  Percent of mothers who had at least one tetanus injection | 0.91
  Percent of children 12-23 months fully vaccinated | 0.54
Community Child Deprivation Index (eigenvalue 1.34; 45% of variance explained)
  Percent with risky birth interval | 0.35
  Percent born to too young or too old women | 0.76
  Percent of children with high birth order | 0.80
Community Maternal Socioeconomic Index (eigenvalue 2.46; 49% of variance explained)
  Percent with at least secondary education | 0.84
  Percent with white collar job | 0.56
  Percent of single women or monogamous unions | 0.56
  Percent with access to at least one media type | 0.71
  Average composite score on autonomy in community | 0.79

4.2.2 Final data set

To avoid potential problems in the analysis, twins were excluded, and children who were not usual residents of the community in which they were sampled were removed, so that community-specific factors are not wrongly assigned to them. Missing values were generally few, and observations with missing values were assigned to the "other" category. To avoid unstable categories due to small numbers, most of the categorical variables were re-categorized to be comparable with previous studies. The largest category of each categorical variable was used as the reference group.

4.3 Descriptive summaries

Table 3 presents descriptive statistics for the main child-level variables of interest, disaggregated by the child's survival status. Tables 3 to 6 also report the estimated mean survival time in months (column "Mean S(t)"), its standard error (column "SE S(t)") and the p-value indicating whether survival times differ significantly across the categories of each covariate. The initial discussion is on the distribution of children by survival status, and a summary of the findings from the survival analysis is given in Section 4.3.1.

The final dataset used for analysis had information on 5684 children, of whom 752 (13%) had died before reaching age five. Table 3 reveals that the sample was almost equally distributed by the child's sex and birth order. About half of the children had no older siblings or a preceding birth interval of more than three years. A high proportion of the children (71%) had no younger sibling, being last births, or a succeeding birth interval of more than three years. As reported by their mothers, most of the children were of average or larger size at birth. On average, mother's age at the child's birth was 27 years.
About two thirds of the children were delivered at home, 60% did not receive professional prenatal care, 64% were delivered with a traditional birth attendant (or other or no assistance) and only about 40% of the children had antenatal care at a health facility. Long labour was the most common problem that mothers reported at the birth of the child (24%), and convulsions at birth were the least frequent (3%). Taking the child's survival status into account, the percentage of children surviving was almost the same by gender and birth order. Children with preceding and succeeding birth intervals of less than 24 months had the worst survival. Fewer children also survived in the following groups: small or very small size at birth, born to mothers under 18 years, delivered at home, delivered with traditional birth assistance, and with home or no source of antenatal care.

Table 3: Descriptive statistics of child level variables
Variable and category | # Alive | # Dead | Total | % Alive | Mean S(t) | SE S(t) | p-value
Gender
  Male | 2498 | 395 | 2893 | 86.35 | 41.73 | 0.29 | 0.3226
  Female | 2434 | 357 | 2791 | 87.21 | 45.56 | 0.32 |
Birth order
  First to third birth | 2573 | 377 | 2950 | 87.22 | 45.55 | 0.31 | 0.3038
  Fourth or higher birth | 2359 | 375 | 2734 | 86.28 | 41.75 | 0.30 |
Preceding birth interval
  No older siblings or > 36 months | 2532 | 301 | 2833 | 89.38 | 46.53 | 0.30 | <.0001
  Less than 24 months | 872 | 206 | 1078 | 80.89 | 39.67 | 0.53 |
  24 to 35 months | 1528 | 245 | 1773 | 86.18 | 41.74 | 0.37 |
Succeeding birth interval
  No younger sibling or > 36 months | 3676 | 349 | 4025 | 91.33 | 43.65 | 0.22 | <.0001
  Less than 24 months | 439 | 260 | 699 | 62.8 | 35.47 | 0.84 |
  24 to 35 months | 817 | 143 | 960 | 85.1 | 42.58 | 0.44 |
Size at birth
  Small/very small | 718 | 178 | 896 | 80.13 | 29.61 | 0.44 | <.0001
  Average or larger | 4214 | 574 | 4788 | 88.01 | 45.97 | 0.24 |
Mother's age at birth
  < 18 years | 398 | 88 | 486 | 81.89 | 43.23 | 0.86 | 0.0005
  18-34 years | 3704 | 517 | 4221 | 87.75 | 42.42 | 0.23 |
  35 and older | 830 | 147 | 977 | 84.95 | 41.06 | 0.53 |
Place of delivery
  Homes/Others/Missing | 3149 | 586 | 3735 | 84.31 | 44.10 | 0.30 | <.0001
  Health facility | 1783 | 166 | 1949 | 91.48 | 44.07 | 0.29 |
Source of prenatal care
  Skilled birth attendant | 2101 | 147 | 2248 | 93.46 | 33.68 | 0.19 | <.0001
  Traditional birth attendant/Other/None | 2831 | 605 | 3436 | 82.39 | 43.75 | 0.31 |
Birth assistance
  Trained medical personnel | 1872 | 179 | 2051 | 91.27 | 43.98 | 0.29 | <.0001
  Traditional birth attendant/Other/None | 3060 | 573 | 3633 | 84.23 | 44.06 | 0.30 |
Source of antenatal care
  Homes/Other/None | 2819 | 607 | 3426 | 82.28 | 43.69 | 0.31 | <.0001
  Health facility | 2113 | 145 | 2258 | 93.58 | 33.73 | 0.18 |
Long labour at birth?
  No | 3780 | 515 | 4295 | 88.01 | 42.55 | 0.23 | <.0001
  Yes | 1152 | 237 | 1389 | 82.94 | 43.28 | 0.51 |
Excessive bleeding at birth?
  No | 4071 | 585 | 4656 | 87.44 | 45.68 | 0.24 | 0.0004
  Yes | 861 | 167 | 1028 | 83.75 | 30.89 | 0.37 |
High fever at birth?
  No | 4397 | 648 | 5045 | 87.16 | 45.52 | 0.24 | 0.0069
  Yes | 535 | 104 | 639 | 83.72 | 40.46 | 0.68 |
Convulsions at birth?
  No | 4819 | 717 | 5536 | 87.05 | 45.45 | 0.23 | <.0001
  Yes | 113 | 35 | 148 | 76.35 | 37.18 | 1.64 |

Table 4: Descriptive statistics of mother level variables
Variable and category | # Alive | # Dead | Total | % Alive | Mean S(t) | SE S(t) | p-value
Mother's highest educational level
  No education | 2428 | 462 | 2890 | 84.01 | 40.79 | 0.31 | <.0001
  Primary | 1191 | 184 | 1375 | 86.62 | 45.27 | 0.46 |
  Secondary plus | 1313 | 106 | 1419 | 92.53 | 44.53 | 0.33 |
Mother's occupation
  No work | 1681 | 276 | 1957 | 85.9 | 44.68 | 0.41 | 0.0529
  White collar job | 2069 | 288 | 2357 | 87.78 | 42.53 | 0.30 |
  Agric and others | 1182 | 188 | 1370 | 86.28 | 41.80 | 0.42 |
Type of marital union
  Monogamy/Never married | 3357 | 467 | 3824 | 87.79 | 42.38 | 0.24 | 0.004
  Polygamy | 1575 | 285 | 1860 | 84.68 | 44.42 | 0.41 |
Ethnicity
  Hausa | 1506 | 263 | 1769 | 85.13 | 44.54 | 0.42 | <.0001
  Igbo | 616 | 69 | 685 | 89.93 | 32.68 | 0.39 |
  Yoruba | 500 | 36 | 536 | 93.28 | 44.94 | 0.51 |
  Fulani | 406 | 86 | 492 | 82.52 | 30.91 | 0.53 |
  Others | 1904 | 298 | 2202 | 86.47 | 41.81 | 0.33 |
Religion
  Christian | 1912 | 239 | 2151 | 88.89 | 42.91 | 0.31 | <.0001
  Muslim | 2928 | 487 | 3415 | 85.74 | 44.77 | 0.30 |
  Traditionalist or Others/missing | 92 | 26 | 118 | 77.97 | 37.75 | 1.85 |
Media exposure
  No media exposure | 1963 | 340 | 2303 | 85.24 | 41.28 | 0.34 | 0.007
  Exposed to at least one source | 2969 | 412 | 3381 | 87.81 | 45.83 | 0.28 |
Decision making index
  No decision | 1875 | 321 | 2196 | 85.38 | 41.22 | 0.35 | 0.0047
  At least one decision | 3057 | 431 | 3488 | 87.64 | 45.82 | 0.28 |
Problem getting medical help
  No problem | 467 | 95 | 562 | 83.1 | 30.89 | 0.51 | 0.0044
  At least one problem | 4465 | 657 | 5122 | 87.17 | 45.52 | 0.24 |
Partner's occupation
  No work/No partner | 114 | 11 | 125 | 91.2 | 33.20 | 0.89 | 0.1553
  White collar job | 1893 | 268 | 2161 | 87.6 | 42.35 | 0.32 |
  Agric/Other | 2925 | 473 | 3398 | 86.08 | 44.99 | 0.30 |
Partner's highest educational level
  No education/Not married/Missing | 2006 | 390 | 2396 | 83.72 | 43.80 | 0.38 | <.0001
  Primary | 1183 | 190 | 1373 | 86.16 | 41.72 | 0.43 |
  Secondary plus | 1743 | 172 | 1915 | 91.02 | 43.87 | 0.30 |

Looking at the mother-level variables, Table 4 shows that about half of the children were born to mothers with no education, about one third to mothers who did not work and one third to mothers in polygamous marriages. Most of the children had a Muslim background, and about 53% belonged to the three major ethnic groups (Hausa, Igbo and Yoruba). Table 4 also indicates that the mothers of most of the children were exposed to at least one media source and made at least one decision that affected their lives. The majority of the mothers, however, reported having at least one problem getting medical help. Only about 34% of the children had fathers with secondary education or higher, and the majority of the fathers were employed. The results in Table 4 also suggest that children whose mothers had secondary education, whose mothers had white-collar jobs or were in monogamous unions, those of Yoruba and Christian backgrounds, and those whose fathers had secondary education had lower percentages of deaths.

An investigation of the descriptive statistics of household variables (Table 5) reveals that the majority of the children (76%) lived in households with well or surface water as the source of drinking water. The majority of the children lived in households with pit latrine toilets and in households that used high-pollution fuels. The percentage of children surviving was lowest for children in households with well water, no toilet facility or a natural floor, as well as for those in households using high-pollution fuels.
Table 5: Descriptive statistics of household variables
Variable and category | # Alive | # Dead | Total | % Alive | Mean S(t) | SE S(t) | p-value
Source of drinking water
  Piped or tap | 799 | 96 | 895 | 89.27 | 43.27 | 0.46 | 0.0006
  Well or surface | 3726 | 615 | 4341 | 85.83 | 44.82 | 0.27 |
  Others | 407 | 41 | 448 | 90.85 | 22.24 | 0.28 |
Type of toilet facility
  Flush | 532 | 30 | 562 | 94.66 | 45.59 | 0.44 | <.0001
  Pit latrine | 3078 | 473 | 3551 | 86.68 | 45.27 | 0.29 |
  No facility or others | 1322 | 249 | 1571 | 84.15 | 40.74 | 0.42 |
Flooring materials
  Natural and rudimentary | 1949 | 392 | 2341 | 83.26 | 43.57 | 0.39 | <.0001
  Finished | 2983 | 360 | 3343 | 89.23 | 43.08 | 0.25 |
Type of cooking fuel
  Cleaner fuels | 1007 | 83 | 1090 | 92.39 | 44.44 | 0.38 | <.0001
  High pollution fuels | 3925 | 669 | 4594 | 85.44 | 44.67 | 0.26 |
Household wealth status
  Poorest | 1111 | 221 | 1332 | 83.41 | 40.41 | 0.47 | <.0001
  Poorer | 1022 | 220 | 1242 | 82.29 | 43.09 | 0.54 |
  Middle | 973 | 150 | 1123 | 86.64 | 42.01 | 0.46 |
  Richer | 971 | 102 | 1073 | 90.49 | 43.71 | 0.41 |
  Richest | 855 | 59 | 914 | 93.54 | 45.01 | 0.38 |

The descriptive statistics for community variables (Table 6) reveal that the majority of the children lived in the northern part of the country (72%) and mostly in rural areas (65%). The highest numbers of deaths were associated with communities with low scores on the community-level indexes, as can be seen in Table 6.
Table 6: Descriptive statistics of community variables
Variable and category | # Alive | # Dead | Total | % Alive | Mean S(t) | SE S(t) | p-value
Community environmental factors
  Low | 3037 | 572 | 3609 | 84.15 | 43.99 | 0.31 | <.0001
  High | 1895 | 180 | 2075 | 91.33 | 44.07 | 0.28 |
Community health service index
  Low | 2776 | 543 | 3319 | 83.64 | 43.76 | 0.32 | <.0001
  High | 2156 | 209 | 2365 | 91.16 | 43.95 | 0.27 |
Community child deprivation index
  High | 2057 | 240 | 2297 | 89.55 | 43.14 | 0.30 | <.0001
  Low | 2875 | 512 | 3387 | 84.88 | 44.45 | 0.31 |
Community maternal socioeconomic index
  Low | 2857 | 533 | 3390 | 84.28 | 44.05 | 0.32 | <.0001
  High | 2075 | 219 | 2294 | 90.45 | 43.65 | 0.28 |
Malaria prevalence
  Low (0-35%, reference category) | 786 | 132 | 918 | 85.62 | 41.49 | 0.53 | 0.5091
  Medium (36-60%) | 2429 | 358 | 2787 | 87.15 | 42.14 | 0.29 |
  High endemicity (>60%) | 1717 | 262 | 1979 | 86.76 | 45.33 | 0.38 |
Population density
  < 100 per sq km | 1705 | 301 | 2006 | 85 | 44.50 | 0.40 | 0.0059
  100+ per sq km | 3227 | 451 | 3678 | 87.74 | 42.38 | 0.25 |
Distance to roads
  < 1 km | 2311 | 336 | 2647 | 87.31 | 42.25 | 0.29 | 0.22
  1+ km | 2621 | 416 | 3037 | 86.3 | 45.05 | 0.32 |
Coastal proximity
  < 500 km | 2265 | 282 | 2547 | 88.93 | 42.88 | 0.29 | <.0001
  500+ km | 2667 | 470 | 3137 | 85.02 | 44.47 | 0.32 |
Region
  North Central | 850 | 107 | 957 | 88.82 | 42.85 | 0.47 | <.0001
  North East | 1194 | 225 | 1419 | 84.14 | 40.93 | 0.44 |
  North West | 1470 | 258 | 1728 | 85.07 | 44.43 | 0.43 |
  South East | 438 | 50 | 488 | 89.75 | 32.64 | 0.46 |
  South South | 448 | 70 | 518 | 86.49 | 31.70 | 0.49 |
  South West | 532 | 42 | 574 | 92.68 | 44.64 | 0.51 |
Type of place of residence
  Urban | 1803 | 189 | 1992 | 90.51 | 43.72 | 0.30 | <.0001
  Rural | 3129 | 563 | 3692 | 84.75 | 44.27 | 0.30 |

4.3.1 Results of the survival analysis

The results of the survival analysis using the Kaplan-Meier (K-M) method are displayed alongside the summaries in Tables 3, 4, 5 and 6. In summary, there are significant differences in the survival times of children for most of the covariates considered. The variables not showing significant differences in survival times at the 5% level are gender of child, birth order, mother's occupation, partner's occupation, malaria prevalence and distance to roads.
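The Kaplan-Meier (product-limit) estimator behind these comparisons can be sketched as follows, together with one common way of summarising a survival curve by a mean: the area under the estimated step function up to a chosen time (a restricted mean). Treating the mean survival times in Tables 3 to 6 as restricted means of this kind is an assumption of this sketch, which does not reproduce the SAS output; the six observations are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as a list of (t, S(t)) at each distinct death time.

    times: observed lifetimes in months; events: 1 = died, 0 = censored.
    Children censored at t are still counted as at risk at t.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = censored = 0
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            censored += 1 - pairs[i][1]
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve


def restricted_mean(curve, t_max):
    """Area under the Kaplan-Meier step function between 0 and t_max."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t > t_max:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (t_max - prev_t)


times = [1, 3, 3, 4, 5, 6]       # hypothetical lifetimes in months
events = [1, 1, 0, 0, 1, 0]      # 1 = died, 0 = censored
curve = kaplan_meier(times, events)
print(curve)
print(restricted_mean(curve, t_max=6))
```

For the NDHS data the natural upper limit would be 60 months, the end of the under-five observation window.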
Figure 5: Kaplan-Meier survival curves for community level covariates: (a) Community Environment Index; (b) Community Health Service Index; (c) Community Child Deprivation Index; (d) Community Maternal Socioeconomic Index

With special focus on the community-level variables generated from factor analysis, the survival curves exhibited significant differences (Figures 5a-5d). Children in communities with high community environment scores had higher survival chances (Figure 5a). Living in communities with a high community health service index was also associated with higher survival probabilities (Figure 5b). A low community child deprivation score was significantly associated with a greater survival probability (Figure 5c), and children in communities with high maternal socioeconomic scores had better chances of survival than those in communities with low scores (Figure 5d).

4.3.2 Investigation of clustering of deaths

This section details the results of descriptive analyses conducted to establish whether deaths are correlated among children belonging to the same household, community and state. The tables were derived using the approach in Sastry (1997a and b). Table 7 shows the distribution of children and deaths per household from the 2003 NDHS. There were 3215 households in the sample. A total of 752 deaths occurred in 635 families, while 2580 families never experienced a child death.
Table 7: Distribution of births and deaths in households
Children in household | Families with 0 deaths | 1 death | 2 deaths | 3 deaths | 4 deaths | 5 deaths | # families | % families | # children | % children | # deaths | % deaths | # deaths / # children | (% deaths / % children) × 100
1 | 1342 | 123 | | | | | 1465 | 45.6 | 1465 | 25.8 | 123 | 16.4 | 0.08 | 63.5
2 | 996 | 253 | 23 | | | | 1272 | 39.6 | 2544 | 44.8 | 299 | 39.8 | 0.12 | 88.8
3 | 170 | 110 | 32 | 3 | | | 315 | 9.8 | 945 | 16.6 | 183 | 24.3 | 0.19 | 146.4
4 | 57 | 32 | 14 | 5 | 1 | | 109 | 3.4 | 436 | 7.7 | 79 | 10.5 | 0.18 | 137
5 | 12 | 14 | 7 | 1 | 2 | | 36 | 1.1 | 180 | 3.2 | 39 | 5.2 | 0.22 | 163.8
6 | 3 | 7 | 3 | | | | 13 | 0.4 | 78 | 1.4 | 13 | 1.7 | 0.17 | 126
7 | | | 1 | 1 | | 2 | 4 | 0.1 | 28 | 0.5 | 15 | 2 | 0.54 | 404.9
8 | | 1 | | | | | 1 | 0.0 | 8 | 0.1 | 1 | 0.1 | 0.13 | 94.5
# families | 2580 | 540 | 80 | 10 | 3 | 2 | 3215 | | 5684 | 100 | 752 | 100 | 0.13 |
# deaths | 0 | 540 | 160 | 30 | 12 | 10 | 752
% deaths | 0 | 71.81 | 21.28 | 3.99 | 1.6 | 1.33 | 100

The number of children per household ranges from 1 to 8, with an average of 1.77 children per household. About 54% of the households have two or more children, and these children make up about 74% of all children. Slightly over 28% of the deaths occurred in the 3% of households with two or more child deaths. Additionally, less than 1% of the households contributed three or more deaths; together they account for about 7% of the deaths (Table 7). Table 7 also shows that 46% of households have only one child; these households contain 25.8% of the children and account for 16.4% of the deaths, giving a ratio of 0.63. In contrast, the remaining 74.2% of children (who live in the 54% of households with two or more children) account for 83.6% of the deaths, giving a ratio of 1.13, about 1.8 times that in single-child households. This indicates a clustering of deaths in larger households.

Looking at the distribution of births and deaths in communities (Table 8), there were a total of 752 deaths in the 361 communities in the dataset. A total of 112 communities (31%) did not experience any deaths, while 69% of the communities experienced one or more deaths. Communities contributing two or more deaths make up 45% of the communities in the sample.
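The household ratios discussed above can be recomputed directly from the row totals of Table 7 (numbers of children and deaths by household size). The sketch below reproduces the last column of the table, (% deaths / % children) × 100, and then compares single-child households with all households that have two or more children, using the share of children (not of households) in both denominators so that the two ratios are comparable.

```python
# (number of children, number of deaths) by number of children in household, from Table 7
table7 = {1: (1465, 123), 2: (2544, 299), 3: (945, 183), 4: (436, 79),
          5: (180, 39), 6: (78, 13), 7: (28, 15), 8: (8, 1)}

total_children = sum(c for c, _ in table7.values())   # 5684
total_deaths = sum(d for _, d in table7.values())     # 752


def clustering_index(children, deaths):
    """(% of deaths / % of children) * 100; values above 100 mean excess deaths."""
    return 100 * (deaths / total_deaths) / (children / total_children)


for size, (c, d) in table7.items():
    print(size, round(d / c, 2), round(clustering_index(c, d), 1))

single = table7[1]
multi = (total_children - single[0], total_deaths - single[1])
r_single = clustering_index(*single) / 100
r_multi = clustering_index(*multi) / 100
print(round(r_single, 2), round(r_multi, 2), round(r_multi / r_single, 2))
```

The same ratio, computed as deaths per child (about 0.08 versus 0.15), leads to the same conclusion, since both comparisons differ only by the overall death rate.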
Table 8: Distribution of births and deaths in communities
Children in community | Communities with 0 deaths | 1 death | 2 deaths | 3 deaths | 4 deaths | 5 deaths | 6 deaths | 7-14 deaths | # of communities | # children | % children | # dead | % dead | % of communities
1-10 | 78 | 46 | 10 | 2 | 0 | 0 | 0 | 0 | 136 | 886 | 16 | 72 | 10 | 38
11-20 | 31 | 35 | 33 | 9 | 11 | 2 | 2 | 1 | 124 | 1898 | 33 | 201 | 27 | 34
21-46 | 3 | 7 | 14 | 16 | 17 | 7 | 12 | 25 | 101 | 2900 | 51 | 479 | 64 | 28
# of communities | 112 | 88 | 57 | 27 | 28 | 9 | 14 | 26 | 361 | 5684 | 100 | 752 | 100 | 100
% of communities | 31 | 24 | 16 | 7 | 8 | 2 | 4 | 7 | 100
# dead | 0 | 88 | 114 | 81 | 112 | 45 | 84 | 228 | 752
% dead | 0 | 12 | 15 | 11 | 15 | 6 | 11 | 30 | 100
# children | 1031 | 993 | 944 | 581 | 658 | 224 | 397 | 856 | 5684
% children | 18 | 17 | 17 | 10 | 12 | 4 | 7 | 15 | 100

The results of the exploratory spatial data analysis (ESDA) are displayed in Figures 6a and 6b. The nearest-neighbour criterion was used to create the weight matrix for this analysis; a matrix based on the ten nearest neighbours, which also considered all lower numbers of neighbours, was used. The global Moran's I statistic computed for the whole study area was 0.1890 (Figure 6a). This indicates a weak positive spatial autocorrelation in U5M rates across the states of the country as a whole, suggesting that child mortality rates are not randomly distributed in space. The LISA cluster map (Figure 6b) reveals that the hot spots for child mortality rates (areas of high mortality surrounded by areas of similarly high mortality) are mostly found in the northern states (areas in red). Significant cold spots (areas of low mortality surrounded by areas of similarly low mortality) are concentrated in the south-western part of the country (states coloured in blue). The majority of the states show no spatial clustering (white areas). The map, however, reveals that Kano, Plateau and Gombe states are spatial outliers among the northern states: they are states of low mortality surrounded by high-mortality states.
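The global Moran's I statistic is $I = \frac{n}{S_0} \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ are the deviations of the state rates from their mean, $w_{ij}$ are the spatial weights and $S_0 = \sum_i \sum_j w_{ij}$. The sketch below computes it with row-standardised k-nearest-neighbour weights; the six points on a line and their rates are hypothetical, and GeoDa's implementation (including its permutation-based inference) is not reproduced. Values above the null expectation $-1/(n-1)$ indicate positive spatial autocorrelation.

```python
import math


def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour weights: w_ij = 1/k for the k nearest j."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            w[i][j] = 1.0 / k
    return w


def morans_i(values, w):
    """Global Moran's I for a list of values and a weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


coords = [(float(x), 0.0) for x in range(6)]     # hypothetical locations
w = knn_weights(coords, k=2)
smooth = [1, 2, 3, 4, 5, 6]        # similar values next to each other
alternating = [1, 6, 1, 6, 1, 6]   # dissimilar neighbours
print(round(morans_i(smooth, w), 3), round(morans_i(alternating, w), 3))  # 0.571 -0.667
```

Local (LISA) statistics decompose the same numerator state by state, which is what the cluster map classifies into hot spots, cold spots and outliers.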
Figure 6: Results from spatial autocorrelation analysis for U5M: (a) Moran scatter plot; (b) LISA cluster map

In summary, the results of this descriptive investigation suggest that clustering of child mortality exists at the household, community and state levels: many units at each level (for example, 80% of households and 31% of communities) did not experience any child deaths, while relatively few units account for a disproportionate share of the deaths in the sample. Clustering therefore has to be taken into account in the multivariate analysis by including frailty effects at the relevant levels.

4.4 Multivariate analysis

To implement the discrete-time survival model described in Chapter 3, the data were restructured from a child-level dataset (in which each child contributes one record) to a child-period dataset (see Table 9 below). In the child-period data, each child contributes one observation for each month from birth until death or censoring. For each child-month, the dependent variable (survival status) is coded 1 if the child died during that month and 0 otherwise. For example, in Table 9, child 001, who was still alive after 4 months, contributes 4 records all coded 0, while child 002, who died in month 3, contributes 3 records, of which only the last is coded 1. This resulted in a total of 142913 child-month observations from the 5684 child-level records.
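The restructuring described above can be sketched as follows; it reproduces the two-child example in Table 9. The dictionary field names are hypothetical, and in a full analysis every covariate (including time-varying ones, such as breastfeeding status in each month) would be carried into the child-month records in the same way.

```python
def to_child_period(children):
    """Expand child-level records into one record per child-month.

    Each child contributes months 1..duration; the survival status is 1 only
    in the final month of a child who died, and 0 in every other month.
    """
    rows = []
    for child in children:
        for month in range(1, child["duration"] + 1):
            died_now = child["status"] == 1 and month == child["duration"]
            rows.append({"child_id": child["child_id"], "month": month,
                         "status": int(died_now), "gender": child["gender"]})
    return rows


# Child-level records from Table 9 (status 1 = died, 0 = alive/censored)
children = [
    {"child_id": "001", "duration": 4, "status": 0, "gender": 1},
    {"child_id": "002", "duration": 3, "status": 1, "gender": 2},
]
for row in to_child_period(children):
    print(row["child_id"], row["month"], row["status"], row["gender"])
```

The expanded records can then be analysed with any binary-response software, since each child-month contributes one Bernoulli observation of $y_{it}$.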
Table 9: Creation of child-period dataset from the original child-level dataset

Child-level dataset
Child ID | Duration (months) | Survival status | Gender | {Other variables ...}
001 | 4 | 0 | 1 |
002 | 3 | 1 | 2 |

Child-period dataset (illustration of a discrete-time dataset)
Child ID | Discrete time (month) | Survival status | Gender | {Other variables ...}
001 | 1 | 0 | 1 |
001 | 2 | 0 | 1 |
001 | 3 | 0 | 1 |
001 | 4 | 0 | 1 |
002 | 1 | 0 | 2 |
002 | 2 | 0 | 2 |
002 | 3 | 1 | 2 |

4.4.1 Modelling Strategy and Model Comparison Approach

To study the determinants of child mortality and the extent of heterogeneity in mortality risk, several geo-additive survival models are estimated and compared. The models differ with respect to variable composition, treatment of covariates (fixed or random) and inclusion of frailty terms (see Table 10).

Table 10: Models considered
1. Child + Mother + HH variables
2. Model 1 + HH (random)
3. Model 2 + Community level variables
4. Model 3 + Community (random)
5. Model 4 + State (random)
6. Model 4 + State (spatial)
7. Model 4 + State (random) + State (spatial)
7b. Similar to Model 7 but with a non-linear effect of mother's age at birth of child

Model 1, the simplest model, consists only of covariate effects at the child, mother and household levels. It is the type of model typically considered in child mortality studies and does not include any random effects. Model 1 is then progressively expanded to include covariates and frailty effects at other levels. The full model (Model 7) comprises covariates at the child, mother, household and community levels as well as frailty effects at the household, community and state levels. In addition, Model 7 splits the state-level frailty effect into two components in order to determine how much of the state-level variation is spatially structured and how much is unstructured. All models were estimated using BayesX version 2.0 (Brezger et al., 2005).
For each model, 12,000 iterations were carried out; the first 2,000 samples were discarded as burn-in (see footnote 11) and every 10th sample thereafter was saved for parameter estimation. All models assumed a non-linear effect of child's age and a time-varying effect of breastfeeding, both modelled via P-splines, and fixed effects for all other covariates. An additional model (7b) was also considered. This model is similar to Model 7 except that mother's age is treated as continuous and entered as a non-linear effect, in order to assess the bias that may arise from modelling it as a categorical fixed effect. Only main effects are considered for all covariates; interaction effects are not considered because of the number of covariates involved. Posterior means, standard deviations and quantiles are used to summarize the estimates for all models, and 95% credible intervals (CI) are used to assess the significance of parameters.

11 Convergence was monitored through autocorrelation functions and trace plots, which are part of the output of the BayesX software; the plots showed evidence of good mixing and only minor autocorrelation.

The DIC described in Chapter 3 was used to compare all the models and to explore the effect of adding covariates and frailty terms to Model 1. The results for model fit and variance components are summarized in Table 11. Model 6 had the lowest DIC value and is thus the best model. Model 3, which incorporated child, mother, household and community-level variables as well as a household random effect, had the second lowest DIC. Looking at the DIC differences of the other models relative to Model 6, models 2, 3, 4 and 7 can be only weakly differentiated from the best model, with DIC differences of roughly 3 to 7, while models 1, 5 and 7b are not supported (DIC differences above 10, i.e. strong evidence in favour of Model 6, which has the smaller DIC).
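The model comparison figures in Table 11 can be checked with a few lines of arithmetic. The reported DIC values equal Deviance + 2 pD (up to rounding), which suggests that the Deviance column is the deviance evaluated at the posterior mean, so that DIC = mean deviance + pD; treat that reading as an inference from the numbers rather than a statement from the BayesX documentation. The second part reproduces the structured-spatial shares of 0.69 and 0.71 quoted below, assuming they were computed as a ratio of posterior means.

```python
# Values reported in Table 11: (Deviance, pD) for each model
table11 = {
    "1": (6023.45, 52.82), "2": (5656.06, 217.74), "3": (5578.84, 254.66),
    "4": (5588.29, 251.87), "5": (5632.76, 231.72), "6": (5569.56, 257.86),
    "7": (5619.01, 236.04), "7b": (5641.11, 229.10),
}

# DIC = Deviance + 2 * pD reproduces the reported DIC column (up to rounding)
dic = {m: dev + 2 * pd for m, (dev, pd) in table11.items()}
best = min(dic, key=dic.get)
ranked = sorted(dic, key=dic.get)
for rank, m in enumerate(ranked, start=1):
    delta = dic[m] - dic[best]
    print(f"Model {m:>2}: DIC = {dic[m]:8.2f}  dDIC = {delta:6.2f}  rank {rank}")

# Share of state-level variance due to the structured (spatial) effect,
# using posterior means of (spatial, unstructured) variances from Table 11
state_var = {"7": (0.0241, 0.0108), "7b": (0.0304, 0.0125)}
shares = {m: s / (s + u) for m, (s, u) in state_var.items()}
for m, share in shares.items():
    print(f"Model {m}: structured share = {share:.2f}")
```

A ratio of posterior means is not the same as the posterior mean of the ratio; computing the share draw by draw from the MCMC output would give the latter.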
Adding random effects and community-level variables to Model 1 led to increased model complexity (larger pD) but also to a substantial improvement in DIC, suggesting that contextual and frailty effects are important. Although Model 7, which incorporated both structured and unstructured state effects, had a good fit, the proportion of the total state-level variance attributable to the structured spatial effect was 0.69 for Model 7 and 0.71 for Model 7b. This indicates that most of the state-level variability is spatially structured, which further supports Model 6 as the preferred model. Finally, Model 7b, a variant of Model 7, shows a higher DIC value than Model 6, which supports including mother's age at birth as a categorical variable.

Table 11: Results from Models 1-7b: model fit and variance components of random and non-linear effects

Estimation results for the DIC
Model   Deviance   pD       DIC       ΔDIC*   Rank
1       6023.45    52.82    6129.09   43.81   8
2       5656.06    217.74   6091.54   6.27    4
3       5578.84    254.66   6088.17   2.89    2
4       5588.29    251.87   6092.03   6.75    5
5       5632.76    231.72   6096.19   10.92   6
6       5569.56    257.86   6085.28   0.00    1
7       5619.01    236.04   6091.10   5.82    3
7b      5641.11    229.10   6099.31   14.04   7

Variance components of random effects** (a dash means the term is not in the model)
Model   Household                Community                State (random)           State (spatial)
1       -                        -                        -                        -
2       0.4689 (0.1119-0.8674)   -                        -                        -
3       0.5857 (0.3273-0.9513)   -                        -                        -
4       0.5302 (0.2298-0.9753)   0.043 (0.0005-0.1875)    -                        -
5       0.4538 (0.1572-0.7993)   0.0409 (0.0006-0.1874)   0.0105 (0.0006-0.0484)   -
6       0.5447 (0.2621-1.1477)   0.0677 (0.0019-0.2003)   -                        0.0355 (0.0009-0.1842)
7       0.4636 (0.184-0.8011)    0.054 (0.0007-0.1931)    0.0108 (0.0005-0.049)    0.0241 (0.0005-0.1451)
7b      0.4283 (0.1042-0.811)    0.0457 (0.0009-0.1709)   0.0125 (0.0005-0.0575)   0.0304 (0.0007-0.1567)

Variance components of non-linear effects**
Model   Age of child               Breastfeeding            Mother's age at birth
1       16.9249 (8.9219-31.1239)   1.1889 (0.0646-6.4279)   -
2       19.5834 (9.2194-40.37)     0.5653 (0.0308-2.6391)   -
3       15.9612 (8.3957-28.7482)   1.0291 (0.0711-5.1032)   -
4       16.2055 (8.5326-30.3106)   0.8754 (0.066-3.7927)    -
5       16.1456 (8.5572-30.3934)   0.8844 (0.0585-3.8003)   -
6       16.0995 (8.4066-30.0385)   0.8041 (0.0645-3.5284)   -
7       15.9213 (8.2426-30.3994)   0.9761 (0.0641-4.6433)   -
7b      15.8296 (8.4734-28.5438)   0.7774 (0.059-3.5333)    0.009 (0.0007-0.0427)

* ΔDIC: difference between the DIC of each model and that of the best model (Model 6)
** Posterior mean, with 95% CI in parentheses

4.4.2 Sensitivity analysis

The performance of models in a Bayesian framework can be sensitive to the choice of priors for the variance components, particularly with small sample sizes (Gelman, 2006). Although results tend to be insensitive to the choice of the hyperparameters a and b for moderate to large datasets, a sensitivity analysis is recommended to check how the models change in response to changes in the hyperparameters (Hennerfeind et al. 2006). The sensitivity analysis was carried out with the same set of covariates as in Model 6 and involved changing the prior distributions for the variance components using the following values: (a = 1, b = 0.005), an almost diffuse prior; (a = 1, b = 0.00005); and (a = 0.00005, b = 0.00005). These values reflect different degrees of uncertainty about the variance components; details of the hyperparameters are provided in Section 3.2.1.6.

Table 12: Sensitivity to the choice of hyperparameter values for Model 6
Hyperparameters      a=0.001, b=0.001*           a=1, b=0.005                a=1, b=0.00005              a=0.00005, b=0.00005
Model fit
Deviance             5569.56                     5720.87                     6006.45                     5652.47
pD                   257.86                      197.09                      70.21                       224.07
DIC                  6085.28                     6115.05                     6146.88                     6100.61
Random effects**
Household            0.54466 (0.26211-1.14767)   0.34531 (0.08037-0.79865)   0.01308 (0.00002-0.07451)   0.43002 (0.09526-0.82882)
Community            0.0677 (0.00185-0.2003)     0.02399 (0.00185-0.11718)   0.00016 (0.00002-0.00116)   0.03857 (0.00006-0.16322)
State-unstructured   -                           -                           -                           -
State-structured     0.03551 (0.00087-0.1842)    0.0116 (0.00135-0.05276)    0.00023 (0.00001-0.00122)   0.02362 (0.00003-0.18899)
* Default values
** Variance components: posterior mean, with 95% CI in parentheses

As can be seen in Table 12, the choice of hyperparameters does affect the estimates.
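To see how a and b enter the estimation, consider the inverse gamma IG(a, b) hyperprior placed on a variance component (BayesX's default being a = b = 0.001, as in Table 12). Assuming i.i.d. Gaussian random effects, the conjugate Gibbs update for the variance is IG(a + n/2, b + Σγ²/2), so the hyperparameters matter most when the number of groups n is small or the current effects are close to zero. The sketch below uses a hypothetical set of 10 near-zero effects; it is meant to show the mechanism, not to reproduce Table 12, where the effects and the variance are updated jointly and differences can compound.

```python
import random


def full_conditional_params(a, b, effects):
    """Shape and scale of the inverse gamma full conditional of an i.i.d. effect variance.

    Assumes gamma_j ~ N(0, tau2) independently and tau2 ~ IG(a, b), so that
    tau2 | gamma ~ IG(a + n/2, b + sum(gamma_j**2)/2).
    """
    n = len(effects)
    ss = sum(g * g for g in effects)
    return a + n / 2.0, b + ss / 2.0


def ig_mean(shape, scale):
    """Mean of IG(shape, scale); finite only when shape > 1."""
    return scale / (shape - 1.0)


def draw_ig(shape, scale, rng):
    """One IG(shape, scale) draw: the reciprocal of a Gamma(shape, rate=scale) draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


# Hypothetical: 10 groups whose current effects are all close to zero (+/- 0.01)
effects = [0.01 if j % 2 == 0 else -0.01 for j in range(10)]

settings = [(0.001, 0.001), (1.0, 0.005), (1.0, 0.00005), (0.00005, 0.00005)]
means = {}
for a, b in settings:
    shape, scale = full_conditional_params(a, b, effects)
    means[(a, b)] = ig_mean(shape, scale)
    print(f"a={a:g}, b={b:g}: prior mode={b / (a + 1):.2e}, "
          f"full-conditional mean={means[(a, b)]:.2e}")

# Monte Carlo check of the closed-form mean for one setting
rng = random.Random(42)
shape, scale = full_conditional_params(1.0, 0.005, effects)
draws = [draw_ig(shape, scale, rng) for _ in range(20000)]
print(f"Monte Carlo mean (a=1, b=0.005): {sum(draws) / len(draws):.2e}")
```

With a = 1, lowering b from 0.005 to 0.00005 reduces the full-conditional mean by an order of magnitude here, which is in line with the smaller variance components reported for that setting.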
The benchmark model (Model 6), which used the priors a = b = 0.001, had the lowest DIC and can be considered the best model. Decreasing the value of b while keeping a = 1 resulted in smaller estimated variance components. Because its DIC was lowest, the choice a = b = 0.001 is considered appropriate for the current exercise.

4.4.3 Interpretation of categorical covariates (fixed effects)

The discussion from this point on focuses on the results of Model 6, the best model according to the DIC criterion; comparisons are drawn with other models where necessary. The parameter estimates obtained from the models are shown in Tables 13 to 16. Statistical significance of the effects was assessed at the 0.05 level by checking whether the 95% CI of the posterior distribution contained zero: an effect is significant, and marked with an asterisk (*), if its 95% CI does not include zero. In general, a positive effect implies a higher risk of mortality for children in that group relative to the reference category. As Tables 13 to 16 show, the fixed-effect coefficients are generally of similar magnitude and have the same signs across the models, and broadly the same set of covariates is statistically significant across the models. A close look at the posterior estimates for the child-level effects in Table 13 reveals that mortality is significantly higher for children with preceding birth intervals of up to 35 months, relative to those with no older siblings or with intervals of 36 months or more. The results for Model 6 also suggest that children with succeeding birth intervals of less than 24 months have a higher mortality risk than those with no younger siblings or with succeeding intervals of 36 months or more. Children who were small at birth have a higher chance of dying than those of average or larger size at birth.
Having a long labour at birth, as well as convulsions at birth (significant in Model 6), also significantly increases the risk of the child dying before the age of 5 years. As Table 14 shows, mother's secondary education significantly reduces child mortality. Although the effects are not statistically significant, children born to mothers whose partners are in white-collar jobs, and those whose partners have secondary education, have a lower mortality risk.

Table 13: Posterior summaries for child-level effects, Models 1-7b
Variable / category                       M1       M2       M3       M4       M5       M6       M7       M7b
Constant                                  -6.374*  -6.298*  -6.304*  -6.252*  -6.233*  -6.209*  -6.136*  -6.11*
Gender (ref: Male)
  Female                                  0.013    0.014    0.006    0.007    0.007    0.009    0.011    0.000
Birth order (ref: First to third birth)
  Fourth or higher birth                  -0.026   -0.042   -0.05    -0.047   -0.04    -0.053   -0.049   -0.039
Preceding birth interval (ref: No older siblings or 36+ months)
  Less than 24 months                     0.142*   0.15*    0.154*   0.14*    0.152    0.17*    0.162*   0.145
  24 to 35 months                         0.139*   0.154*   0.155*   0.15*    0.154*   0.156*   0.162*   0.173*
Succeeding birth interval (ref: No younger sibling or 36+ months)
  Less than 24 months                     0.575*   0.631*   0.645*   0.616*   0.633*   0.653*   0.644*   0.641*
  24 to 35 months                         -0.162*  -0.18*   -0.177*  -0.178*  -0.174*  -0.183*  -0.176*  -0.171*
Size at birth (ref: Average or larger)
  Small/very small                        0.16*    0.185*   0.209*   0.203*   0.213*   0.217*   0.211*   0.206*
Mother's age at birth (ref: 18-34 years)
  Less than 18 years                      0.076    0.064    0.053    0.022    0.041    0.044    0.043    -
  35 and older                            0.136    0.158    0.159    0.176    0.158    0.166    0.164    -
Place of delivery (ref: Home/Other/Missing)
  Health facility                         -0.14    -0.135   -0.125   -0.115   -0.111   -0.113   -0.11    -0.151
Source of prenatal care (ref: Traditional birth attendant/Other/None)
  Skilled birth attendant                 0.151    0.153    0.182    0.123    0.199    0.197    0.181    0.227
Birth assistance (ref: Traditional birth attendant/Other/None)
  Trained medical personnel               -0.034   -0.054   -0.04    -0.044   -0.056   -0.055   -0.049   -0.003
Source of antenatal care (ref: Home/Other/None)
  Health facility                         -0.23    -0.219   -0.241   -0.205   -0.256   -0.249   -0.232   -0.283
Long labour at birth (ref: No)
  Yes                                     0.215*   0.231*   0.225*   0.214*   0.237*   0.229*   0.231*   0.242*
Excessive bleeding at birth (ref: No)
  Yes                                     0.143    0.165*   0.157    0.148*   0.155*   0.166    0.167*   0.173
High fever at birth (ref: No)
  Yes                                     0.124    0.155    0.165*   0.167*   0.152    0.163    0.168    0.169
Convulsions at birth (ref: No)
  Yes                                     0.147    0.165    0.196    0.141    0.205    0.208*   0.225    0.217
Any problem at birth (ref: No problem)
  At least one problem                    -0.176   -0.197   -0.193   -0.176   -0.193   -0.195   -0.195*  -0.22*
* Significant at the 5% level (i.e. 95% CI does not include 0). Reference categories (ref) are given in parentheses. Mother's age enters Model 7b as a non-linear effect (Figure 7c).

Table 14: Posterior summaries for mother-level effects, Models 1-7b
Variable / category                       M1       M2       M3       M4       M5       M6       M7       M7b
Mother's highest educational level (ref: No education)
  Primary                                 0.137    0.156    0.181*   0.16     0.178*   0.158    0.171*   0.17
  Secondary plus                          -0.24*   -0.244*  -0.318*  -0.274   -0.293*  -0.277*  -0.281*  -0.274*
Mother's occupation (ref: White-collar job)
  No work                                 0.033    0.037    0.047    0.048    0.05     0.05     0.039    0.054
  Agriculture and others                  -0.019   -0.026   -0.029   -0.017   -0.009   -0.016   -0.012   -0.023
Type of marital union (ref: Monogamy/Never married)
  Polygamy                                0.019    0.022    0.042    0.039    0.036    0.043    0.036    0.047
Ethnicity (ref: Others)
  Hausa                                   0.096    0.115    0.012    0.048    0.016    0.01     0.018    0.025
  Igbo                                    -0.339*  -0.418*  -0.39    -0.484   -0.425   -0.43    -0.441   -0.475
  Yoruba                                  -0.13    -0.103   0.11     0.096    0.128    0.156    0.148    0.15
  Fulani                                  0.276*   0.316*   0.23     0.276    0.225    0.227    0.244    0.24
Religion (ref: Muslim)
  Christian                               0.095    0.065    0.01     -0.053   -0.004   -0.032   -0.038   -0.015
  Traditionalist or others/missing        -0.077   0.008    0.034    0.127    0.039    0.099    0.104    0.073
Media exposure (ref: Exposed to at least one source)
  No media exposure                       -0.058   -0.048   -0.042   -0.042   -0.042   -0.04    -0.04    -0.038
Decision-making index (ref: At least one decision)
  No decision                             0.032    0.042    0.051    0.043    0.045    0.045    0.048    0.046
Problem getting medical help (ref: At least one problem)
  No problem                              0.057    0.052    0.055    0.053    0.043    0.045    0.049    0.049
Partner's occupation (ref: Agriculture/Other)
  No work/No partner                      0.245    0.356    0.343    0.349    0.351    0.359    0.387    0.324
  White-collar job                        -0.055   -0.093   -0.079   -0.079   -0.081   -0.082   -0.094   -0.098
Partner's highest educational level (ref: No education/Not married/Missing)
  Primary                                 0.005    0.008    -0.005   0.006    0.0004   0.005    -0.002   -0.053
  Secondary plus                          -0.1     -0.118   -0.104   -0.103   -0.106   -0.118   -0.103   -0.085
* Significant at the 5% level (i.e. 95% CI does not include 0). Reference categories (ref) are given in parentheses.

Table 15: Posterior summaries for household-level effects, Models 1-7b
Variable / category                       M1       M2       M3       M4       M5       M6       M7       M7b
Source of drinking water (ref: Well or surface)
  Piped or tap                            0.223*   0.195    0.219    0.208    0.221    0.213    0.223    0.18
  Others                                  -0.24    -0.2     -0.231   -0.209   -0.232   -0.206   -0.22    -0.262
Type of toilet facility (ref: Pit latrine)
  Flush                                   -0.552*  -0.47*   -0.608*  -0.528*  -0.496*  -0.56*   -0.572*  -0.401*
  No facility or others                   0.319*   0.269    0.351*   0.305*   0.28*    0.312*   0.327*   0.232
Flooring materials (ref: Finished)
  Natural and rudimentary                 -0.024   -0.046   -0.023   -0.022   -0.035   -0.037   -0.033   -0.04
Type of cooking fuel (ref: High-pollution fuels)
  Cleaner fuels                           0.025    0.042    0.038    0.084    0.048    0.031    0.051    0.068
Household wealth status (ref: Poorest)
  Poorer                                  0.219*   0.243*   0.217    0.239    0.234    0.227    0.22     0.237
  Middle                                  0.052    0.062    0.042    0.075    0.047    0.039    0.058    0.077
  Richer                                  -0.336*  -0.363*  -0.358*  -0.349*  -0.349*  -0.363*  -0.352*  -0.333*
  Richest                                 -0.023   -0.091   0.021    -0.09    -0.052   -0.038   -0.048   -0.11
* Significant at the 5% level (i.e. 95% CI does not include 0). Reference categories (ref) are given in parentheses.

As Table 15 shows, compared with children who live in households with a pit latrine, those in households with flush toilets have a significantly lower mortality risk, while those in households with no toilet facilities have a significantly higher mortality risk.
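The estimation scheme described in Section 4.4.1 (12,000 iterations, the first 2,000 discarded, every 10th draw kept) and the significance rule used in Tables 13 to 16 can be sketched as follows. The chain is simulated (a hypothetical AR(1) process standing in for BayesX output), so the numbers are illustrative only; the credible interval is assumed here to be the equal-tailed interval from the 2.5% and 97.5% posterior quantiles.

```python
import random
import statistics


def thin(chain, burn_in=2000, step=10):
    """Drop the burn-in draws, then keep every `step`-th draw."""
    return chain[burn_in::step]


def quantile(sorted_draws, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1 - frac) + sorted_draws[hi] * frac


def summarize(draws, level=0.95):
    """Posterior mean, sd and equal-tailed CI; flag the effect if the CI excludes 0."""
    s = sorted(draws)
    alpha = (1 - level) / 2
    lower, upper = quantile(s, alpha), quantile(s, 1 - alpha)
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "ci": (lower, upper),
        "significant": lower > 0 or upper < 0,
    }


# Hypothetical MCMC output for one coefficient: 12,000 autocorrelated draws
rng = random.Random(1)
chain, x = [], 0.0
for _ in range(12000):
    x = -0.5 + 0.5 * (x + 0.5) + rng.gauss(0.0, 0.1)  # AR(1) centred on -0.5
    chain.append(x)

kept = thin(chain)
print(len(kept))  # 1000 draws retained
print(summarize(kept))
```

Thinning mainly reduces storage and autocorrelation between retained draws; the significance flag corresponds to the asterisks in the tables.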
The community variables generally did not yield statistically significant results; however, Table 16 suggests that living in urban areas, living in the South-western part of the country and living in communities with a high health service index are all associated with lower mortality.

Table 16: Posterior summaries for community-level effects, Models 3-7b (Models 1 and 2 do not include community-level variables)
Variable / category                       M3       M4       M5       M6       M7       M7b
Community environmental factors (ref: Low)
  High                                    -0.026   -0.023   -0.015   -0.006   -0.017   -0.011
Community health service index (ref: Low)
  High                                    -0.064   -0.056   -0.068   -0.063   -0.07    -0.062
Community child deprivation index (ref: Low)
  High                                    -0.044   -0.046   -0.04    -0.04    -0.041   -0.046
Community maternal socioeconomic index (ref: Low)
  High                                    0.123    0.111    0.119    0.11     0.117    0.115
Malaria prevalence (ref: Low, 0-35%)
  Medium (36-60%)                         0.116    0.127    0.127    0.142    0.138    0.122
  High endemicity (>60%)                  -0.002   -0.001   0.002    -0.007   -0.006   0.002
Population density (ref: 100+ per sq km)
  Less than 100 per sq km                 -0.013   -0.01    -0.018   -0.004   -0.007   -0.013
Distance to roads (ref: 1+ km)
  Less than 1 km                          -0.071   -0.063   -0.068   -0.06    -0.062   -0.07
Region (ref: North West)
  North Central                           -0.015   -0.048   -0.036   -0.05    -0.03    -0.053
  North East                              0.04     0.002    0.04     -0.027   0.013    0.015
  South East                              -0.02    0.092    0.026    0.009    0.037    0.052
  South South                             0.227    0.215    0.197    0.252    0.209    0.217
  South West                              -0.246   -0.239   -0.239   -0.19    -0.241   -0.248
Type of place of residence (ref: Rural)
  Urban                                   -0.076   -0.077   -0.087   -0.084   -0.083   -0.067
* Significant at the 5% level (i.e. 95% CI does not include 0). Reference categories (ref) are given in parentheses.

4.4.4 Interpretation of non-linear effects

The smooth effects of continuous covariates, modelled using penalized splines, are displayed in Figure 7. In general, the effect of child's age shows a high risk of death shortly after birth and an overall decline in deaths as the child grows older (Figure 7a).
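The idea behind the penalized splines can be sketched in its frequentist form (Eilers and Marx, 1996): a rich cubic B-spline basis whose neighbouring coefficients are tied together by a second-order difference penalty. In the Bayesian P-splines of BayesX the penalty corresponds to a random-walk prior on the coefficients, whose variance (the "Age of child" and "Breastfeeding" entries of Table 11) plays the role of an inverse smoothing parameter; the sketch below instead fixes the smoothing parameter `lam`. The data are simulated and the function names are hypothetical; this is not the BayesX implementation.

```python
import numpy as np


def bspline_basis(x, xmin, xmax, n_segments=20, degree=3):
    """B-spline basis with equally spaced knots via the Cox-de Boor recursion."""
    dx = (xmax - xmin) / n_segments
    knots = xmin + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree-0 basis: indicator of the knot interval containing each x
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-(d + 1)]) / (knots[None, d:-1] - knots[None, :-(d + 1)])
        right = (knots[None, d + 1:] - x[:, None]) / (knots[None, d + 1:] - knots[None, 1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B


def pspline_fit(x, y, n_segments=20, degree=3, lam=1.0, diff_order=2):
    """Penalized least squares: minimize |y - Bc|^2 + lam * |D c|^2."""
    B = bspline_basis(x, x.min(), x.max(), n_segments, degree)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)  # difference penalty matrix
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef


rng = np.random.default_rng(0)
age = np.arange(60, dtype=float)                 # months 0..59
truth = -4.0 + 2.0 * np.exp(-age / 10.0)         # hypothetical declining effect
y = truth + rng.normal(0.0, 0.1, size=age.size)  # noisy observations
fitted = pspline_fit(age, y, lam=1.0)
print("RMSE vs truth:", float(np.sqrt(np.mean((fitted - truth) ** 2))))
```

Larger values of `lam` force the coefficients towards a straight line; smaller values let the curve follow local bumps such as those produced by heaping of reported ages.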
The peaks appearing at various ages in the curve may be due to heaping of reported survival times, while the troughs may result from the much smaller number of deaths recorded between these time points. Because the effect of age is modelled flexibly, the heaping should have little effect on the estimates of the fixed-effect covariates.

Figure 7: Non-linear effects of metrical covariates: (a) effect of child's age, Model 6; (b) effect of breastfeeding, Model 6; (c) effect of mother's age, Model 7b. Posterior mean (centre line) with 95% CI (CI not shown in Figure 7a for clarity).

Turning to the effect of breastfeeding, Figure 7b shows that it reduces mortality risk at early ages, while its effect at older ages (beyond 30 months) is not significant. The effect of mother's age at birth of child is almost U-shaped, with a higher risk of death for children of younger and older women (Figure 7c).

4.4.5 Interpretation of the spatial effect

Models 6 and 7 considered the spatial effects of state of residence on child mortality. Model 6, which incorporated only the structured spatial effect, is superior in terms of DIC to Model 7, which considers both structured and unstructured spatial effects. Although the results did not show any major hot spots or cold spots of child mortality, the spatial pattern from Model 6 (Figure 8a) indicates that, once other variables have been taken into account, mortality risk tends to be higher in the North-Eastern parts of the country (Yobe, Borno and Jigawa States) and lower in the South-Western parts (Lagos, Ogun and Oyo States, amongst others). The results from the LISA cluster map (Figure 6b), as well as those in Figures 8a-d, suggest a concentration of mortality in the North-Eastern part of the country.
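Figure 6 and the LISA cluster map rest on global and local Moran's I (Moran, 1950; Anselin, 1995); the reference list cites the GeoDa user's guide (Anselin, 2003), which suggests GeoDa was used. The numpy sketch below illustrates these statistics on a hypothetical map of six areas in a row. Row-standardized contiguity weights and a 999-permutation test are common defaults but are assumptions here, and unlike a LISA cluster map the sketch labels every area by its Moran scatter-plot quadrant without testing local significance.

```python
import numpy as np


def row_standardize(W):
    """Scale each row of a contiguity matrix so that it sums to one."""
    W = np.asarray(W, dtype=float)
    return W / W.sum(axis=1, keepdims=True)


def morans_i(y, W):
    """Global Moran's I; with row-standardized weights S0 = n, so I = z'Wz / z'z."""
    z = y - y.mean()
    Wr = row_standardize(W)
    return (len(y) / Wr.sum()) * (z @ Wr @ z) / (z @ z)


def local_morans_i(y, W):
    """Local Moran's I_i = z_i * (W z)_i / m2, with m2 = z'z / n (Anselin, 1995)."""
    z = y - y.mean()
    m2 = (z @ z) / len(y)
    return z * (row_standardize(W) @ z) / m2


def lisa_quadrants(y, W):
    """Moran scatter-plot quadrant of each area: own value vs. neighbours' average."""
    z = y - y.mean()
    lag = row_standardize(W) @ z
    names = {(True, True): "High-High", (False, False): "Low-Low",
             (True, False): "High-Low", (False, True): "Low-High"}
    return [names[(bool(zi >= 0), bool(li >= 0))] for zi, li in zip(z, lag)]


def permutation_p_value(y, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation via random relabelling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, W)
    hits = sum(morans_i(rng.permutation(y), W) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Hypothetical map: six areas in a row, each bordering its immediate neighbours
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
rate = np.array([5.0, 6.0, 5.0, 1.0, 2.0, 1.0])  # e.g. deaths per 100 births

moran_i = morans_i(rate, W)
print(f"Moran's I = {moran_i:.3f}, pseudo p = {permutation_p_value(rate, W):.3f}")
print(lisa_quadrants(rate, W))
```

With row-standardized weights the local statistics sum to n times the global I, which is why a few strongly clustered areas can dominate the global measure.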
However, the overall implication of the spatial effect is that, although mortality risk exhibits spatial patterns, the spatial variation is probably explained by the covariates considered.

Figure 8: Maps of the posterior mean of spatial effects: (a) spatial frailty, Model 6; (b) non-spatial frailty, Model 7; (c) spatial frailty, Model 7; (d) total spatial effects, Model 7.

4.5 Determinants of Infant mortality

Since the risk factors associated with IM and U5M can be very different, the best-fitting model (Model 6) was also fitted separately to the IM data. In the revised dataset for the IM analysis, all deaths after 11 months were considered censored, and the resulting child-period dataset had 52,065 observations. Table 17 gives the posterior summaries for the community-level variables. As for U5M, the results are not statistically significant, but they again suggest that living in urban areas, living in the South-western part of the country and living in communities with a high health service index lower IM risk.

Table 17: Posterior summaries for community-level effects, Model 6 (IM)
Variable / category                       Model 6 (IM)
Community environmental factors (ref: Low)
  High                                    0.06
Community health service index (ref: Low)
  High                                    -0.154
Community child deprivation index (ref: Low)
  High                                    0.1
Community maternal socioeconomic index (ref: Low)
  High                                    0.028
Malaria prevalence (ref: Low, 0-35%)
  Medium (36-60%)                         0.134
  High endemicity (>60%)                  0.139
Population density (ref: 100+ per sq km)
  Less than 100 per sq km                 -0.015
Distance to roads (ref: 1+ km)
  Less than 1 km                          -0.12
Region (ref: North West)
  North Central                           0.102
  North East                              0.127
  South East                              -0.352
  South South                             0.216
  South West                              -0.291
Type of place of residence (ref: Rural)
  Urban                                   -0.022

The smooth effects of continuous covariates on IM, fitted using penalized splines, are displayed in Figure 9.
The effect of child's age shows a high risk of death shortly after birth and an overall decline in deaths as the child grows older (Figure 9a). For IM, the estimated breastfeeding effect lowers mortality risk at the earliest ages and then rises almost linearly with the child's age (Figure 9b). There is a shift in the spatial pattern of IM compared with U5M: the map in Figure 9c shows that the risk of IM tends to be higher in the southern parts of the country. This difference suggests that models of childhood mortality should take into account the different age-based definitions of childhood mortality (e.g. IM versus U5M).

Figure 9: (a) effect of child's age, IM; (b) effect of breastfeeding, IM; (c) structured spatial effect, IM. Panels a and b show the posterior mean (centre line) with 95% CI; panel c shows the spatial effect.

Chapter Five: Summary and Conclusions

5.1 Summary

The main aim of the project was to account for the influence of contextual factors and frailty on CM and to investigate the spatial patterns of CM in Nigeria. Chapter 1 outlined the problem and the demographic and statistical issues, and set out the aims and objectives of the study. A literature review was undertaken in Chapter 2, while Chapter 3 described the data used in the study, defined possible models and discussed some modelling issues. The analysis in Chapter 4 examined the effect of community-level factors on child mortality as well as the spatial patterns associated with child mortality risk in Nigeria. Kaplan-Meier (K-M) survival analysis revealed significant differences in the survival times of children across most of the covariates considered; the only variables not showing significant differences were gender of child, birth order, mother's occupation, partner's occupation, malaria prevalence, population density and distance to roads.
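The Kaplan-Meier comparisons summarized above can be illustrated with a short standard-library sketch: the product-limit estimator (Kaplan and Meier, 1958) and a two-group log-rank test (Mantel, 1966). This passage does not say which test was used to judge the differences, so the log-rank test is an assumption here, and the durations and group labels are hypothetical.

```python
import math
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) at each distinct death time.

    durations: follow-up time for each child (e.g. age in months)
    events:    1 if the child died at that time, 0 if censored
    Returns a list of (time, survival probability) pairs.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)  # deaths plus censorings at each time
    at_risk = len(durations)
    surv, curve = 1.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]  # censored children count as at risk at time t
    return curve


def log_rank(durations, events, groups):
    """Two-group log-rank chi-square statistic (1 df) and its p-value; groups are 0/1."""
    death_times = sorted({t for t, e in zip(durations, events) if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t in death_times:
        risk = [g for dur, g in zip(durations, groups) if dur >= t]
        n, n1 = len(risk), sum(risk)
        d = sum(1 for dur, e in zip(durations, events) if dur == t and e == 1)
        d1 = sum(1 for dur, e, g in zip(durations, events, groups)
                 if dur == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


# Hypothetical ages at death or censoring (months)
durations = [1, 3, 3, 4, 6, 6, 8, 10]
events = [1, 1, 0, 1, 0, 1, 0, 1]
print(kaplan_meier(durations, events))
print(log_rank(durations, events, [0, 0, 0, 1, 1, 0, 1, 1]))
```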
Results from a descriptive investigation of clustering showed that clustering of child mortality exists at the household, community and state levels, and that this needs to be taken into account in the multivariate analysis by including frailty effects at the relevant levels. All the covariates considered were included in the geo-additive survival models. A total of 8 models were evaluated, and the results in Chapter 4 revealed that most of the community-level factors considered had no significant effect on child mortality once household and individual-level factors had been taken into account. The results also suggest that including frailty terms and contextual variables at the community level leads to an improvement in the DIC values, which points to the importance of contextual and frailty effects. A higher share of the state-level variability in the data was due to the structured spatial effect. The spatial patterns were not significant, although they point to interesting patterns of variation in child mortality across the country. The analysis does, however, indicate that child and household-level factors play an important role in child mortality reduction.

5.2 Recommendations

The importance of correct model choice (particularly with respect to fixed, random and spatial components) has been demonstrated, and the quality of model fit should always be investigated before conclusions are drawn and policies formulated. The findings from this study are preliminary, but the following recommendations are made. Policy programmes should focus on educating women about the need to practise child spacing. Policy makers should develop strategies to narrow the wealth gap in the country. Service delivery should be improved overall, with more households connected to clean, affordable and reliable piped water systems. The tools used in the present analysis can also be beneficial in other ways.
For example, the mortality cold spots could be studied closely to find out why the areas exhibit different conditions from their immediate neighbours. This would help in devising targeted intervention which will be more effective in child mortality reduction. 5.3 Limitations of the Study /Suggestions for future research This study faces the following limitations: 1. Due to the cross-sectional nature of the data, the covariates may not reflect the socio-economic and ecological conditions of the child at the time of death 2. The methodology used is vulnerable to various biases due to factors such as migration. 3. The dichotomization of some community level variables may have resulted in the loss of information. Alternative specifications, such as the direct use of the component scores or the categorization of such scores into more than 2 levels are worth considering. 62 4. There are uncertainties related to the Modifiable Areal Unit Problem12 (MAUP) (Heywood, 1998). The literature suggests analysis of spatial effects at multiple levels as a means of alleviating problems related to MAUP. Therefore, to check the sensitivity the choice of geographical unit in measuring spatial effects, a geo-statistical (kriging) model with cluster as the spatial unit of analysis could be explored in addition to the lattice model (state level model), which is the main focus of this work. 12 MAUP arises when artificial units of spatial reporting (for example states) are used in reporting highly localized spatial occurrences, thereby resulting to misleading spatial patterns 63 REFERENCES Adebayo, S.B. and Fahrmeir, L. (2005). Analyzing child mortality in Nigeria with geo- additive discrete-time survival models. Statistics in Medicine, 24(5): 709-728. Adebayo, S.B., Fahrmeir, L. and Klasen, S. (2004). Analyzing Infant Mortality with Geo- additive Categorical Regression Models: A Case Study for Nigeria. Economics and Human Biology, 2(2): 229-44. Adedoyin, M., and S. Watts. (1989). 
Child Health and Child Care in Okele: an indigenous area of the city of Ilorin, Nigeria. Social Science and Medicine, 29(12): 1333- 1341. Adetunji, J.A. (1995). Infant Mortality and Mother?s Education in Ondo State, Nigeria. Social Science and Medicine, 40(2): 253-263. Adetunji, J.A. (2000). Trends in under-5 mortality rates and the HIV/AIDS epidemic. Bulletin of the World Health Organization , 78: 1200?1206. Ahonsi, B.A. (1995). Age variation in the proximate determinants of child mortality in South-west Nigeria. Journal of Biosocial Science, 27(1): 19-30. Anselin, L. (1995). Local indicators of spatial association ?LISA. Geographical Analysis, 27: 93-115. Anselin, L. (2003). GeoDa 0.9 User's Guide. Spatial Analysis Laboratory Urbana- Champaign, IL: University of Illinois. Balk D., Pullum T., Storeygard A., Greenwell F. and Neuman M. (2003). Spatial Analysis of Childhood Mortality in West Africa. Calverton, Md.: MEASURE DHS+, ORC Macro (DHS geographic studies 1) HQ 766 W31 G46 #1 Macro International. 44p Banerjee, S., Wall, M.M and Carlin, B. P. (2003). Frailty modeling for spatially correlated survival data with application to infant mortality in Minnesota. Biostatistics, 4: 123-142. Behrman, J.R., and Wolfe, B.L. (1987). How Does Mother's Schooling Affect Family Health Nutrition Medical Care Usage, and Household Sanitation? Journal of Econometrics, 36: 195?204. Berger, U., Fahrmeir, L., Klasen, S. (2002). Dynamic Modelling of Child Mortality in Developing Countries: Application for Zambia. SFB 386 Discussion Paper No. 299, University of Munich (available from http://epub.ub.uni- muenchen.de/1677/1/paper_299.pdf). 64 Bernardo, J.M. and Smith, A.F.M. (2000). Bayesian Theory. Chichester: Wiley. Besag, J., York, J. and Mollie, A. (1991). Bayesian image restoration with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1-59. Bicego, G. and Ahmad O. B. (1996). Infant and child mortality. 
Demographic and Health Surveys, Comparative Studies No. 20. Calverton, Maryland: Macro International Inc. Bolstad W.M. (2004). Introduction to Bayesian statistics, 2nd Edition. New York: Wiley. Brezger, A., Kneib, T. and Lang, S. (2005). BayesX?Software for Bayesian Inference based on Markov Chain Monte Carlo simulation Techniques. (Available from: http://www.stat.uni-muenchen.de/~bayesx/). Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information?Theoretic Approach, 2nd Edition. New York, Springer. Caldwell, J.C. (1979). Education as a Factor in Mortality Decline: An Examination of Nigeria data. Population Studies. 33(2): 395-413. Caldwell, J.C. and Caldwell, P. (1993). Women?s position and child mortality and morbidity in less developed countries. In N. Federici, K.O. Mason and S. Sogner (Eds.), Women?s position and demographic change (pp. 122-139). New York: Oxford University Press. Carlin, B.P. and Louis, T.A. (2000). Bayes and Empirical Bayes methods for data analysis, 2nd edition. New York, Chapman and Hall. Chaix, B., Merlo, J., Subramanian, S. V., Lynch, J. and Chauvin, P. (2005). Comparison of a Spatial Perspective with the Multilevel Analytical Approach in Neighborhood Studies: The Case of Mental and Behavioral Disorders due to Psychoactive Substance Use in Malmo, Sweden, 2001. American Journal of Epidemiology, 162(2): 171 - 182. Chromy, J.R. and Abeyasekera, S. (2003). Statistical analysis of survey data. In: Household Sample Surveys in Developing and Transition Countries. New York: United Nations Publication ST/ESA/STAT/SER.F/96, 2005, Chapter XIX, 388-417. Cleland J.G. and van Ginneken, J.K. (1988). Maternal education and child survival in developing countries: the search for pathways of influence. Social Science and Medicine, 27(12): 1357-1368. Cliff A. and Ord J.K. (1981). Spatial Processes, Models and Applications. London: Pion. 65 Congdon, P. (2003). Applied Bayesian Modeling. 
Wiley Series in Probability and Statistics. West Sussex, England: Wiley.
Crook, A., Knorr-Held, L. and Hemingway, H. (2003). Measuring spatial effects in time to event data: a case study using months from angiography to coronary artery bypass graft. Statistics in Medicine, 22: 2943-2961.
Cox, D. R. (1972). Regression models and life tables. Journal of the Royal Statistical Society, Series B, 34: 187-220.
Curtis, S. L. (1995). Assessment of the quality of data used for direct estimation of infant and child mortality in DHS-II Surveys. Occasional Papers No. 3. Calverton, MD: Macro International Inc.
Curtis, S. L., Diamond, I. and McDonald, J. W. (1993). Birth interval and family effects on postneonatal mortality in Brazil. Demography, 30(1): 33-43.
Curtis, S. L. and Hossein, M. (1998). The Effect of Aridity Zone on Child Nutritional Status. West Africa Spatial Analysis Prototype Exploratory Analysis. Calverton, Maryland: Macro International Inc.
Curtis, S. L. and Steele, F. (1996). Variations in familial neonatal mortality risks in four countries. Journal of Biosocial Science, 28: 141-159.
D'Souza, S. and Chen, L. C. (1980). Sex differentials in mortality in Bangladesh. Population and Development Review, 6: 257-270.
Das Gupta, M. (1987). Selective discrimination against female children in rural Punjab, India. Population and Development Review, 13: 77-100.
Desai, S. and Alva, S. (1998). Maternal Education and Child Health: Is There a Strong Causal Relationship? Demography, 35: 71-81.
DFID (2000). Nigeria: health briefing paper. DFID HSRC, London. Available at: http://www.dfidhealthrc.org/shared/publications/Country_health/Nigeria.pdf
Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11: 89-121.
ESRI (Environmental Systems Research Institute, Inc.) (2002). ArcView GIS Version 3.3. Redlands, CA.
Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical Modelling based on Generalized Linear Models. New York: Springer.
Feyisetan, B. J., Asa, S. and Ebigbola, J. A. (1997). Timing of birth and infant mortality in Nigeria. Genus, 53(3-4): 157-181.
Gemperli, A., Vounatsou, P., Kleinschmidt, I., Bagayoko, M., Lengeler, C. and Smith, T. (2004). Spatial patterns of infant mortality in Mali: the effect of malaria endemicity. American Journal of Epidemiology, 159: 64-72.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1: 515-534.
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian data analysis. New York: Chapman and Hall.
Goldstein, H. (1995). Multilevel statistical models. 2nd Edition. New York: Halsted Press.
Gregson, S., Zhuwau, T., Anderson, R. M. and Chandiwana, S. K. (1999). Apostles and Zionists: The influence of religion on demographic change in rural Zimbabwe. Population Studies, 53(2): 179-193.
Guo, G. (1993). Use of sibling data to estimate family mortality effects in Guatemala. Demography, 30(1): 15-32.
Guo, G. and Rodríguez, G. (1992). Estimating a multivariate proportional hazards model for clustered data using the EM algorithm, with an application to child survival in Guatemala. Journal of the American Statistical Association, 87(420): 969-976.
Hennerfeind, A., Brezger, A. and Fahrmeir, L. (2006). Geo-additive Survival Models. Journal of the American Statistical Association, 101(475): 1059-1064.
Heywood (1998). Introduction to Geographical Information Systems. New York: Addison Wesley Longman.
Hill, K. and Pebley, A. (1989). Child mortality in the developing world. Population and Development Review, 15(4): 657-683.
Hill, K., Bicego, G. and Mahy, M. (2001). Child Mortality in Kenya: An examination of Trends and Determinants from the late 1980s to the mid-1990s. Hopkins Population Center Working Paper.
Hobcraft, J. N., McDonald, J. W. and Rutstein, S. O. (1984). Socio-economic factors in infant and child mortality: a cross-national comparison. Population Studies, 38: 193-223.
Hobcraft, J. N., McDonald, J. W. and Rutstein, S. O. (1985). Demographic Determinants of Infant and Early Child Mortality: A Comparative Analysis. Population Studies, 39: 363-385.
http://www.childinfo.org (accessed 17 July 2005).
Iyun, B. F. (1992). Women's status and Childhood Mortality in two Contrasting Areas in South-western Nigeria: a Preliminary Analysis. GeoJournal, 26(1): 43-52.
Kandala, N. B., Magadi, M. A. and Madise, N. J. (2004). An Investigation of District Spatial Variations of Childhood Diarrhoea and Fever Morbidity in Malawi. S3RI Applications and Policy Working Papers, A04/14, Southampton University (available from: http://eprints.soton.ac.uk/12463/).
Kandala, N. B., Fahrmeir, L. and Klasen, S. (2002). Geo-additive models of Childhood Undernutrition in three Sub-Saharan African Countries. SFB 386 Discussion Paper No. 287, University of Munich (available from http://www.stat.uni-muenchen.de/sfb386/).
Kaplan, E. L. and Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53: 457-481.
Klein, J. P. and Moeschberger, M. L. (1997). Survival analysis: techniques for censored and truncated data. Springer.
Kneib, T. (2005). Geo-additive hazard regression for interval censored survival times. SFB 386 Discussion Paper 447, University of Munich (available from http://www.stat.uni-muenchen.de/sfb386/).
Kravdal, O. (2004). Child Mortality in India: the Community-Level Effect of Education. Population Studies, 58: 177-192.
Kuate-Defo, B. and Diallo, K. (2002). Geography of child mortality clustering within African families. Health and Place, 8: 93-117.
Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13: 183-212.
Lawoyin, T. O. (2001). Risk factors for infant mortality in a rural community in Nigeria. Journal of the Royal Society for the Promotion of Health, 121(2): 114-118.
Lee, E. T. (1980). Statistical methods for survival data analysis. Lifetime Learning Publications.
Leonard, T. and Hsu, J. S. J. (1999). Bayesian methods: an analysis for statisticians and interdisciplinary researchers. Cambridge and New York: Cambridge University Press.
Madise, N. and Diamond, I. (1995). Determinants of infant mortality in Malawi: An analysis to control for death clustering within families. Journal of Biosocial Science, 27(1): 95-106.
Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50(3): 163-170.
Masuy-Stroobant, G. (2002). The determinants of infant mortality: how far are conceptual frameworks really modelled? In: Franck, R. (ed.), The Explanatory Power of Models: Bridging the Gap between Empirical and Theoretical Research in the Social Sciences. Boston: Kluwer Academic Publishers.
Masuy-Stroobant, G. and Gourbin, C. (1995). Infant health and mortality indicators: their accuracy for monitoring the socio-economic development in the Europe of 1994. European Journal of Population, 11(1): 63-84.
Miaou, S.-P., Song, J. and Mallick, B. (2003). Roadway Traffic Crash Mapping: A Space-Time Modeling Approach. Journal of Transportation and Statistics, 6: 33-58.
Montgomery, M. R. and Cohen, B. (1998). From death to birth: mortality decline and reproductive change. Washington, D.C.: National Academy Press.
Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37: 17-23.
Mosley, W. H. and Chen, L. C. (1984). An analytical framework for the study of child survival in developing countries. In: Mosley, W. H. and Chen, L. C. (eds.), Child survival: Strategies for research. New York: Population Council, pp. 25-45.
National Population Commission [Nigeria] (1991). Nigeria Demographic and Health Survey 1990. Calverton, Maryland: National Population Commission and ORC Macro.
National Population Commission [Nigeria] (1998). 1991 population census of the Federal Republic of Nigeria: Analytical report at the national level. Lagos: National Population Commission.
National Population Commission [Nigeria] (2000). Nigeria Demographic and Health Survey 1999. Calverton, Maryland: National Population Commission and ORC Macro.
National Population Commission [Nigeria] and ORC Macro (2004). Nigeria Demographic and Health Survey 2003. Calverton, Maryland: National Population Commission and ORC Macro.
Ogunjuyigbe, P. O. (2004). Under-five mortality in Nigeria: perception and attitudes of the Yorubas towards the existence of "Abiku". Demographic Research, 11: 43-56.
Ord, J. K. and Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27: 286-306.
Owa, J. A. and Osinaike, A. I. (1998). Neonatal Morbidity and Mortality in Nigeria. Indian Journal of Pediatrics, 65: 441-449.
Palloni, A. and Millman, S. (1986). Effects of inter-birth intervals and breastfeeding on infant and early childhood mortality. Population Studies, 40: 215-236.
Peterson, C., Yusof, K., DaVanzo, J. and Habicht, J. P. (1986). Why were Infant and Child Mortality Rates Highest in the Poorest States of Peninsular Malaysia, 1941-75? A Rand Note. Santa Monica, CA: Rand.
Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures. Journal of the Royal Statistical Society, Series A, 135: 185-207.
POLICY Project (2002). Child Survival in Nigeria: Situation, Response, and Prospects: Key Issues. POLICY Project, USA.
Preston, S. H. (1978). The effect of infant and child mortality on fertility. New York: Academic Press.
Root, G. (1997). Population density and spatial differentials in child mortality in Zimbabwe. Social Science and Medicine, 44(3): 413-421.
Rutstein, S. (2000). Factors Associated with Trends in Infant and Child Mortality in Developing Countries During the 1990s. Bulletin of the World Health Organization, 78(10): 1256-1270.
Ruzicka, L. (1989). Problems and issues in the study of mortality differentials. In: Ruzicka, L., Wunsch, G. and Kane, P. (eds.), Differential Mortality: Methodological Issues and Biosocial Factors. Oxford: Clarendon Press.
SAS Institute, Inc. (2000-2004). SAS version 9.1.3 software. Cary, NC: SAS Institute, Inc.
Sastry, N. (1996). Community characteristics, individual and household attributes, and child survival in Brazil. Demography, 33(2): 211-229.
Sastry, N. (1997a). A nested frailty model for survival data, with an application to the study of child survival in northeast Brazil. Journal of the American Statistical Association, 92: 426-435.
Sastry, N. (1997b). Family-level Clustering of Childhood Mortality Risk in Northeast Brazil. Population Studies, 51: 245-261.
Sastry, N. (1997c). What Explains Rural-Urban Differentials in Child Mortality in Brazil? Social Science and Medicine, 44: 989-1002.
Schultz, T. P. (1984). Studying the Impact of Household Economic and Community Variables on Child Mortality. Population and Development Review, 10 (Suppl.): 215-235.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. (2002). Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society, Series B, 64: 583-640.
Tulasidhar, V. B. (1993). Maternal Education, Female Labour Force Participation and Child Mortality: Evidence from the Indian Census. Health Transition Review, 3: 177-190.
UNICEF (2002). Nigeria: Information by Country. Available from: http://www.unicef.org/infobycountry/nigeria_statistics.html (accessed 29 June 2004).
UNICEF (2005). The State of the World's Children. New York: UNICEF.
United Nations (2000). Millennium Declaration. New York: United Nations.
United Nations (2005). Progress Towards the Millennium Development Goals 1990-2005. New York: United Nations Department of Economic and Social Affairs Publication 2005.
Vaida, F., Ghosh, P. and Liu, L. (2008). Mixed-Effects Models for Longitudinal HIV Virologic and Immunologic Data. In: Khattree, R. and Naik, D. N. (eds.), Computational Methods in Biomedical Research. Chapman and Hall/CRC Biostatistics Series.
Vaupel, J. W., Manton, K. and Stallard, E. (1979). Impact of Heterogeneity in Individual Frailty on the Dynamics of Mortality. Demography, 16(3): 439-454.
Venkatacharya, K. (1985). An Approach to the Study of Socio-Biological Determinants of Child Morbidity and Mortality. IUSSP Conference, Florence, Italy.
World Bank (2004). World Development Indicators. Washington D.C.: The World Bank.
World Development Indicators database, April 2005 (accessed 17 July 2005).
World Health Organization (2005). World Health Report. Geneva: World Health Organization.

APPENDICES

APPENDIX A: List of variables used in the analysis

Child level
Gender of child: Male; Female
Birth order: First to third birth; Fourth or higher birth
Preceding birth interval: No older sibling or > 36 months; Less than 24 months; 24 to 35 months
Succeeding birth interval: No younger sibling or > 36 months; Less than 24 months; 24 to 35 months
Size at birth: Small/very small; Average or larger
Mother's age at birth: < 18 years; 18-34 years; 35 and older
Place of delivery: Home/Other/Missing; Health facility
Source of prenatal care: Skilled birth attendant; Traditional birth attendant/Other/None
Birth assistance: Trained medical personnel; Traditional birth attendant/Other/None
Source of antenatal care: Home/Other/None; Health facility
Long labour at birth: No; Yes
Excessive bleeding at birth: No; Yes
High fever at birth: No; Yes
Convulsions at birth: No; Yes
Any problem at birth: No problem; At least one problem

Mother level
Mother's highest educational level: No education; Primary; Secondary plus
Mother's occupation: No work; White collar job; Agric and others
Type of marital union: Monogamy/Never married; Polygamy
Ethnicity: Hausa; Igbo; Yoruba; Fulani; Others
Religion: Christian; Muslim; Traditionalist or others/missing
Media exposure: No media exposure; Exposed to at least one source
Decision-making index: No decision; At least one decision
Problem getting medical help: No problem; At least one problem
Partner's occupation: No work/No partner; White collar job; Agric/Other
Partner's highest educational level: No education/Not married/Missing; Primary; Secondary plus

Household level
Source of drinking water: Piped or tap; Well or surface; Others
Type of toilet facility: Flush; Pit latrine; No facility or others
Flooring materials: Natural and rudimentary; Finished
Type of cooking fuel: Cleaner fuels; High-pollution fuels
Household wealth status: Poorest; Poorer; Middle; Richer; Richest

Community level
Region: North Central; North East; North West; South East; South South; South West
Type of place of residence: Urban; Rural
Malaria prevalence: Low (0-35%, reference category); Medium (36-60%); High endemicity (> 60%)
Population density: < 100 per sq km; 100+ per sq km
Distance to roads: < 1 km; 1+ km
% with access to clean water in community = # of children in households with tap water / Total # of children
% with access to hygienic toilet in community = # of children in households with flush toilet / Total # of children
% with access to finished floor in community = # of children in households with finished floor / Total # of children
% with access to clean cooking fuel in community = # of children in households with cleaner fuel / Total # of children
% with access to electricity in community = # of children in households with electricity / Total # of children
% of births delivered in medical facility = # of children delivered in medical facility / Total # of children
% of births with postnatal care = # of children with postnatal care / Total # of children
% of births with antenatal care = # of children with antenatal care / Total # of children
% of births delivered by a skilled attendant = # of children delivered by a skilled attendant / Total # of children
% of mothers who had at least one tetanus injection = # of children whose mothers had at least one tetanus injection / Total # of children
% of children 12-23 months fully vaccinated = # of children 12-23 months fully vaccinated / Total # of children 12-23 months
% with risky birth interval = # of children with a risky birth interval / Total # of children
% born to too young or too old women = # of children born to mothers < 18 or > 35 years / Total # of children
% of children with high birth order = # of children with birth order greater than 3 / Total # of children
% with at least secondary education = # of children born to mothers with at least secondary education / Total # of children
% white collar job = # of children born to mothers with white collar jobs / Total # of children
% of single women or monogamous unions = # of children born to mothers in single or monogamous unions / Total # of children
% with access to at least one media type = # of children born to mothers who have access to radio, TV or newspaper / Total # of children
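Each community-level contextual variable in Appendix A is a proportion over the children in a community (a count of children meeting a condition divided by the total number of children). The Python sketch below illustrates that aggregation and the three-band malaria prevalence coding. It is a minimal illustration under stated assumptions, not the procedure actually used to build the analysis file: the record layout, the `community` and `water` keys, and the helper names are hypothetical, and the records are made up rather than drawn from the DHS data.

```python
from collections import defaultdict


def community_proportions(children, indicator):
    """Percentage of children in each community for whom `indicator` holds.

    Follows the Appendix A pattern, e.g.
    # of children in households with tap water / Total # of children.
    `children`: iterable of dicts with a hypothetical "community" key.
    `indicator`: function mapping one child record to True or False.
    """
    totals = defaultdict(int)  # denominator: all children in the community
    hits = defaultdict(int)    # numerator: children meeting the indicator
    for child in children:
        cid = child["community"]
        totals[cid] += 1
        if indicator(child):
            hits[cid] += 1
    return {cid: 100.0 * hits[cid] / totals[cid] for cid in totals}


def malaria_category(prevalence_pct):
    """Band malaria prevalence (%) as Low (0-35), Medium (36-60), High (>60).

    Assumption: non-integer values between 35 and 36 are treated as Medium.
    """
    if prevalence_pct <= 35:
        return "Low"
    if prevalence_pct <= 60:
        return "Medium"
    return "High"


# Illustrative records only (not DHS data): two children in community "A",
# one in community "B".
children = [
    {"community": "A", "water": "piped"},
    {"community": "A", "water": "well"},
    {"community": "B", "water": "piped"},
]
shares = community_proportions(children, lambda c: c["water"] == "piped")
print(shares)                # {'A': 50.0, 'B': 100.0}
print(malaria_category(42))  # Medium
```

Counting children rather than households mirrors the "Total # of children" denominators listed above, so a household with several children contributes more weight to its community's proportion.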